

bof-0.3.5



Description

Bag of Factors allows you to analyze a corpus from its factors.
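================  =============================
Attribute         Value
================  =============================
Operating system  -
File name         bof-0.3.5
Name              bof
Library version   0.3.5
Maintainer        []
Maintainer email  []
Author            Fabien Mathieu
Author email      loufab@gmail.com
Home page         https://github.com/balouf/bof
Package URL       https://pypi.org/project/bof/
License           GNU General Public License v3
================  =============================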
==============
Bag of Factors
==============

.. image:: https://img.shields.io/pypi/v/bof.svg
        :target: https://pypi.python.org/pypi/bof
        :alt: PyPI Status

.. image:: https://github.com/balouf/bof/workflows/build/badge.svg?branch=master
        :target: https://github.com/balouf/bof/actions?query=workflow%3Abuild
        :alt: Build Status

.. image:: https://github.com/balouf/bof/workflows/docs/badge.svg?branch=master
        :target: https://github.com/balouf/bof/actions?query=workflow%3Adocs
        :alt: Documentation Status

.. image:: https://codecov.io/gh/balouf/bof/branch/master/graphs/badge.svg
        :target: https://codecov.io/gh/balouf/bof/branch/master/graphs
        :alt: Code Coverage

Bag of Factors allows you to analyze a corpus from its factors.

* Free software: GNU General Public License v3
* Documentation: https://balouf.github.io/bof/.

--------
Features
--------

Feature Extraction
-------------------

The `feature_extraction` module mimics the module https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text with a focus on character-based extraction.

The main differences are:

- it is slightly faster;
- the features can be incrementally updated;
- it is possible to fit only a random sample of factors to reduce space and computation time.

The main entry point for this module is the `CountVectorizer` class, which mimics its *scikit-learn* counterpart (also named `CountVectorizer`). It is in fact very similar to sklearn's `CountVectorizer` using the `char` or `char_wb` analyzer option from that module.

Fuzz
--------

The `fuzz` module mimics fuzzywuzzy-like packages such as

- fuzzywuzzy (https://github.com/seatgeek/fuzzywuzzy)
- rapidfuzz (https://github.com/maxbachmann/rapidfuzz)

The main difference is that the Levenshtein distance is replaced by the Joint Complexity distance.
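To make the idea of a factor-based distance more concrete, here is a minimal sketch in plain Python. It does not use the package's API. It assumes that a factor is a contiguous substring and that joint complexity refers to the factors two strings have in common. The helper names `factors` and `factor_distance` are hypothetical, and the Jaccard-style normalization is an illustrative assumption, not necessarily the formula the package implements.

```python
def factors(text):
    """Return the set of distinct non-empty substrings (factors) of `text`."""
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def factor_distance(a, b):
    """Jaccard-style distance between the factor sets of `a` and `b`.

    Assumption: this normalization is only illustrative; it is not
    claimed to be the formula used by the package.
    """
    fa, fb = factors(a), factors(b)
    union = fa | fb
    if not union:  # both strings are empty
        return 0.0
    return 1.0 - len(fa & fb) / len(union)


# Identical strings share every factor; unrelated strings share few.
print(sorted(factors("abc")))
print(factor_distance("riri", "riri"))
print(factor_distance("riri", "fifi"))
```

Enumerating every substring is quadratic in the string length, so this form is only suitable for short strings. Capping the factor length or sampling factors, as the feature list above suggests, is one way to keep the cost down.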
The API is also slightly changed to enable new features:

- The list of possible choices can be pre-trained (`fit`) to accelerate the computation when a stream of queries is sent against the same list of choices.
- Instead of a single query, a list of queries can be used. Computations will be parallelized.

The main `fuzz` entry point is the `Process` class.

----------------
Getting Started
----------------

Look at examples from the reference_ section.

-------
Credits
-------

This package was created with Cookiecutter_ and the `francois-durand/package_helper_2`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`francois-durand/package_helper_2`: https://github.com/francois-durand/package_helper_2
.. _reference: https://balouf.github.io/bof/reference/index.html

=======
History
=======

---------------------------------------------------
0.3.5 (2021-04-08): ARM64
---------------------------------------------------

* Attempt to update PyPI with Mac M1 compatible wheels.

---------------------------------------------------
0.3.4 (2021-01-05): Cleaning
---------------------------------------------------

* Renamed process.py to fuzz.py to emphasize that the module aims at being an alternative to the fuzzywuzzy package.
* Removed modules FactorTree and JC. What they did is now essentially covered by the feature_extraction and fuzz modules.
* General cleaning / rewriting of the documentation.

---------------------------------------------------
0.3.3 (2021-01-01): Cython/Numba balanced
---------------------------------------------------

* All core CountVectorizer methods ported to Cython. Roughly 2.5X faster than the sklearn counterpart (mainly because some features like min_df/max_df are not implemented).
* Process numba methods NOT converted to Cython, as Numba seems to be 20% faster for csr manipulation.
* Numba functions are cached to avoid compilation lag.
---------------------------------------------------
0.3.2 (2020-12-30): Going Cython
---------------------------------------------------

* First attempt to use Cython.
* Right now only the fit_transform method of CountVectorizer has been cythonized, for testing wheels.
* If all goes well, numba will probably be abandoned and all the heavy lifting will be in Cython.

-----------------------------------------------------
0.3.1 (2020-12-28): Simplification of core algorithm
-----------------------------------------------------

* Attributes of the CountVectorizer have been reduced to the minimum: one dict!
* Now faster than the sklearn counterpart! (The reason being that only one case is considered here, so we can ditch a lot of checks and attributes.)

---------------------------------------------------
0.3.0 (2020-12-15): CountVectorizer and Process
---------------------------------------------------

* The core is now the CountVectorizer class. Lighter and faster. Only features are kept inside.
* New process module inspired by fuzzywuzzy!

---------------------------------
0.2.0 (2020-12-15): Fit/Transform
---------------------------------

* Full refactoring to make the package fit/transform compliant.
* Added a fit_sampling method that allows fitting only a (random) subset of factors.

---------------------------------
0.1.1 (2020-12-12): Upgrades
---------------------------------

* Docstrings added.
* Common module (feat. save/load capabilities).
* Joint Complexity module.

---------------------------------
0.1.0 (2020-12-12): First release
---------------------------------

* First release on PyPI.
* Core FactorTree class added.


Requirements

Value Name
- dill
- numba
- numpy
- scipy


Required language

Value Name
>=3.6 Python


Installation


Install the bof-0.3.5 whl package:

    pip install bof-0.3.5.whl


Install the bof-0.3.5 tar.gz package:

    pip install bof-0.3.5.tar.gz
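
After either install command, one way to confirm that the package is visible to the interpreter is sketched below. It uses only the standard library and assumes the import name is `bof`, matching the package name. The helper `is_installed` is a hypothetical name introduced for this illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package is importable under the name "bof".
print("bof installed:", is_installed("bof"))
```

Because `find_spec` only locates the module, this check does not verify that the compiled extensions load correctly. Importing the package is the stronger test.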