extremetext-0.8.4



Description

A Python interface for the extremeText library
Attribute: Value
Operating system: -
File name: extremetext-0.8.4
Name: extremetext
Library version: 0.8.4
Maintainer: []
Maintainer email: []
Author: Marek Wydmuch
Author email: mwydmuch@cs.put.poznan.pl
Home page: https://github.com/mwydmuch/extremeText
Package URL: https://pypi.org/project/extremetext/
License: BSD
extremeText
===========

`extremeText <https://github.com/mwydmuch/extremeText>`__ is an extension of the `fastText <https://github.com/facebookresearch/fastText>`__ library for multi-label classification, including extreme cases with hundreds of thousands or millions of labels.

`extremeText <https://github.com/mwydmuch/extremeText>`__ implements:

- Probabilistic Labels Tree (PLT) loss for extreme multi-label classification with top-down hierarchical clustering (k-means) for tree building,
- sigmoid loss for multi-label classification,
- L2 regularization and FOBOS update for all losses,
- ensemble of loss layers with bagging,
- calculation of the hidden (document) vector as a weighted average of the word vectors,
- calculation of TF-IDF weights for words.

Requirements
------------

`extremeText <https://github.com/mwydmuch/extremeText>`__ builds on modern macOS and Linux distributions. Since it uses C++11 features, it requires a compiler with good C++11 support. These include:

- (gcc-4.8 or newer) or (clang-3.3 or newer)

You will need:

- `Python <https://www.python.org/>`__ version 2.7 or >=3.4
- `NumPy <http://www.numpy.org/>`__ & `SciPy <https://www.scipy.org/>`__
- `pybind11 <https://github.com/pybind/pybind11>`__

Installing extremeText
----------------------

The easiest way to get `extremeText <https://github.com/mwydmuch/extremeText>`__ is to use `pip <https://pip.pypa.io/en/stable/>`__.

::

    $ pip install extremetext

Installing on macOS may require setting ``MACOSX_DEPLOYMENT_TARGET=10.9`` first:

::

    $ export MACOSX_DEPLOYMENT_TARGET=10.9
    $ pip install extremetext

The latest version of `extremeText <https://github.com/mwydmuch/extremeText>`__ can be built from source using pip or, alternatively, setuptools.

::

    $ git clone https://github.com/mwydmuch/extremeText.git
    $ cd extremeText
    $ pip install .
    (or)
    $ python setup.py install

Now you can import this library with:

::

    import extremeText

Examples
--------

In general, it is assumed that the reader already has good knowledge of fastText/extremeText. For background, see the main `README <https://github.com/mwydmuch/extremeText/blob/master/README.md>`__ and `the tutorials on fastText website <https://fasttext.cc/docs/en/supervised-tutorial.html>`__.

We recommend you look at the `examples within the doc folder <https://github.com/mwydmuch/extremeText/tree/master/python/doc/examples>`__.

As with any package, you can get help on any Python function using the help function.

For example:

::

    >>> import extremeText
    >>> help(extremeText.ExtremeText)
    Help on module extremeText.ExtremeText in extremeText:

    NAME
        extremeText.ExtremeText

    DESCRIPTION
        # Copyright (c) 2017-present, Facebook, Inc.
        # All rights reserved.
        #
        # This source code is licensed under the BSD-style license found in the
        # LICENSE file in the root directory of this source tree. An additional grant
        # of patent rights can be found in the PATENTS file in the same directory.

    FUNCTIONS
        load_model(path)
            Load a model given a filepath and return a model object.

        tokenize(text)
            Given a string of text, tokenize it and return a list of tokens
    [...]

IMPORTANT: Preprocessing data / encoding conventions
-----------------------------------------------------

In general, it is important to properly preprocess your data. Example scripts in the `root folder <https://github.com/mwydmuch/extremeText/extremeText>`__ do this.

extremeText, like fastText, assumes UTF-8 encoded text. All text must be `unicode for Python2 <https://docs.python.org/2/library/functions.html#unicode>`__ and `str for Python3 <https://docs.python.org/3.5/library/stdtypes.html#textseq>`__. The passed text will be `encoded as UTF-8 by pybind11 <https://pybind11.readthedocs.io/en/master/advanced/cast/strings.html?highlight=utf-8#strings-bytes-and-unicode-conversions>`__ before it is passed to the extremeText C++ library. This means it is important to use UTF-8 encoded text when building a model. On Unix-like systems you can convert text using `iconv <https://en.wikipedia.org/wiki/Iconv>`__.

extremeText will tokenize (split text into pieces) based on the following ASCII characters (bytes). In particular, it is not aware of UTF-8 whitespace. We advise the user to convert UTF-8 whitespace / word boundaries into one of the following symbols as appropriate.

- space
- tab
- vertical tab
- carriage return
- formfeed
- the null character

The newline character is used to delimit lines of text. In particular, the EOS token is appended to a line of text if a newline character is encountered. The only exception is if the number of tokens exceeds the MAX\_LINE\_SIZE constant as defined in the `Dictionary header <https://github.com/mwydmuch/extremeText/blob/master/src/dictionary.h>`__. This means that if you have text that is not separated by newlines, such as the `fil9 dataset <http://mattmahoney.net/dc/textdata>`__, it will be broken into chunks of MAX\_LINE\_SIZE tokens and the EOS token is not appended.

The length of a token is the number of UTF-8 characters, determined by considering the `leading two bits of a byte <https://en.wikipedia.org/wiki/UTF-8#Description>`__ to identify `subsequent bytes of a multi-byte sequence <https://github.com/mwydmuch/extremeText/blob/master/src/dictionary.cc>`__. Knowing this is especially important when choosing the minimum and maximum length of subwords. Further, the EOS token (as specified in the `Dictionary header <https://github.com/mwydmuch/extremeText/blob/master/src/dictionary.h>`__) is considered a character and will not be broken into subwords.

Reference
---------

Please cite the work below if you use this package for extreme classification.

M. Wydmuch, K. Jasinska, M. Kuznetsov, R. Busa-Fekete, K. Dembczyński. `*A no-regret generalization of hierarchical softmax to extreme multi-label classification* <http://papers.nips.cc/paper/7872-a-no-regret-generalization-of-hierarchical-softmax-to-extreme-multi-label-classification>`__. Advances in Neural Information Processing Systems 31, 2018.
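
The tokenization and token-length rules from the preprocessing section above can be illustrated with a small pure-Python sketch. It does not call the extremeText library and is not its implementation: the function names ``normalize_whitespace``, ``sketch_tokenize`` and ``utf8_length`` are hypothetical, the EOS string ``</s>`` is assumed to match the fastText default, and the MAX\_LINE\_SIZE chunking is left out.

```python
# Pure-Python approximation of the rules described in the README.
# All names here are illustrative and not part of the extremeText API.

# ASCII separators listed in the README (newline is handled separately).
SEPARATORS = " \t\v\r\f\0"
# Assumption: the EOS token is "</s>", as in fastText's dictionary header.
EOS = "</s>"


def normalize_whitespace(text):
    """Replace every Unicode whitespace character except newline with an
    ASCII space, so that a byte-based tokenizer can see word boundaries."""
    return "".join(" " if ch.isspace() and ch != "\n" else ch for ch in text)


def sketch_tokenize(text):
    """Split on the ASCII separators only and emit EOS at each newline."""
    tokens, current = [], []
    for ch in text:
        if ch == "\n" or ch in SEPARATORS:
            if current:
                tokens.append("".join(current))
                current = []
            if ch == "\n":
                tokens.append(EOS)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def utf8_length(token):
    """Count UTF-8 characters by skipping continuation bytes, i.e. bytes
    whose leading two bits are 10."""
    data = token.encode("utf-8")
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


if __name__ == "__main__":
    raw = "extreme\u00a0multi-label\nclassification"
    # The no-break space is not an ASCII separator, so it stays inside a token.
    print(sketch_tokenize(raw))
    # After normalization the same text splits at the former no-break space.
    print(sketch_tokenize(normalize_whitespace(raw)))
    # Character count versus byte count for a token with a 2-byte character.
    print(utf8_length("Dembczyński"), len("Dembczyński".encode("utf-8")))
```

Running a step like ``normalize_whitespace`` before training and prediction keeps the two consistent; the character count from ``utf8_length``, rather than the byte count, is the figure that matters when choosing minimum and maximum subword lengths.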


Installation


Install the extremetext-0.8.4 whl package:

    pip install extremetext-0.8.4.whl


Install the extremetext-0.8.4 tar.gz package:

    pip install extremetext-0.8.4.tar.gz