Description

A word2vec preprocessing and training package

Property	Value
Operating system	-
File name	embeddings-prep-0.1.0
Name	embeddings-prep
Library version	0.1.0
Maintainer	[]
Maintainer email	[]
Author	sally14
Author email	sally14.doe@gmail.com
Home page	https://github.com/sally14/embeddings
Package URL	https://pypi.org/project/embeddings-prep/
License	-

# Embeddings

Embedding generation with text preprocessing.

## Preprocessor

For Word2Vec, we want preprocessing that is light but meaningful: it should denoise the text while keeping as much variety and information as possible.

The preprocessor handles a text, or a set of texts, as follows:

1. Detects numbers and replaces them with the generic tokens 'FLOAT' and 'INT'.
2. Adds spaces around punctuation, so that tokenisation adds 'word' and '.' to the vocabulary instead of 'word.'.
3. Lowercases words.
4. Recursive word phrase detection: using a simple probabilistic rule, gathers tokens such as 'new', 'york' into a single token 'new_york'.
5. Frequency subsampling: discards infrequent words with a probability depending on their frequency.

It outputs a vocabulary file and the modified files.

Usage example:

```python
from preprocessing.preprocessor import PreprocessorConfig, Preprocessor

# Create a configuration tied to a log directory, set the output directory, and save it
config = PreprocessorConfig('/tmp/logdir')
config.set_config(writing_dir='/tmp/outputs')
config.save_config()

# Build the preprocessor from the same log directory, then fit, filter, and transform the data
prep = Preprocessor('/tmp/logdir')
prep.fit('~/mydata/')
prep.filter()
prep.transform('~/mydata')
```

## Word2Vec

For Word2Vec, we wrote a simple CLI wrapper that takes the preprocessed files as input, trains a Word2Vec model with gensim, and writes the vocabulary and embeddings as .tsv files that can be visualized with the TensorFlow projector (http://projector.tensorflow.org/).

Usage example:

```bash
python training_word2vec.py file_dir writing_dir
```

## Documentation

Documentation is available here: https://sally14.github.io/embeddings/

TODO:

- [ ] Clean code for CLI wrapper
- [X] Also write a python Word2Vec model class so that user doesn't have to switch from python to cli
- [ ] Also write a cli wrapper for preprocessing
- [X] Memory leak in preprocessor.transform
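
To make the first three preprocessing steps listed in the Preprocessor section concrete, here is a minimal standard-library sketch. It is not the package's implementation: the function name `normalize` and the regular expressions are assumptions chosen for illustration, and the sketch lowercases before inserting the placeholder tokens so that 'FLOAT' and 'INT' stay uppercase. Phrase detection and frequency subsampling are left out.

```python
import re

# Hypothetical helper for illustration only; not part of embeddings-prep.
_FLOAT_RE = re.compile(r"\b\d+\.\d+\b")    # e.g. 3.14
_INT_RE = re.compile(r"\b\d+\b")           # e.g. 42
_PUNCT_RE = re.compile(r"([.,;:!?()\"])")  # a small, assumed punctuation set


def normalize(text):
    """Return a list of tokens after a light normalization pass."""
    # Lowercase first so the placeholder tokens added below keep their case.
    text = text.lower()
    # Replace floats before integers, otherwise "3.14" would become "INT.INT".
    text = _FLOAT_RE.sub("FLOAT", text)
    text = _INT_RE.sub("INT", text)
    # Surround punctuation with spaces so "word." splits into "word" and ".".
    text = _PUNCT_RE.sub(r" \1 ", text)
    # Whitespace tokenisation; split() also collapses repeated spaces.
    return text.split()


if __name__ == "__main__":
    print(normalize("Pi is 3.14, and I have 2 cats."))
```

Replacing numbers before spacing out punctuation matters here: if the order were reversed, the decimal point in a float would be separated from its digits and the number would no longer be recognized as a single token.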

Requirements

Version	Name
>=0.6.2	docopt
>=3.8.1	gensim
>=3.4.5	nltk

Required language

Version	Name
>=3.6	Python

Installation

Install the whl package embeddings-prep-0.1.0:

pip install embeddings-prep-0.1.0.whl

Install the tar.gz package embeddings-prep-0.1.0:

pip install embeddings-prep-0.1.0.tar.gz