Description

Content extractor for files containing text

Attribute	Value
Operating system	-
File name	content-extractor-pi-0.1.0
Name	content-extractor-pi
Library version	0.1.0
Maintainer	[]
Maintainer email	[]
Author	Paolo Italiani
Author email	paoita@hotmail.it
Home page	https://github.com/paoloitaliani/content-extractor-pi
URL	https://pypi.org/project/content-extractor-pi/
License	MIT

**content-extractor-pi** is a Python module that extracts a user-defined piece of content from a set of documents. This piece of content can be a paragraph that deals with a certain topic, headers, page numbers, et cetera. **content-extractor-pi** does need some examples of the desired content, supplied by a domain expert, but its focus on few-shot learning means that ~10 examples are usually enough, even for a corpus that may contain thousands of documents.

Installation
------------

The easiest way to install content-extractor-pi is with pip:

    pip install content-extractor-pi

Documentation
------------

The main object of content-extractor-pi is `ContentExtractor`, and the only argument it expects is a pre-trained word embedding model. The example uses the pre-trained Google News word2vec model available [here](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing).

### ContentExtractor.train_model method

The **train_model** method extracts and scales features for the text examples contained in train_df, creates synthetic samples of the target class, and trains the model at the core of content_extractor.

#### Parameters

- **train_df**: pandas DataFrame containing the text examples in one column and the corresponding labels in another
- **train_additional_features, default=None**: pandas DataFrame containing additional features describing the text examples contained in train_df
- **y_name, default="label"**: column name of train_df where the labels are stored
- **text_name, default="text"**: column name of train_df where the text examples are stored
- **use_pca, default=False**: apply principal component analysis to the scaled extracted features; more information can be found [here](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)
- **gamma, default=1**: kernel coefficient for [sklearn.svm.SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)
- **C, default=0.1**: regularization parameter for [sklearn.svm.SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

### ContentExtractor.extract_content method

The **extract_content** method extracts and scales features for the text examples contained in target_df and returns the ones labeled as 1 by the model.

#### Parameters

- **target_df**: pandas DataFrame containing all the available text examples
- **target_additional_features, default=None**: pandas DataFrame containing additional features describing the text examples contained in target_df
- **text_name, default="text"**: column name of target_df where the text examples are stored

Example
------------

The following example is a full implementation that extracts the desired content from a set of more than 62000 paragraphs taken from the World Bank Loan Agreements Corpus, using just 11 examples manually labeled as 1. The aim is to extract the *objective paragraph*, a short segment describing how the loan is going to be spent. An example of such a paragraph is shown here:

<p align="center"><img src="https://github.com/paoloitaliani/content-extractor-pi/raw/master/images/image1.png" width=600></p>

The desired paragraphs are stored in the **target_examples** list.
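As a rough illustration of the column layout described under Parameters, the sketch below builds toy inputs by hand. The texts and the variable names `toy_train_df` and `toy_target_df` are invented for illustration. Only the default column names `text` and `label`, and the convention that 1 marks the desired content, come from the documentation; using 0 for all other paragraphs is an assumption.

```python
import pandas as pd

# Hypothetical training examples; label 1 marks the desired content.
# Using 0 for everything else is an assumption, not documented behaviour.
toy_train_df = pd.DataFrame(
    {
        "text": [
            "The objective of the Project is to improve rural water supply.",
            "This Agreement shall enter into force on the date of signature.",
            "The Borrower shall maintain records of all expenditures.",
        ],
        "label": [1, 0, 0],
    }
)

# Unlabeled paragraphs to search; only the text column is needed here.
toy_target_df = pd.DataFrame(
    {
        "text": [
            "The objective of the Project is to rehabilitate regional roads.",
            "Section 4.02. Notices shall be delivered in writing.",
        ]
    }
)

# Quick sanity check of the class balance before training.
print(toy_train_df["label"].value_counts().to_dict())
```

If the columns are named differently, the documented `text_name` and `y_name` parameters are the place to say so.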
You can find train_df.csv and target_df.csv [here](https://github.com/paoloitaliani/content-extractor-pi/tree/master/data).

```python
from content_extractor import contextractor as cte
from gensim.models import KeyedVectors
import pandas as pd

# Load the pre-trained Google News word2vec embeddings
W2V_MODEL = KeyedVectors.load_word2vec_format('/your/path/to/GoogleNews-vectors-negative300.bin.gz', binary=True)

# Labeled examples for training and the full set of paragraphs to search
train_df = pd.read_csv("data/train_df.csv")
target_df = pd.read_csv("data/target_df.csv")

# Train on the labeled examples, then extract the paragraphs labeled as 1
cont_ext = cte.ContentExtractor(W2V_MODEL)
cont_ext.train_model(train_df)
target_examples = cont_ext.extract_content(target_df)
```
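The parameter descriptions in the documentation suggest a pattern of feature scaling, optional PCA, and an SVC classifier. The sketch below shows that general pattern with scikit-learn on random stand-in features; it is not the library's actual implementation. The helper name `build_pipeline`, the choice of `StandardScaler`, the number of PCA components, and the random data are all assumptions made for illustration, and the synthetic-sample step is left out entirely.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pipeline(use_pca=False, gamma=1, C=0.1, n_components=2):
    """Hypothetical helper: scale features, optionally apply PCA, then fit an SVC."""
    steps = [StandardScaler()]
    if use_pca:
        steps.append(PCA(n_components=n_components))
    steps.append(SVC(gamma=gamma, C=C))
    return make_pipeline(*steps)


# Random stand-in features: 20 "paragraphs" with 5 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))
y_train = np.array([1] * 10 + [0] * 10)
# Shift the positive class so the two groups differ.
X_train[y_train == 1] += 3.0

model = build_pipeline(use_pca=True)
model.fit(X_train, y_train)

# Stand-in for the unlabeled target set.
X_target = rng.normal(size=(6, 5))
X_target[:3] += 3.0
predicted = model.predict(X_target)

# Keep only rows labeled 1, mirroring what extract_content is described as returning.
selected = X_target[predicted == 1]
```

In this pattern, `gamma` and `C` are passed straight to `sklearn.svm.SVC`, which matches how the documentation describes them. Everything else here is a guess at the general shape, not at the details.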

How to install

Installing the content-extractor-pi-0.1.0 whl package:

pip install content-extractor-pi-0.1.0.whl

Installing the content-extractor-pi-0.1.0 tar.gz package:

pip install content-extractor-pi-0.1.0.tar.gz
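After installing, it can be useful to confirm which version ended up in the environment. One way to do this is with the standard library's `importlib.metadata`. The helper name `installed_version` below is hypothetical, and the distribution name is the one used in the pip commands.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("content-extractor-pi")
if found is None:
    print("content-extractor-pi is not installed in this environment")
else:
    print(f"content-extractor-pi {found} is installed")
```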