Description

Attribute	Value
Operating system	-
File name	bunkatopics-0.8
Name	bunkatopics
Library version	0.8
Maintainer	[]
Maintainer email	[]
Author	Charles De Dampierre
Author email	charles.de-dampierre@hec.edu
Home page	-
URL	https://pypi.org/project/bunkatopics/
License	-

# BunkaTopics

BunkaTopics is a topic modeling package that leverages embeddings and focuses on topic representation to extract meaningful and interpretable topics from a list of documents.

## Installation

Before installing bunkatopics, download the spaCy language models:

```bash
python -m spacy download fr_core_news_lg
```

```bash
python -m spacy download en_core_web_sm
```

Then install bunkatopics using pip:

```bash
pip install bunkatopics
```

## Quick Start with BunkaTopics

```python
from bunkatopics import BunkaTopics
import pandas as pd

data = pd.read_csv('data/imdb.csv', index_col=[0])
data = data.sample(2000, random_state=42)

# Instantiate the model, extract the terms and embed the documents
model = BunkaTopics(
    data,  # DataFrame
    text_var='description',  # Text column
    index_var='imdb',  # Index column (mandatory)
    extract_terms=True,  # Extract terms?
    terms_embeddings=True,  # Extract term embeddings?
    docs_embeddings=True,  # Extract document embeddings?
    embeddings_model="distiluse-base-multilingual-cased-v1",  # Choose an embeddings model
    multiprocessing=True,  # Multiprocessing of embeddings
    language="en",  # Choose between English "en" and French "fr"
    sample_size_terms=len(data),
    terms_limit=10000,  # Top terms to output
    terms_ents=True,  # Extract entities
    terms_ngrams=(1, 2),  # Choose n-grams to extract
    terms_ncs=True,  # Extract noun chunks
    terms_include_pos=["NOUN", "PROPN", "ADJ"],  # Include part-of-speech tags
    terms_include_types=["PERSON", "ORG"],  # Include entity types
)

# Extract the topics
topics = model.get_clusters(
    topic_number=15,  # Number of topics
    top_terms_included=1000,  # Compute the specific terms from the top n terms
    top_terms=5,  # Most specific terms to describe the topics
    term_type="lemma",  # Use "lemma" or "text"
    ngrams=[1, 2],  # N-grams for topic representation
    clusterer='hdbscan',  # Choose between KMeans and HDBSCAN
)

# Visualize the clusters. It is advised to choose at most 5 terms
# (top_terms = 5) to avoid overloading the figure
fig = model.visualize_clusters(
    search=None,
    width=1000,
    height=1000,
    fit_clusters=True,  # Fit UMAP to visually separate the clusters
    density_plot=False,  # Plot a density map to get a territory overview
)
fig.show()

centroid_documents = model.get_centroid_documents(top_elements=2)
```
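The quick start asks for the "most specific terms" of each topic through `top_terms`, but the README does not say how specificity is scored. The sketch below only illustrates the general idea on toy data. The function `specific_terms` and its scoring rule (in-cluster count weighted by the share of the term's occurrences that fall in the cluster) are assumptions made for illustration, not the bunkatopics implementation.

```python
from collections import Counter, defaultdict


def specific_terms(doc_terms, labels, top_terms=2):
    """Hypothetical helper: rank the terms of each cluster by specificity.

    Score = (count in cluster) * (count in cluster / count overall), so a term
    scores high when it is frequent in the cluster and rare elsewhere.
    """
    overall = Counter()
    per_cluster = defaultdict(Counter)
    for terms, label in zip(doc_terms, labels):
        overall.update(terms)
        per_cluster[label].update(terms)

    result = {}
    for label, counts in per_cluster.items():
        # Sort by descending score, then alphabetically to break ties.
        ranked = sorted(counts, key=lambda t: (-(counts[t] ** 2) / overall[t], t))
        result[label] = ranked[:top_terms]
    return result


# Toy documents already reduced to terms, with made-up cluster labels.
docs = [
    ["space", "alien", "ship"],
    ["alien", "space", "war"],
    ["love", "wedding", "family"],
    ["family", "love", "war"],
]
labels = [0, 0, 1, 1]
topics = specific_terms(docs, labels, top_terms=2)
print(topics)
```

The term "war" appears in both clusters, so it is pushed down the ranking in each of them, which is the behavior a specificity score is meant to capture.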

Requirements

Version	Name
>=2.28.1,<3.0.0	requests
==1.21.5	numpy
==1.4.1	pandas
==5.6.0	plotly
==1.1.3	scikit-learn
==2.2.0	sentence-transformers
>=3,<4	spacy
==0.12.0	textacy
==4.63.0	tqdm
==0.5.3	umap-learn
==1.6.2	xgboost
>=3.5.1,<4.0.0	jupyterlab
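Most of these requirements are exact pins, so an existing environment can easily hold a different version. As a minimal sketch, assuming only the standard library's `importlib.metadata`, the pins can be compared against what is installed. The helper name `check_pins` is hypothetical, and the range specifiers (requests, spacy, jupyterlab) are left out because checking them properly needs a version-specifier parser.

```python
from importlib import metadata

# Exact pins copied from the requirements table above.
PINNED = {
    "numpy": "1.21.5",
    "pandas": "1.4.1",
    "plotly": "5.6.0",
    "scikit-learn": "1.1.3",
    "sentence-transformers": "2.2.0",
    "textacy": "0.12.0",
    "tqdm": "4.63.0",
    "umap-learn": "0.5.3",
    "xgboost": "1.6.2",
}


def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # Package is not installed at all.
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty output means every pinned package is installed at the listed version.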

Required language

Version	Name
>=3.8,<3.11	Python
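The Python range has an inclusive lower bound and an exclusive upper bound, so 3.8 through 3.10 qualify and 3.11 does not. A small sketch of that check follows; the function name `python_supported` is an assumption made for illustration.

```python
import sys

MIN_VERSION = (3, 8)   # inclusive lower bound from the table above
MAX_VERSION = (3, 11)  # exclusive upper bound from the table above


def python_supported(version=None):
    """Return True if the (major, minor) version is within >=3.8,<3.11."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


if not python_supported():
    print("This Python version is outside the range required by bunkatopics 0.8")
```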

How to install

Install the bunkatopics-0.8 whl package:

pip install bunkatopics-0.8.whl

Install the bunkatopics-0.8 tar.gz package:

pip install bunkatopics-0.8.tar.gz