# grouphug
GroupHug is a library with extensions to 🤗 transformers for multitask language modelling.
In addition, it contains utilities that ease data preparation, training, and inference.
## Project Moved
Grouphug maintenance and future versions have moved to [my personal repository](https://github.com/sanderland/grouphug).
## Overview
The package is optimized for training a single language model to make quick and robust predictions for a wide variety of related tasks at once,
as well as for investigating the regularizing effect of jointly training a language modelling task.
You can train on multiple datasets, with each dataset containing an arbitrary subset of your tasks. Supported tasks include:
* A single language modelling task (masked language modelling, masked token detection, or causal language modelling).
  * The included default collator handles most preprocessing for these heads automatically.
* Any number of classification tasks, including single-label classification, multi-label classification, and regression.
  * A utility function automatically creates a classification head from your data.
  * Additional options include hidden layer size, additional input variables, and class weights.
* You can also define your own model heads; a conceptual sketch is given below.
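
For the last point, the sketch below shows the general shape such a head might take. It is plain PyTorch and deliberately generic: the class name and the exact interface grouphug expects from a head (`forward` arguments, loss handling) are assumptions here, not the library's documented API.

```python
import torch
from torch import nn

class MeanPoolingRegressionHead(nn.Module):
    """Illustrative only: mean-pools token embeddings and regresses a single score."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states, attention_mask, labels=None):
        # mean-pool over non-padding tokens
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        preds = self.out(torch.tanh(self.dense(pooled))).squeeze(-1)
        # return a (loss, predictions) pair when labels are available
        loss = nn.functional.mse_loss(preds, labels.float()) if labels is not None else None
        return loss, preds
```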
## Quick Start
The project requires Python 3.8+ and PyTorch 1.10+. To install it, simply run:
`pip install grouphug`
### Documentation
Documentation can be generated from the docstrings by running `make html` in the `docs` directory, but it is not yet available on a hosted site.
### Example usage
```python
import pandas as pd
from datasets import load_dataset
from transformers import AutoTokenizer

from grouphug import AutoMultiTaskModel, ClassificationHeadConfig, DatasetFormatter, LMHeadConfig, MultiTaskTrainer

# load some data. 'label' gets renamed in huggingface, so is better avoided as a feature name.
task_one = load_dataset("tweet_eval", "emoji").rename_column("label", "tweet_label")
both_tasks = pd.DataFrame({"text": ["yay :)", "booo!"], "sentiment": ["pos", "neg"], "tweet_label": [0, 14]})

# create a tokenizer
base_model = "prajjwal1/bert-tiny"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# preprocess your data: tokenization, preparing class variables
formatter = DatasetFormatter().tokenize().encode("sentiment")
# data converted to a DatasetCollection: essentially a dict of DatasetDict
data = formatter.apply({"one": task_one, "both": both_tasks}, tokenizer=tokenizer, test_size=0.05)

# define which model heads you would like
head_configs = [
    LMHeadConfig(weight=0.1),  # default is BERT-style masked language modelling
    ClassificationHeadConfig.from_data(data, "sentiment"),  # detects dimensions and type
    ClassificationHeadConfig.from_data(data, "tweet_label"),  # detects dimensions and type
]

# create the model, optionally saving the tokenizer and formatter along with it
model = AutoMultiTaskModel.from_pretrained(base_model, head_configs, formatter=formatter, tokenizer=tokenizer)

# create the trainer
trainer = MultiTaskTrainer(
    model=model,
    tokenizer=tokenizer,
    train_data=data[:, "train"],
    eval_data=data[["one"], "test"],
    eval_heads={"one": ["tweet_label"]},  # limit evaluation to one classification task
)
trainer.train()
```
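
After training, running inference on new text should work like calling any 🤗 transformers model. The snippet below is a sketch under that assumption (it reuses `model` and `tokenizer` from the example above); the exact structure of the per-head outputs may differ.

```python
import torch

# sketch only: assumes the model follows the standard transformers calling convention
inputs = tokenizer("what a great day :)", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# outputs are expected to contain per-head results, e.g. logits for the
# 'sentiment' and 'tweet_label' classification heads
print(outputs)
```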
### Tutorials
See [examples](./examples) for a few notebooks that demonstrate the key features.
## Supported Models
The package supports the following base models:
* BERT, DistilBERT, RoBERTa/DistilRoBERTa, XLM-RoBERTa
* DeBERTa/DeBERTa-v2
* ELECTRA
* GPT-2, GPT-J, GPT-NeoX, OPT
Extending it to support other base models is possible by inheriting from `_BaseMultiTaskModel` (sketched below), although language modelling head weights may not always load.
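
As a rough illustration of that pattern (the import path and the mixin arrangement are assumptions, not the documented API), a new architecture would combine the shared multitask logic with the matching 🤗 transformers pretrained class:

```python
# conceptual sketch only: import path and class layout are assumptions
from transformers import AlbertPreTrainedModel

from grouphug.model import _BaseMultiTaskModel  # hypothetical import path

class AlbertMultiTaskModel(_BaseMultiTaskModel, AlbertPreTrainedModel):
    # assumption: the base class wires the backbone and heads together,
    # so the subclass mainly declares which architecture it wraps
    pass
```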
## Limitations
* The package only supports PyTorch, and will not work with other frameworks. There are no plans to change this.
* Grouphug was developed and tested with 🤗 transformers 4.19-4.22. We aim to test against and keep compatibility with the latest version, but it is still recommended to pin your dependencies to the latest versions known to work (e.g. `transformers>=4.19,<4.23`).
See the [contributing page](CONTRIBUTING.md) if you are interested in contributing.
## License
grouphug was developed by [Chatdesk](http://www.chatdesk.com) and is licensed under the Apache 2 [license](LICENSE).