معرفی شرکت ها


anuvaad-1.0.6


Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

State of the art translation for Indic languages.
ویژگی مقدار
سیستم عامل -
نام فایل anuvaad-1.0.6
نام anuvaad
نسخه کتابخانه 1.0.6
نگهدارنده []
ایمیل نگهدارنده []
نویسنده BEDAPUDI PRANEETH
ایمیل نویسنده praneeth@bpraneeth.com
آدرس صفحه اصلی https://github.com/notAI-tech/Anuvaad
آدرس اینترنتی https://pypi.org/project/anuvaad/
مجوز GPLv3
# Anuvaad State of the art open-source translation models for Indic languages. # Installation ```bash # CPU pytorch will be installed if torch is not installed pip install --upgrade anuvaad ``` # Usage **As a Python module** ```python from anuvaad import Anuvaad anu = Anuvaad("english-telugu") # Single sentence translation # beam_size is optional and defaults to 4 anu.anuvaad("YS Jagan is the chief minister of Andhra Pradesh.") # "వైఎస్ జగన్ ఆంధ్రప్రదేశ్ ముఖ్యమంత్రి." # Batch translation anu.anuvaad(["YS Jagan is the chief minister of Andhra Pradesh.", "Nara Lokesh suffered a humiliating defeat in Mangalagiri."]) # ['వైఎస్ జగన్ ఆంధ్రప్రదేశ్ ముఖ్యమంత్రి.', 'మంగళగిరిలో నారా లోకేష్కు అవమానకరమైన ఓటమి ఎదురైంది.'] ``` **As a service** ```bash # Starting the api service docker run -it -e BATCH_SIZE=1 -p 8080:8080 notaitech/anuvaad:english-telugu # Running a prediction curl -d '{"data": ["YS Jagan is the chief minister of Andhra Pradesh."]}' -H "Content-Type: application/json" -X POST http://localhost:8080/sync ``` |Available Models | Anuvaad BLEU | Google BLEU | |--------|:--------------:|--------| |english-telugu | 12.721173743764009 | 6.841437460383768 | |english-tamil | 12.737036149214694 | 5.558450942590664 |english-malayalam | 17.785746646721996 | 19.569069412553812 | |english-kannada | 7.888886041933815 | 3.2803251953567893 | |english-marathi | 23.02755955392518 | 12.888112016722792 | |english-hindi | 29.175892213216954 | 18.130893478614375 | |english-bengali | | | |english-punjabi | | | |english-gujarati | | | - Google BLEU is calculated from translations generated by the [GOOGLETRANSLATE() function](https://support.google.com/docs/answer/3093331?hl=en) on google sheets. - The testing scripts and data from Tatoeba is present at https://github.com/notAI-tech/Anuvaad-testing-scripts - https://docs.google.com/spreadsheets/d/1tYYZObELj-k6mJCnM6uf7xg3JChbSkjs8YvZsOgHacQ/edit?usp=sharing is the sheet containing the predictions from Anuvaad and GOOGLETRANSLATE function on the Tatoeba data from which the above scores are calculated. # My thoughts on the evaluation/accuracy of the model(s): 1. Unlike classification/ sequence labelling tasks, for open-domain translation or summarization systems it is very hard to quantify the accuracy through numbers. 2. This is because, most accuracy measurements actually measure the overlap of character/word n-grams between the expected output and predicted output. 3. These scores definitely help when evaluating/comparing multiple models on a particular dataset, but the number don't translate well for open-domain models. 4. For example, Anuvaad translates the sentence ***An advance is placed with the Medical Superintendents of such hospitals who then provide assistance on a case to case basis.*** (taken from http://data.statmt.org/pmindia/v1/parallel corpus) to ***ऐसे अस्पतालों के चिकित्सा अधीक्षकों के साथ एडवांस रखा जाता है, जिसके बाद मामले के आधार पर सहायता प्रदान की जाती है।*** where as the expected translation of the sentence from the dataset is ***अग्रिम धन राशि इन अस्पतालों को चिकित्सा निरीक्षकों को दी जाएगी, जो हर मामले को देखते हुए सहायता प्रदान करेंगे।***. 5. In the above example, Although Anuvaad's translation is correct (in the sense that translation conveys the same thing as the original sentence), the BLEU score with n=3 will be 0. 6. Similarly, a model trained on the pmindia dataset will have bad score on a different dataset which uses a different style of writing, even if the translation is semantically correct. 7. Our aim in building Anuvaad is to build a general purpose, open-domain translation module that can flexibly translate text from various domains. 8. https://docs.google.com/spreadsheets/d/1_TTtBEvVgemQfGbRBSZYkECMMt5r7L9-dt0FGVUbmOY/edit?usp=sharing is a sheet comparing translations from Anuvaad, ilmulti (https://github.com/jerinphilip/ilmulti) and Google Translate (=GOOGLETRANSLATE(text, "en", "language") function on google sheets) on 100 randomly selected English sentences from Tatoeba.


نیازمندی

مقدار نام
>=4.5.0 transformers
>=1.0.9 pydload
>=1.5.0 torch
- sentencepiece


زبان مورد نیاز

مقدار نام
>=3.6.0 Python


نحوه نصب


نصب پکیج whl anuvaad-1.0.6:

    pip install anuvaad-1.0.6.whl


نصب پکیج tar.gz anuvaad-1.0.6:

    pip install anuvaad-1.0.6.tar.gz