Description

Symbol/symbol bigram/word/word bigram frequency analyzer with Excel output.

Property	Value
Operating system	-
File name	frequency-analysis-0.1.4.5
Name	frequency-analysis
Library version	0.1.4.5
Maintainer	[]
Maintainer email	[]
Author	uqqu
Author email	-
Home page	https://github.com/uqqu/frequency_analysis
URL	https://pypi.org/project/frequency-analysis/
License	-

Frequency analysis
------------------

Python package for frequency analysis of symbols, words, and their bigrams, with Excel output.

Values that can be counted: quantity, quantity in the first position, quantity in the last position, average position.

Data for which these values can be counted: symbols, symbol bigrams, words, word bigrams.

Additional optional data: a ye-yo word table for the Russian language (in the Excel output it can be cross-referenced with word quantities).

Usage
-----

1. ``pip install frequency-analysis``;
2. Download a data set of your choice;
3. Create an ``Analysis`` instance with a context manager (take a look at the optional arguments);
4. Parse your data set into word lists (one sentence per list, so that word positions are counted properly) and pass them to one of the three ``Analysis`` methods;
5. Create a ``Result`` instance with a context manager (with the optional ``name`` argument);
6. Call one or several ``Result`` methods to create Excel sheet(s) with the corresponding data.

Methods and arguments
---------------------

``Analysis`` class arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All arguments are optional.

* *name* – the name of the folder in which the analysis will be saved;
  default ``frequency_analysis``
* *mode* – analysis operation mode (``[n]ew``, ``[a]ppend``, ``[c]ontinue``);
  default ``n``
* *word\_pattern* – regex pattern for matching in-word symbols;
  default ``[a-zA-Zа-яА-ЯёЁ]+(?:(?:-?[a-zA-Zа-яА-ЯёЁ]+)+|'?[a-zA-Zа-яА-ЯёЁ]+)|[a-zA-Zа-яА-ЯёЁ]``
* *allowed\_symbols* – a string of symbols, or a list of Unicode decimal code points, that will be included in the analysis;
  default ``[*range(32, 127), 1025, *range(1040, 1104), 1105]`` (base punctuation, base Latin, Russian Cyrillic)
* *yo* – int enabling additional Russian word processing: words are compared with a word list to count ye/yo misspellings. 0 – disabled; 1 – enabled; 2 with 'a' mode – update the yo list with new data;
  default ``0``

To use the *yo* argument, place two word files next to the running script (``yo.txt`` for words with mandatory yo and ``ye-yo.txt`` for words where yo is possible). You can use your own files or take them `here <https://github.com/uqqu/yo_dict>`__.

``Analysis`` class methods
~~~~~~~~~~~~~~~~~~~~~~~~~~

``count_symbols(word_list: list, [pos: bool, bigrams: bool])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Method for counting symbol and symbol bigram frequency.

Counted values: quantity, quantity in the first position, quantity in the last position, average position in word.

The average position is counted only when the ``pos`` argument is ``True`` (default ``False``).

The position of symbols matched by ``word_pattern`` is counted within the "clear" word; the position of all other symbols is counted within the "raw" word. Example: in the single word ``–Yes!`` with the default ``word_pattern``, positions will be counted as ``(– 1), (Y 1), (e 2), (s 3), (! 5)``.

Bigram counting can be disabled by passing the ``bigrams`` argument as ``False`` (default ``True``).

``count_words(word_list: list, [pos: bool, bigrams: bool])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Method for counting word and word bigram frequency.

Counted values: quantity, quantity in the first position, quantity in the last position, average position in sentence.

The average position is counted only when the ``pos`` argument is ``True`` (default ``False``).

Bigram counting can be disabled by passing the ``bigrams`` argument as ``False`` (default ``True``).

``count_all(word_list: list, [pos: bool, symbol_bigrams: bool, word_bigrams: bool])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Combined call of the previous two methods.
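The position rule described for ``count_symbols`` can be illustrated with a short standalone sketch. It does not use the package and is not its actual implementation: the helper name ``symbol_positions`` is hypothetical, and the sketch assumes that the first match of the default ``word_pattern`` inside the raw word is treated as the "clear" word.

```python
import re

# Default ``word_pattern`` as quoted in the ``Analysis`` argument list.
WORD_PATTERN = re.compile(
    r"[a-zA-Zа-яА-ЯёЁ]+(?:(?:-?[a-zA-Zа-яА-ЯёЁ]+)+|'?[a-zA-Zа-яА-ЯёЁ]+)"
    r"|[a-zA-Zа-яА-ЯёЁ]"
)


def symbol_positions(raw_word):
    """Return (symbol, 1-based position) pairs for one raw word.

    Hypothetical helper: symbols inside the first pattern match are
    positioned within that "clear" word, all others within the raw word.
    """
    match = WORD_PATTERN.search(raw_word)
    start, end = match.span() if match else (0, 0)
    positions = []
    for index, symbol in enumerate(raw_word):
        if start <= index < end:
            # Inside the matched word: count from the start of the match.
            positions.append((symbol, index - start + 1))
        else:
            # Outside the match: count from the start of the raw word.
            positions.append((symbol, index + 1))
    return positions


print(symbol_positions("–Yes!"))
# [('–', 1), ('Y', 1), ('e', 2), ('s', 3), ('!', 5)]
```

The printed pairs agree with the ``–Yes!`` example: the leading dash and the trailing exclamation mark keep their raw positions, while the letters are numbered from the start of ``Yes``.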
``Result`` class arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~

The only argument is optional.

* *name* – the name of the folder in which the analysis was saved;
  default ``frequency_analysis``

``Result`` class methods
~~~~~~~~~~~~~~~~~~~~~~~~

The first six methods can be called all at once with the ``treat()`` method.

Many methods accept the arguments ``limit``, ``chart_limit``, ``min_quantity`` and ``ignore_case``.

* *limit* (default ``0``) – the maximum number of elements that will be added to the sheet; ``0`` means unlimited;
* *chart_limit* (default ``20``) – the number of elements that will be included in the chart;
* *min_quantity* (default ``1``) – the minimum quantity at which an element will be added to the sheet;
* *ignore_case* (default ``False``) – when ``True``, lowercase and uppercase symbols are merged into a single element; when ``False``, they are counted separately.

``Keyword-only``

``sheet_stats()``
^^^^^^^^^^^^^^^^^

Main result info: number of entries, total count and average position (if it exists) for each data type.

``sheet_top_symbols([limit, chart_limit, min_quantity])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Top list of all analyzed symbols sorted by quantity. Next to it is the same list with case ignored, so there is no need to create a separate sheet; just use the column of your choice.

``sheet_top_symbol_bigrams([limit, chart_limit, min_quantity])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Top list of symbol bigrams sorted by quantity, with an additional case-insensitive top list.

``sheet_all_symbol_bigrams([min_quantity, ignore_case])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

2D sheet with the quantities of all bigrams. Here the ``min_quantity`` argument applies to the sum of the row/column values rather than to each individual bigram.

``sheet_top_words([limit, chart_limit, min_quantity])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Top list of analyzed words sorted by quantity. Word counting is always case insensitive; this is applied at the ``Analysis`` stage.

``sheet_top_word_bigrams([limit, chart_limit, min_quantity])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Top list of analyzed word bigrams sorted by quantity.

``treat([limits: tuple(four int), chart_limits: tuple(four int), min_quantities: tuple(five int)])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Single call of all the ``Result`` methods above. The tuple values are applied to the methods in this order:

1. ``sheet_top_symbols()``
2. ``sheet_top_symbol_bigrams()``
3. ``sheet_top_words()``
4. ``sheet_top_word_bigrams()``
5. ``sheet_all_symbol_bigrams()``

Note that the last value (for ``sheet_all_symbol_bigrams()``) exists only in the ``min_quantities`` argument.

Default values are the same as elsewhere:

* *limits* – ``(0,)*4``
* *chart_limits* – ``(20,)*4``
* *min_quantities* – ``(1,)*5``

``sheet_custom_top_symbols(symbols: str, [chart_limit, name='Custom symbols'])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a symbol top list like ``sheet_top_symbols()``, but only with symbols of your choice.

``name`` – ``keyword-only``

``sheet_en_top_symbols(symbols: str, [chart_limit])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a symbol top list like ``sheet_top_symbols()``, but only with base Latin symbols.

``sheet_ru_top_symbols(symbols: str, [chart_limit])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a symbol top list like ``sheet_top_symbols()``, but only with Russian Cyrillic symbols.

``sheet_custom_symbol_bigrams(symbols: str, [ignore_case, name='Custom symbol bigrams'])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a symbol bigram 2D sheet like ``sheet_all_symbol_bigrams()``, but only with symbols of your choice. The order of symbols on the sheet will be the same as in the input argument.

``name`` – ``keyword-only``

``sheet_en_symbol_bigrams([ignore_case])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a symbol bigram 2D sheet like ``sheet_all_symbol_bigrams()``, but only with base Latin symbols.

``sheet_ru_symbol_bigrams([ignore_case])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a symbol bigram 2D sheet like ``sheet_all_symbol_bigrams()``, but only with Russian Cyrillic symbols.

``sheet_yo_words([limit, min_quantity])``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a cross-referenced sheet of all counted ye-yo words with their quantity and a total misspelling counter. Works only with an analysis created with the ``yo`` argument set to ``1`` or ``2``.

Performed analyses
------------------

* English analysis with `EuroMatrixPlus/MultiUN <http://www.euromatrixplus.net/multi-un/>`__ English data set (3.1Gb .xml, 2.4\*10\ :sup:`9` symbols, 379\*10\ :sup:`6` words)

  * https://github.com/uqqu/frequency\_analysis/tree/master/examples/en/multiUN

* Russian analysis with `EuroMatrixPlus/MultiUN <http://www.euromatrixplus.net/multi-un/>`__ Russian data set (4.3Gb .xml, 2.2\*10\ :sup:`9` symbols, 270\*10\ :sup:`6` words)

  * https://github.com/uqqu/frequency\_analysis/tree/master/examples/ru/multiUN

* Russian analysis with `OpenCorpora <http://opencorpora.org/>`__ data set (528Mb .xml, 11.7\*10\ :sup:`6` symbols, 1.6\*10\ :sup:`6` words)

  * https://github.com/uqqu/frequency\_analysis/tree/master/examples/ru/annot\_opcorpora
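The usage steps listed earlier might be wired together roughly as in the sketch below. Several things here are assumptions rather than facts from this description: that the package is importable as ``frequency_analysis``, that the documented arguments can be passed by keyword, and that ``count_all`` should be called once per sentence. The helpers ``to_sentence_word_lists`` and ``run_analysis`` are hypothetical, and the sentence splitting is deliberately naive.

```python
import re


def to_sentence_word_lists(text):
    """Split text into sentences, then each sentence into raw words.

    Hypothetical helper: punctuation stays attached to the words, and
    sentence boundaries are approximated by whitespace that follows
    '.', '!' or '?'.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [sentence.split() for sentence in sentences if sentence]


def run_analysis(text, name="frequency_analysis"):
    """Count symbols and words in ``text`` and write the Excel sheets."""
    # Assumption: the import path is not stated in this description.
    from frequency_analysis import Analysis, Result

    with Analysis(name=name) as analysis:
        for word_list in to_sentence_word_lists(text):
            # pos=True also collects average positions.
            analysis.count_all(word_list, pos=True)

    with Result(name=name) as result:
        # Single call of the standard sheet methods with default limits.
        result.treat()


print(to_sentence_word_lists("Hello world. How are you?"))
# [['Hello', 'world.'], ['How', 'are', 'you?']]
```

Only the splitting helper runs when the snippet is executed as is; ``run_analysis`` needs the package installed and writes its output into the folder given by ``name``.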

Installation

Install the frequency-analysis-0.1.4.5 whl package:

pip install frequency-analysis-0.1.4.5.whl

Install the frequency-analysis-0.1.4.5 tar.gz package:

pip install frequency-analysis-0.1.4.5.tar.gz