flashlight-text-0.0.4.dev289


Description

Flashlight Text bindings for Python
Attribute: Value
Operating system: -
File name: flashlight-text-0.0.4.dev289
Name: flashlight-text
Library version: 0.0.4.dev289
Maintainer: []
Maintainer email: []
Author: Jacob Kahn
Author email: jacobkahn1@gmail.com
Home page: https://github.com/flashlight/text
URL: https://pypi.org/project/flashlight-text/
License: BSD licensed, as found in the LICENSE file
# Flashlight Text Python Bindings

### Quickstart

The Flashlight Text Python package, which contains the beam search decoder and Dictionary components, is available on PyPI:
```bash
pip install flashlight-text
```
To enable optional KenLM support in Python with the decoder, KenLM must be installed via pip:
```bash
pip install git+https://github.com/kpu/kenlm.git
```

#### Contents

- [Installation](#installation)
  * [Dependencies](#dependencies)
  * [Build Instructions](#build-instructions)
  * [Advanced Options](#advanced-options)
- [Python API Documentation](#python-api-documentation)
  * [Beam search decoder](#beam-search-decoder)
  * [Beam search decoding with your own language model](#decoding-with-your-own-language-model)
- [Tests and Examples](#tests-and-examples)

## Installation

### Dependencies

We require `python >= 3.6` with the following packages installed:
- [cmake](https://cmake.org/) >= 3.18, and `make` (cmake is installable via `pip install cmake`)
- [KenLM](https://github.com/kpu/kenlm) (installable via `pip install git+https://github.com/kpu/kenlm.git`)

### Build Instructions

Once the dependencies are satisfied, run the following from the project root:
```
pip install .
```
Setting the environment variable `USE_KENLM=0` removes the KenLM dependency, but the decoder can then be used with a language model only if you write C++/`pybind11` bindings for your own language model.

Install in editable mode for development:
```
pip install -e .
```

**Note:** if you encounter errors, you will probably have to run `rm -rf build dist` before retrying the install.

## Python API Documentation

### Beam Search Decoder

Bindings for the lexicon and lexicon-free beam search decoders are supported for CTC/ASG models only (no seq2seq model support). Out-of-the-box language model support includes KenLM; users can also define a custom language model in Python and use it for decoding; see the [documentation](#decoding-with-your-own-language-model) below.
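As a rough illustration of what a beam search decoder does at each step, the sketch below prunes a set of scored hypotheses. It is plain Python and not the library's implementation: the `prune` function and the `(score, tokens)` tuple layout are hypothetical, and only the parameter names `beam_size` and `beam_threshold` are borrowed from the decoder options.

```python
from typing import List, Tuple

# Hypothetical hypothesis layout: (score, token indices). Not a library type.
Hypothesis = Tuple[float, List[int]]


def prune(hyps: List[Hypothesis], beam_size: int, beam_threshold: float) -> List[Hypothesis]:
    """Keep at most beam_size hypotheses scoring within beam_threshold of the best."""
    if not hyps:
        return []
    best = max(score for score, _ in hyps)
    # Drop hypotheses that fall too far below the current best score.
    kept = [h for h in hyps if h[0] >= best - beam_threshold]
    # Keep only the highest-scoring beam_size survivors, best first.
    kept.sort(key=lambda h: h[0], reverse=True)
    return kept[:beam_size]


candidates = [(-1.0, [3]), (-1.5, [4]), (-9.0, [5]), (-2.0, [6])]
survivors = prune(candidates, beam_size=2, beam_threshold=5.0)
print(survivors)
```

A larger `beam_size` or `beam_threshold` keeps more hypotheses alive, which typically trades decoding speed for a wider search.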
To run the decoder, first define its options:
```python
from flashlight.lib.text.decoder import (
    CriterionType,
    LexiconDecoderOptions,
    LexiconFreeDecoderOptions,
)

# The argument names below are placeholders for your own values.

# for the lexicon-based decoder
options = LexiconDecoderOptions(
    beam_size,  # number of top hypotheses to preserve at each decoding step
    token_beam_size,  # restrict the number of tokens by top AM scores (useful for a huge token set)
    beam_threshold,  # preserve a hypothesis only if its score is not far from the current best hypothesis score
    lm_weight,  # language model weight for the LM score
    word_score,  # score for each word appearing in the transcription
    unk_score,  # score for each unknown word appearing in the transcription
    sil_score,  # score for each silence appearing in the transcription
    log_add,  # how to combine scores when merging hypotheses (log-add operation or max)
    criterion_type,  # supports only CriterionType.ASG or CriterionType.CTC
)

# for the lexicon-free decoder
options = LexiconFreeDecoderOptions(
    beam_size,  # number of top hypotheses to preserve at each decoding step
    token_beam_size,  # restrict the number of tokens by top AM scores (useful for a huge token set)
    beam_threshold,  # preserve a hypothesis only if its score is not far from the current best hypothesis score
    lm_weight,  # language model weight for the LM score
    sil_score,  # score for each silence appearing in the transcription
    log_add,  # how to combine scores when merging hypotheses (log-add operation or max)
    criterion_type,  # supports only CriterionType.ASG or CriterionType.CTC
)
```

Now, prepare a tokens dictionary (the tokens for which a model returns a probability at each frame) and a lexicon (a mapping between words and their spellings within the token set). For further details on the tokens and lexicon file formats, see the [Data Preparation](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr#data-preparation) documentation in [Flashlight](https://github.com/flashlight/flashlight).
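To make the word-to-spellings mapping concrete, here is a minimal pure-Python sketch that does not use the library. It assumes a lexicon file in which each line holds a word followed by its whitespace-separated tokens, and a word with several spellings appears on several lines; that layout, the `parse_lexicon` helper, and the sample entries are assumptions for illustration, so check the Data Preparation documentation for the authoritative format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_lexicon(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each word to the list of its spellings (each spelling is a token list)."""
    lexicon = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and words without a spelling
        word, spelling = parts[0], parts[1:]
        lexicon[word].append(spelling)
    return dict(lexicon)


# Assumed sample entries; "|" stands in for a word-boundary token.
sample = [
    "hello h e l l o |",
    "hello h e l o |",
    "world w o r l d |",
]
lexicon = parse_lexicon(sample)

# The token set implied by the lexicon is the union of all spelling tokens.
tokens = sorted({tok for spellings in lexicon.values() for sp in spellings for tok in sp})
print(lexicon["hello"])
print(tokens)
```

Every token used in a spelling must also be present in the tokens dictionary, which is why the sketch derives the token set from the lexicon as a sanity check.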
```python
from flashlight.lib.text.dictionary import Dictionary, load_words, create_word_dict

token_dict = Dictionary("path/tokens.txt")
# for ASG, add the repetition symbols that are used, for example
# token_dict.add_entry("1")
# token_dict.add_entry("2")

lexicon = load_words("path/lexicon.txt")  # returns LexiconMap
word_dict = create_word_dict(lexicon)  # returns Dictionary
```

To create a KenLM language model, use:
```python
from flashlight.lib.text.decoder import KenLM

lm = KenLM("path/lm.arpa", word_dict)  # or "path/lm.bin"
```

Get the unknown and silence token indices from the token and word dictionaries to pass to the decoder:
```python
sil_idx = token_dict.get_index("|")
unk_idx = word_dict.get_index("<unk>")
```

Now, define the lexicon `Trie` to restrict the beam search decoder search:
```python
from flashlight.lib.text.decoder import Trie, SmearingMode
from flashlight.lib.text.dictionary import pack_replabels

trie = Trie(token_dict.index_size(), sil_idx)
start_state = lm.start(False)


def tkn_to_idx(spelling: list, token_dict: Dictionary, max_reps: int = 0):
    # map each token of the spelling to its index, then pack repetition labels
    result = []
    for token in spelling:
        result.append(token_dict.get_index(token))
    return pack_replabels(result, token_dict, max_reps)


for word, spellings in lexicon.items():
    usr_idx = word_dict.get_index(word)
    _, score = lm.score(start_state, usr_idx)
    for spelling in spellings:
        # convert the spelling into a vector of token indices
        spelling_idxs = tkn_to_idx(spelling, token_dict, 1)
        trie.insert(spelling_idxs, usr_idx, score)

# propagate the word score to each spelling node so every node has a proxy LM score
trie.smear(SmearingMode.MAX)
```

Finally, we can run the lexicon-based decoder:
```python
import numpy
from flashlight.lib.text.decoder import LexiconDecoder

blank_idx = token_dict.get_index("#")  # for CTC
# for ASG, fill up with the correct values
transitions = numpy.zeros((token_dict.index_size(), token_dict.index_size()))
is_token_lm = False  # we use a word-level LM

decoder = LexiconDecoder(options, trie, lm, sil_idx, blank_idx, unk_idx, transitions, is_token_lm)

# emissions is a numpy.array of emitting model predictions with shape [T, N],
# where T is time and N is the number of tokens
results = decoder.decode(emissions.ctypes.data, T, N)
# results[i].tokens contains the token sequence (with length T)
# results[i].score contains the score of the hypothesis
# results is a sorted array with the best hypothesis stored at index 0
```

### Decoding with your own language model

You can define a custom language model in Python and use it for beam search decoding. Language model state is represented by the `LMState` base class; as in the example below, derive your model from `LM` and keep any additional data for each state in a `dict` keyed by `LMState` inside the language model class:
```python
import numpy
from flashlight.lib.text.decoder import LM, LMState


class MyPyLM(LM):
    mapping_states = dict()  # store a simple additional int for each state

    def __init__(self):
        LM.__init__(self)

    def start(self, start_with_nothing):
        state = LMState()
        self.mapping_states[state] = 0
        return state

    def score(self, state: LMState, token_index: int):
        """
        Evaluate the language model based on the current LM state and a new word.

        Parameters:
        -----------
        state: current LM state
        token_index: index of the word (either a lexicon index, in which case
            the LM should store the mapping between lexicon and LM indices,
            or the LM index of the word)

        Returns:
        --------
        (LMState, float): pair of (new state, score for the current word)
        """
        outstate = state.child(token_index)
        if outstate not in self.mapping_states:
            self.mapping_states[outstate] = self.mapping_states[state] + 1
        return (outstate, -numpy.random.random())

    def finish(self, state: LMState):
        """
        Evaluate end-of-sentence for the language model based on the current LM state.

        Returns:
        --------
        (LMState, float): pair of (new state, score for the current word)
        """
        outstate = state.child(-1)
        if outstate not in self.mapping_states:
            self.mapping_states[outstate] = self.mapping_states[state] + 1
        return (outstate, -1)
```

`LMState` is a C++ base class for language model state. Its `compare` method (for comparing one state with another) is used inside the beam search decoder. It also has an `LMState child(int index)` method, which returns the state obtained by following the token with this index from the current state. All LM states are organized as a trie. We use the `child` method in Python to properly create this trie (which is used inside the decoder to compare states) and can store additional state data in `mapping_states`.

This language model can be used as follows. Here, we print each state and the additional info stored for it in `custom_lm.mapping_states`:
```python
custom_lm = MyPyLM()

state = custom_lm.start(True)
print(state, custom_lm.mapping_states[state])

for i in range(5):
    state, score = custom_lm.score(state, i)
    print(state, custom_lm.mapping_states[state], score)

state, score = custom_lm.finish(state)
print(state, custom_lm.mapping_states[state], score)
```
and for the decoder:
```python
decoder = LexiconDecoder(options, trie, custom_lm, sil_idx, blank_idx, unk_idx, transitions, False)
```

## Tests and Examples

An integration test for the Python decoder bindings can be found in `bindings/python/test/test_decoder.py`. To run it, use:
```bash
cd bindings/python/test
python3 -m unittest discover -v .
```
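Returning to the custom language model section, the idea that LM states form a trie, with extra per-state data held in a side dictionary, can be pictured with a small pure-Python stand-in. `ToyState` is a hypothetical class written for this sketch and is not the library's `LMState`; it only mimics the described behaviour of `child`, where following the same token from the same state leads to the same state.

```python
from typing import Dict


class ToyState:
    """Hypothetical stand-in for an LM state node; not the library's LMState."""

    def __init__(self) -> None:
        self.children: Dict[int, "ToyState"] = {}

    def child(self, index: int) -> "ToyState":
        # Reuse an existing node so identical token histories share one state.
        if index not in self.children:
            self.children[index] = ToyState()
        return self.children[index]


root = ToyState()
depth = {root: 0}  # side table of per-state data, like mapping_states

first = root.child(7)
depth.setdefault(first, depth[root] + 1)

again = root.child(7)  # same token from the same state: same node
second = first.child(2)
depth.setdefault(second, depth[first] + 1)

print(first is again, depth[second])
```

Because equal histories map to one shared node, a dictionary keyed by state can hold data such as a neural LM's hidden state without duplicating it across hypotheses.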


Required language

Name: Value
Python: >=3.6


How to install


Install the whl package flashlight-text-0.0.4.dev289:

    pip install flashlight-text-0.0.4.dev289.whl


Install the tar.gz package flashlight-text-0.0.4.dev289:

    pip install flashlight-text-0.0.4.dev289.tar.gz