Description

A corpus-linguistic tool to extract and search for linguistic features

Attribute	Value
Operating system	OS Independent
File name	LFExtractor-1.0.1
Name	LFExtractor
Library version	1.0.1
Maintainer	[]
Maintainer email	[]
Author	Jack Wang
Author email	jackwang196531@gmail.com
Home page	https://github.com/jaaack-wang/ling_feature_extractor
URL	https://pypi.org/project/LFExtractor/
License	MIT License

# ling_feature_extractor

## Description

- A corpus-linguistic tool to extract and search for linguistic features in a text or a corpus.
- There are 95 built-in linguistic features in the main version versus 98 features in the Thesis_Project version. The deleted features are words per utterance, number of utterances, and number of overlaps, which are not generally available in an ordinary corpus.
- Over 2/3 of these features come from Biber et al. (2006), with 42 features also present in Biber (1988). These features are generally known as part of the Multi-Dimensional (MD) analysis framework.
- The program is mainly tested on two corpora that are accessible online, namely the [British Academic Spoken Corpus](http://www.reading.ac.uk/AcaDepts/ll/base_corpus/) and the [Michigan Corpus of Academic English](https://quod.lib.umich.edu/cgi/c/corpus/corpus?page=home;c=micase;cc=micase), but due to copyright concerns, here it is tested on the [test_sample](https://github.com/jaaack-wang/ling_feature_extractor/tree/main/test_sample).

## Prerequisites

- `Computer Languages`:
  - Python 3.6+: check with cmd: `python --version` or `python3 --version` ([Download Page](https://www.python.org/downloads/));
  - Java 1.8+: check with cmd: `java -version` ([Download Page](https://www.java.com/en/download/)).
- `Python packages`

| Package | Description | Pip download |
| :---: | :---: | :---: |
| [stanfordcorenlp](https://github.com/Lynten/stanford-corenlp) | A Python wrapper for StanfordCoreNLP | `pip/pip3 install stanfordcorenlp` |
| [pandas](https://pandas.pydata.org) | Used for storing extracted feature frequencies | `pip/pip3 install pandas` |

Besides, built-in packages are heavily employed in the program, especially the built-in `re` package for Regular Expression.

## Installation

- Directly download from this page and cd to the project folder.
- By pip: `pip/pip3 install LFExtractor`

## Usage

### path to StanfordCoreNLP

**Please specify _the directory to StanfordCoreNLP_ in the text_processor.py under LFE folder when first using the program.**

- [X] `nlp = StanfordCoreNLP("/path/to/StanfordCoreNLP/")`

Example: `nlp = StanfordCoreNLP("/Users/wzx/p_package/stanford-corenlp-4.1.0")`

### Dealing with a corpus of files

```python
from LFE.extractor import CorpusLFE, show_feature_names

lfe = CorpusLFE('/directory/to/the/corpus/under/analysis/')

# get frequency data, tagged corpus, and extracted features (the defaults);
# equivalent to: lfe.corpus_feature_fre_extraction(normalized_rate=100, save_tagged_corpus=True,
#                                                  save_extracted_features=True, left=0, right=0)
lfe.corpus_feature_fre_extraction()

# change the normalized_rate, turn off the tagged text, and keep the extracted text
# with the specified context: frequencies are normalized per 1000 words, and each
# extracted feature is shown with 2 words on the left and 3 words on the right
lfe.corpus_feature_fre_extraction(1000, False, True, 2, 3)

# get frequency data only
lfe.corpus_feature_fre_extraction(save_tagged_corpus=False, save_extracted_features=False)

# get tagged corpus only
lfe.save_tagged_corpus()

# get extracted features only
lfe.save_corpus_extracted_features()  # same as lfe.save_corpus_extracted_features(left=0, right=0)

# set how many words to display besides the target pattern
lfe.save_corpus_extracted_features(2, 3)

# extract and save a specific linguistic feature by feature name
# to see the built-in features' names, use `show_feature_names()`
print(show_feature_names())  # Six letter words and longer, Contraction, Agentless passive, By passive...

# specify which feature to extract and save
lfe.save_corpus_one_extracted_feature_by_name('Six letter words and longer')

# extract and save a specific linguistic feature by feature regex, for example, 'you know'
# This extracts the phrase 'you know' along with 2 words on either side. Remember the
# '_\S+' at the end of each word, since the corpus is automatically POS tagged.
lfe.save_corpus_one_extracted_feature_by_regex(r'you_\S+ know_\S+', 2, 2, feature_name='You Know')

# for more complex structures, features_set.py can be utilized,
# for example, to extract the "article + adj + noun" structure
from LFE import features_set as fs
ART = fs.ART
ADJ = fs.ADJ
NOUN = fs.NOUN
lfe.save_corpus_one_extracted_feature_by_regex(rf'{ART} {ADJ} {NOUN}', 2, 2, 'Noun phrase')
# result example (use test_sample): away_RB by_IN 【 the_DT whole_JJ thing_NN 】 In_IN fact_NN
```

### Dealing with a text

```python
from LFE import extractor as ex
# check the functionalities contained in ex by dir(ex)

# show built-in feature names
print(ex.show_feature_names())  # Six letter words and longer, Contraction, Agentless passive, By passive...

# get a built-in feature's regex by its name
print(ex.get_feature_regex_by_name('Contraction'))  # (n't| '\S\S?)_[^P]\S+

# get a built-in feature's name by its regex
print(ex.get_feature_name_by_regex(r"(n't| '\S\S?)_[^P]\S+"))  # Contraction

# text processing
# tagged file
ex.save_single_tagged_text('/path/to/the/file')
# cleaned file
ex.save_single_cleaned_text('/path/to/the/file')

# display extracted feature by name
res = ex.display_extracted_feature_by_name('/path/to/the/file', 'Contraction', left=0, right=0)
print(res)  # 's_VBZ, n't_NEG, 've_VBP...

# save the result
ex.save_extracted_feature_by_name('/path/to/the/file', 'Contraction', left=0, right=0)

# display extracted feature by regex, for example, noun phrase
from LFE import features_set as fs
ART = fs.ART
ADJ = fs.ADJ
NOUN = fs.NOUN
res = ex.display_extracted_feature_by_regex(rf'{ART} {ADJ} {NOUN}', 2, 2, 'Noun phrase')
print(res)  # One_CD is_VBZ 【 the_DT extraordinary_JJ evidence_NN 】 of_IN human_JJ

# save the result
ex.save_extracted_feature_by_regex(rf'{ART} {ADJ} {NOUN}', 2, 2, 'Noun phrase')

# get the frequency data of all the linguistic features for a file
file_path = '/path/to/the/file'
res = ex.get_single_file_feature_fre(file_path, normalized_rate=100, save_tagged_file=True, save_extracted_features=True, left=0, right=0)
print(res)
```

### Dealing with a part of a corpus

```python
from LFE.extractor import CorpusLFE

lfe = CorpusLFE('/directory/to/the/corpus/under/analysis/')

# get_filepath_list and select the files you want to examine and construct a list
fp_list = lfe.get_filepath_list()

# loop through the list and use the functionalities mentioned above to get the results you want
```
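
The regex-based extraction above works on POS-tagged text in which every token has the form `word_TAG`, which is why each word in a custom pattern ends in `_\S+`. The following is a minimal sketch of that idea using only the standard `re` module. It is not LFExtractor's own implementation: `extract_with_context` is a hypothetical helper, and the tagged sentence is made up for illustration.

```python
import re


def extract_with_context(tagged_text, pattern, left=0, right=0):
    """Hypothetical helper: find `pattern` in space-separated word_TAG text and
    return each match with up to `left`/`right` neighbouring tokens."""
    results = []
    for match in re.finditer(pattern, tagged_text):
        # Tokens before and after the matched span.
        before = tagged_text[:match.start()].split()
        after = tagged_text[match.end():].split()
        # A slice of [-0:] would return the whole list, so handle left == 0 explicitly.
        left_ctx = before[-left:] if left else []
        right_ctx = after[:right]
        results.append(" ".join(left_ctx + ["【", match.group(0), "】"] + right_ctx))
    return results


# Made-up tagged sentence, assumed to follow the word_TAG convention.
tagged = "well_UH I_PRP mean_VBP you_PRP know_VBP it_PRP is_VBZ fine_JJ"
print(extract_with_context(tagged, r"you_\S+ know_\S+", left=2, right=2))
```

Because `\S+` stops at the next space, the pattern matches whatever tag the tagger assigned to `you` and `know` without having to name it.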

Required language

Value	Name
>=3.6	Python

How to install

Install the LFExtractor-1.0.1 whl package:

pip install LFExtractor-1.0.1.whl

Install the LFExtractor-1.0.1 tar.gz package:

pip install LFExtractor-1.0.1.tar.gz
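
After either command, one way to confirm that the package was registered is to query the installed distribution metadata. This is a minimal sketch, assuming Python 3.8 or later (where the standard-library `importlib.metadata` module is available) and assuming the distribution is registered under the name `LFExtractor`; `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib import metadata


def installed_version(package_name):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the install succeeded, otherwise None.
print(installed_version("LFExtractor"))
```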