extract-sfm-2.0

Description

Knowledge Graph Extraction for SFM dataset

| Feature | Value |
| --- | --- |
| Operating system | - |
| File name | extract-sfm-2.0 |
| Name | extract-sfm |
| Library version | 2.0 |
| Maintainer | [] |
| Maintainer email | [] |
| Author | Yueen Ma |
| Author email | - |
| Homepage | https://github.com/Panmani/KGE |
| URL | https://pypi.org/project/extract-sfm/ |
| License | - |
# Knowledge Graph Extraction

We designed a pipeline that extracts a special kind of knowledge graph in which a person's name is recognized and his/her rank, role, title and organization are related to him/her. It is not expected to perform perfectly, recognizing every relevant person and excluding every irrelevant one. Rather, it is a first step toward reducing the workload involved in manually extracting such knowledge by combing through a large number of documents.

The pipeline consists of two major components: Named Entity Recognition and Relation Extraction. Named Entity Recognition uses a BiLSTM-CNNs-CRF model; it recognizes names, ranks, roles, titles and organizations from raw text files. Relation Extraction then relates each name to its corresponding rank, role, title or organization.

Example:

![Example](images/brat_stn.png)

## Dependencies

Tensorflow 2.2.0 <br>
Tensorflow-addons <br>
SpaCy <br>
NumPy <br>
DyNet <br>
Pathlib <br>

## Install

Package: https://pypi.org/project/extract-sfm/

```shell
$ pip install extract_sfm
```

## Usage

### Method 1

Create a Python file and write:

```python
import extract_sfm

extract_sfm.extract("/PATH/TO/DIRECTORY/OF/INPUT/FILES")
```

Then run the Python file. This may take a while to finish.

### Method 2

Download this GitHub repository. Under the project root directory, run the Python script:

```shell
$ python pipeline.py /PATH/TO/DIRECTORY/OF/INPUT/FILES
```

> Note: Use an absolute path.

## Website

1. Copy NER_v2, RE, pipeline.py into the "SERVER/KGE" directory
2. Install npm dependencies under the "SERVER" directory: express, path, multer

```
$ npm install <package name>
```

3. Run the server by typing in:

```
$ node server.js
```

![Example](images/website.jpeg)

## Environment Setup

```
tensorflow 2.2.0
    pip install tensorflow
    pip install tensorflow-addons

spaCy (macOS)
    pip install -U spacy
    python3 -m spacy download en_core_web_sm

DyNet
    pip install dynet

pathlib
    pip install pathlib
```

## NER Documentation

```
TRAINING

Dataset:
    1. SFM starter dataset: https://github.com/security-force-monitor/nlp_starter_dataset
    2. CONLL2003: https://github.com/guillaumegenthial/tf_ner/tree/master/data/example
    3. A set of known organizations from the starter dataset
    Note: Title and role were collapsed into one class

Usage:
    1) Prepare data
        $ python process.py
        $ cd SFM_STARTER
        $ python build_vocab.py
        $ python build_glove.py
        $ cd ..
    2) Train model
        $ python train.py
    3) Make predictions
        $ python pred.py
    4) Evaluate model
        $ python eval.py
        $ python eval_class.py

Files:
    process.py:
        1) preprocesses the dataset by recording info in dicts, which are saved in two
           pickle files: dataset_labels.pickle, dataset_sentences.pickle
        2) converts the SFM starter dataset to a format that can be used by the model,
           written to {}.words.txt and {}.tags.txt, where {} can be train, valid or test
           (see the reading sketch after this block)
    pred.py: generates predictions using the trained model
    eval.py: evaluates the predictions made by the model, which are generated by running pred.py
    eval_class.py: gets precision, recall and F1 score for each class
    Other files are from https://github.com/guillaumegenthial/tf_ner:
        train.py, tf_metrics.py, SFM_STARTER/build_vocab.py, SFM_STARTER/build_glove.py

PREDICTING

Usage:
    $ python ner.py <doc_id>.txt

File:
    ner.py: gets a BRAT-format prediction for a text file.
```
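The `{}.words.txt` / `{}.tags.txt` pair mentioned under `process.py` appears to follow the tf_ner convention: one space-separated sentence per line in the words file, with the aligned tags on the same line of the tags file. Below is a minimal reading sketch under that assumption; the `SFM_STARTER` data directory name is also an assumption based on the preparation steps above.

```python
from pathlib import Path

def read_split(prefix, data_dir="SFM_STARTER"):
    """Yield [(token, tag), ...] per sentence from {prefix}.words.txt / {prefix}.tags.txt.

    Assumes the tf_ner layout: one space-separated sentence per line in the words
    file, with the aligned tags on the same line of the tags file.
    """
    data_dir = Path(data_dir)
    words_lines = (data_dir / f"{prefix}.words.txt").read_text(encoding="utf-8").splitlines()
    tags_lines = (data_dir / f"{prefix}.tags.txt").read_text(encoding="utf-8").splitlines()
    for words, tags in zip(words_lines, tags_lines):
        tokens, labels = words.split(), tags.split()
        if len(tokens) != len(labels):
            raise ValueError("words and tags files are misaligned")
        yield list(zip(tokens, labels))

# Example: inspect the first training sentence.
# for sentence in read_split("train"):
#     print(sentence)
#     break
```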
## RE Documentation

```
jPTDP:
    Before running the following 3 methods, you need to run a dependency parser first,
    which some of the methods rely on.
    Usage:
        Go to the jPTDP directory and run
            $ python fast_parse.py <path_to_txt>.txt
        The output is placed alongside the input text file, in a directory whose name
        is the same as the text file.

---

METHOD 1: nearest person
    Assign each non-person named entity to the nearest person that appears after it.
    Usage:
        1. To extract relations in a single text file (extracted relations are appended
           to the .ann file):
            $ python relation_np.py <doc_id>.txt <doc_id>.ann
        2. To generate annotations for a set of text files under <directory>,
           set "output_dir" in pipeline.sh to <directory> and run:
            $ source pipeline.sh

---

METHOD 2: dependency parsing
    Assign each non-person named entity to the closest person, where distance is the
    length of the dependency path between the named entity and the person.
    Constraint: if we only choose between the two persons that appear immediately to
    the left and to the right, the results could improve, but the drawbacks are also obvious.
    Usage:
        1. To extract relations in a single text file (extracted relations are appended
           to the .ann file):
            $ python relation_dep.py <jPTDP_buffer_path> <doc_id>.txt <doc_id>.ann
        2. To generate annotations for a set of text files under <directory>:
            $ source pipeline.sh <directory>

---

METHOD 3: neural networks
    Use the dependency path and its distance as features to predict which person in the
    sentence is the best option.
    The best model is saved in "model_86.h5".
    Usage:
        Predictions are made on files in "pred_path" and are written in place;
        "pred_path" can be set in config.py
            $ python pred.py
```
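The "nearest person" heuristic of METHOD 1 is simple enough to sketch. The snippet below is only an illustration (not the project's `relation_np.py`); it assumes entities are available as `(label, start_offset, text)` tuples, e.g. parsed from a BRAT `.ann` file, and the label names `PERSON`, `RANK`, `ORG` are placeholders.

```python
def nearest_person_relations(entities):
    """Link each non-person entity to the nearest PERSON entity that appears after it."""
    persons = sorted((start, text) for label, start, text in entities if label == "PERSON")
    relations = []
    for label, start, text in entities:
        if label == "PERSON":
            continue
        following = [(p_start, p_text) for p_start, p_text in persons if p_start > start]
        if following:
            _, p_text = min(following)              # nearest person after the entity
            relations.append((text, label, p_text)) # e.g. ("General", "RANK", "John Smith")
    return relations

# Hypothetical example input (offsets and labels are made up for illustration):
ents = [("RANK", 0, "General"), ("PERSON", 8, "John Smith"),
        ("ORG", 30, "3rd Brigade"), ("PERSON", 50, "Jane Doe")]
print(nearest_person_relations(ents))
# [('General', 'RANK', 'John Smith'), ('3rd Brigade', 'ORG', 'Jane Doe')]
```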


Requirements

| Name | Value |
| --- | --- |
| numpy | - |
| tensorflow | - |
| tensorflow-addons | - |
| spacy | - |
| dynet | - |
| pathlib | - |
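A quick, optional way to confirm that the listed requirements are importable; the module names below are the usual import names for these PyPI packages (e.g. tensorflow-addons imports as `tensorflow_addons`), and `pathlib` ships with Python >= 3.6.

```python
import importlib

# Requirement name -> import name (assumed standard mapping).
packages = {
    "numpy": "numpy",
    "tensorflow": "tensorflow",
    "tensorflow-addons": "tensorflow_addons",
    "spacy": "spacy",
    "dynet": "dynet",
    "pathlib": "pathlib",  # standard library in Python >= 3.6
}

for requirement, module_name in packages.items():
    try:
        module = importlib.import_module(module_name)
        print(f"{requirement}: {getattr(module, '__version__', 'ok')}")
    except ImportError:
        print(f"{requirement}: MISSING")
```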


Required language

| Name | Value |
| --- | --- |
| Python | >=3.6 |


How to install


Install the extract-sfm-2.0 whl package:

    pip install extract-sfm-2.0.whl


Install the extract-sfm-2.0 tar.gz package:

    pip install extract-sfm-2.0.tar.gz
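After installing either artifact, usage follows Method 1 of the README above. A minimal sketch, where the input directory name is a placeholder and is resolved to the absolute path the pipeline expects:

```python
from pathlib import Path
import extract_sfm

# The pipeline requires an absolute path to the directory of input text files;
# "input_files" is only a placeholder directory name for illustration.
input_dir = Path("input_files").resolve()
extract_sfm.extract(str(input_dir))
```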