debias-0.165


Description

remove bias from GAF files
Attribute Value
Operating system -
File name debias-0.165
Name debias
Library version 0.165
Maintainer []
Maintainer email []
Author xiao hu
Author email xiaohu@iastate.edu
Home page https://github.com/Rinoahu/debias
Package URL https://pypi.org/project/debias/
License GPLv3
# Debiasing a Protein Annotation Database

Debiaser removes bias from [GAF](http://www.geneontology.org/page/go-annotation-file-formats) files based on annotation information content, GO evidence, annotation source, number of proteins annotated from a given source, and date. Debiaser accepts one or more GAF files as input. The motivation for Debiaser lies in the observation that many organism annotations are biased due to high-throughput experimental studies ([1](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003063)). Removing such annotation biases can help present a more balanced picture of protein annotations for a given organism or set of proteins.

### Prerequisites

#### Required modules

Modules are available in most GNU/Linux distributions, or from their respective websites.

* [networkx](https://networkx.github.io/)
* [matplotlib](https://matplotlib.org/)
* [numpy](http://www.numpy.org/)
* [Biopython](http://biopython.org/)
* [xlsxwriter](http://xlsxwriter.readthedocs.io/)

#### Required files

You need an OBO-formatted version of the Gene Ontology. Depending on your needs, this is usually one of [go-basic.obo](http://purl.obolibrary.org/obo/go/go-basic.obo) or [go.obo](http://purl.obolibrary.org/obo/go.obo). For more details, and to download either the most recent daily version or the latest release, go to the [Gene Ontology website](http://geneontology.org/page/download-ontology).

<!--
A program debias_prep.py has been provided in the package. This program builds the graphs for each of the ontologies and puts them in three different files. Hence the .obo files are not needed. This program has been provided so that if the hierarchy changes then this program can be used to regenerate the files. In addition to the three hierarchy graphs for the three ontologies it also generates the mapping for alternate GO_ID to actual GO_ID. It also generates the mapping from one GO_ID to all its ancestors.
-->

### Installation

Installing from source:

```
git clone https://github.com/Rinoahu/debias
cd debias
python setup.py install
```

Installing with pip:

```
pip install debias
```

OR

```
pip install git+https://github.com/Rinoahu/debias
```

### Initial files

These files are created by running `debias_prep`:

`debias_prep -i data/GOFILE.obo`

GOFILE will usually be one of `go.obo` or `go-basic.obo`.

This generates seven files in total. Three files correspond to the three ontologies. Three files hold the mapping between each GO term and its ancestors in its own ontology. The last file contains the mapping from alternate GO_ID to actual GO_ID. Run this command whenever a new go.obo file is released.

```
1. ./data/alt_to_id.graph : Needed to obtain mapping from alternate GO_ID to actual GO_ID
2. ./data/mf.graph : The MFO Ontology graph
3. ./data/bp.graph : The BPO Ontology graph
4. ./data/cc.graph : The CCO Ontology graph
5. ./data/mf_ancestors.map : The MFO Ancestors map
6. ./data/bp_ancestors.map : The BPO Ancestors map
7. ./data/cc_ancestors.map : The CCO Ancestors map
```

### Quick setup steps

1. Download the latest go.obo file from http://www.geneontology.org/ontology/
2. Run the `debias_prep` program and provide the downloaded .obo file. See the usage details below. This program needs to be run only when a new .obo file is to be used.
3. Run the `debias` program.

```
usage: debias [-h] [--prefix PREFIX] [--cutoff_prot CUTOFF_PROT]
              [--cutoff_attn CUTOFF_ATTN] [--output OUTPUT]
              [--evidence EVIDENCE [EVIDENCE ...]
              | --evidence_inverse EVIDENCE_INVERSE [EVIDENCE_INVERSE ...]]
              --input INPUT [INPUT ...] [--aspect ASPECT [ASPECT ...]]
              [--assigned_by ASSIGNED_BY [ASSIGNED_BY ...]
              | --assigned_by_inverse ASSIGNED_BY_INVERSE [ASSIGNED_BY_INVERSE ...]]
              [--recalculate RECALCULATE]
              [--info_threshold_Wyatt_Clark_percentile INFO_THRESHOLD_WYATT_CLARK_PERCENTILE
              | --info_threshold_Wyatt_Clark INFO_THRESHOLD_WYATT_CLARK]
              [--info_threshold_Phillip_Lord_percentile INFO_THRESHOLD_PHILLIP_LORD_PERCENTILE
              | --info_threshold_Phillip_Lord INFO_THRESHOLD_PHILLIP_LORD]
              [--verbose VERBOSE] [--date_before DATE_BEFORE]
              [--date_after DATE_AFTER] [--single_file SINGLE_FILE]
              [--select_references SELECT_REFERENCES [SELECT_REFERENCES ...]
              | --select_references_inverse SELECT_REFERENCES_INVERSE [SELECT_REFERENCES_INVERSE ...]]
              [--report REPORT] [-histogram HISTOGRAM]

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX, -pref PREFIX
                        Add a prefix to the name of your output files.
  --cutoff_prot CUTOFF_PROT, -cprot CUTOFF_PROT
                        The threshold level for deciding to eliminate
                        annotations which come from references that annotate
                        more than the given 'threshold' number of PROTEINS
  --cutoff_attn CUTOFF_ATTN, -cattn CUTOFF_ATTN
                        The threshold level for deciding to eliminate
                        annotations which come from references that annotate
                        more than the given 'threshold' number of ANNOTATIONS
  --output OUTPUT, -odir OUTPUT
                        Writes the final outputs to the directory in this
                        path.
  --evidence EVIDENCE [EVIDENCE ...], -e EVIDENCE [EVIDENCE ...]
                        Accepts Standard Evidence Codes outlined in
                        http://geneontology.org/page/guide-go-evidence-codes.
                        All 3 letter code for each standard evidence is
                        acceptable. In addition to that EXPEC is accepted
                        which will pull out all annotations which are made
                        experimentally. COMPEC will extract all annotations
                        which have been done computationally. Similarly,
                        AUTHEC and CUREC are also accepted. Cannot be provided
                        if -einv is provided
  --evidence_inverse EVIDENCE_INVERSE [EVIDENCE_INVERSE ...], -einv EVIDENCE_INVERSE [EVIDENCE_INVERSE ...]
                        Leaves out the provided Evidence Codes.
                        Cannot be provided if -e is provided
  --aspect ASPECT [ASPECT ...], -a ASPECT [ASPECT ...]
                        Enter P, C or F for Biological Process, Cellular
                        Component or Molecular Function respectively
  --assigned_by ASSIGNED_BY [ASSIGNED_BY ...], -assgn ASSIGNED_BY [ASSIGNED_BY ...]
                        Choose only those annotations which have been
                        annotated by the provided list of databases. Cannot be
                        provided if -assgninv is provided
  --assigned_by_inverse ASSIGNED_BY_INVERSE [ASSIGNED_BY_INVERSE ...], -assgninv ASSIGNED_BY_INVERSE [ASSIGNED_BY_INVERSE ...]
                        Choose only those annotations which have NOT been
                        annotated by the provided list of databases. Cannot be
                        provided if -assgn is provided
  --recalculate RECALCULATE, -recal RECALCULATE
                        Set this to 1 if you wish to enforce the recalculation
                        of the Information Accretion for every GO term.
                        Calculation of the information accretion is time
                        consuming. Therefore keep it to zero if you are
                        performing rerun on old data. The program will then
                        read the information accretion values from a file
                        which it wrote to in the previous run of the program
  --info_threshold_Wyatt_Clark_percentile INFO_THRESHOLD_WYATT_CLARK_PERCENTILE, -WCTHRESHp INFO_THRESHOLD_WYATT_CLARK_PERCENTILE
                        Provide the percentile p. All annotations having
                        information content below p will be discarded
  --info_threshold_Wyatt_Clark INFO_THRESHOLD_WYATT_CLARK, -WCTHRESH INFO_THRESHOLD_WYATT_CLARK
                        Provide a threshold value t. All annotations having
                        information content below t will be discarded
  --info_threshold_Phillip_Lord_percentile INFO_THRESHOLD_PHILLIP_LORD_PERCENTILE, -PLTHRESHp INFO_THRESHOLD_PHILLIP_LORD_PERCENTILE
                        Provide the percentile p. All annotations having
                        information content below p will be discarded. So if 5
                        is provided, proteins annotated by terms whose score
                        is in the top 5% will be left in, the rest will be
                        discarded.
  --info_threshold_Phillip_Lord INFO_THRESHOLD_PHILLIP_LORD, -PLTHRESH INFO_THRESHOLD_PHILLIP_LORD
                        Provide a value t.
                        All annotations having information content below t
                        will be discarded
  --verbose VERBOSE, -v VERBOSE
                        Set this argument to 1 if you wish to view the outcome
                        of each operation on the console
  --date_before DATE_BEFORE, -dbfr DATE_BEFORE
                        The date entered here will be parsed by the parser
                        from dateutil package. For more information on
                        acceptable date formats please visit
                        https://github.com/dateutil/dateutil/. All annotations
                        made prior to this date will be picked up
  --date_after DATE_AFTER, -daftr DATE_AFTER
                        The date entered here will be parsed by the parser
                        from dateutil package. For more information on
                        acceptable date formats please visit
                        https://github.com/dateutil/dateutil/. All annotations
                        made after this date will be picked up
  --single_file SINGLE_FILE, -single SINGLE_FILE
                        Set to 1 in order to output the results of each
                        individual species in a single file.
  --select_references SELECT_REFERENCES [SELECT_REFERENCES ...], -selref SELECT_REFERENCES [SELECT_REFERENCES ...]
                        Provide the paths to files which contain references
                        you wish to select. It is possible to include
                        references in case you wish to select annotations made
                        by a few references. This will prompt the program to
                        interpret string which have the keywords
                        'GO_REF','PMID' and 'Reactome' as a GO reference.
                        Strings which do not contain that keyword will be
                        interpreted as a file path which the program will
                        except to contain a list of GO references. The program
                        will accept a mixture of GO_REF and file names. It is
                        also possible to choose all references of a particular
                        category and a handful of references from another. For
                        example if you wish to choose all PMID references,
                        just put PMID. The program will then select all PMID
                        references. Currently the program can accept PMID,
                        GO_REF and Reactome
  --select_references_inverse SELECT_REFERENCES_INVERSE [SELECT_REFERENCES_INVERSE ...], -selrefinv SELECT_REFERENCES_INVERSE [SELECT_REFERENCES_INVERSE ...]
                        Works like -selref but does not select the references
                        which have been provided as input
  --report REPORT, -r REPORT
                        Provide the path where the report file will be stored.
                        If you are providing a path please make sure your path
                        ends with a '/'. Otherwise the program will assume the
                        last string after the final '/' as the name of the
                        report file. A single report file will be generated.
                        Information for each species will be put into
                        individual worksheets.
  --histogram HISTOGRAM, -hist HISTOGRAM
                        Set this option to 1 if you wish to view the histogram
                        of GO_TERM frequency before and after debiasing is
                        performed with respect to cutoffs based on number of
                        proteins or annotations. If you wish to save the file
                        then please enter a filepath. If you are providing a
                        path please make sure your path ends with a '/'.
                        Otherwise the program will assume the last string
                        after the final '/' as the name of the image file.
                        Separate histograms will be generated for each
                        species.

Required arguments:
  --input INPUT [INPUT ...], -i INPUT [INPUT ...]
                        The input file path. Please remember the name of the
                        file must start with goa in front of it, with the name
                        of the species following separated by an underscore
```

NOTE: The files inside the `temp` folder were generated by executing the commands below.

### Examples

1. `debias_prep -i data/go.obo`

   This command generates seven files in total. Three files correspond to the three ontologies. Three files hold the mapping between each GO term and its ancestors in its own ontology. The last file contains the mapping from alternate GO_ID to actual GO_ID. Run this command every time you update GOFILE.

2. `debias -cprot 100 -i data/goa_yeast.gaf data/goa_dicty.gaf -a C -WCTHRESHp 2 -recal 1`

   This command reads from two input files, one for yeast and the other for dicty. The `-a C` option selects only the CCO annotations.
   The `-WCTHRESHp` argument sets the Wyatt Clark threshold to the 2nd percentile, which means all annotations with a Wyatt Clark information content below the 2% level are removed. Instead of a percentile, you can provide an absolute threshold value with the `-WCTHRESH` argument. In addition, annotations are removed if they come from references that have annotated more than 100 **proteins**. The output is written to the current directory. `-recal 1` is required in this command because the GO term to information content mapping has to be created. Subsequent runs with a different threshold and all other parameters fixed are possible **WITHOUT** the `-recal` argument. This command produces 3 output files: one for each of the two organisms, and a third in which both organisms are combined.

3. `debias -i data/goa_yeast.gaf data/goa_dicty.gaf -a C P -PLTHRESHp 30 -e EXPEC IBA -odir data/output -single 1`

   This command reads from two input files and selects CCO and BPO annotations. It then **chooses** only those annotations that were made experimentally or were annotated computationally as "IBA" (Inferred from Biological aspect of Ancestor). In addition, it discards all annotations with a Phillip Lord information content below 30%. Instead of a percentile, you can provide an absolute threshold value with the `-PLTHRESH` argument. The final output is written to the data/output directory. You can include non-existent paths; the program will attempt to create the folders if it has the required permissions. Because the `-single` argument is provided, this produces only one file, which contains all the selected annotations from both organisms.

4. `debias -cattn 1000 -i data/goa_yeast.gaf data/goa_dicty.gaf -a C P -einv COMPEC -pref testing -selrefinv Reactome`

   This command reads from two input files and selects CCO and BPO annotations.
   It then **discards** the annotations that were made computationally, and further filters out all annotations made by "Reactome". All output files are prefixed with the string "testing". The program already creates a meaningful name for each file, so the prefix is an optional addition chosen by the user.

### Running test data

To test all the commands mentioned above, run the shell script named test.sh in the tests directory.

```
git clone https://github.com/Rinoahu/debias
cd ./debias/tests
bash test.sh
```
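
### Illustration of the protein cutoff

The `-cprot` option described above drops annotations that come from references annotating more than a given number of proteins. The following is a minimal standalone sketch of that idea in plain Python, not the code used inside `debias`. It assumes the standard tab-separated GAF 2.x layout (DB Object ID in column 2, DB:Reference in column 6) and, as a simplification of this sketch, uses only the first reference when a row lists several. The names `parse_gaf`, `primary_reference`, `filter_by_protein_cutoff` and `make_row` are hypothetical helpers, and the demo rows are made-up values.

```python
from collections import defaultdict

# 0-based positions of the GAF 2.x columns used in this sketch.
COL_PROTEIN = 1    # DB Object ID
COL_REFERENCE = 5  # DB:Reference (several references may be joined by "|")


def parse_gaf(lines):
    """Yield each annotation as a list of columns, skipping '!' header lines."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def primary_reference(row):
    """Return the first reference of a row (an assumption of this sketch)."""
    return row[COL_REFERENCE].split("|")[0]


def filter_by_protein_cutoff(rows, cutoff):
    """Keep rows whose reference annotates at most `cutoff` distinct proteins."""
    rows = list(rows)
    proteins_per_ref = defaultdict(set)
    for row in rows:
        proteins_per_ref[primary_reference(row)].add(row[COL_PROTEIN])
    return [
        row for row in rows
        if len(proteins_per_ref[primary_reference(row)]) <= cutoff
    ]


def make_row(protein, go_id, reference):
    """Build a 17-column GAF line with made-up values for the demo."""
    cols = ["UniProtKB", protein, protein, "", go_id, reference, "IDA", "",
            "C", "", "", "protein", "taxon:559292", "20200101", "SGD", "", ""]
    return "\t".join(cols)


if __name__ == "__main__":
    demo = ["!gaf-version: 2.1"]
    # PMID:1 plays the role of a high-throughput study annotating three proteins.
    demo += [make_row(p, "GO:0005634", "PMID:1") for p in ("P1", "P2", "P3")]
    demo.append(make_row("P4", "GO:0005737", "PMID:2"))
    for row in filter_by_protein_cutoff(parse_gaf(demo), cutoff=2):
        print(row[COL_PROTEIN], row[COL_REFERENCE])
```

With the demo data and a cutoff of 2, the three annotations from `PMID:1` are dropped and only the `P4` annotation from `PMID:2` remains. Counting distinct proteins per reference, rather than rows, is what separates this cutoff from the annotation-count cutoff (`-cattn`).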


How to install


Install the debias-0.165 whl package:

    pip install debias-0.165.whl


Install the debias-0.165 tar.gz package:

    pip install debias-0.165.tar.gz