BioTEX
=======

The biotex library offers resources to support text mining strategies that use bioinformatics tools.
Stand-alone tools based on the library are available at <https://sourceforge.net/projects/biotex-tools/>.

Installation
------------

To install biotex through `pip`::

    pip install biotex

Tested Platforms
----------------

- Python:

  - 3.7.4

- Windows (64-bit):

  - 10

- Ubuntu (64-bit):

  - 18.04.1 LTS

Required external libraries
---------------------------
- numpy
- pandas
- scipy
- scikit-learn
- matplotlib
- unidecode
- biopython
- sweep
Functions
---------------

.. csv-table::
    :header: Function Name, Description, Input, Output
    :widths: auto
    :stub-columns: 1
    :delim: -

    biotex.aminocode.encodeText biotex.aminocode.encodetext biotex.aminocode.et-Encodes a string with AMINOcode.-text: natural language text string to be encoded; detailing: level of detail in the encoding. 'd' for details in digits. 'p' for details on the punctuation. 'dp' or 'pd' for both.-encoded text in string format.
    biotex.aminocode.decodeText biotex.aminocode.decodetext biotex.aminocode.dt-Decodes a string with reverse AMINOcode.-text: text string encoded using the encodefile function to be decoded; detailing: details used in the text to be decoded. 'd' for details in digits. 'p' for details on the punctuation. 'dp' or 'pd' for both.-decoded text in string format.
    biotex.aminocode.encodeFile biotex.aminocode.encodefile biotex.aminocode.ef-Encodes a text file or a list of strings with AMINOcode.-input_file_name: text file name or list of strings. It can also be a list of SeqRecord, in which case the function will automatically extract the headers to do the encoding; output_file_name: the name for the output file. If not defined, the result will only be returned as a variable; detailing: same as in the encodetext function; header_format: format for the headers of the generated FASTA. It can be 'number+originaltext', 'number' or 'originaltext'. 'number' is a count of the lines in the input file. Blank lines are considered in the count, but are not added to the FASTA file. 'originaltext' is the input text itself; verbose: if True, displays progress.-list of SeqRecord*; if output_file_name is defined, a file will be saved.
    biotex.aminocode.decodeFile biotex.aminocode.decodefile biotex.aminocode.df-Decodes a FASTA file or a list of SeqRecord with reverse AMINOcode.-input_file_name: file name or list of SeqRecord; output_file_name: the name for the output file. If not defined, the result will only be returned as a variable; detailing: same as in the decodetext function; verbose: if True, displays progress.-list of strings; if output_file_name is defined, a file will be saved.
    biotex.dnabits.encodeText biotex.dnabits.encodetext biotex.dnabits.et-Encodes a string with DNAbits.-text: natural language text string to be encoded.-encoded text in string format.
    biotex.dnabits.decodeText biotex.dnabits.decodetext biotex.dnabits.dt-Decodes a string with reverse DNAbits.-text: text string encoded using the encodefile function to be decoded.-decoded text in string format.
    biotex.dnabits.encodeFile biotex.dnabits.encodefile biotex.dnabits.ef-Encodes a text file or a list of strings with DNAbits.-input_file_name: text file name or list of strings. It can also be a list of SeqRecord, in which case the function will automatically extract the headers to do the encoding; output_file_name: the name for the output file. If not defined, the result will only be returned as a variable; header_format: format for the headers of the generated FASTA. It can be 'number+originaltext', 'number' or 'originaltext'. 'number' is a count of the lines in the input file. Blank lines are considered in the count, but are not added to the FASTA file. 'originaltext' is the input text itself; verbose: if True, displays progress.-list of SeqRecord; if output_file_name is defined, a file will be saved.
    biotex.dnabits.decodeFile biotex.dnabits.decodefile biotex.dnabits.df-Decodes a text file or a list of SeqRecord with reverse DNAbits.-input_file_name: file name or list of SeqRecord; output_file_name: the name for the output file. If not defined, the result will only be returned as a variable; verbose: if True, displays progress.-list of strings; if output_file_name is defined, a file will be saved.
    biotex.fastatools.list2SeqRecord biotex.fastatools.list2seqrecord biotex.fastatools.list2bioSeqRecord biotex.fastatools.list2bioseqrecord biotex.fastatools.list2fasta-Converts a list of strings to a list of SeqRecord, a Biopython object that holds biological sequences and information about them.-seq: list of biological sequences in string format; header: list of headers in string format. If set to "None", the headers will be automatically defined with an increasing number.-list of SeqRecord.
    biotex.fastatools.fastaRead biotex.fastatools.fastaread-Uses Biopython to import a FASTA file.-input_file_name: input FASTA file name.-list of SeqRecord.
    biotex.fastatools.fastaWrite biotex.fastatools.fastawrite-Creates a FASTA file from a list of SeqRecord.-records: list of SeqRecord; output_file_name: output FASTA file name.-a file is saved with the defined name.
    biotex.fastatools.getHeader biotex.fastatools.getheader-Extracts the headers from a list of SeqRecord.-records: list of SeqRecord.-list with headers.
    biotex.fastatools.getSeq biotex.fastatools.getseq-Extracts the sequences from a list of SeqRecord.-records: list of SeqRecord.-list with sequences.
    biotex.fastatools.removePattern biotex.fastatools.removepattern-Removes patterns from a list of SeqRecord based on a regular expression.-records: list of SeqRecord; rex: regular expression.-list of SeqRecord with removal applied.
    biotex.fastatools.clustalOmega biotex.fastatools.clustalomega biotex.fastatools.clustalo-Uses Clustal Omega to align the sequences in a FASTA file.-input_file_name: input FASTA file name.-list of aligned sequences in string format.
    biotex.fastatools.getCons biotex.fastatools.getcons-Saves a temporary file with the sequences from the list of SeqRecord, applies the clustalo function and obtains the alignment consensus.-records: list of SeqRecord.-alignment consensus in string format.

\*SeqRecord: Biopython object that stores a biological sequence and its information, as described in <https://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html>
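
Example
-------

To give an intuition for what encoding text as a nucleotide sequence and decoding it back can look like, here is a minimal, self-contained sketch. It does not call biotex. The function names ``sketch_encode`` and ``sketch_decode`` and the mapping of two bits to one nucleotide are assumptions made for illustration only; the alphabet and bit order actually used by ``biotex.dnabits`` may differ.

.. code-block:: python

    # Hypothetical 2-bits-per-nucleotide table (an assumption for this sketch,
    # not necessarily the table used by biotex.dnabits).
    BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
    BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


    def sketch_encode(text):
        """Encode text as a DNA string, four nucleotides per UTF-8 byte."""
        bits = "".join(format(byte, "08b") for byte in text.encode("utf-8"))
        return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


    def sketch_decode(sequence):
        """Reverse sketch_encode: rebuild the bytes from groups of eight bits."""
        bits = "".join(BASE_TO_BITS[base] for base in sequence)
        data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
        return data.decode("utf-8")


    encoded = sketch_encode("Hi")
    print(encoded)                 # CAGACGGC
    print(sketch_decode(encoded))  # Hi

Because every byte maps to exactly four nucleotides, the round trip is lossless, which is the property the encode and decode function pairs in the table above rely on.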