معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

A small program to mask positions that could result in a PMD artifact from a BAM file using a predefined SNP catalog

ویژگی	مقدار
سیستم عامل	-
نام فایل	bamrefine-0.1.0
نام	bamrefine
نسخه کتابخانه	0.1.0
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	-
ایمیل نویسنده	Etka Yapar <etka.yapar@gmail.com>
آدرس صفحه اصلی	-
آدرس اینترنتی	https://pypi.org/project/bamrefine/
مجوز	-

# bamrefine [![PyPI version](https://badge.fury.io/py/bamrefine.svg)](https://badge.fury.io/py/bamrefine) This is a BAM filtering/masking program that masks the positions from an alignment if that read carries a variant position that could be affected by post-mortem damage (PMD). e.g. If a read carries a C/\* position at its 5' end or it carries a G/\* position at its 3' end. The regions treated as 5' or 3' "end" is not pre-defined and should be provided to the program with its argument (see `-l` or `--pmd-length-threshold` parameter) It is designed to be an alternative to hard clipping first N bases of BAM reads with the suspicion of those regions containing non-authentic (post mortem damage artifact) SNPs, or ignoring transition SNPs alltogether. Thus, avoiding a substantial amount of data loss that is encountered when using mainstream pipelines. You can install `bamrefine` from PyPI using `pip`. Upon installation of the package, the command `bamrefine` should be available to you. ## Masking strategy Consider the following example. ![](https://github.com/etkayapar/bamRefine/blob/master/figs/healthyAlignment_scaled.jpg?raw=true) above, in the left, you see the table of a theoretical SNP catalog with the variant positions and minor and major alleles observed in a population. In the right,you see a theoretical alignment of three paired end reads to the reference (orange) region covering all the variants listed on the left. Some reads match the reference, others carry variants (green). Below, a more realistic example for ancient DNA reads, in addition to the expected biological variation among the aligned reads, there is also unexpedted variants resulting from DNA damage at the ends of fragmented ancient DNA. ![](https://github.com/etkayapar/bamRefine/blob/master/figs/damagedAlignment_scaled.jpg?raw=true) At the 5' ends, these damages will occur as C-\>T changes, and at the 3' ends they will appear as G-\>A changes. ![](https://github.com/etkayapar/bamRefine/blob/master/figs/PMD_smiley.jpg?raw=true) Therefore, a C/\* variant in the table becomes a possible PMD source if the read overlaps with it at the reads 5' end, and this is also the case for a G/\* variant if a read overlaps with it at the 3' end. Bamrefine, first characterizes the input SNP collection into two tables as 5' end suspects and 3' end suspects: ![](https://github.com/etkayapar/bamRefine/blob/master/figs/snpTables.jpg?raw=true) and then iterates over the reads in the BAM file searching for overlaps with the two suspect tables at the appropriate ends of the reads (overlaps with the 5' end of the read and the 5' table variants, 3' end of the reads and the 3' table variants). Then if a read carries a suspected/dangerous variant at the ends, the program masks those specific position in the read so that they won't interfere with the SNP calling for the downstream steps: ![](https://github.com/etkayapar/bamRefine/blob/master/figs/maskingExample.jpg?raw=true) As you can see at the right hand side example of the above picture, there is still one unexpected 'A' (in red), it stayed unmasked even though it is within the 3' end lookup range (indicated with double-headed arrow). This is because, - there is no entry in the SNP collection at that position of the genome. Meaning, since there won't be any SNP calling concerning that specific locus, there is no need to bother with unexpected variants like this. - `bamrefine` does not check the specific allele that a read carries, rather does the masking based on the alignment overlap of the reads ends and the "dangerous" positions in the SNP catalog. See the 5' end maskings at the left part of above example: Even though the read carries 'C' bases (not different from the reference), those 'C's are still masked. This therefore means: - reads that do not overlap with the problematic variant positions at their ends will still provide information regarding the allele pool - you disregard unreliable information from the alignments while preserving anything else. i.e. If there is more than one variant spanning a single read and one is a possible PMD source, by selectively masking the problematic position, you still have the reliable information coming from the other position. ## Usage ```bamrefine [parameters] [flags] <in.bam> <out.bam>``` ### Parameters: * `-s, --snps`: SNP collection file. This can be either 4/5 column BED (genomic position [with or without the `start` position column] + Minor and Major allele) or SNP (that is used by EIGENSTRAT) formatted files. Format distinction is done by checking the file extension so it is important that the input file follows the format of the extension it has. See the `sample_data` data directory in the installation path for examples of each of the three supported file types. You can find the installation directory by running `pip3 show bamrefine` or `pip show bamrefine`. * `-p, --threads`: Threads to run the program in parallel. * `-l, --pmd-length-threshold`: Either a single integer (e.g `-l 10`) or two integers separated by a comma (e.g `-l 2,0`) representing different length values corresponding 5' and 3' ends , respectively. Positions that are up to and including this far in any read will be evaluated and masked. Also, it is recommended that you have PMD related statistics (e.g. up to which position there is a high risk of seeing a PMD artifact) regarding your libraries before you run this program on the BAM files because the `--pmd-length-threshold` parameter requires an user specified value for the program to run, there is no default value. This can be achieved by using [PMDtools](https://github.com/pontussk/PMDtools) ### Flags: * `-v, --verbose`: verbose output of progress. * `-t, --add-tags`: Add tags to reads in output BAM file related to masking statistics using optional SAM fields in alignment records. e.g. `ZC:Z:2,1` and `ZP:Z:0,5;-3` would mean that the program masked n=2 5' and n=1 3' positions and they were at index 0,5 and -3 in the read sequence. 3' masking positions are represented with negative indices that start counting from the 3' end of the read and is compatible with python list indices. * `-k, --keep-tmp`: Don't remove the temprorary run directory. It will include intermediate BAM files, cached SNPs, etc. this directory is under the same directory with specified ouptut BAM, named as `.YYYY-MM-DD_HH-MM-SS_<out.bam>_tmp_bamrefine`

نیازمندی

مقدار	نام
~=0.20.0	pysam

زبان مورد نیاز

مقدار	نام
>=3.6	Python

نحوه نصب

نصب پکیج whl bamrefine-0.1.0:

pip install bamrefine-0.1.0.whl

نصب پکیج tar.gz bamrefine-0.1.0:

pip install bamrefine-0.1.0.tar.gz