معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

A bioinformatics package to estimate the depth-complexity ratio and amplification noisiness of a sequencing experiment.

ویژگی	مقدار
سیستم عامل	-
نام فایل	decoratio-1.0.5
نام	decoratio
نسخه کتابخانه	1.0.5
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	-
ایمیل نویسنده	"Nicolas C. Rochette" <nic.rochette@gmail.com>, "Angel G. Rivera-Colon" <angelgr2@illinois.edu>, "Julian M. Catchen" <jcatchen@illinois.edu>
آدرس صفحه اصلی	-
آدرس اینترنتی	https://pypi.org/project/decoratio/
مجوز	-

NAME ========== **decoratio** — estimate the DEpth-COmplexity RATIO and amplification noisiness of a sequencing experiment based on the PCR clone size histogram. INSTALLATION ========== ```sh python3 -m pip decoratio ``` Decoratio depends on `numpy`, `scipy`, and `matplotlib`. SYNOPSIS ========== ```sh python3 -m decoratio [--model=MODEL] [...] CLONE_SIZE_HISTOGRAM.tsv ``` AUTHORS & CITATION ========== Nicolas C. Rochette, Angel G. Rivera-Colón, Julian M. Catchen If you use Decoratio, please cite: Rochette NC, Rivera-Colón AG, Walsh J, Sanger TJ, Campbell-Staton SC, Catchen JM (2022). On the causes, consequences, and avoidance of PCR duplicates: towards a theory of library complexity. bioRxiv. https://doi.org/10.1101/2022.10.10.511638 DESCRIPTION ========== PCR duplicates become prevalent in sequencing experiments when the sequencing effort exceeds the number of molecules the library was originally prepared from. The precise duplicates rate that is observed depends primarily on the ratio between the sequencing depth and the complexity of the library, as well as secondarily on the variance of the amplification factor among the original template molecules the library. This program leverages PCR duplicate patterns (namely, the relative occurrence of duplication clones of various sizes) in a sequencing dataset to jointly estimate (1) the depth-complexity ratio, and (2) a variance model for the PCR amplification of the library. As the sequencing depth should be known to the experimenter, the depth-complexity ratio is a library complexity estimate. This, together with the amplification variance model, informs on the quality of the library, allowing to identify failed libraries and/or to compare the performance of a set of library preparation protocols. Input TSV histogram ---------- The main input for the program is a TSV table giving the number of duplication clones of each size that were observed. A duplication clone is a set of individual reads that are all descend from the same template, and are PCR duplicates of one another. Thus, clones of size 1 correspond to singleton reads that do not have duplicates; clones of size 2 to pairs of duplicate reads; clones of size 3 to read triplets; etc. The input file should look like this: ```txt clone_size n_clones 1 9746142 2 7320751 3 4969106 4 3023127 5 1693158 6 893344 7 456659 ... # (Empty size classes may be omitted.) ``` For instructions on how to obtain this histogram from sequencing data (e.g., BAM alignment files), see the EXAMPLES section below. Options ---------- `--model` : an amplification model specifier such as "logskewnormal" (default), "lognormal", or "inheffbeta:12"; see AMPLIFICATION MODELS below `--output-prefix` : defaults to `CLONE_SIZE_HISTOGRAM.tsv.decoratio.*` `--ratio-min`, `--ratio-max` : constrains the optimization range for the depth-complexity ratio `--no-plots`, `--tsv-out`, `--log-level` : miscelaneous output control Amplification models ---------- Default settings should work well in most cases. Three models are currently implemented for the amplification: log-skew-normal (`logskewnormal`, default), log-normal (`lognormal`), and the empirical inherited efficiency model (`inheff` and `inheffbeta` for the original and beta distribution-based versions respectively). All models may be specified without parameters, in which case their parameters will be jointly optimized with the depth-complexity ratio, or with user-specified parameters, in which case only the depth-complexity ratio will be optimized. #### Log-skew-normal and log-normal models The PCR duplicate rate is only sensitive to the variance in relative amplification factors, and not to the mean amplification factor. As a result, the log-normal model has only one parameter (a standard deviation) and log-skew-normal two (a standard deviation and a skew). Thus, for these models the possible values take the form: ```txt --model=lognormal --model=lognormal:0.54 --model=logskewnormal --model=logskewnormal:0.72:-3.4 ``` All numerical values are given in units of the natural logarithm (this is important as e.g., an amplification factor log-distribution with a standard deviation of 0.54 in base *e* has a standard deviation of 0.78 in base 2). #### Inherited efficiency models This is the empirical model of (Best et al., 2015). We provide two closely related versions, the normal distribution-based model and a stabilized model based on the beta distribution, which we recommend. A number of PCR cycles (e.g., 12) must always be provided. The models then depend on two additional parameters, the mean efficiency of the PCR process and its standard deviation across clones. Thus, possible values take the form: ```txt --model=inheff:12 --model=inheffbeta:12 --model=inheffbeta:12:0.76:0.14 ``` Computations based on this model are substantially slower. The default is to perform binned simulations, as this is more computationally efficient, and the default effort is to perform 1000 simulations for each of 1000 efficiency bins. This should work well, but can be changed with `--inheff-n-efficiency-bins` and `--inheff-n-sims-per-bin`. Binning during simulations can be deactivated with `--inheff-legacy-sims`. If optimizing the parameters, the default is to optimize in a one-dimentional space, but `--inheff-uncouple-beta-parameters` will cause the two parameters to be optimized independently (for the beta distribution version only). This may impact the precise shape of the amplification factor distribution. EXAMPLES ========== ```sh curl -O https://bitbucket.org/rochette/decoratio/raw/HEAD/examples/brown_alone.clone_sizes.tsv # Using the default logskewnormal amplification model: python3 -m decoratio ./brown_alone.clone_sizes.tsv # Using the lognormal amplification model (this is faster): python3 -m decoratio --model=lognormal ./brown_alone.clone_sizes.tsv # Using a user-specified amplification model: python3 -m decoratio --model=logskewnormal:0.72:-3.4 ./brown_alone.clone_sizes.tsv ``` Sample output: ```txt decoratio-1.0.0 (python-3.8, scipy-1.5.2) Called as: decoratio ./brown_alone.clone_sizes.tsv Outputing to: ./brown_alone.clone_sizes.tsv.decoratio.logskewnormal.* Reading input file... Input data shows 60.8% PCR duplicates. Optimizing the depth-complexity ratio & amplification noise... Optimized: depth_complexity_ratio = 1.93 model = logskewnormal:0.719:-3.4 Model fit was 99.1%; the predicted duplicate was rate 60.7% and the saturation 75.8%; the amplification model's noisiness was 0.719 and its skewness -0.72. Writing output files... ./brown_alone.clone_sizes.tsv.decoratio.logskewnormal.png ./brown_alone.clone_sizes.tsv.decoratio.logskewnormal.ampl_factors.png Done. ``` Obtaining the input file ---------- ### Using **gstacks** For RAD-seq datasets analyzed using STACKS, the clone sizes histogram is present in one of the output files by default: ```sh stacks-dist-extract ./gstacks.log.distribs pcr_clone_size_distrib > ./clone_sizes.tsv python3 -m decoratio ./clone_sizes.tsv # The program can actually be run directly on the distribs file: python3 -m decoratio ./gstacks.log.distribs ``` *n.b.*, for datasets that comprise several libraries, for PCR duplicates and library complexity purposes gstacks should be run separately on each library/set of samples. ### Using **samtools-markdup** ```sh prefix=./my_sample samtools view -b -f0x3 -F0x904 $prefix.bam | samtools collate -O - | samtools fixmate -m - - | samtools sort | samtools markdup -t - $prefix.markdup.bam # We then extract the `do:` (presumably "duplicate of") tags in the SAM records. samtools view -f0x40 $prefix.markdup.bam | sed -E 's/.*\tdo:Z:([^\t]+).*/\1/; s/\t.*//' | gzip > $prefix.clones.txt.gz gzip -cd $prefix.clones.txt.gz | sort | uniq -c | awk '{print $1}' | sort -n | uniq -c | awk 'BEGIN{print "clone_size\tn_clones"} {print $2 "\t" $1}' \ > $prefix.clone_sizes.tsv python3 -m decoratio $prefix.clone_sizes.tsv ``` ### Using **GATK/Picard** ```sh prefix=./my_sample picard=/path/to/picard.jar # Filter for properly aligned read pairs. samtools view -b -f0x3 -F0x904 $prefix.bam > $prefix.paired.bam # Sort by read name. java -jar $picard \ SortSam \ --SORT_ORDER queryname \ -I $prefix.paired.bam -O $prefix.sortsam.bam \ &> $prefix.sortsam.log # Annotate duplicates. java -jar $picard \ MarkDuplicates \ --TAG_DUPLICATE_SET_MEMBERS --READ_NAME_REGEX null \ -I $prefix.sortsam.bam -O $prefix.markdups.bam -M $prefix.markdups.txt \ &> $prefix.markdups.log # Then extract the `DS:` (duplicate set size, i.e., clone size) tags in the SAM records. samtools view -f0x40 -F0x400 $prefix.markdups.bam | sed -E '/\tDS:/!s/.*/1/; s/.*\tDS:i:([0-9]+).*/\1/' | sort -n | uniq -c | awk 'BEGIN{print "clone_size\tn_clones"} {print $2 "\t" $1}' \ > $prefix.clone_sizes.tsv python3 -m decoratio $prefix.clone_sizes.tsv ``` ### Using **UMI-tools** Handling of duplicates with UMI-tools is appropriate for experiments that make use of unique molecular identifers (e.g., single-cell RNA-seq datasets). Users should use the `umi_tools group` command, exporting the information about the size of "read groups" (i.e., PCR duplication clones) to a text file with the `--group-out` option.

نیازمندی

مقدار	نام
-	numpy
-	scipy
-	matplotlib

زبان مورد نیاز

مقدار	نام
>=3.5	Python

نحوه نصب

نصب پکیج whl decoratio-1.0.5:

pip install decoratio-1.0.5.whl

نصب پکیج tar.gz decoratio-1.0.5:

pip install decoratio-1.0.5.tar.gz