

eclip-peak-1.0.9



Description

Pipeline for using IDR to identify a set of reproducible peaks given an eCLIP dataset with two or three replicates.
| Attribute | Value |
| --- | --- |
| Operating system | - |
| File name | eclip-peak-1.0.9 |
| Name | eclip-peak |
| Library version | 1.0.9 |
| Maintainer | [] |
| Maintainer email | [] |
| Author | FEI YUAN |
| Author email | fei.yuan@bcm.edu |
| Home page | https://github.com/VanNostrandLab/peak |
| URL | https://pypi.org/project/eclip-peak/ |
| License | MIT |
# eCLIP-Peak

Pipeline for using IDR to identify a set of reproducible peaks given an eCLIP dataset with two or three replicates.

## Installation

- For Van Nostrand Lab

  The pipeline has already been installed. Activate its environment by issuing the following command: `source /storage/vannostrand/software/eclip/venv/environment.sh`.

- For all others:
  - Install Python (3.6+)
  - Install peak (`pip install eclip-peak`)
  - Install [IDR](https://github.com/nboley/idr) (2.0.3+)
  - Install Perl (5.10.1+) with the following packages:
    - Statistics::Basic (`cpanm Statistics::Basic`)
    - Statistics::Distributions (`cpanm Statistics::Distributions`)
    - Statistics::R (`cpanm Statistics::R`)

## Usage

- For Van Nostrand Lab

  After activating peak's environment, call `peak -h` to see the detailed usage.

- For all others:

  After installing Python, peak, and Perl (with the required packages), call `peak -h` in your terminal to see the following detailed usage:

```shell
$ peak -h
usage: peak [-h] [--ip_bams IP_BAMS [IP_BAMS ...]]
            [--input_bams INPUT_BAMS [INPUT_BAMS ...]]
            [--peak_beds PEAK_BEDS [PEAK_BEDS ...]]
            [--read_type READ_TYPE] [--outdir OUTDIR] [--species SPECIES]
            [--l2fc L2FC] [--l10p L10P] [--idr IDR] [--dry_run] [--cores]
            [--debug]

Pipeline for using IDR to identify a set of reproducible peaks given eClIP
dataset with two or three replicates.

optional arguments:
  -h, --help            show this help message and exit
  --ip_bams IP_BAMS [IP_BAMS ...]
                        Space separated IP bam files (at least 2 files).
  --input_bams INPUT_BAMS [INPUT_BAMS ...]
                        Space separated INPUT bam files (at least 2 files).
  --peak_beds PEAK_BEDS [PEAK_BEDS ...]
                        Space separated peak bed files (at least 2 files).
  --ids IDS [IDS ...]   Optional space separated short IDs (e.g., S1, S2, S3)
                        for datasets.
  --read_type READ_TYPE
                        Read type of eCLIP experiment, either SE or PE.
  --outdir OUTDIR       Path to output directory.
  --species SPECIES     Short code for species, e.g., hg19, mm10.
  --l2fc L2FC           Only consider peaks at or above this l2fc cutoff,
                        default: 3.
  --l10p L10P           Only consider peaks at or above this l10p cutoff,
                        default: 3.
  --idr IDR             Only consider peaks at or above this idr score cutoff,
                        default: 0.01.
  --cores CORES         Maximum number of CPU cores for parallel processing,
                        default: 1.
  --dry_run             Print out steps and inputs/outputs of each step
                        without actually running the pipeline.
  --debug               Invoke debug mode (only for develop purpose).
```

## Outline of workflow

- Normalize CLIP IP BAM over INPUT for each replicate
- Peak compression/merging on input-normalized peaks for each replicate
- Entropy calculation on IP and INPUT read probabilities within each peak for each replicate
- Run IDR on peaks ranked by entropy
- Normalize IP BAM over INPUT using new IDR peak regions
- Identify reproducible peaks within IDR regions

## Examples

- eCLIP with 2 replicates

  Assume the eCLIP pipeline has run successfully and generated the following files for species `hg19`:

```
replicate 1:
    IP BAM: ip1.bam
    INPUT BAM: input1.bam
    Peak BED: clip1.peak.clusters.bed
replicate 2:
    IP BAM: ip2.bam
    INPUT BAM: input2.bam
    Peak BED: clip2.peak.clusters.bed
```

  The pipeline can then be called like this to identify reproducible peaks:

```shell
peak \
    --ip_bams ip1.bam ip2.bam \
    --input_bams input1.bam input2.bam \
    --peak_beds clip1.peak.clusters.bed clip2.peak.clusters.bed \
    --species hg19
```

- eCLIP with 3 replicates

  Assume the eCLIP pipeline has run successfully and generated the following files for species `hg19`:

```
replicate 1:
    IP BAM: ip1.bam
    INPUT BAM: input1.bam
    Peak BED: clip1.peak.clusters.bed
replicate 2:
    IP BAM: ip2.bam
    INPUT BAM: input2.bam
    Peak BED: clip2.peak.clusters.bed
replicate 3:
    IP BAM: ip3.bam
    INPUT BAM: input3.bam
    Peak BED: clip3.peak.clusters.bed
```

  The pipeline can then be called like this to identify reproducible peaks:

```shell
peak \
    --ip_bams ip1.bam ip2.bam ip3.bam \
    --input_bams input1.bam input2.bam input3.bam \
    --peak_beds clip1.peak.clusters.bed clip2.peak.clusters.bed clip3.peak.clusters.bed \
    --species hg19
```

Note:

- The indentation of the command does not matter; you can write it on a single line.
- The order of the BAM and peak files passed to `--ip_bams`, `--input_bams`, and `--peak_beds` DOES matter. Pass the replicates in the same order for all three parameters.
- Three cutoffs (`--l2fc`, `--l10p`, and `--idr`) can be set to fine-tune peak filtering; see the Usage section for details.
- If the pipeline fails, check the log to identify the error and make the necessary changes. Re-running the pipeline skips the parts that completed successfully and continues with the failed and unprocessed parts.

## Output

The peak pipeline writes 5 types of files into the current working directory or into a user-specified output directory (via `--outdir`):

1. `*.bed`: a 6-column or 9-column BED file with peak information.
2. `*.tsv`: a tab-separated text file with additional information beyond the BED file.
3. `*.txt`: a text file with the mapped read counts.
4. `*.out`: a tab-separated text file generated by IDR.
5. `*.png`: a plot generated by IDR.

Output filenames are self-explanatory. Only the basename of each peak BED file (after removing `.peak.clusters.bed`) is used to name each replicate.

The reproducible peaks can be found in `*.reproducible.peaks.bed`, and additional information can be found in `*.reproducible.peaks.custom.tsv`. The former is a 6-column BED file; the latter is a tab-separated text file with the following columns, in order:

- IDR region (entire IDR identified reproducible region)
- Peak (reproducible peak region)
- Geomean of the l2fc
- Columns of log2 fold change (2 or 3 columns for a 2- or 3-replicate experiment, respectively)
- Columns of -log10 p-value (2 or 3 columns for a 2- or 3-replicate experiment, respectively)
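Because the three file lists must be passed in the same replicate order, it can help to assemble the command from per-replicate records rather than typing three parallel lists by hand. The following is a minimal sketch, not part of the peak package: `Replicate`, `replicate_name`, and `build_peak_command` are hypothetical helper names, and only flags shown in the `peak -h` output above are used. Returning the unchanged basename when the `.peak.clusters.bed` suffix is absent is an assumption, not documented pipeline behavior.

```python
import shlex
from pathlib import Path
from typing import List, NamedTuple, Optional, Sequence

PEAK_SUFFIX = ".peak.clusters.bed"


class Replicate(NamedTuple):
    """Files belonging to one eCLIP replicate (hypothetical helper type)."""
    ip_bam: str
    input_bam: str
    peak_bed: str


def replicate_name(peak_bed: str) -> str:
    """Basename of the peak BED file with '.peak.clusters.bed' removed.

    Assumption: if the suffix is absent, the basename is returned unchanged.
    """
    name = Path(peak_bed).name
    if name.endswith(PEAK_SUFFIX):
        return name[: -len(PEAK_SUFFIX)]
    return name


def build_peak_command(
    replicates: Sequence[Replicate],
    species: str,
    outdir: Optional[str] = None,
) -> List[str]:
    """Build a `peak` argument list with all three file lists in one order."""
    if len(replicates) not in (2, 3):
        raise ValueError("peak expects two or three replicates")
    cmd = ["peak", "--ip_bams", *(r.ip_bam for r in replicates)]
    cmd += ["--input_bams", *(r.input_bam for r in replicates)]
    cmd += ["--peak_beds", *(r.peak_bed for r in replicates)]
    cmd += ["--species", species]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


if __name__ == "__main__":
    reps = [
        Replicate("ip1.bam", "input1.bam", "clip1.peak.clusters.bed"),
        Replicate("ip2.bam", "input2.bam", "clip2.peak.clusters.bed"),
    ]
    command = build_peak_command(reps, species="hg19")
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in command))
    print([replicate_name(r.peak_bed) for r in reps])
```

The resulting list can be handed to `subprocess.run(command, check=True)` once peak and its dependencies are installed; appending `--dry_run` first prints the planned steps without running them.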


Requirements

| Name | Version |
| --- | --- |
| cmder | >=1.0 |
| pandas | >=1.2.3 |
| inflect | >=5.3.0 |
| seqflow | >=0.0.3 |
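To confirm that an environment satisfies the minimum versions listed above, the installed versions can be compared against them. This is a rough sketch using only the standard library (`importlib.metadata`, which requires Python 3.8+). The helper names are hypothetical, and the version comparison is deliberately simplified: it looks only at leading numeric components and does not implement full PEP 440 ordering.

```python
import re
from importlib import metadata  # standard library, Python 3.8+
from typing import Dict, Tuple

# Minimum versions copied from the requirements listed above.
MINIMUMS = {
    "cmder": "1.0",
    "pandas": "1.2.3",
    "inflect": "5.3.0",
    "seqflow": "0.0.3",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Leading digits of each dot-separated piece, e.g. '1.2.3' -> (1, 2, 3).

    Parsing stops at the first piece that does not start with a digit.
    Simplification: pre-release and local version tags are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, padding with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

In practice `pip install eclip-peak` resolves these dependencies automatically; a check like this is mainly useful when diagnosing an existing environment.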


Installation


Install the eclip-peak-1.0.9 whl package:

    pip install eclip-peak-1.0.9.whl


Install the eclip-peak-1.0.9 tar.gz package:

    pip install eclip-peak-1.0.9.tar.gz