معرفی شرکت ها


fitseq-1.5.0


Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

A utility for fitting lineage fitnesses within a pooled competition experiment, achieved by iteratively optimizing models of individual and population-average lineage fitness.
ویژگی مقدار
سیستم عامل -
نام فایل fitseq-1.5.0
نام fitseq
نسخه کتابخانه 1.5.0
نگهدارنده []
ایمیل نگهدارنده []
نویسنده Fangfei Li, Darach Miller
ایمیل نویسنده -
آدرس صفحه اصلی https://github.com/darachm/fitseq
آدرس اینترنتی https://pypi.org/project/fitseq/
مجوز MIT
[![Python 3.9](https://img.shields.io/badge/python-3.9-green.svg)](https://www.python.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT) # fitseq Accurate pooled competition assays requires accounting for the changing population-average fitness in order toe best estimate the fitness of lineages within the pool. This utility does both by iterating between optimizations of per-lineage fitness given the average, and calculating the new average fitness. The core concept and architecture is written by FangFei Li (@FangFei05). This fork is just tweaking the interface and metrics for my personal use, but if anyone else wants to use it I can offer limited help. On a recent datasets of five timepoints for ~3.5 million lineages, `fitseq` was finished within 4.5 hours (wall), using 20 cores and 4GB of RAM. If you use this software, please reference: [F. Li, et al. Unbiased Fitness Estimation of Pooled Barcode or Amplicon Sequencing Studies. Cell Systems, 7: 521-525 (2018)](https://doi.org/10.1016/j.cels.2018.09.004) # Installation ## With pip You can install from this git repo directly as: python3 -m pip install git+https://github.com/darachm/fitseq.git <!-- or from PyPi with: WELL NOT QUITE YET BUT SOON python3 -m pip install fitseq --> Install the latest development branch with something like: python3 -m pip install git+https://github.com/darachm/fitseq.git@dev Test installation with: fitseq.py -h ## Or don't install, use a container ### Docker This repository has a `Dockerfile` to build off the main branch of this repo, like so docker build -t darachm/fitseq ./ Or you should be able to pull it off of Dockerhub like: docker run darachm/fitseq:latest fitseq.py -h You can then use that, with a Docker installation like so: docker run \ --mount type=bind,source=$(pwd)/testing,target=/testing \ darachm/fitseq \ fitseq.py \ -i testing/data/ppiseq_test_counts_1000.csv \ -p 8 -t 0 1 2 3 4 \ -m 20 --min-step 0.001 \ --output-mean-fitness testing/output/test_means.csv \ -o testing/output/test_out.csv Note that you need to `--mount` the directory with the inputs in a folder, and use that in the command. Directories on containers... yeah. I think Singularity is more intuitive/accessible for most folks... ### Singularity On a multi-user HPC? Want to get an achive-ready monolithic stable container? [Singularity](https://sylabs.io/guides/3.8/user-guide/quick_start.html#quick-installation-steps) is a container system for scientific multi-user HPC computing and archiving. You can build your own container from the Singularity file in this repo using a command like: sudo singularity build fitseq.sif Singularity.fitseq ( This is just turning the docker image into a Singularity image. Just so you know. ) Then put it where you need it, and run with something like: singularity exec fitseq.sif \ fitseq.py \ -i testing/data/ppiseq_test_counts_1000.csv \ -p 8 -t 0 1 2 3 4 \ -m 20 --min-step 0.001 \ --output-mean-fitness testing/output/test_means.csv \ -o testing/output/test_out.csv <!-- or, download from the github container registry like so TODO gotta figure this out --> # Usage The `fitseq.py` script functions to estimate fitnesses of lineages in a pool. There is also a script `evo_simulator.py` that can perform simulations of competitive pooled growth of lineages, in order to generate synthetic data for benchmarking. ## `fitseq.py` - estimate fitnesses from counts data This tool expects a comma-separated table (CSV) of your best estimate of lineage counts of the lineage, with one column per timepoint. Each lineage is a row, and outputs are in the same order as the input. For an example using data distributed in this repo, try: python3 fitseq.py \ --input testing/data/ppiseq_test_counts_1000.csv \ --processes 8 \ --t-seq 0 1 2 3 4 \ --min-iter 10 \ --max-iter-num 100 \ --min-step 0.001 \ --output-mean-fitness test_output_means.csv \ -o test_output.csv This reads an input file at `testing/data/ppiseq_test_counts_1000.csv`, and uses 8 processes. It assumes each sample is 1 "generation" of growth. It does a mandatory 10 iterations of burn-in to stabilize the estimates, then proceeds until the sum of negative log likelihood doesn't improve by at least 0.1% over the previous step, at 100 iterations max. Then it writes the mean fitness values to that CSV, and the rest to `test_output.csv`. ### File formats #### Input file format This tool expects a comma-separated table (CSV) of your best estimate of lineage counts of the lineage, with one column per timepoint. Each lineage is a row, and outputs are in the same order as the input. Something like: 21,7,2,0,0 35,71,34,38,12 5,9,1,0,0 3,8,4,3,1 12,10,11,1,0 #### Output file format There are two outputs generated, the first is the per-lineage (per input row) fit parameters, an estimate of error of the optimization process[^eerror], and the model-projected psuedo-count expectations for each timepoint. For example, *but rounded to 3 decimal places for tidy-ness*: Estimated_Fitness,Estimation_Error,Likelihood_Log,Estimated_Read_Number_t0,Estimated_Read_Number_t1,Estimated_Read_Number_t2,Estimated_Read_Number_t3,Estimated_Read_Number_t4 -1.529,0.517,4.998,21.0,6.608,2.148,0.298,0.004 -0.274,0.189,21.556,35.0,38.661,76.501,17.801,5.471 -0.728,0.316,10.277,5.0,3.504,6.152,0.332,0.009 -0.194,0.252,10.379,3.0,3.588,9.333,2.267,0.467 -0.596,0.214,7.849,12.0,9.602,7.805,4.172,0.104 -2.942,1.591,2.233,5.0,0.383,0.007,0.003,0.000 The headers are a bit long, I suppose. But they're informative...? [^eerror]: The "Estimation_Error" is probably not what you're expecting. It's the second derivative of the change in sum-negative-log-likelihood of this lineage's optimization. So that might not be what you're wanting to use to filter lineages. You can however use the "Estimated" counts to calculate an R^2, have fun. There is also the optional (and strongly suggested!) output file of the mean fitness per timepoint, given as a CSV format with headers, such as: Samples,Estimate_Mean_Fitness 0,0.0 1,0.3833658804427269 2,0.9907541206263276 3,1.0394387021430962 4,1.0542176653660411 ### Options, from `fitseq.py -h` #### input * `-i` INPUT, `--input` INPUT The path to a header`-less` CSV file, where each column contains the count of each lineage (each row is a lineage) at that sample/timepoint. REQUIRED * `--t-seq` [T_SEQ [T_SEQ ...]], `-t` [T_SEQ [T_SEQ ...]] The estimated "generations" of growth elapse at each sampled timepoint. This is useful for scaling the fitness or using unevenly spaced timepoints. REQUIRED #### output * `-o` OUTPUT, `--output` OUTPUT The path (default STDOUT) from which to output the fitnesses and errors and likelihoods and estimated reads. CSV format. (default: STDOUT, so that you can pipe it into other tools) * `--output-mean-fitness` OUTPUT_MEAN_FITNESS, `-om` OUTPUT_MEAN_FITNESS The path (default None) to which to write the mean fitnessescalculated per sample. #### parallelism * `-p` PROCESSES, `--processes` PROCESSES Number of processes to launch with multiprocessing * `--max-chunk-size` MAX_CHUNK_SIZE The max chunksize for parallelism, automatically set to a roughly even split of lineages per chunk. Tune if you want to. #### optimization stopping control * `--min-iter` MIN_ITER Force FitSeq to run at least this many iterations in the optimization (default: 10) * `--max-iter-num` MAX_ITER_NUM, `-m` MAX_ITER_NUM Maximum number of iterations in the optimization (of optimizing population average fitness) (default: 100) * `--minimum-step-size` MINIMUM_STEP_SIZE, `--min-step` MINIMUM_STEP_SIZE Set a minimum fracitonal step size for improvement, if below this then the optimization iterations terminate. (default: 0.0001) #### tuning details * `--fitness-type` {m,w}, `-f` {m,w} SORRY but **Wrightian fitness does not yet work in this version**, so just don't set the `--fitness_type`, or set to `m`. Sorry. Maybe we'll re-implement Wrightian fitness (w). Maybe. * `-k` KAPPA, `--kappa` KAPPA a noise parameter that characterizes the total noise introduced. For estimation, see doi:10.1038/nature14279 (default: 2.5) * `--gtol` GTOL The gradient tolerance parameter for the BFGS opitmization, default (from SciPy) is 1e-5 * `-g` REGRESSION_NUM, `--regression-num` REGRESSION_NUM number of points used in the initial linear`-regression-based` fitness estimate (default: 2) <!-- A walk-through of an old version is included as a jupyter notebook in a previous version of the software [here](https://github.com/FangfeiLi05/PyFitSeq/blob/master/PyFitSeq_Walk_Through.ipynb). --> ## Evolution Simulator `evo_simulator.py` simulates competitive pooled growth of lineages. This simulation includes sampling noise from growth, cell transfers (bottlenecks), DNA extraction, PCR, and sequencing. For example: python evo_simulator.py -i input_EvoSimulation.csv \ -t 0 3 6 9 12 -r 50 50 50 50 50 \ -o output python evo_simulator.py -i input_EvoSimulation.csv \ -t 0 2 4 6 8 -r 75 75 75 75 50 \ -n DNA_extraction PCR sequencing -d 300 -p 27 -f w \ -o output ### Options * `--input` or `-i`: a .csv file, with + 1st column of .csv: fitness of each genotype, [x1, x2, ...] + 2nd column .csv: initial cell number of each genotype at generation 0, [n1, n2, ...] * `--t_seq` or `-t`: time-points evaluated in number of generations (format: 0 t1 t2 ...) * `--read_num_average_seq` or `-r`: average number of reads per genotype for each time-point (format: 0 r1 r2 ...) * `--noise_option` or `-n`: which types of noise to include in the simulation, default is all sources of noise (default: growth bottleneck_transfer DNA_extraction PCR sequencing) * `--dna_copies` or `-d`: average genome copy number per genotype used as template in PCR (default: 500) * `--pcr_cycles` or `-p`: number of cycles of PCR (default: 25) * `--fitness_type` or `-f`: type of fitness: Wrightian fitness (w), or Malthusian fitness (m)' (default: m) * `--output_filename` or `-o`: prefix of output .csv files (default: output), this tool *AUTOMATICALLY* generates files named, for a `-o` option of `output`: + `output_filename_EvoSimulation_Read_Number.csv`: read number per genotype for each time-point + `output_filename_EvoSimulation_Mean_Fitness.csv`: mean fitness for each time-point + `output_filename_EvoSimulation_Input_Log.csv`: a record of all inputs See `python evo_simulator.py --help` for a reminder... # History of version of this software. 1. PyFitSeq is a Python-based fitness estimation tool for pooled amplicon sequencing studies. The conceptual/math work and first implementation is described in the paper [Unbiased Fitness Estimation of Pooled Barcode or Amplicon Sequencing Studies](https://doi.org/10.1016/j.cels.2018.09.004), and this code is [available here](https://github.com/sashaflevy/Fit-Seq). 2. This was rewritten in python, [available here](https://github.com/FangfeiLi05/PyFitSeq) and is a python-translated version of the MATLAB tool FitSeq above. 3. This repo is a fork of that python version to fix some bugs and tweak the speed, flexibility, and interface. Also, changed the name to `fitseq` to reflect that it's the current development version. **Wrightian fitness does not yet work in this version**. Sorry. # Run from the base repo directory, so bash testing/test_singularity.sh # After building the image of course


نیازمندی

مقدار نام
>=1.17.3 numpy
>=0.25.3 pandas
>=1.3.1 scipy
- tqdm


زبان مورد نیاز

مقدار نام
>=3.6 Python


نحوه نصب


نصب پکیج whl fitseq-1.5.0:

    pip install fitseq-1.5.0.whl


نصب پکیج tar.gz fitseq-1.5.0:

    pip install fitseq-1.5.0.tar.gz