CMash-0.3.0



Description

Fast and accurate set similarity estimation via containment min hash (for genomic datasets).

| Feature | Value |
| --- | --- |
| Operating system | - |
| File name | CMash-0.3.0 |
| Package name | CMash |
| Version | 0.3.0 |
| Maintainer | [] |
| Maintainer email | [] |
| Author | David Koslicki |
| Author email | dmkoslicki@gmail.com |
| Homepage | https://github.com/dkoslicki/CMash |
| PyPI URL | https://pypi.org/project/CMash/ |
| License | - |

# CMash

CMash is a fast and accurate way to estimate the similarity of two sets. This is a probabilistic data analysis approach that uses containment min hashing. Please see the [associated paper](http://www.biorxiv.org/content/early/2017/09/04/184150) for further details (and please cite it if you use it):

> Improving Min Hash via the Containment Index with applications to Metagenomic Analysis
> David Koslicki, Hooman Zabeti
> bioRxiv 184150; doi: https://doi.org/10.1101/184150

## Installation

The easiest way to install this is to use [virtualenv](https://virtualenv.pypa.io/en/stable/):

```bash
virtualenv CMashVE
source CMashVE/bin/activate
pip install -U pip
pip install CMash
```

You can also just use ``pip install CMash`` if you don't want to create a virtual environment. To get the absolute latest version of CMash, you can build from the GitHub repository via:

```bash
virtualenv CMashVE
source CMashVE/bin/activate
pip install -U pip
git clone https://github.com/dkoslicki/CMash.git
cd CMash
pip install -r requirements.txt
```

Note that the python code in this repository is python2 and python3 compatible, but the dependency ``khmer`` technically requires python3 (though ``khmer`` version ``2.1.1`` runs just fine in python2). The external dependency ``pytst`` requires python2, so I'm making this a python2 repository.

## Usage

The basic paradigm is to create a reference/training database, form a sample bloom filter, and then query the database.

#### Forming a reference/training database

Say you have three reference fasta/q files: ``ref1.fa``, ``ref2.fa`` and ``ref3.fa``. In a file (here called ``FileNames.txt``), place the absolute paths pointing to the fasta/q files:

```bash
cat FileNames.txt
# /abs/path/to/ref1.fa
# /abs/path/to/ref2.fa
# /abs/path/to/ref3.fa
```

Then you can create the training database via:

```bash
MakeDNADatabase.py FileNames.txt TrainingDatabase.h5
```

See ``MakeDNADatabase.py -h`` for more options when forming a database.

#### Creating a sample bloom filter

Given a (large) query fasta/q file ``Metagenome.fa``, you can *optionally* create a bloom filter via ``MakeNodeGraph.py Metagenome.fa .``. See ``MakeNodeGraph.py -h`` for more details about this function.

This step is not strictly necessary (as the next step automatically forms a nodegraph/bloom filter if you didn't already create one). However, I've provided this script in case you want to pre-process a bunch of metagenomes.

#### Query the database

To get containment and Jaccard index estimates of the reference files in your query file ``Metagenome.fa``, use something like the following: ``QueryDNADatabase.py Metagenome.fa TrainingDatabase.h5 Output.csv``. There are a bunch of options available: ``QueryDNADatabase.py -h``. The output file is a CSV file with rows corresponding (in this case) to ``ref1.fa``, ``ref2.fa``, and ``ref3.fa`` and columns corresponding to the containment index estimate, intersection cardinality, and Jaccard index estimate.
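Since ``pandas`` is already one of CMash's dependencies, the resulting ``Output.csv`` can be inspected directly in Python. The snippet below is only a minimal sketch and is not part of the CMash distribution; the exact column header strings are an assumption (the README states only which quantities the columns hold), so check them against your own output file.

```python
# Minimal sketch: load a QueryDNADatabase.py result and rank references by containment.
# The column names are NOT guaranteed -- inspect results.columns against your Output.csv.
import pandas as pd

results = pd.read_csv("Output.csv", index_col=0)   # rows: ref1.fa, ref2.fa, ref3.fa
print(results.columns.tolist())                    # see what the columns are actually called

containment_col = results.columns[0]               # assumption: first column is the containment estimate
top_hits = results.sort_values(containment_col, ascending=False)
print(top_hits.head())
```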

#### Other functionality

The module ``MinHash`` (imported in python via ``from CMash import MinHash as MH``) has a bunch more functionality, including (but not limited to!):

1. Fast updates to the training databases (via ``help(MH.delete_from_database)``, ``help(MH.insert_to_database)``, ``help(MH.union_databases)``).
2. The ability to form a matrix of Jaccard indexes (for comparison of all pairwise Jaccard indexes of organisms in the training database). This is useful for identifying redundancies/patterns/structure in your training database: ``help(MH.form_jaccard_count_matrix)`` and ``help(MH.form_jaccard_matrix)``.
3. Access to the k-mers that MinHash randomly selected (see the class ``CountEstimator`` and the associated ``_kmers`` data structure).

I'd encourage you to poke through the source code of ``MinHash.py`` and take a look at the scripts as well.

Protein databases (and, for that matter, arbitrary K-length strings) are coming soon...
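For interactive exploration of that extra functionality, the documented entry points can be pulled up from a Python session. The sketch below assumes nothing beyond the import path and the names quoted in the README above; the ``help()`` calls simply print each object's docstring.

```python
# Sketch: browse the MinHash module's documented entry points.
# Only the import path and the names mentioned in the README are assumed here.
from CMash import MinHash as MH

# Fast updates to a training database
help(MH.insert_to_database)
help(MH.delete_from_database)
help(MH.union_databases)

# Pairwise Jaccard index matrices
help(MH.form_jaccard_matrix)
help(MH.form_jaccard_count_matrix)

# The sketch objects themselves (including the _kmers data structure)
help(MH.CountEstimator)
```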


Requirements

| Name | Version |
| --- | --- |
| khmer | >=2.1.1 |
| screed | - |
| h5py | - |
| numpy | - |
| blist | - |
| argparse | - |
| pandas | - |
| setuptools | >=24.2.0 |
| six | - |
| scipy | - |
| matplotlib | - |
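The pins above can be checked against the current environment before installing from source. The snippet below is a generic sketch built on setuptools' ``pkg_resources``; it is not a CMash utility, and the requirement strings are simply copied from the table above.

```python
# Sketch: verify the version constraints from the requirements table.
# Uses setuptools' pkg_resources; not part of CMash itself.
import pkg_resources

requirements = [
    "khmer>=2.1.1", "screed", "h5py", "numpy", "blist",
    "argparse", "pandas", "setuptools>=24.2.0", "six",
    "scipy", "matplotlib",
]

for req in requirements:
    try:
        dist = pkg_resources.require(req)[0]
        print("OK   %s %s" % (dist.project_name, dist.version))
    except Exception as exc:  # DistributionNotFound or VersionConflict
        print("FAIL %s: %s" % (req, exc))
```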


How to install


Install the CMash-0.3.0 whl package:

    pip install CMash-0.3.0.whl


Install the CMash-0.3.0 tar.gz package:

    pip install CMash-0.3.0.tar.gz
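
Either install route can be sanity-checked with a quick import. The import path below is the one used in the README above; looking up the installed version through ``pkg_resources`` is just a convenience, not a CMash feature.

```python
# Quick post-install check (the import path is the one the README uses).
import pkg_resources
from CMash import MinHash as MH

print("CMash version: " + pkg_resources.get_distribution("CMash").version)  # expect 0.3.0
print("MinHash loaded from: " + MH.__file__)
```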