معرفی شرکت ها


bambu-qsar-0.0.9


Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

bambu (bioassays model builder) is CLI tool to build QSAR models based on PubChem BioAssays datasets
ویژگی مقدار
سیستم عامل -
نام فایل bambu-qsar-0.0.9
نام bambu-qsar
نسخه کتابخانه 0.0.9
نگهدارنده []
ایمیل نگهدارنده []
نویسنده Isadora Leitzke Guidotti, Frederico Schmitt Kremer
ایمیل نویسنده leitzke.gi@gmail.com, fred.s.kremer@gmail.com
آدرس صفحه اصلی -
آدرس اینترنتی https://pypi.org/project/bambu-qsar/
مجوز -
# Bambu ![](https://img.shields.io/docker/pulls/omixlab/bambu-qsar) ![](https://img.shields.io/pypi/dm/bambu-qsar) ![](media/logo.png) Bambu (*BioAssays Model Builder*), is a simple tool to generate QSAR models based on [PubChem BioAssays](https://pubchem.ncbi.nlm.nih.gov/) datasets. It relies on [RDKit](https://rdkit.org/) and the [FLAML AutoML framework](https://github.com/microsoft/FLAML) and provides utilitaries for downloading and preprocessing datasets, as well training and running the predictive models. ## Try it! [Try Bambu on this Google Colab Notebook ^^](https://colab.research.google.com/github/omixlab/bambu-v2/blob/main/notebooks/Bambu%20Google%20Colab%20Tutorial.ipynb) ## Installing ### Installing as a PyPI package using `pip`: ``` $ pip install bambu-qsar ``` **Note:** RDKit must be installed separately. ### Intalling as an environment using `conda` on Linux: ``` $ git clone https://github.com/omixlab/bambu-v2 $ cd bambu-qsar $ make setup PLATFORM=linux $ conda activate bambu-qsar ``` ### Running it with Docker ``` $ docker run -ti omixlab/bambu-qsar:latest ``` ## Downloading a PubChem BioAssays data Downloads a PubChem BioAssays data and save in a CSV file, containing the InchI representation and the label indicating molecules that were found to be active or inactive against a given target. ``` $ bambu-download \ --pubchem-assay-id 29 \ --pubchem-InchI-chunksize 100 \ --output 29_raw.csv ``` The generated output contains the columns `pubchem_molecule_id` (Substance ID or Compound ID, depending on the option selected during download), `InChI` and `activity`. Only the fields `InchI` and `activity` are used in futher steps. |pubchem_molecule_id|pubchem_molecule_type|InChI |activity| |-------------------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------| |596 |compounds |InChI=1S/C9H13N3O5/c10-5-1-2-12(9(16)11-5)8-7(15)6(14)4(3-13)17-8/h1-2,4,6-8,13-15H,3H2,(H2,10,11,16) |active | |1821 |compounds |InChI=1S/C9H11FN2O6/c10-3-1-12(9(17)11-7(3)16)8-6(15)5(14)4(2-13)18-8/h1,4-6,8,13-15H,2H2,(H,11,16,17) |active | |2019 |compounds |InChI=1S/C62H86N12O16/c1-27(2)42-59(84)73-23-17-19-36(73)57(82)69(13)25-38(75)71(15)48(29(5)6)61(86)88-33(11)44(55(80)65-42)67-53(78)35-22-21-31(9)51-46(35)64-47-40(41(63)50(77)32(10)52(47)90-51)54(79)68-45-34(12)89-62(87)49(30(7)8)72(16)39(76)26-70(14)58(83)37-20-18-24-74(37)60(85)43(28(3)4)66-56(45)81/h21-22,27-30,33-34,36-37,42-45,48-49H,17-20,23-26,63H2,1-16H3,(H,65,80)(H,66,81)(H,67,78)(H,68,79)|active | |2082 |compounds |InChI=1S/C12H15N3O2S/c1-3-6-18-8-4-5-9-10(7-8)14-11(13-9)15-12(16)17-2/h4-5,7H,3,6H2,1-2H3,(H2,13,14,15,16) |active | |2569 |compounds |InChI=1S/C15H19N3O5/c1-8-11(17-3-4-17)14(20)10(9(22-2)7-23-15(16)21)12(13(8)19)18-5-6-18/h9H,3-7H2,1-2H3,(H2,16,21) |active | |2674 |compounds |InChI=1S/C29H26O10/c1-10(30)5-12-18-19-13(6-11(2)31)29(37-4)27(35)21-15(33)8-17-23(25(19)21)22-16(38-9-39-17)7-14(32)20(24(18)22)26(34)28(12)36-3/h7-8,10-11,30-31,34-35H,5-6,9H2,1-4H3 |active | |2693 |compounds |InChI=1S/C31H30N6O6S4/c1-33-25(42)30(15-38)34(2)23(40)28(33,44-46-30)12-17-13-36(21-11-7-4-8-18(17)21)27-14-29-24(41)35(3)31(16-39,47-45-29)26(43)37(29)22(27)32-20-10-6-5-9-19(20)27/h4-11,13,22,32,38-39H,12,14-16H2,1-3H3 |active | ## Computing descriptors or fingerprints Computes molecule descriptors or Morgan fingerprints for a given datasets produced by `bambu-download` (or following the same format). The output also contains a `train` and `test` subsets, whose sizes are defined based on the `--train-test-split-percent` argument. The argument `--resample` might be used to perform a random undersampling in the dataset, as most HTS datasets are heavily umbalanced. The path passed to `--output` is used as template to generate the `train` and `test` file. In this case, `29_preprocess_train.csv` and `29_preprocess_test.csv` respectively. ``` $ bambu-preprocess \ --input 29_raw.csv \ --output 29_preprocess.csv \ --output-preprocessor 29_preprocessor.pickle \ --feature-type morgan-2048 \ --train-test-split 0.75 \ --undersample ``` ## Train Trains a classification model using the FLAML AutoML framework based on the `bambu-preprocess` output datasets. The user may adjust most of the `flaml.automl.AutoML` parameters using the command line arguments. In this case we are using an Extra Trees Classifier. ``` $ bambu-train \ --input-train 29_preprocess_train.csv \ --output 29_model.pickle \ --time-budget 3600 \ --estimators extra_tree ``` A list of all available estimators can be accessed using the command `bambu-train --list-estimators`. Currently, only `rf` (*Random Forest*) and `extra_tree` are available. ## Validation An y-randomization validation can be performed using the command `bambu-validate`, which will compute accuracy, recall, precision, f1-score and ROC AUC training with the original training dataset and by validating with the test one, and furtherly randomizing the training labels and several times (`--randomizations`). For each randomization, classification metrics are computed again and significancy value (p-value) is computed based on the z-score-normalized metrics. ``` $ bambu-validate \ --input-train 29_preprocess_train.csv \ --input-test 29_preprocess_test.csv \ --model 29_model.pickle \ --output 29_model.validation.json \ --randomizations 100 ``` ## Predict Receives an inputs, preprocess it using a preprocess object (generated using `bambu-preprocess`) and then runs a classification model (generated using `bambu-train`). Results are saved in a CSV file. ``` $ bambu-predict \ --input pubchem_compounds.sdf \ --preprocessor 29_preprocessor.pickle \ --model 29_model.pickle \ --output 29_predictions.csv ``` # Contact Feel free to open issues or pull requests! You may also contact us by email. Dr. Frederico Schmitt Kremer, PhD. E-mail: [fred.s.kremer@gmail.com](fred.s.kremer@gmail.com).


نیازمندی

مقدار نام
- pandas
- numpy
- scikit-learn
- flaml
- imbalanced-learn
- requests
- tqdm
- lightgbm
- retrying
==3.8.3 gensim


نحوه نصب


نصب پکیج whl bambu-qsar-0.0.9:

    pip install bambu-qsar-0.0.9.whl


نصب پکیج tar.gz bambu-qsar-0.0.9:

    pip install bambu-qsar-0.0.9.tar.gz