<p align="left">
<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="ATM" />
<i>An open source project from Data to AI Lab at MIT.</i>
</p>
# ATM - Auto Tune Models
[CircleCI](https://circleci.com/gh/HDI-Project/ATM)
[Travis CI](https://travis-ci.org/HDI-Project/ATM)
[PyPI](https://pypi.python.org/pypi/atm)
[Codecov](https://codecov.io/gh/HDI-project/ATM)
[Downloads](https://pepy.tech/project/atm)
- License: MIT
- Documentation: https://HDI-Project.github.io/ATM/
- Homepage: https://github.com/HDI-Project/ATM
# Overview
Auto Tune Models (ATM) is an AutoML system designed with ease of use in mind. In short, you give
ATM a classification problem and a dataset as a CSV file, and ATM will try to build the best model
it can. ATM is based on a [paper](https://dai.lids.mit.edu/wp-content/uploads/2018/02/atm_IEEE_BIgData-9-1.pdf)
of the same name, and the project is part of the [Human-Data Interaction (HDI) Project](https://hdi-dai.lids.mit.edu/) at MIT.
# Install
## Requirements
**ATM** has been developed and tested on [Python 2.7, 3.5, and 3.6](https://www.python.org/downloads/).
Also, although it is not strictly required, using a
[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended to avoid
interfering with other software installed on the system where **ATM** is run.
These are the minimum commands needed to create a virtualenv using python3.6 for **ATM**:
```bash
pip install virtualenv
virtualenv -p $(which python3.6) atm-venv
```
Afterwards, execute this command to activate the virtualenv:
```bash
source atm-venv/bin/activate
```
Remember to execute it every time you start a new console to work on **ATM**!
## Install with pip
After creating the virtualenv and activating it, we recommend using
[pip](https://pip.pypa.io/en/stable/) to install **ATM**:
```bash
pip install atm
```
This will pull and install the latest stable release from [PyPI](https://pypi.org/).
## Install from source
Alternatively, with your virtualenv activated, you can clone the repository and install it from
source by running `make install` on the `stable` branch:
```bash
git clone git@github.com:HDI-Project/ATM.git
cd ATM
git checkout stable
make install
```
## Install for Development
If you want to contribute to the project, a few more steps are required to make the project ready
for development.
First, please head to [the GitHub page of the project](https://github.com/HDI-Project/ATM)
and make a fork of the project under your own username by clicking on the **fork** button in the
upper right corner of the page.
Afterwards, clone your fork and create a branch from master with a descriptive name that includes
the number of the issue that you are going to work on:
```bash
git clone git@github.com:{your username}/ATM.git
cd ATM
git branch issue-xx-cool-new-feature master
git checkout issue-xx-cool-new-feature
```
Finally, install the project with the following command, which will install some additional
dependencies for code linting and testing.
```bash
make install-develop
```
Make sure to run `make lint` and `make test` regularly while developing.
# Data Format
ATM input is always a CSV file with the following characteristics:
* It uses a single comma, `,`, as the separator.
* Its first row is a header that contains the names of the columns.
* There is a column that contains the target variable that will need to be predicted.
* The rest of the columns are all variables or features that will be used to predict the target column.
* Each row corresponds to a single, complete, training sample.
Here are the first five lines of a valid CSV with 4 features and one target column called `class`:
```
feature_01,feature_02,feature_03,feature_04,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
```
This CSV can be passed to ATM as a local filesystem path, as a URL, or as a complete AWS S3
bucket and path specification.
You can find a collection of demo datasets in the [atm-data S3 Bucket in AWS](https://atm-data.s3.amazonaws.com/index.html).
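As a quick sanity check, you can verify that a CSV satisfies these requirements using
[pandas](https://pandas.pydata.org/). A minimal sketch, assuming the example above has been
saved as `data.csv` (a hypothetical filename) with `class` as its target column:
```python
import pandas as pd

# pandas expects a comma separator and a header row by default.
data = pd.read_csv('data.csv')

# The target column must be present, and every row must be complete.
assert 'class' in data.columns, 'target column is missing'
assert not data.isnull().values.any(), 'found incomplete rows'

print(data.shape)  # (number of samples, number of features + 1)
```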
# Quickstart
In this short tutorial we will guide you through a series of steps that will help you get
started with **ATM** by exploring its Python API.
## 1. Get the demo data
The first step in order to run **ATM** is to obtain the demo dataset that will be used during
the rest of the tutorial.
For this demo we will be using the pollution CSV from the atm-data bucket, which you can download with your browser
[from here](https://atm-data.s3.amazonaws.com/pollution_1.csv), or using the following command:
```bash
atm download_demo pollution_1.csv
```
## 2. Create an ATM instance
The first thing to do after obtaining the demo dataset is to create an ATM instance.
```python
from atm import ATM
atm = ATM()
```
By default, if the ATM instance is created without any arguments, it will create an SQLite database
called `atm.db` in your current working directory.
If you want to connect to a SQL database instead, or change the location of your SQLite database,
please check the [API Reference](https://hdi-project.github.io/ATM/api/atm.core.html)
for the complete list of available options.
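As an illustration, a MySQL connection might be configured as shown below. This is a minimal
sketch only: the argument names (`dialect`, `database`, `username`, `password`, `host`, `port`)
are assumptions based on typical SQLAlchemy-style configuration, so please verify them against
the API Reference before relying on them:
```python
from atm import ATM

# Hypothetical connection arguments; confirm the exact names in the API Reference.
atm = ATM(
    dialect='mysql',
    database='atm',
    username='atm_user',
    password='atm_password',
    host='localhost',
    port=3306,
)
```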
## 3. Search for the best model
Once you have the **ATM** instance ready, you can use the method `atm.run` to start
searching for the model that best predicts the target column of your CSV file.
This method has to be given the path to your CSV file, which can be a local filesystem path or
a URL to an HTTP or S3 resource.
For example, if we have previously downloaded the [pollution_1.csv](https://atm-data.s3.amazonaws.com/pollution_1.csv)
file inside our current working directory, we can call `run` like this:
```python
results = atm.run(train_path='pollution_1.csv')
```
Alternatively, we can use the HTTPS URL of the file to have ATM download the CSV for us:
```python
results = atm.run(train_path='https://atm-data.s3.amazonaws.com/pollution_1.csv')
```
As the last option, if we have the file inside an S3 bucket, we can download it by passing a URI
in the `s3://{bucket}/{key}` format:
```python
results = atm.run(train_path='s3://atm-data/pollution_1.csv')
```
In order to make this work with a private S3 bucket, please make sure you have configured your
[AWS credentials file](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html),
or that you have created your `ATM` instance passing it the `access_key` and `secret_key` arguments.
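For instance, the credentials can be passed directly when creating the instance. The placeholder
values below are hypothetical; substitute your own keys and bucket:
```python
from atm import ATM

# Hypothetical placeholder credentials; use your own AWS keys.
atm = ATM(
    access_key='YOUR_AWS_ACCESS_KEY',
    secret_key='YOUR_AWS_SECRET_KEY',
)
results = atm.run(train_path='s3://your-private-bucket/your-data.csv')
```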
This `run` call will start what is called a `Datarun`, and a progress bar will be displayed
while the different models are tested and tuned.
```python
Processing dataset demos/pollution_1.csv
100%|##########################| 100/100 [00:10<00:00, 6.09it/s]
```
Once this process has ended, a message will indicate that the `Datarun` has finished. Then we can
explore the `results` object.
## 4. Explore the results
Once the Datarun has finished, we can explore the `results` object in several ways:
**a. Get a summary of the Datarun**
The `describe` method will return a summary of the Datarun execution:
```python
results.describe()
```
This will print a short description of the Datarun, similar to the following:
```python
Datarun 1 summary:
Dataset: 'demos/pollution_1.csv'
Column Name: 'class'
Judgment Metric: 'f1'
Classifiers Tested: 100
Elapsed Time: 0:00:07.638668
```
**b. Get a summary of the best classifier**
The `get_best_classifier` method will print information about the best classifier that was found
during this Datarun, including the method used and the best hyperparameters found:
```python
results.get_best_classifier()
```
The output will be similar to this:
```python
Classifier id: 94
Classifier type: knn
Params chosen:
n_neighbors: 13
leaf_size: 38
weights: uniform
algorithm: kd_tree
metric: manhattan
_scale: True
Cross Validation Score: 0.858 +- 0.096
Test Score: 0.714
```
**c. Explore the scores**
The `get_scores` method will return a `pandas.DataFrame` with information about all the
classifiers tested during the Datarun, including their cross validation scores and
the location of their pickled models.
```python
scores = results.get_scores()
```
The contents of the scores dataframe should be similar to this:
```python
cv_judgment_metric cv_judgment_metric_stdev id test_judgment_metric rank
0 0.8584126984 0.0960095737 94 0.7142857143 1.0
1 0.8222222222 0.0623609564 12 0.6250000000 2.0
2 0.8147619048 0.1117618135 64 0.8750000000 3.0
3 0.8139393939 0.0588721670 68 0.6086956522 4.0
4 0.8067754468 0.0875180564 50 0.6250000000 5.0
...
```
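Since `get_scores` returns a regular `pandas.DataFrame`, it can be sorted and sliced like any
other dataframe. A minimal sketch using standard pandas operations:
```python
# Show the 10 classifiers with the highest cross validation scores.
top10 = scores.sort_values('cv_judgment_metric', ascending=False).head(10)
print(top10[['id', 'cv_judgment_metric', 'test_judgment_metric', 'rank']])
```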
## 5. Make predictions
Once we have found and explored the best classifier, we will want to make predictions with it.
In order to do this, we need to follow several steps:
**a. Export the best classifier**
The `export_best_classifier` method can be used to serialize the best classifier model with
pickle and save it to the desired location:
```python
results.export_best_classifier('path/to/model.pkl')
```
If the classifier has been saved correctly, a message will be printed indicating so:
```python
Classifier 94 saved as path/to/model.pkl
```
If the path that you provide already exists, you can overwrite it by adding the argument
`force=True`.
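For example:
```python
# Overwrite the previously exported model at the same path.
results.export_best_classifier('path/to/model.pkl', force=True)
```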
**b. Load the exported model**
Once it is exported, you can load it back by calling the `load` method of the `atm.Model`
class and passing it the path where the model has been saved:
```python
from atm import Model
model = Model.load('path/to/model.pkl')
```
Once you have loaded your model, you can pass new data to its `predict` method to make
predictions:
```python
import pandas as pd
data = pd.read_csv('pollution_1.csv')
predictions = model.predict(data.head())
```
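To get a rough sense of the model quality, the predictions can be compared against the known
labels in the demo data. Note that this is only a quick sanity check, not a proper evaluation,
since these rows were seen during training:
```python
# Fraction of the first rows predicted correctly.
actual = data.head()['class']
print((predictions == actual).mean())
```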
# What's next?
For more details about **ATM** and all its possibilities and features, please check the
[documentation site](https://HDI-Project.github.io/ATM/).
There you can learn more about its [Command Line Interface](https://hdi-project.github.io/ATM/cli.html)
and its [REST API](https://hdi-project.github.io/ATM/rest.html), as well as
[how to contribute to ATM](https://HDI-Project.github.io/ATM/community/contributing.html)
to help us develop new features and cool ideas.
# Credits
ATM is an open source project from the Data to AI Lab at MIT which has been built and maintained
over the years by the following team:
* Bennett Cyphers <bcyphers@mit.edu>
* Thomas Swearingen <swearin3@msu.edu>
* Carles Sala <csala@csail.mit.edu>
* Plamen Valentinov <plamen@pythiac.com>
* Kalyan Veeramachaneni <kalyan@mit.edu>
* Micah Smith <micahjsmith@gmail.com>
* Laura Gustafson <lgustaf@mit.edu>
* Kiran Karra <kiran.karra@gmail.com>
* Max Kanter <kmax12@gmail.com>
* Alfredo Cuesta-Infante <alfredo.cuesta@urjc.es>
* Favio André Vázquez <favio.vazquezp@gmail.com>
* Matteo Hoch <minime@hochweb.com>
## Citing ATM
If you use ATM, please consider citing the following paper:
Thomas Swearingen, Will Drevo, Bennett Cyphers, Alfredo Cuesta-Infante, Arun Ross, Kalyan Veeramachaneni. [ATM: A distributed, collaborative, scalable system for automated machine learning.](https://cyphe.rs/static/atm.pdf) *IEEE BigData 2017*, 151-162
BibTeX entry:
```bibtex
@inproceedings{DBLP:conf/bigdataconf/SwearingenDCCRV17,
author = {Thomas Swearingen and
Will Drevo and
Bennett Cyphers and
Alfredo Cuesta{-}Infante and
Arun Ross and
Kalyan Veeramachaneni},
title = {{ATM:} {A} distributed, collaborative, scalable system for automated
machine learning},
booktitle = {2017 {IEEE} International Conference on Big Data, BigData 2017, Boston,
MA, USA, December 11-14, 2017},
pages = {151--162},
year = {2017},
crossref = {DBLP:conf/bigdataconf/2017},
url = {https://doi.org/10.1109/BigData.2017.8257923},
doi = {10.1109/BigData.2017.8257923},
timestamp = {Tue, 23 Jan 2018 12:40:42 +0100},
biburl = {https://dblp.org/rec/bib/conf/bigdataconf/SwearingenDCCRV17},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
## Related Projects
### BTB
[BTB](https://github.com/hdi-project/btb), for Bayesian Tuning and Bandits, is the core AutoML
library in development under the HDI project. BTB exposes several methods for hyperparameter
selection and tuning through a common API. It allows domain experts to extend existing methods
and add new ones easily. BTB is a central part of ATM, and the two projects were developed in
tandem, but it is designed to be implementation-agnostic and should be useful for a wide range
of hyperparameter selection tasks.
### Featuretools
[Featuretools](https://github.com/featuretools/featuretools) is a Python library for automated
feature engineering. It can be used to prepare raw transactional and relational datasets for ATM.
It is created and maintained by [Feature Labs](https://www.featurelabs.com) and is also a part
of the [Human Data Interaction Project](https://hdi-dai.lids.mit.edu/).
# History
## 0.2.2 (2019-07-30)
### New Features
* Curate dependencies - [Issue #152](https://github.com/HDI-Project/ATM/issues/152) by @csala
* POST request blocked by CORS policy - [Issue #151](https://github.com/HDI-Project/ATM/issues/151) by @pvk-developer
## 0.2.1 (2019-06-24)
### New Features
* Rest API Cross-origin resource sharing (CORS) - [Issue #146](https://github.com/HDI-Project/ATM/issues/146) by @pvk-developer
## 0.2.0 (2019-05-29)
New Python API
### New Features
* New API for ATM usage within Python - [Issue #142](https://github.com/HDI-Project/ATM/issues/142) by
@pvk-developer and @csala
* Improved Documentation - [Issue #142](https://github.com/HDI-Project/ATM/issues/142) by
@pvk-developer and @csala
* Code cleanup - [Issue #102](https://github.com/HDI-Project/ATM/issues/102) by
@csala
* Ensure datasets can be downloaded from S3 - [Issue #137](https://github.com/HDI-Project/ATM/issues/137) by @pvk-developer
* Change to PyMySQL to remove libmysqlclient-dev system dependency - [Issue #136](https://github.com/HDI-Project/ATM/issues/136) by @pvk-developer and @csala
## 0.1.2 (2019-05-07)
REST API and Cluster Management.
### New Features
* REST API Server - Issues [#82](https://github.com/HDI-Project/ATM/issues/82) and
[#132](https://github.com/HDI-Project/ATM/issues/132) by @RogerTangos, @pvk-developer and @csala
* Add Cluster Management commands to start and stop the server and multiple workers
as background processes - [Issue #130](https://github.com/HDI-Project/ATM/issues/130) by
@pvk-developer and @csala
* Add TravisCI and migrate docs to GitHub Pages - [Issue #129](https://github.com/HDI-Project/ATM/issues/129)
by @pvk-developer
## 0.1.1 (2019-04-02)
First Release on PyPi.
### New Features
* Upgrade to latest BTB.
* New Command Line Interface.
## 0.1.0 (2018-05-04)
* First Release.