معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

Automate machine learning pipelines rapidly

ویژگی	مقدار
سیستم عامل	-
نام فایل	blitzml-0.9.0
نام	blitzml
نسخه کتابخانه	0.9.0
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	AI Team
ایمیل نویسنده	AI Team <CV.Team.CSE.2023@gmail.com>
آدرس صفحه اصلی	-
آدرس اینترنتی	https://pypi.org/project/blitzml/
مجوز	-

<div align="center"> # <span style="color:violet">BlitzML</span> ### **Automate machine learning pipelines rapidly** <div align="left"> - [Install BlitzML](#install-blitzml) - [Classification](#classification) - [Regression](#regression) # Install BlitzML ```bash pip install blitzml ``` # Classification ```python from blitzml.tabular import Classification import pandas as pd # prepare your dataframes train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv") test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv") # create the pipeline auto = Classification(train_df, test_df, classifier = 'RF', n_estimators = 50) # first perform data preprocessing auto.preprocess() # second train the model auto.train_the_model() # After training the model we can generate: auto.gen_pred_df(auto.test_df) auto.gen_metrics_dict() # We can get their values using: pred_df = auto.pred_df metrics_dict = auto.metrics_dict print(pred_df.head()) print(metrics_dict) ``` ## Available Classifiers - Random Forest 'RF' - LinearDiscriminantAnalysis 'LDA' - Support Vector Classifier 'SVC' - KNeighborsClassifier 'KNN' - GaussianNB 'GNB' - LogisticRegression 'LR' - AdaBoostClassifier 'AB' - GradientBoostingClassifier 'GB' - DecisionTreeClassifier 'DT' - MLPClassifier 'MLP' ## **Parameters** **classifier** options: {'RF','LDA','SVC','KNN','GNB','LR','AB','GB','DT','MLP', 'auto', 'custom'}, default = 'RF' `auto: selects the best scoring classifier based on f1-score` `custom: enables providing a custom classifier through *file_path* and *class_name* parameters` **file_path** when using 'custom' classifier, pass the path of the file containing the custom class, default = 'none' **class_name** when using 'custom' classifier, pass the class name through this parameter, default = 'none' **feature_selection** options: {'correlation', 'importance', 'none'}, default = 'none' `correlation: use feature columns with the highest correlation with the target` `importance: use feature columns that are important for the model to predict the target` `none: use all feature columns` **validation_percentage** value determining the validation split percentage (value from 0 to 1), default = 0.1 **average_type** when performing multiclass classification, provide the average type for the resulting metrics, default = 'macro' **cross_validation_k_folds** number of k-folds for cross validation, if 1 then no cv will be performed, default = 1 ****kwargs** optional parameters for the chosen classifier. you can find available parameters in the [sklearn docs](https://scikit-learn.org/stable/user_guide.html) ## **Attributes** **train_df** the preprocessed train dataset (after running `Classification.preprocess()`) **test_df** the preprocessed test dataset (after running `Classification.preprocess()`) **model** the trained model (after running `Classification.train_the_model()`) **pred_df** the prediction dataframe (test_df + predicted target) (after running `Classification.gen_pred_df(Classification.test_df)`) **metrics_dict** the validation metrics (after running `Classification.gen_metrics_dict()`) { "accuracy": acc, "f1": f1, "precision": pre, "recall": recall, "hamming_loss": h_loss, "cross_validation_score":cv_score, `returns None if cross_validation_k_folds==1` } ## **Methods** **preprocess()** perform preprocessing on train_df and test_df **train_the_model()** train the chosen classifier on the train_df **accuracy_history()** accuracy scores when varying the sampling size of the train_df (after running `Classification.train_the_model()`). *returns:* { 'x':train_df_sample_sizes, 'y1':train_scores_mean, 'y2':test_scores_mean, 'title':title } **gen_pred_df(test_df)** generates the prediction dataframe and assigns it to the `pred_df` attribute **gen_metrics_dict()** generates the validation metrics and assigns it to the `metrics_dict` **run()** a shortcut that runs the following methods: preprocess() train_the_model() gen_pred_df(Classification.test_df) gen_metrics_dict() # Regression ```python from blitzml.tabular import Regression import pandas as pd # prepare your dataframes train_df = pd.read_csv("auxiliary/datasets/house prices/train.csv") test_df = pd.read_csv("auxiliary/datasets/house prices/test.csv") # create the pipeline auto = Regression(train_df, test_df, regressor = 'RF') # first perform data preprocessing auto.preprocess() # second train the model auto.train_the_model() # After training the model we can generate: auto.gen_pred_df(auto.test_df) auto.gen_metrics_dict() # We can get their values using: pred_df = auto.pred_df metrics_dict = auto.metrics_dict print(pred_df.head()) print(metrics_dict) ``` ## Available Regressors - Random Forest 'RF' - Support Vector Regressor 'SVR' - KNeighborsRegressor 'KNN' - Lasso Regressor 'LSS' - LinearRegression 'LR' - Ridge Regressor 'RDG' - GaussianProcessRegressor 'GPR' - GradientBoostingRegressor 'GB' - DecisionTreeRegressor 'DT' - MLPRegressor 'MLP' ## **Parameters** **regressor** options: {'RF','SVR','KNN','LSS','LR','RDG','GPR','GB','DT','MLP', 'auto', 'custom'}, default = 'RF' `auto: selects the best scoring regressor based on r2 score` `custom: enables providing a custom regressor through *file_path* and *class_name* parameters` **file_path** when using 'custom' regressor, pass the path of the file containing the custom class, default = 'none' **class_name** when using 'custom' regressor, pass the class name through this parameter, default = 'none' **feature_selection** options: {'correlation', 'importance', 'none'}, default = 'none' `correlation: use feature columns with the highest correlation with the target` `importance: use feature columns that are important for the model to predict the target` `none: use all feature columns` **validation_percentage** value determining the validation split percentage (value from 0 to 1), default = 0.1 **cross_validation_k_folds** number of k-folds for cross validation, if 1 then no cv will be performed, default = 1 ****kwargs** optional parameters for the chosen regressor. you can find available parameters in the [sklearn docs](https://scikit-learn.org/stable/user_guide.html) ## **Attributes** **train_df** the preprocessed train dataset (after running `Regression.preprocess()`) **test_df** the preprocessed test dataset (after running `Regression.preprocess()`) **model** the trained model (after running `Regression.train_the_model()`) **pred_df** the prediction dataframe (test_df + predicted target) (after running `Regression.gen_pred_df(Regression.test_df)`) **metrics_dict** the validation metrics (after running `Regression.gen_metrics_dict()`) { "r2_score": r2, "mean_squared_error": mse, "root_mean_squared_error": rmse, "mean_absolute_error" : mae, "cross_validation_score":cv_score, `returns None if cross_validation_k_folds==1` } ## **Methods** **preprocess()** perform preprocessing on train_df and test_df **train_the_model()** train the chosen regressor on the train_df **RMSE_history()** RMSE scores when varying the sampling size of the train_df (after running `Regression.train_the_model()`). *returns:* { 'x':train_df_sample_sizes, 'y1':train_scores_mean, 'y2':test_scores_mean, 'title':title } **gen_pred_df(test_df)** generates the prediction dataframe and assigns it to the `pred_df` attribute **gen_metrics_dict()** generates the validation metrics and assigns it to the `metrics_dict` **run()** a shortcut that runs the following methods: preprocess() train_the_model() gen_pred_df(Regression.test_df) gen_metrics_dict() ## Development - Clone the repo - run `pip install virtualenv` - run `python -m virtualenv venv` - run `. ./venv/bin/activate` on UNIX based systems or `. ./venv/Scripts/activate.ps1` if on windows - run `pip install -r requirements.txt` - run `pre-commit install`

نیازمندی

مقدار	نام
>=1.2.0	joblib
<=1.23.4	numpy
>=1.5.1	pandas
>=1.1.3	scikit-learn
>=0.3	Boruta
-	pip-tools
-	pytest

زبان مورد نیاز

مقدار	نام
>=3.8	Python

نحوه نصب

نصب پکیج whl blitzml-0.9.0:

pip install blitzml-0.9.0.whl

نصب پکیج tar.gz blitzml-0.9.0:

pip install blitzml-0.9.0.tar.gz