Description

Simple and automatic data cleaning in one line of code! It performs one-hot encoding, casts date and time columns to datetime dtype, detects binary columns, safely converts non-numeric columns to numeric dtypes, cleans dirty/empty values, normalizes values and removes unwanted columns, all in one line of code. Get your data ready for model training and fitting quickly.
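The binary-detection and one-hot-encoding steps mentioned above can be approximated in plain pandas. What follows is a minimal sketch, not the library's implementation. `encode_binary_and_one_hot` is a hypothetical helper, and it makes a few assumptions. A two-valued column is mapped to 0/1 in sorted order, so `'Female'` becomes 0 and `'Male'` becomes 1. The values inside a column are assumed to be of one comparable type. Every column that is still neither numeric nor datetime afterwards is one-hot encoded.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def encode_binary_and_one_hot(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map two-valued columns to 0/1, then one-hot encode the rest."""
    out = df.copy()
    for col in out.columns:
        values = out[col].dropna().unique()
        if len(values) == 2:
            # Assumption: sorted order decides which value becomes 0 and which becomes 1
            low, high = sorted(values)
            out[col] = out[col].map({low: 0, high: 1})
    # One-hot encode every column that is still neither numeric nor datetime
    text_cols = [
        c for c in out.columns
        if not is_numeric_dtype(out[c]) and not is_datetime64_any_dtype(out[c])
    ]
    return pd.get_dummies(out, columns=text_cols, dtype=int)


df = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Male", "Male"],
    "color": ["white", "blue", "white", "blue", "green"],
})
print(encode_binary_and_one_hot(df))
```

`pd.get_dummies` names the new columns `<column>_<value>` (for example `color_blue`). That is the same naming that appears in the README's example output.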

| Property | Value |
|---|---|
| Operating system | - |
| File name | AutoDataCleaner-1.1.3 |
| Name | AutoDataCleaner |
| Library version | 1.1.3 |
| Maintainer | - |
| Maintainer email | - |
| Author | Sinking Titanic |
| Author email | ofcourse7878@gmail.com |
| Home page | https://github.com/sinkingtitanic/AutoDataCleaner |
| Package URL | https://pypi.org/project/AutoDataCleaner/ |
| License | - |

# AutoDataCleaner [![version](https://img.shields.io/badge/Version-1.1.3-lightgrey)](https://github.com/sinkingtitanic/AutoDataCleaner) [![build](https://img.shields.io/badge/Pypi%20Build-Stable-blue)](https://pypi.org/project/AutoDataCleaner/) [![python-version](https://img.shields.io/badge/Python-3^-success)](https://www.python.org/downloads/) [![coverage](https://img.shields.io/badge/coverage-%25100-success)](https://pypi.org/project/AutoDataCleaner/)

![Preview](https://raw.githubusercontent.com/sinkingtitanic/AutoDataCleaner/main/images/autodatacleaner.png)

Simple and automatic data cleaning in one line of code! It performs **one-hot encoding**, **converts columns to numeric dtype**, **cleans dirty/empty values**, **normalizes values** and **removes unwanted columns**, all in one line of code. Get your data ready for model training and fitting quickly.

# Features

0. **Uses Pandas DataFrames** (no need to learn new syntax)
1. **One-hot encoding**: encodes non-numeric values into one-hot encoded columns
2. **Converts columns to numeric dtypes**: converts text numbers to numeric dtypes **(see [1] below)**
3. **Auto-detects binary columns**: in any column that has exactly two unique values, those values are replaced with 0 and 1 (e.g.: `['loser', 'winner'] => [0,1]`)
4. **Normalization**: normalizes columns (excluding binary [1/0] columns)
5. **Cleans Dirty/None/NA/Empty values**: replaces None values with the mean or mode of the column, deletes rows that contain a None cell, or substitutes None values with a predefined value
6. **Deletes unwanted columns**: drops unwanted columns (usually the 'id' column)
7. **Converts date, time or datetime columns to datetime dtype**

# Installation

#### Using pip

`pip install AutoDataCleaner`

#### Cloning repo

Clone the repository and run `pip install -e .` inside the repository directory.

#### Install from repo directly

Install directly from the repository using `pip install git+https://github.com/sinkingtitanic/AutoDataCleaner.git#egg=AutoDataCleaner`

# Quick One-line Usage

```
import AutoDataCleaner.AutoDataCleaner as adc

# `dataframe` is the pandas DataFrame to clean; clean_me() returns the cleaned DataFrame
cleaned = adc.clean_me(dataframe, detect_binary=True, numeric_dtype=True, one_hot=True, na_cleaner_mode="mean", normalize=True, datetime_columns=[], remove_columns=[], verbose=True)
```

# Example

```
>>> import pandas as pd
>>> import AutoDataCleaner.AutoDataCleaner as adc
>>> df = pd.DataFrame([
...     [1, "Male", "white", 3, "2018/11/20"],
...     [2, "Female", "blue", "4", "2014/01/12"],
...     [3, "Male", "white", 15, "2020/09/02"],
...     [4, "Male", "blue", "5", "2020/09/02"],
...     [5, "Male", "green", None, "2020/12/30"]
... ], columns=['id', 'gender', 'color', 'weight', 'created_on'])
>>>
>>> adc.clean_me(df,
...     detect_binary=True,
...     numeric_dtype=True,
...     one_hot=True,
...     na_cleaner_mode="mode",
...     normalize=True,
...     datetime_columns=["created_on"],
...     remove_columns=["id"],
...     verbose=True)
+++++++++++++++ AUTO DATA CLEANING STARTED ++++++++++++++++
= AutoDataCleaner: Casting datetime columns to datetime dtype...
+ converted column created_on to datetime dtype
= AutoDataCleaner: Performing removal of unwanted columns...
+ removed 1 columns successfully.
= AutoDataCleaner: Performing One-Hot encoding...
+ detected 1 binary columns [['gender']], cells cleaned: 5 cells
= AutoDataCleaner: Converting columns to numeric dtypes when possible...
+ 1 minority (minority means < %25 of 'weight' entries) values that cannot be converted to numeric dtype in column 'weight' have been set to NaN, nan cleaner function will deal with them
+ converted 5 cells to numeric dtypes
= AutoDataCleaner: Performing One-Hot encoding...
+ one-hot encoding done, added 2 new columns
= AutoDataCleaner: Performing None/NA/Empty values cleaning...
+ cleaned the following NaN values: {'weight NaN Values': 1}
= AutoDataCleaner: Performing dataset normalization...
+ normalized 5 cells
+++++++++++++++ AUTO DATA CLEANING FINISHED +++++++++++++++
   gender    weight created_on  color_blue  color_green  color_white
0       1 -0.588348 2018-11-20           0            0            1
1       0 -0.392232 2014-01-12           1            0            0
2       1  1.765045 2020-09-02           0            0            1
3       1 -0.196116 2020-09-02           1            0            0
4       1 -0.588348 2020-12-30           0            1            0
```

**For finer-grained control, see `AutoDataCleaner.py`; the code is thoroughly documented.**

# Explaining Parameters

`adc.clean_me(dataframe, detect_binary=True, numeric_dtype=True, one_hot=True, na_cleaner_mode="mean", normalize=True, datetime_columns=[], remove_columns=[], verbose=True)`

Parameters and what they mean

_Call `adc.help()` to print the instructions below._

* `dataframe`: input Pandas DataFrame on which the cleaning will be performed
* `detect_binary`: if True, in any column that has exactly two unique values, those values are replaced with 0 and 1 (e.g.: ['loser', 'winner'] => [0,1])
* `numeric_dtype`: if True, columns are converted to numeric dtypes when possible **(see [1] below)**
* `one_hot`: if True, all non-numeric columns are encoded into one-hot columns
* `na_cleaner_mode`: the technique to use for None/NA/Empty values. Modes:
    * `False`: do not clean NA values
    * `'remove row'`: removes rows that have a cell with an NA value
    * `'mean'`: substitutes empty NA cells with the mean of that column
    * `'mode'`: substitutes empty NA cells with the mode of that column
    * `'*'`: any other value substitutes empty NA cells with that particular value
* `normalize`: if True, all non-binary columns are normalized (columns whose values are only 0 or 1 are excluded)
* `datetime_columns`: a list of columns that contain date, time or datetime entries. Listing them here is important; otherwise the `normalize_df` and `convert_numeric_df` functions will corrupt these columns
* `remove_columns`: a list of columns to remove, usually unrelated features such as the ID column
* `verbose`: if True, prints progress to the terminal/cmd
* `returns`: the processed, clean Pandas DataFrame

[1] When `numeric_dtype` is set to True, columns that contain strings of numbers (e.g.: "123" instead of 123) are converted to a numeric dtype. If, in a particular column, the values that cannot be converted to numeric dtypes are a minority (< 25% of the column's total entries), those minority values are set to NaN, and the NaN cleaner function then handles them according to your settings. See the `convert_numeric_df()` function in `AutoDataCleaner.py` for more documentation.

# Prediction

In the prediction phase, put the examples to be predicted in a Pandas DataFrame and run them through `adc.clean_me()` **with the same parameters you used during training**.

# Contribution & Maintenance

This repository is thoroughly commented for your convenience. Feel free to send feedback to "ofcourse7878@gmail.com", submit an issue or open a pull request!
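Footnote [1] in the README describes a "minority" rule for numeric conversion. The sketch below expresses that rule with `pd.to_numeric(..., errors="coerce")`. The name `coerce_numeric_column` is hypothetical. The 25% threshold comes from the README. Returning a column unchanged when most of its values are non-numeric is an assumption. The library's own log for the README example reports one minority value in `weight`, and the only non-numeric entry there is `None`, so the library appears to count missing values as well. This sketch counts only entries that were present but unparseable, so its bookkeeping may differ.

```python
import pandas as pd


def coerce_numeric_column(series: pd.Series, threshold: float = 0.25) -> pd.Series:
    """Hypothetical sketch of the '< 25% minority' rule from footnote [1]."""
    # Anything that cannot be parsed as a number becomes NaN
    converted = pd.to_numeric(series, errors="coerce")
    # Entries that were present originally but could not be parsed as numbers
    unparseable = converted.isna() & series.notna()
    if unparseable.sum() < threshold * len(series):
        # Minority: keep the conversion and leave the new NaNs to the NA cleaner
        return converted
    # Assumption: a column with too many non-numeric values is returned unchanged
    return series


weight = pd.Series([3, "4", 15, "5", None])
print(coerce_numeric_column(weight).tolist())
```

The comparison is strict (`<`), so a column in which exactly 25% of the entries are unparseable is left alone.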
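The `na_cleaner_mode` options and the normalization step in the README can also be sketched in pandas. `clean_na` and `zscore_non_binary` are hypothetical helpers, not the library's functions. The sketch assumes that "normalization" means z-scoring with pandas' default sample standard deviation (`ddof=1`). That reading is based only on the README's example output. Mode-filling `weight` gives [3, 4, 15, 5, 3], with mean 6 and standard deviation sqrt(26). Then (3 - 6) / sqrt(26) is about -0.588348, which matches the first printed value.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def clean_na(df: pd.DataFrame, mode="mean") -> pd.DataFrame:
    """Hypothetical sketch of the na_cleaner_mode options listed in the README."""
    if mode is False:
        return df  # leave NA values untouched
    if mode == "remove row":
        return df.dropna()  # drop every row that has at least one NA cell
    out = df.copy()
    for col in out.columns:
        if mode == "mean":
            if is_numeric_dtype(out[col]):  # a mean only makes sense for numbers
                out[col] = out[col].fillna(out[col].mean())
        elif mode == "mode":
            modes = out[col].mode()  # may hold several tied values; take the first
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
        else:
            out[col] = out[col].fillna(mode)  # any other value is used as the filler
    return out


def zscore_non_binary(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score every numeric column except 0/1 columns, using the sample std (ddof=1)."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if set(out[col].dropna().unique()) <= {0, 1}:
            continue  # binary columns are excluded, as the README describes
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out


df = pd.DataFrame({"gender": [1, 0, 1, 1, 1], "weight": [3.0, 4.0, 15.0, 5.0, None]})
print(zscore_non_binary(clean_na(df, "mode")))
```

All four non-missing weights occur once, so they tie for the mode. `Series.mode()` then returns all of them in sorted order, and this sketch fills with the first one (3). That choice is what makes the sketch reproduce the README's weight column.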

How to install

Install the AutoDataCleaner-1.1.3 whl package:

pip install AutoDataCleaner-1.1.3.whl

Install the AutoDataCleaner-1.1.3 tar.gz package:

pip install AutoDataCleaner-1.1.3.tar.gz