معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

An utility to clean the data and return you the cleaned data

ویژگی	مقدار
سیستم عامل	-
نام فایل	data-cleaning-1.0.1
نام	data-cleaning
نسخه کتابخانه	1.0.1
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	Murali krishna mopi devi, Induraj P.Ramamurthy
ایمیل نویسنده	mopidevi@gmail.com, induraj.gandhian@yahoo.com
آدرس صفحه اصلی	https://github.com/DataPreprocessing/DataCleaning
آدرس اینترنتی	https://pypi.org/project/data-cleaning/
مجوز	-

<img src="https://github.com/DataPreprocessing/DataCleaning/blob/mopidevimu/img/datacleaning.png" width="100%"/> ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/data-cleaning) ![PyPI - License](https://img.shields.io/pypi/l/data-cleaning) ![PyPI](https://img.shields.io/pypi/v/data-cleaning) ![GitHub repo size](https://img.shields.io/github/repo-size/DataPreprocessing/DataCleaning) <h1 align="center">DATA CLEANING</h1> ## Description <p>In any Machine Learning process, Data Preprocessing is the primary step wherein the raw/unclean data are transformed into cleaned data, So that in the later stage, machine learning algorithms can be applied. This python paackage make the data preprocessing very easy in just 2 lines of code. All you have to do is just input a raw data(CSV file), this library will clean your data and return you the cleaned dataframe on which further you can apply feature engineering, feature selection and modeling. - What this does? * Cleans special character * Removes duplicates * Fixes abnormality in column names * Imputes the data (categorical & numerical) </p> ## Data Cleaning <p>Data-cleaning is a python package for data preprocessing. This cleans the CSV file and returns the <b>cleaned data frame</b>. It does the work of imputation, removing duplicates, replacing special characters, and many more.</p> ## How to use: Step 1: Install the libaray ````python pip install data-cleaning ```` Step 2: Import the library, and specify the path of the csv file. ````python from datacleaning import DataCleaning dp = DataCleaning(file_uploaded='filename.csv') cleaned_df = dp.start_cleaning() ```` There are some optional parameters that you can specify as listed below, ## Usage: ````python from datacleaning import DataCleaning DataCleaning(file_uploaded='filename.csv', separator=",", row_threshold=None, col_threshold=None, special_character=None, action=None, ignore_columns=None, imputation_type="RDF") ```` ## Parameters ------ | Parameter | Default Value | Limit | Example | | ------ | ------ | ------ | ------ | | file_uploaded | ***none*** | Provide a CSV file. | filename.csv | | separator | ***,*** | Separator used in csv file | ****;**** | row_threshold | ***none*** | 0 to 100 | 80 | | col_threshold | ***none*** | 0 to 100 | 80 | | special_character | Check the list below |Sspecify the character <br> that is not listed in default_list (see below) | [ '$' , '?' ] | | action | ***none*** | ***add*** or ***remove*** | add | | ignore_columns | ***none*** | Provide list of column names <br> to ignoring the special characters operation. | [ 'column1', 'column2' ] | | imputation_type | ***RDF*** | Select your preferred imputation <br> ***RDF***, ***KNN***, ***mean***, ***median***, ***most_frequent***, ***constant*** . | KNN | ## Examples of using parameters ### - Appending extra special characters to the existing default_list The DEFAULT SPECIAL CHARACTERS included in the package are shown below, ````python default_list = ["!", '"', "#", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", ":", ";", "<", "=", ">", "?", "@", "[", "\\", "]", "^", "_", "`", "{", "|", "}", "~", "â€“", "//", "%*", ":/", ".;", "Ã˜", "Â§",'$',"Â£"] ```` How to remove a special character, say for example if you want to remove "?" and "%". <i>Note:- Do not forget to give <b> action = 'remove' </b></i> ````python from datacleaning import DataCleaning dp = DataCleaning(file_uploaded='filename.csv', special_character =['?', '%'], action='remove') cleaned_df = dp.start_cleaning() ```` How to add a special character, say for example if you want to add "Ã©" that is not in the default_list given above. <i>Note:- Do not forget to give <b> action = 'add' </b></i> ````python from datacleaning import DataCleaning dp = DataCleaning(file_uploaded='filename.csv', special_character =['Ã©'], action='add') cleaned_df = dp.start_cleaning() ```` ### - Ignoring a particular columns and adding a special character Say for example, column named "timestamp" and "date" needs to be removed and a special character needs to be added 'Ã©' ````python from datacleaning import DataCleaning dp = DataCleaning(file_uploaded='filename.csv', special_character =['Ã©'], action='add', ignore_columns=['timestamp', 'date']) cleaned_df = dp.start_cleaning() ```` ### - Changing threshold to remove null rows/columns above this given threshold value ````python from datacleaning import DataCleaning dp = DataCleaning(file_uploaded='filename.csv', row_threshold=50, col_threshold=90) cleaned_df = dp.start_cleaning() ```` ### - Imputation methods available - RDF (RandomForest) -> (DEFAULT) - KNN (k-nearest neighbors) - mean - median - most_frequent - constant ````python # Example for KNN imputation. from datacleaning import DataCleaning dp = DataCleaning(file_uploaded='filename.csv', imputation_type='KNN') cleaned_df = dp.start_cleaning() ```` <h2 align="center"> >> THANK YOU << </h2>

نیازمندی

مقدار	نام
-	missingpy
-	joblib
-	numpy
-	pandas
-	python-dateutil
-	pytz
-	scikit-learn
-	scipy
-	six
-	threadpoolctl

زبان مورد نیاز

مقدار	نام
>=3.6	Python

نحوه نصب

نصب پکیج whl data-cleaning-1.0.1:

pip install data-cleaning-1.0.1.whl

نصب پکیج tar.gz data-cleaning-1.0.1:

pip install data-cleaning-1.0.1.tar.gz