معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

Python module to report, clean, and optimize Pandas Dataframes effectively

ویژگی	مقدار
سیستم عامل	-
نام فایل	clean-df-0.2.3
نام	clean-df
نسخه کتابخانه	0.2.3
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	Nael Aqel
ایمیل نویسنده	dev@naelaqel.com
آدرس صفحه اصلی	https://github.com/naelaqel/clean_df
آدرس اینترنتی	https://pypi.org/project/clean-df/
مجوز	MIT license

======== clean_df ======== .. image:: https://img.shields.io/pypi/v/clean_df.svg :target: https://pypi.python.org/pypi/clean_df .. image:: https://github.com/NaelAqel/clean_df/actions/workflows/test.yml/badge.svg :target: https://github.com/NaelAqel/clean_df/actions/workflows/test.yml .. image:: https://readthedocs.org/projects/clean-df/badge/?version=latest :target: https://clean-df.readthedocs.io/en/latest/?version=latest :alt: Documentation Status .. image:: https://img.shields.io/pypi/l/clean_df.svg :target: https://github.com/NaelAqel/clean_df/blob/main/LICENSE Python module to report, clean, and optimize **Pandas Dataframes** effectively. **Full Documentation** `Here`_. .. _Here: https://naelaqel.com/clean_df/ Description and Features ------------------------ The first step of any data analysis project is to check and clean the data, in this module I implemented a very effiecint code that can: * Automatically drop columns that have a unique value (these columns are useless, so it will be dropped). * Report your **Pandas DataFrame** to decide for actions, this report will show: #. Duplicated rows report. #. Columnsâ€™ Datatype to optimize memory report. #. Columns to convert to categorical report. #. Outliers report. #. Missing values report. * Clean the dataframe by dropping columns that have a high ratio of missing values, rows with missing values, and duplicated rows in the dataframe. * Optimize the dataframe by converting columns to the desired data type and converting categorical columns to 'category' data type. Installation ------------ To install ``clean_df``, run this command in your terminal:: $ pip install clean_df For more information on installation details for this project, please see the ``docs/installation.rst`` file. Usage ----- This module is very easy to use, for a full detailed example please see the ``docs/usage.rst`` file. Importing the module ^^^^^^^^^^^^^^^^^^^^ :: from clean_df import CleanDataFrame Defining the class ^^^^^^^^^^^^^^^^^^ Pass your pandas dataframe to ``CleanDataFrame`` class:: cdf = CleanDataFrame( df=df, # the dataframe to be cleaned max_num_cat=5 # maximum number of unique values in a column to be ) # converted to categorical datatype, default is 5 Reporting ^^^^^^^^^ Call ``report`` method to see a full report about the dataframe (duplications, columns to optimize its data types, categorical columns, outliers, and missing values:: cdf.report( show_matrix=True, # show matrix missing values (from missingno package), default is True show_heat=True, # show heat missing values (from missingno package), default is True matrix_kws={}, # if need to pass any arguments to matrix plot, default is {} heat_kws={} # if need to pass any arguments to heat plot, default is {} ) Cleaning ^^^^^^^^ Call ``clean`` method to drop high number of missing value columns, duplicated rows, and rows with missing values:: cdf.clean( min_missing_ratio=0.05, # the minimum ratio of missing values to drop a column, default is 0.05 drop_nan=True # if True, drop the rows with missing values after dropping columns # with missingsa above min_missing_ratio drop_kws={}, # if need to pass any arguments to pd.DataFrame.drop(), default is {} drop_duplicates_kws={} # same drop_kws, but for drop_duplicates function ) Optimizing ^^^^^^^^^^ Call ``optimize`` method to optimize the dataframe by changing columns' data types based on its values for maximum memory savings:: cdf.optimize() Accessing the Cleaned Data DataFrame ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :: cdf.df Contributing ------------ See the ``CONTRIBUTING.rst`` for contribution details. Feel free to contact me for any subject through my: * `Email`_ * `LinkedIn`_ * `WhatsApp`_ Also, you are welcomed to visit my personal `blog`_ . .. _Email: mailto:dev@naelaqel.com .. _LinkedIn: https://www.linkedin.com/in/naelaqel1 .. _WhatsApp: https://wa.me/962796780232 .. _blog: https://naelaqel.com License ------- Free software: MIT license. Documentation ------------- * The full documentation is hosted on my `website`_, and on `ReadTheDocs`_. * The source code is available in `GitHub`_. .. _website: https://naelaqel.com/clean_df/ .. _ReadTheDocs: https://clean_df.readthedocs.io .. _GitHub: https://github.com/naelaqel/clean_df Credits ------- * This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template. * Here are `additional`_ resources I got a lot from them. .. _Cookiecutter: https://github.com/audreyr/cookiecutter .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage .. _`additional`: https://naelaqel.com/resources/ ======= History ======= 0.2.3 (2013-03-04) ------------------ * Improve memory consumption and module performance. 0.2.2 (2023-03-03) ------------------ * Fix a bug that made "dict_keys" error in some speical cases. 0.2.1 (2023-03-03) ------------------ * Improve module performance. 0.2.0 (2023-03-02) ------------------ * Add a new report for categorical columns. * Make the module more efficient. 0.1.1 (2023-02-27) ------------------ * Rectify and organize documentation. * Provide test to GitHub Actions. 0.1.0 (2023-02-27) ------------------ * First release on PyPI.

نیازمندی

مقدار	نام
>=0.4.1	missingno
>=1.19.3	numpy
>=0.25.3	pandas
>=7.10	IPython
>=3.0.3	matplotlib

زبان مورد نیاز

مقدار	نام
>=3.7	Python

نحوه نصب

نصب پکیج whl clean-df-0.2.3:

pip install clean-df-0.2.3.whl

نصب پکیج tar.gz clean-df-0.2.3:

pip install clean-df-0.2.3.tar.gz