======================
Data Science Utilities
======================

.. image:: https://img.shields.io/pypi/v/data_science_utilities.svg
        :target: https://pypi.python.org/pypi/data_science_utilities

.. image:: https://img.shields.io/travis/truocphamkhac/data-science-utilities.svg
        :target: https://travis-ci.org/truocphamkhac/data-science-utilities

.. image:: https://readthedocs.org/projects/data-science-utilities/badge/?version=latest
        :target: http://data-science-utilities-python.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

Data science utilities for Python.

* Free software: MIT license
* Documentation: http://data-science-utilities-python.readthedocs.io.

Features
========

Missing data statistics
-----------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # df: a pandas DataFrame loaded beforehand
    missing_data = data_science_utilities.missing_data_stats(df)

    # Display the statistics
    missing_data

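
The exact columns that ``missing_data_stats`` reports are not documented here. As a rough sketch, assuming it summarizes the count and percentage of missing values per column (a common layout, not confirmed from the library source), the same table can be built with plain pandas; ``missing_data_stats_sketch`` is a hypothetical name:

.. code:: python

    import pandas as pd


    def missing_data_stats_sketch(df: pd.DataFrame) -> pd.DataFrame:
        """Hypothetical stand-in: missing-value count and percentage per column."""
        total = df.isnull().sum()        # missing cells per column
        percent = total / len(df) * 100  # share of rows, in percent
        stats = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
        # Columns with the most missing values come first
        return stats.sort_values("Total", ascending=False)

Sorting by ``Total`` puts the columns that most need imputing or dropping at the top of the table.
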

Reading CSV files from paths
----------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    train_path = '../data/raw/train.csv'
    test_path = '../data/raw/test.csv'

    # Load the training and test sets
    X_train, X_test = data_science_utilities.read_csv_files(train_path, test_path)

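
A minimal equivalent, assuming ``read_csv_files`` simply calls ``pandas.read_csv`` on each path with default options (not confirmed from the library source), could look like this; ``read_csv_files_sketch`` is a hypothetical name:

.. code:: python

    import io

    import pandas as pd


    def read_csv_files_sketch(*sources, **read_csv_kwargs):
        """Hypothetical stand-in: read every CSV source, returning frames in order."""
        return tuple(pd.read_csv(source, **read_csv_kwargs) for source in sources)


    if __name__ == "__main__":
        # In-memory text stands in for '../data/raw/train.csv' and '../data/raw/test.csv'
        train_csv = io.StringIO("id,price\n1,100\n2,150\n")
        test_csv = io.StringIO("id\n3\n")
        X_train, X_test = read_csv_files_sketch(train_csv, test_csv)
        print(X_train.shape, X_test.shape)

Accepting ``*sources`` lets the same function load any number of files, and extra keyword arguments (for example ``index_col='id'``) are forwarded to ``pandas.read_csv``.
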

Plotting a distribution against a normal fit
--------------------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # dist: the values to plot, e.g. a pandas Series
    data_science_utilities.plot_dist_norm(dist, 'distribution normal')

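
Judging only by its name, ``plot_dist_norm`` appears to compare a distribution with a normal curve. The sketch below assumes that behaviour: it fits a normal distribution by maximum likelihood (the sample mean and the population standard deviation, which is what ``scipy.stats.norm.fit`` returns) and overlays its density on a normalized histogram. All function names are hypothetical, and matplotlib is imported only when plotting:

.. code:: python

    import numpy as np


    def fit_normal(values):
        """Maximum-likelihood normal fit, ignoring NaNs: returns (mu, sigma)."""
        clean = np.asarray(values, dtype=float)
        clean = clean[~np.isnan(clean)]
        return clean.mean(), clean.std()  # ddof=0 is the MLE of sigma


    def normal_pdf(x, mu, sigma):
        """Density of N(mu, sigma**2) evaluated at x."""
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


    def plot_dist_norm_sketch(dist, title, bins=30, ax=None):
        """Hypothetical stand-in: histogram of dist with the fitted normal overlaid."""
        import matplotlib.pyplot as plt  # imported lazily; the fit needs only NumPy

        values = np.asarray(dist, dtype=float)
        values = values[~np.isnan(values)]
        mu, sigma = fit_normal(values)
        if ax is None:
            _, ax = plt.subplots()
        ax.hist(values, bins=bins, density=True, alpha=0.6, label="data")
        grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
        ax.plot(grid, normal_pdf(grid, mu, sigma),
                label=f"normal fit (mu={mu:.2f}, sigma={sigma:.2f})")
        ax.set_title(title)
        ax.legend()
        return ax

A visible gap between the histogram and the curve (for example a long right tail) suggests a transformation such as ``np.log1p`` before modelling.
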

Plotting a correlation matrix
-----------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # data: a pandas DataFrame
    data_science_utilities.plot_corelation_matrix(data)

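
Assuming ``plot_corelation_matrix`` draws the pairwise Pearson correlations between the numeric columns (an assumption based on its name), the matrix itself is a single pandas call. The heatmap below uses plain matplotlib, and both helper names are hypothetical:

.. code:: python

    import pandas as pd


    def correlation_matrix(data: pd.DataFrame) -> pd.DataFrame:
        """Pearson correlation between every pair of numeric columns."""
        return data.select_dtypes(include="number").corr()


    def plot_correlation_matrix_sketch(data, cmap="coolwarm", ax=None):
        """Hypothetical stand-in: heatmap of correlation_matrix(data)."""
        import matplotlib.pyplot as plt  # imported lazily; the matrix needs only pandas

        corr = correlation_matrix(data)
        if ax is None:
            _, ax = plt.subplots()
        # Fix the colour scale to [-1, 1] so colours are comparable across plots
        image = ax.imshow(corr.to_numpy(), cmap=cmap, vmin=-1, vmax=1)
        positions = list(range(len(corr.columns)))
        ax.set_xticks(positions)
        ax.set_xticklabels(corr.columns, rotation=90)
        ax.set_yticks(positions)
        ax.set_yticklabels(corr.index)
        ax.figure.colorbar(image, ax=ax)
        return ax

Non-numeric columns are skipped by ``select_dtypes``, so they have to be encoded first if they should appear in the matrix.
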

Plotting the correlation matrix of the top attributes
-----------------------------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # data: a pandas DataFrame; target: the target column
    data_science_utilities.plot_top_corelation_matrix(data, target, k=10, cmap='YlGnBu')

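
One common way to pick the "top attributes", and the one assumed here, is to keep the ``k`` columns whose correlation with ``target`` is largest, using ``DataFrame.nlargest``. This ranks by signed correlation, so strongly negative attributes are left out. The library may select columns differently, and the helper names are hypothetical:

.. code:: python

    import pandas as pd


    def top_correlated_columns(data: pd.DataFrame, target: str, k: int = 10) -> list:
        """Names of the k numeric columns most positively correlated with target."""
        corr = data.select_dtypes(include="number").corr()
        # The target correlates 1.0 with itself, so it normally appears first
        return corr.nlargest(k, target)[target].index.tolist()


    def top_correlation_matrix(data: pd.DataFrame, target: str, k: int = 10) -> pd.DataFrame:
        """Correlation matrix restricted to the top-k columns."""
        return data[top_correlated_columns(data, target, k)].corr()


    def plot_top_correlation_matrix_sketch(data, target, k=10, cmap="YlGnBu", ax=None):
        """Hypothetical stand-in: annotated heatmap of top_correlation_matrix."""
        import matplotlib.pyplot as plt  # imported lazily; the selection needs only pandas

        corr = top_correlation_matrix(data, target, k)
        if ax is None:
            _, ax = plt.subplots()
        image = ax.imshow(corr.to_numpy(), cmap=cmap, vmin=-1, vmax=1)
        positions = list(range(len(corr.columns)))
        ax.set_xticks(positions)
        ax.set_xticklabels(corr.columns, rotation=90)
        ax.set_yticks(positions)
        ax.set_yticklabels(corr.index)
        # Print each coefficient inside its cell
        for i in positions:
            for j in positions:
                ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center", fontsize=8)
        ax.figure.colorbar(image, ax=ax)
        return ax

To rank by strength regardless of sign, one option is to sort on ``corr[target].abs()`` instead of calling ``nlargest`` on the raw values.
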

Plotting attributes with a scatter chart
----------------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # Plot the values of column_name against the target
    data_science_utilities.plot_scatter(data, column_name, target)

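
A scatter chart of one attribute against the target can be reproduced with matplotlib alone. This sketch is an assumption about what ``plot_scatter`` draws; putting the Pearson coefficient in the title is an extra of this example, not documented library behaviour, and the names are hypothetical:

.. code:: python

    import pandas as pd


    def column_target_correlation(data: pd.DataFrame, column_name: str, target: str) -> float:
        """Pearson correlation between one attribute and the target."""
        return data[column_name].corr(data[target])


    def plot_scatter_sketch(data, column_name, target, ax=None):
        """Hypothetical stand-in: scatter column_name (x) against target (y)."""
        import matplotlib.pyplot as plt  # imported lazily; the correlation needs only pandas

        if ax is None:
            _, ax = plt.subplots()
        ax.scatter(data[column_name], data[target], alpha=0.5)
        r = column_target_correlation(data, column_name, target)
        ax.set_title(f"{column_name} vs {target} (Pearson r = {r:.2f})")
        ax.set_xlabel(column_name)
        ax.set_ylabel(target)
        return ax
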

Plotting attributes with a box plot
-----------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # Compare the target across the values of column_name
    data_science_utilities.plot_box(data, column_name, target)

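
Assuming ``plot_box`` draws one box of ``target`` values for each distinct value of ``column_name`` (the usual way a box plot relates a discrete attribute to a target), the grouping step can be sketched as follows; the helper names are hypothetical:

.. code:: python

    import pandas as pd


    def group_target_by_column(data: pd.DataFrame, column_name: str, target: str) -> dict:
        """Map each distinct value of column_name to the matching target values."""
        return {
            key: group[target].to_numpy()
            for key, group in data.groupby(column_name, sort=True)
        }


    def plot_box_sketch(data, column_name, target, ax=None):
        """Hypothetical stand-in: one box of target values per column_name value."""
        import matplotlib.pyplot as plt  # imported lazily; the grouping needs only pandas

        groups = group_target_by_column(data, column_name, target)
        if ax is None:
            _, ax = plt.subplots()
        ax.boxplot(list(groups.values()))
        # Boxes sit at x = 1..n; label them with the attribute values
        positions = list(range(1, len(groups) + 1))
        ax.set_xticks(positions)
        ax.set_xticklabels([str(key) for key in groups])
        ax.set_xlabel(column_name)
        ax.set_ylabel(target)
        return ax

For attributes with many distinct values, such as continuous measurements, a scatter chart is usually easier to read than one box per value.
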

Plotting categorical columns with bar charts
--------------------------------------------

.. code:: python

    from data_science_utilities import data_science_utilities

    # data: a pandas DataFrame
    data_science_utilities.plot_category_columns(data, limit_bars=10)

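
Assuming ``plot_category_columns`` draws a bar chart of value counts for each categorical column and that ``limit_bars`` keeps only the most frequent values (inferred from the parameter name, not confirmed), the counting step can be sketched as below. Treating every non-numeric column as categorical is a choice made for this example, and the names are hypothetical:

.. code:: python

    import pandas as pd
    from pandas.api.types import is_numeric_dtype


    def top_category_counts(data: pd.DataFrame, limit_bars: int = 10) -> dict:
        """Most frequent values (at most limit_bars) of every non-numeric column."""
        return {
            column: data[column].value_counts().head(limit_bars)
            for column in data.columns
            if not is_numeric_dtype(data[column])
        }


    def plot_category_columns_sketch(data, limit_bars=10):
        """Hypothetical stand-in: one bar chart per categorical column."""
        import matplotlib.pyplot as plt  # imported lazily; the counting needs only pandas

        counts = top_category_counts(data, limit_bars)
        if not counts:
            return None  # nothing categorical to plot
        fig, axes = plt.subplots(len(counts), 1, figsize=(6, 3 * len(counts)), squeeze=False)
        for ax, (column, values) in zip(axes[:, 0], counts.items()):
            values.plot.bar(ax=ax, title=column)
        fig.tight_layout()
        return fig

Capping the bars keeps high-cardinality columns (IDs, free text) from producing unreadable charts.
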

Plotting the training and test learning curves
----------------------------------------------

.. code:: python

    import numpy as np

    from data_science_utilities import data_science_utilities

    # estimator: a scikit-learn style estimator; X, y: features and target
    data_science_utilities.plot_learning_curve(
        estimator, title, X, y, ylim=None,
        cv=None, train_sizes=np.linspace(.1, 1.0, 5))

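
The signature above closely resembles the ``plot_learning_curve`` example from the scikit-learn documentation, so the sketch below assumes the same approach: ``sklearn.model_selection.learning_curve`` computes cross-validated scores for growing training subsets, and the mean plus or minus one standard deviation across folds is plotted. It is not the library's code, and the function names are hypothetical:

.. code:: python

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import learning_curve


    def learning_curve_summary(estimator, X, y, cv=None, train_sizes=np.linspace(0.1, 1.0, 5)):
        """Mean and standard deviation of train/validation scores per training size."""
        sizes, train_scores, test_scores = learning_curve(
            estimator, X, y, cv=cv, train_sizes=train_sizes
        )
        # Score arrays have one row per training size and one column per CV fold
        return {
            "train_sizes": sizes,
            "train_mean": train_scores.mean(axis=1),
            "train_std": train_scores.std(axis=1),
            "test_mean": test_scores.mean(axis=1),
            "test_std": test_scores.std(axis=1),
        }


    def plot_learning_curve_sketch(estimator, title, X, y, ylim=None, cv=None,
                                   train_sizes=np.linspace(0.1, 1.0, 5)):
        """Hypothetical stand-in: training and cross-validation curves with +/- 1 std bands."""
        import matplotlib.pyplot as plt  # imported lazily; the scores need only scikit-learn

        summary = learning_curve_summary(estimator, X, y, cv=cv, train_sizes=train_sizes)
        fig, ax = plt.subplots()
        ax.set_title(title)
        if ylim is not None:
            ax.set_ylim(*ylim)
        ax.set_xlabel("Training examples")
        ax.set_ylabel("Score")
        sizes = summary["train_sizes"]
        for key, color, label in (("train", "r", "Training score"),
                                  ("test", "g", "Cross-validation score")):
            mean, std = summary[f"{key}_mean"], summary[f"{key}_std"]
            ax.fill_between(sizes, mean - std, mean + std, alpha=0.1, color=color)
            ax.plot(sizes, mean, "o-", color=color, label=label)
        ax.legend(loc="best")
        return fig


    if __name__ == "__main__":
        # Synthetic regression data: y is a noisy linear function of three features
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 3))
        y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=200)
        summary = learning_curve_summary(LinearRegression(), X, y, cv=5)
        print(summary["train_sizes"], summary["test_mean"].round(3))

A wide gap between the two curves that narrows as the training set grows points to variance that more data can reduce; two low curves that meet early point to an underfitting model.
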
Credits
=======

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage

=======
History
=======

0.2.4 (2018-05-21)
------------------

* Fixed rendering of the docs in the README.

0.2.3 (2018-05-21)
------------------

* Fixed rendering of the docs on https://pypi.org/.

0.2.2 (2018-05-21)
------------------

* Continued fixing the docs rendering.

0.2.1 (2018-05-21)
------------------

* Fixed the docs rendering.

0.2.0 (2018-05-14)
------------------

* Added visualization utilities.

0.1.0 (2018-05-11)
------------------

* First release on PyPI.