معرفی شرکت ها


etl-toolbox-0.0.3


Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

Useful ETL functions for Python
ویژگی مقدار
سیستم عامل -
نام فایل etl-toolbox-0.0.3
نام etl-toolbox
نسخه کتابخانه 0.0.3
نگهدارنده []
ایمیل نگهدارنده []
نویسنده Brooklyn Rose Ludlow
ایمیل نویسنده -
آدرس صفحه اصلی https://etl-toolbox.readthedocs.io
آدرس اینترنتی https://pypi.org/project/etl-toolbox/
مجوز Apache-2.0
etl-toolbox =========== Useful ETL functions for Python **etl-toolbox** is a Python library of simple but powerful functions for ETL and data cleaning. It contains tools that are useful for nearly any ETL pipeline, with a specific focus on the data variety challenges that arise when compiling data from many sources. .. image:: https://img.shields.io/github/workflow/status/brookcub/etl-toolbox/Tests?logo=github :alt: GitHub Build Status :target: https://github.com/brookcub/etl-toolbox/actions?query=workflow%3ATests+branch%3Amaster .. image:: https://codecov.io/gh/brookcub/etl-toolbox/branch/master/graph/badge.svg :alt: Coverage :target: https://codecov.io/gh/brookcub/etl-toolbox .. image:: https://img.shields.io/readthedocs/etl-toolbox?label=docs&logo=Read%20the%20Docs :alt: Read the Docs :target: https://etl-toolbox.readthedocs.io/en/latest/ .. image:: https://img.shields.io/pypi/v/etl-toolbox?color=ffa313 :alt: PyPI Version :target: https://pypi.python.org/pypi/etl-toolbox .. image:: https://img.shields.io/pypi/pyversions/etl-toolbox?color=0882ac :alt: Supported Python versions :target: https://pypi.python.org/pypi/etl-toolbox .. image:: https://img.shields.io/pypi/l/etl-toolbox?color=0882ac :alt: License :target: https://pypi.python.org/pypi/etl-toolbox Features -------- - Standardize various null-indicating values (``'blank'``, ``'none'``, ``'null'``, etc) to Python's ``None`` - Trim, condense, and standardize whitespace with a single function - Locate and rename column labels in messy files Quick Start ----------- Installation ~~~~~~~~~~~~ .. _PyPI: http://pypi.python.org/pypi/etl-toolbox/ .. _pip: https://pip.pypa.io/en/stable/quickstart/ Install from PyPI_ using pip_:: $ pip install etl_toolbox Usage ~~~~~ >>> import pandas as pd >>> >>> df = pd.read_csv('./test_data/bad-data.csv') >>> df # doctest:+SKIP Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 0 created by: Brookcub Industries for testing purposes NaN NaN 1 date: 2020-06-07 3 rows some columns NaN 2 NaN NaN NaN NaN NaN 3 Cust. EML-addr On phn-nmbr col5 4 Golden jackal c.aureus@mail.com 03/04/14 333-4444 blank 5 Pie, rufous tree none NaN 222-3333 empty 6 Vulture, bengal blocked 06/01/15 777-7777 none 7 Arctic tern s_paradise@mail.com 01/28/16 NaN NaN 8 Eurasian badger meles@othermail.net notavailable NaN NaN 9 Grant's gazelle grant@randmail.com - NaN NaN Find and standardize column labels using a dictionary of the expected values: >>> from etl_toolbox.dataframe_functions import find_column_labels >>> from etl_toolbox.mapping_functions import map_labels >>> >>> fingerprint_map = { ... 'cust': 'Name', ... 'emladdr': 'Email', ... 'on': 'Date', ... 'phnnmbr': 'Phone' ... } >>> >>> find_column_labels(df, fingerprint_map) >>> df Cust. EML-addr On phn-nmbr col5 0 Golden jackal c.aureus@mail.com 03/04/14 333-4444 blank 1 Pie, rufous tree none NaN 222-3333 empty 2 Vulture, bengal blocked 06/01/15 777-7777 none 3 Arctic tern s_paradise@mail.com 01/28/16 NaN NaN 4 Eurasian badger meles@othermail.net notavailable NaN NaN 5 Grant's gazelle grant@randmail.com - NaN NaN >>> >>> df.columns = map_labels(df.columns, fingerprint_map) >>> df Name Email Date Phone - 0 Golden jackal c.aureus@mail.com 03/04/14 333-4444 blank 1 Pie, rufous tree none NaN 222-3333 empty 2 Vulture, bengal blocked 06/01/15 777-7777 none 3 Arctic tern s_paradise@mail.com 01/28/16 NaN NaN 4 Eurasian badger meles@othermail.net notavailable NaN NaN 5 Grant's gazelle grant@randmail.com - NaN NaN Standardize null values and remove empty rows/columns: >>> from etl_toolbox.dataframe_functions import dataframe_clean_null >>> >>> dataframe_clean_null(df) >>> df Name Email Date Phone 0 Golden jackal c.aureus@mail.com 03/04/14 333-4444 1 Pie, rufous tree NaN NaN 222-3333 2 Vulture, bengal NaN 06/01/15 777-7777 3 Arctic tern s_paradise@mail.com 01/28/16 NaN 4 Eurasian badger meles@othermail.net NaN NaN 5 Grant's gazelle grant@randmail.com NaN NaN Or clean individual data values: >>> from etl_toolbox.cleaning_functions import clean_whitespace >>> >>> clean_whitespace(''' 123 abc 456 ... def\t\t 789\t''') '123 abc 456 def 789' .. docs-exclusion-marker-start Documentation ------------- Full documentation is hosted at `etl-toolbox.readthedocs.io <https://etl-toolbox.readthedocs.io>`_. .. docs-exclusion-marker-end Contributing ------------ .. _Open an issue: https://github.com/brookcub/etl-toolbox/issues/new .. _issue tracker: https://github.com/brookcub/etl-toolbox/issues .. _this repository: https://github.com/brookcub/etl-toolbox/ .. _the Stack Overflow guide: https://stackoverflow.com/help/minimal-reproducible-example .. _PEP8: https://www.python.org/dev/peps/pep-0008/ .. _Flake8: https://flake8.pycqa.org/en/latest/ Contributions are appreciated! There are multiple ways to contribute: Bug Reports ~~~~~~~~~~~ Bug reports help make this library more robust. A good bug report should include: #. A clear description of the problem (the *expected* behavior vs the *actual* behavior) #. A minimal, reproducible example (see `the Stack Overflow guide`_) #. The platform and versions involved (operating system, Python version, ``etl-toolbox`` version, ``pandas``/``numpy`` version if applicable, etc) Submit bug reports with the `issue tracker`_ on GitHub. Feature Requests ~~~~~~~~~~~~~~~~ `Open an issue`_ to discuss features you'd like to see added to etl-toolbox. Pull Requests ~~~~~~~~~~~~~ Follow these steps for submitting pull requests: #. Find an issue or feature on the `issue tracker`_. #. Fork `this repository`_ on GitHub and make changes in a branch. #. Add tests to confirm that the bugfix/feature works as expected. #. Run the entire test suite and coverage report with ``pytest --doctest-modules --doctest-glob=*.rst --cov=etl_toolbox --ignore=docs/conf.py``. Make sure text coverage is 100% and all tests are passing. #. Submit a pull request. The code style for etl-toolbox mostly follows PEP8_. A linter like Flake8_ is recommended for double checking new contributions. Release History --------------- - 0.0.3 - Fix multiple bugs in ``merge_columns_by_label()`` that occurred with certain inputs - Change ``merge_columns_by_label()`` to remove ``None`` and ``np.nan`` values from merged columns - Change ``find_column_labels()`` to check whether the existing column labels fit the match criteria before searching rows - Change ``map_labels()`` to return ``'-'`` instead of ``None`` for unmapped labels - Change ``clean_whitespace()`` to return non-string inputs unaltered instead of raising an exception - 0.0.2 - Add GitHub continuous integration - Add project links and badges to readme and PyPI metadata - Fix bug in ``merge_columns_by_label()`` that raises a ``ValueError`` if ``df`` has multiple columns labeled ``None`` - 0.0.1 - Initial release


نیازمندی

مقدار نام
>=1.18.0 numpy
>=0.25.0 pandas
- pytest


زبان مورد نیاز

مقدار نام
>=3.5 Python


نحوه نصب


نصب پکیج whl etl-toolbox-0.0.3:

    pip install etl-toolbox-0.0.3.whl


نصب پکیج tar.gz etl-toolbox-0.0.3:

    pip install etl-toolbox-0.0.3.tar.gz