datatracer-0.0.6.dev0


Description

Data Lineage Tracing Library
| Attribute | Value |
| --- | --- |
| Operating system | - |
| File name | datatracer-0.0.6.dev0 |
| Name | datatracer |
| Library version | 0.0.6.dev0 |
| Maintainer | [] |
| Maintainer email | [] |
| Author | MIT Data To AI Lab |
| Author email | dailabmit@gmail.com |
| Home page | https://github.com/HDI-Project/DataTracer |
| URL | https://pypi.org/project/datatracer/ |
| License | MIT license |
<p align="left">
<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
<i>An open source project from Data to AI Lab at MIT.</i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
[![PyPI Shield](https://img.shields.io/pypi/v/datatracer.svg)](https://pypi.python.org/pypi/datatracer)
[![Downloads](https://pepy.tech/badge/datatracer)](https://pepy.tech/project/datatracer)
[![Run Tests](https://github.com/data-dev/DataTracer/workflows/Run%20Tests/badge.svg)](https://github.com/data-dev/DataTracer/actions)

# DataTracer

Data Lineage Tracing Library

* License: [MIT](https://github.com/data-dev/DataTracer/blob/master/LICENSE)
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
* Homepage: https://github.com/data-dev/DataTracer

## Overview

DataTracer is a Python library for solving Data Lineage problems using statistical methods, machine learning techniques, and hand-crafted heuristics.

Currently the DataTracer library implements discovery of the following properties:

* **Primary Key**: Identify which column is the primary key in each table.
* **Foreign Key**: Find which relationships exist between the tables.
* **Column Mapping**: Given a field in a table, deduce which other fields, from the same table or other tables, are most related to it or contributed the most to generating it.

### REST API

The DataTracer library also incorporates a REST API that enables interaction with the DataTracer Solvers via HTTP communication.
You can check it [here](rest).

# Install

## Requirements

**DataTracer** has been developed and tested on [Python 3.5, 3.6 and 3.7](https://www.python.org/downloads/).

Also, although it is not strictly required, the usage of a [virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid interfering with other software installed in the system where **DataTracer** is run.

## Install with pip

The easiest and recommended way to install **DataTracer** is using [pip](https://pip.pypa.io/en/stable/):

```bash
pip install datatracer
```

This will pull and install the latest stable release from [PyPI](https://pypi.org/).

If you want to install from source or contribute to the project please read the [Contributing Guide](https://hdi-project.github.io/DataTracer/contributing.html#get-started).

# Data Format: Datasets and Metadata

The DataTracer library is prepared to work using datasets, which are a collection of tables loaded as `pandas.DataFrame` objects and a MetaData JSON which provides information about the dataset structure.

You can find more information about the MetaData format in the [MetaData repository](https://github.com/signals-dev/MetaData).

The DataTracer library also includes a few [demo datasets](datatracer/datasets) which you can easily download to your computer using the `datatracer.get_demo_data` function:

```python3
from datatracer import get_demo_data

get_demo_data()
```

This will create a folder called `datatracer_demo` in your working directory with a few datasets ready to use inside it.

# Quickstart

In this short tutorial we will guide you through a series of steps that will help you get started with **DataTracer**.

## Load data

The first step will be to load the data in the format expected by DataTracer.
For this, we can use the `datatracer.load_dataset` function, passing the path to the dataset folder.
For example, if we want to use the `classicmodels` dataset included in the demo folder that we just created we can load it using:

```python3
from datatracer import load_dataset

metadata, tables = load_dataset('datatracer_demo/classicmodels')
```

This will return a tuple which contains:

* A `MetaData` instance with details about the dataset.
* A `dict` with all the tables of the dataset loaded as a `pandas.DataFrame`.

## Select a Solver

In the DataTracer project, the different Data Lineage problems are solved using what we call _solvers_.

We can see the list of available solvers using the `get_solvers` function:

```python3
from datatracer import get_solvers

get_solvers()
```

which will return a list with their names:

```
['datatracer.column_map',
 'datatracer.foreign_key.basic',
 'datatracer.foreign_key.standard',
 'datatracer.primary_key.basic']
```

## Use a DataTracer instance to find table relationships

In order to use the selected solver you will need to load it using the `DataTracer` class.

In this example, we will try to figure out the relationships between the tables in our dataset by using the solver `datatracer.foreign_key.standard`.
```python3
from datatracer import DataTracer

# Load the Solver
solver = DataTracer.load('datatracer.foreign_key.standard')

# Solve the Data Lineage problem
foreign_keys = solver.solve(tables)
```

The result will be a list of dictionaries, one per foreign key candidate:

```
[{'table': 'products',
  'field': 'productLine',
  'ref_table': 'productlines',
  'ref_field': 'productLine'},
 {'table': 'payments',
  'field': 'customerNumber',
  'ref_table': 'customers',
  'ref_field': 'customerNumber'},
 {'table': 'orders',
  'field': 'customerNumber',
  'ref_table': 'customers',
  'ref_field': 'customerNumber'},
 {'table': 'orderdetails',
  'field': 'productCode',
  'ref_table': 'products',
  'ref_field': 'productCode'},
 {'table': 'orderdetails',
  'field': 'orderNumber',
  'ref_table': 'orders',
  'ref_field': 'orderNumber'},
 {'table': 'employees',
  'field': 'officeCode',
  'ref_table': 'offices',
  'ref_field': 'officeCode'}]
```

# What's next?

You can learn more about the DataTracer features in the [notebook tutorials](tutorials).

Also don't forget to have a look at the DataTracer [REST API](rest).

# History

## 0.0.6 - 2020-06-19

* Add `update_metadata` primitives and pipelines.
* Upgrade to MetaData v0.0.2

## 0.0.5 - 2020-06-12

* Add new `update_metadata` endpoint to the REST API.
* New demo dataset and new tutorial.

## 0.0.4 - 2020-06-05

* Add initial version of pretrained solvers
* Reorganize ColumnMapSolver code tree
* Add REST API to access DataTracer solvers via HTTP

## 0.0.3 - 2020-05-28

* Finish Column Mapping and add tutorial
* Minor refactoring and adding docstrings
* Fix testing config

## 0.0.2 - 2020-05-26

* Curate configuration and dependencies

## 0.0.1 - 2020-05-22

First release. Features:

* Primary Key Detection
* Foreign Key Detection
* Column Mapping
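
Returning to the Quickstart result: each foreign key candidate is a plain dictionary with the keys `table`, `field`, `ref_table` and `ref_field`, so the result can be post-processed with ordinary Python. The sketch below groups the candidates by the table they reference. It needs no DataTracer installation because it reuses the sample output shown in the Quickstart; `group_by_ref_table` is a hypothetical helper written for this illustration and is not part of the DataTracer API.

```python
from collections import defaultdict

# Foreign key candidates copied from the Quickstart output.
foreign_keys = [
    {'table': 'products', 'field': 'productLine',
     'ref_table': 'productlines', 'ref_field': 'productLine'},
    {'table': 'payments', 'field': 'customerNumber',
     'ref_table': 'customers', 'ref_field': 'customerNumber'},
    {'table': 'orders', 'field': 'customerNumber',
     'ref_table': 'customers', 'ref_field': 'customerNumber'},
    {'table': 'orderdetails', 'field': 'productCode',
     'ref_table': 'products', 'ref_field': 'productCode'},
    {'table': 'orderdetails', 'field': 'orderNumber',
     'ref_table': 'orders', 'ref_field': 'orderNumber'},
    {'table': 'employees', 'field': 'officeCode',
     'ref_table': 'offices', 'ref_field': 'officeCode'},
]


def group_by_ref_table(candidates):
    """Map each referenced table to the (table, field) pairs pointing at it.

    Hypothetical helper for illustration only; not a DataTracer function.
    """
    grouped = defaultdict(list)
    for fk in candidates:
        grouped[fk['ref_table']].append((fk['table'], fk['field']))
    return dict(grouped)


referenced = group_by_ref_table(foreign_keys)
for ref_table, children in sorted(referenced.items()):
    print(ref_table, '<-', children)
```

Grouping by `ref_table` makes it easy to see which tables act as parents in the schema; in this sample, `customers` is referenced by both `payments` and `orders`.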


Requirements

| Name | Version |
| --- | --- |
| pandas | <0.25,>=0.23.4 |
| scikit-learn | <0.21,>=0.20.0 |
| numpy | <1.17,>=1.15.2 |
| mlblocks | ==0.3.4 |
| metad | ==0.0.2 |
| falcon | <3,>=2.0.0 |
| hug | <3,>=2.6.1 |
| pyyaml | <6,>=5.3.1 |
| tqdm | <5,>=4.46.1 |
| bumpversion | <0.6,>=0.5.3 |
| pip | >=9.0.1 |
| watchdog | <0.11,>=0.8.3 |
| m2r | <0.3,>=0.2.0 |
| nbsphinx | <0.7,>=0.5.0 |
| Sphinx | <3,>=1.7.1 |
| sphinx-rtd-theme | <0.5,>=0.2.4 |
| autodocsumm | >=0.1.10 |
| flake8 | <4,>=3.7.7 |
| isort | <5,>=4.3.4 |
| autoflake | <2,>=1.1 |
| autopep8 | <2,>=1.4.3 |
| twine | <4,>=1.10.0 |
| wheel | >=0.30.0 |
| coverage | <6,>=4.5.1 |
| tox | <4,>=2.9.1 |
| pytest | >=3.4.2 |
| pytest-cov | >=2.6.0 |
| jupyter | <2,>=1.0.0 |
| rundoc | <0.5,>=0.4.3 |


Required language

| Name | Version |
| --- | --- |
| Python | >=3.5,<3.8 |
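
The declared range `>=3.5,<3.8` means this release targets Python 3.5, 3.6 and 3.7 only. As a minimal sketch, the check below tests whether an interpreter version falls inside that range before you attempt an install; `python_supported` is a hypothetical helper name used for illustration, not something shipped with the package.

```python
import sys


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is within >=3.5,<3.8.

    Hypothetical helper for illustration; the bounds mirror the range
    declared for datatracer-0.0.6.dev0.
    """
    return (3, 5) <= tuple(version_info[:2]) < (3, 8)


if not python_supported():
    print('This interpreter is outside the declared range >=3.5,<3.8')
```

On a newer interpreter, creating a virtual environment with one of the supported Python versions is the usual way to satisfy the constraint.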


How to install


Install the whl package datatracer-0.0.6.dev0:

    pip install datatracer-0.0.6.dev0.whl


Install the tar.gz package datatracer-0.0.6.dev0:

    pip install datatracer-0.0.6.dev0.tar.gz