Description

The data product processor (dpp) is a library for dynamically creating and executing Apache Spark Jobs based on a declarative description of a data product.

Property	Value
Operating system	-
File name	data-product-processor-1.0.4
Name	data-product-processor
Library version	1.0.4
Maintainer	[]
Maintainer email	[]
Author	Amazon Web Services
Author email	-
Home page	https://github.com/aws-samples/dpac-data-product-processor
URL	https://pypi.org/project/data-product-processor/
License	Apache License 2.0

# data product processor

The data product processor is a library for dynamically creating and executing Apache Spark Jobs based on a declarative description of a data product.

The declaration is based on YAML and covers input and output data stores as well as data structures. It can be augmented with custom, PySpark-based transformation logic.

## Installation

**Prerequisites**

- Python 3.x
- Apache Spark 3.x

**Install with pip**

```commandline
pip install data-product-processor
```

## Getting started

### Declare a basic data product

Please see [Data product specification](docs/data-product-specification.md) for an overview of the files required to declare a data product.

### Process the data product

From the folder in which the previously created files are stored, run the data-product-processor as follows:

```commandline
data-product-processor \
  --default_data_lake_bucket some-datalake-bucket \
  --aws_profile some-profile \
  --aws_region eu-central-1 \
  --local
```

This command runs Apache Spark locally (due to the `--local` switch) and stores the output in an S3 bucket (authenticated with the AWS profile passed in the `--aws_profile` parameter).

If you want to run the library from a different folder than the data product declaration, reference the latter through the additional argument `--product_path`.
```commandline
data-product-processor \
  --product_path ../path-to-some-data-product \
  --default_data_lake_bucket some-datalake-bucket \
  --aws_profile some-profile \
  --aws_region eu-central-1 \
  --local
```

## CLI Arguments

```commandline
data-product-processor --help

  --JOB_ID - the unique id of this Glue/EMR job
  --JOB_RUN_ID - the unique id of this Glue job run
  --JOB_NAME - the name of this Glue job
  --job-bookmark-option - job-bookmark-disable if you don't want bookmarking
  --TempDir - temporary results directory
  --product_path - the data product definition folder
  --aws_profile - the AWS profile to be used for connection
  --aws_region - the AWS region to be used
  --local - local development
  --jars - extra jars to be added to the Spark context
  --additional-python-modules - this parameter is injected by Glue, currently it is not in use
  --default_data_lake_bucket - a default bucket location (with s3a:// prefix)
```

## References

- [Data product specification](docs/data-product-specification.md)
- [Access management](docs/access-management.md)

## Tutorials

- [How to write and test custom transformation logic?](docs/how-to/transformation-logic.md)
- [How to reference custom Spark dependencies?](docs/how-to/custom-dependencies.md)
- [How to set up local development?](docs/how-to/local-development.md)
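When the same data product is processed repeatedly, the CLI arguments listed under CLI Arguments can be assembled by a small wrapper. This is a minimal sketch assuming bash; the function name `run_dpp` and the environment variables `DPP_BUCKET`, `DPP_PRODUCT_PATH`, `DPP_PROFILE`, `DPP_REGION`, `DPP_LOCAL` and `DRY_RUN` are hypothetical and not part of data-product-processor. The wrapper only passes flags that appear in the documented argument list.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run_dpp and the DPP_* / DRY_RUN variables are
# illustrative names, not options provided by data-product-processor.
run_dpp() {
  if [[ -z "${DPP_BUCKET:-}" ]]; then
    echo "run_dpp: DPP_BUCKET must be set" >&2
    return 1
  fi

  local args=()
  # Reference the declaration folder only when it is not the current folder.
  if [[ -n "${DPP_PRODUCT_PATH:-}" ]]; then
    args+=(--product_path "$DPP_PRODUCT_PATH")
  fi
  args+=(--default_data_lake_bucket "$DPP_BUCKET")
  # Profile and region are passed through only when they are set.
  if [[ -n "${DPP_PROFILE:-}" ]]; then
    args+=(--aws_profile "$DPP_PROFILE")
  fi
  if [[ -n "${DPP_REGION:-}" ]]; then
    args+=(--aws_region "$DPP_REGION")
  fi
  # Run Spark locally unless DPP_LOCAL=0.
  if [[ "${DPP_LOCAL:-1}" != "0" ]]; then
    args+=(--local)
  fi

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the assembled command instead of executing it.
    echo data-product-processor "${args[@]}"
  else
    data-product-processor "${args[@]}"
  fi
}
```

For example, `DPP_BUCKET=some-datalake-bucket DPP_PROFILE=some-profile DPP_REGION=eu-central-1 DRY_RUN=1 run_dpp` prints the same command as the first example under Process the data product, on a single line. Building the arguments in a bash array keeps values containing spaces intact when they are passed on.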

Requirements

Package	Version constraint
boto3	==1.18.34
botocore	-
wheel	==0.38.1
pyyaml	==5.4.1
pydantic	-
quinn	-
boto3-stubs[glue]	==1.18.34
mypy-boto3-glue	==1.18.34
jsonschema	==3.0.2
pyspark	==3.2.0
numpy	>=1.19.5

Installation

Install the data-product-processor-1.0.4 whl package:

pip install data-product-processor-1.0.4.whl

Install the data-product-processor-1.0.4 tar.gz package:

pip install data-product-processor-1.0.4.tar.gz
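After installing from a local whl or tar.gz file, it can help to confirm which version pip actually registered before running the processor. This is a minimal sketch assuming bash and a `python3 -m pip` that targets the same environment; the helper names `pip_show_version` and `check_dpp_version` are hypothetical. It reads the `Version:` field printed by `pip show`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the installed data-product-processor version.

# Print the value of the "Version:" line from `pip show` output read on stdin.
pip_show_version() {
  awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Return 0 if the installed data-product-processor matches the expected version.
check_dpp_version() {
  local expected="$1" installed
  installed=$(python3 -m pip show data-product-processor 2>/dev/null | pip_show_version)
  if [[ "$installed" != "$expected" ]]; then
    echo "expected data-product-processor $expected, found '${installed:-nothing}'" >&2
    return 1
  fi
}
```

A typical call is `check_dpp_version 1.0.4 && data-product-processor --help`, which only prints the CLI help when version 1.0.4 is the one installed.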