# DataRobot Provider for Apache Airflow
This package provides operators, sensors, and a hook that integrates [DataRobot](https://www.datarobot.com) into Apache Airflow.
Using these components, you can build the essential DataRobot pipeline: create a project, train models, deploy a model, and
score predictions against the model deployment.
## Installation
Prerequisites:
- [Apache Airflow](https://pypi.org/project/apache-airflow/)
- [DataRobot Python API client](https://pypi.org/project/datarobot/)
Install the DataRobot provider:
```
pip install airflow-provider-datarobot
```
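
You can verify that Airflow has registered the provider:

```
airflow providers list | grep datarobot
```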
## Connection
In the Airflow user interface, create a new DataRobot connection in `Admin > Connections`:
* Connection Type: `DataRobot`
* Connection Id: `datarobot_default` (default)
* API Key: `your-datarobot-api-key`
* DataRobot endpoint URL: `https://app.datarobot.com/api/v2` (default)
Create the API Key in the [DataRobot Developer Tools](https://app.datarobot.com/account/developer-tools) page, `API Keys` section (see [DataRobot Docs](https://app.datarobot.com/docs/api/api-quickstart/api-qs.html#create-a-datarobot-api-token) for more details).
By default, all components use the `datarobot_default` connection ID.
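
Alternatively, the connection can be created from the Airflow CLI. The field mapping below (API key in the password field, endpoint URL in the host field, `datarobot` as the connection type ID) is an assumption about how the hook reads the connection; prefer the UI flow above if unsure:

```
airflow connections add datarobot_default \
    --conn-type datarobot \
    --conn-host "https://app.datarobot.com/api/v2" \
    --conn-password "<your-datarobot-api-key>"
```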
## Config JSON for DAG run
Operators and sensors use parameters from the [config](https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html?highlight=config#Named%20Arguments_repeat21)
that must be submitted when triggering the DAG. Example config JSON with the required parameters:
```
{
    "training_data": "s3-presigned-url-or-local-path-to-training-data",
    "project_name": "Project created from Airflow",
    "autopilot_settings": {
        "target": "readmitted"
    },
    "deployment_label": "Deployment created from Airflow",
    "score_settings": {
        "intake_settings": {
            "type": "s3",
            "url": "s3://path/to/scoring-data/Diabetes10k.csv",
            "credential_id": "62160b511fb29da8dd5f2c81"
        },
        "output_settings": {
            "type": "s3",
            "url": "s3://path/to/results-dir/Diabetes10k_predictions.csv",
            "credential_id": "62160b511fb29da8dd5f2c81"
        }
    }
}
```
These config values can be accessed in the `execute()` method of any operator in the DAG
via the `context["params"]` variable, e.g. to get the training data you would use the following in the operator:
```
def execute(self, context: Dict[str, Any]) -> str:
    ...
    training_data = context["params"]["training_data"]
    ...
```
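
For example, to pass this config when triggering a run from the CLI (`datarobot_pipeline` is a placeholder DAG ID):

```
airflow dags trigger datarobot_pipeline \
    --conf '{"training_data": "s3-presigned-url", "project_name": "Project created from Airflow", "autopilot_settings": {"target": "readmitted"}}'
```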
## Modules
### [Operators](https://github.com/datarobot/airflow-provider-datarobot/blob/main/datarobot_provider/operators/datarobot.py)
- `CreateProjectOperator`

    Creates a DataRobot project and returns its ID.

    Required config params:

        training_data: str - pre-signed S3 URL or local path to the training dataset
        project_name: str - project name

    In case of an S3 input, the `training_data` value must be a [pre-signed AWS S3 URL](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html).

    For more [project settings](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#project) see the DataRobot docs.

    Returns a project ID.
- `TrainModelsOperator`

    Triggers DataRobot Autopilot to train models.

    Parameters:

        project_id: str - DataRobot project ID

    Required config params:

        "autopilot_settings": {
            "target": "readmitted"
        }

    `target` is required and specifies the name of the column that defines the modeling target.

    For more [autopilot settings](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#datarobot.models.Project.set_target) see the DataRobot docs.

    Returns `None`.
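
    For illustration, a minimal sketch of wiring these two operators together, assuming `project_id` is a templated field so the ID returned by `CreateProjectOperator` can be pulled from XCom:

        from datarobot_provider.operators.datarobot import (
            CreateProjectOperator,
            TrainModelsOperator,
        )

        create_project_op = CreateProjectOperator(task_id="create_project")

        # Pull the project ID returned by the previous task from XCom.
        train_models_op = TrainModelsOperator(
            task_id="train_models",
            project_id="{{ ti.xcom_pull(task_ids='create_project') }}",
        )

        create_project_op >> train_models_op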
- `DeployModelOperator`

    Deploys a specified model.

    Parameters:

        model_id: str - DataRobot model ID

    Required config params:

        deployment_label - deployment label/name

    For more [deployment settings](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#datarobot.models.Deployment) see the DataRobot docs.

    Returns a deployment ID.
- `DeployRecommendedModelOperator`

    Deploys a recommended model.

    Parameters:

        project_id: str - DataRobot project ID

    Required config params:

        deployment_label: str - deployment label

    For more [deployment settings](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#datarobot.models.Deployment) see the DataRobot docs.

    Returns a deployment ID.
- `ScorePredictionsOperator`

    Scores batch predictions against the deployment.

    Prerequisites:
    - [S3 credentials added to DataRobot via the Python API client](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/reference/admin/credentials.html#s3-credentials).
      You need the `creds.credential_id` for the `credential_id` parameter in the config (see the sketch at the end of this section).
    - OR a dataset ID from the AI Catalog

    Parameters:

        deployment_id: str - DataRobot deployment ID

    Required config params:

        "score_settings": {
            "intake_settings": {
                "type": "s3",
                "url": "s3://my-bucket/Diabetes10k.csv",
                "credential_id": "62160b511fb29da8dd5f2c81"
            },
            "output_settings": {
                "type": "s3",
                "url": "s3://my-bucket/Diabetes10k_predictions.csv",
                "credential_id": "62160b511fb29da8dd5f2c81"
            }
        }

    Config params for scoring a dataset in the AI Catalog:

        "score_settings": {
            "intake_settings": {
                "type": "dataset",
                "dataset_id": "<datasetId>"
            },
            "output_settings": {
                ...
            }
        }

    For more [batch prediction settings](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#datarobot.models.BatchPredictionJob.score) see the DataRobot docs.

    Returns a batch prediction job ID.
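
    The prerequisite S3 credentials can be registered once with the DataRobot Python client. A sketch based on the credentials docs linked above (the credential name is arbitrary):

        import datarobot as dr

        dr.Client(token="<your-datarobot-api-key>", endpoint="https://app.datarobot.com/api/v2")

        # Store AWS credentials in DataRobot and print the ID to use
        # as "credential_id" in score_settings.
        creds = dr.Credential.create_s3(
            name="airflow-s3-credentials",
            aws_access_key_id="<aws-access-key-id>",
            aws_secret_access_key="<aws-secret-access-key>",
        )
        print(creds.credential_id)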
- `GetTargetDriftOperator`

    Gets the target drift from a deployment.

    Parameters:

        deployment_id: str - DataRobot deployment ID

    No config params are required. [Optional params](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#datarobot.models.Deployment.get_target_drift) may be passed in the config as follows:

        "target_drift": {
            ...
        }

    Returns a dict with the target drift data.
- `GetFeatureDriftOperator`

    Gets the feature drift from a deployment.

    Parameters:

        deployment_id: str - DataRobot deployment ID

    No config params are required. [Optional params](https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#datarobot.models.Deployment.get_feature_drift) may be passed in the config as follows:

        "feature_drift": {
            ...
        }

    Returns a dict with the feature drift data.
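
Because the drift operators return plain dicts, downstream tasks can consume the result from XCom. A hypothetical follow-up task using the TaskFlow API:

```
from airflow.decorators import task

@task
def check_drift(drift: dict) -> None:
    # Inspect the payload returned by GetTargetDriftOperator or
    # GetFeatureDriftOperator via XCom.
    print(drift)

# Wire it up with the operator's XCom output, e.g.:
# check_drift(get_target_drift_op.output)
```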
### [Sensors](https://github.com/datarobot/airflow-provider-datarobot/blob/main/datarobot_provider/sensors/datarobot.py)
- `AutopilotCompleteSensor`

    Checks whether Autopilot has completed.

    Parameters:

        project_id: str - DataRobot project ID

- `ScoringCompleteSensor`

    Checks whether batch scoring has completed.

    Parameters:

        job_id: str - batch prediction job ID
### [Hooks](https://github.com/datarobot/airflow-provider-datarobot/blob/main/datarobot_provider/hooks/datarobot.py)
- `DataRobotHook`

    A hook for initializing the DataRobot Public API client.
## Pipeline
The modules described above allow you to construct a standard DataRobot pipeline in an Airflow DAG:

```
create_project_op >> train_models_op >> autopilot_complete_sensor >> deploy_model_op >> score_predictions_op >> scoring_complete_sensor
```
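
Put together, a complete DAG might look like the sketch below. The import paths follow the module links above; the templated XCom pulls are illustrative and assume the ID parameters are templated fields:

```
from datetime import datetime

from airflow.models import DAG

from datarobot_provider.operators.datarobot import (
    CreateProjectOperator,
    TrainModelsOperator,
    DeployRecommendedModelOperator,
    ScorePredictionsOperator,
)
from datarobot_provider.sensors.datarobot import (
    AutopilotCompleteSensor,
    ScoringCompleteSensor,
)

with DAG(
    dag_id="datarobot_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # triggered manually with a config JSON
) as dag:
    create_project_op = CreateProjectOperator(task_id="create_project")
    project_id = "{{ ti.xcom_pull(task_ids='create_project') }}"

    train_models_op = TrainModelsOperator(
        task_id="train_models", project_id=project_id
    )
    autopilot_complete_sensor = AutopilotCompleteSensor(
        task_id="autopilot_complete", project_id=project_id
    )
    deploy_model_op = DeployRecommendedModelOperator(
        task_id="deploy_model", project_id=project_id
    )
    score_predictions_op = ScorePredictionsOperator(
        task_id="score_predictions",
        deployment_id="{{ ti.xcom_pull(task_ids='deploy_model') }}",
    )
    scoring_complete_sensor = ScoringCompleteSensor(
        task_id="scoring_complete",
        job_id="{{ ti.xcom_pull(task_ids='score_predictions') }}",
    )

    (
        create_project_op
        >> train_models_op
        >> autopilot_complete_sensor
        >> deploy_model_op
        >> score_predictions_op
        >> scoring_complete_sensor
    )
```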
## Examples
See the [**examples**](https://github.com/datarobot/airflow-provider-datarobot/blob/main/datarobot_provider/example_dags) directory for the example DAGs.
## Issues
Please submit [issues](https://github.com/datarobot/airflow-provider-datarobot/issues) and [pull requests](https://github.com/datarobot/airflow-provider-datarobot/pulls) in our official repo:
[https://github.com/datarobot/airflow-provider-datarobot](https://github.com/datarobot/airflow-provider-datarobot)
We are happy to hear from you. Please email any feedback to the authors at [support@datarobot.com](mailto:support@datarobot.com).
# Copyright Notice
Copyright 2022 DataRobot, Inc. and its affiliates.
All rights reserved.
This is proprietary source code of DataRobot, Inc. and its affiliates.
Released under the terms of DataRobot Tool and Utility Agreement.