

databricks-pybuilder-plugin-0.0.7



Description

A PyBuilder plugin providing tasks for deploying assets to a Databricks environment.
| Attribute | Value |
| --------- | ----- |
| Operating system | - |
| Filename | databricks-pybuilder-plugin-0.0.7 |
| Name | databricks-pybuilder-plugin |
| Version | 0.0.7 |
| Maintainer | [] |
| Maintainer email | [] |
| Author | Mikhail Kavaliou |
| Author email | killswitch@tut.by |
| Homepage | - |
| URL | https://pypi.org/project/databricks-pybuilder-plugin/ |
| License | - |
#### databricks-pybuilder-plugin

The plugin is used to deploy assets to a Databricks environment. It is activated with the following line in your `build.py`:

>use_plugin('pypi:databricks_pybuilder_plugin')

It provides a set of tasks for uploading resources and workspaces, deploying jobs, and installing locally built egg dependencies into a Databricks cluster.
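For orientation, a minimal `build.py` using the plugin might look like the sketch below. The project name, version, and workspace path are hypothetical placeholders, not values prescribed by the plugin:

    # build.py -- a minimal sketch; names and paths are illustrative placeholders
    from pybuilder.core import use_plugin, init

    use_plugin('python.core')
    use_plugin('python.distutils')  # builds the egg archive that install_library uploads
    use_plugin('pypi:databricks_pybuilder_plugin')

    name = 'my_databricks_app'      # hypothetical project name
    version = '0.1.0'

    @init
    def initialize(project):
        # Plugin properties are set here; see the "All properties list" section below.
        project.set_property('remote_workspace_path', '/team_folder/my_databricks_app/')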
#### Deployment to Databricks

Automated deployment is implemented on top of PyBuilder tasks. The list of available tasks:

1. **export_workspace** - exports a workspace. A workspace is a folder in Databricks holding a notebook or a set of notebooks. The task uploads the content of `src/main/scripts/` into a Databricks workspace; it overwrites files of the same names and leaves other files as is. By default, the current git branch name, if available, is used as a nested folder of the workspace when uploading the content; otherwise the `default` folder is used. Use the `branch` parameter to export the workspace into a folder of your own:

   >pyb export_workspace -P branch={custom_directory}

   The final output path would then be `/team_folder/application_name/{custom_directory}`. Executing the command from a master branch

   >pyb export_workspace

   would upload the workspace files into `/team_folder/application_name/master`. Here is the list of related deployment settings:

   | Property | Value | Description |
   | -------- | ----- | ----------- |
   | project_workspace_path | `src/main/scripts/` | The path to a folder in the project tree holding notebooks. |
   | remote_workspace_path | `/team_folder/application_name/` | The Databricks folder the notebooks are uploaded into from `project_workspace_path`. |

   All of the properties can be overridden with a `-P` parameter. Usage example:

   >pyb export_workspace [-P env={env}] [-P branch={branch}]

   Environment-specific properties are disabled by default.

2. **export_resources** - exports resources into DBFS. Uploads resource files into DBFS, if any; existing files are overwritten. Here is the list of related deployment settings:

   | Property | Value | Description |
   | -------- | ----- | ----------- |
   | project_resources_path | `src/main/resources/` | The path to the project resources. |

   All of the properties can be overridden with a `-P` parameter. Usage example:

   >pyb export_resources [-P env={env}] [-P branch={branch}]

3. **install_library** - deploys an egg archive to a Databricks cluster. Uploads the egg archive to DBFS and re-attaches the library to a cluster by name. Installing a new library version starts the cluster so that old library versions are uninstalled and the new one is installed. Repeated installation of the same library version does not start the cluster; the library is simply re-attached, and other installed libraries are not affected. Here is the list of related deployment settings:

   | Property | Value | Description |
   | -------- | ----- | ----------- |
   | remote_cluster_name | `Test_cluster_name` | The name of the remote Databricks cluster the library is installed to. |
   | dbfs_library_path | `dbfs:/FileStore/jars` | The DBFS path to a folder holding the egg archives. |

   All of the properties can be overridden with a `-P` parameter. Usage example:

   >pyb install_library

4. **deploy_to_cluster** - a full deployment to a cluster. Runs `export_resources`, `export_workspace`, and `install_library` in a row. Usage example:

   >pyb deploy_to_cluster

5. **deploy_job** - deploys a job to Databricks by name. Please make sure that the job has already been created on the Databricks side. Executes the `export_resources` and `export_workspace` tasks first, then updates the existing job using a job definition file. The definition file supports the Jinja2 template syntax; please check the documentation for details: https://jinja.palletsprojects.com/en/2.11.x/. A sketch of such a definition file is shown after the walkthrough below. Here is the list of related deployment settings:

   | Property | Value | Description |
   | -------- | ----- | ----------- |
   | job_definition_path | `src/main/databricks/databricks_job_settings.json` | The project path to a job definition file. |

   All of the properties can be overridden with a `-P` parameter. Usage example:

   >pyb deploy_job [-P env={env}] [-P branch={branch}]

#### To Run a notebook with a custom dependency

1. Build the egg archive with the `pyb` command.
2. Deploy all the assets using the command `pyb deploy_to_cluster`.
3. Go to the target folder in the Databricks workspace.
4. Attach the notebook to a cluster and run the script.
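As referenced in the `deploy_job` task above, here is a minimal sketch of what a Jinja-templated job definition file (the file pointed to by `job_definition_path`) might look like. The field set follows the Databricks Jobs API 2.0 document linked in the properties table below; the job name, cluster id, egg path, and notebook name are hypothetical placeholders, while `env`, `branch`, and `remote_workspace_path` are the template variables the plugin exposes by default:

    {# Hypothetical example -- every id, path, and name below is a placeholder. #}
    {
        "name": "my_databricks_app_{{ env }}",
        "existing_cluster_id": "0000-000000-abcd123",
        "libraries": [
            {"egg": "dbfs:/FileStore/jars/my_databricks_app-0.1.0-py3.7.egg"}
        ],
        "notebook_task": {
            "notebook_path": "{{ remote_workspace_path }}{{ branch }}/main_notebook"
        }
    }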
#### All properties list

| Property | Default Value | Description |
| -------- | ------------- | ----------- |
| databricks_credentials | `{`<br/>`'dev': {'host': '', 'token': ''},`<br/>`'qa': {'host': '', 'token': ''},`<br/>`'prod': {'host': '', 'token': ''}`<br/>`}` | Please specify credentials in the dictionary format: host and token. |
| default_environment | `dev` | There are 3 supported environments: `dev`, `qa`, and `prod`. |
| project_workspace_path | `src/main/scripts/` | The directory content is uploaded into a Databricks workspace.<br/>These are considered to be notebook scripts. |
| remote_workspace_path | | The Databricks workspace that files in `project_workspace_path` are copied to. |
| include_git_branch_into_output_workspace_path | `True` | Enables adding an extra directory with the `branch` name to the `remote_workspace_path`. Requires git to be installed. |
| enable_env_sensitive_workspace_properties | `False` | Enables environment properties chosen by the `env_config_workspace_path`. |
| env_config_workspace_path | `environment-settings/{env}.py` | The path to the property file to be chosen as the environment properties. By default, the `env` included in the file name is used to pick properties. |
| env_config_name | `env` | The expected environment properties file name. The `env_config_workspace_path` file is copied to the Databricks workspace under this name. |
| with_dbfs_resources | `False` | Enables uploading resource files from the `project_resources_path` directory to the Databricks DBFS path `dbfs_resources_path`. |
| project_resources_path | `src/main/resources/` | The local directory holding resource files to be copied (txt, csv, etc.). |
| dbfs_resources_path | | The output DBFS directory on the Databricks environment holding resources. |
| dbfs_library_path | `dbfs:/FileStore/jars` | The output DBFS directory on the Databricks environment holding a built dependency (egg archive). |
| attachable_lib_envs | `['dev']` | The list of environments that require the dependency attached to a Databricks cluster. The dependency must be uploaded to the `dbfs_library_path` beforehand. |
| cluster_init_timeout | `5 * 60` | The timeout for waiting on a Databricks cluster while it changes its state (initializing, restarting, etc.). |
| remote_cluster_name | | The name of the Databricks cluster the dependency is attached to. |
| job_definition_path | `src/main/databricks/job_settings.json` | The path to a Databricks job configuration in JSON format (https://docs.databricks.com/dev-tools/api/2.0/jobs.html). It supports Jinja template syntax in order to set up environment-sensitive properties. It also supports multiple job definitions; use a JSON array for that. The properties available by default are: env, branch, remote_workspace_path. |
| deploy_single_job | | The name of a job to be deployed. If your Databricks job config contains multiple definitions, you can deploy just one of them by specifying the name of that particular job. |
| extra_rendering_args | | Custom properties to be populated in the job definition file. Use a dictionary as an argument, for example: `{'app_name': name}`. |
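Putting it together, an initializer in `build.py` might set these properties as sketched below; every host, token, path, and cluster name is an illustrative placeholder rather than a value mandated by the plugin:

    from pybuilder.core import init

    @init
    def configure_databricks(project):
        # All values below are hypothetical placeholders.
        project.set_property('databricks_credentials', {
            'dev':  {'host': 'https://dev-workspace.cloud.databricks.com',  'token': '<dev-token>'},
            'qa':   {'host': 'https://qa-workspace.cloud.databricks.com',   'token': '<qa-token>'},
            'prod': {'host': 'https://prod-workspace.cloud.databricks.com', 'token': '<prod-token>'},
        })
        project.set_property('default_environment', 'dev')
        project.set_property('remote_workspace_path', '/team_folder/my_databricks_app/')
        project.set_property('remote_cluster_name', 'Test_cluster_name')
        project.set_property('with_dbfs_resources', True)
        project.set_property('dbfs_resources_path', 'dbfs:/FileStore/my_databricks_app/resources')
        project.set_property('job_definition_path', 'src/main/databricks/databricks_job_settings.json')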


Requirements

| Name | Version |
| ---- | ------- |
| Jinja2 | - |
| databricks-cli | - |


Installation


Install the whl package databricks-pybuilder-plugin-0.0.7:

    pip install databricks-pybuilder-plugin-0.0.7.whl


Install the tar.gz package databricks-pybuilder-plugin-0.0.7:

    pip install databricks-pybuilder-plugin-0.0.7.tar.gz
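
Alternatively, the package can be installed straight from PyPI (see the URL above) with a pinned version:

    pip install databricks-pybuilder-plugin==0.0.7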