# datasentics-lab
`dslab` is a fully open-source package that simplifies everyday tasks for data scientists who rely on Databricks.
It contains experimental code primarily developed and maintained by DataSentics.
All contributions and contributors are very welcome!
## Installation
PySpark is the only dependency that needs to be preinstalled.
The package is available on PyPI:
```
pip install datasentics-lab
```
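To verify the install, a minimal check might look like the sketch below; it assumes you run it in a Databricks notebook where a `spark` session is already defined:
```python
# Minimal post-install check (sketch): assumes a Databricks notebook
# where the `spark` session is already available.
from dslab.dbpath import DBPath

DBPath.set_spark_session(spark)             # wires DBPath up to dbutils
print(DBPath('dbfs:/FileStore/').exists())  # should print True on a standard workspace
```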
## Utilities
### DBPath
`DBPath` is a quality-of-life utility that simplifies and unifies file handling in Databricks.
Its design and API are inspired by `pathlib.Path`.
#### Showcase
```python
from dslab.dbpath import DBPath
DBPath.set_spark_session(spark) # used to initialize dbutils instance
path = DBPath('dbfs:/FileStore/')
path.ls() # lists files in directory in human-readable format
path.tree(max_depth=2) # prints indented directory tree
file = path / 'tmp' / 'my_file'  # the / operator joins path parts, like pathlib.Path
with file.open('wt') as f:
    f.write('It really is this simple!')
print(file.read_text())
file.write_text('And this is even easier!')
print(file.read_text())
print(f'{file} exists: {file.exists()}, is dir: {file.is_dir()}, is in filestore: {file.in_filestore}')
```
And that is just a taste! See the full list of features below.
#### Features
```python
from dslab.dbpath import DBPath
help(DBPath)
```
```
A utility class for working with Databricks API paths directly and in a unified manner.
The design is inspired by pathlib.Path.
>>> path = DBPath('abfss://...')
>>> path = DBPath('dbfs:/...')
>>> path = DBPath('file:/...')
>>> path = DBPath('s3:/...')
>>> path = DBPath('s3a:/...')
>>> path = DBPath('s3n:/...')
INITIALIZATION:
>>> from dslab import DBPath
Provide the Spark session used to initialize the dbutils instance
>>> DBPath.set_spark_session(spark)
Set the FileStore base download URL for your Databricks workspace
>>> DBPath.set_base_download_url('https://adb-1234.5.azuredatabricks.net/files/')
PROPERTIES:
path - the whole path
name - just the filename (last part of path)
parent - the parent (DBPath)
children - sorted list of children files (list(DBPath)), empty list for non-folders
in_local, in_dbfs, in_filestore, in_lake, in_bucket - predicates for location of file
BASE METHODS:
exists() - returns True if file exists
is_dir() - returns True if file exists and is a directory
ls() - prints human readable list of contained files for folders, with file sizes
tree(max_depth=5, max_files_per_dir=50) - prints the directory structure, up to `max_depth` and
`max_files_per_dir` files in each directory
cp(destination, recurse=False) - same as dbutils.fs.cp(str(self), str(destination), recurse)
rm(recurse=False) - same as dbutils.fs.rm(str(self), recurse)
mkdirs() - same as dbutils.fs.mkdirs(str(self))
iterdir() - sorted generator over files (also DBPath instances) - similar to Path.iterdir()
reiterdir(regex) - sorted generator over files (DBPath) that match `bool(re.findall(regex, file))`
IO METHODS:
open(method='rt', encoding='utf-8') - context manager for working with any DB API file locally
read_text(encoding='utf-8') - reads the file as text and returns contents
read_bytes() - reads the file as bytes and returns contents
write_text(text) - writes text to the file
write_bytes(bytedata) - writes bytes to the file
download_url() - for FileStore records returns a direct download URL
make_download_url() - copies a file to FileStore and returns a direct download URL
backup() - creates a backup copy in the same folder, named by following convention
{filename}[.extension] -> {filename}_YYYYMMDD_HHMMSS[.extension]
restore(timestamp) - restore a previous backup of this file by passing backup timestamp string (`'YYYYMMDD_HHMMSS'`)
CLASS METHODS:
set_spark_session(spark) - necessary to call upon initialization
clear_tmp_download_cache() - clear all files created using `make_download_url()`
temp_file - context manager that returns a temporary DBPath
set_base_download_url - call once upon initialization, sets the base URL for FileStore direct downloads
    (e.g. 'https://adb-1234.5.azuredatabricks.net/files/')
set_protocol_temp_path - call once upon initialization for each filesystem you want to create temp files/dirs in
    ('dbfs' and 'file' are set by default)
```
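To give a feel for the richer API, here is an illustrative sketch based on the help text above; the paths are placeholders, and it assumes `set_spark_session` (and, for the download link at the end, `set_base_download_url`) have already been called:
```python
# Illustrative sketch of a few more DBPath features described above.
from dslab.dbpath import DBPath

folder = DBPath('dbfs:/FileStore/tmp')
folder.mkdirs()                          # same as dbutils.fs.mkdirs

report = folder / 'report.csv'
report.write_text('id,value\n1,42\n')

# Sorted iteration over children, optionally filtered by a regex
for child in folder.reiterdir(r'\.csv$'):
    print(child.name)

# Timestamped backup next to the original: report.csv -> report_YYYYMMDD_HHMMSS.csv
report.backup()
# report.restore('YYYYMMDD_HHMMSS')      # pass the timestamp string of the backup to restore

# Files under FileStore can be turned into a direct download link
print(report.download_url())
```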
## Feedback
All feedback is extremely welcome; please raise an issue on GitHub or contact me at adam.volny@datasentics.com.
## Contribution
Contributions and extensions are welcome; don't hesitate to open a PR and we will discuss adding the feature.
### Local Environment Setup
The following software needs to be installed first:
* [Miniconda package manager](https://docs.conda.io/en/latest/miniconda.html)
* [Git for Windows](https://git-scm.com/download/win) or standard Git in Linux (_apt-get install git_)
Clone the repo now and prepare the package environment:
* On **Windows**, use [Git Bash](docs/git-bash.png).
* On **Linux/Mac**, use the standard console.
```bash
$ git clone git@github.com:DataSentics/datasentics-lab.git
$ cd datasentics-lab
$ ./env-init.sh
```
After the environment setup is complete, activate the Conda environment:
```bash
$ conda activate ./.venv
```
### Semantic Commit Messages
We decided to use semantic commit messages for easier long-term maintenance.
We're looking forward to your contributions!
Format: `<type>(<scope>): <subject>`
`<scope>` is optional
#### Example
```
feat: add hat wobble
^--^ ^------------^
| |
| +-> Summary in present tense.
|
+-------> Type: chore, docs, feat, fix, refactor, style, or test.
```
More Examples:
- `feat`: (new feature for the user, not a new feature for build script)
- `fix`: (bug fix for the user, not a fix to a build script)
- `docs`: (changes to the documentation)
- `style`: (formatting, missing semicolons, etc.; no production code change)
- `refactor`: (refactoring production code, e.g. renaming a variable)
- `test`: (adding missing tests, refactoring tests; no production code change)
- `cicd`: (updating workflows; no production code change)
- `release`: (changing version in pyproject.toml and commit message: "release: vMAJOR.MINOR.PATCH")
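Putting the format together, commit messages that use the optional `<scope>` might look like this (the scopes below are purely illustrative):
```
feat(dbpath): add regex filtering to reiterdir
fix(dbpath): handle missing base download url in download_url
docs(readme): describe the local environment setup
```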
References:
- https://www.conventionalcommits.org/
- https://seesparkbox.com/foundry/semantic_commit_messages
- http://karma-runner.github.io/1.0/dev/git-commit-msg.html