cleanlab-studio-1.0.0



Description

Client interface for all things Cleanlab Studio

| Attribute        | Value                                        |
|:-----------------|:---------------------------------------------|
| Operating system | -                                            |
| File name        | cleanlab-studio-1.0.0                        |
| Name             | cleanlab-studio                              |
| Library version  | 1.0.0                                        |
| Maintainer       | []                                           |
| Maintainer email | []                                           |
| Author           | Cleanlab Inc                                 |
| Author email     | team@cleanlab.ai                             |
| Home page        | https://github.com/cleanlab/cleanlab-studio  |
| URL              | https://pypi.org/project/cleanlab-studio/    |
| License          | MIT                                          |

# cleanlab-studio

[![Build Status](https://github.com/cleanlab/cleanlab-studio/workflows/CI/badge.svg)](https://github.com/cleanlab/cleanlab-studio/actions?query=workflow%3ACI) [![PyPI](https://img.shields.io/pypi/v/cleanlab-studio.svg)][PyPI]

Command line and Python library interface to [Cleanlab Studio](https://cleanlab.ai/studio/). Upload datasets and download cleansets (cleaned datasets) from Cleanlab Studio in a single line of code!

- [Installation](#installation)
- [Quickstart](#quickstart)
- [Advanced Usage](#advanced-usage)

## Installation

You can install the Cleanlab Studio client [from PyPI][PyPI] with:

```bash
pip install cleanlab-studio
```

If you already have the client installed and wish to upgrade to the latest version, run:

```bash
pip install --upgrade cleanlab-studio
```

## Quickstart

### Python API

You can find your API key at https://app.cleanlab.ai/account.

```python
from cleanlab_studio import Studio

# create your Cleanlab Studio API client with your API key, found here: https://app.cleanlab.ai/account
studio = Studio(<your api key>)

# upload your dataset via a filepath, Pandas DataFrame, or PySpark DataFrame!
dataset_id: str = studio.upload_dataset(<your dataset>, <your dataset name>)

# navigate to Cleanlab Studio, create a project, and improve your labels

# download your cleanset or apply corrections to your local Pandas or PySpark dataset!
# you can find your cleanset ID by clicking on the Export Cleanset button in your project
cleanset = studio.download_cleanlab_columns(<your cleanset id>)
corrected_dataset = studio.apply_corrections(<your dataset>, <your cleanset id>)
```

### CLI

1. If this is your first time using the Cleanlab CLI, authenticate with `cleanlab login`. You can find your API key at https://app.cleanlab.ai/account.
2. Upload your dataset (image, text, or tabular) using `cleanlab dataset upload`.
3. Create a project in Cleanlab Studio.
4. Improve your dataset in Cleanlab Studio (e.g., correct some labels).
5. Download your cleanset with `cleanlab cleanset download`.

## Dataset Structure

Cleanlab Studio supports the following upload types:

- Text/Tabular
  - CSV
  - JSON
  - XLS/XLSX
  - Pandas DataFrame *(Python library only)*
  - PySpark DataFrame *(Python library only)*
  - more to come!
- Image
  - CSV (external media)
  - JSON (external media)
  - XLS/XLSX (external media)
  - Pandas DataFrame (external media) *(Python library only)*
  - PySpark DataFrame (external media) *(Python library only)*
  - Simple ZIP upload
  - Metadata ZIP upload
  - more to come!

Information on dataset structuring can be found by clicking the tutorial on https://app.cleanlab.ai/upload!

## Advanced Usage

### Schema

#### Python API

All schema information is inferred by default when you upload a dataset through the Python API. You can override the inferred schema where necessary:

- To override the dataset modality, supply a `modality` kwarg to `studio.upload_dataset()`. Supported modalities are "text", "tabular", and "image".
- To override the ID column, supply an `id_column` kwarg to `studio.upload_dataset()`.
- To override column types in your dataset, supply a `schema_overrides` kwarg to `studio.upload_dataset()` in the following format:

```
{
    <name_of_column_to_override>: {
        "data_type": <desired_data_type>,
        "feature_type": <desired_feature_type>,
    },
    ...
}
```

#### CLI

To specify the column types in your dataset, create a JSON file named `schema.json`.

If you would like to edit an inferred schema (rather than starting from scratch), follow these steps:

1. Kick off a dataset upload using: `cleanlab dataset upload`
2. Once schema generation is complete, you'll be asked whether you'd like to use the inferred schema. Enter `n` to decline.
3. You'll then be asked whether you'd like to save the inferred schema. Enter `y` to accept, then enter the filename you'd like to save to (`schema.json` by default).
4. Edit the schema file as you wish.
5. Kick off a dataset upload again using: `cleanlab dataset upload --schema_path [path to schema file]`

Your schema file should be formatted as follows:

```json
{
  "metadata": {
    "id_column": "tweet_id",
    "modality": "text",
    "name": "Tweets.csv"
  },
  "fields": {
    "tweet_id": {
      "data_type": "string",
      "feature_type": "identifier"
    },
    "sentiment": {
      "data_type": "string",
      "feature_type": "categorical"
    },
    "sentiment_confidence": {
      "data_type": "float",
      "feature_type": "numeric"
    },
    "retweet_count": {
      "data_type": "integer",
      "feature_type": "numeric"
    },
    "text": {
      "data_type": "string",
      "feature_type": "text"
    },
    "tweet_created": {
      "data_type": "boolean",
      "feature_type": "boolean"
    },
    "tweet_created": {
      "data_type": "string",
      "feature_type": "datetime"
    }
  },
  "version": "0.1.12"
}
```

This is the schema of a hypothetical dataset `Tweets.csv` that contains tweets, where the column `tweet_id` contains a unique identifier for each record. Each column in the dataset is specified under `fields` with its data type and feature type.

#### Data types and Feature types

**Data type** refers to the type of the field's values: string, integer, float, or boolean.

Note that the integer type is partially *strict*: floats that are equal to integers (e.g. `1.0`, `2.0`) are accepted, but floats like `0.8` and `1.5` are not. In contrast, the float type is *lenient*: integers are accepted. Select the float type if the field may include float values. Note too that integers can have categorical and identifier feature types, whereas floats cannot.

For booleans, the accepted values are: true/false, t/f, yes/no, 1/0, 1.0/0.0.

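The data type rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the validation code Cleanlab Studio actually runs. The helper names (`is_valid_integer`, `is_valid_float`, `is_valid_boolean`) are hypothetical, and case-insensitive matching of boolean values is an assumption.

```python
# Hypothetical helpers illustrating the data type rules described above.
# They are NOT part of the cleanlab-studio package.

# Accepted boolean spellings, taken from the list above.
# Assumption: matching ignores case and surrounding whitespace.
BOOLEAN_TOKENS = {"true", "false", "t", "f", "yes", "no", "1", "0", "1.0", "0.0"}


def is_valid_integer(value) -> bool:
    """Partially strict: ints pass, floats pass only when they equal an integer."""
    if isinstance(value, bool):
        return False  # this sketch keeps Python bools out of the integer type
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        return value.is_integer()
    return False


def is_valid_float(value) -> bool:
    """Lenient: both ints and floats pass."""
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, float))


def is_valid_boolean(value) -> bool:
    """Pass any value whose text form is one of the accepted spellings."""
    return str(value).strip().lower() in BOOLEAN_TOKENS
```

Under these assumptions, `is_valid_integer(2.0)` passes while `is_valid_integer(1.5)` does not, which is why a column that may contain fractional values should be declared as float.
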
**Feature type** refers to the secondary type of the field, relating to how it is used in a machine learning model, such as whether it is:

- a categorical value
- a numeric value
- a datetime value
- a boolean value
- text
- an identifier: a string or integer that identifies some entity
- a filepath value (only valid for image datasets)

Some feature types can only correspond to specific data types. The possible feature types for each data type are shown below:

| Data type  | Feature type                                         |
|:-----------|:-----------------------------------------------------|
| string     | text, categorical, datetime, identifier, filepath    |
| integer    | categorical, datetime, identifier, numeric           |
| float      | datetime, numeric                                    |
| boolean    | boolean                                              |

The `datetime` type should be used for datetime strings, e.g. "2015-02-24 11:35:52 -0800", and Unix timestamps (which will be integers or floats). Datetime values must be parseable by [polars.from_epoch](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.from_epoch.html) for integers/floats or [polars.Expr.str.strptime](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.str.strptime.html) for strings.

`version` indicates the version of the Cleanlab CLI package used to generate the schema.

[PyPI]: https://pypi.org/project/cleanlab-studio/
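
Before uploading with a hand-edited schema file, it can help to check it against the data type / feature type table locally. The sketch below assumes a schema dict shaped like the `schema.json` example in this README. `ALLOWED_FEATURE_TYPES` is copied from the table, while `find_schema_errors` is a hypothetical helper, not part of the cleanlab-studio client, and the checks it performs are not necessarily the ones Cleanlab Studio applies.

```python
# Allowed feature types per data type, copied from the table above.
ALLOWED_FEATURE_TYPES = {
    "string": {"text", "categorical", "datetime", "identifier", "filepath"},
    "integer": {"categorical", "datetime", "identifier", "numeric"},
    "float": {"datetime", "numeric"},
    "boolean": {"boolean"},
}


def find_schema_errors(schema: dict) -> list:
    """Return a list of problems found in a schema dict (hypothetical helper)."""
    errors = []
    metadata = schema.get("metadata", {})
    fields = schema.get("fields", {})

    for name, spec in fields.items():
        data_type = spec.get("data_type")
        feature_type = spec.get("feature_type")
        allowed = ALLOWED_FEATURE_TYPES.get(data_type)
        if allowed is None:
            errors.append(f"{name}: unknown data type {data_type!r}")
        elif feature_type not in allowed:
            errors.append(
                f"{name}: feature type {feature_type!r} is not valid "
                f"for data type {data_type!r}"
            )
        # filepath is only valid for image datasets
        if feature_type == "filepath" and metadata.get("modality") != "image":
            errors.append(f"{name}: 'filepath' requires the 'image' modality")

    # The ID column named in the metadata should be one of the listed fields.
    id_column = metadata.get("id_column")
    if id_column is not None and id_column not in fields:
        errors.append(f"id_column {id_column!r} is not listed under fields")

    return errors
```

A possible usage, assuming the schema was saved as `schema.json`: load it with `json.load` and print each entry of `find_schema_errors(schema)`. An empty list means no mismatch was found by this sketch.
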


Requirements

| Name         | Version          |
|:-------------|:-----------------|
| pandas       | >=1.0.0          |
| pyexcel      | >=0.7.0          |
| aiohttp      | >=3.8.1          |
| Click        | >=8.1.0          |
| colorama     | >=0.4.4          |
| pyexcel-xls  | >=0.7.0          |
| pyexcel-xlsx | >=0.6.0          |
| requests     | >=2.27.1         |
| tqdm         | >=4.64.0         |
| ijson        | >=3.1.4          |
| jsonstreams  | >=0.6.0          |
| semver       | <3.0.0,>=2.13.0  |
| Pillow       | >=9.2.0          |
| openpyxl     | ==3.0.10         |
| validators   | >=0.20.0         |


Required language

| Name   | Version |
|:-------|:--------|
| Python | >=3.8   |


How to install


Install the cleanlab-studio-1.0.0 whl package:

    pip install cleanlab-studio-1.0.0.whl


Install the cleanlab-studio-1.0.0 tar.gz package:

    pip install cleanlab-studio-1.0.0.tar.gz
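
After either install command, one way to confirm that the package is visible to the current interpreter is to query its installed version from the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, and the only library call used is `importlib.metadata.version` (available from Python 3.8).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("cleanlab-studio")
    if version is None:
        print("cleanlab-studio is not installed in this environment")
    else:
        print(f"cleanlab-studio {version} is installed")
```
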