معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

Package to help with normalizing data needed for the platform!

ویژگی	مقدار
سیستم عامل	-
نام فایل	DataNormalizer-1.3.4
نام	DataNormalizer
نسخه کتابخانه	1.3.4
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	Realstats
ایمیل نویسنده	info@realstats.com
آدرس صفحه اصلی	-
آدرس اینترنتی	https://pypi.org/project/DataNormalizer/
مجوز	MIT

# General Information The DataNormalizer package is made to help data scientists with validating and normalising data that they are going to import. Right now the library is able to validate columns and check datasets based on a set of rules like custom data types. ```python import DataNormalizer ``` ### Setting variables The DataNormalizer library might need the details of the target app. These setting are set through properties. ```python Clappform.Auth(baseURL="https://dev.clappform.com/", username="user@email.com", password="password") Normalise = DataNormalizer.Normalise() Normalise.app_data = Clappform.App("appname").ReadOne(extended=True) Normalise.dataframe = pandas.read_excel("../data.xlsx") Normalise.rules = json.load(open('../rules.json')) ``` ### checkRules Function that will check the custom rules against your dataframe. Requires dataframe and rules. Returns a dataframe ```python Normalise = DataNormalizer.Normalise() Normalise.dataframe = pandas.read_excel("../data.xlsx") Normalise.rules = json.load(open('../rules.json')) result = Normalise.checkRules() ``` Rules are added in a JSON file. Every column has its own rule, however rules without a column name are seen as global rules. ```json [ { "reset_coverage":"True", "action": "np.nan", "verbose": "to_file" }, { "column": "city", "check_coverage": "10", "selection": [ "Aa en Hunze", "Aalsmeer", "Aalten", "Achtkarspelen"] }, { "column": "postalCode", "type": "postal_code" } ] ``` Supported keys are | keys | value | explanation | Scope | | ------------------- | ----------------------------------------------------------- | ---------------------------------------------------------------------------------------- | --------------- | | verbose | to_file / silent... | How do you want to be notified of errors? | Column / Global | | column | gemeente | On which column does this rule apply | Column | | type | postal_code / int / string... | What should the values of this column be | Column | | action | np-nan | What to do with the value if incorrect | Column / Global | | selection | [ "Aa en Hunze", "Aalsmeer", "Aalten"] | The values must be one of these values | Column | | one_hot_encoding | "prefix" or "" with no prefix | Concat pandas dummies on the dataframe | Column | | normalise | ["remove_special"] or ["remove_special", "capitalize"] | Apply one or more normalisation rules | Column | | normalise_columns | ["remove_special"] or ["remove_special", "capitalize"] | Apply one or more normalisation rules on all columns (global rule) or on a single column | Column / Global | | concat | {"name": "uniqueID", "columns": ["col1", "col2"]} | Concatenate columns together (for unique ID generation) | Global | | operator | {"name": "divide", "columns": ["A", "B"], "type": "divide"} | Apply operator on two columns, result to new column | Global | | drop_duplicates | ["col1", "col2"] | Drop duplicates based on a subset of column values | Global | | shared_colname_drop | "anything" | If multiple shared column names, keep one | Global | | timestamp | "%Y-%m-%d" | Map values to a datetime object, skip rule immediatly if one value fails | Column | | range_time | ["2017-12-01 00:00:00", "2012-12-01 00:00:00"] | Converts data to timestamp and checks range, combining with timestamp improves speed | Column | | fillna | value | Fill every NaN value to a key | Column / Global | | range | ["-inf", 2010] / [2010, "inf"] | Number range, left <= value >= right | Column | | mapping | {"bad": "0","moderate": "1"} | Map row values to something else | Column / Global | | column_mapping | {"postcode": "postal_code","stad": "city"} | Map column values to something else | Column / Global | | regex | [1-9][0-9]?$^100$ | Column value should look like this regex | Column | | check_coverage | 50 | Take a smaller sample of the column, in percentage | Column / Global | | reset_coverage | True / False | If an error is found in the sample, fall back to 100% | Column / Global | Supported values for types | type | explanation | | -------------------- | ------------------------------------------------------------------------------------- | | int | accepts ints and floats get decimal removed | | positive-int | same as int but only positive and zero | | negative-int | same as int but only negative | | string | characters accepted | | float | decimal numbers accepted | | boolean | makes lowercase and accepts true / false | | postal_code | Removes special chars, accepts 1111ab and 1111AB | | street | Accepts letters, spaces and '. Makes first character and characters after ' uppercase | | latitude / longitude | accepts 32.111111 format | | letters | only accepts letters | Normalise options for column and rows | value | explanation | | -------------------- | -------------------------------- | | capitalize | Make first character uppercase | | lowercase | Make whole string lowercase | | uppercase | Make whole string uppercase | | remove_special | Remove special characters | | remove_whitespace | Remove whitespaces | | spaces_to_underscore | Make every space an underscore _ | | spaces_to_hyphen | Make every space a hyphen - | Operator options | operator | explanation | | -------- | -------------------------------------------------------------- | | divide | Divide two columns on row level and put result to new column | | multiply | Multiply two columns on row level and put result to new column | Fillna options | action | explanation | value | | -------------- | --------------------------------------------------------- | --------------------------- | | fillna | Fill NaN value with something else | Some thing to fill NaN with | | fillna_diffcol | Fill NaN with the value of a different column on that row | Other column name | | fillna_mean | Fill NaN with mean of column | Doesn't matter | | fillna_median | Fill NaN with median of column | Doesn't matter | Supported values for action | action | explanation | | ------ | ------------------------------- | | np.nan | Replaces mismatches with np.nan | | drop | Drop the row | Supported values for verbose | action | explanation | | ---------- | -------------------------------------------------- | | to_console | (DEFAULT) Print errors to console | | to_file | Print errors to a file with a timestamp.txt format | | silent | Dont print | ### obtainKeys Function that will find keys needed for the app, needs app_data. Returns keys ```python Validate = DataNormalizer.Validate() Validate.app_data = Clappform.App("appname").ReadOne(extended=True) Validate.obtainKeys() ``` ### matchKeys Function that will find missing keys, needs app_data and dataframe. Returns missing and additional keys ```python Validate = DataNormalizer.Validate() Validate.app_data = Clappform.App("appname").ReadOne(extended=True) Validate.dataframe = pandas.read_excel("../data.xlsx") Validate.matchKeys() ``` ### fixMismatch Function that will suggest changes to your dataset based on missing keys, needs app_data and dataframe. Lowering the strictness will increase the amount of matches with possible keys. Needs app_data and dataframe. Interaction via terminal. ```python Validate = DataNormalizer.Validate() Validate.app_data = Clappform.App("appname").ReadOne(extended=True) Validate.dataframe = pandas.read_excel("../data.xlsx") Validate.fixMismatch(strictness = 0.8) ```

نیازمندی

مقدار	نام
-	numpy
-	pandas
-	mock
-	openpyxl

نحوه نصب

نصب پکیج whl DataNormalizer-1.3.4:

pip install DataNormalizer-1.3.4.whl

نصب پکیج tar.gz DataNormalizer-1.3.4:

pip install DataNormalizer-1.3.4.tar.gz