معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

DATAFRAMES COMPARISON

ویژگی	مقدار
سیستم عامل	-
نام فایل	comp-df-syren-0.2.1
نام	comp-df-syren
نسخه کتابخانه	0.2.1
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	seshkalyan
ایمیل نویسنده	dseshkalyan@gmail.com
آدرس صفحه اصلی	-
آدرس اینترنتی	https://pypi.org/project/comp-df-syren/
مجوز	SYREN CLOUD TECHNOLOGIES

## What is this about? Imagine you want to compare various iterations of a table or data file. You may need to ensure that your versions of a list that is maintained by more than one person or system are same. When you have to solve this manually, it is difficult. So, let's make use of this library right now! ## How does it work? The script takes an entry's ID value from the first table, looks for it in the second table and sees if it can find it there. If it doesn't, it will list this entry as an entry that was found in table 1, but not in table 2. This will happen vice versa for table 2 against table 1. Moreover, the script compares values of entries with the same ID, and if they are different, the script lists these differences. ## What does it require? The table entries MUST have a unique identifier, or key value. For example, this could be an order number, or an employee number - but it could also be other characters, as long as it is unique to an entry. This key value MUST be located in the first column of your table. The orders of the other columns, as well as the orders of the rows in general, is completely irrelevant.You have a use case, too, we hope this library will save you some time! Version Release Beta ========== 0.0.1 (29/06/2022) ------------------- pip install pandas pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. pip install numpy NumPy is a Python library used for working with arrays. It also has functions for working in domain of linear algebra, fourier transform, and matrices. import pandas as pd import numpy as np '''// DESCRIPTION: The comparison requires comparing table1 against table2 and vice versa in order to find values that are in one table, but not the other,therefore, the files are imported based on current comparison mode, and in mode 2to1, their order is switched to avoid differences being identified based on different data formats: convert numbers to equal format/ "syntax", then to string ///''' def toString(x): # print('from ', x, type(x)) if type(x) == str: return x if type(x) == np.float64 or type(x) == np.int64: x.tolist() if x == None: x = 'NaN' x = str(float(x)) if x[-1] == '0': x = x[0:-2] # removing .0 from the float # print('to ', x, type(x)) return str(x) '''///----Comparison funct----///''' def runComparison(findDifferingValues, fileName1, fileName2, df1, df2): df_comparison = pd.DataFrame() # tables have to be initalized because they are later referred to for concatenation df_entrynotFound = pd.DataFrame() # in both tables, the first column must be the one that contains the entries' ID/key keyCol = df2.iloc[:, 0] # 1st column in the table we're comparing to keyColArray = list(map(toString, keyCol.values)) # converting entries to string for comparison # print('to str: ', test, pd.Series(keyColArray)) keyCol = pd.Series(keyColArray) '''//---compare rows funct---//''' def compareRows(rowA): key = toString(rowA[0]) # saving the key value from 1st table so I can get corresponding value from 2nd table deviations = pd.DataFrame() # here's where I'll store the info on deviating entries # now going through every col entry in rowA for category, value in rowA.items(): if category not in df2.columns: # if category(column) does not exist in table 2, ignore & break break valueBList = pd.Series(df2[category].where( keyCol == key).values) # identifiying value for same ID/col in table2 > array with comparison resut for each row valueB = list(filter(lambda x: pd.isna(x) == False, valueBList)) # from results array, filtering out only those values that are not 'NaN' if (pd.isna(valueBList.values)).all(): # if the result contains only NaN values > cell is empty: if (pd.isna(value) or value == float( 'nan')): # see if corresponding value in rowA is NaN, too, in which case don't list as difference break else: # if corresponding value in rowA is not NaN, redefine valueB so it's not an empty array valueB = ['#empty!'] value = toString(value) valueB[0] = toString(valueB[0]) # print(value, valueB[0]) if value != valueB[ 0]: # now actually comparing value of key/colName in table1 vs same key/colName in table2 # print('deviating value: ', valueB[0]) # print(f'Difference found in entry with ID {key}, in category: {category}') #optional: log the info newRow = [[key, category, value, str(valueB[0])]] # print(newRow) #activate when debugging add = pd.DataFrame(newRow, columns=[('Entry ID in ' + fileName1), 'Difference found in', ('Value in ' + fileName1), ('Value in ' + fileName2)]) deviations = pd.concat([deviations, add]) return deviations #--------------------------------end of compare rows funct----------------------------------//''' for currentRow, values in df1.iterrows(): # going through each row in table 1 rowDf1 = pd.Series(df1.iloc[currentRow]) # grab values from current row of table 1, save as Series key = toString(rowDf1.values[0]) # set entry in 1st column as key to look for in table 2 '''//now, look for this key in table 2. If it is not found, add to "not found" sheet for final results file//''' if df2.loc[keyCol == key].empty: newRow = [[key, rowDf1.index[1], rowDf1.values[1]]] result = pd.DataFrame(newRow, columns=[('Key not found in ' + fileName2), ('name next col ' + fileName1), ('value next col ' + fileName1)]) df_entrynotFound = pd.concat([df_entrynotFound, result]) # add result to compare df # print(f'ID no. {key} not found in table 2') #optional: log the info '''//if it is found: identify differences and store them in table for for final results file, except for comparison mode '2to1' where differences have already been found + listed and we are only interested in entries that are in table2 but not table1 //''' if df2.loc[keyCol == key].empty == False and findDifferingValues == True: result = compareRows( rowDf1) # pass rowDf1 to compare funct to compare it to corresponding row in table2 df_comparison = pd.concat([df_comparison, result]) # print(result) return [df_comparison, df_entrynotFound] '''//-----End of comparison funct-------//''' '''//save results as CSV files, as well as different sheets in same Excel file. Use excel format and writer for that. Index not included in files//''' def saveToFile(data, name, writer): data.to_excel(writer, sheet_name=str(name), index=False) #data.to_csv(str(name), index=False) ---------- Usage ------------------- Install the published library: pip install comp-df-syren==0.2.0

نحوه نصب

نصب پکیج whl comp-df-syren-0.2.1:

pip install comp-df-syren-0.2.1.whl

نصب پکیج tar.gz comp-df-syren-0.2.1:

pip install comp-df-syren-0.2.1.tar.gz