# File data-source
The easiest way to parse CSV and Excel files, even compressed CSV files.
For example, with just one or two lines you can read an Excel or CSV file, even a compressed CSV file:
```python
from filedatasource import open_reader, excel2objects
# Read a whole compressed CSV file and obtain a list of objects,
# with the fieldnames as object attributes
with open_reader('data.csv.gz') as reader:
    objs = reader.read_objects()
print(objs[0].name)

# Or, even easier:
objs = excel2objects('data.xlsx')
print(objs[0].name)

# If you do not want to load the whole file into memory, you can iterate:
with open_reader('data.csv.gz') as reader:
    for obj in reader:
        print(obj.name)
```
## Install
```shell script
pip install filedatasource
```
If you want to read an Excel file in xlsx format, you need to install the openpyxl module:
```shell script
pip install openpyxl
```
If you want to create an Excel file in xlsx format, you need to install the xlsxwriter module:
```shell script
pip install xlsxwriter
```
If you want to read an Excel file in old xls format, you need to install the __xlrd__ module:
```shell script
pip install xlrd
```
Finally, if you want to write into an old xls format, you need to install the __xlwt__ module:
```shell script
pip install xlwt
```
## What file formats it works with
**filedatasource** is a Python module that extracts the data from typical CSV or Excel files with the
following format:
```text
field_1,field_2,field_3,...
cell_1_1,cell_1_2,cell_1_3,...
cell_2_1,cell_2_2,cell_2_3,...
...
```
That is, the first row holds the column names, and the remaining rows hold the values of those columns.
For example, take this **data.csv** file:
```text
name,surname,address
John,Smith,Oxford street
Maria,Ortega,George Washington street
```
Or an Excel file with the following content:
|name |surname|address |
|-----|-------|------------------------|
|John |Smith  |Oxford street           |
|Maria|Ortega |George Washington street|
**Note**: This module does not work with other formats. If you need to parse more complex files, use
pandas or lower-level modules such as openpyxl or xlsxwriter. This module is intended for parsing or generating
the typical file-based data sources used in data science workflows.
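The header-plus-rows convention above can be sketched with the standard library alone. This snippet is only an illustration of the idea (it does not use filedatasource): the header row provides the attribute names for each data row.

```python
import csv
import io
from types import SimpleNamespace

# Sample content in the format described above: a header row followed by data rows
raw = """name,surname,address
John,Smith,Oxford street
Maria,Ortega,George Washington street
"""

# Each data row becomes an object whose attributes are the column names
people = [SimpleNamespace(**row) for row in csv.DictReader(io.StringIO(raw))]
print(people[0].name)     # John
print(people[1].address)  # George Washington street
```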
## How to read
The standard way to read a CSV is:
```python
from filedatasource import CsvReader
with CsvReader('data.csv') as reader:
    for person in reader:
        print(person.name, person.surname, person.address)
```
If the file name ends with **.gz**, __CsvReader__ will decompress the file while reading.
By default, __ExcelReader__ or __CsvReader__ obtain objects with the fieldnames as attributes.
**Note**: All the data extracted with CsvReader will be of type str.
### Important note
This mode is not recommended when the column names are not valid Python identifiers, because invalid
identifiers are changed according to these rules:
* Any character that is not valid in a Python identifier is replaced by _.
* If the field name starts with a number, the character 'n' is prepended.
For example, if you have the following data file in CSV:
```text
1d,D&D,Figures
20,Main dice all
...
```
The attribute names will be __n1d__, __D_D__ and __Figures__, respectively. The last field does not
change because it is already a valid Python identifier.
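The two rules can be sketched in plain Python. This is only an illustration of the behaviour, not filedatasource's actual implementation:

```python
import re

def to_identifier(field: str) -> str:
    # Rule 1: any character that is not valid in a Python identifier becomes '_'
    ident = re.sub(r'\W', '_', field)
    # Rule 2: if the name starts with a number, prepend 'n'
    if ident and ident[0].isdigit():
        ident = 'n' + ident
    return ident

print([to_identifier(f) for f in ('1d', 'D&D', 'Figures')])  # ['n1d', 'D_D', 'Figures']
```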
For this reason, in these cases it is better to use the __mode__ parameter, which changes what each iteration
yields: a dictionary (__mode=ReadMode.DICT__) or a list (__mode=ReadMode.LIST__) instead of a Python object.
For example, these statements do the same as the previous code:
```python
from filedatasource import CsvReader, ReadMode
# To get a dictionary instead of an object in each iteration
with CsvReader('data.csv', mode=ReadMode.DICT) as reader:
    for person in reader:
        print(person['name'], person['surname'], person['address'])

# To get a list of values instead of an object with attributes
with CsvReader('data.csv', mode=ReadMode.LIST) as reader:
    for person in reader:
        print(person[0], person[1], person[2])
```
All these operations can be done with Excel files using __ExcelReader__. For example, to read the **data.xlsx** with
the above content you can do the following:
```python
from filedatasource import ExcelReader
with ExcelReader('data.xlsx') as reader:
    for person in reader:
        print(person.name, person.surname, person.address)
```
If the file name ends with **.xls** instead of **.xlsx**, this class will read it correctly too.
Regardless of whether you use __CsvReader__ or __ExcelReader__, you can wrap the reader in __tqdm()__ to show
progress and estimate how long your process will take. For example:
```python
from filedatasource import ExcelReader, CsvReader
from tqdm import tqdm
with ExcelReader('data.xlsx') as reader:
    for person in tqdm(reader, desc='Reading the Excel file'):
        print(person.name, person.surname, person.address)

with CsvReader('data.csv') as reader:
    for person in tqdm(reader, desc='Reading the CSV file'):
        print(person.name, person.surname, person.address)
```
However, __CsvReader__ first needs to pre-read the entire file to calculate the number of rows, which may take
a bit longer. This does not happen with Excel files because they are fully loaded into memory.
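The pre-read amounts to one full pass over the file just to count lines, roughly like the sketch below. The function name and details are illustrative, not the library's actual code:

```python
import gzip

def count_rows(path: str) -> int:
    # Open transparently whether the CSV is gzip-compressed or not
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt') as f:
        # One full pass over the file; subtract 1 for the header row
        return sum(1 for _ in f) - 1
```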
You can obtain the list of sheet names of an Excel file by means of __sheets()__:
```python
from filedatasource import sheets
# It detects the type of the Excel file by the extension of the filename.
sheets('test/Example.xlsx')
sheets('test/Example.xls')
```
## How to write a data file
With **filedatasource** it is very easy to write both CSV files (compressed or not) and Excel files (xls or xlsx).
You only need __CsvWriter__ or __ExcelWriter__, writing each row from keyword parameters, a list of values, or
a dictionary. For example:
```python
from filedatasource import CsvWriter
# Writing a row by means of parameters, a list of values or a dictionary.
with CsvWriter('data.csv.gz', fieldnames=['a', 'b', 'c']) as writer:
    writer.write_row(a=1, b=2, c=3)
    writer.write([2, 4, 7])
    writer.write({'a': 4, 'b': 8, 'c': 31})
```
If the file name ends with .gz, it will be compressed automatically. Moreover, you can store an object like this:
```python
from filedatasource import CsvWriter
class Example(object):
    def __init__(self, a: int, b: int, c: int):
        self.a = a
        self.b = b
        self.c = c

# Writing with Python objects
with CsvWriter('data.csv.gz', fieldnames=['a', 'b', 'c']) as writer:
    writer.write(Example(1, 2, 3))
    writer.write(Example(4, 5, 6))
```
If your object class defines properties, you can pass the class itself as the __fieldnames__ parameter. For example:
```python
from filedatasource import CsvWriter
class Employee(object):
    @property
    def full_name(self) -> str:
        return self.name + ' ' + self.surname

    @property
    def name(self) -> str:
        return self.__name

    @property
    def surname(self) -> str:
        return self.__surname

    def __init__(self, name: str, surname: str):
        self.__name = name
        self.__surname = surname

# Writing objects, taking the fieldnames automatically from the class properties
with CsvWriter('data.csv', fieldnames=Employee) as writer:
    writer.write(Employee('John', 'Smith'))
    writer.write(Employee('Maria', 'Ortega'))
```
By default, if the file **data.csv** already exists, it will be overwritten. To avoid this and append the rows
at the end of the file, use the __mode__ parameter with the value __Mode.APPEND__, like this:
```python
from filedatasource import CsvWriter, Mode
# Creating a CsvWriter in APPEND mode
with CsvWriter('data.csv', fieldnames=['a', 'b', 'c'], mode=Mode.APPEND) as writer:
    writer.write_dicts([
        {'a': 10, 'b': 11, 'c': 12},
        {'a': 13, 'b': 14, 'c': 15}
    ])

from filedatasource import ExcelWriter, Mode

# Creating an ExcelWriter in APPEND mode (important note below)
with ExcelWriter('data.xlsx', fieldnames=['a', 'b', 'c'], mode=Mode.APPEND) as writer:
    writer.write_dicts([
        {'a': 10, 'b': 11, 'c': 12},
        {'a': 13, 'b': 14, 'c': 15}
    ])
```
**Important note**: If an Excel file has more than one sheet, append mode will remove all the other sheets.
## Working with Excel sheets
__ExcelReader__ only works with one Excel sheet at a time. You can select the sheet to read with the __sheet__
parameter, passing the sheet name or its index (starting at 0). For example, if the file **Example.xls** has
two sheets, the first named **Clients** and the second named **Suppliers**, you can do the following:
```python
from filedatasource import ExcelReader
# Reading the first sheet of the file 'Example.xls'
with ExcelReader('Example.xls', sheet=0) as reader:
    for row in reader:
        print(row)

# Reading the second sheet of the file 'Example.xls'
with ExcelReader('Example.xls', sheet=1) as reader:
    for row in reader:
        print(row)

# Reading the sheet named 'Clients' of the file 'Example.xls'
with ExcelReader('Example.xls', sheet='Clients') as reader:
    for row in reader:
        print(row)

# Reading the sheet named 'Suppliers' of the file 'Example.xls'
with ExcelReader('Example.xls', sheet='Suppliers') as reader:
    for row in reader:
        print(row)
```
## Reading and writing whole files
With **filedatasource** you can read or write a whole data file in two lines using __DataWriter.write_lists()__,
__DataWriter.write_dicts()__, and __DataWriter.write_objects()__ for writing lists of lists, dictionaries and
objects, respectively; and __DataReader.read_lists()__, __DataReader.read_rows()__, and
__DataReader.read_objects()__ for getting a list of lists, dictionaries or objects with all the file data.
For example, writing several rows:
```python
from filedatasource import CsvWriter
# Using the Example class defined above
with CsvWriter('data.csv', fieldnames=['a', 'b', 'c']) as writer:
    writer.write_lists([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])
    writer.write_dicts([
        {'a': 10, 'b': 11, 'c': 12},
        {'a': 13, 'b': 14, 'c': 15}
    ])
    writer.write_objects([
        Example(16, 17, 18),
        Example(19, 20, 21),
        Example(22, 23, 24)
    ])
```
All these methods also work with __ExcelWriter__.
For reading a whole file, you only need the following statements:
```python
from filedatasource import CsvReader, ExcelReader
# Read a whole compressed CSV file and get a list of lists
with CsvReader('data.csv.gz') as reader:
    lists = reader.read_lists()
    fieldnames = reader.fieldnames  # Only necessary if you want the fieldnames for read_lists()
print(fieldnames[0], lists[0][0])

# Read a whole CSV file and obtain a list of dictionaries, with the fieldnames as keys and the data as values
with CsvReader('data.csv') as reader:
    dicts = reader.read_rows()
print(dicts[0]['name'])

# Read a whole Excel file and obtain a list of objects, with the fieldnames as object attributes
with ExcelReader('data.xlsx') as reader:
    objs = reader.read_objects()
print(objs[0].name)
```
## Convert from CSV to Excel and vice-versa
With these tools, it is very simple to convert a CSV file to an Excel file and vice versa:
```python
from filedatasource import ExcelWriter, CsvReader
with CsvReader('data.csv') as reader:
    with ExcelWriter('data.xlsx', fieldnames=reader.fieldnames) as writer:
        for obj in reader:
            writer.write(obj)
```
However, it can be made easier:
```python
from filedatasource import ExcelWriter, CsvReader
with CsvReader('data.csv') as reader:
    with ExcelWriter('data.xlsx', fieldnames=reader.fieldnames) as writer:
        writer.import_reader(reader)
```
What's more, it can be even easier:
```python
from filedatasource import convert
convert('data.csv', 'data.xlsx')
```
## Compare the content of two files
You can use __equals()__ to check whether two file data sources have the same content, regardless of whether
they are the same file type or not.
However, if one file is a CSV and the other an Excel file, and some columns are not strings, the comparison
will fail. It is safer to compare files of the same type, or to work only with strings, to be sure the
comparison is correct.
The syntax of __equals()__ is very simple:
```python
from filedatasource import equals
print(equals('test.xlsx', 'test.csv.gz'))
```
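The caveat above is plain Python equality at work: a CSV reader yields strings while an Excel file keeps the original types, so identical-looking rows compare as different. A minimal illustration:

```python
csv_row = ['One', '2', '3.0', 'True']  # everything read from a CSV is a str
excel_row = ['One', 2, 3.0, True]      # an Excel file keeps the original types

print(csv_row == excel_row)                    # False: '2' != 2, 'True' != True
print([str(v) for v in excel_row] == csv_row)  # True: equal once both sides are str
```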
## Reading CSV with field types
Excel files store values keeping their original data types. For example, if you write a row with different
field types, you can read the values back with the same types. For instance:
```python
from filedatasource import ExcelWriter, ExcelReader
with ExcelWriter('data.xlsx', fieldnames=['a', 'b', 'c', 'd']) as writer:
    writer.write(['One', 2, 3.0, True])

with ExcelReader('data.xlsx') as reader:
    print(reader.read_list())  # -> ['One', 2, 3.0, True]
```
However, a CSV file does not contain any information about field types, and using CsvWriter and CsvReader in the
same way you will get the list `['One', '2', '3.0', 'True']`; that is, all the values as strings.
To solve this, you can use the **types** parameter as follows:
```python
from filedatasource import CsvWriter, CsvReader
with CsvWriter('data.csv', fieldnames=['a', 'b', 'c', 'd']) as writer:
    writer.write(['One', 2, 3.0, True])

# Define the data type of each field using a list
with CsvReader('data.csv', types=[str, int, float, bool]) as reader:
    print(reader.read_list())  # -> ['One', 2, 3.0, True]

# Define the data type of each field using a dictionary
with CsvReader('data.csv', types={'a': str, 'b': int, 'c': float, 'd': bool}) as reader:
    print(reader.read_list())  # -> ['One', 2, 3.0, True]
```
This way, the values are returned with the correct field types. For any field without a defined type, a string
value is returned.
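Conceptually, the conversion applies each declared type to the corresponding string value and falls back to str for the rest. A plain-Python sketch of the idea (not the library's actual code):

```python
def convert_row(values, types):
    converted = []
    for i, value in enumerate(values):
        # Use the declared type if there is one, otherwise leave the value as str
        field_type = types[i] if i < len(types) else str
        converted.append(field_type(value))
    return converted

print(convert_row(['One', '2', '3.0'], [str, int, float]))  # ['One', 2, 3.0]
print(convert_row(['One', '2', '3.0'], [str, int]))         # ['One', 2, '3.0']
```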
## Writers with predefined fieldnames
If you use the same writer several times, you can create a subclass that overrides the __fieldnames__ property,
to avoid defining the field names every time. For example:
```python
from filedatasource import CsvWriter, Mode
from typing import List
class TestWriter(CsvWriter):
    @property
    def fieldnames(self) -> List[str]:
        return ['a', 'b', 'c']

with TestWriter('data.csv') as writer:
    writer.write_lists([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])

with TestWriter('data.csv', mode=Mode.APPEND) as writer:
    writer.write_dicts([
        {'a': 10, 'b': 11, 'c': 12},
        {'a': 13, 'b': 14, 'c': 15}
    ])
```
However, it can be even easier: the functions __load()__ and __save()__, __save_dicts()__ or __save_lists()__
load and/or save Excel or CSV files in a single line, based on the file extension.
```python
from filedatasource import load, save, save_dicts, save_lists, ReadMode
# Obtain a list of objects from a CSV file and print the first row.
objs = load('data.csv')
print(objs[0].a, objs[0].b, objs[0].c)

# Obtain a list of dictionaries from a compressed CSV file and print the first row.
dicts = load('data.csv.gz', mode=ReadMode.DICT)
print(dicts[0]['a'], dicts[0]['b'], dicts[0]['c'])

# Obtain a list of lists from an Excel file and print the first row.
lists = load('data.xlsx', mode=ReadMode.LIST)
print(lists[0][0], lists[0][1], lists[0][2])

# Write a list of objects in an XLS file (Example is the class defined above)
data_objects = [
    Example(16, 17, 18),
    Example(19, 20, 21),
    Example(22, 23, 24)
]
save('data.xls', data_objects)

# Write a list of dictionaries in an XLS file
data_dict = [
    {'a': 10, 'b': 11, 'c': 12},
    {'a': 13, 'b': 14, 'c': 15}
]
save_dicts('data.xls', data_dict)

# Write a list of lists in an XLSX file with the column names 'a', 'b' and 'c'
data_list = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
save_lists('data.xlsx', data_list, fieldnames=['a', 'b', 'c'])
```
It is also possible to use the functions __load_objs()__, __load_dicts()__ and __load_lists()__ instead of
__load()__ with the __mode__ parameter.
Also, you can use the functions __list2csv()__, __dict2csv()__, __objects2csv()__, __list2excel()__,
__dict2excel()__ and __objects2excel()__ to force the format of the output file regardless of the file
extension:
```python
from filedatasource import list2csv, dict2csv, objects2csv
from filedatasource import list2excel, dict2excel, objects2excel
data_list = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
data_dict = [
    {'a': 10, 'b': 11, 'c': 12},
    {'a': 13, 'b': 14, 'c': 15}
]
# Example is the class defined above
data_objects = [
    Example(16, 17, 18),
    Example(19, 20, 21),
    Example(22, 23, 24)
]
# Write a list of lists into a CSV file
list2csv('data.csv', data_list, fieldnames=['a', 'b', 'c'])
# Write a list of dicts into a CSV file
dict2csv('data.csv', data_dict, fieldnames=['a', 'b', 'c'])
dict2csv('data.csv', data_dict) # Use the key as fieldnames
# Write a list of objects into a compressed CSV file
objects2csv('data.csv.gz', data_objects, fieldnames=['a', 'b', 'c'])
objects2csv('data.csv.gz', data_objects) # Use object attributes as fieldnames
# Write a list of lists into an Excel file
list2excel('data.xlsx', data_list, fieldnames=['a', 'b', 'c'])
# Write a list of dicts into an Excel file
dict2excel('data.xlsx', data_dict, fieldnames=['a', 'b', 'c'])
# Write a list of objects into an XLS file
objects2excel('data.xls', data_objects, fieldnames=['a', 'b', 'c'])
```
You can also use the equivalent functions __csv2list()__, __csv2dict()__, __csv2objects()__,
__excel2list()__, __excel2dict()__, and __excel2objects()__. For example:
```python
from filedatasource import csv2list, csv2dict, excel2objects
# Read a whole compressed CSV file and get a list of lists
lists = csv2list('data.csv.gz')
print(lists[0][0])

# Read a whole CSV file and obtain a list of dictionaries, with the fieldnames as keys and the data as values
dicts = csv2dict('data.csv')
print(dicts[0]['name'])

# Read a whole Excel file and obtain a list of objects, with the fieldnames as object attributes
objs = excel2objects('data.xlsx')
print(objs[0].name)
```
## Builders
There are two builder functions that create readers or writers based on the file extension, with default arguments.
```python
from filedatasource import open_reader, open_writer
# Read a CSV
with open_reader('data.csv') as reader:
    ...

# Read a compressed CSV
with open_reader('data.csv.gz') as reader:
    ...

# Read the first sheet from an Excel file
with open_reader('data.xlsx') as reader:
    ...

# Write a CSV
with open_writer('data.csv', fieldnames=[...]) as writer:
    ...

# Write a compressed CSV
with open_writer('data.csv.gz', fieldnames=[...]) as writer:
    ...

# Write an Excel file, creating a default sheet
with open_writer('data.xlsx', fieldnames=[...]) as writer:
    ...
```
With __open_reader()__ you can use the __mode__ parameter to set how each row is returned.
```python
from filedatasource import open_reader, ReadMode
# Reading the file and returning objects
with open_reader('data.xlsx') as reader:
    for obj in reader:
        print(obj.a, obj.b, obj.c)

# Reading the file and returning dictionaries
with open_reader('data.csv', mode=ReadMode.DICT) as reader:
    for d in reader:
        print(d['a'], d['b'], d['c'])

# Reading the file and returning lists of values
with open_reader('data.csv.gz', mode=ReadMode.LIST) as reader:
    for lst in reader:
        print(lst[0], lst[1], lst[2])
```
## Size calculation
It is possible to know how many rows a data source has, for both readers and writers. For example:
```python
from filedatasource import open_reader, open_writer
# How to know the number of rows in a reader with data
with open_reader('data.csv') as reader:
    print(len(reader))

# How to know the number of rows in a writer
with open_writer('data.csv', fieldnames=['a', 'b', 'c']) as writer:
    print(len(writer))  # Prints 0 because the file has just been opened for writing
    writer.write_row(a=1, b=2, c=3)
    writer.write_row(a=4, b=5, c=6)
    print(len(writer))  # Prints 2 because two rows have been added
```
**Note**: The file length calculation does not work with readers opened from a file stream instead of a file
path, nor with writers opened in append mode from a file stream.