معرفی شرکت ها


filedep-0.0.4


Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

filedep: A small python tool to check file dependency
ویژگی مقدار
سیستم عامل -
نام فایل filedep-0.0.4
نام filedep
نسخه کتابخانه 0.0.4
نگهدارنده []
ایمیل نگهدارنده []
نویسنده Francis Cong
ایمیل نویسنده flcong@outlook.com
آدرس صفحه اصلی https://github.com/flcong/filedep
آدرس اینترنتی https://pypi.org/project/filedep/
مجوز MIT
# filedep: A small python tool to check file dependency ## Motivation When doing empirical analysis, you may encounter the following issue about file dependency. * Suppose `code.py` (or `code.do`, `code.sas`, `code.m`, `code.R`, etc) reads data from `indata.csv`, does some data cleaning, and then saves the intermediary data as `outdata.csv`. * After using `outdata.csv` to run some statistical tests, you want to change the data cleaning procedure a bit, so you modify `code.py`. * If there are only one code file and two data files, you will easily remember to re-run `code.py` to update the output data `outdata.csv`. * However, suppose that `outdata.csv` is then used by `code2.py` to write `finaldata.csv`. Then, people may easily forget to re-run `code2.py` as well to update `finaldata.csv`. * As a result, this may cause the illusion that results change after you run the same set of code twice. For example, you forget to update `finaldata.csv` initially, but then accidentally update `finaldata.csv` some time later. Then, you find that results change after you run the same set of code. To resolve this issue, I build this simple package to check file dependencies based on last modified time. Users can define file dependencies, such as `code.py` using `pre1.csv` and `pre2.csv` as input to write `post1.csv` and `post2.csv`. Then, the function in the package will check if the last modified times of both `pre1.csv` and `pre2.csv` are before that of `code.py` and the last modified times of both `post1.csv` and `post2.csv` are after that of `code.py`. If any file dependency is broken, the broken ones will be printed or saved to a file. ## Installation Use `pip` to install the package as follows: ```python pip install filedep ``` ## Usage Import the package using ```python import filedep ``` The key function is `check_dep(deps, outfile=sys.stdout, reterr=False)`. The first argument is a list of dependencies (defined below). The second argument specifies where to print error information if any file dependency is broken. The default is `sys.stdout`. The third argument specifies if broken dependencies are returned from the function. This is mainly for testing purposes. The default is `False`, i.e., broken dependencies are only printed. The file dependencies have to be provided by the user using the format defined below. In the `template` folder, there is a template to define dependencies and use `check_dep()` function to check. ### Example 1. No broken file dependencies The following code creates several empty files: ```python import filedep import time import os from os.path import join as pj PATH = r'C:\test_check_dep' if not os.path.exists(PATH): os.mkdir(PATH) def touch(filepath): if os.path.exists(filepath): os.utime(filepath) else: with open(filepath, 'a') as f: pass # Touch files in a specific order touch(pj(PATH, 'pre11.csv')) time.sleep(.1) touch(pj(PATH, 'pre12.csv')) time.sleep(.1) touch(pj(PATH, 'code1.py')) time.sleep(.1) touch(pj(PATH, 'post11.csv')) time.sleep(.1) touch(pj(PATH, 'post12.csv')) time.sleep(.1) # Define dependencies deps = [ ( [ pj(PATH, 'pre11.csv'), pj(PATH, 'pre12.csv') ], pj(PATH, 'code1.py'), [ pj(PATH, 'post11.csv'), pj(PATH, 'post12.csv'), ], ), ] filedep.check_dep(deps) ``` In `deps`, we define a single dependency as follows: under the directory `C:\test_check_dep`, `code1.py` reads `pre11.csv` and `pre12.csv` to produce `post11.csv` and `post12.csv`. Then, the last modified times of both `pre11.csv` and `pre12.csv` must be before that of `code1.py` and those of both `post11.csv` and `post12.csv` must be after that of `code1.py` Since the dependency is satisfied by construction, the output is ``` All file dependencies are verified! ``` ### Example 2. Broken file dependencies The following code creates several empty files and define two broken dependencies: ```python import filedep import time import os from os.path import join as pj PATH = r'C:\test_check_dep' if not os.path.exists(PATH): os.mkdir(PATH) def touch(filepath): if os.path.exists(filepath): os.utime(filepath) else: with open(filepath, 'a') as f: pass # Touch files in a specific order touch(pj(PATH, 'pre11.csv')) time.sleep(.1) touch(pj(PATH, 'pre12.csv')) time.sleep(.1) touch(pj(PATH, 'code1.py')) time.sleep(.1) touch(pj(PATH, 'post11.csv')) time.sleep(.1) # Note code1.py is newer than post11.csv touch(pj(PATH, 'code1.py')) time.sleep(.1) touch(pj(PATH, 'post12.csv')) time.sleep(.1) # Define dependencies deps = [ ( [ pj(PATH, 'pre11.csv'), ], pj(PATH, 'code1.py'), [ pj(PATH, 'post11.csv'), ], ), ( [ pj(PATH, 'pre11.csv'), pj(PATH, 'pre12.csv'), ], pj(PATH, 'code1.py'), [ pj(PATH, 'post11.csv'), pj(PATH, 'post12.csv'), ], ) ] filedep.check_dep(deps) ``` Here, we define 2 dependencies. The second one is the same as that in the previous example, but the first one defines a simpler dependency: `code1.py` uses `pre11.csv` to produce `post11.csv`. Since by construction `post11.csv` is "touched" before `code1.py`, both dependencies are broken. Hence, the output is ``` There are 2 broken file dependencies!!! [1] Last Modified Time Input: C:\test_check_dep\pre11.csv : 2021-10-14 14:25:11.011976 Code: C:\test_check_dep\code1.py : 2021-10-14 14:25:11.451668 Output: C:\test_check_dep\post11.csv : 2021-10-14 14:25:11.342247 [2] Last Modified Time Input: C:\test_check_dep\pre11.csv : 2021-10-14 14:25:11.011976 C:\test_check_dep\pre12.csv : 2021-10-14 14:25:11.125543 Code: C:\test_check_dep\code1.py : 2021-10-14 14:25:11.451668 Output: C:\test_check_dep\post11.csv : 2021-10-14 14:25:11.342247 C:\test_check_dep\post12.csv : 2021-10-14 14:25:11.559796 ``` where the last modified date of each file in each broken dependency is shown. ## Format of file dependency The first argument of `check_dep()` is a list of dependencies. Its format should be as follows: * It is a list of tuples. * Each tuple has three elements. - The first element is a list of `str`. - The second element is a `str`. - The third element is a list of `str`. - Each `str` is an absolute path of an existing file. As an example, the following code defines two dependencies: ```python deps = [ ( ['pre1.txt'], 'code1.py', ['post1.txt'] ), ( ['pre21.txt', 'pre22.txt'], 'code2.py', ['post21.txt', 'post22.txt'] ) ] ``` * The first one says that `code1.py` uses `pre1.txt` as input and outputs `post1.txt`. As a result, the last modified date of the three files should satisfy `pre1.txt<=post1.txt` and `code1.py<=post1.txt`. * The second one says that `code2.py` uses `pre21.txt` and `pre22.txt` as input and outputs `post21.txt` and `post22.txt`. As a result, the last modified date of the three files should satisfy `max(pre21.txt,pre22.txt,code1.py)<=min(post21.txt,post22.txt)` where `max` (`min`) represent the maximum (minimum) date.


نحوه نصب


نصب پکیج whl filedep-0.0.4:

    pip install filedep-0.0.4.whl


نصب پکیج tar.gz filedep-0.0.4:

    pip install filedep-0.0.4.tar.gz