# dblp-crawler
An asynchronous, high-concurrency dblp crawler. Use with caution!
## Install
```sh
pip install dblp-crawler
```
## Usage
### Help
```sh
python -m dblp_crawler -h
usage: __main__.py [-h] [-y YEAR] -k KEYWORD [-p PID] [-j JOURNAL] {networkx,neo4j} ...

positional arguments:
  {networkx,neo4j}      sub-command help
    networkx            networkx help
    neo4j               neo4j help

optional arguments:
  -h, --help            show this help message and exit
  -y YEAR, --year YEAR  Only crawl the paper after the specified year.
  -k KEYWORD, --keyword KEYWORD
                        Specify keyword rules.
  -p PID, --pid PID     Specified author pids to start crawling.
  -j JOURNAL, --journal JOURNAL
                        Specify author journal keys to start crawling.
```
```sh
python -m dblp_crawler networkx -h
usage: __main__.py networkx [-h] --dest DEST

optional arguments:
  -h, --help   show this help message and exit
  --dest DEST  Path to write results.
```
```sh
python -m dblp_crawler neo4j -h
usage: __main__.py neo4j [-h] [--auth AUTH] --uri URI

optional arguments:
  -h, --help   show this help message and exit
  --auth AUTH  Auth to neo4j database.
  --uri URI    URI to neo4j database.
```
### Configurable environment variables
* `DBLP_CRAWLER_MAX_CACHE_DAYS_PERSON`:
  * how many days to cache a person page
  * default: `30`
* `DBLP_CRAWLER_MAX_CACHE_DAYS_JOURNAL`:
  * how many days to cache a journal page
  * default: `300`
* `DBLP_CRAWLER_MAX_CACHE_DAYS_JOURNAL_LIST`:
  * how many days to cache a journal list page
  * default: `300`
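For example, to use a shorter person-page cache for a single run (the values below are arbitrary examples; the variables are read from the environment):
```sh
export DBLP_CRAWLER_MAX_CACHE_DAYS_PERSON=7
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu networkx --dest summary.json
```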
### Write to a JSON file
e.g. write to `summary.json`:
```sh
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu networkx --dest summary.json
```
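The result is a plain JSON file. A quick sanity check that only peeks at the top level (the exact schema is not documented here):
```sh
python -c "import json; d = json.load(open('summary.json')); print(type(d).__name__, len(d))"
```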
### Write to a Neo4J database
e.g. write to `neo4j://10.128.202.18:7687`:
```sh
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu neo4j --uri neo4j://10.128.202.18:7687
```
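To verify the import, you can count nodes with `cypher-shell` (bundled with Neo4j; the user name and password below are placeholders for your own credentials):
```sh
cypher-shell -a neo4j://10.128.202.18:7687 -u neo4j -p password "MATCH (n) RETURN count(n);"
```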
### Only crawl papers after a specified year
e.g. crawl papers from 2016 onward (2016 included):
```sh
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu -y 2016 networkx --dest summary.json
```
### Keywords with two or more words
e.g. super resolution (only publications whose titles contain both "super" and "resolution" are selected):
```sh
python -m dblp_crawler -k video -k edge -p l/JiangchuanLiu -k "'super','resolution'" networkx --dest summary.json
```
### Init authors from journal
e.g. init authors from ACM MM (`db/conf/mm` is the dblp key for ACM MM: "https://dblp.org/db/conf/mm/index.xml"):
```sh
python -m dblp_crawler -k video -k edge -j db/conf/mm networkx --dest summary.json
```
### Init authors from journal keys stored in a variable
e.g. `dblp_crawler.data` provides a `CCF_A` variable that contains the keys of CCF A conferences:
```sh
python -m dblp_crawler -k video -k edge -j "importlib.import_module('dblp_crawler.data').CCF_A" networkx --dest summary.json
```
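The `-j` value above is evaluated as a Python expression, so you can preview what it expands to before crawling:
```sh
python -c "import importlib; print(importlib.import_module('dblp_crawler.data').CCF_A)"
```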
## `dblp_crawler.filter` Usage
### Help
```sh
python -m dblp_crawler.filter -h
usage: __main__.py [-h] -i INPUT -o OUTPUT -f FILTER

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file path.
  -o OUTPUT, --output OUTPUT
                        Output file path.
  -f FILTER, --filter FILTER
                        Filter functions.
```
### Use internal filter
e.g. `drop_old_publications` is an internal function that drops publications by year:
```sh
python -m dblp_crawler.filter -i summary.json -o summary.filter.json -f "lambda summary: drop_old_publications(summary, 2016)"
```
### Use multiple filters
e.g. `drop_old_publications` is an internal function that drops publications by year; `drop_nodes_by_all_publications` is an internal function that drops nodes by their total number of publications:
```sh
python -m dblp_crawler.filter -i summary.json -o summary.filter.json \
-f "lambda summary: drop_old_publications(summary, 2016)" \
-f "lambda summary: drop_old_person_publications(summary, 2018)" \
-f "lambda summary: drop_old_cooperation(summary, 2018)" \
-f "lambda summary: drop_nodes_by_all_publications(summary, 4)" \
-f "lambda summary: drop_edges_by_all_publications(summary, 4)"
```
### Use your own filter
e.g. an equivalent way to apply `-f "lambda summary: drop_old_publications(summary, 2016)"`:
```sh
python -m dblp_crawler.filter -i summary.json -o summary.filter.json -f "lambda summary: importlib.import_module('dblp_crawler.filter').drop_old_publications(summary, 2016)"
```
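In the same way, `-f` can call a filter from your own module, say a hypothetical `my_filters.py` on the `PYTHONPATH` whose `my_filter` function takes the summary and returns the filtered summary, mirroring the internal filters:
```sh
python -m dblp_crawler.filter -i summary.json -o summary.filter.json -f "lambda summary: importlib.import_module('my_filters').my_filter(summary)"
```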