delver-0.1.6



Description

A modern, user-friendly web automation and scraping library.
Feature             Value
Operating system    -
File name           delver-0.1.6
Name                delver
Library version     0.1.6
Maintainer          []
Maintainer email    []
Author              Nuncjo
Author email        zoreander@gmail.com
Homepage            https://github.com/nuncjo/delver
Package URL         https://pypi.org/project/delver/
License             MIT license
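
As a quick orientation before the full README below, here is a minimal sketch of typical usage. It relies only on the `Crawler.open()`, `status_code`, and `xpath()` calls demonstrated in the examples that follow; the target URL is just an illustrative placeholder.

    # Minimal sketch based on the Crawler API shown in the README below.
    from delver import Crawler

    c = Crawler()
    response = c.open('https://httpbin.org/html')  # open() returns a Response
    print(response.status_code)                    # e.g. 200
    print(c.xpath('//h1/text()'))                  # extract text via XPath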
Delver
======

Programmatic web browser/crawler in Python. **An alternative to Mechanize, RoboBrowser, MechanicalSoup** and others. Combines the strengths of Requests and lxml, and ships with a number of features and methods useful for scraping out of the box.

Install
-------

.. code:: shell

    $ pip install delver

Documentation
-------------

http://delver.readthedocs.io/en/latest/

Quick start - usage examples
----------------------------

- `Basic examples <#basic-examples>`__

  - `Form submit <#form-submit>`__
  - `Find links narrowed by filters <#find-links-narrowed-by-filters>`__
  - `Download file <#download-file>`__
  - `Download files list in parallel <#download-files-list-in-parallel>`__
  - `Xpath selectors <#xpath-selectors>`__
  - `Css selectors <#css-selectors>`__
  - `Xpath result with filters <#xpath-result-with-filters>`__
  - `Using retries <#using-retries>`__

- `Use examples <#use-examples>`__

  - `Scraping Steam Specials using XPath <#scraping-steam-specials-using-xpath>`__
  - `Simple tables scraping out of the box <#simple-tables-scraping-out-of-the-box>`__
  - `User login <#user-login>`__
  - `One Punch Man Downloader <#one-punch-man-downloader>`__

--------------

Basic examples
--------------

Form submit
-----------

.. code:: python

    >>> from delver import Crawler
    >>> c = Crawler()
    >>> response = c.open('https://httpbin.org/forms/post')
    >>> forms = c.forms()

    # Filling up fields values:
    >>> form = forms[0]
    >>> form.fields = {
    ...     'custname': 'Ruben Rybnik',
    ...     'custemail': 'ruben.rybnik@fakemail.com',
    ...     'size': 'medium',
    ...     'topping': ['bacon', 'cheese'],
    ...     'custtel': '+48606505888'
    ... }
    >>> submit_result = c.submit(form)
    >>> submit_result.status_code
    200

    # Checking if form post ended with success:
    >>> c.submit_check(
    ...     form,
    ...     phrase="Ruben Rybnik",
    ...     url='https://httpbin.org/forms/post',
    ...     status_codes=[200]
    ... )
    True

Find links narrowed by filters
------------------------------

.. code:: python

    >>> c = Crawler()
    >>> c.open('https://httpbin.org/links/10/0')
    <Response [200]>

    # Links can be filtered by some html tags and filters
    # like: id, text, title and class:
    >>> links = c.links(
    ...     tags=('style', 'link', 'script', 'a'),
    ...     filters={
    ...         'text': '7'
    ...     },
    ...     match='NOT_EQUAL'
    ... )
    >>> len(links)
    8

Download file
-------------

.. code:: python

    >>> import os
    >>> c = Crawler()
    >>> local_file_path = c.download(
    ...     local_path='test',
    ...     url='https://httpbin.org/image/png',
    ...     name='test.png'
    ... )
    >>> os.path.isfile(local_file_path)
    True

Download files list in parallel
-------------------------------

.. code:: python

    >>> c = Crawler()
    >>> c.open('https://xkcd.com/')
    <Response [200]>
    >>> full_images_urls = [c.join_url(src) for src in c.images()]
    >>> downloaded_files = c.download_files('test', files=full_images_urls)
    >>> len(full_images_urls) == len(downloaded_files)
    True

Xpath selectors
---------------

.. code:: python

    c = Crawler()
    c.open('https://httpbin.org/html')
    p_text = c.xpath('//p/text()')

Css selectors
-------------

.. code:: python

    c = Crawler()
    c.open('https://httpbin.org/html')
    p_text = c.css('div')

Xpath result with filters
-------------------------

.. code:: python

    c = Crawler()
    c.open('https://www.w3schools.com/')
    filtered_results = c.xpath('//p').filter(filters={'class': 'w3-xlarge'})

Using retries
-------------

.. code:: python

    c = Crawler()
    # max_retries = 2 means there will be at most two more attempts to open
    # the url: if the first attempt fails, wait 1 second and try again; if
    # the second attempt also fails, wait 2 seconds and try one last time.
    c.max_retries = 2
    c.open('http://www.delver.cg/404')

Use examples
------------

Scraping Steam Specials using XPath
-----------------------------------

.. code:: python

    from pprint import pprint

    from delver import Crawler

    c = Crawler(absolute_links=True)
    c.logging = True
    c.useragent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    c.random_timeout = (0, 5)
    c.open('http://store.steampowered.com/search/?specials=1')
    titles, discounts, final_prices = [], [], []

    while c.links(filters={
        'class': 'pagebtn',
        'text': '>'
    }):
        c.open(c.current_results[0])
        titles.extend(
            c.xpath("//div/span[@class='title']/text()")
        )
        discounts.extend(
            c.xpath("//div[contains(@class, 'search_discount')]/span/text()")
        )
        final_prices.extend(
            c.xpath("//div[contains(@class, 'discounted')]//text()[2]").strip()
        )

    all_results = {
        row[0]: {
            'discount': row[1],
            'final_price': row[2]
        } for row in zip(titles, discounts, final_prices)
    }
    pprint(all_results)

Simple tables scraping out of the box
-------------------------------------

.. code:: python

    from pprint import pprint

    from delver import Crawler

    c = Crawler(absolute_links=True)
    c.logging = True
    c.useragent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    c.open("http://www.boxofficemojo.com/daily/")
    pprint(c.tables())

User login
----------

.. code:: python

    from delver import Crawler

    c = Crawler()
    c.useragent = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/60.0.3112.90 Safari/537.36"
    )
    c.random_timeout = (0, 5)
    c.open('http://testing-ground.scraping.pro/login')
    forms = c.forms()
    if forms:
        login_form = forms[0]
        login_form.fields = {
            'usr': 'admin',
            'pwd': '12345'
        }
        c.submit(login_form)
        success_check = c.submit_check(
            login_form,
            phrase='WELCOME :)',
            status_codes=[200]
        )
        print(success_check)

One Punch Man Downloader
------------------------

.. code:: python

    import os

    from delver import Crawler

    class OnePunchManDownloader:
        """Downloads One Punch Man free manga chapters to local directories.

        Uses one main thread for the scraper with a random timeout.
        Uses 20 threads just for image downloads.
        """
        def __init__(self):
            self._target_directory = 'one_punch_man'
            self._start_url = "http://m.mangafox.me/manga/onepunch_man_one/"
            self.crawler = Crawler()
            self.crawler.random_timeout = (0, 5)
            self.crawler.useragent = "Googlebot-Image/1.0"

        def run(self):
            self.crawler.open(self._start_url)
            for link in self.crawler.links(filters={'text': 'Ch '}, match='IN'):
                self.download_images(link)

        def download_images(self, link):
            target_path = '{}/{}'.format(self._target_directory, link.split('/')[-2])
            full_chapter_url = link.replace('/manga/', '/roll_manga/')
            self.crawler.open(full_chapter_url)
            images = self.crawler.xpath("//img[@class='reader-page']/@data-original")
            os.makedirs(target_path, exist_ok=True)
            self.crawler.download_files(target_path, files=images, workers=20)

    downloader = OnePunchManDownloader()
    downloader.run()

=======
History
=======

0.1.3 (2017-10-03)
------------------

* First release on PyPI.
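
The configuration attributes used throughout the examples above (`useragent`, `random_timeout`, `logging`, `max_retries`) can be combined on a single crawler. Below is a hedged sketch of a defensively configured instance; it uses only attributes shown in the README, and the user-agent string and URL are illustrative values, not recommendations.

    from delver import Crawler

    # All attributes below appear in the README examples above;
    # the concrete values here are illustrative only.
    c = Crawler(absolute_links=True)
    c.logging = True                       # log requests
    c.useragent = "Mozilla/5.0 (compatible; ExampleBot/1.0)"
    c.random_timeout = (0, 5)              # random delay of 0-5 seconds
    c.max_retries = 2                      # retry failed opens with backoff
    response = c.open('https://httpbin.org/html')
    print(response.status_code)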


Installation


Installing the delver-0.1.6 whl package:

    pip install delver-0.1.6.whl


Installing the delver-0.1.6 tar.gz (source) package:

    pip install delver-0.1.6.tar.gz
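
To confirm the package installed correctly, one option (standard library only, assuming Python 3.8+) is:

    # Post-install check using the standard library (Python 3.8+).
    from importlib.metadata import version, PackageNotFoundError

    try:
        print(version("delver"))  # should print 0.1.6
    except PackageNotFoundError:
        print("delver is not installed")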