معرفی شرکت ها


adf2pdf-0.8.2


Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر
Card image cap
تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

Automate the workflow around ADF scanning, OCR and PDF creation
ویژگی مقدار
سیستم عامل -
نام فایل adf2pdf-0.8.2
نام adf2pdf
نسخه کتابخانه 0.8.2
نگهدارنده []
ایمیل نگهدارنده []
نویسنده Georg Sauthoff
ایمیل نویسنده mail@gms.tf
آدرس صفحه اصلی https://github.com/gsauthof/adf2pdf
آدرس اینترنتی https://pypi.org/project/adf2pdf/
مجوز -
adf2pdf - a tool that turns a batch of paper pages into a PDF with a text layer. By default, it detects empty pages (as they may easily occur during duplex scanning) and excludes them from the OCR and the resulting PDF. For that, it uses [Sane's][5] [scanimage][6] for the scanning, [Tesseract][4] for the [optical character recognition][ocr] (OCR), and the Python packages [img2pdf][9], [Pillow (PIL)][10] and [PyPDF2][11] for some image-processing tasks and PDF mangling. Example: $ adf2pdf contract-xyz.pdf 2017, Georg Sauthoff <mail@gms.tf> ## Features - Automatic document feed (ADF) support - Fast empty page detection - Overlaying of scanning, image processing, OCR and PDF creation to minimize the total runtime - Fast creation of small PDFs using the fine [img2pdf][9] package - Only use of safe compression methods, i.e. no error-prone symbol segmentation style compression like [JBIG2][12] or JB2 that is used in [Xerox photocopiers][12] and the DjVu format. ## Install Instructions Adf2pdf can be directly installed with [`pip`][13], e.g. $ pip3 install --user adf2pdf or $ pip3 install adf2pdf See also the [PyPI adf2pdf project page][14]. Alternatively, the Python file `adf2pdf.py` can be directly executed in a cloned repository, e.g.: $ ./adf2pdf.py report.pdf In addition to that, one can install the development version from a cloned work-tree like this: $ pip3 install --user . ## Hardware Requirements A scanner with automatic document feed (ADF) that is supported by Sane. For example, the [Fujitsu ScanSnap S1500][1] works well. That model supports duplex scanning, which is quite convenient. ## Example continued Running _adf2pdf_ for a 7 page example document takes 150 seconds on an i7-6600U (Intel Skylake, 4 cores) CPU (using the ADF of the Fujitsu ScanSnap S1500). With the defaults, _adf2pdf_ calls `scanimage` for duplex scanning into 600 dpi lineart (black and white) images. In this example, 6 pages are empty and thus automatically excluded, i.e. the resulting PDF then just contains 8 pages. The resulting PDF contains a text layer from the OCR such that one can search and copy'n'paste some text. It is 1.1 MiB big, i.e. a page is stored in 132 KiB, on average. ## Software Requirements The script assumes Tesseract version 4, by default. Version 3 can be used as well, but the [new neural network system in Tesseract 4][8] just performs magnitudes better than the old OCR model. Tesseract 4.0.0 was released in late 2018, thus, distributions released in that time frame may still just include version 3 in their repositories (e.g. Fedora 29 while Fedora 30 features version 4). Since version 4 is so much better at OCR I can't recommend it enough over the stable version 3. Tesseract 4 notes (in case you need to build it from the sources): - [Build instructions][2] - warning: if you miss the `autoconf-archive` dependency you'll get weird autoconf error messages - [Data files][3] - you need the training data for your languages of choice and the OSD data Python packages: - [img2pdf][9] (Fedora package: python3-img2pdf) - [Pillow (PIL)][10] (Fedora package: python3-pillow-devel) - [PyPDF2][11] (Fedora package: python3-PyPDF2) [1]: http://www.fujitsu.com/us/products/computing/peripheral/scanners/product/eol/s1500/ [2]: https://github.com/tesseract-ocr/tesseract/wiki/Compiling-–-GitInstallation [3]: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files [4]: https://en.wikipedia.org/wiki/Tesseract_(software) [5]: https://en.wikipedia.org/wiki/Scanner_Access_Now_Easy [6]: http://www.sane-project.org/man/scanimage.1.html [7]: https://en.wikipedia.org/wiki/Optical_character_recognition [8]: https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00 [9]: https://pypi.org/project/img2pdf/ [10]: http://python-pillow.github.io/ [11]: https://github.com/mstamy2/PyPDF2 [12]: https://en.wikipedia.org/wiki/JBIG2 [13]: https://en.wikipedia.org/wiki/Pip_(package_manager) [14]: https://pypi.org/project/adf2pdf/ [ocr]: https://en.wikipedia.org/wiki/Optical_character_recognition


زبان مورد نیاز

مقدار نام
>=3 Python


نحوه نصب


نصب پکیج whl adf2pdf-0.8.2:

    pip install adf2pdf-0.8.2.whl


نصب پکیج tar.gz adf2pdf-0.8.2:

    pip install adf2pdf-0.8.2.tar.gz