The print/ocrmypdf port
ocrmypdf-16.11.0 – add an OCR text layer to scanned PDF files
Description
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy+pasted.

- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy / paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation without disrupting any other content
- Optimizes PDF images, often producing files smaller than the input file
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Distributes work across all available CPU cores
- Uses Tesseract OCR engine to recognize more than 100 languages (use "pkg_info -Q tesseract" to locate language packs to install)
- Keeps your private data private
- Scales properly to handle files with thousands of pages
- Battle-tested on millions of PDFs

    ocrmypdf                      # it's a scriptable command line program
       -l eng+fra                 # it supports multiple languages
       --rotate-pages             # it can fix pages that are misrotated
       --deskew                   # it can deskew crooked PDFs!
       --title "My PDF"           # it can change output metadata
       --jobs 4                   # it uses multiple cores by default
       --output-type pdfa         # it produces PDF/A by default
       input_scanned.pdf          # takes PDF input (or images)
       output_searchable.pdf      # produces validated PDF output

WWW: https://ocrmypdf.readthedocs.io/
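Because ocrmypdf is a scriptable command line program, the invocation shown in the description can also be assembled from a script. The following is a minimal sketch, not part of the port: the helper name `build_ocrmypdf_command` and its parameters are hypothetical, and it uses only the options that appear in the description (`-l`, `--rotate-pages`, `--deskew`, `--title`, `--jobs`, `--output-type`). It assumes the `ocrmypdf` executable is on the PATH when the command is actually run.

```python
import os
import shlex
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, languages=("eng", "fra"),
                           title=None, jobs=None):
    """Assemble an ocrmypdf argument list (hypothetical helper).

    Only options listed in the port description are used.
    """
    # Tesseract language codes are joined with "+", e.g. "eng+fra".
    cmd = ["ocrmypdf", "-l", "+".join(languages), "--rotate-pages", "--deskew"]
    if title is not None:
        cmd += ["--title", title]
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]
    # PDF/A is described as the default output type; it is spelled out here
    # only to mirror the example in the description.
    cmd += ["--output-type", "pdfa", input_pdf, output_pdf]
    return cmd


if __name__ == "__main__":
    cmd = build_ocrmypdf_command("input_scanned.pdf", "output_searchable.pdf",
                                 title="My PDF", jobs=4)
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(cmd))
    # Run it only if ocrmypdf is installed and the input file exists.
    if shutil.which("ocrmypdf") and os.path.exists("input_scanned.pdf"):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with values such as a title that contains spaces.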
Maintainer
The OpenBSD ports mailing-list
Categories
Build dependencies
Run dependencies
- devel/py-deprecation
- devel/py-pluggy
- devel/py-rich
- devel/py-tqdm
- graphics/img2pdf
- graphics/pngquant
- graphics/py-Pillow
- graphics/tesseract/tessdata,-osd
- graphics/tesseract/tesseract
- lang/python/3
- print/ghostscript/gnu
- print/py-pikepdf
- print/py-reportlab
- print/unpaper
- sysutils/py-packaging
- textproc/py-coloredlogs
- textproc/py-pdfminer
Test dependencies
- devel/py-deprecation
- devel/py-hypothesis
- devel/py-pluggy
- devel/py-rich
- devel/py-test
- devel/py-test-xdist
- devel/py-tqdm
- graphics/img2pdf
- graphics/pngquant
- graphics/py-Pillow
- graphics/tesseract/tessdata,-osd
- graphics/tesseract/tesseract
- lang/python/3
- print/ghostscript/gnu
- print/py-pikepdf
- print/py-reportlab
- print/unpaper
- sysutils/py-packaging
- textproc/py-coloredlogs
- textproc/py-pdfminer