
OCRmyPDF: a tool that intelligently recognizes text and image content in PDFs

OCRmyPDF adds an OCR text layer to scanned PDF files, so that they can be searched or copied and pasted from.

args=(                        # bash array, so each option can carry a comment
   -l eng+fra                 # it supports multiple languages
   --rotate-pages             # it can fix pages that are misrotated
   --deskew                   # it can deskew crooked PDFs!
   --title "My PDF"           # it can change output metadata
   --jobs 4                   # it uses multiple cores by default
   --output-type pdfa         # it produces PDF/A by default
   input_scanned.pdf          # takes PDF input (or images)
   output_searchable.pdf      # produces validated PDF output
)
ocrmypdf "${args[@]}"         # it's a scriptable command line program
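
The invocation above can be wrapped in a small shell function so the same options apply to any input file. This is only a minimal sketch: the function name ocr_scan and the DRY_RUN variable are hypothetical names chosen for illustration, while the flags are the ones shown above (--title is left out to keep the example short).

```shell
# Hypothetical helper: run ocrmypdf with a fixed set of options.
# $1 = input PDF, $2 = output PDF.
# With DRY_RUN=1 the command is only printed, not executed.
ocr_scan() {
    # Replace the positional parameters with the full command line.
    set -- ocrmypdf -l eng+fra --rotate-pages --deskew \
        --jobs 4 --output-type pdfa "$1" "$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 before calling ocr_scan input_scanned.pdf output_searchable.pdf prints the command that would run, which is a cheap way to check the options before processing a large file.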

Main features

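• Generates a searchable PDF/A file from a regular PDF
• Places OCR text accurately below the image to make copy/paste easier
• Keeps the exact resolution of the original embedded images
• When possible, inserts OCR information as a "lossless" operation without disrupting any other content
• Optimizes PDF images, often producing files smaller than the input file
• If requested, deskews and/or cleans the image before performing OCR
• Validates input and output files
• Distributes work across all available CPU cores
• Uses the Tesseract OCR engine to recognize more than 100 languages
• Keeps your private data secure
• Correctly handles files with thousands of pages
• Battle-tested on millions of PDF files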
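
As an illustration of the deskew and clean features listed above, the following sketch processes every PDF in a directory. The function name ocr_dir and the OCRMYPDF override variable are hypothetical; --deskew and --clean are ocrmypdf options, and --clean is assumed to have its external dependency (unpaper) installed.

```shell
# Hypothetical helper: OCR every PDF in a source directory into a
# destination directory, deskewing and cleaning pages before OCR.
# $1 = source directory, $2 = destination directory.
# OCRMYPDF can be set to another command (for example "echo")
# to preview what would be run.
ocr_dir() {
    src="$1"
    dst="$2"
    mkdir -p "$dst"
    for pdf in "$src"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip when the glob matches nothing
        "${OCRMYPDF:-ocrmypdf}" --deskew --clean \
            "$pdf" "$dst/$(basename "$pdf")"
    done
}
```

Each file is processed in turn here; ocrmypdf itself spreads the pages of one file across the available CPU cores.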

Installation

Linux, Windows, macOS, and FreeBSD are supported. Docker images are also available for x64 and ARM.

Operating system	Install command
Debian, Ubuntu	apt install ocrmypdf
Windows Subsystem for Linux	apt install ocrmypdf
Fedora	dnf install ocrmypdf
macOS (Homebrew)	brew install ocrmypdf
macOS (nix)	nix-env -i ocrmypdf
LinuxBrew	brew install ocrmypdf
FreeBSD	pkg install py-ocrmypdf
conda	conda install ocrmypdf
Ubuntu	snap install ocrmypdf
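
After running one of the install commands, it can be useful to confirm that the program and its OCR engine are on the PATH. The sketch below assumes a POSIX shell; check_tool is a hypothetical helper name, not part of OCRmyPDF.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
# Returns 0 when found and 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Typical use after installing:
#   check_tool ocrmypdf  && ocrmypdf --version
#   check_tool tesseract && tesseract --list-langs
```

The second usage line lists the language packs Tesseract has installed, which determines the values that can be passed to the -l option.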