
OCR PDF and convert to searchable document in C# and VB.NET

This sample shows how to create a searchable PDF document from a non-searchable (image-based) document using the Docotic.Pdf library and the Tesseract OCR engine.

When a PDF page does not contain searchable text, perform OCR as follows:

  1. Save the page as a high-resolution image using Docotic.Pdf. Higher resolution leads to better recognition quality.
  2. Recognize the image using the Tesseract OCR engine.
  3. Insert the recognized text chunks back into the PDF page using Docotic.Pdf.
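Steps 1 and 3 are linked by the rendering resolution: text positions reported by the OCR engine are in image pixels and must be scaled back to PDF units before the text is inserted. The worked formulas below are a minimal sketch, assuming the standard PDF unit of 1/72 inch (one point) and a rendering resolution R in dots per inch. R is a placeholder here; this sample does not state which value it uses.

```latex
\begin{aligned}
&\text{Page size in points} \to \text{image size in pixels:}\quad
w_{\text{px}} = w_{\text{pt}} \cdot \frac{R}{72}, \qquad h_{\text{px}} = h_{\text{pt}} \cdot \frac{R}{72} \\
&\text{OCR box position in pixels} \to \text{PDF points:}\quad
x_{\text{pt}} = x_{\text{px}} \cdot \frac{72}{R}, \qquad y_{\text{pt}} = y_{\text{px}} \cdot \frac{72}{R} \\
&\text{Example, US Letter } (612 \times 792\ \text{pt}) \text{ at } R = 300:\quad
612 \cdot \frac{300}{72} = 2550\ \text{px}, \qquad 792 \cdot \frac{300}{72} = 3300\ \text{px}
\end{aligned}
```

OCR image coordinates have their origin at the top-left corner. If the target coordinate system has its origin at the bottom-left corner, as raw PDF user space does, flip the vertical axis with y = h_pt - y_px * 72 / R. Doubling R quadruples the number of pixels to recognize, so better recognition quality comes at the cost of memory and OCR time.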

If your documents contain text in languages other than English, provide the Language Data Files for Tesseract 4.00 for each language used in your documents.

Also make sure that the Visual Studio 2015-2019 runtimes (both x86 and x64) are installed.

See also