OCR PDFs that are a mix of Text and Images?

New Here, Feb 17, 2022

I have some PDFs that I would like to OCR using the API, but they already contain some text layers. The API seems to skip the entire PDF instead of OCRing the image data.

Community Expert, Feb 28, 2022

The best way to accomplish this is to use the Properties API to detect which pages are image-only, use the Split API to pull those pages out of the PDF, send them to the OCR API, and then use the Insert API to add them back into the original PDF.
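As a rough illustration of the planning step between "detect" and "split", here is a minimal Python sketch. It makes no API calls. It assumes you have already turned the Properties API output into one boolean per page (True meaning the page is image-only); that list and the helper name `image_only_ranges` are hypothetical and not part of any SDK. The sketch groups consecutive image-only pages into page ranges, which is the kind of input a split step and a later insert step would need.

```python
from typing import List, Tuple


def image_only_ranges(page_is_image_only: List[bool]) -> List[Tuple[int, int]]:
    """Group image-only pages into 1-based, inclusive (start, end) ranges.

    The input is an assumption: one flag per page, True when the page
    has no text layer and therefore needs OCR.
    """
    ranges: List[Tuple[int, int]] = []
    start = None  # first page of the run currently being collected
    for page_number, image_only in enumerate(page_is_image_only, start=1):
        if image_only and start is None:
            start = page_number  # a new run of image-only pages begins
        elif not image_only and start is not None:
            ranges.append((start, page_number - 1))  # the run ended on the previous page
            start = None
    if start is not None:
        # The document ends with image-only pages.
        ranges.append((start, len(page_is_image_only)))
    return ranges


# Hypothetical 6-page document: pages 2, 3 and 6 are scans without a text layer.
flags = [False, True, True, False, False, True]
print(image_only_ranges(flags))  # [(2, 3), (6, 6)]
```

Each range would then be split out, sent through OCR, and inserted back at the same page position. Working in ranges rather than single pages should keep the number of split, OCR, and insert calls down when scanned pages come in runs.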
