Participant
February 17, 2022
Question

OCR PDFs that are a mix of Text and Images?

I have some PDFs that I would like to OCR using the API, but they already contain some text layers. The API seems to skip the entire PDF instead of OCRing the image data.

    1 reply

    Joel Geraci
    Community Expert
    February 28, 2022

    The best way to accomplish this is to use the Properties API to detect which pages are image only, then use the Split API to pull those pages out of the PDF, send them to the OCR API, and finally use the Insert API to add them back to the original PDF.
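
To make the first step of that workflow concrete, here is a minimal Python sketch of the planning logic only. It makes no API calls. It assumes you have already reduced the Properties API output to one boolean per page (`True` meaning image only); that list and the helper name `image_only_ranges` are hypothetical and are not part of any Adobe SDK. The helper groups image-only pages into contiguous 1-based ranges, which is roughly the shape you would need when deciding what to split out, OCR, and insert back.

```python
from typing import List, Tuple


def image_only_ranges(is_image_only: List[bool]) -> List[Tuple[int, int]]:
    """Group image-only pages into inclusive, 1-based (start, end) page ranges.

    `is_image_only` is a hypothetical per-page flag list that you would build
    yourself from the Properties API response; nothing here calls an API.
    """
    ranges: List[Tuple[int, int]] = []
    start = None  # first page of the run currently being collected
    for page, flag in enumerate(is_image_only, start=1):
        if flag and start is None:
            start = page  # a new run of image-only pages begins
        elif not flag and start is not None:
            ranges.append((start, page - 1))  # the run ended on the previous page
            start = None
    if start is not None:
        # The document ends inside a run of image-only pages.
        ranges.append((start, len(is_image_only)))
    return ranges


# Example: a five-page PDF where pages 2, 3 and 5 are scans.
flags = [False, True, True, False, True]
print(image_only_ranges(flags))  # [(2, 3), (5, 5)]
```

Working with ranges rather than single pages should keep the number of split, OCR, and insert operations down when scanned pages sit next to each other. How the ranges map onto actual Split and Insert request parameters depends on the API version you are using, so check the current documentation for that part.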