Participant
October 27, 2019
Question

How to import ready OCR text (XML or TXT) in a PDF?

I suggested this issue on UserVoice:
 
Here is the scenario: I have an image-only PDF file (a scanned book) with 500 pages, plus 500 ALTO XML files containing the OCR text for each corresponding page of that PDF. Those OCR XML files were exported from the original searchable OCR PDF. I don't have that source OCR PDF; it comes from a German library (StaBi Berlin). Unfortunately, they don't offer the OCR PDF for direct download. You can only download an image-only PDF of a book and, separately, the corresponding OCR XML files, or all of the OCR text in one TXT file. (If you don't believe me, see for yourself: see here. You can change the language to English in the bottom right corner.)
 
So now I am looking for a way to import those 500 XML files back onto the corresponding pages of that image-only PDF, so that I end up with a searchable OCR PDF. Is there a way to do this with Acrobat DC (or, if not, with helper tools)?
 
Best regards,
Minsutoreru

    2 replies

    jane-e
    Community Expert
    October 27, 2019

    Hi Minsutoreru,

    While Adobe Acrobat does not support this, you can put in a feature request here:

    https://acrobat.uservoice.com

    Post your link back in this thread so others might see it and vote on it.

    ~ Jane

    Participant
    October 29, 2019
    Legend
    October 27, 2019

    Acrobat has no tools to do this.  It does have its own OCR, best for low volume use.
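
For anyone attempting this outside Acrobat, the first step would be reading the word positions out of each ALTO XML page. The following is only a minimal sketch using the Python standard library, not an Acrobat feature. It assumes the files use the usual ALTO `String` elements with `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes; the `Word` class, the helper names `parse_alto_words` and `to_pdf_box`, and the sample page are hypothetical. Actually writing an invisible text layer into the PDF would need a separate PDF library and is not shown here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box in ALTO page units (hypothetical helper type)."""
    text: str
    hpos: float
    vpos: float
    width: float
    height: float


def local_name(tag):
    # Strip the "{namespace}" prefix so the code works across ALTO schema versions.
    return tag.rsplit("}", 1)[-1]


def parse_alto_words(xml_text):
    """Return every <String> element of one ALTO page as a Word."""
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) == "String" and elem.get("CONTENT"):
            words.append(Word(
                text=elem.get("CONTENT"),
                hpos=float(elem.get("HPOS", 0)),
                vpos=float(elem.get("VPOS", 0)),
                width=float(elem.get("WIDTH", 0)),
                height=float(elem.get("HEIGHT", 0)),
            ))
    return words


def to_pdf_box(word, alto_page_w, alto_page_h, pdf_page_w, pdf_page_h):
    """Scale an ALTO box to PDF points and flip the y axis.

    ALTO measures from the top-left corner of the page, while PDF
    measures from the bottom-left. Returns (x, y, width, height),
    where (x, y) is the lower-left corner of the box.
    """
    sx = pdf_page_w / alto_page_w
    sy = pdf_page_h / alto_page_h
    x = word.hpos * sx
    y = pdf_page_h - (word.vpos + word.height) * sy
    return (x, y, word.width * sx, word.height * sy)


# Made-up sample page, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hallo" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="50"/>
            <SP/>
            <String CONTENT="Welt" HPOS="420" VPOS="200" WIDTH="250" HEIGHT="50"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

if __name__ == "__main__":
    for w in parse_alto_words(SAMPLE):
        print(w.text, to_pdf_box(w, 2000, 3000, 500, 750))
```

The boxes returned by `to_pdf_box` are where a PDF library would place each word as invisible text over the scanned image. The scaling assumes the ALTO page dimensions and the PDF page cover the same area, which should be checked against the actual files.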