
How to import ready OCR text (XML or TXT) in a PDF?

New Here ,
Oct 27, 2019
Suggested my issue on uservoice:
 
Here is the scenario: I have an image-only PDF file (a scanned book) with 500 pages, plus 500 ALTO XML files containing the OCR text for each corresponding page of that PDF. Those OCR XML files were exported from the original searchable OCR PDF. I don't have that source OCR PDF; it comes from a German library (StaBi Berlin). Unfortunately, the library doesn't offer the OCR PDF for direct download. You can only download an image-only PDF of a book and the corresponding OCR XML files separately, or all of the OCR text in one TXT file. (If you don't believe me, see for yourself: See here
You can change the language to English in the bottom right corner.)
 
So now I am looking for a way to import those 500 XML files back into the corresponding pages of that image-only PDF, so that I end up with a searchable OCR PDF. Is there a way to do this with Acrobat DC (or, if not, with other tools)?
 
Best regards,
Minsutoreru
LEGEND ,
Oct 27, 2019

Acrobat has no tools to do this. It does have its own OCR, which is best suited to low-volume use.
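
Outside Acrobat, a first step would probably be reading the word boxes out of each ALTO file. What follows is only a rough sketch of that step using Python's standard library, not a finished solution. It assumes the ALTO files store each word as a `String` element with `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes, that the coordinates are in pixels, and that you know the scan resolution. The function names, the sample XML and the 300 dpi value are made up for illustration. Writing the words into the PDF as an invisible text layer would still need a separate PDF library, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-word ALTO page, used only to demonstrate the functions.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hallo" HPOS="300" VPOS="150" WIDTH="600" HEIGHT="75"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix so different ALTO versions are handled alike."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_text):
    """Return a list of (text, hpos, vpos, width, height) for every String element."""
    root = ET.fromstring(xml_text)
    words = []
    for element in root.iter():
        if local_name(element.tag) == "String":
            words.append((
                element.get("CONTENT", ""),
                float(element.get("HPOS", 0)),
                float(element.get("VPOS", 0)),
                float(element.get("WIDTH", 0)),
                float(element.get("HEIGHT", 0)),
            ))
    return words


def to_pdf_box(hpos, vpos, width, height, page_height_pt, dpi):
    """Convert a pixel box (origin top-left) to PDF points (origin bottom-left).

    Assumes ALTO coordinates are pixels at the given dpi; 1 inch = 72 points.
    """
    scale = 72.0 / dpi
    x = hpos * scale
    y = page_height_pt - (vpos + height) * scale  # flip the vertical axis
    return (x, y, width * scale, height * scale)


if __name__ == "__main__":
    for text, hpos, vpos, width, height in alto_words(SAMPLE_ALTO):
        # 842 pt is the height of an A4 page; 300 dpi is an assumed scan resolution.
        print(text, to_pdf_box(hpos, vpos, width, height, 842.0, 300))
```

The vertical flip is there because ALTO measures from the top of the page while PDF measures from the bottom. If the files declare a different measurement unit than pixels, the scale factor would have to change accordingly.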

Community Expert ,
Oct 27, 2019

Hi Minsutoreru,

While Adobe Acrobat does not support this, you can put in a feature request here:

https://acrobat.uservoice.com

Post your link back in this thread so others might see it and vote on it.

~ Jane

New Here ,
Oct 29, 2019