Participant
May 29, 2020
Question

OCR - parts of phrases get moved to random places before or after


The document is about 500 pages long. I have tried running OCR on the raw file with the output set both to image only and to editable text and images.
Some pages are fine, apart from the occasional misrecognized letter. Others have a few misplaced fragments, and some have a great many. Parts of phrases can end up as far as 30 to 40 lines away from where they belong.
Sometimes a fragment is moved only slightly, other times a long way. Sometimes it is even duplicated.

The problem is not introduced when exporting the text; it is already in the PDF itself. When I search the PDF for a moved fragment, the search lands in the same wrong place as in the exported .docx and .txt files, so I may have to look for the correct position manually, up to a page away.

 

Sometimes just one or two letters are moved, but most of the time several words are moved.

 

I have attached a couple of screenshots to show the problem.

