PDF Extract - Tables have poor OCR results
Hi,
Quite a number of tables seem to be extracted using OCR. Unfortunately, this results in extraction errors, even when the document itself contains text rather than scanned pages. The documents are in German.
Common Mistakes
- Missing spaces, e.g. 7bis 14 Tage, eskannunter
- Accented characters lose their diacritics, e.g. Ü or Ö might become U or O.
- Superscripts are often used in tables to refer to a key. These may be a list of numbers, letters or special symbols, and may be separated by commas.
- Numbers with special characters are often converted to all digits, e.g. >169 becomes 4169.
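
To make these categories easier to reproduce, here is a minimal Python sketch that classifies a mismatch between a reference string and the extracted table cell. It assumes a trusted reference is available, for example text copied by hand from the PDF's own text layer. The function names `strip_accents` and `classify_mismatch` are hypothetical helpers for illustration and are not part of PDF Extract.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks so that e.g. 'Ü' compares equal to 'U'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify_mismatch(reference: str, extracted: str) -> str:
    """Hypothetical helper: label how an extracted cell differs from a reference."""
    if reference == extracted:
        return "match"
    # Same characters once all whitespace is removed: spaces were lost or added.
    if "".join(reference.split()) == "".join(extracted.split()):
        return "spacing"
    # Same characters once accents are removed: diacritics were dropped.
    if strip_accents(reference) == strip_accents(extracted):
        return "diacritics"
    # Same length, and every difference is a symbol that turned into a digit.
    if len(reference) == len(extracted) and all(
        r == e or (not r.isalnum() and not r.isspace() and e.isdigit())
        for r, e in zip(reference, extracted)
    ):
        return "symbol_to_digit"
    return "other"


if __name__ == "__main__":
    samples = [
        ("7 bis 14 Tage", "7bis 14 Tage"),
        ("Übersicht", "Ubersicht"),
        (">169", "4169"),
    ]
    for reference, extracted in samples:
        print(reference, "|", extracted, "|", classify_mismatch(reference, extracted))
```

The checks run in order and stop at the first one that applies, so a cell with several problems at once falls through to "other". Superscript keys are not covered here, since they need position information rather than a plain string comparison.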
This all adds up to a lack of confidence in table extraction. Are there any options to set the language? How is the OCR done? Can any settings be applied?
Kind Regards,
Jono
