bflmpsvz2
Known Participant
September 27, 2024
Answered

Why does this PDF need OCR?

Take a PDF like this one:

https://dwn.alza.cz/manual/74911

It is not scanned, because I can zoom in without limit (6400% with no loss of quality).

Why is it not possible to recover the text from it, and why is OCR necessary?

1 reply

try67
Community Expert
Correct answer
September 27, 2024

The text in this file is part of a vector image (as opposed to a bitmap image), which is why it doesn't pixelate when you zoom in. It also means it's not real text, so it can't be selected or copied. You have to run Text Recognition (OCR) on it to convert it to "real" text.
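For anyone curious how the difference shows up inside the file, here is a rough sketch, not a definitive check. It assumes you already have a page's content stream as decompressed text (real PDFs usually compress it), and the names count_operators and looks_outlined are made up for this illustration. Real text is drawn with text-showing operators such as Tj and TJ, while outlined text is drawn with path operators such as m, l and c, like any other vector shape.

```python
import re

# Text-showing operators in PDF content streams.
TEXT_SHOW_OPS = {"Tj", "TJ", "'", '"'}
# Path-construction operators (move, line, curves, close path, rectangle).
PATH_OPS = {"m", "l", "c", "v", "y", "h", "re"}


def count_operators(content_stream: str) -> dict:
    """Count text-showing vs. path-construction operators.

    Simplified tokenizer (an assumption of this sketch): it expects an
    already-decompressed stream and does not handle nested parentheses
    inside string literals.
    """
    # Remove string literals and hex strings so their contents are not
    # mistaken for operators.
    stripped = re.sub(r"\((?:\\.|[^\\()])*\)", " ", content_stream)
    stripped = re.sub(r"<[0-9A-Fa-f\s]*>", " ", stripped)
    tokens = re.split(r"[\s\[\]]+", stripped)
    return {
        "text": sum(1 for t in tokens if t in TEXT_SHOW_OPS),
        "path": sum(1 for t in tokens if t in PATH_OPS),
    }


def looks_outlined(counts: dict) -> bool:
    """Heuristic: shapes are drawn, but nothing is shown as text."""
    return counts["text"] == 0 and counts["path"] > 0


# Hypothetical sample streams, written by hand for illustration.
text_page = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
outline_page = "10 10 m 20 10 l 20 20 l h f"

print(count_operators(text_page), looks_outlined(count_operators(text_page)))
print(count_operators(outline_page), looks_outlined(count_operators(outline_page)))
```

A page that only ever draws paths gives a viewer nothing to select or copy, which is why OCR is needed. Treat the result as a hint only: text can also live inside nested form objects that this sketch does not look at.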

bflmpsvz2
Author
Known Participant
September 27, 2024

Thanks. It did not occur to me that the whole document could be a vector image. There are also pictograms and a logo, which can be zoomed the same way as the text, so everything must have been created this way. Interesting.