Inspiring
October 27, 2024
Question

PDF with unusable selectable text


I received a PDF that contains selectable text. It is not an OCRed document. However, searching for a word never finds it. When I select text and paste it into a text editor like Notepad, the text appears as one character per line. For example, selecting "The quick brown fox" pastes as:

 

T
h
e
q
u
i
c
k
b
r
o
w
n
f
o
x

 

Is there a way to fix that? One person elsewhere suggested that I convert the pages to JPG, reassemble them into a PDF, and run OCR. While that produced something more usable for searching or copying and pasting text, it also produced a humongous file, and I lost many important features like links and visual quality. I am looking for a tool, either in Acrobat or external, that could fix this. I mean, if OCRing an image can produce something usable, I am sure some tool could do just that while keeping the PDF intact.


1 reply

try67
Community Expert
October 27, 2024

Not really. This would require entering the "guts" of the PDF file and removing the line-break added after each character (for some reason). This is not a simple operation and it can cause a lot of other issues.

Recreating the file is your best option. If you use a lossless format like PNG (instead of a lossy one like JPG) it will solve the quality loss issue (or at least mitigate it). After creating the new file you can use the Replace Pages command on the old one to replace the old pages with your new ones. This will keep the links (and fields, etc.) intact and you'll end up with a very similar version to the original, only with better text this time.

Inspiring
October 27, 2024

There are actually no line breaks after the characters. When I inspect the PDF using the "Content" navigation pane (or the Edit PDF function), it looks as if the text was broken into individual letters, each placed precisely next to the other to give the illusion of contiguous text.
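To illustrate why individually placed letters copy out badly, and how a tool could in principle stitch them back together, here is a minimal Python sketch. It is not how Acrobat or any particular tool works; the `Glyph` record, the `place` helper, and the `space_ratio` and `line_tol` thresholds are all hypothetical. The idea is only that words have to be inferred from the horizontal gaps between letters, since no space characters exist in the page content.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph
    y: float      # baseline (PDF y grows upward)
    width: float  # advance width


def join_glyphs(glyphs, space_ratio=0.3, line_tol=2.0):
    """Rebuild text from isolated glyphs using only their positions."""
    # Group glyphs into lines by baseline, top of the page first.
    lines = []
    for g in sorted(glyphs, key=lambda g: -g.y):
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])

    out = []
    for line in lines:
        line.sort(key=lambda g: g.x)  # left to right
        mean_w = sum(g.width for g in line) / len(line)
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A gap wider than the (assumed) threshold is treated as a space.
            if gap > space_ratio * mean_w:
                text += " "
            text += cur.char
        out.append(text)
    return "\n".join(out)


def place(text, width=5.0, space_gap=3.0, y=700.0):
    """Hypothetical helper: lay text out as isolated glyphs with no space glyphs."""
    glyphs, x = [], 0.0
    for ch in text:
        if ch == " ":
            x += space_gap  # a space is only a horizontal gap
            continue
        glyphs.append(Glyph(ch, x, y, width))
        x += width
    return glyphs


print(join_glyphs(place("The quick brown fox")))  # prints "The quick brown fox"
```

With a narrow gap, for instance `place("ab cd", space_gap=1.0)`, the sketch returns `abcd`. Any gap-based heuristic can drop a space this way when two words sit unusually close together.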

However, just before your reply came in, I discovered that the "Fix potential font problems" fixup in the Preflight tool fixes the problem almost perfectly, even though it does not report finding any problem. The text of the newly created PDF is now selectable as complete words rather than individual letters. A few words are missing the space between them, but this is far better than the original PDF.

For now, this seems to solve my issue, but I would still like confirmation from other experts on whether the workaround I found is a proven solution or whether it was just luck this time around.
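One way to check the result on other files, rather than trusting a single lucky run, is to copy or extract the text from the fixed PDF with whatever tool is at hand and measure how fragmented it still is. The Python sketch below assumes the text is already available as a string. The function names and the 0.5 threshold are hypothetical choices, not part of Acrobat or Preflight.

```python
def single_char_line_ratio(text):
    """Fraction of non-blank lines that hold exactly one character."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(len(ln) == 1 for ln in lines) / len(lines)


def looks_fragmented(text, threshold=0.5):
    """Heuristic: mostly one-character lines suggests per-letter text objects."""
    return single_char_line_ratio(text) > threshold


# Text as pasted from the original PDF: one letter per line.
broken = "\n\n".join("Thequickbrownfox")
# Text as pasted after the fixup.
fixed = "The quick brown fox"

print(looks_fragmented(broken), looks_fragmented(fixed))  # prints "True False"
```

This only detects the one-letter-per-line symptom. It does not catch the occasional missing space between words, which would need a dictionary or a manual spot check.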