Participant
September 30, 2024
Answered

Acrobat Pro OCR ends up with different fonts to system fonts.

  • 1 reply
  • 2859 views

I scanned a document into a PDF file, then used the Edit PDF tool in Acrobat Pro to recognise the text in the document.

It finds all the text in the document, but the fonts it uses/embeds have a number at the end of the font name.

So, for example, text that is in Times Roman will show Times Roman-13009 or Times Roman 13009 as the font name in the Edit PDF font section.

What then happens is that when I edit the text, such as deleting or changing a word, the new text doesn't match and looks different.

What is causing this? Why does the Recognise Text function add a number to the end of the font name?

Under Document Properties/Font, it says the font is embedded, which I assume is because my system does have the standard Times Roman font installed. But with the number after it, the font name looks like it's a different font.

 

Does anyone know about this issue and how to fix it or prevent it from happening?

Thank you.

1 reply

Brad @ Roaring Mouse
Community Expert, Correct answer
September 30, 2024

This falls under "How things work".

When Acrobat runs OCR on a scanned document, unless you set it to use a system font (see below), it creates a fake font outline that looks as close to the scanned version as possible. You can see this if you zoom in on the letters. It then subsets that new fake font and gives it a name, and there may be several such fonts. The name may not even be that of the font in the scan. It is merely there so that the text is accessible: if you export the text to Word, say, there is a real font to connect it to, and it tends to be something simple like Times, Arial or Helvetica.

Like any embedded subset font, it contains only the characters USED in the document. As soon as you type a character not already used, Acrobat needs to use a system font for that new character. As with any PDF, you really should not be doing extensive editing in it anyway; it is not the right place for that.
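To make the subset point concrete, here is a minimal Python sketch. It is not Acrobat's code or any Adobe API; it only models the idea, assuming the subset holds exactly the non-whitespace characters found in the recognized text. The function name `missing_from_subset` is made up for illustration.

```python
def missing_from_subset(recognized_text, new_text):
    """Return the characters in new_text that the subset has no glyph for.

    Assumption: the subset contains exactly the non-whitespace characters
    that appear in recognized_text. Whitespace is ignored in this model.
    """
    subset = {ch for ch in recognized_text if not ch.isspace()}
    return {ch for ch in new_text if not ch.isspace() and ch not in subset}


recognized = "The quick brown fox"
# Changing "fox" to "fax" needs an "a", which the recognized text never used.
print(sorted(missing_from_subset(recognized, "The quick brown fax")))
# Deleting a word needs no new characters, so nothing is missing.
print(sorted(missing_from_subset(recognized, "The quick fox")))
```

Any character this reports as missing is one that would have to come from a different font, which is why the edited text can look different from the text around it.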

Now, you can CHANGE the font by selecting all the text in a paragraph and changing it to REAL Times (or any close font you have). You can then edit more successfully, and the resulting file will embed and subset that font instead.

 

OR, you can tell Acrobat to use a system font when it recognizes text, via the "Available System Font" option in the Recognize Text settings.

This will use a common font (like Times), but it likely won't match the scan.

Momo-50
Author
Participant
September 30, 2024

Thank you for the detailed explanation; it helped me solve my issue. I used the "Available System Font" option in Recognize Text, and the fonts now use just the font name, with no numbers, and edited text matches.
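For anyone checking a longer font list (for example, names copied by hand from the Document Properties font tab), a small Python sketch like the one below can flag names that carry a number on the end. It is only a heuristic based on the two name shapes mentioned in this thread (Times Roman-13009 and Times Roman 13009); the function name `split_ocr_font_name` is made up, and a genuine font whose name ends in a number would be flagged too.

```python
import re
from typing import Optional, Tuple

# Heuristic: a base name, then a hyphen or a space, then digits at the very end.
_NUMBER_SUFFIX = re.compile(r"^(?P<base>.+?)[-\s](?P<number>\d+)$")


def split_ocr_font_name(name: str) -> Tuple[str, Optional[str]]:
    """Split a font name into (base name, trailing number or None)."""
    cleaned = name.strip()
    match = _NUMBER_SUFFIX.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group("base"), match.group("number")


for font in ["Times Roman-13009", "Times Roman 13009", "Times Roman"]:
    base, number = split_ocr_font_name(font)
    status = "numbered OCR-style name" if number else "plain name"
    print(f"{font!r}: base={base!r}, {status}")
```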

Brad @ Roaring Mouse
Community Expert
October 2, 2024

Wondering if you might know how to do this. As you mentioned above, you can select the text in a paragraph and then change the font. What if you want to change the font for multiple pieces of text on the page, but each paragraph is in a different box?

 

Is there a way to select more than one text box so as to change font for multiple boxes?

Also, is there a quick way to select multiple text boxes at once, instead of one by one?

Thanks.

 


"Is there a way to select more than one text box so as to change font for multiple boxes?"

I don't think so.

I think you might need to rethink your approach. Rather than just trying to make a scanned PDF usable, I would just use it to export the recognized text (say as a Word file) and use that to rebuild a proper file, be it in InDesign or otherwise.