Hi Smart Acrobat People:
I am tasked with converting a PDF to FrameMaker and the original Word doc used to create the PDF is long gone. Engineers have been editing the PDF over the course of several years. I thought, no problem, convert the PDF to Word or RTF, clean up the doc and I'll be off and running.
Here is how a small section looks in Acrobat:
And the same section in Word:
I have tried:
Copy/Paste (paste as formatted text, paste as unformatted text, paste into InDesign, Word, FrameMaker). FrameMaker actually gets worse, in that those crazy characters become ?s in boxes. Again, changing the font does not help.
This is a long document, so multiply the headings issue by hundreds of pages. Any ideas?
~Barb
Hi again:
I'm still struggling to figure out a way to move forward without having to retype hundreds of pages of headings, in Spanish. As I look at the fonts, I see various encodings for Arial Bold (some Arial Bold is converting cleanly and some is not), along with TrueType vs. Type 1 (CID) vs. TrueType (CID).
Is there a clue in this dialog box?
I don't know anything about "Identity-H" other than a post from @Dov Isaacs explaining that it is a perfectly valid encoding method per the PDF specification. While I am surmising that those are the paragraphs that are mismapping, I recognize that this may be entirely incorrect.
Really, this comes down to one question: is there any way to remap these characters after the fact? Again, the original Word document is gone, and this one PDF is all we have to work with.
~Barb
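For anyone wondering why Identity-H text can come out scrambled: with that encoding, the page content stores 2-byte glyph codes rather than characters, so copying or exporting text relies on a separate lookup table (the font's ToUnicode CMap) to turn codes back into letters. If that table is missing or wrong, the page still looks right but the extracted text does not. Below is a minimal Python sketch of that lookup. The CMap fragment and the function names `parse_bfchar` and `decode_codes` are made up for illustration; this is not taken from the PDF in question and is not a full CMap parser.

```python
import re

# A tiny, made-up ToUnicode CMap fragment (hypothetical data, not from a real file).
SAMPLE_CMAP = """
2 beginbfchar
<0024> <0041>
<0025> <00F1>
endbfchar
"""


def parse_bfchar(cmap_text):
    """Return {glyph_code: unicode_string} from the bfchar sections of a CMap."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values are UTF-16BE code units written in hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(codes, mapping):
    """Translate glyph codes to text; codes with no mapping become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


to_unicode = parse_bfchar(SAMPLE_CMAP)
print(decode_codes([0x0024, 0x0025], to_unicode))  # both codes are mapped
print(decode_codes([0x0024, 0x0026], to_unicode))  # 0x0026 has no mapping
```

The second call shows the failure mode in miniature: a code with no entry cannot be recovered as text, no matter which font is applied afterwards.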
It looks like I am talking to myself... but for others who encounter this issue in the future, I am able to recover the text using the following process:
I'll write an action to automate this, but if someone has a better way, please tell me!
~Barb
Oh, no, Barb...don't do that!
The problem is that the fonts in the original PDF either 1) weren't embedded into the PDF at the time it was made, 2) weren't Unicode/OpenType fonts, or 3) both.
We have this sort of problem all the time when we make older PDFs accessible. Try this workaround:
This is definitely caused by non-compliant fonts in the original PDF. Are you able to open the original PDF and check which fonts are being used where?
Ok, one more diagnostic task:
Can you post a screen capture of the original PDF's File Properties / Description panel? It will show which software was used to create the original PDF.
Thank you for stepping in. 😊
Oddly, the original app isn't listed. My first plan of attack was to request that they try again to locate the original file.
Am in the process of making sure everything is updated on Windows—I'll work through your list once it is and will check back in.
PDF Producer is the software utility that converts the source file into the PDF.
The screen capture is telling: they made a PDF from a PDF.
So whatever shortcomings were in the original were carried over into the newer version.
Your client needs a better workflow and training. If you'd like to do that, contact me offlist and we can be your backup tech support coaches on it.
I'm stuck at #2. Windows is showing that the Arial installed on my computer is a TrueType font.
I can ask my client to purchase an OpenType version (i.e., https://www.linotype.com/145867/arial-family.html), but before I do that, I want to confirm that I'm understanding you correctly: even though I have access to many other OpenType fonts through my CC subscription, I can't just map Arial to another OpenType font. If buying Arial is the best option, then we will go this route.
The font calls for the crazy text are Arial Black and Arial, Arial Bold, Arial Italic and Arial Bold Italic.
~Barb
Barb, Microsoft uses the TTF extension on all its fonts, whether traditional TrueType or OpenType (TrueType flavored). So drill a bit deeper and see if you can see this dialogue box that confirms it. (Looking at the copyright date and file version, I'm going to assume you have an OpenType version).
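If the dialog box is hard to find, the same question can be answered roughly by reading the font file's header. Here is a minimal Python sketch, assuming a single-font file (not a .ttc collection). The function name `describe_sfnt` and the helper `fake_font` are made up for illustration, and the presence of tables such as GSUB, GPOS, GDEF, or DSIG is only a hint that the file is an OpenType-era build, not proof.

```python
import struct


def describe_sfnt(data):
    """Summarize the outline flavor and table tags of a single-font sfnt file."""
    if len(data) < 12:
        raise ValueError("too short to be a font file")
    tag = data[:4]
    num_tables = struct.unpack(">H", data[4:6])[0]
    if tag == b"OTTO":
        flavor = "OpenType (CFF outlines)"
    elif tag in (b"\x00\x01\x00\x00", b"true"):
        flavor = "TrueType outlines"
    else:
        raise ValueError("not a single-font sfnt file (maybe a .ttc or another format)")
    tables = []
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        record = data[12 + 16 * i : 12 + 16 * (i + 1)]
        if len(record) < 16:
            raise ValueError("truncated table directory")
        tables.append(record[:4].decode("latin-1"))
    # Tables that OpenType added on top of the original TrueType format.
    hints = sorted(set(tables) & {"GSUB", "GPOS", "GDEF", "DSIG"})
    return {"flavor": flavor, "tables": tables, "opentype_tables": hints}


def fake_font(tags):
    """Hypothetical helper: build only a header and table directory for a demo."""
    header = b"\x00\x01\x00\x00" + struct.pack(">HHHH", len(tags), 0, 0, 0)
    records = b"".join(t.encode("ascii") + struct.pack(">III", 0, 0, 0) for t in tags)
    return header + records


print(describe_sfnt(fake_font(["cmap", "glyf", "GSUB", "DSIG"])))
# For a real file, pass its bytes instead, e.g. open(path, "rb").read(),
# where path points at the installed font (the location is an assumption).
```

A .ttf file that reports TrueType outlines plus one or more of those extra tables matches the "OpenType (TrueType flavored)" case described above.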
Don't buy Arial (reg, ital, etc.). They're included free with MS products.
https://docs.microsoft.com/en-us/typography/font-list/arial See if you can download them from the MS website.
And don't forget to get a good copy of Arial Black. https://docs.microsoft.com/en-us/typography/font-list/arial-black
Once you confirm you have Unicode versions (with a post-2010 copyright date), go ahead and embed the fonts.
Thank you, thank you, thank you!
Embedding the fonts worked exactly as you said it would and solved the issue. I could never have gotten here without your help. 🙏🏼
~Barb
@Barb Binder, you're welcome, my friend!
Glad to have been able to help.