When creating a PDF from PostScript, ligatures such as fi, ffi, and fl are mapped in a special way. If you copy text from the resulting PDF and paste it elsewhere, you end up with missing glyphs, special characters, or extra spaces where the ligatures are. You can't search for that text in Acrobat either.
Is there a way to create a PostScript file that correctly embeds and maps the ligatures, so that users can extract text from or search the PDF?
My take is this... when you distill a PDF from PostScript there is no font or glyph remapping at all. If the PostScript contains a reference to the glyph called /fi, then the PDF has a reference to the glyph called /fi. It displays fine, but extraction is an interesting problem. Essentially, software has two choices:
1. Export as the single glyph /fi. In Unicode this is U+FB01. This is entirely legal and correct, except that many fonts do not have this Unicode glyph, so there will be substitution or a missing character. However, if you are working with pro fonts, the glyph may be there. It is still confusing for a person who thinks (wrongly) that there are two glyphs. On Mac, Unicode isn't needed, because fi is in the default character set. Coming back to Windows, an app may place both Unicode and non-Unicode text on the clipboard, and could follow option 2 for the non-Unicode text.
2. Map to the two glyphs "f" and "i". This is arguably wrong, but it is likely to match user expectation more often.
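To make the two choices concrete, here is a minimal Python sketch. It is an illustration only, not how Acrobat or any other viewer actually does it: the function names are made up, and Unicode NFKC normalization is used as a stand-in for "map the ligature to its separate letters".

```python
import unicodedata

def extract_as_single_glyph(text: str) -> str:
    # Choice 1: keep the ligature as one code point (e.g. U+FB01 for fi).
    return text

def extract_decomposed(text: str) -> str:
    # Choice 2: split ligatures into plain letters.
    # Assumption: NFKC compatibility normalization stands in for the mapping;
    # note it also rewrites other compatibility characters, not just ligatures.
    return unicodedata.normalize("NFKC", text)

sample = "su\ufb03cient \ufb01le"   # "sufficient file" typeset with ffi and fi ligatures
single = extract_as_single_glyph(sample)
split = extract_decomposed(sample)

print(len(single), len(split))
print(split)
```

With choice 1, a search for "file" will not match unless the search tool also normalizes; with choice 2 it matches, which is the behaviour most users expect.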
Mapping, encoding, or decoding, whatever! Extracting or searching text in PDFs generated from PostScript is definitely an issue.
There's an entirely separate issue that PostScript has no rule that the text is marked with recognised codes. I can write a PostScript file where the letters of the alphabet, instead of being called /a /b /c are called /fred /barney /wilma. This will show and print beautifully, but no text can be extracted. But this is not a PostScript issue; many other PDF generators will use arbitrary codes.
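A rough Python sketch of why arbitrary glyph names defeat extraction, assuming the extractor relies only on a glyph-name lookup. The table below is a tiny hand-picked subset in the spirit of the Adobe Glyph List, and `glyphs_to_text` is a hypothetical helper, not any real extractor's API.

```python
# Tiny illustrative subset of a glyph-name-to-Unicode table.
GLYPH_NAME_TO_UNICODE = {
    "a": "a", "b": "b", "c": "c", "f": "f", "i": "i",
    "fi": "\ufb01", "fl": "\ufb02", "ffi": "\ufb03",
}

def glyphs_to_text(glyph_names):
    # Unknown names become U+FFFD (replacement character): the glyph still
    # draws correctly on the page, but there is no text to extract.
    return "".join(GLYPH_NAME_TO_UNICODE.get(name, "\ufffd") for name in glyph_names)

print(glyphs_to_text(["fi", "a", "b"]))
print(glyphs_to_text(["fred", "barney", "wilma"]))
```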
PDF includes a concept called the "ToUnicode CMap". This is extra information that gives the Unicode value for every glyph. It works well, but most apps don't include it.
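As a sketch of what that extra information looks like, the Python below builds the `beginbfchar` ... `endbfchar` part of a ToUnicode CMap, where each font character code is paired with its Unicode text in UTF-16BE hex. The character codes 0x0C and 0x0E are assumptions chosen for illustration, and a real CMap needs more around this fragment (header, codespace range, and so on).

```python
def to_unicode_bfchar(mapping: dict) -> str:
    # mapping: font character code (int, single byte assumed) -> Unicode text.
    # A single code may map to several characters, so a ligature glyph
    # can be declared as the letters "f" + "i".
    lines = [f"{len(mapping)} beginbfchar"]
    for code, text in sorted(mapping.items()):
        utf16_hex = text.encode("utf-16-be").hex().upper()
        lines.append(f"<{code:02X}> <{utf16_hex}>")
    lines.append("endbfchar")
    return "\n".join(lines)

# Hypothetical codes: 0x0C is the fi glyph, 0x0E is the ffi glyph.
fragment = to_unicode_bfchar({0x0C: "fi", 0x0E: "ffi"})
print(fragment)
```

Mapping the ligature code to the letter sequence rather than to U+FB01 is a design choice: it makes both copy/paste and search behave the way readers expect.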
Why does Chrome support "U+FB03 : LATIN SMALL LIGATURE FFI" when Adobe Acrobat does not? The same goes for PDF-XChange. See the word "sufficiently" (ffi is one symbol there). Also, Acrobat somehow copies it as U+000E : <control> SHIFT OUT [SO]. Why?
The PDF is here: https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/pimeas.pdf. Also see https://www.babelstone.co.uk/Unicode/whatisit.html
Meanwhile, PDF-XChange has fixed all its issues with "U+FB03 : LATIN SMALL LIGATURE FFI"...