
Ligatures in PDF from PostScript

Community Expert, Oct 24, 2019

When creating a PDF from PostScript, ligatures such as fi, ffi, fl, etc. are mapped in a special way. If you copy text from the resulting PDF and paste it elsewhere, you end up with missing glyphs, special characters or extra spaces where the ligatures were. You can't search the text in Acrobat either.

Is there a way to create PostScript files that correctly embed and map the ligatures, so that users can extract text from the PDF or search it?

Topics: Create PDFs, General troubleshooting
LEGEND, Oct 24, 2019

My take is this: when you distill a PDF from PostScript, there is no font or glyph remapping at all. If the PostScript contains a reference to the glyph called /fi, then the PDF has a reference to the glyph called /fi. It displays fine, but extraction is an interesting problem. Essentially, software has two choices:

1. Export it as the single glyph /fi. In Unicode this is U+FB01. This is entirely legal and correct, except that many fonts have no glyph for this Unicode character, so there will be substitution or a missing character. However, if you are working with professional fonts, the glyph may be there. It is still confusing for a person who (wrongly) thinks there are two glyphs. On the Mac, Unicode isn't needed, because fi is in the default character set. On Windows, an app may place both Unicode and non-Unicode text on the clipboard, and could follow option 2 for the non-Unicode text.

2. Map to the two glyphs "f" and "i". This is arguably wrong, but it is likely to match user expectation more often.
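The difference between the two choices can be illustrated with Python's standard `unicodedata` module. This is only a sketch of the two outcomes, not how any particular PDF viewer implements extraction: option 1 keeps U+FB01 as one code point, while option 2 is approximated here with NFKC compatibility normalization, which decomposes ligature characters such as U+FB01 and U+FB03 into plain letters.

```python
import unicodedata

# A word as it might be extracted when the PDF references the glyph /fi.
# Option 1: keep the single code point U+FB01 LATIN SMALL LIGATURE FI.
option_1 = "de\ufb01ne"

# Option 2: split the ligature into "f" + "i". NFKC compatibility
# normalization performs this decomposition for the ligature characters.
option_2 = unicodedata.normalize("NFKC", option_1)

print(len(option_1), len(option_2))   # 5 6
print("define" in option_1)           # False: a plain-text search misses it
print("define" in option_2)           # True
```

Note that NFKC also folds other compatibility characters (superscripts, full-width forms and so on), so a real extractor following option 2 would more likely target the ligature code points only.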

Community Expert, Oct 24, 2019

Mapping, encoding or decoding, whatever! Extracting or searching text in a PDF generated from PostScript is definitely an issue.

LEGEND, Oct 24, 2019

There's an entirely separate issue: PostScript has no rule that text must be marked with recognised codes. I can write a PostScript file where the letters of the alphabet, instead of being called /a /b /c, are called /fred /barney /wilma. This will display and print beautifully, but no text can be extracted. This is not specific to PostScript, though; many other PDF generators use arbitrary codes.
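Why the glyph names matter can be sketched as a lookup from glyph name to Unicode text. This is a minimal illustration, assuming a tiny hand-picked subset of the Adobe Glyph List (AGL) plus its single-group `uniXXXX` naming convention; the names `AGL_SUBSET`, `glyph_to_text` and `extract` are hypothetical, and a real extractor uses the full list and further rules.

```python
import re

# A tiny subset of the Adobe Glyph List: glyph name -> Unicode text.
# The real list has thousands of entries; these few are for illustration.
AGL_SUBSET = {
    "a": "a", "b": "b", "c": "c", "f": "f", "i": "i", "space": " ",
    "fi": "\ufb01", "fl": "\ufb02", "ffi": "\ufb03",
}

# AGL convention: "uni" followed by four uppercase hex digits (single group only here).
UNI_NAME = re.compile(r"uni([0-9A-F]{4})")

def glyph_to_text(name):
    """Return the Unicode text for a glyph name, or None if it is unknown."""
    if name in AGL_SUBSET:
        return AGL_SUBSET[name]
    m = UNI_NAME.fullmatch(name)
    if m:  # e.g. "uni0066" -> "f"
        return chr(int(m.group(1), 16))
    return None  # e.g. "fred": draws fine, but carries no text

def extract(glyph_names):
    # Design choice for this sketch: unknown glyphs become U+FFFD REPLACEMENT CHARACTER.
    return "".join(glyph_to_text(n) or "\ufffd" for n in glyph_names)

print(extract(["a", "b", "c"]))              # abc
print(extract(["fred", "barney", "wilma"]))  # three U+FFFD characters
print(extract(["uni0066", "uni0069"]))       # fi
```

The same drawing instructions produce identical output on the page in all three cases; only the names decide whether any text can be recovered.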


PDF includes a concept called the "ToUnicode CMap": extra information that gives the Unicode value for every glyph. It works well, but most apps don't include it.
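For a simple single-byte font, a ToUnicode CMap is a small text resource that the PDF stores as a stream and references from the font dictionary's /ToUnicode entry. The sketch below builds one; the function name `to_unicode_cmap` and the example character codes are hypothetical. Each `bfchar` destination is UTF-16BE hex and may hold several characters, so a ligature glyph can map either to "fi" (option 2 above) or to U+FB01 (option 1).

```python
def to_unicode_cmap(mapping):
    """Build the text of a minimal ToUnicode CMap for a single-byte font.

    mapping: {character code (0-255): Unicode string}.
    """
    if len(mapping) > 100:
        # The PDF specification limits a beginbfchar block to 100 entries;
        # this sketch does not split larger mappings into several blocks.
        raise ValueError("at most 100 entries per bfchar block")

    def utf16_hex(s):
        # bfchar destinations are written as UTF-16BE hex strings
        return s.encode("utf-16-be").hex().upper()

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
        f"{len(mapping)} beginbfchar",
    ]
    for code, text in sorted(mapping.items()):
        lines.append(f"<{code:02X}> <{utf16_hex(text)}>")
    lines += [
        "endbfchar",
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)

# Hypothetical encoding: code 0x0C draws the fi ligature, 0x0E the ffi ligature.
print(to_unicode_cmap({0x0C: "fi", 0x0E: "ffi", 0x61: "a"}))
```

With a CMap like this in place, copy and search see "fi" and "ffi" regardless of what the glyphs are called in the font.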

Engaged, Jun 30, 2020

Why does Chrome support "U+FB03 : LATIN SMALL LIGATURE FFI" while Adobe Acrobat does not? The same goes for PDF-XChange. See the word "sufficiently" (ffi is a single glyph here) in https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/pimeas.pdf. Also, somehow the ligature is copied as U+000E : <control> SHIFT OUT [SO]. Why? You can check the copied characters with https://www.babelstone.co.uk/Unicode/whatisit.html
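One plausible explanation for the U+000E, assuming the PDF was typeset with TeX's Computer Modern fonts in the classic OT1 encoding and carries no ToUnicode CMap: in OT1, codes 0x0B to 0x0F hold the ff, fi, fl, ffi and ffl ligatures, so a viewer that falls back to the raw character code would paste ffi as U+000E. The sketch below inspects pasted text and repairs those codes; `OT1_LIGATURES`, `repair_pasted_text` and `describe` are hypothetical names, and the OT1 assumption has not been verified against that particular file.

```python
import unicodedata

# Ligature slots in TeX's OT1 encoding (assumption: the PDF uses OT1-encoded
# fonts without a ToUnicode CMap, so raw codes leak into copied text).
OT1_LIGATURES = {
    0x0B: "ff",
    0x0C: "fi",
    # 0x0D is "fl" in OT1, but it is also carriage return, so it is left alone
    0x0E: "ffi",
    0x0F: "ffl",
}

def repair_pasted_text(text):
    """Replace leaked OT1 ligature codes with plain letters."""
    return text.translate(OT1_LIGATURES)

def describe(text):
    """List code points and names; control characters have no Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<control>')}" for c in text]

pasted = "su\x0eciently"
print(describe(pasted)[2])          # U+000E <control>
print(repair_pasted_text(pasted))   # sufficiently
```

This only patches text after the fact; the cleaner fix remains a ToUnicode CMap in the PDF itself.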

Engaged, Mar 31, 2024

Meanwhile, PDF-XChange has fixed all its issues with "U+FB03 : LATIN SMALL LIGATURE FFI".
