Ligatures in PDF from Postscript

Community Expert, Oct 24, 2019

When a PDF is created from PostScript, ligatures such as fi, ffi, and fl are mapped in a special way. If you copy text from the resulting PDF and paste it elsewhere, you end up with missing glyphs, special characters, or extra spaces where the ligatures were. You can't search for that text in Acrobat either.

Is there a way to create a PostScript file that correctly embeds and maps the ligatures, so that users can extract or search the text in the PDF?

TOPICS
Create PDFs , General troubleshooting

Views

2.7K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
LEGEND, Oct 24, 2019

My take is this: when you distill a PDF from PostScript, there is no font or glyph remapping at all. If the PostScript contains a reference to the glyph called /fi, then the PDF has a reference to the glyph called /fi. It displays fine, but extraction is an interesting problem. Essentially, software has two choices:

1. Export as the single glyph /fi. In Unicode this is U+FB01. This is entirely legal and correct, except that many fonts do not have this Unicode glyph, so there will be substitution or a missing character. However, with pro fonts the glyph may be there. It is still confusing for a person who thinks (wrongly) that there are two glyphs. On Mac, Unicode isn't needed, because fi is in the default character set. On Windows, an app may place both Unicode and non-Unicode text on the clipboard, and could follow option 2 for the non-Unicode text.

2. Map to the two glyphs "f" and "i". This is arguably wrong, but it is likely to match user expectation more often.
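The second option can be sketched in a few lines of Python. This is only an illustration of the idea, not how Acrobat or any other viewer actually does it: the helper name `expand_ligatures` is made up here, and it relies on the fact that Unicode compatibility normalization (NFKC) decomposes the Latin ligature code points U+FB00 to U+FB06 into their component letters.

```python
import unicodedata

def expand_ligatures(text: str) -> str:
    """Replace Latin ligature code points (U+FB00..U+FB06) with plain letters.

    Hypothetical helper: only the ligature block is normalized, so other
    compatibility characters in the text are left untouched.
    """
    out = []
    for ch in text:
        if "\ufb00" <= ch <= "\ufb06":
            # NFKC turns e.g. U+FB01 into "f" + "i" and U+FB03 into "ffi".
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)

print(expand_ligatures("su\ufb03ciently \ufb01ne"))  # sufficiently fine
```

Restricting the normalization to the ligature range is a design choice: applying NFKC to the whole string would also rewrite unrelated characters such as superscripts or fractions.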

Community Expert, Oct 24, 2019

Mapping, encoding, or decoding, whatever you call it: extracting or searching text in a PDF generated from PostScript is definitely an issue.

LEGEND, Oct 24, 2019

There's an entirely separate issue: PostScript has no rule that text must be marked with recognised codes. I can write a PostScript file where the letters of the alphabet, instead of being called /a /b /c, are called /fred /barney /wilma. This will show and print beautifully, but no text can be extracted. This is not specific to PostScript, though; many other PDF generators use arbitrary codes.

PDF includes a concept called the "ToUnicode CMap". This is extra information that gives the Unicode value for every glyph. It works well, but most apps don't include it.
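To make the ToUnicode idea concrete, here is a rough Python sketch that renders the `beginbfchar` / `endbfchar` part of such a CMap for a font with single-byte character codes. The function name `bfchar_section` and the codes 0x0C and 0x0E in the example are assumptions chosen for illustration; a real CMap stream also needs the surrounding header, code space range, and trailer, which are omitted here. The point it shows is that one character code may map to a sequence of several UTF-16BE code units, so a ligature glyph can be declared as "f" followed by "i".

```python
from typing import Mapping

def bfchar_section(mapping: Mapping[int, str]) -> str:
    """Render one ToUnicode `bfchar` section for single-byte character codes.

    Hypothetical helper. Each entry maps a character code to the UTF-16BE
    encoding of the text it stands for. A single section may hold at most
    100 entries, so larger mappings would need to be split.
    """
    if len(mapping) > 100:
        raise ValueError("a bfchar section may contain at most 100 entries")
    lines = [f"{len(mapping)} beginbfchar"]
    for code, text in sorted(mapping.items()):
        utf16 = text.encode("utf-16-be").hex().upper()
        lines.append(f"<{code:02X}> <{utf16}>")
    lines.append("endbfchar")
    return "\n".join(lines)

# Illustrative codes only: which code a font uses for a ligature varies.
print(bfchar_section({0x0C: "fi", 0x0E: "ffi"}))
```

With the example mapping, the entry for code 0x0C comes out as `<0C> <00660069>`, that is, the two characters "f" and "i", which is what lets a viewer search and copy the ligature as ordinary letters.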

Engaged, Jun 30, 2020

Why does Chrome support "U+FB03 : LATIN SMALL LIGATURE FFI" when Adobe Acrobat does not? The same goes for PDF-XChange. See the word "sufficiently" (ffi is a single symbol there) in https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/pimeas.pdf. Also, the ligature somehow gets copied as U+000E : <control> SHIFT OUT [SO]. Why?
The code point can be checked with https://www.babelstone.co.uk/Unicode/whatisit.html
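One possible explanation, offered as an assumption and not confirmed anywhere in this thread: TeX's classic OT1 font encoding stores the ffi ligature in slot 0x0E, so a viewer that finds no Unicode mapping may simply hand over the raw character code, which Unicode reads as SHIFT OUT. The Python sketch below shows how pasted text could be inspected and, if the OT1 guess holds for a given PDF, repaired. The helper names `describe` and `repair_ot1_ligatures` are made up for this example.

```python
import unicodedata

# Slots of the f-ligatures in TeX's OT1 text encoding.
# Assumption: the PDF in question uses a font with this encoding.
OT1_LIGATURES = {0x0B: "ff", 0x0C: "fi", 0x0D: "fl", 0x0E: "ffi", 0x0F: "ffl"}

def describe(text):
    """List each character as 'U+XXXX NAME'; controls have no Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}" for ch in text]

def repair_ot1_ligatures(text):
    """Replace raw OT1 ligature codes with plain letters.

    Caution: 0x0B to 0x0D are also vertical tab, form feed, and carriage
    return, so this is only safe on text known to come from such a font.
    """
    return "".join(OT1_LIGATURES.get(ord(ch), ch) for ch in text)

print(describe("\ufb03"))   # ['U+FB03 LATIN SMALL LIGATURE FFI']
print(describe("\x0e"))     # ['U+000E <control>']
print(repair_ot1_ligatures("su\x0eciently"))  # sufficiently
```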

Engaged, Mar 31, 2024

Meanwhile, PDF-XChange has fixed all its issues with "U+FB03 : LATIN SMALL LIGATURE FFI".
