Inspiring
June 30, 2020
Question

Bug in Unicode processing of the ffi ligature [U+FB03: LATIN SMALL LIGATURE FFI] and others


Why do Adobe PDF Reader for Android and Chrome support U+FB03 (LATIN SMALL LIGATURE FFI) while Adobe Acrobat does not? PDF-XChange also handles it well. See "and all integers p and q with sufficiently" (the ffi is a single symbol here) in the first paragraph of https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/pimeas.pdf

Also, the ligature is somehow copied as U+000E (<control> SHIFT OUT [SO]). Why? LaTeX source: https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimTeX/pimeas.tex
Also see https://www.babelstone.co.uk/Unicode/whatisit.html

and https://github.com/alif-type/libertinus/issues/143 (it has a nice compilation of all (?) ligatures).
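
A quick way to see what a viewer actually put on the clipboard is to dump the code points of the pasted string. The sketch below does that, and also illustrates one plausible explanation for the U+000E result, stated here as an assumption rather than something confirmed in this thread: the classic TeX OT1 font encoding keeps the ffi ligature in slot 0x0E, so a viewer that finds no usable Unicode mapping for the glyph may fall back to the raw character code. The names `describe`, `repair_ot1_ligatures`, and `OT1_LIGATURE_SLOTS` are hypothetical helpers, not part of any Adobe or TeX API.

```python
import unicodedata

# Assumption: ligature slots of the TeX OT1 (Computer Modern) font encoding.
OT1_LIGATURE_SLOTS = {
    0x0B: "ff",
    0x0C: "fi",
    0x0D: "fl",
    0x0E: "ffi",
    0x0F: "ffl",
}


def describe(text):
    """Return one 'U+XXXX NAME' line per character of text."""
    lines = []
    for ch in text:
        # Control characters have no Unicode name, so fall back to the category.
        fallback = "<no name, category %s>" % unicodedata.category(ch)
        lines.append("U+%04X %s" % (ord(ch), unicodedata.name(ch, fallback)))
    return lines


def repair_ot1_ligatures(text):
    """Replace raw OT1 ligature slot codes with their plain letters."""
    return "".join(OT1_LIGATURE_SLOTS.get(ord(ch), ch) for ch in text)


# Hypothetical sample of what a faulty copy of "sufficiently" might yield.
pasted = "su\x0eciently"
print("\n".join(describe(pasted)))
print(repair_ot1_ligatures(pasted))
```

If the dump shows U+000E where U+FB03 (or the letters f, f, i) was expected, the problem is in how the viewer maps glyphs to text, not in the clipboard or in the application receiving the paste.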
P.S.

Neither "Beta: Use Unicode UTF-8 for worldwide language support" nor "Edit > Preferences > Language" fixes the issue.

 


 
This topic has been closed for replies.

2 replies

ZBallingAuthor
Inspiring
April 7, 2024

This issue is now worse: since Edge added support for Adobe's PDF engine, the problem appears there too (edge://flags, New PDF Viewer).

ls_rbls
Community Expert
July 1, 2020

Can you confirm whether this issue could be related to how Unicode is supported by the operating system on which Acrobat is installed?

For example, have you been able to test whether your version of Adobe Acrobat Pro DC behaves the same way on a computer running macOS Catalina (or an older version), MS Windows 8, and/or MS Windows 10?

Since you mentioned Android, it may also be worth looking at the operating system Acrobat is running on.

 

For example, the June 2020 update recently addressed an issue in Acrobat on MS Windows in which users were reporting on the forums that the Weblink plug-in was not encoding/decoding URLs properly.

This is not necessarily related to your inquiry, but because the UTF-8 encoded URLs were malformed to begin with, it made sense to me to ask this question: the last update fixed this problem only in Acrobat Pro DC for Windows, not macOS.

 

Meanwhile, some Acrobat users with older versions of the product, such as Acrobat Pro X, Acrobat Pro XI, and Acrobat DC 2017, have reported that they do not experience the URL issue.

 

Have you been able to test, or ask friends and/or other users, whether the LATIN SMALL LIGATURE FFI issue manifests consistently across all versions of Acrobat?

 

ZBallingAuthor
Inspiring
July 1, 2020

Indeed, Android supports ligatures much better than the current version of Windows (1909; I have not tested 2004 yet). In particular, it recognises Unicode ligatures as simultaneously one symbol and multiple symbols. So when you press Backspace, it deletes the ffi ligature and recreates ff (not a ligature). This is how it is supposed to work, so that search still works on multi-code-point Unicode text and finds letters inside ligatures.
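
The "one symbol and multiple symbols" behaviour described above can be approximated with Unicode compatibility normalization: NFKC maps U+FB03 to the three letters f, f, i, so a search for "ffi" or "suffic" still matches text that contains the ligature. This is a minimal sketch using Python's standard `unicodedata` module; `fold_ligatures` and `contains` are hypothetical helper names, and nothing here is meant to describe how Android or Acrobat implement it.

```python
import unicodedata


def fold_ligatures(text):
    """Expand compatibility characters such as U+FB03 into plain letters."""
    return unicodedata.normalize("NFKC", text)


def contains(haystack, needle):
    """Substring search that treats a ligature and its letters as equal."""
    return fold_ligatures(needle) in fold_ligatures(haystack)


# "sufficiently" typeset with the single-code-point ffi ligature.
word = "su\ufb03ciently"

print(len(word), len(fold_ligatures(word)))   # code points before and after folding
print("ffi" in word)                          # naive search on the raw string
print(contains(word, "ffi"))                  # search after folding
print(contains(word, "suffic"))
```

NFKC also folds other compatibility characters (superscripts, full-width forms, and so on), so it is better suited to building a search index than to rewriting the stored text.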

Obviously, this has nothing to do with URL processing, which is a complex beast as well. Again, it is very dangerous that Acrobat processes Unicode incorrectly. I have no friends to test it with, and I only use the latest Acrobat DC. I have macOS Catalina, but I only use Windows 10 on my MacBook, so sorry, you will have to test it yourself.

 

I will also point out that it is crazy that Acrobat for Android uses a codebase different from that of Acrobat for Catalina (64-bit, hehe, so different) and Windows 10.

ZBallingAuthor
Inspiring
July 5, 2020

I did a little more digging while helping another user with an OCR issue, and I noticed that the file you shared is mainly based on scanned images.

 

So I opened Acrobat and used the "Scan & OCR" tool to perform text recognition on this file, with the output set to "Editable Text & Images". An error message said: "Acrobat could not perform recognition because: This page contains renderable text".

 

Then I noted that if you need to copy the "ffi" part of the word "sufficiently" in that document, selecting a word and right-clicking it brings up a context menu with two copy options:

 

  • Copy
  • Copy With Formatting

 

Copying the selected text with plain "Copy" won't work because of the rendered text that the producing software laid out on top of the scanned image layer.

 

Using "Copy With Formatting" instead copies the content to the clipboard as a text string, which can then be pasted into any other program or document as text (not as a ligature).

 

Now, opening the Edit PDF tool, or right-clicking the document and selecting "Edit Text" or "Edit Text & Images", lets you copy that ligature with no problem. It is recognized as a single character and can be pasted as is into other documents.

 

So the Unicode recognition is working.

 

Having noted this, I think there is really no bug or problem with the Unicode handling, since the issue is related to renderable text over scanned images. Using the copy method described above does the trick.

 

Any thoughts on this?  


Actually, it looks like Microsoft Word only supports ligatures in OpenType (not TrueType) fonts. So Georgia and Bookman Old Style are not automatically ligatured. You can check in Word with right-click -> Font -> Advanced -> OpenType Features (Ligatures). Still, it works if you paste U+FB03 (ffi) into Word, though Word will then use a non-Georgia font. This is not obvious, because Word still reports Georgia as the font; if future versions of the font files include ligature definitions, or Word starts supporting TrueType Collections, it will start using Georgia. Then again, maybe it is using Georgia. There are rules that can form ligatures without the font supporting them. Who knows.