Inspiring
June 30, 2020
Question

Bug in Unicode processing of a ligature ffi [U+FB03 : LATIN SMALL LIGATURE FFI] and others


Why do the Adobe PDF reader for Android and Chrome support "U+FB03 : LATIN SMALL LIGATURE FFI" while Adobe Acrobat does not? PDF-XChange also handles it well. See "and all integers p and q with sufficiently" (the "ffi" is one symbol here) in the first paragraph of https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/pimeas.pdf

Also, Acrobat SOMEHOW copies it as U+000E : <control> SHIFT OUT [SO]. Why? The LaTeX source is at https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimTeX/pimeas.tex
Also look here https://www.babelstone.co.uk/Unicode/whatisit.html

and here https://github.com/alif-type/libertinus/issues/143 (it has a nice compilation of all (?) ligatures).
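To check what actually lands on the clipboard, the pasted text can be inspected code point by code point, much like the BabelStone tool does. This is only a minimal sketch: the `describe` helper and the two sample strings are assumptions made up for illustration, not output captured from Acrobat.

```python
import unicodedata

def describe(text):
    """Return (code point, name) pairs for each character in text."""
    rows = []
    for ch in text:
        # Control characters such as U+000E have no formal Unicode name,
        # so a fallback label is supplied instead of raising ValueError.
        fallback = "<control>" if unicodedata.category(ch) == "Cc" else "<unnamed>"
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch, fallback)))
    return rows

# Hypothetical samples: the word as the PDF is meant to encode it,
# and the word as it is reportedly copied out of Acrobat.
expected = "su\ufb03ciently"
copied = "su\u000eciently"

for label, text in (("expected", expected), ("copied", copied)):
    # Index 2 is the third character, where the ligature should be.
    print(label, describe(text)[2])
```

The third character of the first string reports as U+FB03 LATIN SMALL LIGATURE FFI, while the second reports as the U+000E control character.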
P.S. Neither "Beta: Use Unicode UTF-8 for worldwide language support" nor "Edit-->> Preferences-->> Language" fixes the issue.

 


2 replies

ZBallingAuthor
Inspiring
April 7, 2024

This issue is now worse: since Edge added Adobe's PDF support (edge://flags, New Pdf Viewer), the same bug appears there too.

ls_rbls
Community Expert
July 1, 2020

Can you confirm whether this issue could be related to how Unicode is supported by the operating system where Acrobat is installed?

 

For example, have you been able to test whether your version of Adobe Acrobat Pro DC behaves the same way on a computer running macOS Catalina (or an older version), MS Windows 8, and/or MS Windows 10?

 

Since you mentioned Android, it may also be worth looking at the operating system the application is running on.

 

For example, the June 2020 update recently addressed an issue in Acrobat running on MS Windows, in which users were reporting on the forums that the Weblink plug-in was not encoding/decoding URLs properly.

 

This is not necessarily related to your inquiry. However, since the UTF-8 encoded URLs were malformed to begin with, it made sense to me to ask this question, because the last update fixed this problem only in Acrobat Pro DC for Windows, not macOS.

 

Meanwhile, some Acrobat users who have older versions of the product, such as Acrobat Pro X, Acrobat Pro XI, and Acrobat DC 2017, have reported that they do not experience the URL issue.

 

Have you been able to test, or ask friends and/or other users, whether the LATIN SMALL LIGATURE FFI issue manifests consistently across all versions of their Acrobat?

 

ZBallingAuthor
Inspiring
July 1, 2020

Indeed, Android supports ligatures much better than the current version of Windows (1909; I have not tested 2004 yet) does. In particular, it recognises Unicode ligatures as simultaneously one symbol and multiple symbols. So when you press backspace, it deletes the ffi ligature and recreates ff (not a ligature). This is how it is supposed to work, so that search still works on multi-code-point Unicode and finds letters inside ligatures.
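One common way to make search find letters inside ligatures is to fold both the text and the query with Unicode compatibility normalization (NFKC), which maps U+FB03 to the three letters "ffi". The sketch below illustrates that idea only; it is not a description of how Android or Acrobat implement search, and the helper names `fold_ligatures` and `contains` are hypothetical.

```python
import unicodedata

def fold_ligatures(text):
    """Apply NFKC, which replaces compatibility ligatures with plain letters."""
    return unicodedata.normalize("NFKC", text)

def contains(haystack, needle):
    """Substring search that treats a ligature and its letters as equal."""
    return fold_ligatures(needle) in fold_ligatures(haystack)

# Sample line modelled on the sentence quoted from the PDF,
# with the ligature stored as the single code point U+FB03.
line = "and all integers p and q with su\ufb03ciently large q"

print("sufficiently" in line)          # naive search misses the word
print(contains(line, "sufficiently"))  # normalized search finds it
```

Note that NFKC also folds other compatibility characters (for example full-width letters), so it changes more than ligatures.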

Obviously, this has nothing to do with URL processing, which is a complex beast as well. Again, it is very dangerous that Acrobat processes Unicode incorrectly. I have no friends to test it with, and I only use the latest Acrobat DC. I have macOS Catalina, but I only use Windows 10 on my MacBook, so sorry, but you will have to test it yourself.

 

I will ALSO POINT OUT that it is craziness that you use a codebase for Acrobat for Android that is different from the one for Acrobat for Catalina (64 bit, hehe, so different) and Windows 10.

ls_rbls
Community Expert
July 1, 2020

So, I used an online Unicode converter and noticed that when you convert this ffi Unicode text character (LATIN SMALL LIGATURE FFI), you get ufb03, which is its UTF-16 form, not UTF-8.

 

The UTF-8 encoding, on the other hand, returns efac83, and ï¬ƒ as UTF-8 text.
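The converter output quoted above can be reproduced in a few lines. This sketch assumes the garbled text comes from reading the three UTF-8 bytes as Windows-1252 (cp1252); the variable names are chosen here for illustration. It shows that both hex strings are encodings of the same code point, U+FB03.

```python
ligature = "\ufb03"  # LATIN SMALL LIGATURE FFI

utf8_bytes = ligature.encode("utf-8")
utf16_bytes = ligature.encode("utf-16-be")  # big-endian, no byte order mark

print(utf8_bytes.hex())   # efac83
print(utf16_bytes.hex())  # fb03

# Misreading the UTF-8 bytes as cp1252 (an assumption about the converter)
# turns one character into three unrelated ones.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)           # ï¬ƒ

# The damage is reversible as long as the bytes themselves were preserved.
restored = mojibake.encode("cp1252").decode("utf-8")
print(restored == ligature)
```

Both encodings round-trip to the same character, so the visible difference comes from which decoder is applied to the bytes.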

 

This is weird, because the UTF-8 specification should be backward compatible and should also work with both Free Type and Open Type fonts.

 

My guess is that the encoding/decoding problem happens when UTF-8 is used and, for some reason, the character becomes unmappable.

 

In my humble opinion, this may explain why Acrobat Reader on Android (and other platforms) seems to work OK: they are not using UTF-8. They are using UTF-16 instead.

 

To work around this in MS Windows try this: 

 

  • Go to Control Panel\All Control Panel Items\Region.

 

  • Under the "Formats" tab, select "Match Windows display language (recommended)" instead of "English (United States)".

 

  • Then click on the Administrative tab, and then click on the "Change system locale..." button.

 

A popup will open next.

 

  • In that Region Settings popup, uncheck the box that says "Beta: Use Unicode UTF-8 for worldwide language support", then click OK and restart.

 


 

 

Usually you change the "Change system locale" setting only if your non-Unicode programs use a different language that doesn't support Unicode, but Adobe Acrobat supports Unicode in many languages.

 

For this reason, I would also suggest opening Acrobat and, in Edit-->> Preferences-->> Language, selecting "Same as the operating system" instead of setting the application to English.

 

After these changes, you will be able to copy the ffi ligature and paste it into MS Word, Notepad, or even Acrobat without it being copied as U+000E (SHIFT OUT). It will (or should) be recognized as a single character too.

 

There is an interesting discussion in this thread about this particular ligature:

 https://apple.stackexchange.com/questions/130638/what-are-these-characters-from-the-os-x-keyboard