Bug in Unicode processing of a ligature ﬃ [U+FB03 : LATIN SMALL LIGATURE FFI] and others

June 30, 2020

Why do Adobe PDF reader for Android and Chrome support "U+FB03 : LATIN SMALL LIGATURE FFI" while Adobe Acrobat does not? The same goes for PDF-XChange, which works well. See "and all integers p and q with suﬃciently" (ﬃ is a single symbol here) in the first paragraph of https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/pimeas.pdf

Also, Acrobat SOMEHOW copies the ligature as U+000E : <control> SHIFT OUT [SO]. Why? LaTeX source: https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimTeX/pimeas.tex
Also look here https://www.babelstone.co.uk/Unicode/whatisit.html

and here https://github.com/alif-type/libertinus/issues/143 (it has a nice compilation of all (?) ligatures).

P.S.

Neither the "Beta: Use Unicode UTF-8 for worldwide language support" option nor changing "Edit-->> Preferences-->> Language" fixes the issue.
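
One way to see exactly what a viewer puts on the clipboard is to list the code points of the pasted text. The sketch below is only an illustration: `describe`, `repair`, and `OT1_LIGATURES` are hypothetical names. It also rests on an assumption that is not confirmed anywhere in this thread: that the PDF's font uses TeX's OT1 layout, where the ffi glyph sits in slot 0x0E, and that the viewer falls back to the raw slot number when it finds no Unicode mapping for the glyph.

```python
import unicodedata

# Ligature slots in TeX's OT1 font encoding (slot -> Unicode ligature).
# Assumption: the PDF font uses this layout and the viewer copies the raw
# slot number because it has no Unicode mapping for the glyph.
OT1_LIGATURES = {
    0x0B: "\uFB00",  # ff
    0x0C: "\uFB01",  # fi
    0x0D: "\uFB02",  # fl
    0x0E: "\uFB03",  # ffi
    0x0F: "\uFB04",  # ffl
}


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"
        for ch in text
    ]


def repair(text):
    """Hypothetical fix-up: swap OT1 slot codes for Unicode ligatures."""
    return "".join(OT1_LIGATURES.get(ord(ch), ch) for ch in text)


pasted = "su\x0eciently"  # what a faulty copy might look like
print(describe(pasted)[2])          # the control character
print(describe(repair(pasted))[2])  # the ligature after the fix-up
```

If the pasted text really contains U+000E at the position of the ligature, that would be consistent with the slot-number fallback described above, but it does not prove it.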

This topic has been closed for replies.

ZBalling (Author)

Inspiring

This issue is now worse: after Edge added PDF support from Adobe, the same problem appears there too (edge://flags, New PDF Viewer).

ls_rbls

Community Expert

Can you confirm if this is also an issue that could be related to how Unicode is supported by the operating system where Acrobat is installed?

For example, have you been able to test whether your version of Adobe Acrobat Pro DC behaves the same way on a computer running macOS Catalina (or an older version), MS Windows 8, and/or MS Windows 10?

Since you mentioned Android, it may also be worth looking at the operating system Acrobat is running on.

For example, the June 2020 update recently addressed an issue specific to Acrobat running on MS Windows: users had reported on the forums that the Weblink plug-in was not encoding/decoding URLs properly.

This is not necessarily related to your inquiry. However, because UTF-8 encoded URLs were malformed to begin with, and the last update fixed the problem only in Acrobat Pro DC for Windows, not macOS, it made some sense to me to ask this question.

Meanwhile, some other Acrobat users who have older versions of the product, like Acrobat Pro X, Acrobat Pro XI, Acrobat DC 2017, have reported back as not experiencing the URL issue.

Have you been able to test, or ask friends and/or other users, whether the LATIN SMALL LIGATURE FFI issue manifests consistently across all versions of their Acrobat?

ZBalling (Author)

Inspiring

Indeed, Android supports ligatures much better than the current version of Windows (1909; I did not test 2004 yet) does. In particular, it recognises Unicode ligatures as simultaneously one symbol and multiple symbols. So when you press backspace it will delete ﬃ (ligature) and recreate ff (not a ligature). This is how it is supposed to work, so that search still works on multi-code-point Unicode and finds letters inside ligatures.
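
The "one symbol and multiple symbols" behaviour can be approximated with Unicode compatibility normalization, which maps U+FB03 to the three letters f, f, i. The following is a minimal sketch using Python's standard `unicodedata` module; `fold_ligatures` and `contains` are hypothetical helper names, and nothing here claims to be how Android or Acrobat actually implement search.

```python
import unicodedata


def fold_ligatures(text):
    """Apply NFKC normalization, which expands compatibility ligatures."""
    return unicodedata.normalize("NFKC", text)


def contains(haystack, needle):
    """Search that treats a ligature and its separate letters as equal."""
    return fold_ligatures(needle) in fold_ligatures(haystack)


word = "su\uFB03ciently"  # "sufficiently" spelled with the ffi ligature
print(len(word), len(fold_ligatures(word)))
print(contains(word, "ffi"), contains(word, "fic"))
```

Note that NFKC also folds other compatibility characters (for example full-width letters), so a real search feature might want a narrower mapping than this sketch uses.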

Obviously, this has nothing to do with URL processing, which is a complex beast as well. Again, it is very dangerous that Acrobat processes Unicode incorrectly. I have no friends to test it with and I only use the latest Acrobat DC. I have macOS Catalina, but I only use Windows 10 on my MacBook, so sorry, but you will have to test it yourself.

I will ALSO POINT OUT that it is crazy that Acrobat for Android uses a codebase different from Acrobat for Catalina (64 bit, hehe, so different) and Windows 10.

ls_rbls

Community Expert

So, I used an online Unicode converter and I noticed that when you convert this ﬃ Unicode text character (LATIN SMALL LIGATURE FFI) you get ufb03, which is its UTF-16 encoding, not UTF-8.

The UTF-8 encoding, on the other hand, returns efac83 and this ï¬ƒ as UTF-8 text (the three UTF-8 bytes displayed as separate single-byte characters).
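
The converter output can be reproduced in a few lines. Both values are encodings of the same code point, U+FB03; the sketch below assumes the converter showed the UTF-8 bytes through the Windows-1252 code page, which is a guess based on the ï¬ƒ text it displayed.

```python
FFI = "\uFB03"  # LATIN SMALL LIGATURE FFI

# The same code point in two encodings.
utf16_hex = FFI.encode("utf-16-be").hex()
utf8_hex = FFI.encode("utf-8").hex()

# Reading the three UTF-8 bytes as Windows-1252 yields three characters.
mojibake = FFI.encode("utf-8").decode("cp1252")

# The damage is reversible as long as no byte was lost along the way.
restored = mojibake.encode("cp1252").decode("utf-8")

print(utf16_hex, utf8_hex, mojibake, restored == FFI)
```

Seen this way, the three-character text is a display artifact of reading UTF-8 bytes with a single-byte code page; the underlying character is the same in both encodings.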

This is weird, because the UTF-8 specification should be backward compatible, and it should also be recognized with both Free Type and Open Type fonts.

My guess is that the encoding/decoding problem happens when UTF-8 is used and for some reason it becomes unmappable.

In my humble opinion, this may explain why Acrobat Reader on Android (and other platforms) seems to work OK: they're not using UTF-8. They're using UTF-16 instead.

To work around this in MS Windows try this:

Go to Control Panel\All Control Panel Items\Region.

Under the "Formats" tab, select "Match Windows display language (recommended)" instead of "English (United States)".

Then click on the Administrative tab, and then click on the "Change system locale..." button.

A popup will open next.

In that Region Settings popup, uncheck the box that says "Beta: Use Unicode UTF-8 for worldwide language support", then click OK and restart.

See slide:

Usually you change the "Change system locale..." setting if your non-Unicode programs are set to a different language that doesn't support Unicode, but Adobe Acrobat supports Unicode in many languages.
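
To check whether the UTF-8 beta option from the steps above is currently in effect, you can query the active ANSI code page. This is a rough sketch, not an official procedure: `active_code_page` and `utf8_beta_enabled` are hypothetical helper names, the Win32 function `GetACP` is called through `ctypes`, and the sketch assumes that the beta option makes `GetACP` report code page 65001 (UTF-8).

```python
import sys

UTF8_CODE_PAGE = 65001  # Windows code page identifier for UTF-8


def active_code_page():
    """Return the active ANSI code page on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here so the sketch also loads on other systems
    return int(ctypes.windll.kernel32.GetACP())


def utf8_beta_enabled():
    """True if the system ANSI code page is UTF-8 (assumed beta option on)."""
    return active_code_page() == UTF8_CODE_PAGE


print(active_code_page(), utf8_beta_enabled())
```

On a typical English (United States) installation with the option unchecked, the reported code page would normally be 1252.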

For this particular reason, I would also suggest opening Acrobat and, in Edit-->> Preferences-->> Language, selecting "Same as the operating system" instead of setting the application to English.

After these changes are done you will be able to copy the ﬃ ligature and paste it into MS Word, Notepad, or even Acrobat without it being copied as U+000E (SHIFT OUT). It will (or should) be recognized as a single character symbol too.

There is an interesting discussion in this thread about this particular ligature:

https://apple.stackexchange.com/questions/130638/what-are-these-characters-from-the-os-x-keyboard
