
Bug in Unicode processing of a ligature ﬃ [U+FB03 : LATIN SMALL LIGATURE FFI] and others

June 30, 2020

Why do the Android Adobe PDF reader and Chrome support "U+FB03 : LATIN SMALL LIGATURE FFI" while Adobe Acrobat does not? PDF-XChange handles it correctly too. See "and all integers p and q with suﬃciently" (ﬃ is a single character here) in the first paragraph of https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/pimeas.pdf

Also, somehow it copies it as U+000E : <control> SHIFT OUT [SO]. Why? LaTeX source: https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimTeX/pimeas.tex
Also look here https://www.babelstone.co.uk/Unicode/whatisit.html

and here https://github.com/alif-type/libertinus/issues/143 (it has a nice compilation of all (?) ligatures).

P.S.

"Beta: Use Unicode UTF-8 for worldwide language support" or "Edit-->> Preferences-->> Language" do not fix the issue.
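Two small Python helpers using only the standard `unicodedata` module can make this concrete. `describe` lists the code points in a pasted string, similar to what the BabelStone page linked above does. `repair_ot1_ligatures` encodes one plausible explanation for the U+000E, which has not been verified against this PDF. If the document was typeset with LaTeX's default OT1 font encoding, the ffi ligature occupies slot 0x0E of the font, so a viewer that falls back to raw glyph codes would copy exactly U+000E. The function names and the mapping table are illustrative assumptions, not Acrobat behaviour or a documented fix.

```python
import unicodedata

# Hypothesis, NOT verified against pimeas.pdf: the PDF uses TeX's default
# OT1-encoded Computer Modern fonts, where the f-ligatures sit in slots
# 0x0B-0x0F. A viewer that copies raw glyph codes (no usable ToUnicode
# mapping) would then put U+000B..U+000F on the clipboard.
OT1_LIGATURE_SLOTS = {
    "\x0b": "\ufb00",  # ff
    "\x0c": "\ufb01",  # fi
    "\x0d": "\ufb02",  # fl
    "\x0e": "\ufb03",  # ffi
    "\x0f": "\ufb04",  # ffl
}


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' line per code point of `text`."""
    lines = []
    for ch in text:
        # Control characters (category Cc), e.g. U+000E, have no Unicode
        # name in unicodedata, so supply a fallback instead of raising.
        fallback = "<control>" if unicodedata.category(ch) == "Cc" else "<unnamed>"
        lines.append(f"U+{ord(ch):04X} {unicodedata.name(ch, fallback)}")
    return lines


def repair_ot1_ligatures(copied: str) -> str:
    """Map stray OT1 ligature slot codes back to Unicode ligatures.

    Only apply this to text known to come from such a PDF: U+000B and
    U+000C are also the vertical tab and form feed characters.
    """
    return copied.translate(str.maketrans(OT1_LIGATURE_SLOTS))


if __name__ == "__main__":
    pasted = "su\x0eciently"  # what Acrobat reportedly copied
    print(describe(pasted)[2])  # U+000E <control>
    print(describe(repair_ot1_ligatures(pasted))[2])  # U+FB03 LATIN SMALL LIGATURE FFI
```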


ZBalling (Author)

This is now a bigger issue: since Edge added Adobe-powered PDF support (edge://flags, New PDF Viewer), the issue shows up there too.

ls_rbls (Community Expert)

Can you confirm whether this issue could be related to how Unicode is supported by the operating system where Acrobat is installed?

For example, have you been able to test whether your version of Adobe Acrobat Pro DC behaves the same way on a computer running macOS Catalina (or an older version), MS Windows 8 and/or MS Windows 10?

Since you mentioned Android, it may also be worth looking at the operating system the reader is running on.

Just recently, for example, the June 2020 update addressed an issue affecting Acrobat on MS Windows, where users had reported in the forums that the Weblink plug-in was not encoding/decoding URLs properly.

This is not necessarily related to your inquiry. But since UTF-8 encoded URLs were being malformed in the first place, it made sense to me to ask, because that update only fixed the problem in Acrobat Pro DC for Windows, not macOS.
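For reference, this is what correct UTF-8 handling of a URL component looks like with Python's standard `urllib.parse`. It does not reproduce the Weblink plug-in bug. It only shows the expected encode/decode round trip, plus the garbled result you get when the decoding side assumes a different encoding (Latin-1 here, chosen purely for illustration).

```python
from urllib.parse import quote, unquote

# A URL path segment containing a non-ASCII character (the ffi ligature).
segment = "su\ufb03ciently"

# Correct behaviour: encode the text as UTF-8 bytes, then percent-escape
# every byte outside the unreserved ASCII set.
encoded = quote(segment)  # quote() uses encoding="utf-8" by default
print(encoded)            # su%EF%AC%83ciently

# Decoding with the same encoding recovers the original string...
assert unquote(encoded) == segment

# ...while decoding the same bytes as Latin-1 yields mojibake.
garbled = unquote(encoded, encoding="latin-1")
assert garbled != segment
```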

Meanwhile, some other Acrobat users with older versions of the product, such as Acrobat Pro X, Acrobat Pro XI and Acrobat DC 2017, have reported not experiencing the URL issue.

Have you been able to test, or ask friends and/or other users, whether the LATIN SMALL LIGATURE FFI issue manifests consistently across all versions of Acrobat?

ZBalling (Author)

Indeed, Android supports ligatures much better than the current version of Windows (1909; I have not tested 2004 yet). In particular, it treats Unicode ligatures as one symbol and as multiple symbols at the same time. So when you press Backspace, it deletes the ﬃ ligature and recreates ff (not a ligature). This is how it is supposed to work, so that search still works on multi-code-point Unicode and finds the letters inside ligatures.
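A minimal sketch of the two behaviours described above, assuming Unicode compatibility normalization (NFKC) as the mechanism. NFKC is one standard way to make U+FB03 compare equal to "ffi", but this is not necessarily what Android or Acrobat does internally. The function names are hypothetical.

```python
import unicodedata


def _fold(s: str) -> str:
    # NFKC replaces compatibility characters such as U+FB03 with their
    # plain-letter equivalents ("ffi"). It also folds other things
    # (full-width forms, superscripts), which a real search may not want.
    return unicodedata.normalize("NFKC", s)


def contains_ignoring_ligatures(haystack: str, needle: str) -> bool:
    """Substring search that treats 'ﬃ' and 'ffi' as equal."""
    return _fold(needle) in _fold(haystack)


def backspace(text: str) -> str:
    """Delete one letter at the end, splitting a trailing ligature first.

    'suﬃ' becomes 'suff': the ligature is expanded to 'ffi' and only
    the final 'i' is removed. Ordinary characters are simply deleted.
    """
    if not text:
        return text
    expanded = _fold(text[-1])
    return text[:-1] + expanded[:-1]


if __name__ == "__main__":
    line = "and all integers p and q with su\ufb03ciently"
    print(contains_ignoring_ligatures(line, "sufficiently"))  # True
    print(backspace("su\ufb03"))  # suff
```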

Obviously, this has nothing to do with URL processing, which is a complex beast as well. Again, it is very dangerous that Acrobat processes Unicode incorrectly. I have no friends to test it with, and I only use the latest Acrobat DC. I have macOS Catalina, but I only use Windows 10 on my MacBook, so sorry, but you will have to test it yourself.

I will also point out that it is crazy that Acrobat for Android uses a different codebase from Acrobat for Catalina (64-bit, hehe, so different) and Windows 10.

ZBalling (Author)

I did a little more digging while helping another user with an OCR issue, and I noticed that the file you shared is mainly based on scanned images.

So I opened Acrobat, used the "Scan & OCR" tool to perform text recognition on this file, and set the output to "Editable Text & Images". An error message said "Acrobat could not perform recognition because: This page contains renderable text".

Then I noticed that if you need to copy the "ffi" part of the word "sufficiently" in that document, selecting the word and right-clicking it brings up a context menu with two copy options:

Copy
Copy With Formatting

Copying the selected text with plain "Copy" doesn't work because of the renderable text that the producing software laid out on top of the scanned image layer.

Using "Copy With Formatting" instead copies the content to the clipboard as a text string, which can then be pasted into any other program or document as text (not as a ligature).

Also, opening the Edit PDF tool, or right-clicking the document and selecting "Edit Text" or "Edit Text & Images", lets you copy that ligature with no problem: it is recognized as a single character and can be pasted as-is into other documents.

So the Unicode recognition is working.

Having noticed this, I don't think there is really a bug or problem with Unicode, since the issue is related to renderable text over scanned images. The copy method described above does the trick.

Any thoughts on this?

Actually, it looks like Microsoft Word only supports ligatures in OpenType fonts (not TrueType fonts). So Georgia and Bookman Old Style are not automatically ligatured. You can check this in Word via right-click -> Font -> Advanced -> OpenType Features (Ligatures). Still, it works if you copy U+FB03 (ﬃ) into Word, though Word will then render it in a non-Georgia font. That is not obvious, because the font box will still say Georgia; indeed, if future versions of the font files include ligature definitions, or Word starts supporting TrueType Collections, it will start using the Georgia font... But then again, maybe it is using Georgia after all. There are rules that can form ligatures without the font supporting them. Who knows.
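The last point, that text can be ligated by substitution rules rather than by the font, can be illustrated with a naive greedy replacer that swaps letter sequences for the Unicode ligature characters. This is a toy sketch, not how Word or OpenType shaping works: real shapers apply the font's GSUB `liga` feature to glyphs, whereas this operates on code points using a hypothetical rule table, `LIGATURE_RULES`.

```python
# Hypothetical rule table: letter sequences -> Unicode ligature characters.
# Longer sequences come first so "ffi" wins over "ff" followed by "i".
LIGATURE_RULES = [
    ("ffi", "\ufb03"),
    ("ffl", "\ufb04"),
    ("ff", "\ufb00"),
    ("fi", "\ufb01"),
    ("fl", "\ufb02"),
]


def apply_ligatures(text: str) -> str:
    """Greedily replace f-letter sequences with ligature code points."""
    out = []
    i = 0
    while i < len(text):
        for seq, lig in LIGATURE_RULES:
            if text.startswith(seq, i):
                out.append(lig)
                i += len(seq)
                break
        else:
            # No rule matched at this position: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_ligatures("sufficiently") == "su\ufb03ciently")  # True
```

Because the output uses Unicode compatibility characters, NFKC normalization turns it back into plain letters, which is what lets search keep working on ligated text.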
