Not all language tagging makes it from Word to Acrobat when using Save as PDF
I have found an issue, and I do not know whether it lies in Word, in the Save as PDF routine, or in Acrobat. I am reporting it here in case it is an Acrobat issue, and I will also report it to Microsoft:
Some language attributes that can be set in HTML and saved as a Word document do not make it over to Adobe Acrobat as part of the Save as PDF process while others do. For the ones that make it over, the way they are represented in the accessibility tag is inconsistent from language to language.
This is a problem because it results in documents that violate WCAG 2.1 success criteria 3.1.1 Language of Page (Level A) and 3.1.2 Language of Parts (Level AA).
If Word sends a language code to Acrobat through Save as PDF, I expect Acrobat to maintain the code in the tags without any alteration, even if the particular language code is not supported in Acrobat.
Example one: code originating from HTML:
Given the following HTML code:
<!DOCTYPE html>
<!-- saved from url=(0022)https://www.sfmta.com/ -->
<html lang="en" dir="ltr" prefix="og: https://ogp.me/ns#" class="sfmta js tablesaw-enhanced" data-once="tablesaw-create"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>
<body><div><p><span>☎</span> 311 (Outside SF 415.701.2311; TTY 415.701.2323) Free language assistance /
<span lang="zh-hant" dir="ltr">免費語言協助</span><span> /
</span><span lang="es" dir="ltr">Ayuda gratis con el idioma</span><span> / </span>
<span lang="ru" dir="ltr">Бесплатная помощь переводчиков</span><span> / </span>
<span lang="vi" dir="ltr">Trợ giúp Thông dịch Miễn phí</span><span> / </span>
<span lang="fr" dir="ltr">Assistance linguistique gratuite</span><span> / </span>
<span lang="ja" dir="ltr">無料の言語支援</span><span> / </span>
<span lang="ko" dir="ltr">무료 언어 지원</span><span> / </span>
<span lang="tl" dir="ltr">Libreng tulong para sa wikang Filipino</span>
</p></div>
</body></html>
If I open the document in Word 365 and Save as PDF, when I open the document in Acrobat, the content retains or fails to retain the following language attributes in the accessibility tags:
- The document language is not maintained (bad)
- zh-hant is maintained as ZH-HANT (good)
- es is maintained as Spanish (good, but inconsistent with ZH-HANT, so I'm not sure which is correct, or if both are)
- ru is maintained as RU (good)
- vi is maintained as VI (good)
- fr is maintained as FR-FR (probably good, although it assumes that I am using France French and not Canadian French; that is probably true but it's still an assumption)
- ja is maintained as Japanese (good, but inconsistent)
- ko is maintained as Korean (good, but inconsistent)
- tl is not maintained (bad) (It is true that TL is Tagalog and not Filipino; we have a policy to call it Filipino, but WebAIM advises that 3-letter language codes such as "fil" are problematic for screen readers)
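To make the comparison repeatable, here is a rough sketch that lists every lang attribute in the source HTML, so each one can be checked against the tags in Acrobat. It uses only Python's standard html.parser module. The names LangCollector, collect_langs and SAMPLE are my own for illustration, and SAMPLE is a trimmed copy of the markup above with the unrelated attributes removed.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect every lang attribute in document order."""

    def __init__(self):
        super().__init__()
        self.langs = []  # list of (tag name, lang value) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "lang" and value:
                self.langs.append((tag, value))


# Trimmed copy of the sample markup from this post (illustration only).
SAMPLE = """<html lang="en" dir="ltr"><body><p>Free language assistance /
<span lang="zh-hant" dir="ltr">免費語言協助</span> /
<span lang="es" dir="ltr">Ayuda gratis con el idioma</span> /
<span lang="ru" dir="ltr">Бесплатная помощь переводчиков</span> /
<span lang="vi" dir="ltr">Trợ giúp Thông dịch Miễn phí</span> /
<span lang="fr" dir="ltr">Assistance linguistique gratuite</span> /
<span lang="ja" dir="ltr">無料の言語支援</span> /
<span lang="ko" dir="ltr">무료 언어 지원</span> /
<span lang="tl" dir="ltr">Libreng tulong para sa wikang Filipino</span>
</p></body></html>"""


def collect_langs(html):
    """Return the (tag, lang) pairs found in the given HTML string."""
    parser = LangCollector()
    parser.feed(html)
    parser.close()
    return parser.langs


if __name__ == "__main__":
    for tag, lang in collect_langs(SAMPLE):
        print(f"<{tag}> lang={lang}")
```

The first entry is the document language from the html element; the rest are the language-of-parts spans. Each one should have a counterpart in the PDF tags.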
Example two: content originating as a Word document:
Given the following plain text, without any markup for language:
☎ 311 (Outside SF 415.701.2311; TTY 415.701.2323) Free language assistance / 免費語言協助 / Ayuda gratis con el idioma / Бесплатная помощь переводчиков / Trợ giúp Thông dịch Miễn phí / Assistance linguistique gratuite / 無料の言語支援 / 무료 언어 지원 / Libreng tulong para sa wikang Filipino
If I paste that text into a blank Word document and then do the following steps:
1. Review, language, language, set proofing language, current document, English (United States), OK
2. Select "免費語言協助"
3. Review, language, language, set proofing language, selected text, Chinese (Traditional, Taiwan), OK
4. Select "Ayuda gratis con el idioma"
5. Review, language, language, set proofing language, selected text, Spanish (United States), OK
6. Select "Бесплатная помощь переводчиков"
7. Review, language, language, set proofing language, selected text, Russian, OK
8. Select "Trợ giúp Thông dịch Miễn phí"
9. Review, language, language, set proofing language, selected text, Vietnamese, OK
10. Select "Assistance linguistique gratuite"
11. Review, language, language, set proofing language, selected text, French (France), OK
12. Select "無料の言語支援"
13. Review, language, language, set proofing language, selected text, Japanese, OK
14. Select "무료 언어 지원"
15. Review, language, language, set proofing language, selected text, Korean, OK
16. Select "Libreng tulong para sa wikang Filipino"
17. Review, language, language, set proofing language, selected text, Filipino, OK
18. Save
19. Save as PDF
20. Check the tagging in Acrobat Pro
- The enclosing paragraph tag is shown as being in English (US) (sort of good, but the enclosing document tag does not show a language, so this is technically a violation of WCAG 3.1.1 language of page)
- Chinese (Traditional, Taiwan) is maintained as ZH-HANT (good)
- Spanish (United States) is lost (bad)
- Russian is maintained as RU (good)
- Vietnamese is maintained as VI (good)
- French (France) is maintained as FR-FR (good)
- Japanese is maintained as Japanese (good, but inconsistent)
- Korean is maintained as Korean (good, but inconsistent)
- Filipino is lost (bad)
21. If I re-edit the Word document, change the proofing language for the Spanish text from Spanish (United States) to Spanish (Mexico), and save as PDF again:
- Spanish (Mexico) is maintained as ES-MX (good)
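The results from Example two can be grouped by how the language shows up in Acrobat: as a code, as a language name, or not at all. The sketch below does that with the values reported in this post. The names CODE_SHAPE, OBSERVED, classify and summarize are hypothetical, and the regular expression is only a simplified shape check that I am assuming is good enough to tell "RU" from "Japanese"; it is not the full BCP 47 grammar.

```python
import re

# Simplified shape check (an assumption, not the full BCP 47 grammar):
# a 2-3 letter primary subtag followed by optional hyphenated subtags.
CODE_SHAPE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")

# Word proofing language -> what the Acrobat tag showed (None = lost).
# Values are the observations reported in this post.
OBSERVED = {
    "Chinese (Traditional, Taiwan)": "ZH-HANT",
    "Spanish (United States)": None,
    "Spanish (Mexico)": "ES-MX",
    "Russian": "RU",
    "Vietnamese": "VI",
    "French (France)": "FR-FR",
    "Japanese": "Japanese",
    "Korean": "Korean",
    "Filipino": None,
}


def classify(observed):
    """Return 'lost', 'code' or 'name' for one observed tag value."""
    if observed is None:
        return "lost"
    if CODE_SHAPE.fullmatch(observed):
        return "code"
    return "name"


def summarize(observations):
    """Group the source languages by how they appeared in Acrobat."""
    groups = {"code": [], "name": [], "lost": []}
    for source, observed in observations.items():
        groups[classify(observed)].append(source)
    return groups


if __name__ == "__main__":
    for kind, sources in summarize(OBSERVED).items():
        print(f"{kind}: {', '.join(sources)}")
```

The "name" and "lost" groups are the ones I would like explained: Japanese and Korean are shown differently from the other languages, and Spanish (United States) and Filipino do not survive at all.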
