Hi all,
I have been trying to prepare a document for future text-to-speech use by correctly differentiating between English and Hebrew words within the text.
Hebrew, however, is not available as a language in the tag settings, despite being available as an Adobe document language. There is also an "english with hebrew support" option for Acrobat as a whole, but I did not see it make any difference. What should my approach be here?
Thank you
Hi @Lucas30659270pj31 ,
Notice: this is a long, detailed answer!
First, there are two ways to designate language in a PDF. One is global and sets the primary language for the entire PDF file; the other applies to selected paragraphs or words that use a secondary language.
Try this:
1. Set the primary language in File / Document Properties, on the Advanced tab. For this example, let's select English as the global language.
Note that Hebrew and English are pre-specified in the drop-down menu.
2. Set the secondary language for an entire paragraph of text, such as <P>, <H2>, etc.
Almost as easy to do!
Select the tag <P>, right-click, and select Properties.
From the drop-down menu for Language, choose the appropriate language. Note that Hebrew is not pre-specified.
For any language that is not listed in the drop-down menu, you can't type in the language's name. Instead, you'll need to enter its code from ISO 639, the standard set of language codes. See https://www.loc.gov/standards/iso639-2/php/code_list.php for a list of the ISO 639 language codes. When possible, use the two-letter code, not the three-letter code (for Hebrew, that is he rather than heb). The codes are case-sensitive, so follow the chart carefully.
3. Set the secondary language on one or more words within a paragraph, not the entire paragraph. In this case, the Language attribute is placed on a <Span> tag that surrounds the words.
Expand the <P> tag. Sometimes there will already be a <Span> tag you can use, but in this example, we'll make one to fit our text.
Step 1: With the selection cursor, highlight the text that you want to be in the secondary language.
Step 2: Right-click the yellow content container (not the tag).
Step 3: Choose Create Tag From Selection. This function is a bit buggy in Acrobat, and you might not get a separate <Span> tag as intended.
Step 4: Right-click the <Span> tag and select the language as detailed above.
4. Since the bug in method #3 above might not have been corrected by Adobe, use the following method to manually add a <Span> tag.
Step 1: From the Order Panel, open the Reading Order tools panel.
Step 2: With the crosshairs cursor, select all of the letters (and punctuation) to be in the secondary language. You might need to drag-marquee one or more times to get all of the characters.
Step 3: Once selected, designate them to be Text Paragraph. We'll change this later to <Span> and drag it into the right spot in the tag tree.
Step 4: Switch back to the Tag Tree and the new <P> tag will appear as a separate paragraph there.
Step 5: Right-Click, select Properties, and change it to a <Span> tag from the top menu.
Step 6: Then, drag/drop the new <Span> tag into the correct location in the tag tree, which is nested inside its <P> and in the right reading order with the remaining text in the paragraph.
Step 7: And, of course, change the language on the <Span> tag as described above.
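For the language value entered in methods 2 through 4, here is a minimal Python sketch of the "prefer the two-letter code" rule. The function name and the lookup table are hypothetical, and the table is only a small illustrative subset of ISO 639, not the full list. It assumes lowercase is the form the chart uses.

```python
# Hypothetical helper: a tiny, illustrative subset of ISO 639 codes.
# Three-letter (ISO 639-2) code -> two-letter (ISO 639-1) code.
ISO_639_2_TO_1 = {
    "eng": "en",
    "heb": "he",
    "ara": "ar",
    "spa": "es",
    "rus": "ru",
}


def preferred_code(code: str) -> str:
    """Return the two-letter code when the table knows one.

    Language names such as "Hebrew" are rejected, because the
    Language field expects a code, not a name.
    """
    normalized = code.strip().lower()
    if not (normalized.isascii() and normalized.isalpha()):
        raise ValueError(f"not an ISO 639 style code: {code!r}")
    if len(normalized) == 2:
        return normalized
    if len(normalized) == 3:
        # Fall back to the three-letter code if no two-letter code is listed.
        return ISO_639_2_TO_1.get(normalized, normalized)
    raise ValueError(f"not an ISO 639 style code: {code!r}")
```

Under these assumptions, preferred_code("heb") gives "he", and a code missing from the table is passed through unchanged.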
Phew! What a nightmare of steps!
I've attached a sample PDF of the completed tags.
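For readers without the attachment, here is a rough Python model of how the language settings combine once the tags are done. It is a simplified sketch, not the contents of the sample PDF and not Acrobat's internals: the Tag class and function name are made up for illustration. It shows a tag's own language winning over the one it inherits, with the document's primary language as the fallback.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Tag:
    """Simplified stand-in for a tag in the tag tree (hypothetical model)."""
    name: str                       # e.g. "P" or "Span"
    text: str = ""
    lang: Optional[str] = None      # None means "inherit from the parent"
    children: List["Tag"] = field(default_factory=list)


def effective_languages(tag: Tag, inherited: str) -> Iterator[Tuple[str, str]]:
    """Yield (text, language) pairs in reading order."""
    lang = tag.lang or inherited    # the tag's own setting overrides the inherited one
    if tag.text:
        yield tag.text, lang
    for child in tag.children:
        yield from effective_languages(child, lang)


DOCUMENT_LANG = "en"                # primary language from Document Properties

paragraph = Tag("P", children=[
    Tag("Span", text="The Hebrew word for peace is "),
    Tag("Span", text="שלום", lang="he"),   # secondary language on the <Span>
    Tag("Span", text="."),
])

pairs = list(effective_languages(paragraph, DOCUMENT_LANG))
```

In this model, only the middle <Span> resolves to "he". The text around it falls back to "en" because neither it nor the <P> sets a language of its own.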
All of this would be moot if both Adobe and Microsoft would fix the *&^%$#@! bug in the Word-to-PDF export.
In theory, you can set the language for both paragraph and character formatting styles in Word, and when exported to PDF, all of the language attributes are done.
But this bug has gone unfixed at both companies for a really long time. The language tags are not carried over into the exported PDF, and we end up with the manual workarounds in Acrobat (which has its own bugs) that are detailed above.
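Since the language has to be re-applied by hand, it may help to list the Hebrew runs in the source text before opening Acrobat. The sketch below is only a rough aid with made-up function names. It treats a word as Hebrew if it contains any character from the basic Hebrew Unicode block (U+0590 to U+05FF), ignores presentation forms, and does nothing clever with punctuation.

```python
from itertools import groupby
from typing import List, Tuple


def is_hebrew(ch: str) -> bool:
    """True for characters in the basic Hebrew Unicode block."""
    return "\u0590" <= ch <= "\u05FF"


def language_runs(text: str, primary: str = "en",
                  secondary: str = "he") -> List[Tuple[str, str]]:
    """Group consecutive words into (language, words) runs."""
    words = text.split()
    runs = []
    for has_hebrew, group in groupby(
            words, key=lambda w: any(is_hebrew(c) for c in w)):
        runs.append((secondary if has_hebrew else primary, " ".join(group)))
    return runs


runs = language_runs("The greeting שלום עליכם means peace be upon you")
```

Each run labelled "he" is a candidate for its own <Span> with the Language attribute set. Everything else can inherit the primary language.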
Sigh. I dream of the day when some other company from somewhere will give us tools for full accessibility and we can then ditch our Microsoft and Adobe programs, 2 companies who don't seem to give a damn about accessibility rights of ordinary human beings.
Hope this gets you out of the bind.
@Bevi Chagnon - PubCom.com ? Do you have an idea about this?
Thank you - that is a brilliant answer.
Do we have any clue if the "english with hebrew support" language option actually does anything? It seems to make no difference whatsoever.
Thanks,
Lucas
Having been a beta tester for both Adobe and Microsoft for 35 years, my crystal ball broke a long long time ago. I no longer depend upon either company for my work, and I don't ever believe anything from any employee of these 2 companies.
The engineers are nice people, however. They're just not in charge of what they do. The marketing departments give the engineers their marching orders.