Word 365 to Acrobat: PDFMaker generates wrong tags

Explorer ,
Aug 31, 2020

I used the Save As Adobe PDF button in Word 365 to convert a file to Acrobat. When I open the PDF in Acrobat DC Pro, the tags are incorrect.

 

The first item in the TOC is tagged <TOCI> in the Tags Pane, but the rest have heading tags (e.g., H1, H2, H3). In the Page View, the tags say <Reference>. (The items with heading tags are not inside the <TOC> tag.) 

 

Random paragraphs are tagged as <TOCI> in the Tags Pane when they should be <P>; a few headings also show <TOCI>. However, in the Page View, they show as <P> (or the correct heading tag).

 

Some figure captions are tagged <H2> in the Tags Pane when they normally come across as <P>. In the Page View, they show <P>. 

 

Figures come across tagged as <P> in both the Tags Pane and the Page View. When I try to change them to <Figure> (either by selecting and clicking Figure in the TURO, or by changing directly in the Tags Pane), Acrobat DC won't change them. 

 

What's going on, and how do I correct it? 

TOPICS
Create PDFs, Edit and convert PDFs, Standards and accessibility

Adobe Community Professional ,
Aug 31, 2020

When you save as PDF, are you using the function built into Word, or are you using Acrobat's PDFMaker (the Acrobat ribbon in Word)? If it's Word, then unfortunately this is not the right place to get answers; you will have to talk to Microsoft. If it's Acrobat, then we need to dig further to see what's going on.

Explorer ,
Aug 31, 2020

Fair question. I tried it both ways, with the same result. 

 

Guy

 

Adobe Community Professional ,
Aug 31, 2020

That's interesting, and it points to a problem in Word rather than in the PDF generation. The two PDF generators you've tried don't share any code, so if both produce the same tag tree, the problem lies in the information the Word document provides when the file is exported to PDF. Unless somebody here can help you with a Word problem, I would suggest asking this question in a forum about MS Word. I am not familiar enough with all the details of tagging in Word; all I know is that the outline level is used to determine which tags to use. My more in-depth tagging experience is limited to Adobe InDesign.

As far as general troubleshooting goes, I would check whether this happens with all documents or just with one or a small number. If not all documents are affected, I would look into recreating at least part of the document from scratch, including the paragraph styles, to see whether something in the document got messed up.

Adobe Community Professional ,
Aug 31, 2020

Ah, I see you now have Bevi's attention. She is the expert when it comes to accessibility.

Adobe Community Professional ,
Aug 31, 2020

Well, I had to "like" that comment, Karl! (Who's no chump change, himself.)

 

Bevi Chagnon | Designer & Technologist for Accessible InDesign + PDFs | Books @ www.PubCom.com/books — NEW! Accessible InDesign + PDF

Adobe Community Professional ,
Aug 31, 2020

QUOTE: "I used the Save As Adobe PDF button in Word 365 to convert a file to Acrobat. When I open the PDF in Acrobat DC Pro, the tags are incorrect."

 

I'm assuming you're on a Windows computer, and that this command was under the File menu.

 

Can you try making a PDF using the Acrobat Ribbon?

First, check the Preferences in the ribbon and make sure your accessibility settings are correct.

[Screenshot: Export Preferences for Accessible tagged PDF from MS Word / Windows.]

 

Then, Create PDF.

[Screenshot: Acrobat PDF Maker ribbon in MS Word / Windows.]

 

Bevi Chagnon | Designer & Technologist for Accessible InDesign + PDFs | Books @ www.PubCom.com/books — NEW! Accessible InDesign + PDF

Adobe Community Professional ,
Aug 31, 2020

@inkedguy, after checking your export settings, drill down into your Word file and make sure the items below were done correctly. Given the hodgepodge of errors you're having, I suspect they're due to how your Word document was formatted.

 

One key rule: paragraph formatting styles must be applied to all of your text. The styles trigger the correct tags in the exported PDF.

 

Check these items in Word:

 

  1. Make sure the TOC was created with Word's TOC utility and not made by hand: References ribbon/tab | Table of Contents icon. For now, choose one of the defaults: Automatic Table 1 or Automatic Table 2.
  2. Check which paragraph formatting styles were applied to your headings (and other text, if you have time).
    Open the Styles Pane, and from it, Open the Styles Inspector. As you click inside each heading or paragraph of text, verify that the style Heading 1 was applied to the heading you want to be tagged <H1> in the PDF. Similar for the remaining headings, Heading 2 style = <H2>, Heading 3 style = <H3>, you get the pattern!
  3. You might need to clear out any residual formatting on those paragraphs in order to get the tags to come out correctly. If that's the case, select the paragraph of text, and click the Clear All formatting button from either the Styles Pane or the Styles Inspector. Then reapply the correct paragraph style to the paragraph.
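
For a long document, clicking through every paragraph in the Styles Inspector can be slow. A minimal Python sketch of the same check is below. It assumes the file is a standard .docx (a ZIP archive whose main text lives in word/document.xml) and it reports style IDs, which in an English copy of Word usually look like Heading1 rather than the display name Heading 1. The function name paragraph_styles is made up for this example, and the sketch only reads the style each paragraph points to; it does not detect residual direct formatting.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_styles(docx_file):
    """Return a list of (style_id, text) pairs, one per paragraph.

    docx_file may be a path or a file-like object. style_id is None
    when the paragraph has no explicit style (it then uses the default).
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    results = []
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # Join the text of every run in the paragraph
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        results.append((style_id, text))
    return results
```

To use it, loop over the result, for example `for style_id, text in paragraph_styles("report.docx"): print(style_id, text[:60])`, where report.docx is a placeholder file name. Headings whose style ID is None or an unexpected body style are the ones to recheck in Word.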

 

Notes: In order to generate a TOC with the correct tags <TOC> | <TOCI> plus the accessible links tags, you must:

  • Use the correct heading paragraph styles to format your document's headings,
  • Use Word's TOC utility to generate the TOC, and
  • Don't manually edit the TOC after it is created. It's a generated part of the file and you don't want to mess with it.
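
One way to double-check that a TOC was generated rather than typed by hand is to look for a TOC field inside the .docx. The Python sketch below rests on an assumption: that Word stores a generated TOC as a field whose instruction text begins with TOC, either in w:instrText runs or in a w:fldSimple element. The function name has_generated_toc is invented for this example. Field instructions can be split across several runs, so a False result is a hint to look closer, not proof that the TOC was hand-made.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Field instruction that starts with the keyword TOC (not TOCI, etc.)
TOC_FIELD = re.compile(r"\s*TOC\b")


def has_generated_toc(docx_file):
    """Return True if the document body contains a TOC field instruction."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    # Complex fields keep their instruction in w:instrText elements
    for instr in root.iter(f"{{{W_NS}}}instrText"):
        if TOC_FIELD.match(instr.text or ""):
            return True

    # Simple fields keep it in the w:instr attribute of w:fldSimple
    for simple in root.iter(f"{{{W_NS}}}fldSimple"):
        if TOC_FIELD.match(simple.get(f"{{{W_NS}}}instr", "")):
            return True

    return False
```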

 

In order to generate the correct heading tags in a PDF <H1>, <H2>, etc., you must use the corresponding Heading 1, Heading 2, etc. paragraph formatting styles to format the heading paragraphs. There are no exceptions.
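
The style-to-tag pattern described above can be written as a small lookup, shown here as an illustrative Python sketch. It only encodes the Heading N to <HN> rule from this post; the function name expected_pdf_tag is hypothetical, and the choice to accept both the display name (Heading 1) and the style ID form (Heading1) is an assumption. It returns None for anything that is not a built-in heading style, because this thread does not spell out the mapping for other styles.

```python
import re

# Matches "Heading 1" (display name) or "Heading1" (style ID), levels 1-6
HEADING_STYLE = re.compile(r"Heading ?([1-6])")


def expected_pdf_tag(style_name):
    """Return the heading tag (e.g. "H2") expected for a Word heading style.

    Returns None for styles that are not Heading 1 through Heading 6.
    """
    match = HEADING_STYLE.fullmatch(style_name.strip())
    if match:
        return f"H{match.group(1)}"
    return None
```

Comparing this expected tag with what the Tags Pane shows for a given heading is a quick way to spot paragraphs that were tagged differently from their Word style.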

 

Bevi Chagnon | Designer & Technologist for Accessible InDesign + PDFs | Books @ www.PubCom.com/books — NEW! Accessible InDesign + PDF

Explorer ,
Aug 31, 2020

Hello again, Bevi. 

 

Yes, it's a Windows computer (government issue). I ended up with the same result the second time around, using the Acrobat tab in the ribbon. 

 

I've been seeing the heading tags in the TOC all along (like for the past year, both on my old laptop and this new one). The random TOCI tags are new. Does it matter that it says <TOCI> in the Tags Pane, but shows <P> in the Page View? Which one is the 'real' tag? 

 

This is not the first file where I couldn't get the <Figure> tag to stick; some of the others came from other people, and I don't know how they generated their files. I just don't know why it happens. 

 

For the record, I am using styles in Word 365, with all heading styles applied correctly where needed. The TOC is generated, not hand-typed, using TOC styles.

 

Karl may be right that it's a Word problem, but I'm certainly all ears to hear what you think, too. 

 

Guy

Adobe Community Professional ,
Aug 31, 2020

Most likely there's something wrong with your Word file. Word itself is fairly stable and accurate (albeit not perfect, and you could have either an outdated version or a corrupted version). We've been govt consultants for decades and have found the biggest culprits are:

  • User error (lack of knowledge of how to correctly construct a Word document for accessibility).
  • Code crud or outdated formatting left in the document from previous versions of Word software. This crud prevents the PDF-export utility from interpreting the document and tagging it correctly. More later about this...
  • Outdated software, either Word or PDF Maker.

 

How about a test to help diagnose where the problem lies? With the file, or with Word, or even with PDF Maker?

 

If possible, download these test Word and matching PDFs from our students' resource website: https://www.pubcom.com/testfiles/ 

 

  1. Open the first base.docx file in your version of Word and export it to PDF.
    1. Did you get the correct heading tags?
  2. Then, add a TOC to it with Word's TOC utility, and export this new version to PDF.
    1. Did you get the correct TOC/TOCI tags?

 

QUOTE: "Does it matter that it says <TOCI> in the Tags Pane, but shows <P> in the Page View? Which one is the 'real' tag?"

Page View = Thumbnails/Pages pane, and I don't think that's what you mean.

Do you mean the Order panel (aka, architectural/construction order, or Z-order)?

 

Only the Tags Tree is required to meet PDF/UA-1 compliance, and its tags and reading order are primary.

 

The tags you see in the Arch/Const Order are usually not correct. It's an Acrobat bug, but since the Tags Tree supersedes everything for accessibility, it doesn't affect your compliance. I recommend that you change the options in the Order panel to show the numbered order rather than the tags.

 

Note, you do want to ensure that the Arch/Const Order has a decent reading order because many assistive technologies, as well as commonly used tech, use it rather than the Tags Tree. But this is not required for PDF/UA-1 compliance, just a really smart best practice to ensure your government documents don't leave anyone out of the loop. See our recent blog about this, The 4 Reading Orders in Accessible PDFs.

[Screenshot: Tags Tree from base Word.docx exported to PDF.]

 

[Screenshot: Order (Architectural/Construction Order) panel.]

 

Let us know what you find out with the test above.

 

Also open the PDF's Properties panel and see which version of PDF Maker your system used.

File | Properties | Description Tab | PDF Producer.

 

Bevi Chagnon | Designer & Technologist for Accessible InDesign + PDFs | Books @ www.PubCom.com/books — NEW! Accessible InDesign + PDF

Explorer ,
Sep 01, 2020

Good morning, Bevi.

 

As previously stated, the heading styles (Heading 1 through Heading 5) were used appropriately and the TOC was generated, not hand-typed. The TOC was Custom, not Automatic 1, so I deleted it and regenerated it using Automatic 1. Then I generated a new PDF. The new PDF still had the random TOCI tags, as well as <H2> and <H3> tags in the table of contents, rather than <TOCI> tags. 

 

This is where it gets interesting. I decided to generate the PDF again, but left out the TOC pages. This time, the PDF didn't have the random TOCI tags in the text! However, it did still have the problem where a figure has a <P> tag on it and won't accept being changed to <Figure>; it just stays <P>.

 

RE: tags in the tags pane vs. tags in the page view... I was sure I was calling it the wrong thing. I meant the actual view of the page, when you have the TURO tool open and can see the tags next to each paragraph. See screen shot.

 

[Screenshot: Tags in Tags Pane vs. tags displayed on paragraph.]

 

The tags in the Tags Pane say one thing, but the tags on the paragraph say something else. That's why I wondered which one was "real."

 

I will download those sample files from your link and see what happens when I PDF them. Will get back to you...

 

Guy Ivie

Technical Writer

US Army Corps of Engineers

inkedguy
Explorer ,
Sep 01, 2020

Latest update: I created PDFs from the test files. The TOCs look fine. However, the order of tags in paragraphs that contained links didn't match. See screen shot:

[Screenshot: Tags in your PDF vs. tags in mine.]

 

Then it occurred to me that I hadn't checked the Acrobat preferences in Word. I found only one difference there: mine had Enable Advanced Tagging checked. I unchecked it, re-generated the PDF, and voila! The tags looked the same as yours.

 

I also inserted a graphic into the test Word file, adding alt text and a caption. In the PDF, the figure had a <Figure> tag, although it was inside a <P> tag. (This was after changing the preferences.)

 

I regenerated the PDF from my work file, with the correct preferences set in the Acrobat tab. The random <TOCI> tags were gone. Figures appeared the same as the one I inserted into your test file (image file inside a <Figure> tag, which was inside a <P> tag). And I got no argument when changing the <Figure> tag to something else and back again.

 

But I'm still getting heading tags instead of <TOCI> tags. See screenshot:

 

[Screenshot: <H> tags instead of <TOCI> tags.]

 

This was a freshly generated TOC, using the Automatic 2 TOC style. When I look at the heading styles, the inspector shows the heading style in the Paragraph Formatting box, plus <none> in the box below it, Default Paragraph in the Text Level Formatting box, and plus <none> in the box below that. In the generated TOC, those boxes show TOC 1 (or 2 or 3), plus <none>, Hyperlink, and plus <none>.

 

I am... baffled.

 

Guy Ivie

Technical Writer

US Army Corps of Engineers
