I have a PDF with a tagged structure that defines the logical reading order through the MCID specified for each text object. The text objects in the page content are placed in a different order.
Section 14.7.1 of ISO 32000 states:
"A PDF document’s logical structure shall be stored separately from its visible content, with pointers from each to the other. This separation allows the ordering and nesting of logical elements to be entirely independent of the order and location of graphics objects on the document’s pages."
Based on the above, the logical order defined by the tags does not have to reflect the order of the objects on the page.
Acrobat DC (and older versions too) seems to have a problem with this: in the attached file it does not highlight all of the text on the page, and it cannot copy the text in the correct reading order.
The following is a link to the file:
Is this a bug in Acrobat or am I missing something?
The file I'm testing is very minimal and contains the following tag structure. The MCIDs of the text elements go from 0 to 8, top to bottom, in the tag tree; you can check them against the page content below.
And the page content looks like this:
/P <</MCID 8>>BDC
BT
/F0 10 Tf
0.71 0 0 0.523223 429.119995 652.188049 Tm
( THE)Tj
0.714286 0 0 0.523223 446.160004 652.188049 Tm
( PUBLIC)Tj
ET
EMC
/P <</MCID 7>>BDC
BT
/F0 10 Tf
0.56 0 0 0.523223 419.040009 652.188049 Tm
( IN)Tj
ET
EMC
/P <</MCID 5>>BDC
BT
/F0 10 Tf
0.777143 0 0 0.545972 297.22287 651.764954 Tm
(BEFORE)Tj
0.533333 0 0 0.545972 325.200012 651.764954 Tm
( IT)Tj
0.48 0 0 0.545972 334.799988 651.764954 Tm
( IS)Tj
0.68 0 0 0.523223 343.440002 651.948059 Tm
( FILED)Tj
0.73 0 0 0.523223 367.919983 651.948059 Tm
( FOR)Tj
ET
EMC
/P <</MCID 6>>BDC
BT
/F0 10 Tf
0.8 0 0 0.523223 390.23999 651.948059 Tm
(RECORD)Tj
ET
EMC
/P <</MCID 4>>BDC
BT
/F0 10 Tf
0.795556 0 0 0.545972 254.373337 651.764954 Tm
(PROPERTY)Tj
ET
EMC
/P <</MCID 1>>BDC
BT
/F0 10 Tf
0.693333 0 0 0.545972 165.600006 651.524963 Tm
( AN)Tj
ET
EMC
/P <</MCID 3>>BDC
BT
/F0 10 Tf
0.546667 0 0 0.545972 217.439987 651.524963 Tm
( IN)Tj
0.744 0 0 0.545972 227.279999 651.524963 Tm
( REAL)Tj
ET
EMC
/P <</MCID 2>>BDC
BT
/F0 10 Tf
0.728889 0 0 0.545972 178.080002 651.524963 Tm
( INTEREST)Tj
ET
EMC
/P <</MCID 0>>BDC
BT
/F0 10 Tf
0.822222 0 0 0.545972 121.199997 651.284973 Tm
(TRANSFERS)Tj
ET
EMC
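To make the point concrete: the logical order can be recovered from the content alone by collecting the text of each marked-content sequence and sorting by MCID, which here gives the same order as the tag tree described above. This is a rough sketch, assuming an uncompressed stream, literal strings without escapes, and no nested marked content; it uses regular expressions, not a real PDF parser. Joining blocks with a space is also an assumption, since the content stream itself does not mark word boundaries.

```python
import re

# One marked-content sequence: tag, inline /MCID property list, BDC ... EMC.
# Assumption: no nesting, so the first EMC closes the sequence.
MC_SEQ = re.compile(r"/(\w+)\s*<<\s*/MCID\s+(\d+)\s*>>\s*BDC(.*?)EMC", re.S)
# A literal string shown with Tj; escaped parentheses are not handled.
TJ_STR = re.compile(r"\(([^()]*)\)\s*Tj")

# Abridged copy of the page content above: BT/ET, Tf and Tm are omitted
# because only the MCIDs and the Tj strings matter for this check.
CONTENT = """
/P <</MCID 8>>BDC ( THE)Tj ( PUBLIC)Tj EMC
/P <</MCID 7>>BDC ( IN)Tj EMC
/P <</MCID 5>>BDC (BEFORE)Tj ( IT)Tj ( IS)Tj ( FILED)Tj ( FOR)Tj EMC
/P <</MCID 6>>BDC (RECORD)Tj EMC
/P <</MCID 4>>BDC (PROPERTY)Tj EMC
/P <</MCID 1>>BDC ( AN)Tj EMC
/P <</MCID 3>>BDC ( IN)Tj ( REAL)Tj EMC
/P <</MCID 2>>BDC ( INTEREST)Tj EMC
/P <</MCID 0>>BDC (TRANSFERS)Tj EMC
"""


def marked_content(stream: str) -> list[tuple[int, str]]:
    """Return (MCID, text) pairs in the order they appear in the stream."""
    pairs = []
    for m in MC_SEQ.finditer(stream):
        mcid = int(m.group(2))
        text = "".join(TJ_STR.findall(m.group(3)))
        pairs.append((mcid, text))
    return pairs


def normalize(parts) -> str:
    """Join the pieces with spaces, then collapse runs of whitespace."""
    return " ".join(" ".join(parts).split())


pairs = marked_content(CONTENT)
print("stream order:", [mcid for mcid, _ in pairs])
print("MCID order:  ", normalize(text for _, text in sorted(pairs)))
```

Sorting by MCID works here only because the tag tree lists the MCIDs in ascending order; in general a reader would walk the structure tree and follow each element's MCID references instead.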
========
Acrobat selects the text like this (SelectAll):
Acrobat copies the text as follows:
THE PUBLIC IN
BEFORE IT IS FILED FOR
RECORD
PROPERTY AN IN REAL INTERESTTRANSFERS
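Comparing the copied text with the order of the marked-content blocks suggests that Acrobat is following the content stream here, not the tags. A small Python check, with the (MCID, text) pairs transcribed by hand from the page content above; whitespace is ignored because the copied text runs INTEREST and TRANSFERS together.

```python
# (MCID, text) pairs transcribed from the page content, in stream order.
STREAM_ORDER = [
    (8, " THE PUBLIC"), (7, " IN"), (5, "BEFORE IT IS FILED FOR"),
    (6, "RECORD"), (4, "PROPERTY"), (1, " AN"), (3, " IN REAL"),
    (2, " INTEREST"), (0, "TRANSFERS"),
]

# Text as copied by Acrobat, taken from the post.
ACROBAT_COPY = """THE PUBLIC IN
BEFORE IT IS FILED FOR
RECORD
PROPERTY AN IN REAL INTERESTTRANSFERS"""


def squash(text: str) -> str:
    """Drop all whitespace so only the character sequence is compared."""
    return "".join(text.split())


stream_text = "".join(text for _, text in STREAM_ORDER)
tag_text = "".join(text for _, text in sorted(STREAM_ORDER))

print("matches stream order:", squash(ACROBAT_COPY) == squash(stream_text))
print("matches tag order:   ", squash(ACROBAT_COPY) == squash(tag_text))
```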
Thanks
Jozef
Yes, this excerpt from the ISO standard seems to say this is OK. But we know it is not OK from an accessibility standpoint.
It is true that the page display and the order of objects in the Tags, Reading Order, and Content panels do not always match. But the Tags panel should match the display, i.e. the actual "how you read the page" order.
If you also consider reflow, the Reading Order and Content panels should match the "how you read it" order too.
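In a file like the one in the question, one way to make the Content panel order match the tag order would be to rewrite the page content so the marked-content sequences appear in MCID order (which equals the tag order here, since the tag tree lists MCIDs 0 to 8). A minimal Python sketch, assuming an uncompressed content stream made only of non-nested BDC...EMC blocks; decompressing the stream and writing it back into the PDF are not shown. Because each block in this file opens its own BT and sets its own Tf and Tm, moving a block should not change where its text is drawn.

```python
import re

# One marked-content sequence: tag, inline /MCID property list, BDC ... EMC.
# Nested marked content is not handled.
MC_BLOCK = re.compile(r"/\w+\s*<<\s*/MCID\s+(\d+)\s*>>\s*BDC.*?EMC", re.S)


def reorder_by_mcid(stream: str) -> str:
    """Return the content stream with its marked-content blocks sorted by MCID.

    Assumes the stream consists only of top-level, non-nested BDC ... EMC
    blocks; any other operators raise ValueError instead of being dropped.
    """
    leftover = MC_BLOCK.sub("", stream)
    if leftover.strip():
        raise ValueError("operators outside marked-content blocks are not handled")
    blocks = [(int(m.group(1)), m.group(0)) for m in MC_BLOCK.finditer(stream)]
    blocks.sort(key=lambda block: block[0])
    return "\n".join(text for _, text in blocks) + "\n"


# Two blocks copied from the page content in the question, in their original order.
SAMPLE = """/P <</MCID 8>>BDC
BT
/F0 10 Tf
0.71 0 0 0.523223 429.119995 652.188049 Tm
( THE)Tj
0.714286 0 0 0.523223 446.160004 652.188049 Tm
( PUBLIC)Tj
ET
EMC
/P <</MCID 0>>BDC
BT
/F0 10 Tf
0.822222 0 0 0.545972 121.199997 651.284973 Tm
(TRANSFERS)Tj
ET
EMC
"""

print(reorder_by_mcid(SAMPLE))
```

Raising on unexpected operators rather than silently dropping them keeps the sketch from corrupting a stream that does not fit its assumptions.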
Your PDF does not resemble the output of most applications capable of generating PDF.
How are you creating this PDF?