Inspiring
May 15, 2024
Answered

Page Navigation markers for EPUB interrupt hyphenated words


When using the "Page Navigation" export option for reflowable EPUB exports, InDesign adds a <span> with a blank space character in the HTML export instead of the soft hyphen that was present in the print layout:

 

<p class="bodytext">Idel illandia suntusdae quoditint debitatissit quam quamus magnimus es quiaeper elor eper<span id="page2" role="doc-pagebreak" aria-label="2" epub:type="pagebreak"> </span>nat quo con re nimus?</p>

 

The W3C recommendation for page break markers is a self-closing <span> tag, which would look like this for the above example:

<span id="page2" role="doc-pagebreak" aria-label="2" epub:type="pagebreak"/>

 

<p class="bodytext">Idel illandia suntusdae quoditint debitatissit quam quamus magnimus es quiaeper elor eper<span id="page2" role="doc-pagebreak" aria-label="2" epub:type="pagebreak"/>nat quo con re nimus?</p>

 

The current behaviour creates errors on EPUB export which are not present in the indd file - please change this to conform to the W3C recommendation.

Correct answer James Gifford—NitroPress

Thank you for your replies, and for pointing me to the bug/feature forum - I'll make sure to post my observations there.

 

You are correct, the faults are caused by words hyphenated across source layout pages. Your suggestion to avoid hyphens at the end of a page would also be my preferred workaround for this problem as of now, although I haven't had time to thoroughly test turning off hyphenation for EPUB export with some of our longer books (800+ pages).

 

As a side note, word length and word composition are somewhat different in German, e.g. the two-word English "government building" would be the single word "Regierungsgebäude", or "tribal disputes" would be "Stammesstreitigkeiten". So even our fanciest imprints cannot always avoid hyphenation at page boundaries while keeping with other typographic customs, like avoiding widows and orphans etc.

The above is especially true if we are considering translations, which place additional restrictions on our ability to alter the text to better fit the pages.

 

So all in all, hyphens across page boundaries are quite common here, and thus make our books particularly susceptible to this problem.


I have some familiarity with German; I am glad I rarely have to deal with its extremely long word structure. 🙂

 

But I think this is important: the problem is the specific collision of the page-nav marker and words hyphenated across a page break. If nothing else, a workaround could be crafted. (A script or other simple approach to un-hyphenating those words, despite some slight page flow changes, would probably be good enough for accessibility in the export.)
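To gauge how many words a given book is affected by before crafting any workaround, something like the following Python sketch could scan the exported XHTML. It assumes the markers look exactly like the ones quoted in this thread (a `<span>` with `role="doc-pagebreak"` containing only whitespace, with letters touching it on both sides). The function name `find_split_words` is made up for illustration, and the pattern may need adjusting for other export settings.

```python
import re

# Assumption: a page marker sitting inside a word has word characters
# directly before the opening tag and directly after the closing tag,
# and contains nothing but whitespace.
SPLIT_WORD = re.compile(
    r'(\w+)<span\b[^>]*\brole="doc-pagebreak"[^>]*>\s+</span>(\w+)'
)

def find_split_words(xhtml: str) -> list[tuple[str, str]]:
    """Return (first_half, second_half) for every word broken by a page marker."""
    return SPLIT_WORD.findall(xhtml)

sample = (
    'con<span id="page2" role="doc-pagebreak" aria-label="2" '
    'epub:type="pagebreak"> </span>sists'
)
for first, second in find_split_words(sample):
    print(f"{first}|{second}")
```

Since `\w` matches Unicode letters in Python 3, German words with umlauts or ß should be picked up as well.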

 

Of course, the real solution is for the insertion algorithm to spot such page-hyphenations and handle them gracefully, or even a simpler step to avoid breaking any hyphenated word. That doesn't seem difficult, but of course all of these features have to remain in lockstep with the evolving standards.
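As a rough illustration of the "avoid breaking any hyphenated word" idea, here is a Python sketch that works on the exported XHTML rather than inside InDesign. It moves a marker found in the middle of a word to the end of that word and empties it. The marker shape is assumed from the snippets quoted in this thread, `move_marker_after_word` is a hypothetical helper name, and this is not a description of how InDesign's own insertion works.

```python
import re

# Assumption: a marker is "inside a word" when word characters touch it
# on both sides. Group 2 is the opening tag, group 3 the closing tag.
BROKEN_WORD = re.compile(
    r'(\w+)(<span\b[^>]*\brole="doc-pagebreak"[^>]*>)\s*(</span>)(\w+)'
)

def move_marker_after_word(xhtml: str) -> str:
    """Rejoin the two word halves and place the emptied marker after the word."""
    return BROKEN_WORD.sub(r'\1\4\2\3', xhtml)

sample = (
    'head<span id="page3" role="doc-pagebreak" aria-label="3" '
    'epub:type="pagebreak"> </span>ward'
)
print(move_marker_after_word(sample))
```

The trade-off is that the page boundary is reported a few letters later than in the print layout, which may or may not be acceptable for a given accessibility workflow.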

5 replies

Inspiring
June 17, 2024

This is a bug as a result of the work of the EPUB working group. The engineers are aware of it and are working on a fix, as I understand it. There should be no space and the <span> should be self-closing.

James Gifford—NitroPress
Legend
June 17, 2024

Yes, which is why my position has evolved to "users need to accommodate this change in their processes and workflow" over any notion that it's something to be fixed or needing a change (other than tweaks) in ID.

 

Since these files validate correctly under EPUBcheck, I'm not at all sure what the problems are. In this thread and related ones, I haven't seen a description of the actual problem, fault or roadblock the change has caused. It would seem that a workflow change, whether restructuring the document, staying with standards-compliant readers, or perhaps avoiding certain practices or post-export modifications, might end the issue.

 

I note that all of the posts and complaints are from non-English-based users; I wonder if there is a compounding problem with other-language or other-model (RTL) editions of ID.

Inspiring
June 6, 2024
Inspiring
May 16, 2024

I have discovered the same issue and I would also say that this is a bug in the generated code.

 

 

https://files.mastodon.social/media_attachments/files/112/345/333/359/158/515/original/1840a96d5408b89e.mp4 

https://mastodon.social/@rolanddreger/112345356612982742

 

Roland

James Gifford—NitroPress
Legend
May 22, 2024

Let me just add on here that besides that one new (and all but undocumented) checkbox in the Export | General menu, there is a vast new menu of Accessibility settings in the Export | Metadata menu (an entire new tab) — and among them is page navigation markers.

 

I am not very skilled with accessibility (although I try to avoid things I know cause hurdles for voice readers etc.), but it seems that those who are should be aware of this huge extension to InDesign's EPUB export system, and I'd bet a wooden penny that a solution, or at least a workaround/pointer/clue to the above problems lies within it.

James Gifford—NitroPress
Legend
June 6, 2024

Unfortunately, all this checkbox does is - correctly - add a metadata element declaring that the EPUB does contain page based navigation:

<meta property="schema:accessibilityFeature">pageNavigation</meta>

Unfortunately, this does not seem to be checked automatically when the checkbox for generating the page markers in the General menu is checked - another possible improvement I might suggest to Adobe 🙂
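Until the two checkboxes are kept in sync, a small check on the exported files could catch the mismatch. The Python sketch below assumes the package document (OPF) uses the standard `http://www.idpf.org/2007/opf` namespace and that the metadata element looks like the one quoted above. The function names are invented for illustration, and whether `pageNavigation` is the right value to expect for inline markers is taken from this post, not verified independently.

```python
import re
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

def declares_page_navigation(opf: str) -> bool:
    """True if the OPF carries the pageNavigation accessibility feature."""
    root = ET.fromstring(opf)
    for meta in root.iter(OPF_NS + "meta"):
        if (meta.get("property") == "schema:accessibilityFeature"
                and (meta.text or "").strip() == "pageNavigation"):
            return True
    return False

def has_page_markers(xhtml: str) -> bool:
    """True if a content document contains at least one page break marker."""
    return re.search(r'role="doc-pagebreak"', xhtml) is not None

def markers_without_declaration(opf: str, chapters: list[str]) -> bool:
    """True when markers exist in the content but the OPF does not declare them."""
    return any(has_page_markers(c) for c in chapters) and not declares_page_navigation(opf)
```

Calling `markers_without_declaration(opf_text, chapter_texts)` on the unpacked EPUB would flag exactly the situation described here: markers generated, metadata box left unchecked.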


I don't know if I quite follow.

 

Are you saying that this box, when checked in the EPUB Export | Metadata | Accessibility pane, only adds the meta statement?

 

And does not check the Page Navigation box in the EPUB Export | General pane?

 

You seem to be correct on the latter; the boxes on the two pages are not sync'ed. But also note that under Accessibility, there are two checkboxes — one for "Page Navigation" and one for "Page Break Markers."

 

The only takeaway here is that the accessibility features are incompletely developed, which is already (generally) known.

James Gifford—NitroPress
Legend
May 15, 2024

The real problem is that hardly any two readers process EPUB the same way. Variations in actual code implementation have run a distant second in end-user faults and compliance issues, IME.

 

Adobe has worked mostly with the accessibility crowd in recent releases. (Which is causing its own level of chaos.) I wouldn't count on any changes just to conform to small points in the overall standard.

Inspiring
May 16, 2024

@James Gifford—NitroPress  wrote:

The real problem is that hardly any two readers process EPUB the same way. Variations in actual code implementation have run a distant second in end-user faults and compliance issues, IME.

 

Adobe has worked mostly with the accessibility crowd in recent releases. (Which is causing its own level of chaos.) I wouldn't count on any changes just to conform to small points in the overall standard.


 

I am pretty confident that this is not an issue of readers / apps processing an EPUB in different ways.

 

I tested my example file on a PC with Kindle Previewer 3.81, Adobe Digital Editions 4.5, Thorium 2.2.0, as well as on a Kindle Paperwhite via Send to Kindle, a tolino (popular brand of readers in Germany, both with Adobe RMSDK rendering engine and new Readium-based rendering engine), an iPad with Apple Books and a Pocketbook InkPad Color. That just about covers all relevant applications and reader brands in our market, and every single one displayed this in the same way, with a superfluous space showing up where a hyphen originally appeared at the page break in InDesign.

 

Perhaps a better example using English text (from the "Georgia" file in the EPUB3 samples project) is in order, too:

 

In my new example file "georgia", the words "consists" and "headward" are both hyphenated by InDesign as "con-sists" and "head-ward" at the page breaks, but exported to EPUB as

con<span id="page2" role="doc-pagebreak" aria-label="2" epub:type="pagebreak"> </span>sists

and

head<span id="page3" role="doc-pagebreak" aria-label="3" epub:type="pagebreak"> </span>ward

thus displaying as "con sists" and "head ward".

 

So, in my opinion, this is clearly a fault in the page break markers' HTML markup, which ironically results in an error for readers who are not visually impaired.

 

Whether the page break markup is a self-closing <span> tag like

<span id="page2" role="doc-pagebreak" aria-label="2" epub:type="pagebreak"/>

or the start/end tags variant

<span id="page2" role="doc-pagebreak" aria-label="2" epub:type="pagebreak"></span>

SHOULD be irrelevant, and most likely is, despite different applications and readers rendering things differently, so I'd be happy with either implementation.

Adding an unintended space character between the start and end <span> tags on EPUB export, however, is just plain wrong.
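As a stopgap after export, the stray space can be stripped from the markers with a few lines of Python. This is only a sketch under the assumption that every marker has the shape shown in my examples. `strip_marker_whitespace` is a name I made up, and the regular expression is not a general XHTML parser.

```python
import re

# Assumption: page markers are <span role="doc-pagebreak" ...> elements
# whose only content is whitespace.
SPACED_MARKER = re.compile(
    r'(<span\b[^>]*\brole="doc-pagebreak"[^>]*>)\s+(</span>)'
)

def strip_marker_whitespace(xhtml: str) -> str:
    """Remove whitespace between the start and end tags of page markers."""
    return SPACED_MARKER.sub(r'\1\2', xhtml)

sample = (
    'con<span id="page2" role="doc-pagebreak" aria-label="2" '
    'epub:type="pagebreak"> </span>sists'
)
print(strip_marker_whitespace(sample))
```

One caveat: if a marker falls between two whole words and its inner space is the only thing separating them, this would join those words, so the result should be spot-checked before use on a full book.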

Inspiring
May 16, 2024

And, in case you were wondering, document language or paragraph/character style language do not affect this.