If your document, project, or organization has a modest number of page text elements that can be treated as single characters, are adequately identified by their Unicode names, and need focused treatment in your FrameMaker documents, this post may be helpful. The scheme is likely portable to other DTP/WP apps, such as InDesign, but the Definition syntax is apt to vary.
Each candidate character is a single FrameMaker Variable.
Name: U+FFFF FORMAL UNICODE NAME
Name: U+1FFFF FORMAL UNICODE NAME
Definition: <ChFmtAsNeeded>\uFFFF
or in semantic & overlay cases
Definition: <ChFmtAsNeeded>\xFF
…followed by more narrative discussion after list.
Name: U+1F16D CIRCLED CC
Definition: <CCicons>\x63
Issue: FM has no SMP support yet (FRMAKER-10976)
(and very few Unicode fonts populate 1F16D yet in any case)
This addresses the SMP (Supplementary Multilingual Plane) problem. FM does not yet support any code points higher than U+FFFF, which includes all but one of the icons you might need for a Creative Commons work. Here the work-around relies on an overlay font, invoked by a Character Format, that populates alternative ASCII code points ("c" in this case). After entry into the Variable dialog, "c" is what you'll see there.
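As a rough illustration of which characters fall under this limitation, here is a minimal Python sketch. The function names are hypothetical, and the U+FFFF cutoff is taken from the description above rather than from any FM documentation.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0 = BMP, 1 = SMP, and so on)."""
    return code_point >> 16


def needs_overlay_workaround(code_point: int) -> bool:
    # Assumption from the post: FM handles only U+0000..U+FFFF,
    # so anything above that needs the overlay-font approach.
    return code_point > 0xFFFF


for cp in (0x1F16D, 0x2013):
    print(f"U+{cp:04X}: plane {plane_of(cp)}, "
          f"overlay needed: {needs_overlay_workaround(cp)}")
```

U+1F16D lands in plane 1 and is flagged; U+2013 is in the BMP and is not.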
Name: U+2011 NON-BREAKING HYPHEN
Definition: \x15
Issue: convenience (and absent Unicode semantics)
This avoids having to remember the keystroke sequence ␛-h, and makes the object easier to distinguish from a normal "-". FM can render an actual U+2011, but doesn't honor it for text layout, so this also forestalls trying to use one by accident. I suppose it might at some future time get honored. I have not tested how this renders to HTML (which has no &entity; for non-breaking hyphen). The \x15 will collapse to \+ after entry. Non-breaking space is in a similar situation.
Name: U+2013 EN DASH
Definition: \u2013
Issue: character stewardship
This is a completely generic instance of why I do this. It avoids having to look up the code point every time I need one, and makes it easier to figure out what instances are in documents. The Definition collapses to "–" upon entry.
Name: U+2021 DOUBLE DAGGER
Definition: \u2021
Issue: FM specials collision:
Why isn't the Definition "\xe0"? Because Unicode defines that code point as
U+00E0 LATIN SMALL LETTER A WITH GRAVE
FM presently must be doing some magic to render a "‡". Will this always be the case?
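The collision can be illustrated outside FM: the single byte value E0 means different characters under different encodings. This Python sketch assumes that the legacy single-byte code in question lines up with Mac Roman at this position, which the post does not state. The `mac_roman` and `latin-1` codecs are Python's, not FM's.

```python
raw = bytes([0xE0])

# Assumption: a Mac Roman-like legacy layout for this byte value.
as_mac_roman = raw.decode("mac_roman")

# Latin-1 maps each byte to the Unicode code point of the same number.
as_latin1 = raw.decode("latin-1")

print(f"mac_roman: U+{ord(as_mac_roman):04X}")
print(f"latin-1:   U+{ord(as_latin1):04X}")
```

Using the unambiguous \u2021 form in the Definition sidesteps the question of which interpretation applies.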
Name: U+211E PRESCRIPTION TAKE
Definition: <Monospace>\u211e
Issue: Body font population
Issue: FM doesn't do fallback
The body font in use for the project doesn't populate ℞ (U+211E) in either the serif or the sans face, but does in the "Code" face. A Character Format named Monospace might be a more portable way to invoke it.
Now, some organizations might actually prefer to render this in the incorrect but common form: Rx
If so, then use:
Name: U+211E PRESCRIPTION TAKE
Definition: <Serif>R<Subscript>x
This latter use is also an example of where you want a string to be treated as a single character, and that conjoined character has a Unicode identity. A similar case might be °F, where your main font does not yet populate the pre-conjoined form U+2109. A further case is where you need to use a base character plus a Unicode combining character, because your font doesn't populate the precomposed code point.
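To find out what string a conjoined or precomposed character stands for, Unicode's own decomposition data can help. This is a minimal Python sketch using the standard `unicodedata` module; the helper name `spelled_out` is hypothetical, and whether a given font renders the resulting sequence well is a separate question.

```python
import unicodedata


def spelled_out(ch: str) -> str:
    """Compatibility-decompose a character into the sequence it stands for."""
    return unicodedata.normalize("NFKD", ch)


# U+2109 DEGREE FAHRENHEIT, U+00E9 (e with acute), U+211E PRESCRIPTION TAKE
for ch in ("\u2109", "\u00e9", "\u211e"):
    parts = " ".join(f"U+{ord(c):04X}" for c in spelled_out(ch))
    print(f"U+{ord(ch):04X} -> {parts}")
```

U+2109 decomposes to U+00B0 followed by F, and U+00E9 to "e" plus the combining acute U+0301. U+211E has no decomposition, so a string such as "Rx" for it is a house convention rather than something Unicode defines.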
Name: U+0020 SPACE
Definition: \x20
Issue: at-risk characters:
Use Variables to prevent FM from destructively changing characters you want left alone, for example by collapsing deliberate duplicate spaces or applying too-smart quotes.
Other candidate characters might include, but not be limited to:
I personally use this technique for any and all non-keyboard characters, but only for documents that are not planned for translation (or are, but the affected characters are never translated).
The U+1234 notation is the conventional manner of expressing Unicode code points, and keeps these variables all together. The leading “U” causes these entries to {mostly} sink to the bottom of the Variable Catalog, and {mostly} self-sort by code point.
The “+FFFF” or “+1FFFF” part is going to be either 4 (BMP) or 5 (SMP) hexadecimal digits. Use leading zeros for low-order BMP code points for consistency and self-sort. The +1FFFF SMPs, alas, will sort in among the U+1000…U+1FFF BMP entries.
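The sorting quirk is easy to reproduce. This Python sketch compares a plain text sort with a sort by numeric code point; the text sort stands in for how the Variable Catalog orders names, which is an assumption, since FM's exact collation is not documented here. The helper `code_point_of` is hypothetical.

```python
names = [
    "U+2013 EN DASH",
    "U+0020 SPACE",
    "U+1F16D CIRCLED CC",
    "U+2011 NON-BREAKING HYPHEN",
    "U+211E PRESCRIPTION TAKE",
]


def code_point_of(name: str) -> int:
    """Parse the hex digits after 'U+' in a variable name."""
    return int(name.split()[0][2:], 16)


by_text = sorted(names)                          # character-by-character order
by_code_point = sorted(names, key=code_point_of)  # true numeric order

print("\n".join(by_text))
print("---")
print("\n".join(by_code_point))
```

In the text sort, U+1F16D lands right after U+0020 and ahead of every U+2xxx entry; in the numeric sort it goes last.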
The “FORMAL UNICODE NAME” part is the assigned name string at unicode.org (their Code Charts page). It's often quicker to find the U+ and name strings at third party Unicode search sites, such as fileformat.info. Use these all-caps strings literally, so that all document stewards are on the same page on this.
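If Python is handy, the Name strings can also be generated rather than looked up. This is a minimal sketch using the standard `unicodedata` module; the function `variable_name` is hypothetical. The names come from whatever Unicode version the Python build ships, so recently added characters may be missing.

```python
import unicodedata


def variable_name(ch: str) -> str:
    """Build a 'U+XXXX FORMAL UNICODE NAME' string for one character."""
    cp = ord(ch)
    # Fallback text is used when this Python's Unicode data lacks the name.
    name = unicodedata.name(ch, "<name not in this Python's Unicode data>")
    # :04X pads BMP code points to 4 digits; SMP code points keep all 5.
    return f"U+{cp:04X} {name}"


for ch in (" ", "\u2011", "\u2013", "\u2021", "\u211e"):
    print(variable_name(ch))
```

The zero padding in the format string matches the leading-zero advice for low-order code points.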
I recommend creating/recording all such Variables in a separate document (which can then be used for Import). I use the template.fm file that I create for each project. These vars are all in a table that includes the Names and Definitions as plain text, providing a legible record of what they are supposed to be, plus perhaps comments on use intent. Having the Defs as plain text is, alas, necessary to deal with FM's annoying habit of collapsing \x## notations to various things, and converting \u notations to the actual Unicode glyph (even when it's ambiguous, illegible or completely invisible).
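The same record could also be kept outside FM as a plain text sidecar file. This Python sketch writes a tab-separated table of Names, Definitions, and comments. The `RECORDS` list and `as_tsv` function are hypothetical, and the output is only a human-readable record, not a format FM imports.

```python
import csv
import io

# Definitions are raw strings so the backslash notations survive as typed.
RECORDS = [
    ("U+2011 NON-BREAKING HYPHEN", r"\x15", "convenience"),
    ("U+2013 EN DASH", r"\u2013", "character stewardship"),
    ("U+211E PRESCRIPTION TAKE", r"<Monospace>\u211e", "body font population"),
]


def as_tsv(records) -> str:
    """Render the records as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", "Definition", "Comment"])
    writer.writerows(records)
    return buf.getvalue()


print(as_tsv(RECORDS), end="")
```

Because the Definitions are stored as literal text, the original \x## and \u notations stay legible even after FM has collapsed them in the Variable dialog.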
Thanks Bob for this 'tutorial'!