[Unicode] Character Variables in Framemaker

Community Expert, Aug 22, 2021

About

If your document, project or organization has a modest number of page text elements that can be considered single characters, adequately identified by their Unicode names, and needing focused treatment in your Framemaker documents, this posting may be helpful. The scheme is likely portable to other DTP/WP apps, such as InDesign, but the Definition syntax is apt to vary.

Simple easy-to-remember scheme

Each candidate character is a single Framemaker Variable.

Name: U+FFFF FORMAL UNICODE NAME
Name: U+1FFFF FORMAL UNICODE NAME

Definition: <ChFmtAsNeeded>\uFFFF
  or in semantic & overlay cases
Definition: <ChFmtAsNeeded>\xFF
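If you keep many of these, the Name and the \u Definition can be generated rather than typed. Below is a minimal Python sketch. It assumes Python's bundled `unicodedata` database is recent enough for your characters, and `fm_variable` is a hypothetical helper of my own, not anything FrameMaker provides. It returns no Definition for SMP code points, since FM can't take those as \u and they need the overlay-font treatment instead. Any Character Format prefix is left for you to add by hand.

```python
import unicodedata
from typing import Optional, Tuple


def fm_variable(ch: str) -> Tuple[str, Optional[str]]:
    """Build a (Name, Definition) pair following the U+FFFF FORMAL UNICODE NAME scheme.

    Raises ValueError for characters with no Unicode name (e.g. PUA code points).
    """
    cp = ord(ch)
    # At least 4 uppercase hex digits (zero-padded); SMP code points naturally get 5.
    name = f"U+{cp:04X} {unicodedata.name(ch)}"
    if cp > 0xFFFF:
        # No \u Definition for SMP: this would need an overlay font + Character Format.
        return name, None
    # Plain \u Definition; prepend <SomeCharFmt> yourself if the character needs one.
    return name, f"\\u{cp:04x}"


if __name__ == "__main__":
    for ch in "\u2013\u2021\u211e":
        print(fm_variable(ch))
```

The output is only a starting point for the Variable dialog; cases like the non-breaking hyphen or the space, which deliberately use \x Definitions, still need a human decision.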

Examples by way of illustrating why…

…followed by more narrative discussion after list.


Name: U+1F16D CIRCLED CC
Definition: <CCicons>\x63

Issue: FM has no SMP support yet (FRMAKER-10976)
(and very few Unicode fonts populate 1F16D yet in any case)
This addresses the SMP (Supplementary Multilingual Plane) problem. FM does not yet support any codepoints higher than U+FFFF, and that includes all but one of the icons you might need for a Creative Commons work. In this case, the work-around relies on an overlay font, invoked by a Character Format, that places the glyph at an alternative ASCII codepoint, here "c". After entry into the Variable dialog, c is what you'll see there.
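For anyone wondering what "higher than U+FFFF" amounts to in practice: such code points don't fit in a single 16-bit unit, and in UTF-16 they are stored as a surrogate pair. The Python sketch below is my own illustration of that encoding arithmetic, not a description of FM's internals; `utf16_units` is a hypothetical helper, and the result is cross-checked against Python's standard UTF-16 codec.

```python
from typing import List


def utf16_units(cp: int) -> List[int]:
    """Return the UTF-16 code units for a code point (a surrogate pair above U+FFFF)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0xFFFF:
        return [cp]  # BMP: one 16-bit unit
    v = cp - 0x10000  # 20 bits remain after removing the BMP offset
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]  # high, low surrogate


if __name__ == "__main__":
    units = utf16_units(0x1F16D)  # U+1F16D CIRCLED CC
    print(" ".join(f"{u:04X}" for u in units))
    # Cross-check against the standard big-endian UTF-16 codec
    assert bytes.fromhex("".join(f"{u:04x}" for u in units)) == "\U0001F16D".encode("utf-16-be")
```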


Name: U+2011 NON-BREAKING HYPHEN
Definition: \x15

Issue: convenience (and absent Unicode semantics)
This avoids having to remember the keystroke sequence ␛-h, and makes the object easier to distinguish from a normal "-". FM can render an actual U+2011, but doesn't honor it for text layout, so this also forestalls trying to use one by accident. I suppose it might at some future time get honored. I have not tested how this renders to HTML (which has no &entity; for non-breaking hyphen). The \x15 will collapse to \+ after entry. Non-breaking space is in a similar situation.


Name: U+2013 EN DASH
Definition: \u2013

Issue: character stewardship
This is a completely generic instance of why I do this. It avoids having to look up the Unicode value every time I need one, and makes it easier to figure out which instances are in documents. The Definition collapses to "–" upon entry.
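On the "which instances are in documents" side, a quick inventory of non-keyboard characters, keyed in the same U+ naming form, can tell you which Variables a project actually needs. A Python sketch, assuming you have already got the document text out as plain text by some means; it counts raw characters only and knows nothing about FM Variables themselves. `non_keyboard_inventory` is a hypothetical name.

```python
import unicodedata
from collections import Counter


def non_keyboard_inventory(text: str) -> Counter:
    """Count every character outside printable ASCII, keyed in U+XXXX NAME form."""
    counts: Counter = Counter()
    for ch in text:
        if " " <= ch <= "~" or ch in "\t\n\r":
            continue  # ordinary keyboard characters and line breaks
        cp = ord(ch)
        # name() with a default avoids ValueError for unnamed code points (controls, PUA)
        name = unicodedata.name(ch, "<no Unicode name>")
        counts[f"U+{cp:04X} {name}"] += 1
    return counts


if __name__ == "__main__":
    sample = "See pages 3\u20135 and 7\u20139\u2021"
    for var_name, n in sorted(non_keyboard_inventory(sample).items()):
        print(f"{n:4d}  {var_name}")
```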


Name: U+2021 DOUBLE DAGGER
Definition: \u2021

Issue: FM specials collision:
Why isn't the Definition "\xe0"? Because Unicode defines that codepoint as
U+00E0 LATIN SMALL LETTER A WITH GRAVE
FM presently must be doing some magic to render a "‡". Will this always be the case?
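The collision is easy to see outside FM. FrameMaker's legacy 8-bit character set is, as far as I know, derived from Apple's Mac Roman, where byte 0xE0 is ‡, while in Latin-1 (and therefore at Unicode U+00E0) it is à. Treat the Mac Roman comparison as an assumption about FM's internals; the two Python codecs themselves are standard. `two_readings` is a hypothetical helper.

```python
import unicodedata


def two_readings(byte_value: int) -> dict:
    """Name the character a single byte decodes to under Mac Roman and under Latin-1."""
    raw = bytes([byte_value])
    return {enc: unicodedata.name(raw.decode(enc)) for enc in ("mac_roman", "latin-1")}


if __name__ == "__main__":
    print(two_readings(0xE0))
```

Spelling the Definition as \u2021 sidesteps the question of which reading FM applies.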


Name: U+211E PRESCRIPTION TAKE
Definition: <Monospace>\u211e

Issue: Body font population
Issue: FM doesn't do fallback
The body font in use for the project doesn't populate ℞ (U+211E) for either the serif or the sans, but does for the "Code" instance, for which a Character Format named Monospace might be more portable.

Now, some organizations might actually prefer to render this in the incorrect but common form: Rx
If so, then use:
Name: U+211E PRESCRIPTION TAKE
Definition: <Serif>R<Subscript>x

This latter use is also an example of where you want a string to be treated as a single character, and that conjoined character has a Unicode identity. A similar case might be °F, due to your main font not yet populating the U+2109 pre-conjoined form. A further case is where you need to use a Unicode combining character, because your font doesn't populate for the precomposed codepoint.
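To check which Unicode identity a conjoined string or a combining sequence corresponds to (so the Variable Name can carry it), Python's normalization functions are handy. A sketch only: it tells you nothing about what your fonts populate or how FM renders, and the two helper names are hypothetical.

```python
import unicodedata


def compatibility_form(s: str) -> str:
    """NFKD: spell out compatibility characters, e.g. U+2109 -> degree sign + F."""
    return unicodedata.normalize("NFKD", s)


def precomposed_form(s: str) -> str:
    """NFC: fold base letter + combining mark into a precomposed code point where one exists."""
    return unicodedata.normalize("NFC", s)


if __name__ == "__main__":
    print(compatibility_form("\u2109") == "\u00b0F")   # U+2109 DEGREE FAHRENHEIT
    print(precomposed_form("a\u0300") == "\u00e0")     # a + COMBINING GRAVE ACCENT
    # U+211E has no decomposition, so "Rx" is a visual substitute, not an equivalent
    print(compatibility_form("\u211e") == "\u211e")
```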


Name: U+0020 SPACE
Definition: \x20

Issue: at-risk characters:
Use Variables to prevent FM from destructively changing characters you want left alone, such as collapsing deliberate duplicate spaces, applying too-smart quotes, etc.


Other candidate characters might include, but not be limited to:

  1. PUA: Private Use Area characters,
    such as entity or product logos/icons, and where there exists a consistent naming convention for the codepoints.
  2. Special set-off:
    It's a normal character, but always needs some formatting applied in the project(s). Hand-applied Character Formats can be fragile; overrides even more so.
  3. Time travel:
    There are shops still on FM7.2 and earlier, which had no Unicode support. This technique gets them ready for an Import»Formats»Variables quick upgrade. The FM7 imports are going to be heavily overlay fonts. The FM8+ imports can be mostly \u. This is in fact how I developed it, and was able to upgrade an FM7 project to FM2019 with an Import.
  4. Never translated?
    You may not want to use this for routine foreign characters, unless they are always used in the same way in all editions.

General Discussion

I personally use this technique for any and all non-keyboard characters, but only for documents that are not planned for translation (or are, but the affected characters are never translated).

The U+1234 notation is the conventional manner of expressing Unicode code points, and keeps these variables all together. The leading “U” causes these entries to (mostly) sink to the bottom of the Variable Catalog, and (mostly) self-sort by code point.

The “+FFFF” or “+1FFFF” part is going to be either 4 (BMP) or 5 (SMP) hexadecimal digits. Use leading zeros for low-order BMP code points for consistency and self-sort. The +1FFFF SMPs, alas, will sort into the U+1000…U+1FFF BMP range.
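A quick way to preview the sort behavior: compare a plain character-by-character sort of the Names with a numeric sort on the code point. I'm assuming the plain text sort approximates the Variable Catalog's ordering, which is why it only "mostly" works; the helper names are hypothetical.

```python
from typing import List

NAMES = [
    "U+2013 EN DASH",
    "U+1F16D CIRCLED CC",
    "U+0020 SPACE",
    "U+211E PRESCRIPTION TAKE",
    "U+2011 NON-BREAKING HYPHEN",
]


def code_point(var_name: str) -> int:
    """'U+1F16D CIRCLED CC' -> 0x1F16D"""
    return int(var_name.split(" ", 1)[0][2:], 16)


def text_order(names: List[str]) -> List[str]:
    return sorted(names)  # character by character, the way a list box typically sorts


def numeric_order(names: List[str]) -> List[str]:
    return sorted(names, key=code_point)


if __name__ == "__main__":
    print(text_order(NAMES))     # U+1F16D lands among the U+1xxx entries
    print(numeric_order(NAMES))  # U+1F16D lands last
    # Why zero-pad: as text, unpadded "U+A0" sorts after "U+2013" ('A' > '2')
    print(sorted(["U+A0 NO-BREAK SPACE", "U+2013 EN DASH"]))
```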

The “FORMAL UNICODE NAME” part is the assigned name string at unicode.org (their Code Charts page). It's often quicker to find the U+ and name strings at third party Unicode search sites, such as fileformat.info. Use these all-caps strings literally, so that all document stewards are on the same page on this.
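If the Names get typed by hand, a consistency check against the Unicode database catches typos, case drift and missing zero padding. A Python sketch, assuming Python's bundled Unicode data is recent enough for your characters (relatively new additions such as CIRCLED CC need a correspondingly recent Python). `check_variable_name` is a hypothetical helper; it compares against `unicodedata.name()` exactly, so the all-caps string must match literally.

```python
import unicodedata


def check_variable_name(var_name: str) -> bool:
    """True if 'U+XXXX FORMAL NAME' matches the Unicode database exactly (case included)."""
    code, sep, formal = var_name.partition(" ")
    digits = code[2:]
    if not sep or not code.startswith("U+") or len(digits) < 4:
        return False  # malformed, or not zero-padded to at least 4 digits
    try:
        ch = chr(int(digits, 16))
    except ValueError:
        return False  # not hex, or outside the Unicode range
    return unicodedata.name(ch, None) == formal


if __name__ == "__main__":
    for n in ["U+2013 EN DASH", "U+2013 En Dash", "U+2014 EN DASH", "U+20 SPACE"]:
        print(n, check_variable_name(n))
```

Going the other way, `unicodedata.lookup("PRESCRIPTION TAKE")` returns the character itself.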

I recommend creating/recording all such Variables in a separate document (which can then be used for Import). I use the template.fm file that I create for each project. These vars are all in a table that includes the Names and Definitions as plain text, providing a legible record of what they are supposed to be, plus perhaps comments on use intent. Having the Defs as plain text, alas, is necessary to deal with FM's annoying habit of collapsing \x## notations to various things, and converting \u notations to the actual Unicode glyph (even when it's ambiguous, illegible or completely invisible).

Community Expert, Aug 23, 2021

Thanks Bob for this 'tutorial'!
