Our team is trying to implement import and export of the MIF files in our translation software, and we have several additional questions about using escape characters in the MIF files:
1. Should we set escape characters before non-ASCII characters in the MIF files made with the recent FrameMaker versions?
2. If yes, which notation should we use \x or \u?
3. Are \x and \u equivalent in MIF files for FrameMaker? That is, are \xA0 and \u00A0 regarded as the same character?
4. Your MIF specification says that all FrameMaker characters with values above the standard ASCII range (greater than \x7f) are represented in a string using \xnn notation, where nn is the hexadecimal code for the character, and that the hexadecimal digits must be followed by a space. We have sample files that omit the trailing space but still work fine in FrameMaker. Could you please let us know whether this requirement is mandatory?
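For reference, the tolerant reading we have in mind for question 4 would treat the trailing space as optional, consuming it when present so it is not mistaken for string content. A minimal sketch (hypothetical helper names, not production code, assuming the escape otherwise follows the spec's \xnn form):

```python
import re

# Tolerant scan of a MIF string body for \xnn escapes, accepting both
# "\xa0 " (spec form, trailing space) and "\xa0" (space omitted, as seen
# in some real-world files). The optional " ?" consumes the terminator.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2}) ?")

def split_mif_string(body: str):
    """Return a list of (kind, value) tokens: ('hex', 0xNN) or ('text', str)."""
    tokens = []
    pos = 0
    for m in HEX_ESCAPE.finditer(body):
        if m.start() > pos:
            tokens.append(("text", body[pos:m.start()]))
        tokens.append(("hex", int(m.group(1), 16)))
        pos = m.end()
    if pos < len(body):
        tokens.append(("text", body[pos:]))
    return tokens

print(split_mif_string("caf\\xe9 au lait"))
# [('text', 'caf'), ('hex', 233), ('text', 'au lait')]
```

Note that the spec's trailing space is swallowed as part of the escape, so it does not appear in the decoded text.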
We would appreciate any help.
I think you may want to shoot this inquiry to Adobe Support - the forums are user-to-user.
Hi Jeff,
That was the first thing I did, but I got no answer. The folks at the @AdobeCare Twitter account suggested that I post my questions here.
Try tcssup@adobe.com – you might get a better response.
I tried that too (the email was recommended by chat support), but it didn't work. No answer since April 12.
re: I.e. \xA0 and \u00A0 are regarded as the same character, right?
Probably not, and the example used for your conjecture shows why.
\xa0 is the FM legacy hex code for the dagger (†), which is Unicode \u2020. Unicode \u00a0 is the non-breaking space, whereas the legacy FM notation for that would be \x11 (and anything below \x20 is treated as a control code, which may or may not map to a specific Unicode code point, or HTML named entity, on output).
Without having tested it extensively, my guess is that FM treats \x## notation as internal-only, and may* convert any that are displayable non-ASCII to Unicode on output.
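To make the two notations concrete, here is a toy lookup covering only the two code points discussed above; the full table lives in Adobe's character-set documentation, so treat this as an illustrative assumption, not a verified mapping:

```python
# Partial legacy-FrameMaker-to-Unicode table, covering only the two code
# points mentioned in this thread; a real converter would need the full
# table from Adobe's "FrameMaker Character Sets" document.
FM_LEGACY_TO_UNICODE = {
    0xA0: "\u2020",  # FM legacy dagger -> U+2020 DAGGER
    0x11: "\u00A0",  # FM legacy non-breaking space -> U+00A0 NO-BREAK SPACE
}

def fm_hex_to_unicode(code: int) -> str:
    """Map one legacy \\x## code point to a Unicode character."""
    if code in FM_LEGACY_TO_UNICODE:   # legacy codes take priority
        return FM_LEGACY_TO_UNICODE[code]
    if 0x20 <= code < 0x80:            # printable ASCII passes through
        return chr(code)
    raise ValueError(f"no known Unicode mapping for \\x{code:02x}")

print(fm_hex_to_unicode(0xA0))  # the dagger, not U+00A0
```

The point of the demo: \xA0 and \u00A0 name different characters, so a converter cannot simply zero-pad the legacy hex code.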
There are a number of separate issues here, not all of which have obvious likely implementations.
When opening a pre-FM8 file in FM8 or later, most non-ASCII "Standard" characters must be converted to Unicode, because few (perhaps none) of them have hex codes/ANSI numbers that match their Unicode code points.
If a codepage (overlay, non-Unicode) font had been applied, changing the hex code might be destructive. This font overlay might be triggered by the Paragraph Format, Character Format or local Font override.
If I were writing scripts for FM these days, my goal would be to favor Unicode and avoid hex codes where possible. It might not always be possible, as with many codes in the \x00-\x20 range, for example \x15 (non-breaking hyphen), which has no Unicode equivalent (that has to be implemented with CSS). Heck, I'm not even sure that \x11 (non-breaking space) gets rendered as \u00a0 and/or the entity &nbsp;
* I say "may" because it remains necessary in the Unicode age to use codepage fonts in some cases, such as where no Unicode font exists for the typeface, or the entire character set has no Unicode codepoints (yet, maybe ever).
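Following that favor-Unicode advice, an exporter might emit \u escapes for everything outside printable ASCII when writing MIF strings. A sketch under stated assumptions (the exact metacharacter set and the trailing-space convention should be checked against the MIF Reference; only the backslash itself is escaped here):

```python
def mif_escape(text: str) -> str:
    """Sketch: encode a string as a MIF string body, using \\u notation
    (FM8+) for anything outside printable ASCII. Assumption: \\u escapes
    are terminated by a space, matching the spec's convention for hex
    escapes; only the backslash metacharacter is handled."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")          # escape the backslash itself
        elif 0x20 <= cp < 0x7F:
            out.append(ch)              # printable ASCII passes through
        else:
            out.append(f"\\u{cp:04x} ")  # Unicode escape + terminator
    return "".join(out)

print(mif_escape("café\u00a0au lait"))  # caf\u00e9 \u00a0 au lait
```

This sidesteps the legacy \x## table entirely, which is exactly why favoring \u simplifies an importer/exporter pair.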
Dear Bob Niland (Error 7103), many thanks for your reply and detailed explanation. It is very helpful!
Since
FM treats \x## notation as internal-only, and may* convert any that are displayable non-ASCII to Unicode on output.
do you think it is somehow possible to get any information about the conversion algorithm from the \x## notation to Unicode?
re: do you think it is somehow possible to get any information about the conversion algorithm from the \x## notation to Unicode?
I can't help directly, because I'm still using a pre-Unicode version of FM, so I can't test it. Documenting this is really Adobe's job, probably in the "Windows Character Sets" document they supply separately with each rev of FM.
What that document needs is two more columns, and a few (maybe numerous) more rows.
One column would be for how the control and legacy hex codes render to PS/PDF. Some of the control codes are just for text placement, and are largely irrelevant once the page is rendered; nothing at all might really need to be output for some of them.
Another column would be for how the hex codes render to HTML/XML paths. Here, placement is up to the viewing engine, so those hinted characters do need to be rendered.
Because many native Unicode characters have formatting implications (e.g. non-breaking), Adobe needs to indicate which of those FM fully honors. This might have to include a long list of combining characters (non-spacing diacriticals) that are not presently supported.