Hi,
FM2015 handles carriage returns on XML import differently from FM11, and I was wondering if someone could explain the change. I'm not sure whether it is related to the other FM2015 whitespace discussions so far.
Here is my XML file, which has no whitespace other than carriage returns, opened with the default app and no validation:
<?xml version="1.0" encoding="UTF-8"?><test>
<p>TEST</p>
<p>TEST</p><p>TEST</p>
</test>
In FM11, it opens as I would expect:
In FM2015, I get a paragraph containing a single space wherever there is a carriage return:
This doesn't happen when I use a structure app with normal validation. Can anyone explain what has changed?
Thanks,
Russ
Russ,
I opened your test file in FM 2015 with the default setting (On) for RemoveExtraWhiteSpacesOnXMLImport and with it set to Off. The behavior I got was slightly different from what you reported, but I believe it is correct.
When the option is On, the document window shows 3 pgfs; the Structure View shows a <test> with 3 <p>s.
When the option is Off, the document window shows not the 4 pgfs you reported, but 6. Three of these contain a <p> element with the text "TEST"; the others are text ranges within the <test> element, each containing a single space. All of these text ranges correspond to line breaks in the input document: after the <test> start-tag, after the end-tag for the first <p>, and after the end-tag for the last <p>.
The XML Recommendation requires a processor to pass all characters that are not markup, white space included, through to the application. Therefore, FM is correct not to discard it. Converting the line breaks to spaces is consistent with FM's treatment of line breaks within a paragraph.
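To see where those three text ranges come from, any parser that passes character data through will report them. A minimal sketch using Python's standard xml.etree.ElementTree (an illustration only, not how FM itself imports XML; the helper name whitespace_ranges is hypothetical) lists the whitespace-only text in the test file:

```python
import xml.etree.ElementTree as ET

# The test document from the original post, line breaks included.
SOURCE = """<?xml version="1.0" encoding="UTF-8"?><test>
<p>TEST</p>
<p>TEST</p><p>TEST</p>
</test>"""

def whitespace_ranges(xml_text):
    """Return (location, text) for each whitespace-only text run directly in the root."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ranges = []
    # Text between the root start-tag and its first child.
    if root.text is not None and root.text.strip() == "":
        ranges.append((f"after <{root.tag}> start-tag", root.text))
    # Text following each child's end-tag (ElementTree calls this the "tail").
    for child in root:
        if child.tail is not None and child.tail.strip() == "":
            ranges.append((f"after </{child.tag}> end-tag", child.tail))
    return ranges

for where, text in whitespace_ranges(SOURCE):
    print(where, repr(text))
```

It reports three runs, each a single line break. The second and third <p> elements are adjacent in the input, so there is no break between them, which is why there are three extra text ranges rather than four.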
When you open the same file using a DTD that does not permit text ranges between <p> elements, FM does not create the text ranges.
While surprising at first, I believe the behavior is correct. I therefore didn't bother testing in FM 11 or FM 12.
The catch is that if you want white space after xrefs treated correctly, you need to turn the option off, and if you want to avoid line breaks coming in as data characters, you need to turn the option on. Solutions are:
1) Use a DTD
2) Preprocess the input to remove line breaks that format the XML
3) Preprocess the input to change a space after an xref to a character reference
4) Hope Adobe fixes the bug soon
--Lynne
Hi Lynne,
Thanks for the detailed reply. Everything you say makes sense and I didn't really think about how FM would use DTD rules to interpret linebreaks. That's very interesting.
This issue arises because I have a routine that cobbles together composite XML files from other XML instances authored in FrameMaker. Those instances are schema-controlled, but when I put the content together, I get problems like duplicate IDs and general invalidity due to some amateur XSLT. So, I just remove the schema reference, which seemed to work as I expected in FM11. But clearly something is different now: the handling of line breaks has changed, for whatever reason.
As the keeper of the code that drives all this stuff, it was a simple matter to remove all line breaks before writing the composite XML file. That appears to have solved the problem. The XML files still have no schema or DTD declaration, but FM chooses the correct structure app for them and they look the way I want. I guess at some point the EDD takes over and completes the expected rendering. I wonder if that was the change... the consideration of EDD rules.
Russ
Russ,
The RemoveExtraWhiteSpacesOnXMLImport option is described in the FM 11 INI Reference. I don't know whether the bug with white space after xrefs exists in FM 11. I looked briefly at the FM 11, 12, and 2015 documentation and didn't see anything that suggests a change in handling of white space in XML. I don't have time to do any testing now.
In any case, when FM opens an XML document, it does apply the format rules from the EDD. (For performance reasons, it actually turns formatting off until the entire document has been imported and then formats the entire document.)
Are you combining documents with XSLT or something else? You can avoid ID conflicts by appending a prefix or suffix to the original IDs. If you change IDREFs as well, xrefs should be preserved. For example, I've been working on a project in which end users may very well create a new book component from a copy of an existing one, likely resulting in duplicate IDs. I can use the root element of the book component or the ?FM Document PIs to locate the start of a component and then append a suffix such as a component counter to each ID and IDREF. That way, I ensure there are no duplicate IDs, but I still preserve xrefs.
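The suffix approach might look something like the following. This is a minimal sketch with Python's xml.etree.ElementTree, assuming that each direct child of the root is one component, that the ID and IDREF attributes are literally named id and idref (hypothetical; substitute the names from your own schema or DTD), and that every xref points to a target within its own component. IDREFS attributes holding several space-separated values are not handled.

```python
import xml.etree.ElementTree as ET

ID_ATTR = "id"        # hypothetical attribute names; use the ID and IDREF
IDREF_ATTR = "idref"  # attributes declared in your own schema or DTD

def suffix_ids(root: ET.Element) -> None:
    """Append '-<n>' to every ID and IDREF inside the n-th component.

    Assumes each direct child of the root is one component and that
    xrefs only point to targets in the same component.
    """
    for n, component in enumerate(root, start=1):
        for elem in component.iter():
            for attr in (ID_ATTR, IDREF_ATTR):
                value = elem.get(attr)
                if value is not None:
                    elem.set(attr, f"{value}-{n}")

# Two copies of the same chapter: duplicate IDs before suffixing.
composite = ET.fromstring(
    "<book>"
    '<chapter id="intro"><p>See <xref idref="intro"/>.</p></chapter>'
    '<chapter id="intro"><p>See <xref idref="intro"/>.</p></chapter>'
    "</book>"
)
suffix_ids(composite)
print(ET.tostring(composite, encoding="unicode"))
```

Because the suffix is the component counter, the two copies end up with the IDs intro-1 and intro-2, and each xref follows the copy it lives in.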
--Lynne
Lynne, all good advice. The process is complicated, with a lot of backend automation moving stuff around. I think it could be better engineered to avoid the problem I saw in the first place, but it was all working until FM2015, so I didn't think about it. Maybe I'll think about it some more, because I think the process would be more robust if all XML was valid against the same schema. Clearly things become more fragile when they are not clearly defined.
Russ