Participating Frequently
April 14, 2008
Question

CF8 - XmlFormat not escaping High ASCII characters

In CF8, we have a problem where XmlFormat is not escaping high ASCII characters. This was working fine on our CF7 instance, but CF8 does not escape all of them. I am aware of the long-standing problem with escaping Windows-1252 characters, but now we are seeing the issue with basic high ASCII characters such as chr(233) (é) and chr(244) (ô). Is anyone else experiencing this? We have not installed Update 1 for CF8 yet. I don't see a fix for this in the release notes; does anyone know whether the updater fixes it?

Here is a test to demonstrate the issue:

<cfset myString = "The Islamic Republic of Mauritania's (République Islamique de Mauritanie) 2007 estimated population is 3,270,000. Cote d'Ivoire and Côte d'Ivoire">

<cfset myNewString = XmlFormat(myString)>

<cfoutput>#myNewString#</cfoutput>


    7 replies

    Participating Frequently
    April 21, 2008
    Using ToString() does set the encoding of the XML file to UTF-8, but there could be some difference in how Java 6 processes the XML... not sure. We installed CF8 Update 1 and the JVM update, but the same issue persists. We can work around it, but it is annoying at best.
    BKBK
    Community Expert
    April 20, 2008
    Milpool2000 wrote:
    Adding the processingdirective does help show that these characters are being escaped, however, the behavior has changed somewhat between CF7 and CF8, as we were not using a processingdirective in CF7, and this was working as advertised.

    I see your point. Both MX7 and CF8 are supposed to use UTF-8 encoding by default to return text.

    We are using ToString() after creating the XML document with CFXML, and our process is the same as we were using in CF7.

    Can't say. ToString() has something to do with encoding, and encoding has something to do with the Java version: MX7 uses Java 1.4, whereas ColdFusion 8 uses Java 6. Could there be something there?

    Participating Frequently
    April 20, 2008
    BKBK,

    Thanks for the info. Adding the processingdirective does help show that these characters are being escaped, however, the behavior has changed somewhat between CF7 and CF8, as we were not using a processingdirective in CF7, and this was working as advertised.

    Where this is giving us a problem is after we create an XML document using CFXML, (ensuring that we XmlFormat any strings), we then validate that document against a schema, and we are all of a sudden getting errors during validation for invalid characters within the XML. We are using ToString() after creating the XML document with CFXML, and our process is the same as we were using in CF7. That is why I was curious if anyone else was having this same issue... because something definitely changed between CF7 and CF8 with XML processing.
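For anyone who needs the workaround mentioned in this thread, here is a minimal sketch of one possible approach, not a confirmed fix: run XmlFormat first, then replace any character above code point 127 with a decimal numeric character reference. The function name escapeNonAscii is hypothetical (it is not a ColdFusion built-in), and the sketch assumes the text contains no characters outside the Basic Multilingual Plane.

```cfm
<!--- Hypothetical helper: escapeNonAscii is not a built-in function. --->
<cffunction name="escapeNonAscii" returntype="string" output="false">
    <cfargument name="text" type="string" required="true">
    <cfset var result = "">
    <cfset var i = 0>
    <cfset var ch = "">
    <cfset var code = 0>
    <cfloop index="i" from="1" to="#Len(arguments.text)#">
        <cfset ch = Mid(arguments.text, i, 1)>
        <cfset code = Asc(ch)>
        <cfif code GT 127>
            <!--- "&##" is a literal "&#" once the doubled pound sign is unescaped --->
            <cfset result = result & "&##" & code & ";">
        <cfelse>
            <cfset result = result & ch>
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>

<!--- Escape the XML special characters first, then anything non-ASCII that is left --->
<cfset safeString = escapeNonAscii(XmlFormat("C" & Chr(244) & "te d'Ivoire"))>
```

Calling XmlFormat first matters: if the helper ran first, XmlFormat would re-escape the ampersands in the references it produced. If XmlFormat has already escaped a character, the helper leaves that text alone because the escape sequence is plain ASCII.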
    Inspiring
    April 20, 2008
    BKBK
    Community Expert
    April 20, 2008
    Mike touched on the point. It has to do with a requirement for high ASCII characters in the range 128-255. In fact, XmlFormat does escape those characters in MX7 as well as in CF8, but there is a catch.

    The documentation gives a clue when it says the character is "replaced by unicode escape sequence". You should therefore ensure that the page encoding is Unicode. One way to do so is, for example:

    <cfprocessingdirective pageencoding="utf-8">

    <cfset myString = "The Islamic Republic of Mauritania's (République Islamique de Mauritanie) 2007 estimated population is 3,270,000. Also check Côte d'Ivoire">

    <cfset myNewString = XmlFormat(myString)>

    <cfoutput>#myNewString#</cfoutput>
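Two things can muddy a test like this: the encoding the .cfm file was saved in, and the browser turning any numeric character reference back into the original character when it renders the page. The following is a rough diagnostic sketch that tries to rule both out, assuming nothing beyond standard CFML functions. It builds the string with Chr() so the file encoding is irrelevant, and wraps the result in HTMLEditFormat() so the escaped markup is shown literally. The variable names follow the test above; the shortened string is only an illustration.

```cfm
<!--- Build the accented characters from code points instead of typing them into the file --->
<cfset myString = "R" & Chr(233) & "publique, C" & Chr(244) & "te d'Ivoire">
<cfset myNewString = XmlFormat(myString)>

<cfoutput>
<!--- HTMLEditFormat escapes the ampersands, so the browser displays what XmlFormat actually returned --->
<pre>#HTMLEditFormat(myNewString)#</pre>

<!--- Code points of the input, for comparison: 233 and 244 should appear in the list --->
<cfloop index="i" from="1" to="#Len(myString)#">#Asc(Mid(myString, i, 1))# </cfloop>
</cfoutput>
```

If the accented characters still show up unescaped in the pre block, XmlFormat itself is leaving them alone. If escape sequences appear there, the earlier result was more likely a display or page-encoding effect.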

    Participating Frequently
    April 19, 2008
    Michael,

    A bad choice of words on my part... I did not mean physically "remove" those characters, but in fact, escape them...

    Since moving to CF8, we are finding that XmlFormat is not "escaping" all characters in the High ASCII range of 128-255.

    Here is an example... if you run this in CF8, and if this is actually a bug, you should see that the é and the ô are not being escaped. I am just trying to find out whether others are experiencing the same problem, or whether this is in fact a new bug.

    <cfset myString = "The Islamic Republic of Mauritania's (République Islamique de Mauritanie) 2007 estimated population is 3,270,000. Also check Côte d'Ivoire">

    <cfset myNewString = XmlFormat(myString)>

    <cfoutput>#myNewString#</cfoutput>
    Inspiring
    April 18, 2008
    Honestly not trying to be funny, but this is what XMLFormat is supposed to do. These get changed into their "escaped" characters:

    <
    >
    '
    "
    &

    And most painfully (the docs say - this problem kicked my butt before): "High ASCII characters in the range of 128-255".

    Hope this helps. Why are you using XMLFormat, might I ask? Maybe if you can post what you are trying to do, someone can give you a hand.

    - Mike
    Participating Frequently
    April 18, 2008
    Anyone else having this problem?