August 10, 2009
Question

à and XML


I am having an issue with the accent grave.

If I replace à with &agrave;, it inserts into the SQL table properly, but the XML will show &amp;agrave;. So if I apply a style sheet to the XML, the resulting HTML will actually read &agrave; instead of à.

Now if I don't replace à with anything, the html will be fine but the xml will show a �.

Is there a way to force the xml NOT to escape the & when it is part of another escape (rather than just & on its own)?

-Robert

Message was edited by: robs67

Testing some more revealed that the issue seems to occur when the offending character is grabbed from a SQL table and put into an XML doc. If I do this:

    <CFXML variable="MyXML" caseSensitive="yes">
    <TheXml>
        <testNode1>
            <cfoutput>#GET_oD.remarks#</cfoutput>
        </testNode1>
    </TheXml>
    </CFXML>

then the XML will have a white question mark on a black diamond.

If I do this:

    <CFXML variable="MyXML" caseSensitive="yes">
    <TheXml>
        <testNode1>
            <cfoutput>à</cfoutput>
        </testNode1>
    </TheXml>
    </CFXML>

then the XML will be just fine.

I have "String Format: Enable High ASCII characters and Unicode for data sources configured for non-Latin characters" enabled in the CF Administrator, and the data type of the column in the MS SQL table is nvarchar.

Any help would be appreciated as this is driving me nuts.


    4 replies

    Inspiring
    August 19, 2009

    There's some good material out there on this:

    http://www.cs.tut.fi/~jkorpela/chars.html  -- A Tutorial on Character-Code Issues

    http://www.cs.tut.fi/~jkorpela/html/chars.html  -- Using National and Special Characters in HTML


    Although these documents are not fully up to date with regard to current implementations, they do give a readable explanation of the issues involved.  The first document (Tutorial...) is particularly informative because it presents a list of "several things that you might see" and "what might have actually happened."

    In my very limited experience, it really depends not only on the character encoding (UTF-8 is normally adequate, since it can encode every Unicode character and is a superset of ASCII) but also on the font that has been selected and, in some cases, on the optional configuration parameters of the user's own browser.  And sometimes you've got to get down and dirty and look at the actual byte sequence that's coming across.  Apparently "there's more than one way to do it."
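One way to look at the actual byte sequence from within ColdFusion is to decode the string into bytes and print them as hex. This is only a minimal sketch: the `sample` value stands in for the real column value, and the variable names are illustrative rather than taken from the thread.

```cfml
<cfprocessingdirective pageEncoding="utf-8">
<!--- Hypothetical sample value; in the thread this would be GET_oD.remarks --->
<cfset sample = "à">

<!--- CharsetDecode turns the string into bytes in the named encoding;
      BinaryEncode renders those bytes as hex so they can be read directly --->
<cfset utf8Hex   = BinaryEncode(CharsetDecode(sample, "utf-8"), "hex")>
<cfset latin1Hex = BinaryEncode(CharsetDecode(sample, "iso-8859-1"), "hex")>

<cfoutput>
UTF-8 bytes:      #utf8Hex#<br>
ISO-8859-1 bytes: #latin1Hex#
</cfoutput>
```

In UTF-8, à is the two bytes C3 A0; in ISO-8859-1 it is the single byte E0. A lone E0 byte read as UTF-8 is invalid, and viewers typically render it as the replacement character, which is the white question mark on a black diamond described in the question.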

    BKBK
    Community Expert
    August 16, 2009

    What happens when you use UTF-8 encoding throughout, including at your database, and do not replace any character entities?

    August 17, 2009

    BKBK:

    When I do that, the database and HTML display the character correctly, but the XML has the black/white diamond question mark.

    August 17, 2009

    I should have also pointed out that when I do use xmlFormat(), the character is then represented in the XML document in its hex format (and incorrectly in the HTML, where the & is escaped, as I said in my original post). I don't know if this matters though.

    Inspiring
    August 13, 2009

    You need to decide when you are going to do your encoding, and thereafter do it consistently.

    Typically what I have done is:

    1. Filter out any unwanted material from the user's input that might, for example, be part of an attack-vector when the results are redisplayed.
    2. Leave special characters as they are.
    3. Use <cfqueryparam> to allow character strings to be safely inserted into the database no matter what they contain.
    4. Upon display, use HTMLEditFormat() or its equivalent to translate special-characters into their corresponding HTML escapes.
      • I have not been pleased with the various global tags that are available to perform this sort of escaping over large blocks of code.  Maybe I haven't used them enough...
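The four steps above could be sketched roughly as follows. The datasource `myDSN`, the table `orders`, and the form field `remarks` are hypothetical names chosen for illustration, and the "filtering" shown is only a placeholder, since real rules depend on the application.

```cfml
<!--- Assumptions: datasource "myDSN", table "orders", column "remarks" and
      form field "remarks" are hypothetical names --->
<cfparam name="form.remarks" default="">

<!--- Steps 1-2: placeholder filtering (trim only); special characters such
      as à are left exactly as typed --->
<cfset cleanRemarks = Trim(form.remarks)>

<!--- Step 3: bind the value instead of concatenating it into the SQL --->
<cfquery datasource="myDSN">
    INSERT INTO orders (remarks)
    VALUES (<cfqueryparam value="#cleanRemarks#" cfsqltype="cf_sql_varchar">)
</cfquery>

<!--- Step 4: escape only at output time, once, for the target format --->
<cfquery name="GET_oD" datasource="myDSN">
    SELECT remarks FROM orders
</cfquery>
<cfoutput query="GET_oD">
    <p>#HTMLEditFormat(GET_oD.remarks)#</p>
</cfoutput>
```

Note that HTMLEditFormat() escapes markup characters such as <, >, & and the double quote; it leaves à itself alone, so the page still has to be served in an encoding that can represent it, such as UTF-8.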
    August 13, 2009

    Thank you for the responses.  Unfortunately, when I use htmlEditFormat() or xmlFormat() and transform the XML into HTML via XSL, the ampersands are escaped again.  So, for example, a " will show up as the literal text &quot;

    I don't know xsl at all and the xsl was written by someone who has since died.  Perhaps that's my issue.  Could it be the xsl is escaping the ampersands when they shouldn't be?

    -Robert
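The double escaping described here can be reproduced without any XSL, which may help narrow down where it happens. The sketch below is an illustration under assumptions: the `stored` value is hypothetical and simply contains an entity that was written into the data before output.

```cfml
<!--- Hypothetical value that already contains an HTML entity --->
<cfset stored = "voil&agrave;">

<!--- First pass: the ampersand of the existing entity is itself escaped --->
<cfset once = XmlFormat(stored)>
<!--- Second pass: the ampersand introduced by the first pass is escaped again --->
<cfset twice = XmlFormat(once)>

<!--- HTMLEditFormat is used here only so the browser shows each string literally --->
<cfoutput>
<pre>
stored: #HTMLEditFormat(stored)#
once:   #HTMLEditFormat(once)#
twice:  #HTMLEditFormat(twice)#
</pre>
</cfoutput>
```

An XSLT processor escapes a bare & in text nodes when it serializes its output, which is standard behavior rather than a bug in the style sheet. So if the text node already holds the characters of an entity, the HTML ends up with an escaped ampersand and the browser displays the entity name instead of the character. Escaping exactly once, at the last step, avoids this.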

    August 12, 2009

    In general, I would recommend persisting the data to the database in its original format whenever possible.

    When you output the value, you can use the xmlFormat() function in CF, and this might do the trick for you.

    #xmlformat(GET_oD.remarks)#
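Put into the context of the original CFXML block, that call might look like the sketch below. The query is replaced by a hand-built, hypothetical one-row stand-in so the example is self-contained, and serving the result through cfcontent with a UTF-8 charset is an assumption about how the XML is delivered, not something stated in the thread.

```cfml
<cfprocessingdirective pageEncoding="utf-8">
<!--- Stand-in for the thread's query: one hypothetical row instead of a real lookup --->
<cfset GET_oD = QueryNew("remarks", "varchar")>
<cfset QueryAddRow(GET_oD)>
<cfset QuerySetCell(GET_oD, "remarks", "déjà vu & more")>

<!--- XmlFormat escapes the characters that would break the XML (such as & and <),
      so the value is safe to place inside an element --->
<cfxml variable="MyXML" caseSensitive="yes">
<TheXml>
    <testNode1><cfoutput>#XmlFormat(GET_oD.remarks)#</cfoutput></testNode1>
</TheXml>
</cfxml>

<!--- Serve the serialized document with an explicit UTF-8 charset --->
<cfcontent type="text/xml; charset=utf-8" reset="yes"><cfoutput>#ToString(MyXML)#</cfoutput>
```

The value is escaped once, at the point where it enters the XML, and the stored data stays in its original form.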

    Byron Mann

    mannb@hostmysite.com

    byronosity@gmail.com

    Software Architect

    hosting.com | hostmysite.com

    http://www.hostmysite.com/?utm_source=bb