I am having an issue with accent grave.
If I replace à with its character entity (&agrave;), it inserts in the sql table properly, but the xml will show &amp;agrave;. So if I apply a style sheet to the xml, the resulting html will actually read &agrave; instead of à.
Now if I don't replace à with anything, the html will be fine but the xml will show a �.
Is there a way to force the xml NOT to escape the & when it is part of another escape (rather than just & on its own)?
-Robert
Message was edited by: robs67
Testing some more revealed that the issue seems to occur when the offending character is grabbed from a sql table and put into an xml doc. If I do this:
<CFXML variable="MyXML" caseSensitive="yes">
<TheXml>
<testNode1>
<cfoutput>#GET_oD.remarks#</cfoutput>
</testNode1>
</TheXml>
</CFXML>
then the xml will have a white question mark on a black diamond.
If I do this:
<CFXML variable="MyXML" caseSensitive="yes">
<TheXml>
<testNode1>
<cfoutput>à</cfoutput>
</testNode1>
</TheXml>
</CFXML>
then the xml will be just fine.
I have "String Format: Enable High ASCII characters and Unicode for data sources configured for non-Latin characters" enabled in the CF Administrator, and the data type of the column in the MS SQL table is nvarchar.
Any help would be appreciated as this is driving me nuts.
In general, I would recommend persisting the data to the database in its original format whenever possible.
When you output the value, you can use the xmlFormat() function in CF, and this might do the trick for you.
#xmlformat(GET_oD.remarks)#
Byron Mann
mannb@hostmysite.com
byronosity@gmail.com
Software Architect
hosting.com | hostmysite.com
http://www.hostmysite.com/?utm_source=bb
You need to decide when you are going to do your encoding, and thereafter do it consistently.
Typically what I have done is:
Thank you for the responses. Unfortunately, when I use htmlEditFormat() or xmlFormat() and transform the xml into html via xsl, the ampersands are escaped again. So, for example, a " will be &amp;quot;
I don't know xsl at all and the xsl was written by someone who has since died. Perhaps that's my issue. Could it be the xsl is escaping the ampersands when they shouldn't be?
-Robert
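The double escaping described above can be reproduced outside ColdFusion. The sketch below uses Python's standard library rather than CFML, purely to illustrate the mechanics. The sample string is hypothetical, and it assumes the stored text already contains an HTML entity such as &agrave; that is then escaped a second time on its way into the XML.

```python
from html import unescape
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical value that was saved to the database with an HTML entity in it.
stored = "voil&agrave;"

# Escaping text that already contains an entity escapes its ampersand again.
double_escaped = escape(stored)
print(double_escaped)  # voil&amp;agrave;

# Decoding the entity back to the real character first avoids the problem:
# the character itself needs no escaping in XML when the document encoding
# can represent it.
clean = unescape(stored)
node = ET.Element("testNode1")
node.text = clean
xml_bytes = ET.tostring(node, encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```

Note that XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so an HTML name like &agrave; is not recognised in an XML document unless a DTD declares it. Storing the real character and letting the XML layer do the only round of escaping sidesteps both issues.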
What happens when you use UTF-8 encoding throughout, including at your database, and do not replace any characters with entities?
BKBK:
When I do that, the database and html are correctly displaying the character but the xml has the black/white diamond question mark.
I should have also pointed out that when I do use xmlFormat(), the character is then represented in the xml document in its hex format (and incorrectly in the html, where the & is escaped, as I said in my original post). I don't know if this matters though.
The XML declaration at the top of the document should also contain encoding="UTF-8".
The black and white diamond question mark is still in the xml document even with the header,
<?xml version="1.0" encoding="UTF-8"?>.
Some more info I noticed:
In Firefox, the xml document will display (with the black/white diamond),
but in IE, it won't display in the browser and generates the error,
"An invalid character was found in text content. Error processing resource" but yet when viewing the source in IE, the character is properly displayed.
I changed the encoding to ISO-8859-1 and it worked. The xml displays the character properly in both IE and Firefox.
Are there any ramifications to leaving the encoding as ISO-8859-1?
I would maintain the header, <?xml version="1.0" encoding="UTF-8"?>. I suspect this now boils down to a display issue. The final link in the chain might be the encoding in the view-menu of your browser. Change it to Unicode.
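One possible explanation for why changing the declaration to ISO-8859-1 helped, assuming the document bytes were actually being written in a single-byte Latin-1 style encoding: the byte for à would then be invalid under a UTF-8 declaration. The following Python sketch (not ColdFusion; the document content is made up for illustration) shows a strict parser rejecting such bytes under one declaration and accepting them under the other.

```python
import xml.etree.ElementTree as ET

# U+00E0 (a with grave accent) is the single byte 0xE0 in ISO-8859-1,
# but the two bytes 0xC3 0xA0 in UTF-8.
latin1_body = "<testNode1>\u00e0</testNode1>".encode("iso-8859-1")

def parses(declared_encoding: str, body: bytes) -> bool:
    """Return True if the bytes parse as XML under the declared encoding."""
    header = f'<?xml version="1.0" encoding="{declared_encoding}"?>'.encode("ascii")
    try:
        ET.fromstring(header + body)
        return True
    except ET.ParseError:
        return False

print(parses("UTF-8", latin1_body))       # False: 0xE0 is not valid UTF-8 here
print(parses("ISO-8859-1", latin1_body))  # True: declaration matches the bytes

# A lenient decoder substitutes U+FFFD, the "question mark in a diamond".
print(latin1_body.decode("utf-8", errors="replace"))
```

If this is what is happening, the more durable fix would be to make the bytes match a UTF-8 declaration (that is, write the document out as UTF-8) rather than changing the declaration, since ISO-8859-1 cannot represent characters outside the Latin-1 range.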
There's some good material out there on this:
http://www.cs.tut.fi/~jkorpela/chars.html -- A Tutorial on Character-Code Issues
http://www.cs.tut.fi/~jkorpela/html/chars.html -- Using National and Special Characters in HTML
Although these documents are not fully up-to-date with regard to current implementations, they do give a readable explanation of the issues involved. The first document (Tutorial...) is particularly informative because it presents a list of "several things that you might see" and "what might have actually happened."
In my very limited experience, I've observed that it really depends ... not only on the character encoding ("UTF-8" is normally adequate since it can represent both ASCII and Unicode) ... but also on the font that has been selected and in some cases the optional configuration parameters of the user's own browser. And sometimes you've got to get down-and-dirty and look at the actual byte sequence that's coming across. Apparently "there's more than one way to do it."
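As a rough illustration of looking at the actual byte sequence, here is a small Python sketch. The helper name `describe` is hypothetical; it prints the raw bytes as hex and reports whether they form valid UTF-8, which is often enough to tell which encoding a file or response was really written in.

```python
def describe(data: bytes) -> str:
    """Show the raw bytes as hex and say whether they form valid UTF-8."""
    try:
        data.decode("utf-8")
        verdict = "valid UTF-8"
    except UnicodeDecodeError:
        verdict = "not valid UTF-8"
    return f"{data.hex(' ')} ({verdict})"

text = "\u00e0"  # a with grave accent
print(describe(text.encode("utf-8")))       # c3 a0 (valid UTF-8)
print(describe(text.encode("iso-8859-1")))  # e0 (not valid UTF-8)
```

In practice the bytes would come from the saved XML file or the HTTP response body rather than from a string encoded in the script.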