sdettling222
Known Participant
January 16, 2018
Question

XMLParse - Need to remove a Hex Character


Hello all,

I am running into a problem where I am parsing XML and running into an error:

An error occured while Parsing an XML document.An invalid XML character (Unicode: 0x17) was found in the element content of the document.

My plan is to strip out the offending character, but I have no idea what it is. I would like to take the XML and do something like the following:

<cfset XML = Replace(XML, "&","","All")>

Here "&" is just a placeholder for the special character (0x17). Can anyone tell me what this character is, or has anyone had this problem? I wish I could post the XML, but I cannot get at it because the page is breaking.

Any help would be greatly appreciated!!!!

Thanks


3 replies

EddieLotter
Inspiring
January 17, 2018

Please show the code you are using to get the XML data.

It will help remove the guesswork being used to try to help you.

Cheers

Eddie

sdettling222
Known Participant
January 17, 2018

Eddie,

Thank you for helping!!!

This is basically what we're doing:

<cfset XML = #getHTTPRequestData().content#>

<cfset xmlDoc = XmlParse(XML)>

It's breaking when XmlParse is called. I am trying to get hold of the XML that is being posted, but it is proving to be a challenge.

Would it be beneficial to perform the following:

<cfset XML = Replace(XML, "&##23;","","All")>

<cfset XML = Replace(XML, "&##x17;","","All")>

Thank you so much for your assistance!!!!

Community Expert
January 17, 2018

If those are the only things that need to be replaced, it would be beneficial to replace them, as they're not allowed XML metacharacters and XML doesn't have HTML character entities. But (a) they might not be the only things that need to be replaced, and (b) they're presumably there for some reason. So be careful!
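
For point (a), here is a minimal sketch rather than a definitive fix. It assumes the posted body arrives as a string; the variable names rawXml and cleanXml are made up for illustration. The error message reports a Unicode code point, which suggests the raw character itself is in the content, so the sketch removes Chr(23) directly as well as the two escaped forms, and then strips the remaining control characters that XML 1.0 does not allow (everything below 0x20 except tab, line feed and carriage return). The last step relies on ColdFusion strings exposing the underlying Java String methods, which is a common idiom but still an assumption about your engine.

```cfml
<!--- Sketch only: rawXml and cleanXml are illustrative names, not from the thread. --->
<cfset rawXml = getHTTPRequestData().content>

<!--- Remove the raw control character itself (0x17 is decimal 23). --->
<cfset cleanXml = Replace(rawXml, Chr(23), "", "All")>

<!--- Remove escaped references to it; the pound sign is doubled inside a CFML string. --->
<cfset cleanXml = ReplaceNoCase(cleanXml, "&##x17;", "", "All")>
<cfset cleanXml = Replace(cleanXml, "&##23;", "", "All")>

<!--- Strip any other control characters XML 1.0 forbids, keeping tab (09), LF (0A) and CR (0D). --->
<cfset cleanXml = javaCast("string", cleanXml).replaceAll("[\x00-\x08\x0B\x0C\x0E-\x1F]", "")>

<cfset xmlDoc = XmlParse(cleanXml)>
```

Point (b) still applies: if the sender put those characters there on purpose, dropping them silently changes the data, so it may be worth logging how many were removed.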

Dave Watts, CTO, Fig Leaf Software

Dave Watts, Eidolon LLC
WolfShade
Legend
January 16, 2018

Or, another thought: if the XML is a string before it becomes an XML object, and if you are using CF 10 or later, you can run the string through canonicalize(), set both flags to false, and then parse the XML. That should get rid of any and all hex characters.

HTH,

^ _ ^

sdettling222
Known Participant
January 16, 2018

Thanks HTH for the assistance!!!

So you think something like this would be the best approach:

<cfset XML = #canonicalize(getHTTPRequestData().content)#>

Thanks!!!!

WolfShade
Legend
January 16, 2018

HTH = Hope This Helps.  My pseudonym is WolfShade.

And, yes, if the XML is in string format, you can shove it through canonicalize() to remove all hex encoding (and other encoding) before parsing it or turning it into an XML object.  Precisely as you have coded it (minus the hashtags #, as those are not necessary unless used within a string or as display.)
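
A minimal sketch of that call with both flags written out, since as far as I know the documented signature is canonicalize(inputString, restrictMultiple, restrictMixed) and the two flags are not optional. The names rawXml and decodedXml are illustrative only.

```cfml
<!--- Sketch only: rawXml and decodedXml are illustrative names. --->
<cfset rawXml = getHTTPRequestData().content>

<!--- Both restrict flags set to false, as suggested earlier in the thread. --->
<cfset decodedXml = canonicalize(rawXml, false, false)>

<cfset xmlDoc = XmlParse(decodedXml)>
```

Worth testing on a sample first: my understanding is that canonicalize() decodes encoded input rather than deleting characters, so a raw 0x17 could still be present afterwards, and entities such as &amp;lt; inside element content would be decoded too.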

V/r,

^ _ ^

WolfShade
Legend
January 16, 2018

CFDUMP the XML before parsing it.  You can display it, or email it to yourself, and look for the 0x17 (which is decimal "23").  Then you can decide how best to fix it.
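
A minimal sketch of that, assuming the posted body is a string. The names rawXml, pos, startPos and snippet are made up for illustration. It looks for Chr(23) and dumps a little of the surrounding text with the character swapped for a visible marker, so you can see where it sits without dumping the whole document.

```cfml
<!--- Sketch only: all variable names here are illustrative. --->
<cfset rawXml = getHTTPRequestData().content>

<!--- Find() returns 0 when the character is not present. --->
<cfset pos = Find(Chr(23), rawXml)>

<cfif pos GT 0>
    <!--- Take up to 40 characters before the hit and roughly 40 after it. --->
    <cfset startPos = Max(1, pos - 40)>
    <cfset snippet = Mid(rawXml, startPos, 80)>
    <!--- Make the invisible character visible in the dump. --->
    <cfset snippet = Replace(snippet, Chr(23), "[0x17]", "All")>
    <cfdump var="#snippet#" label="Context around 0x17 at position #pos#">
<cfelse>
    <cfdump var="No raw 0x17 found; check for an escaped reference instead.">
</cfif>
```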

V/r,

^ _ ^