Hello all,
I am running into a problem where I am parsing XML and running into an error:
An error occured while Parsing an XML document.An invalid XML character (Unicode: 0x17) was found in the element content of the document.
Therefore my approach is that I obviously have to remove some kind of odd character but I have no idea what it is. I would like to take the XML and perform something like the following:
<cfset XML = Replace(XML, "&","","All")>
Where "&" is the special character (0x17). Can anyone assist with what this character is or has had this problem? I wish I could post the XML but I cannot get at it because the page is breaking.
Any help would be greatly appreciated!!!!
Thanks
CFDUMP the XML before parsing it. You can display it, or email it to yourself, and look for the 0x17 (which is decimal "23"). Then you can decide how best to fix it.
V/r,
^ _ ^
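A minimal sketch of that check, assuming the request body is available as text (the variable names `raw`, `pos` and `startAt` are illustrative, not from the original post). It looks for the first U+0017 character and dumps a little of the surrounding text so the spot is easy to find:

```cfml
<!--- Assumption: the posted body is text; toString() leaves a string unchanged. --->
<cfset raw = toString(getHTTPRequestData().content)>
<!--- chr(23) is the character named in the error (0x17 hex = 23 decimal). --->
<cfset pos = find(chr(23), raw)>
<cfif pos GT 0>
    <!--- Show up to 40 characters on either side of the first occurrence. --->
    <cfset startAt = max(pos - 40, 1)>
    <cfdump var="#mid(raw, startAt, 80)#" label="Context around position #pos#">
<cfelse>
    <cfoutput>No 0x17 character found.</cfoutput>
</cfif>
```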
OR.. another thought. If the XML is in a string format before it becomes an XML object, and IF you are using CF v10 or later, you can run the string through canonicalize(), set both flags to false, and then parse the XML. That should get rid of any and all hex characters.
HTH,
^ _ ^
Thanks HTH for the assistance!!!
So you think something like this would be the best approach:
<cfset XML = #canonicalize(getHTTPRequestData().content)#>
Thanks!!!!
HTH = Hope This Helps. My pseudonym is WolfShade.
And, yes, if the XML is in string format, you can shove it through canonicalize() to remove all hex encoding (and other encoding) before parsing it or turning it into an XML object. Precisely as you have coded it (minus the hashtags #, as those are not necessary unless used within a string or as display.)
V/r,
^ _ ^
Ha, sorry about that. Thank you WolfShade for your help. Truly appreciate it.
Tried putting this in: <cfset XML = canonicalize(getHTTPRequestData().content, true, true)>
But it then gave me the following error:
An error occured while Parsing an XML document.The entity name must immediately follow the '&' in the entity reference.
Any ideas?
Thanks!!!!
According to this SO thread, all ampersands need to be replaced with either &amp; or &#38; (I did not know that.)
I don't work with XML, much.
So.. this confuses me. If your CF server is going to balk at 0x17 hex encoding, but it also balks at a plain ampersand, not sure what to do. You could try using REPLACE() after canonicalize(), but I'm not sure that would work. (shrug) Give it a shot.
<cfset XML = canonicalize(getHTTPRequestData().content,true,true) />
<cfset XML = replace(XML,'&','&amp;','all') />
HTH,
^ _ ^
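One caveat with replacing every ampersand is that it also rewrites ampersands that already begin a valid entity, so an existing `&amp;` would become `&amp;amp;`. A hedged alternative is a regular expression that only touches bare ampersands. This sketch assumes ColdFusion's reReplace() accepts negative lookahead, which is my understanding but worth verifying on your version:

```cfml
<!--- Assumption: the posted body is text. --->
<cfset XML = toString(getHTTPRequestData().content)>
<!--- Escape "&" unless it already starts a predefined entity (amp, lt, gt, apos, quot)
      or a numeric character reference. "##" is a literal "#" inside a CFML string. --->
<cfset XML = reReplace(XML, "&(?!(amp|lt|gt|apos|quot|##[0-9]+|##x[0-9A-Fa-f]+);)", "&amp;", "all")>
```

Patching the whole document this way is a workaround. If the sender can be persuaded to escape its own text content, that is the cleaner fix.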
In XML, the ampersand is a metacharacter. It's used to introduce an XML entity. XML entities are pretty similar to HTML character entities, except there are only five predefined ones (&amp;, &lt;, &gt;, &apos; and &quot;). Read all about them here:
List of XML and HTML character entity references - Wikipedia
My guess is that what you're getting is actually not well-formed XML, so CF isn't going to be able to parse it unless you manually strip or replace the problematic parts. The character mentioned by the original poster is U+0017, which is an "end of transmission block" character:
http://www.fileformat.info/info/unicode/char/17/index.htm
So, maybe this character is at the end of the file and can be removed prior to XML parsing? Maybe not. I think it would be useful to actually provide a sanitized version of the file in question here for people to look at.
Dave Watts, CTO, Fig Leaf Software
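If stripping turns out to be acceptable, a minimal sketch would remove every control character that XML 1.0 disallows, not just U+0017. It assumes the body is text and relies on CFML strings being Java strings, so `replaceAll()` here is the java.lang.String method; `raw` and `cleaned` are illustrative names:

```cfml
<!--- Assumption: the posted body is text, not a binary byte array. --->
<cfset raw = toString(getHTTPRequestData().content)>
<!--- Remove control characters below 0x20, keeping tab (0x09), line feed (0x0A)
      and carriage return (0x0D), which XML 1.0 does allow.
      The backslash sequences are interpreted by Java's regex engine, not by CFML. --->
<cfset cleaned = raw.replaceAll("[\x00-\x08\x0B\x0C\x0E-\x1F]", "")>
<cfset xmlDoc = xmlParse(cleaned)>
```

This silently drops data, so it is better suited as a stopgap than as a permanent fix.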
Please show the code you are using to get the XML data.
It will help remove the guesswork being used to try to help you.
Cheers
Eddie
Eddie,
Thank you for helping!!!
This is basically what we're doing:
<cfset XML = #getHTTPRequestData().content#>
<cfset xmlDoc = XmlParse(XML)>
It's breaking when XmlParse is called. I am trying to get the XML that is being posted, but it is proving to be a challenge.
Would it be beneficial to perform the following:
<cfset XML = Replace(XML, "","","All")>
<cfset XML = Replace(XML, "","","All")>
Thank you so much for your assistance!!!!
If those are the only things that need to be replaced, it would be beneficial to replace them, as they're not allowed XML metacharacters and XML doesn't have HTML character entities. But (a) they might not be the only things that need to be replaced, and (b) they're presumably there for some reason. So be careful!
Dave Watts, CTO, Fig Leaf Software
sdettling222 wrote
It's breaking when XmlParse is called.
Write getHTTPRequestData().content to a file before you try to parse it as XML. Then open that file in a text editor and review it for XML correctness.
You can also run the file through an XML validator. There are plenty online.
Cheers
Eddie
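A minimal sketch of that step, assuming the body is text. The file name and the `dumpPath` variable are hypothetical, and the temp directory is used only to keep the dump out of the web root:

```cfml
<!--- Hypothetical file name; getTempDirectory() returns the server's temp folder. --->
<cfset dumpPath = getTempDirectory() & "request-body.xml">
<!--- Save the raw body before any parsing is attempted. --->
<cfset fileWrite(dumpPath, toString(getHTTPRequestData().content), "utf-8")>
<cfoutput>Wrote request body to #dumpPath#</cfoutput>
```

Run this before the XmlParse() call, since the page stops as soon as parsing fails.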
Eddie/Dave/WolfShade,
Thank you all so much for your assistance. I was able to finally get the XML. Here it is:
<?xml version="1.0" ?>
<Event>
<ProductClass>O</ProductClass>
<Action>CON</Action>
<EventNumber>1</EventNumber>
<Significance>W</Significance>
<Phenomena>HZ</Phenomena>
<EventType>HZ</EventType>
<EventAction>W</EventAction>
<Sent>0001/01/01T0000Z</Sent>
<Expires>2018/01/18T1800Z</Expires>
<WFO>KJAN</WFO>
<LatLon></LatLon>
<CountyCodes>Morehouse, LA|West Carroll, LA|East Carroll, LA|Richland, LA|Madison, LA|Franklin, LA|Catahoula, LA|Tensas, LA|Concordia, LA</CountyCodes>
<FIPSCodes>LAZ007|LAZ008|LAZ009|LAZ015|LAZ016|LAZ023|LAZ024|LAZ025|LAZ026</FIPSCodes>
<Text>WWUS74 KJAN 181531
NPWJAN
URGENT - WEATHER MESSAGE
National Weather Service Jackson MS
931 AM CST Thu Jan 18 2018
ARZ074-075-LAZ007>009-015-016-023>026-MSZ018-019-025>066-072>074-
181800-
/O.CON.KJAN.HZ.W.0001.000000T0000Z-180118T1800Z/
Ashley-Chicot-Morehouse-West Carroll-East Carroll-Richland-
Madison LA-Franklin LA-Catahoula-Tensas-Concordia-Bolivar-
Sunflower-Leflore-Grenada-Carroll-Montgomery-Webster-Clay-Lowndes-
Choctaw-Oktibbeha-Washington-Humphreys-Holmes-Attala-Winston-
Noxubee-Issaquena-Sharkey-Yazoo-Madison MS-Leake-Neshoba-Kemper-
Warren-Hinds-Rankin-Scott-Newton-Lauderdale-Claiborne-Copiah-
Simpson-Smith-Jasper-Clarke-Jefferson-Adams-Franklin MS-Lincoln-
Lawrence-Jefferson Davis-Covington-Jones-Marion-Lamar-Forrest-
Including the cities of Crossett, North Crossett, Hamburg,
West Crossett, Dermott, Lake Village, Eudora, Bastrop, Oak Grove,
Epps, Lake Providence, Rayville, Delhi, Tallulah, Winnsboro,
Jonesville, Harrisonburg, Newellton, St. Joseph, Waterproof,
Vidalia, Ferriday, West Ferriday, Cleveland, Indianola,
Ruleville, Greenwood, Grenada, Vaiden, North Carrollton,
Carrollton, Winona, Eupora, Maben, Mathiston, West Point,
Columbus, Ackerman, Weir, Starkville, Greenville, Belzoni, Isola,
Durant, Tchula, Lexington, Pickens, Goodman, Kosciusko,
Louisville, Macon, Brooksville, Mayersville, Rolling Fork,
Anguilla, Yazoo City, Ridgeland, Madison, Canton, Carthage,
Philadelphia, Pearl River, De Kalb, Scooba, Vicksburg, Jackson,
Pearl, Brandon, Richland, Forest, Morton, Newton, Union, Decatur,
Conehatta, Meridian, Port Gibson, Crystal Springs, Hazlehurst,
Wesson, Magee, Mendenhall, Taylorsville, Raleigh, Bay Springs,
Heidelberg, Quitman, Stonewall, Shubuta, Fayette, Natchez, Bude,
Roxie, Meadville, Brookhaven, Monticello, New Hebron, Prentiss,
Bassfield, Collins, Mount Olive, Laurel, Columbia,
West Hattiesburg, Lumberton, Purvis, and Hattiesburg
931 AM CST Thu Jan 18 2018
...HARD FREEZE WARNING REMAINS IN EFFECT UNTIL NOON CST TODAY...
* TEMPERATURE...Temperatures will gradually come above freezing by
noon today. High temperatures will range between 40-45 degrees.
* IMPACTS...Prolonged exposure could lead to hypothermia and may
harm pets and livestock. Exposed plumbing is in danger of
being damaged.
PRECAUTIONARY/PREPAREDNESS ACTIONS...
A Hard Freeze Warning means a prolonged period of sub-freezing
temperatures is ongoing. These conditions will be dangerous to
people and pets without adequate shelter and could damage exposed
pipes.
&&
$$
SKH</Text>
<SourceFile>NPWJAN 0118181800</SourceFile>
</Event>
Do you think it's breaking because there's no "</xml>" tag?
Thanks!!!
No, the first line is the XML declaration, not an element, so it doesn't require a closing tag.
The text you posted appears to be valid XML, but copying and pasting the text probably stripped the problem character(s).
You need to write the bytes received from the request to a file and then interrogate that file for problem characters. If you find any then you will need to let the source of the XML file know that they are producing invalid XML.
Cheers
Eddie
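One way to interrogate the saved file, sketched under the assumption that the request body was already written to disk as UTF-8 text (the path and variable names are hypothetical). It walks the content one character at a time and reports every control character that XML 1.0 does not allow:

```cfml
<!--- Hypothetical location of the previously saved request body. --->
<cfset dumpPath = getTempDirectory() & "request-body.xml">
<cfset raw = fileRead(dumpPath, "utf-8")>
<cfloop from="1" to="#len(raw)#" index="i">
    <cfset code = asc(mid(raw, i, 1))>
    <!--- Below 0x20, XML 1.0 only allows tab (9), line feed (10) and carriage return (13). --->
    <cfif code LT 32 AND NOT listFind("9,10,13", code)>
        <cfoutput>Position #i#: 0x#formatBaseN(code, 16)#<br></cfoutput>
    </cfif>
</cfloop>
```

The reported positions can then be passed along to whoever produces the XML.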
Probably. It's also generally a good idea to use CDATA blocks within XML elements that contain large amounts of text, like so:
<Text><![CDATA[
... text goes here ...
]]></Text>
But I understand you may not have any control over what you get from someone else. Also, you may have lost the problem element during the copy and paste operation. I don't really see anything that's an obvious problem.
Dave Watts, CTO, Fig Leaf Software
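For reference, a small sketch of a CDATA section being parsed in CF. The sample string is made up for illustration, and `encodeForHTML()` assumes CF10 or later. Note that CDATA only stops `&` and `<` from being treated as markup; as far as I know, a control character such as U+0017 is still illegal inside it in XML 1.0:

```cfml
<!--- Made-up sample: the bare "&&" and "<" are legal only because they sit inside CDATA. --->
<cfset sample = '<Event><Text><![CDATA[WATCH && WARNING <test>]]></Text></Event>'>
<cfset doc = xmlParse(sample)>
<!--- xmlText is expected to return the element's text, including the CDATA content. --->
<cfoutput>#encodeForHTML(doc.Event.Text.xmlText)#</cfoutput>
```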
CDATA has been deprecated. It's not dead, yet, but MDN warns that it could stop working at any time.
V/r,
^ _ ^
I don't know if that applies to XML, or just to the parsing DOM. In any event, as long as the original poster is consuming this XML within CF and not within JavaScript, I suspect it'll work out fine.
https://en.wikipedia.org/wiki/Talk%3ACDATA#CDATA_Deprecated_in_DOM4?
Dave Watts, CTO, Fig Leaf Software