Remove NULL character in XML by RegEx??
Jul 11, 2008
I have XML being returned that appears to have NULL characters in it. When I try to use XMLParse() I get the following error:
An invalid XML character (Unicode: 0x0) was found in the element content of the document.
If I save the XML to a .txt file and then read it back in, I can parse it. That's not really how I want to do it, though, as it'll be slow. I'm sure this can be done with a regex. Any ideas?
TOPICS
Advanced techniques
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting.
Learn more
mr. modus
AUTHOR
Jul 12, 2008
I messed around with a RegEx and came up with this:
REReplace(thisXML,'[\x0]','','ALL')
It seems to work, but I'm no Unicode or regex expert. If someone who knows RegEx and Unicode well could review my RegEx and tell me whether it's truly removing only NULLs, that would be great.
Jul 14, 2008
Well, it's a relatively simple regex, so there isn't much to verify. You've got the right expression for hex code 0. I'm not sure you need the brackets at this point (they indicate a character class), but it's easier to start with them so that you don't have to remember to add them once you find other characters to exclude.
As near as I can tell, it should be what you want. You may end up wanting a more complicated regex if you find other invalid characters you want to remove (like byte order marks), but that could be done in a separate statement.
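If other characters do turn up, a rough sketch of the extended cleanup might look like the following. It assumes the raw XML is in the thisXML variable from the earlier post; cleanXML and xmlDoc are names made up for illustration. It also assumes the regex engine accepts two-digit \x escapes and ranges inside a character class, which is worth confirming on your ColdFusion version. The class covers the control characters that XML 1.0 does not allow (tab, line feed, and carriage return are left alone), and the byte order mark is handled in a separate statement.

```cfml
<cfscript>
// thisXML holds the raw XML string (variable name taken from the post above).
// Remove the control characters XML 1.0 forbids, not just NULL:
// 0x00-0x08, 0x0B, 0x0C, and 0x0E-0x1F. Tab (0x09), LF (0x0A), and CR (0x0D) stay.
cleanXML = REReplace(thisXML, "[\x00-\x08\x0B\x0C\x0E-\x1F]", "", "ALL");

// Separate step: drop a leading byte order mark (U+FEFF, decimal 65279) if present.
if (Len(cleanXML) AND Left(cleanXML, 1) EQ Chr(65279)) {
    cleanXML = RemoveChars(cleanXML, 1, 1);
}

// Parse the cleaned string.
xmlDoc = XMLParse(cleanXML);
</cfscript>
```

Keeping the BOM check as its own statement means the character class only ever lists characters that are invalid anywhere in the document, while the BOM is removed only from the very start of the string.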


