Inspiring
December 7, 2010
Question

xml parse removing apostrophes


I'm using this to parse an html file:

<cfscript>
    myxmldoc1 = XmlParse("D:\inetpub\website\editorial\#whichsection#\#Uploaded_Folder_Name#\#htmlFileName#");
    // Search the parsed document for every <h1> directly under <body>.
    selectedElements = XmlSearch(myxmldoc1, "/html/body/h1");
    s = "";
    for (i = 1; i LTE ArrayLen(selectedElements); i = i + 1) {
        // XmlSearch returns an array, so index into it on each pass.
        s &= selectedElements[i].XmlText;
    }
</cfscript>

Works great, but the HTML file uses "&rsquo;" to display apostrophes, and for some reason those get stripped out when the file is parsed. Is that typical, and can it be fixed?

    1 reply

    Inspiring
    December 7, 2010

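    The HTML entity "&rsquo;" is not defined in XML, which predefines only &lt;, &gt;, &amp;, &apos; and &quot;.  You can work around this by replacing "&rsquo;" with "&#8217;" before calling XmlParse().  Since your code passes a file path to XmlParse(), read the file into a string first, do the replacement, and pass the string instead.  Any HTML entity that is not defined in XML can be replaced by the equivalent numeric character reference "&#NNNN;", where NNNN is the character's decimal Unicode code point.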

    For a list of HTML entities and their unicode code points see:

    http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
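
A minimal sketch of the replacement described above, assuming the page content is already in a string. The sample markup and the variable names (rawHtml, cleanHtml, doc, headings, title) are made up for illustration; a real page would be read from disk with FileRead() or cffile first.

```cfml
<cfscript>
    // Hypothetical sample markup standing in for the uploaded HTML file.
    rawHtml = "<html><body><h1>Editor&rsquo;s Picks</h1></body></html>";

    // Swap the HTML-only named entity for its numeric character reference.
    // The doubled ## is required: a single # would start a ColdFusion expression.
    cleanHtml = Replace(rawHtml, "&rsquo;", "&##8217;", "all");

    // XmlParse accepts an XML string as well as a file path.
    doc = XmlParse(cleanHtml);
    headings = XmlSearch(doc, "/html/body/h1");

    // The heading text now keeps its right single quotation mark (U+2019).
    title = headings[1].XmlText;
    WriteOutput(title);
</cfscript>
```

Other HTML-only entities (for example &ldquo;, &rdquo; or &nbsp;) would need the same treatment, one Replace() call per entity or a lookup structure looped over before parsing.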

    Squiggy2 (Author)
    Inspiring
    December 8, 2010

    Thanks! Since it was an html page I was digging apart, I found that I could just read the contents straight by reading it via cffile. Man... just when you think you're doing the right thing...
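
The post does not show the code that was actually used, so the following is only one possible sketch of the "read it straight" approach: load the file as plain text and pull the h1 text out with a regular expression instead of an XML parser. The temp file, its contents, and the variable names are hypothetical, and the pattern assumes the headings contain no nested tags.

```cfml
<cfscript>
    // Write a small sample page to a temp file so the sketch is self-contained
    // (hypothetical content; a real page would already exist on disk).
    samplePath = GetTempFile(GetTempDirectory(), "htm");
    FileWrite(samplePath, "<html><body><h1>Editor&rsquo;s Picks</h1></body></html>");

    // FileRead is the cfscript counterpart of <cffile action="read">.
    pageText = FileRead(samplePath);

    // Collect each <h1>...</h1> whose content has no nested tags.
    headings = REMatchNoCase("<h1[^>]*>[^<]*</h1>", pageText);

    s = "";
    for (i = 1; i LTE ArrayLen(headings); i = i + 1) {
        // Strip the surrounding tags, leaving the heading text and its entities intact.
        s &= REReplace(headings[i], "<[^>]+>", "", "all");
    }

    FileDelete(samplePath);
    WriteOutput(s);
</cfscript>
```

Because nothing is parsed as XML here, "&rsquo;" passes through untouched and the browser renders it as an apostrophe when the text is output.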