csgaraglino
Known Participant
September 27, 2011
Question

xmlParse() is generating weird characters?

At: http://www.icontrolwebstudio.com/RSSFeedImport.cfm?rssfeedurl=http://steveholtonline.org/feed/&webid=msc245w3&showraw=1

I am parsing a WordPress Blog and I am getting weird characters in the descriptions?

Â

“

�

etc?

It looks like it's replacing single quotes, double quotes, and the like. Any advice on how to fix this?

I am using ColdFusion 8.

    BKBK
    Community Expert
    September 28, 2011

    The problem is encoding. The content seems to be read with an encoding like Windows-1252 somewhere along the way. The weird characters should revert to readable characters once the content is treated as UTF-8.

    Three suggestions on correcting this:

    1) Make sure the XML declaration has UTF-8 encoding, thus

    <?xml version="1.0" encoding="UTF-8" ?>

    or 2) Place the following tag at the top of the page

    <cfprocessingdirective pageEncoding="UTF-8">

    or 3) To correct the encoding just for rendering the output in the browser, place this tag at the top of the page

    <cfcontent type="text/xml; charset=UTF-8"> (alternatively,

    <cfcontent type="application/xhtml+xml; charset=UTF-8"> )

    Edited: 3rd suggestion added
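To make the mechanism concrete, here is a minimal sketch in Python rather than CFML (chosen only because it is easy to run anywhere). The sample string is made up, not taken from the feed. It assumes the feed is served as UTF-8 and that something downstream reads those bytes as Windows-1252, which would produce the kind of characters shown in the question.

```python
# Made-up sample: a non-breaking space followed by text in curly quotes,
# the sort of thing WordPress typically emits.
original = "\u00a0\u201cHello\u201d"

# The bytes as they would travel over the wire in a UTF-8 feed.
utf8_bytes = original.encode("utf-8")

# Reading those bytes with the wrong charset splits each multi-byte
# character into several single-byte ones. Byte 0x9D has no meaning in
# Windows-1252, so it becomes the replacement character U+FFFD.
garbled = utf8_bytes.decode("cp1252", errors="replace")

# Reading the same bytes as UTF-8 gives the text back unchanged.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(restored == original)
```

If this is what is happening, the fix belongs wherever the feed is first turned from bytes into a string. In ColdFusion that might be the charset attribute of cfhttp, assuming the feed is fetched that way (the fetching code was not posted).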

    Adam Cameron.
    Inspiring
    September 28, 2011
    or 2) Place the following tag at the top of the page

    <cfprocessingdirective pageEncoding="UTF-8">

    This is a compiler directive, and only pertinent if there's UTF-8-encoded text in the CFM file.  It's not relevant in this situation.

    --

    Adam

    BKBK
    Community Expert
    September 28, 2011

    Adam Cameron. wrote:

    or 2) Place the following tag at the top of the page

    <cfprocessingdirective pageEncoding="UTF-8">

    This is a compiler directive, and only pertinent if there's UTF-8-encoded text in the CFM file.  It's not relevant in this situation.

    What you say is correct. However, I was after something else, which is relevant to this situation.

    ColdFusion's encoding is UTF-8 by default. However, since the output contains unreadable characters, the effective encoding is apparently not UTF-8 (as I said, it is likely Windows-1252). This could mean that ColdFusion guessed the encoding from the byte order mark. If so, using the cfprocessingdirective as suggested would force an error, giving us more information.

    Adam Cameron.
    Inspiring
    September 28, 2011

    At: http://www.icontrolwebstudio.com/RSSFeedImport.cfm?rssfeedurl=http://steveholtonline.org/feed/&webid=msc245w3&showraw=1

    I am parsing a WordPress Blog and I am getting weird characters in the descriptions?

    Â

    “

    �

    etc?

    What does the raw XML look like?

    Is it a case of CF8 munging it, or is the feed just munged?
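One way to answer that question is to look at the feed's raw bytes before anything parses them. The sketch below is Python, not CFML, and `inspect_feed` is a hypothetical helper, not part of any library. It reports the encoding named in the XML declaration, whether the bytes are valid UTF-8, and whether they contain byte patterns that suggest the text was already mangled before it was served.

```python
import re

# Byte patterns that show up when UTF-8 text was misread as Windows-1252
# and then saved as UTF-8 again, i.e. the feed itself is already damaged.
DOUBLE_ENCODED_MARKERS = (
    b"\xc3\xa2\xe2\x82\xac",  # "â€", the start of a mangled curly quote or dash
    b"\xc3\x82\xc2\xa0",      # "Â" plus a non-breaking space
)


def inspect_feed(raw: bytes) -> dict:
    """Summarise what the raw feed bytes say about their own encoding."""
    # Encoding named in the XML declaration, if there is one.
    match = re.search(
        rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', raw[:200]
    )
    declared = match.group(1).decode("ascii") if match else None

    # Do the bytes actually decode as UTF-8?
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    already_munged = any(marker in raw for marker in DOUBLE_ENCODED_MARKERS)
    return {
        "declared": declared,
        "valid_utf8": valid_utf8,
        "already_munged": already_munged,
    }


# Hypothetical sample standing in for the real feed bytes.
sample = '<?xml version="1.0" encoding="UTF-8"?><d>\u201cHi\u201d</d>'.encode("utf-8")
print(inspect_feed(sample))
```

If the bytes are valid UTF-8 and show no mangled patterns, the feed is probably fine and the damage happens on the consuming side. If the mangled patterns are already present, no setting on the consuming side will fully repair it.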

    --

    Adam

    Owainnorth
    Inspiring
    September 28, 2011

    Okay, so Adam's gone from "jingoistically" to "munged" in the space of a week?

    Adam Cameron.
    Inspiring
    September 28, 2011

    Okay, so Adam's gone from "jingoistically" to "munged" in the space of a week?

    I am very very hungover.

    I suppose I could say my head is munged.  More-so than usual, I mean.

    --

    Adam

    12Robots
    Participating Frequently
    September 27, 2011

    Those are probably so-called "smart quotes" pasted in from a Word document or some such. The only way I have ever been able to deal with them is by find-and-replace with dumb quotes (&quot; and &apos;).

    csgaraglino
    Known Participant
    September 28, 2011

    Tried that, but I can't get replace to recognize the characters.

    12Robots
    Participating Frequently
    September 28, 2011

    Did you try doing the find with these character codes?

    chr(145), chr(146), chr(147), chr(148), and chr(151), and then replace them with &apos;, &apos;, &quot;, &quot;, and &mdash; respectively?
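One possible reason a replace on those codes finds nothing: 145 to 151 are Windows-1252 byte values, not Unicode code points. Once the text has been decoded into a Unicode string (which is what ColdFusion strings are), the smart quotes sit at much higher code points. The Python sketch below, with a hypothetical helper `dumb_down`, derives those code points from the byte values and does the replacement on that basis. It only helps if the text was decoded correctly in the first place.

```python
# Windows-1252 byte values from the post above.
cp1252_values = [145, 146, 147, 148, 151]

# Decode each byte to the Unicode character it stands for in Windows-1252:
# left/right single quote, left/right double quote, em dash.
smart_chars = [bytes([value]).decode("cp1252") for value in cp1252_values]

# Replacement entities, in the same order as in the post.
entities = ["&apos;", "&apos;", "&quot;", "&quot;", "&mdash;"]

# Translation table mapping each smart character to its entity.
table = str.maketrans(dict(zip(smart_chars, entities)))


def dumb_down(text: str) -> str:
    """Replace smart quotes and em dashes with plain entities."""
    return text.translate(table)


# The Unicode code points to search for instead of 145 to 151.
print([ord(char) for char in smart_chars])
print(dumb_down("\u201cIt\u2019s\u201d \u2014 ok"))
```

In a Unicode string, chr(145) is an invisible control character rather than a quote, so searching for it would not match the visible smart quotes.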