Participating Frequently
August 21, 2006
Answered

Decode special characters, such as &ouml;

Hi,
I have a custom tag that is passed an attribute value containing encoded characters. For example, the umlaut is escaped as &ouml;. I need to take this text and output it as a string without escaped characters.

Attached is an example that hopefully will clarify.

Is there a way to get the special characters to decode as text?

TIA,
Dave
    Correct answer Newsgroup_User
    davidsatz wrote:
    > I need an easy way to transform the html entities into the real chars instead

    here's the dirty work. you'll need to find the entities via regex, strip out the
    "&" and ";" bits, take what's left & look it up in the entityMap & finally
    replace the entity w/ chr() of the unicode code point that's returned.

    <cfscript>
    function createEntityMap() {
        /*
        author paul hastings
        date 22-aug-2006
        note maps HTML entities to unicode code points
        HTML entity data derived from Roedy Green's entities.java found at
        http://mindprod.com/products1.html#ENTITIES
        note ColdFusion struct keys are case-insensitive, so this map cannot hold
        both members of a pair such as &Ouml; (214) and &ouml; (246). each such
        pair is stored once, under the lower-case name with the lower-case
        code point.
        */
        var entities=structNew();
        entities["le"]=8804;
        entities["yacute"]=253;
        entities["cup"]=8746;
        entities["sim"]=8764;
        entities["real"]=8476;
        entities["sub"]=8834;
        entities["gt"]=62;
        entities["lfloor"]=8970;
        entities["ordf"]=170;
        entities["sup"]=8835;
        entities["otimes"]=8855;
        entities["ouml"]=246;
        entities["sube"]=8838;
        entities["sigma"]=963;
        entities["reg"]=174;
        entities["beta"]=946;
        entities["oplus"]=8853;
        entities["pi"]=960;
        entities["eth"]=240;
        entities["rfloor"]=8971;
        entities["shy"]=173;
        entities["oslash"]=248;
        entities["otilde"]=245;
        entities["ang"]=8736;
        entities["trade"]=8482;
        entities["fnof"]=402;
        entities["chi"]=967;
        entities["upsih"]=978;
        entities["frac12"]=189;
        entities["rlm"]=8207;
        entities["eacute"]=233;
        entities["permil"]=8240;
        entities["hearts"]=9829;
        entities["icirc"]=238;
        entities["cent"]=162;
        entities["aelig"]=230;
        entities["psi"]=968;
        entities["sum"]=8721;
        entities["divide"]=247;
        entities["iquest"]=191;
        entities["ecirc"]=234;
        entities["ensp"]=8194;
        entities["empty"]=8709;
        entities["forall"]=8704;
        entities["emsp"]=8195;
        entities["gamma"]=947;
        entities["lceil"]=8968;
        entities["dagger"]=8224;
        entities["not"]=172;
        entities["equiv"]=8801;
        entities["acirc"]=226;
        entities["agrave"]=224;
        entities["eta"]=951;
        entities["alefsym"]=8501;
        entities["ordm"]=186;
        entities["piv"]=982;
        entities["bdquo"]=8222;
        entities["delta"]=948;
        entities["or"]=8744;
        entities["acute"]=180;
        entities["deg"]=176;
        entities["cong"]=8773;
        entities["ntilde"]=241;
        entities["lsaquo"]=8249;
        entities["clubs"]=9827;
        entities["hellip"]=8230;
        entities["ograve"]=242;
        entities["iuml"]=239;
        entities["diams"]=9830;
        entities["cedil"]=184;
        entities["amp"]=38;
        entities["alpha"]=945;
        entities["egrave"]=232;
        entities["darr"]=8595;
        entities["and"]=8743;
        entities["nsub"]=8836;
        entities["ne"]=8800;
        entities["epsilon"]=949;
        entities["isin"]=8712;
        entities["ccedil"]=231;
        entities["lsquo"]=8216;
        entities["copy"]=169;
        entities["aacute"]=225;
        entities["theta"]=952;
        entities["mdash"]=8212;
        entities["euml"]=235;
        entities["kappa"]=954;
        entities["notin"]=8713;
        entities["iexcl"]=161;
        entities["ge"]=8805;
        entities["igrave"]=236;
        entities["harr"]=8596;
        entities["lowast"]=8727;
        entities["ocirc"]=244;
        entities["infin"]=8734;
        entities["brvbar"]=166;
        entities["int"]=8747;
        entities["macr"]=175;
        entities["frac34"]=190;
        entities["curren"]=164;
        entities["asymp"]=8776;
        entities["lambda"]=955;
        entities["frasl"]=8260;
        entities["circ"]=710;
        entities["crarr"]=8629;
        entities["oelig"]=339;
        entities["image"]=8465;
        entities["there4"]=8756;
        entities["lt"]=60;
        entities["minus"]=8722;
        entities["atilde"]=227;
        entities["ldquo"]=8220;
        entities["nabla"]=8711;
        entities["exist"]=8707;
        entities["auml"]=228;
        entities["mu"]=956;
        entities["frac14"]=188;
        entities["nbsp"]=160;
        entities["oacute"]=243;
        entities["bull"]=8226;
        entities["larr"]=8592;
        entities["laquo"]=171;
        entities["oline"]=8254;
        entities["ndash"]=8211;
        entities["euro"]=8364;
        entities["micro"]=181;
        entities["nu"]=957;
        entities["cap"]=8745;
        entities["aring"]=229;
        entities["omicron"]=959;
        entities["iacute"]=237;
        entities["perp"]=8869;
        entities["para"]=182;
        entities["rarr"]=8594;
        entities["raquo"]=187;
        entities["ucirc"]=251;
        entities["iota"]=953;
        entities["sbquo"]=8218;
        entities["loz"]=9674;
        entities["thetasym"]=977;
        entities["ni"]=8715;
        entities["part"]=8706;
        entities["rdquo"]=8221;
        entities["weierp"]=8472;
        entities["sup1"]=185;
        entities["sup2"]=178;
        entities["uacute"]=250;
        entities["sdot"]=8901;
        entities["scaron"]=353;
        entities["yen"]=165;
        entities["xi"]=958;
        entities["plusmn"]=177;
        entities["yuml"]=255;
        entities["thorn"]=254;
        entities["rang"]=9002;
        entities["ugrave"]=249;
        entities["radic"]=8730;
        entities["zwj"]=8205;
        entities["tilde"]=732;
        entities["uarr"]=8593;
        entities["times"]=215;
        entities["thinsp"]=8201;
        entities["sect"]=167;
        entities["rceil"]=8969;
        entities["szlig"]=223;
        entities["supe"]=8839;
        entities["uuml"]=252;
        entities["rsquo"]=8217;
        entities["zeta"]=950;
        entities["rho"]=961;
        entities["lrm"]=8206;
        entities["phi"]=966;
        entities["zwnj"]=8204;
        entities["lang"]=9001;
        entities["pound"]=163;
        entities["sigmaf"]=962;
        entities["uml"]=168;
        entities["prop"]=8733;
        entities["upsilon"]=965;
        entities["omega"]=969;
        entities["middot"]=183;
        entities["tau"]=964;
        entities["sup3"]=179;
        entities["rsaquo"]=8250;
        entities["prod"]=8719;
        entities["quot"]=34;
        entities["prime"]=8242;
        entities["spades"]=9824;
        return entities;
    }
    // quick check: look up one entity and write its code point and character
    entityMap=createEntityMap();
    cent=structFind(entityMap,"cent");
    writeoutput("#cent# #chr(cent)#");
    </cfscript>
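
    The regex / strip / look up / replace steps described above could be sketched roughly like this. It is only an illustration, not code from the thread: decodeEntities(), its argument names and the small sampleMap are made up for the example, and in real use you would pass in the struct returned by createEntityMap(). Numeric references such as an ampersand followed by a number are not handled, and because ColdFusion struct keys are case-insensitive, &Ouml; and &ouml; resolve to the same entry.

    ```cfml
    <cfscript>
    // Sketch only: decodeEntities() and its argument names are hypothetical.
    function decodeEntities(str, entityMap) {
        var pattern = "&([A-Za-z][A-Za-z0-9]*);";
        var result = "";
        var pos = 1;
        var entityStart = 0;
        var entityLen = 0;
        var name = "";
        var match = "";

        while (pos LE Len(str)) {
            // ask REFind for the positions of the match and of sub-expression 1
            match = REFind(pattern, str, pos, true);
            if (match.pos[1] EQ 0) break;

            entityStart = match.pos[1];
            entityLen = match.len[1];
            // sub-expression 1 is the name with the "&" and ";" stripped off
            name = Mid(str, match.pos[2], match.len[2]);

            // copy any plain text that sits before the entity
            if (entityStart GT pos) {
                result = result & Mid(str, pos, entityStart - pos);
            }

            if (StructKeyExists(entityMap, name)) {
                // known entity: emit the character for its code point
                result = result & Chr(entityMap[name]);
            } else {
                // unknown name: leave the text exactly as it was
                result = result & Mid(str, entityStart, entityLen);
            }

            pos = entityStart + entityLen;
        }

        // copy whatever follows the last entity
        if (pos LE Len(str)) {
            result = result & Mid(str, pos, Len(str) - pos + 1);
        }
        return result;
    }

    // tiny stand-in map so the example runs on its own
    sampleMap = StructNew();
    sampleMap["ouml"] = 246;
    sampleMap["uuml"] = 252;
    sampleMap["amp"] = 38;
    WriteOutput(decodeEntities("K&ouml;ln &amp; M&uuml;nchen", sampleMap));
    </cfscript>
    ```

    With the sample map this should write "Köln & München", provided the page encoding can represent those characters. Walking the string once, rather than calling a replace per entity, also means text produced by one replacement (the "&" from &amp;) is never decoded a second time.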

    6 replies

    Inspiring
    August 24, 2006
    <newbie /> wrote:
    > But is there any reason why no one suggested creating their own function
    > to undo what HtmlEditFormat() does?

    good idea.
    August 23, 2006

    The solution provided is a great solution.

    But is there any reason why no one suggested creating their own function
    to undo what HtmlEditFormat() does?

    Meaning, why not just create an HtmlUnEditFormat() function for un-doing?

    For example:

    <cffunction name="HtmlUnEditFormat" access="public" returntype="string" output="no">
    <cfargument name="str" type="string" required="Yes" />

    <!--- add more as needed --->
    <cfreturn ReplaceList(arguments.str, "&nbsp;,&lt;,&gt;", " ,<,>") />
    </cffunction>
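
    As a rough sketch of the "add more as needed" part, assuming HTMLEditFormat() escapes only the less-than sign, the greater-than sign, the ampersand and the double quote, a fuller reverse could look like the function below. HtmlUnEditFormatAll is a made-up name, and Chr() is used for the replacement characters simply to keep angle brackets and quotes out of the tag source.

    ```cfml
    <cffunction name="HtmlUnEditFormatAll" access="public" returntype="string" output="no">
        <cfargument name="str" type="string" required="yes" />
        <cfset var s = arguments.str />

        <!--- Chr(60) is the less-than sign, Chr(62) the greater-than sign, Chr(34) a double quote --->
        <cfset s = Replace(s, "&lt;", Chr(60), "all") />
        <cfset s = Replace(s, "&gt;", Chr(62), "all") />
        <cfset s = Replace(s, "&quot;", Chr(34), "all") />

        <!--- the ampersand goes last, so "&amp;lt;" ends up as "&lt;" and is not decoded twice --->
        <cfset s = Replace(s, "&amp;", Chr(38), "all") />

        <cfreturn s />
    </cffunction>

    <!--- example call: the result is the plain text  a < b & c --->
    <cfset plain = HtmlUnEditFormatAll("a &lt; b &amp; c") />
    ```

    The order matters: if the ampersand were decoded first, an escaped "&amp;lt;" would turn into "&lt;" and then into a bare less-than sign on the next replace.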


    Good luck!
    davidsatzAuthor
    Participating Frequently
    August 23, 2006
    I was able to combine your two suggestions and create a nice little function to do this.



    Thanks
    Newsgroup_UserCorrect answer
    Inspiring
    August 22, 2006
    davidsatzAuthor
    Participating Frequently
    August 22, 2006
    Paul - thank you so much for this function. I was hoping there would be an easy way to "fix" this in ColdFusion. Now I know there is a way, but it is not easy, and it is not necessarily something I want to implement in a custom tag that is called by every page on our site. This will be our fallback strategy if the vendor cannot fix the issues with the XSL for certain Germanic characters.
    Dave
    davidsatzAuthor
    Participating Frequently
    August 22, 2006
    I need an easy way to transform the html entities into the real chars instead
    Inspiring
    August 22, 2006
    davidsatz wrote:
    > Hi,
    > I have a custom tag that is being passed a value for an attribute that is
    > being encoded. For example the umlaut is being escaped as &ouml;. I
    > need to take this text and output as a string without escaped characters.

    why are you using html entities instead of the real chars?

    > Attached is an example that hopefully will clarify.

    it doesn't. what do you want? the real chars instead of the html entities or the
    html entity simply removed?

    > Is there a way to get the special characters to decode as text?

    don't use them in the first place.
    davidsatzAuthor
    Participating Frequently
    August 22, 2006
    hi - there is another xml/xsl application that is producing the CFM with html entities. I cannot change that app, so I am hoping to fix its effects on my metadata.

    dave
    August 21, 2006
    Try URLDecode:
    http://www.techfeed.net/cfQuickDocs/?getDoc=URLDecode

    *** Edit: ***
    There is no undo for HTMLEditFormat(), which is likely how the &ouml; got there.

    http://www.houseoffusion.com/groups/CF-Talk/thread.cfm/threadid:36235
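
    To illustrate why URLDecode() does not help here, a small sketch (the variable names are made up): it reverses percent-escapes from URLs, while an HTML entity contains none, so the entity should come back unchanged.

    ```cfml
    <cfscript>
    // percent-escapes are what URLDecode() understands: %C3%B6 is "ö" in UTF-8
    fromUrl = URLDecode("K%C3%B6ln", "utf-8");

    // an HTML entity has no percent-escapes in it, so it should pass through untouched
    fromEntity = URLDecode("K&ouml;ln", "utf-8");

    // HTMLEditFormat() is used only so the browser shows the entity text literally
    WriteOutput(fromUrl & " / " & HTMLEditFormat(fromEntity));
    </cfscript>
    ```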
    davidsatzAuthor
    Participating Frequently
    August 21, 2006
    I have tried that and every other built-in formatting CFML function I can find