Inspiring
November 20, 2021

UCASE without accent

Is there an option for the function to remove accents (for example, Ê becomes E)?

Thanks in advance

    Correct answer ZNB

    Good evening,

    Thank you all for your responses!

    I find it unfortunate that ColdFusion doesn't offer this as a UCASE option.

    Thanks again

    James Moberg
    Inspiring
    November 20, 2021

    I've used Apache's StringUtils library before, but I prefer a Java library called Junidecode, which converts Unicode to 7-bit ASCII.

    Some basic examples:

    • Москвa becomes Moskva
    • čeština becomes cestina
    • Հայաստան becomes Hayastan
    • Ελληνικά becomes Ellenika
    • 北亰 becomes Bei Jing
    • Häuser Bäume Höfe Gärten becomes Hauser Baume Hofe Garten
    • daß becomes dass

    I've blogged about it here and included sample CFML:

    https://dev.to/gamesover/convert-unicode-strings-to-ascii-with-coldfusion-junidecode-lhf

    BKBK
    Community Expert
    November 21, 2021

    Fair enough. Of course, StringUtils has the advantage that it is already present in ColdFusion.

    James Moberg
    Inspiring
    November 21, 2021

    Results will vary. If you can't install other libraries, you'll have to rely on what's built in, such as StringUtils. (I initially tested StringUtils as well as the built-in java.text.Normalizer and the third-party com.ibm.icu.text.Transliterator, a.k.a. ICU4J.)
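Since both java.text.Normalizer and StringUtils are plain Java classes, here is a minimal Java sketch of the Normalizer route using only the JDK: decompose to NFD, then delete the combining diacritical marks. The class and method names are hypothetical, and it is assumed here, not confirmed, that StringUtils.stripAccents works along similar lines; check the version bundled with your ColdFusion install.

```java
import java.text.Normalizer;

// Hypothetical helper: accent stripping with only the JDK.
final class DeaccentNfd {

    static String stripAccents(String input) {
        // NFD splits a precomposed letter into base letter + combining mark,
        // e.g. U+00CA (E with circumflex) becomes "E" followed by U+0302.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove everything in the Combining Diacritical Marks block (U+0300..U+036F).
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("supprime les accentu\u00E9s (exemple : \u00CA pour E)"));
    }
}
```

Letters with no canonical decomposition, such as ø or Ł, pass through unchanged, as do non-Latin scripts apart from their accents.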

    I researched alternative solutions when I needed to reject URLs that used non-ASCII characters to bypass filters. (For example, browsers blindly treat ".orⓖ" as ".org", and such addresses were slipping past many of our email filters and then being processed by ColdFusion.)
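A minimal Java sketch of one way to catch the ".orⓖ" case: flag any hostname that is not pure 7-bit ASCII, and use NFKC (compatibility) normalization to see what it collapses to. ⓖ is U+24D6 CIRCLED LATIN SMALL LETTER G, whose compatibility decomposition is a plain "g". The class and method names are hypothetical, and NFKC is only an approximation of the hostname mapping browsers apply, so this is a starting point rather than a complete look-alike filter.

```java
import java.text.Normalizer;

// Hypothetical helpers for spotting look-alike hostnames before filtering.
final class LookalikeHosts {

    // True only if every UTF-16 unit is 7-bit ASCII.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    // NFKC applies compatibility mappings, so U+24D6 folds to a plain "g".
    static String foldCompatibility(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String suspicious = "example.or\u24D6";
        System.out.println(isAscii(suspicious));
        System.out.println(foldCompatibility(suspicious));
    }
}
```

Rejecting non-ASCII hostnames outright is the stricter option; the NFKC form is mainly useful for logging what the address was imitating.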

    In my unit test, Junidecode converts the following Unicode characters to ASCII, while StringUtils returns them unchanged. (This could cause problems depending on how these characters are stored and/or rendered.)

    • ℡ (returns "TEL")
    • ™ (returns "(TM)")
    • © (returns "(C)")
    • ® (returns "(R)")
    • ½ (returns "1/2")
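Some of these symbols can be handled with the JDK alone. NFKC expands ℡ to "TEL" and ™ to "TM" (not "(TM)" as Junidecode does) and turns ½ into "1", U+2044 FRACTION SLASH, "2". © and ® have no compatibility decomposition, so they need a manual fallback. A minimal Java sketch combining the two; the class name and the fallback table are hypothetical and deliberately small.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper: NFKC first, then a small hand-written fallback table.
final class SymbolFallback {

    // Characters NFKC leaves alone (or leaves non-ASCII), mapped by hand.
    private static final Map<String, String> FALLBACK = Map.of(
            "\u00A9", "(C)",  // COPYRIGHT SIGN
            "\u00AE", "(R)",  // REGISTERED SIGN
            "\u2044", "/");   // FRACTION SLASH, produced by NFKC from U+00BD

    static String asciiSymbols(String input) {
        // NFKC: U+2121 -> "TEL", U+2122 -> "TM", U+00BD -> "1" U+2044 "2"
        String s = Normalizer.normalize(input, Normalizer.Form.NFKC);
        for (Map.Entry<String, String> e : FALLBACK.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(asciiSymbols("\u2121 \u2122 \u00A9 \u00AE \u00BD"));
    }
}
```

NFKC also folds many other characters (full-width letters, ligatures such as ﬁ), which may or may not be what you want.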

    In my unit test, StringUtils failed to normalize/deaccent/latinize the following characters (this wasn't an exhaustive test, just what I noticed at first):

    • ƥ  (Junidecode returns "p")
    • ƒ (Junidecode returns "f")
    • ’ (Junidecode returns "'".  I hate smart quotes.)
    • Ł (Junidecode returns "L")
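None of these four characters has a canonical decomposition in Unicode, so any approach that decomposes and then strips combining marks passes them through unchanged. A common workaround is a hand-maintained table applied after stripping. A minimal Java sketch; the class name and the table are hypothetical and cover only the characters listed above plus lowercase ł.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper: strip accents, then fold characters that have no
// canonical decomposition using a small hand-written table.
final class ExtraFolding {

    // Deliberately incomplete; extend it as new cases turn up.
    private static final Map<Character, String> MANUAL = Map.of(
            '\u01A5', "p",   // LATIN SMALL LETTER P WITH HOOK
            '\u0192', "f",   // LATIN SMALL LETTER F WITH HOOK
            '\u2019', "'",   // RIGHT SINGLE QUOTATION MARK
            '\u0141', "L",   // LATIN CAPITAL LETTER L WITH STROKE
            '\u0142', "l");  // LATIN SMALL LETTER L WITH STROKE

    static String fold(String input) {
        // Step 1: NFD + removing combining marks handles the decomposable accents.
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Step 2: look up whatever step 1 could not touch.
        StringBuilder out = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            out.append(MANUAL.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u0141ukasi\u0144ski"));
    }
}
```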

    Other comparisons (from my unit test):

    • 北亰 (Junidecode returns "Bei Jing"; StringUtils returns the same string.)
    • Łukasiński (Junidecode returns "Lukasinski"; StringUtils returns "Łukasinski".)
    • ⠏⠗⠑⠍⠊⠑⠗ (Junidecode returns "premier"; StringUtils returns the original braille characters.)
    • æøåá (Junidecode returns "aeoaa"; StringUtils returns "æøaa".)
    • ราชอาณาจักรไทย (Junidecode returns "raach`aanaacchakraithy"; StringUtils returns the original string.)
    • Ελληνικά (Junidecode returns "Ellenika"; StringUtils returns "Ελληνικα", deaccenting only the last character.)
    • Москвa (Junidecode returns "Moskva"; StringUtils returns "Москвa".)
    • Հայաստան (Junidecode returns "Hayastan"; StringUtils returns "Հայաստան".)
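These results suggest a cheap check: strip accents, then see whether anything outside 7-bit ASCII remains. If something does, the string needs real transliteration (Junidecode, ICU4J) rather than deaccenting. A minimal Java sketch; the helper names are hypothetical, and the NFD-based stripping stands in for StringUtils.stripAccents rather than calling it.

```java
import java.text.Normalizer;

// Hypothetical helpers: decide whether accent stripping was enough,
// or whether a string still needs a transliteration library.
final class LeftoverCheck {

    static String stripAccents(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Collects every code point above U+007F that survives accent stripping.
    static String nonAsciiLeftovers(String input) {
        StringBuilder out = new StringBuilder();
        stripAccents(input).codePoints()
                .filter(cp -> cp > 0x7F)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {
                "\u00E6\u00F8\u00E5\u00E1",               // ae, o with stroke, a ring, a acute
                "\u0141ukasi\u0144ski",                   // Lukasinski with L stroke and n acute
                "\u041C\u043E\u0441\u043A\u0432\u0430"    // Moskva in Cyrillic
        };
        for (String s : samples) {
            String left = nonAsciiLeftovers(s);
            System.out.println(left.isEmpty()
                    ? s + ": accent stripping is enough"
                    : s + ": still non-ASCII after stripping: " + left);
        }
    }
}
```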

    BKBK
    Community Expert
    November 20, 2021

    Use Apache's StringUtils class:

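    <!--- StringUtils from Apache Commons Lang, which already ships with ColdFusion --->
    <cfset stringUtilObject=createObject("java","org.apache.commons.lang3.StringUtils")>
    
    <!--- Test string: the original question, which contains accented characters --->
    <cfset testStringWithAccent="Existe-t-il une option pour que la fonction supprime les accentués  (exemple : Ê pour E) ?">
    
    <!--- stripAccents removes diacritics, e.g. "é" becomes "e" and "Ê" becomes "E" --->
    <cfset stringStrippedOfAccents=stringUtilObject.stripAccents(testStringWithAccent)>
    
    <cfoutput>#stringStrippedOfAccents#</cfoutput>
    
    <br>
    
    <!--- Compare uCase() alone with uCase() followed by stripAccents() --->
    <cfoutput>
    	uCase("ê"): #uCase("ê")# (with accent)<br>
    	stripAccents(uCase("ê")): #stringUtilObject.stripAccents(uCase("ê"))# (without accent)
    </cfoutput>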
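For comparison, a minimal Java sketch of the same "uppercase without accents" result using only the JDK. The class and method names are hypothetical, and java.text.Normalizer stands in for StringUtils here.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: uppercase a string and drop its accents.
final class UpperNoAccents {

    static String upperWithoutAccents(String input) {
        // Decompose (U+00EA becomes "e" + U+0302) and drop the combining marks.
        String base = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Locale.ROOT keeps the result independent of the server's default locale.
        return base.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upperWithoutAccents("\u00EA"));
        System.out.println(upperWithoutAccents("accentu\u00E9s"));
    }
}
```

Locale.ROOT is used so that a server with a Turkish default locale does not turn "i" into a dotted capital İ.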