Known Participant
April 1, 2010
Question

Replace not latin characters from file name.


Dear friends,

I tried a lot, but even after reading the documentation for all the available CF functions I could not find a way to accomplish this:

As you already know from my previous post, I'm trying to finish an application that manages multiple images. I had reached a really good point and was ready to implement the solutions you suggested in that post when the client did something marvellous. He uploaded some images named "Εικόνα 123.jpg" ("Image 123.jpg" in English), which broke the Flex application that retrieves the images, because the firewall does not allow high-bit characters through. I now need to add a function that evaluates every character in the file name individually and removes or replaces (I do not care which) every character that is not a Latin letter or a digit (spaces, Greek characters, special characters, etc.). As far as I know, no built-in function does that, and I guess the solution could be hiding in regular expressions, which are not my strong point. So I need your help here.

Thank you in advance,

John

    This topic has been closed for replies.

    3 replies

    Known Participant
    April 2, 2010

    Just figured it out!

    <cfset photoName = REReplaceNoCase(ourPhotos.name, "[^0-9A-Za-z.,]", "i", "ALL") />

    This does the job.

    Thanks,

    Yannis
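To see what the pattern does in practice, here is a minimal sketch that runs it over a few sample names. The two Greek names come from this thread; `photo_01.jpg` and the variable names are made up for illustration. It assumes ColdFusion 8 or later (array literals, `cfloop array`) and a template saved as UTF-8.

```cfml
<cfprocessingdirective pageencoding="utf-8">

<!--- Sample names: the first two are from this thread, the third is hypothetical. --->
<cfset sampleNames = ["Εικόνα 123.jpg", "Αντίγραφο της Εικόνας 123.jpg", "photo_01.jpg"] />

<cfloop array="#sampleNames#" index="originalName">
    <!--- Anything that is not 0-9, A-Z, a-z, a dot or a comma becomes "i". --->
    <cfset safeName = REReplace(originalName, "[^0-9A-Za-z.,]", "i", "ALL") />
    <cfoutput>#originalName# becomes #safeName#<br /></cfoutput>
</cfloop>
```

Two things to keep in mind with this approach: underscores and hyphens are not in the character class, so they are replaced as well, and two different uploads can end up with the same cleaned name (every Greek letter and every space collapses to the same "i"), so a uniqueness check before renaming is probably worth adding.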

    ilssac
    Inspiring
    April 4, 2010

    That is a good solution...

    But you mentioned earlier that you did not know how long the strings were, so you could not loop over all the characters.  You may want to get familiar with the len() function.  It comes in handy quite often.

    Known Participant
    April 5, 2010

    Hi Ian,

    Thank you for the advice. Yes, I still do not know the length of the string (let's use the example path c:\inetpub\ftproot\images\Εικόνα 123.jpg for an invalid image name), and I'm aware of the len() function, but the way I wrote the regular expression, the length is not an issue any more. I just validate every single character in the path, regardless of the total length and of where exactly the invalid characters occur. Of course, the part of the path that refers to the folders (c:\inetpub\ftproot\images\) is always perfectly valid, so the REReplaceNoCase() function leaves it unchanged. Only the part of the path that refers to the actual image name (Εικόνα 123.jpg) is changed. The function replaces the invalid characters one by one with an "i", so the name becomes "iiiiiii123.jpg" (a valid image name), and of course it does not affect the extension. It works really well so far. Am I missing something?

    Thanks,

    Yannis
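One point worth checking for the full-path case: the colon and the backslashes in c:\inetpub\ftproot\images\ are not in the character class `[0-9A-Za-z.,]`, so passing the whole path through the pattern would replace them too. A cautious sketch, assuming a Windows server as in the example path, is to split the path first with the built-in `GetDirectoryFromPath()` and `GetFileFromPath()` functions and clean only the file name. The variable names are illustrative.

```cfml
<cfprocessingdirective pageencoding="utf-8">

<!--- Example path from this thread. --->
<cfset fullPath = "c:\inetpub\ftproot\images\Εικόνα 123.jpg" />

<!--- Split the path so the pattern only ever sees the file name. --->
<cfset folderPart = GetDirectoryFromPath(fullPath) />
<cfset filePart = GetFileFromPath(fullPath) />

<!--- Clean the file name only, then put the two parts back together. --->
<cfset safePath = folderPart & REReplace(filePart, "[^0-9A-Za-z.,]", "i", "ALL") />

<cfoutput>#safePath#</cfoutput>
```

This keeps the folder part byte-for-byte identical, whatever characters it contains, which is safer than relying on the folders happening to match the pattern.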

    Known Participant
    April 1, 2010

    I just noticed: of course, in the title of this post I didn't mean to say "not latin characters" but "non latin characters".

    Sorry.

    ilssac
    Inspiring
    April 1, 2010

    Regex may offer a simpler method.

    But if you care to go old school, you could just asc() each character of the string, and dump or change anything that does not fit your rules.

    The asc() function will return the ASCII value of a character.
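As a rough sketch of the old-school approach, the loop below walks the string with `Len()` and `Mid()`, checks each character code with `Asc()`, and keeps only digits, Latin letters and the dot. The variable names and the choice of "_" as the replacement character are assumptions made for this example, not something from the thread.

```cfml
<cfprocessingdirective pageencoding="utf-8">

<cfset originalName = "Εικόνα 123.jpg" />
<cfset safeName = "" />

<!--- Len() gives the number of characters, so the length need not be known in advance. --->
<cfloop from="1" to="#Len(originalName)#" index="pos">
    <cfset ch = Mid(originalName, pos, 1) />
    <cfset code = Asc(ch) />
    <!--- Keep 0-9 (48-57), A-Z (65-90), a-z (97-122) and the dot (46). --->
    <cfif (code GTE 48 AND code LTE 57)
          OR (code GTE 65 AND code LTE 90)
          OR (code GTE 97 AND code LTE 122)
          OR code EQ 46>
        <cfset safeName = safeName & ch />
    <cfelse>
        <!--- Hypothetical replacement character; an empty string would drop it instead. --->
        <cfset safeName = safeName & "_" />
    </cfif>
</cfloop>

<cfoutput>#safeName#</cfoutput>
```

Because the dot is on the keep list, an extension such as .jpg passes through untouched.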

    Known Participant
    April 1, 2010

    Thanks Ian,

    I cannot figure out how to use the asc() function for this. I will have to run a test on every single character and replace the invalid ones, but I will never know the actual string length (how many characters each image name will have), and I will possibly end up destroying the extensions (.jpg) as well. To make it a little more complicated, let me tell you that I will have to run this twice: once for the full image path (e.g. d:\company\aptown\images\Εικόνα 123.jpg), in which I will have to change only the ...Εικόνα 123... part and nothing else, and once for a comma-separated list of the image names (e.g. Εικόνα 123.jpg, Εικόνα 124.jpg, Εικόνα 125.jpg, Εικόνα 126.jpg). And don't be misled by the pattern. The customer may upload an image named in a completely different way, though still using invalid characters, for example "Αντίγραφο της Εικόνας 123.jpg" ("Copy of the image 123.jpg" in English).

    It seems to be impossible. I hope it is not.

    Yannis
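For the comma-separated list case described above, one possible sketch is to loop over the list, trim each item, clean it, and rebuild the list. The variable names and the "_" replacement are illustrative assumptions, and the sketch assumes that no file name itself contains a comma, since that would split it into two list items.

```cfml
<cfprocessingdirective pageencoding="utf-8">

<!--- Example list from this thread. --->
<cfset nameList = "Εικόνα 123.jpg, Εικόνα 124.jpg, Εικόνα 125.jpg, Εικόνα 126.jpg" />
<cfset safeList = "" />

<cfloop list="#nameList#" index="item">
    <!--- Trim() drops the space that follows each comma before cleaning the name. --->
    <cfset cleaned = REReplace(Trim(item), "[^0-9A-Za-z.]", "_", "ALL") />
    <cfset safeList = ListAppend(safeList, cleaned) />
</cfloop>

<cfoutput>#safeList#</cfoutput>
```

Cleaning item by item means the commas that separate the names never pass through the pattern, so the list structure survives.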