
ColdFusion can't read French accents from a text file

New Here, Feb 22, 2011

Hi all,

I am trying to read text containing French accents from a text file, and I am getting weird characters in place of the accented letters. I tried applying an encoding, but nothing changed. I have posted some details of my problem below; it would be great if you could help me out. Thanks.

I have a text file content as follows:

fileName: msg123.txt

Transfert : Aéroport-Hôtel Paradis
Siège enfant/bébé
PRI en Voiture Privée le 06DEC10

When I use ColdFusion to read the file, the output looks like this:

Transfert : A�roport-H�tel Paradis

Si�ge enfant/b�b�

PRI en Voiture Priv�e le 06DEC10

My CFM page that reads the text file looks like this:

fileName: read.cfm

<!--- READS ALL TEXT FILES --->
<cfset pathDirectory = ExpandPath( "./file" ) />

<cfdirectory action="list" directory="#pathDirectory#" filter="*.txt"  name="filename"/>


<cfloop query="filename">

     <cfset FilePath = "#pathDirectory#/#name#">
    
     <cfscript>
          // Define the file to read, use forward slashes only
          FileName="#FilePath#";
          // Initialize Java File IO
          FileIOClass=createObject("java","java.io.FileReader");
          FileIO=FileIOClass.init(FileName);
          LineIOClass=createObject("java","java.io.BufferedReader" );
          LineIO=LineIOClass.init(FileIO);
     </cfscript>

....

Could you please advise me on how to do this correctly?

Regards

Message was edited by: diditin

TOPICS
Getting started
Enthusiast, Feb 23, 2011

On 2/23/2011 1:47 PM, diditin said:

I have a text file with content as follows:

what version of cf (mx & newer, cf defaults to utf-8 encoding)? what encoding is the text file?

if the file is not utf-8 encoded (guess latin-1??) then try

// Initialize Java File IO

btw if you're on cf8 & above you could simply use cfloop w/the file option. i think it gives about the same performance as the java IO bits.
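
For readers following along: java.io.FileReader, as constructed in read.cfm, takes no charset argument and decodes with the JVM default charset. A minimal Java sketch of the alternative, assuming the exported text really is ISO-8859-1 (the thread never confirms this), is to wrap the raw stream in an InputStreamReader with the charset stated explicitly. The class name, method name and temp file below are hypothetical; the same classes could presumably be created from CFML with createObject in place of FileReader.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ExplicitCharsetRead {

    // Read every line from the stream, decoding bytes with the charset that is
    // passed in rather than with the JVM default charset.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: one line from msg123.txt written as ISO-8859-1 bytes.
        Path file = Files.createTempFile("msg123", ".txt");
        Files.write(file,
                "Transfert : A\u00e9roport-H\u00f4tel Paradis\n".getBytes(StandardCharsets.ISO_8859_1));

        // Assumption: the file is ISO-8859-1, so that charset is named here.
        List<String> lines = readLines(new FileInputStream(file.toFile()), StandardCharsets.ISO_8859_1);
        System.out.println(lines.get(0).equals("Transfert : A\u00e9roport-H\u00f4tel Paradis"));

        Files.delete(file);
    }
}
```

If the guess about the file's encoding is wrong, the accented letters will still come out mangled, so it is worth checking the file in an editor that reports its encoding first.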

New Here, Feb 23, 2011

Thanks for your reply Paul.

In fact, I receive a PDF as a mail attachment, open the PDF, and save it as text. Then I use my read.cfm to read the text file I just saved, so I don't really know what the encoding of the text file is. Why do I do it like that? To keep the line-by-line format; if I read the PDF directly, it gives me a bundle of data without any newlines.

I am using CF8.

Try what exactly? The cfprocessingdirective tag?

Enthusiast, Feb 23, 2011

On 2/23/2011 7:25 PM, diditin said:

Try what exactly? tag cfprocessingdirective?

yeah, the forums swallowed the example tag, try "ISO-8859-1" as the pageEncoding value because i think the reader defaults to ANSI or latin-1 when it writes out text like that.

New Here, Feb 23, 2011

try "ISO-8859-1" as the pageEncoding

Using "ISO-8859-1", nothing appears, as if nothing is extracted from the text file, whereas if I use utf-8, the data is extracted but still with the symbols.

Here is my complete read.cfm:

<cfprocessingdirective pageencoding="utf-8"/>

<!--- READS ALL TEXT FILES --->
<cfset pathDirectory = ExpandPath( "./file" ) />

<cfdirectory action="list" directory="#pathDirectory#" filter="*.txt"  name="filename"/>

<!--- INITIALIZE STRUCTURES --->
<cfset global = ArrayNew(1)/>
<cfset month_ar_eng= ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']>

<cfloop query="filename">

     <cfset FilePath = "#pathDirectory#/#name#">
    
     <cfscript>
          // Define the file to read, use forward slashes only
          FileName="#FilePath#";
          // Initialize Java File IO
          FileIOClass=createObject("java","java.io.FileReader");
          FileIO=FileIOClass.init(FileName);
          LineIOClass=createObject("java","java.io.BufferedReader" );
          LineIO=LineIOClass.init(FileIO);

</cfscript>
    
     <CFSET EOF = 0>
     <cfset partContent = ""/>

     <!--- INITIALIZE STRUCTURES --->
     <cfset reservation = StructNew()/>
     <cfset clients = QueryNew("title, family, name, status ,age, vipMTCO, honeymoon, remarks, language","cf_sql_varchar,cf_sql_varchar,cf_sql_varchar,cf_sql_varchar,cf_sql_integer,cf_sql_integer,cf_sql_integer,cf_sql_varchar,cf_sql_varchar")/>

     <cfset reservation.agencyName = "XXX"/>
     <cfset reservation.id_agency = "22"/>
     <cfset reservation.hotelName = ""/>
     <cfset reservation.arrivalDate = ""/>
     <cfset reservation.ToRef = ""/>
     <cfset reservation.client = #clients#/>
     <cfset reservation.arvFlight = "TBA"/>
     <cfset reservation.depFlight ="TBA"/>
     <cfset reservation.depDate = ""/>
     <cfset reservation.arvtransport = ""/>
     <cfset reservation.deptransport = ""/>
     <cfset reservation.arvTrfDate = ""/>
     <cfset reservation.arvTrfFrom = ""/>
     <cfset reservation.arvTrfTo = ""/>
     <cfset reservation.depTrfDate = ""/>
     <cfset reservation.depTrfFrom = ""/>
     <cfset reservation.depTrfTo = ""/>    
     <cfset reservation.status = ""/>
     <cfset reservation.remarks = ""/>
    
          <cfset ReadARV = 0>
          <cfset ReadDEP = 0>
         
         
     <CFLOOP condition="NOT EOF">
    
          <!--- Read in next line --->
          <CFSET CurrLine = LineIO.readLine()>

          <cfset length_ar = #Arraylen(global)#/>

          <CFIF IsDefined("CurrLine") EQ "NO">    

               <CFSET EOF=1>
               <cfif  #reservation.ToRef# neq ''>
                    <cfset temp2 = StructNew()>                   
                    <cfset temp2.filename = #name#/>
                    <cfset temp2.ToRef = #reservation.ToRef#/>
                    <cfset temp2.arrivalDate = #reservation.arvTrfDate#/>
                    <cfset temp2.arvTrfDate = #reservation.arvTrfDate#/>
                    <cfset temp2.depDate = #reservation.depTrfDate#/>
                    <cfset temp2.depTrfDate = #reservation.depTrfDate#/>
                    <cfset temp2.clients = #clients#/>
                    <cfset temp2.arvFlight = #reservation.arvFlight#/>
                    <cfset temp2.depFlight = #reservation.depFlight#/>
                    <cfset temp2.arvTrfFrom = "#reservation.arvTrfFrom#"/>
                    <cfset temp2.arvTrfTo = "#reservation.arvTrfTo#"/>
                    <cfset temp2.depTrfFrom = "#reservation.depTrfFrom#"/>
                    <cfset temp2.depTrfTo = "#reservation.depTrfTo#"/>
                    <cfset temp2.arvtransport = #reservation.arvtransport#/>
                    <cfset temp2.deptransport = #reservation.deptransport#/>
                    <cfset temp2.agencyName = #reservation.agencyName#/>
                    <cfset temp2.id_agency = #reservation.id_agency#/>
                    <cfset temp2.partContent = #partContent#/>
                    <cfif #reservation.arvTrfTo# neq "">
                         <cfset temp2.hotelName = #reservation.arvTrfTo#/>
                    <cfelse>
                         <cfset temp2.hotelName = #reservation.depTrfFrom#/>
                    </cfif>
                   
                    <cfif #reservation.agencyName# neq "">
                         <cfset global[#length_ar# + 1] = "#temp2#"/>
                    </cfif>
               </cfif>
              
               <cfbreak>
              
          </CFIF>

<!---

          <cfset p = REPLACE(#CurrLine#,"é" , "e" ,"ALL")/>
          <cfset p = REPLACE(#CurrLine#,"ô" , "o" ,"ALL")/>

--->

          <cfset rez55 = REFindNoCase("Départ\s+du\s+:\s+([0-9]{2})([A-Z]{3})([0-9]{2})",#CurrLine#, 1, "True")>
          <cfif #ArrayLen(rez55.len)# gt 1>
               <cfif #reservation.arvTrfDate# eq ''>
                    <cfset date = trim(mid(#CurrLine#, rez55.pos[2], rez55.len[2]))/>
                    <cfset months = trim(mid(#CurrLine#, rez55.pos[3], rez55.len[3]))/> <!--- GET MONTH NUMBER--->
                    <cfset y = trim(mid(#CurrLine#, rez55.pos[4], rez55.len[4]))/>
                    <cfset years = '20#y#'/>
                    <cfloop index="i" from="1" to="#ArrayLen(month_ar_eng)#">
                         <cfif month_ar_eng[i] eq '#months#'>
                              <cfset monthnumber = #i#>
                         </cfif>
                    </cfloop>
                    <cfset reservation.arvTrfDate = dateformat(CreateDate(#years#,#monthnumber#,#date#), "DD-MM-YYYY")/>
               </cfif>
          </cfif>


          <cfset rez41 = REFindNoCase("\s+le\s+([0-9]{2})([A-Z]{3})([0-9]{2})",#CurrLine#, 1, "True")>
          <cfif #ArrayLen(rez41.len)# gt 1>
               <cfset date = trim(mid(#CurrLine#, rez41.pos[2], rez41.len[2]))>
               <cfset months = trim(mid(#CurrLine#, rez41.pos[3], rez41.len[3]))>
               <cfset years = trim(mid(#CurrLine#, rez41.pos[4], rez41.len[4]))>
               <cfloop index="i" from="1" to="#ArrayLen(month_ar_eng)#">
                    <cfif month_ar_eng[i] eq '#months#'>
                         <cfset monthnumber = #i#>
                    </cfif>
               </cfloop>
               <cfset reservation.depTrfDate = dateformat(CreateDate(#years#,#monthnumber#,#date#), "DD-MM-YYYY")/>
          </cfif>    
         
       
         
          <cfset rez20 = REFind("Transfert\s+:\s+Aéroport-",#CurrLine#)>
          <cfif #rez20# neq 0>
               <cfset ReadARV = 1>
               <cfset ReadDEP = 0>
          </cfif>
         
          <cfset rez21 = REFind("-Aéroport",#CurrLine#)>
          <cfif #rez21# neq 0>
               <cfset ReadDEP = 1>
               <cfset ReadARV = 0>
          </cfif>
         
          <cfif ReadARV eq 1><!--- READ ARRIVEE --->
               <cfset rez601 = REFindNoCase("Provenance:\s+(.*)",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez601.len)# gt 1>
                    <cfset _from = trim(mid(#CurrLine#, rez601.pos[2], rez601.len[2]))/>
                    <cfset _from = REPLACE(#_from#,"ô" , "00000000" ,"ALL")/>
                    <cfset reservation.arvTrfFrom = UCASE(#_from#)/>
               </cfif>
              
               <cfset rez602 = REFindNoCase("Destination:\s+(.*)",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez602.len)# gt 1>
                    <cfset _too = trim(mid(#CurrLine#, rez602.pos[2], rez602.len[2]))/>
                    <cfset _too = REPLACE(#_too#,"ô" , "00000000" ,"ALL")/>
                    <cfset reservation.arvTrfTo = UCASE(#_too#)/>
               </cfif>
              
               <cfset rez440 = REFindNoCase("ARRIVEE\s+([A-Z]{2})\s+([0-9]{2})",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez440.len)# gt 1>
    
                    <cfset flightNo1 = trim(mid(#CurrLine#, rez440.pos[2], rez440.len[2]))>
                    <cfset flightNo2 = trim(mid(#CurrLine#, rez440.pos[3], rez440.len[3]))>
                    <cfset reservation.arvFlight = "#flightNo1##flightNo2#"/>
               </cfif>
              
               <cfset rez440 = REFindNoCase("MRU\s+([A-Z]{2})\s+([0-9]{2})",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez440.len)# gt 1>
    
                    <cfset flightNo1 = trim(mid(#CurrLine#, rez440.pos[2], rez440.len[2]))>
                    <cfset flightNo2 = trim(mid(#CurrLine#, rez440.pos[3], rez440.len[3]))>
                    <cfset reservation.arvFlight = "#flightNo1#0#flightNo2#"/>
               </cfif>
              
               <cfset rez441 = REFindNoCase("MRU\s+([A-Z]{2})\s+([0-9]{3})",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez441.len)# gt 1>
                    <cfset flightNo1 = trim(mid(#CurrLine#, rez441.pos[2], rez441.len[2]))>
                    <cfset flightNo2 = trim(mid(#CurrLine#, rez441.pos[3], rez441.len[3]))>
                    <cfset reservation.arvFlight = "#flightNo1##flightNo2#"/>
               </cfif>
              
              
               <cfset rez90 = REFind("MIN\s+en\s+Minibus",#CurrLine#)>
               <cfif #rez90# neq 0>
                    <cfset reservation.arvtransport = "STANDARD-MINIBUS"/>
               </cfif>
              
               <cfset rez93 = REFind("MIP\s+en\s+Minibus",#CurrLine#)>
               <cfif #rez93# neq 0>
                    <cfset reservation.arvtransport = "CLUB-FAMILY"/>
               </cfif>
              
               <cfset rez91 = REFind("HEL\s+en\s+Hélicoptère",#CurrLine#)>
               <cfif #rez91# neq 0>
                    <cfset reservation.arvtransport = "HELICO-HELICO"/>
               </cfif>
              
               <cfset rez92 = REFind("PRI\s+en\s+Voiture\s+Privée",#CurrLine#)>
               <cfif #rez92# neq 0>
                    <cfset reservation.arvtransport = "PRIVATE-CAR"/>
               </cfif>         
              
          </cfif><!--- END OF READ ARRIVEE --->
    
          <cfif ReadDEP eq 1><!--- READ DEPART --->
               <cfset rez611 = REFindNoCase("Provenance:\s+(.*)",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez611.len)# gt 1>
                    <cfset _from = trim(mid(#CurrLine#, rez611.pos[2], rez611.len[2]))/>
                    <cfset reservation.depTrfFrom = UCASE(#_from#)/>
               </cfif>
              
               <cfset rez612 = REFindNoCase("Destination:\s+(.*)",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez612.len)# gt 1>
                    <cfset _too = trim(mid(#CurrLine#, rez612.pos[2], rez612.len[2]))/>
                    <cfset reservation.depTrfTo = UCASE(#_too#)/>
               </cfif>
              
              
               <cfset rez441 = REFindNoCase("DECOLLAGE\s+.*\s+([A-Z]{2})([0-9]{2})",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez441.len)# gt 1>
                    <cfset flightNo1 = trim(mid(#CurrLine#, rez441.pos[2], rez441.len[2]))>
                    <cfset flightNo2 = trim(mid(#CurrLine#, rez441.pos[3], rez441.len[3]))>
                    <cfset reservation.depFlight = "#flightNo1##flightNo2#"/>
               </cfif>
              
               <cfset rez440 = REFindNoCase("APT\s+MRU.*([A-Z]{2})\s+([0-9]{2})",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez440.len)# gt 1>
    
                    <cfset flightNo1 = trim(mid(#CurrLine#, rez440.pos[2], rez440.len[2]))>
                    <cfset flightNo2 = trim(mid(#CurrLine#, rez440.pos[3], rez440.len[3]))>
                    <cfset reservation.depFlight = "#flightNo1#0#flightNo2#"/>
               </cfif>
              
               <cfset rez441 = REFindNoCase("APT\s+MRU.*([A-Z]{2})\s+([0-9]{3})",#CurrLine#, 1, "True")>
               <cfif #ArrayLen(rez441.len)# gt 1>
                    <cfset flightNo1 = trim(mid(#CurrLine#, rez441.pos[2], rez441.len[2]))>
                    <cfset flightNo2 = trim(mid(#CurrLine#, rez441.pos[3], rez441.len[3]))>
                    <cfset reservation.depFlight = "#flightNo1##flightNo2#"/>
               </cfif>
              
               <cfset rez90 = REFind("MIN\s+en\s+Minibus",#CurrLine#)>
               <cfif #rez90# neq 0>
                    <cfset reservation.deptransport = "STANDARD-MINIBUS"/>
               </cfif>              
              
               <cfset rez91 = REFind("HEL\s+en\s+Hélicoptère",#CurrLine#)>
               <cfif #rez91# neq 0>
                    <cfset reservation.deptransport = "HELICO-HELICO"/>
               </cfif>
                             
               <cfset rez92 = REFind("PRI\s+en\s+Voiture\s+Privée",#CurrLine#)>
               <cfif #rez92# neq 0>
                    <cfset reservation.deptransport = "PRIVATE-CAR"/>
               </cfif>              
              
               <cfset rez93 = REFind("MIP\s+en\s+Minibus",#CurrLine#)>
               <cfif #rez93# neq 0>
                    <cfset reservation.deptransport = "CLUB-FAMILY"/>
               </cfif>
          </cfif><!--- END OF READ DEPART --->
              
          <cfset partContent = "#partContent#<br>#CurrLine#"/>

     </CFLOOP>

</cfloop> 
<cfdump var="#global#">

2. Create a folder called 'file' in the same directory as read.cfm.

3. Place a text file, for example msg1.txt, in the 'file' folder with the following content:


Transfert : Aéroport-Hôtel Paradis
Siège enfant/bébé
PRI en Voiture Privée le 06DEC08
Destination: Paradis Hôtel & Golf Club

4. Run read.cfm in a browser. It should extract the data without the French accents being replaced by symbols.

I think that by the time CF opens the text file and reads the data, the French accents have already been transformed.

Is there any other way to overcome this, without the cfprocessingdirective tag?

Thanks for everything.

David

Enthusiast, Feb 24, 2011

On 2/24/2011 2:13 PM, diditin said:

>> try "ISO-8859-1" as the pageEncoding
>
> Using "ISO-8859-1", nothing appears, as if it doesn't extract anything from the text file. whereas if i use utf-8, data are extracted but still with the symbols.

strange, if the declared encoding in cf doesn't match the encoding of the file (well at least as far as a BOM goes), cf should throw an error.

i can reproduce your issue by mangling the text file's encoding (copying the text in this email & pasting it into notepad & saving the file as ansi). i can also solve the issue by saving the same text with a BOM. if you're sure the text encoding is utf-8, try saving a test case w/notepad as utf-8, that should add a BOM.
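
A quick way to see whether a given text file carries the BOM mentioned above is to look at its first three bytes: the UTF-8 byte order mark is the sequence EF BB BF. The Java sketch below is only an illustration of that check; the class name, method name and sample file are hypothetical and not part of the original read.cfm.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomCheck {

    // True when the data begins with the UTF-8 byte order mark (EF BB BF).
    static boolean startsWithUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file saved as ISO-8859-1, which has no BOM.
        Path file = Files.createTempFile("msg1", ".txt");
        Files.write(file, "Si\u00e8ge enfant/b\u00e9b\u00e9\n".getBytes(StandardCharsets.ISO_8859_1));

        // Reading the whole file is fine for small message files like these.
        byte[] content = Files.readAllBytes(file);
        System.out.println("UTF-8 BOM present: " + startsWithUtf8Bom(content));

        Files.delete(file);
    }
}
```

A missing BOM does not prove the file is not UTF-8, since the mark is optional; its presence is just a strong hint that it is.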

> Here is my complete read.cfm:

too much information. if you want folks to help, you have to try & reduce the problem down to its simplest. if that fails, then examine all the code.

and as much as i hate machine translations, maybe you can save some of this effort by 1st using google xlate: http://bit.ly/gCUD9L then parsing the translated text? it often produces gibberish for complex phrasing in many languages (like thai) but it does an Ok job for languages like french as long as the phrasing isn't too complex.

New Here, Mar 31, 2011

Hi,

Thanks for your support. It is finally working now.

Summary of solution:

1. Changed the server language to French.

2. Added two lines of code at the beginning of the page.

The ColdFusion server language was initially set to English using UTF-8.

1. cd /etc/sysconfig/

Create a backup of the original file named i18n.

2. nano i18n

3. Change to LANG="fr_FR.ISO-8859-1"
SYSFONT="latarcyrheb-sun16"

4. Save and exit the file.

Restart the server using the command 'reboot', then restart the services.

5. Add the following lines at the beginning of the ColdFusion file you are using to read the file with French accents.

<cfcontent type="text/html; charset=ISO-8859-1">
<cfprocessingdirective pageencoding="ISO-8859-1">

6. Run your ColdFusion page; it should display the French accents.
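
A note on why the locale change probably mattered: FileReader decodes with the JVM default charset, and on Java versions of that era the default was derived from the OS locale, so switching LANG from a UTF-8 locale to fr_FR.ISO-8859-1 would have changed how the file's bytes were decoded. The Java sketch below is an illustration only (class and method names are hypothetical); it decodes the same ISO-8859-1 bytes with two charsets to show where the replacement symbols in the original output come from.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Decode raw bytes with the given charset. Malformed input is replaced
    // with U+FFFD, the symbol seen in the original question.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // One line of the sample file, as it would be stored in ISO-8859-1.
        byte[] latin1 = "Si\u00e8ge enfant/b\u00e9b\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // What a reader without an explicit charset would use on this machine.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        String asUtf8 = decode(latin1, StandardCharsets.UTF_8);
        String asLatin1 = decode(latin1, StandardCharsets.ISO_8859_1);

        // Decoding Latin-1 bytes as UTF-8 yields replacement characters.
        System.out.println("UTF-8 decode has U+FFFD: " + (asUtf8.indexOf('\uFFFD') >= 0));
        // Decoding with the matching charset restores the accents.
        System.out.println("Latin-1 decode intact: " + asLatin1.equals("Si\u00e8ge enfant/b\u00e9b\u00e9"));
    }
}
```

Naming the charset at the point where the file is opened would avoid depending on the server locale at all, which may be preferable on a shared server.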

Hope this works for anyone reading it.

Good luck and thanks.

Message was edited by: diditin
