Hi all,
I am trying to read text containing French accented characters from a text file, but the accented characters come out as strange symbols. I tried applying an encoding setting, but nothing changed. I have posted some details of my problem below; it would be great if you could help me out. Thanks
I have a text file with the following content:
fileName: msg123.txt
Transfert : Aéroport-Hôtel Paradis
Siège enfant/bébé
PRI en Voiture Privée le 06DEC10
When I read the file with ColdFusion, the output looks like this:
Transfert : A�roport-H�tel Paradis
Si�ge enfant/b�b�
PRI en Voiture Priv�e le 06DEC10
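The � in this output is U+FFFD, the replacement character a UTF-8 decoder substitutes for bytes that are not valid UTF-8. ColdFusion runs on the JVM and read.cfm calls java.io classes directly, so a minimal Java sketch can reproduce the symptom, assuming the file was saved as ISO-8859-1 (Latin-1) and then decoded as UTF-8. The class and method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulate the failure: the text is saved as ISO-8859-1 (Latin-1),
    // then the bytes are decoded as UTF-8, as a reader whose default
    // charset is UTF-8 would do.
    static String latin1BytesReadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // In Latin-1, U+00E9 (e-acute) is the single byte 0xE9. That byte does not
        // start a valid UTF-8 sequence here, so the decoder emits U+FFFD instead.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Accented letters are written as Unicode escapes so this demo does not
        // depend on the encoding of the Java source file itself.
        String line = "Transfert : A\u00E9roport-H\u00F4tel Paradis";
        System.out.println(latin1BytesReadAsUtf8(line));
    }
}
```

Each accented letter turns into exactly one U+FFFD, which matches the A�roport and H�tel shown above.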
The CFM page that reads the text file looks like this:
fileName: read.cfm
<!--- READS ALL TEXT FILES --->
<cfset pathDirectory = ExpandPath( "./file" ) />
<cfdirectory action="list" directory="#pathDirectory#" filter="*.txt" name="filename"/>
<cfloop query="filename">
<cfset FilePath = "#pathDirectory#/#name#">
<cfscript>
// Define the file to read, use forward slashes only
FileName="#FilePath#";
// Initialize Java File IO
FileIOClass=createObject("java","java.io.FileReader");
FileIO=FileIOClass.init(FileName);
LineIOClass=createObject("java","java.io.BufferedReader" );
LineIO=LineIOClass.init(FileIO);
</cfscript>....
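The single-argument java.io.FileReader constructor used here decodes the file with the JVM's default charset, so the result depends on how the server is configured rather than on how the file was saved. A minimal modern-Java sketch of the alternative, assuming the file is ISO-8859-1, wraps a FileInputStream in an InputStreamReader and names the charset explicitly (the class name is hypothetical):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExplicitCharsetReader {
    // Unlike new FileReader(path), this names the charset explicitly, so the
    // result no longer depends on the JVM default charset.
    static List<String> readLines(String path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            // readLine() returns null at end of file, like the EOF check in read.cfm.
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file passed as args[0] was saved as ISO-8859-1.
        for (String line : readLines(args[0], StandardCharsets.ISO_8859_1)) {
            System.out.println(line);
        }
    }
}
```

In read.cfm the same idea would presumably mean replacing the FileReader with createObject("java","java.io.InputStreamReader").init(createObject("java","java.io.FileInputStream").init(FileName), "ISO-8859-1") before passing it to BufferedReader; that CFML line is untested here.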
Could you please advise me how to do this correctly?
Regards
Message was edited by: diditin
On 2/23/2011 1:47 PM, diditin said:
I have a text file with content as follows:
What version of CF are you using? (In MX and newer, CF defaults to UTF-8 encoding.) What encoding is the text file?
If the file is not UTF-8 encoded (Latin-1, perhaps?), then try
// Initialize Java File IO
BTW, if you're on CF8 or above you could simply use cfloop with the file attribute. I think it gives about the same performance as the Java IO approach.
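When the encoding is unknown, as it is after the PDF-to-text step described below, one common heuristic is to attempt a strict UTF-8 decode and fall back to ISO-8859-1 if that fails. A minimal Java sketch with a hypothetical helper name follows. It is only a guess: Windows "ANSI" text (usually windows-1252 on Western European systems) differs from ISO-8859-1 in the 0x80-0x9F range, and some Latin-1 byte sequences happen to be valid UTF-8 as well.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EncodingGuess {
    // Try a strict UTF-8 decode first; if the bytes are not valid UTF-8,
    // assume ISO-8859-1, which maps every byte to a character.
    static String decodeUtf8OrLatin1(byte[] bytes) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(decodeUtf8OrLatin1(bytes));
    }
}
```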
Thanks for your reply, Paul.
In fact, I receive a PDF as a mail attachment, open the PDF, and save it as text. Then I use my read.cfm to read the resulting text file, so I don't really know what encoding the text file has. Why do I do it this way? To keep the line-by-line format; if I read the PDF directly, I get one block of data without any newlines.
I am using CF8.
Try what exactly? The cfprocessingdirective tag?
On 2/23/2011 7:25 PM, diditin said:
Try what exactly? tag cfprocessingdirective?
Yes, the forum swallowed the example tag. Try "ISO-8859-1" as the pageEncoding value, because I think the PDF reader defaults to ANSI or Latin-1 when it writes out text like that.
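Note that cfprocessingdirective pageEncoding tells ColdFusion how to decode the .cfm page itself, so it affects accented literals in read.cfm such as the pattern "Transfert\s+:\s+Aéroport-", but it does not change how java.io.FileReader decodes the text file. Assuming read.cfm was saved as UTF-8, one possible contributor to the empty result reported below is that declaring it ISO-8859-1 mangles those literals, so the patterns stop matching. A Java sketch of that effect (names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class PageEncodingDemo {
    // Simulate source text saved as UTF-8 but decoded as ISO-8859-1:
    // each accented letter becomes two unrelated Latin-1 characters.
    static String misreadUtf8AsLatin1(String literal) {
        byte[] utf8 = literal.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Does a regex built from the (possibly mangled) literal occur in the line?
    static boolean matches(String patternLiteral, String line) {
        return Pattern.compile(patternLiteral).matcher(line).find();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of its own source encoding.
        String literal = "Transfert\\s+:\\s+A\u00E9roport-";
        String line = "Transfert : A\u00E9roport-H\u00F4tel Paradis";
        System.out.println(matches(literal, line));                      // true
        System.out.println(matches(misreadUtf8AsLatin1(literal), line)); // false
    }
}
```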
try "ISO-8859-1" as the pageEncoding
With "ISO-8859-1", nothing appears, as if nothing is extracted from the text file, whereas with UTF-8 the data is extracted but still contains the symbols.
Here is my complete read.cfm:
<cfprocessingdirective pageencoding="utf-8"/>
<!--- READS ALL TEXT FILES --->
<cfset pathDirectory = ExpandPath( "./file" ) />
<cfdirectory action="list" directory="#pathDirectory#" filter="*.txt" name="filename"/>
<!--- INITIALIZE STRUCTURES --->
<cfset global = ArrayNew(1)/>
<cfset month_ar_eng= ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']>
<cfloop query="filename">
<cfset FilePath = "#pathDirectory#/#name#">
<cfscript>
// Define the file to read, use forward slashes only
FileName="#FilePath#";
// Initialize Java File IO
FileIOClass=createObject("java","java.io.FileReader");
FileIO=FileIOClass.init(FileName);
LineIOClass=createObject("java","java.io.BufferedReader" );
LineIO=LineIOClass.init(FileIO);</cfscript>
<CFSET EOF = 0>
<cfset partContent = ""/>
<!--- INITIALIZE STRUCTURES --->
<cfset reservation = StructNew()/>
<cfset clients = QueryNew("title, family, name, status ,age, vipMTCO, honeymoon, remarks, language","cf_sql_varchar,cf_sql_varchar,cf_sql_varchar,cf_sql_varchar,cf_sql_integer,cf_sql_integer,cf_sql_integer,cf_sql_varchar,cf_sql_varchar")/>
<cfset reservation.agencyName = "XXX"/>
<cfset reservation.id_agency = "22"/>
<cfset reservation.hotelName = ""/>
<cfset reservation.arrivalDate = ""/>
<cfset reservation.ToRef = ""/>
<cfset reservation.client = #clients#/>
<cfset reservation.arvFlight = "TBA"/>
<cfset reservation.depFlight ="TBA"/>
<cfset reservation.depDate = ""/>
<cfset reservation.arvtransport = ""/>
<cfset reservation.deptransport = ""/>
<cfset reservation.arvTrfDate = ""/>
<cfset reservation.arvTrfFrom = ""/>
<cfset reservation.arvTrfTo = ""/>
<cfset reservation.depTrfDate = ""/>
<cfset reservation.depTrfFrom = ""/>
<cfset reservation.depTrfTo = ""/>
<cfset reservation.status = ""/>
<cfset reservation.remarks = ""/>
<cfset ReadARV = 0>
<cfset ReadDEP = 0>
<CFLOOP condition="NOT EOF">
<!--- Read in next line --->
<CFSET CurrLine = LineIO.readLine()>
<cfset length_ar = #Arraylen(global)#/>
<CFIF IsDefined("CurrLine") EQ "NO">
<CFSET EOF=1>
<cfif #reservation.ToRef# neq ''>
<cfset temp2 = StructNew()>
<cfset temp2.filename = #name#/>
<cfset temp2.ToRef = #reservation.ToRef#/>
<cfset temp2.arrivalDate = #reservation.arvTrfDate#/>
<cfset temp2.arvTrfDate = #reservation.arvTrfDate#/>
<cfset temp2.depDate = #reservation.depTrfDate#/>
<cfset temp2.depTrfDate = #reservation.depTrfDate#/>
<cfset temp2.clients = #clients#/>
<cfset temp2.arvFlight = #reservation.arvFlight#/>
<cfset temp2.depFlight = #reservation.depFlight#/>
<cfset temp2.arvTrfFrom = "#reservation.arvTrfFrom#"/>
<cfset temp2.arvTrfTo = "#reservation.arvTrfTo#"/>
<cfset temp2.depTrfFrom = "#reservation.depTrfFrom#"/>
<cfset temp2.depTrfTo = "#reservation.depTrfTo#"/>
<cfset temp2.arvtransport = #reservation.arvtransport#/>
<cfset temp2.deptransport = #reservation.deptransport#/>
<cfset temp2.agencyName = #reservation.agencyName#/>
<cfset temp2.id_agency = #reservation.id_agency#/>
<cfset temp2.partContent = #partContent#/>
<cfif #reservation.arvTrfTo# neq "">
<cfset temp2.hotelName = #reservation.arvTrfTo#/>
<cfelse>
<cfset temp2.hotelName = #reservation.depTrfFrom#/>
</cfif>
<cfif #reservation.agencyName# neq "">
<cfset global[#length_ar# + 1] = "#temp2#"/>
</cfif>
</cfif>
<cfbreak>
</CFIF><!---
<cfset p = REPLACE(#CurrLine#,"é" , "e" ,"ALL")/>
<cfset p = REPLACE(#CurrLine#,"ô" , "o" ,"ALL")/>--->
<cfset rez55 = REFindNoCase("Départ\s+du\s+:\s+([0-9]{2})([A-Z]{3})([0-9]{2})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez55.len)# gt 1>
<cfif #reservation.arvTrfDate# eq ''>
<cfset date = trim(mid(#CurrLine#, rez55.pos[2], rez55.len[2]))/>
<cfset months = trim(mid(#CurrLine#, rez55.pos[3], rez55.len[3]))/> <!--- GET MONTH NUMBER--->
<cfset y = trim(mid(#CurrLine#, rez55.pos[4], rez55.len[4]))/>
<cfset years = '20#y#'/>
<cfloop index="i" from="1" to="#ArrayLen(month_ar_eng)#">
<cfif month_ar_eng[i] eq months>
<cfset monthnumber = #i#>
</cfif>
</cfloop>
<cfset reservation.arvTrfDate = dateformat(CreateDate(#years#,#monthnumber#,#date#), "DD-MM-YYYY")/>
</cfif>
</cfif>
<cfset rez41 = REFindNoCase("\s+le\s+([0-9]{2})([A-Z]{3})([0-9]{2})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez41.len)# gt 1>
<cfset date = trim(mid(#CurrLine#, rez41.pos[2], rez41.len[2]))>
<cfset months = trim(mid(#CurrLine#, rez41.pos[3], rez41.len[3]))>
<cfset years = trim(mid(#CurrLine#, rez41.pos[4], rez41.len[4]))>
<cfloop index="i" from="1" to="#ArrayLen(month_ar_eng)#">
<cfif month_ar_eng[i] eq months>
<cfset monthnumber = #i#>
</cfif>
</cfloop>
<cfset reservation.depTrfDate = dateformat(CreateDate(#years#,#monthnumber#,#date#), "DD-MM-YYYY")/>
</cfif>
<cfset rez20 = REFind("Transfert\s+:\s+Aéroport-",#CurrLine#)>
<cfif #rez20# neq 0>
<cfset ReadARV = 1>
<cfset ReadDEP = 0>
</cfif>
<cfset rez21 = REFind("-Aéroport",#CurrLine#)>
<cfif #rez21# neq 0>
<cfset ReadDEP = 1>
<cfset ReadARV = 0>
</cfif>
<cfif ReadARV eq 1><!--- READ ARRIVEE --->
<cfset rez601 = REFindNoCase("Provenance:\s+(.*)",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez601.len)# gt 1>
<cfset _from = trim(mid(#CurrLine#, rez601.pos[2], rez601.len[2]))/>
<cfset _from = REPLACE(#_from#,"ô" , "00000000" ,"ALL")/>
<cfset reservation.arvTrfFrom = UCASE(#_from#)/>
</cfif>
<cfset rez602 = REFindNoCase("Destination:\s+(.*)",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez602.len)# gt 1>
<cfset _too = trim(mid(#CurrLine#, rez602.pos[2], rez602.len[2]))/>
<cfset _too = REPLACE(#_too#,"ô" , "00000000" ,"ALL")/>
<cfset reservation.arvTrfTo = UCASE(#_too#)/>
</cfif>
<cfset rez440 = REFindNoCase("ARRIVEE\s+([A-Z]{2})\s+([0-9]{2})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez440.len)# gt 1>
<cfset flightNo1 = trim(mid(#CurrLine#, rez440.pos[2], rez440.len[2]))>
<cfset flightNo2 = trim(mid(#CurrLine#, rez440.pos[3], rez440.len[3]))>
<cfset reservation.arvFlight = "#flightNo1##flightNo2#"/>
</cfif>
<cfset rez440 = REFindNoCase("MRU\s+([A-Z]{2})\s+([0-9]{2})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez440.len)# gt 1>
<cfset flightNo1 = trim(mid(#CurrLine#, rez440.pos[2], rez440.len[2]))>
<cfset flightNo2 = trim(mid(#CurrLine#, rez440.pos[3], rez440.len[3]))>
<cfset reservation.arvFlight = "#flightNo1#0#flightNo2#"/>
</cfif>
<cfset rez441 = REFindNoCase("MRU\s+([A-Z]{2})\s+([0-9]{3})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez441.len)# gt 1>
<cfset flightNo1 = trim(mid(#CurrLine#, rez441.pos[2], rez441.len[2]))>
<cfset flightNo2 = trim(mid(#CurrLine#, rez441.pos[3], rez441.len[3]))>
<cfset reservation.arvFlight = "#flightNo1##flightNo2#"/>
</cfif>
<cfset rez90 = REFind("MIN\s+en\s+Minibus",#CurrLine#)>
<cfif #rez90# neq 0>
<cfset reservation.arvtransport = "STANDARD-MINIBUS"/>
</cfif>
<cfset rez93 = REFind("MIP\s+en\s+Minibus",#CurrLine#)>
<cfif #rez93# neq 0>
<cfset reservation.arvtransport = "CLUB-FAMILY"/>
</cfif>
<cfset rez91 = REFind("HEL\s+en\s+Hélicoptère",#CurrLine#)>
<cfif #rez91# neq 0>
<cfset reservation.arvtransport = "HELICO-HELICO"/>
</cfif>
<cfset rez92 = REFind("PRI\s+en\s+Voiture\s+Privée",#CurrLine#)>
<cfif #rez92# neq 0>
<cfset reservation.arvtransport = "PRIVATE-CAR"/>
</cfif>
</cfif><!--- END OF READ ARRIVEE --->
<cfif ReadDEP eq 1><!--- READ DEPART --->
<cfset rez611 = REFindNoCase("Provenance:\s+(.*)",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez611.len)# gt 1>
<cfset _from = trim(mid(#CurrLine#, rez611.pos[2], rez611.len[2]))/>
<cfset reservation.depTrfFrom = UCASE(#_from#)/>
</cfif>
<cfset rez612 = REFindNoCase("Destination:\s+(.*)",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez612.len)# gt 1>
<cfset _too = trim(mid(#CurrLine#, rez612.pos[2], rez612.len[2]))/>
<cfset reservation.depTrfTo = UCASE(#_too#)/>
</cfif>
<cfset rez441 = REFindNoCase("DECOLLAGE\s+.*\s+([A-Z]{2})([0-9]{2})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez441.len)# gt 1>
<cfset flightNo1 = trim(mid(#CurrLine#, rez441.pos[2], rez441.len[2]))>
<cfset flightNo2 = trim(mid(#CurrLine#, rez441.pos[3], rez441.len[3]))>
<cfset reservation.depFlight = "#flightNo1##flightNo2#"/>
</cfif>
<cfset rez440 = REFindNoCase("APT\s+MRU.*([A-Z]{2})\s+([0-9]{2})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez440.len)# gt 1>
<cfset flightNo1 = trim(mid(#CurrLine#, rez440.pos[2], rez440.len[2]))>
<cfset flightNo2 = trim(mid(#CurrLine#, rez440.pos[3], rez440.len[3]))>
<cfset reservation.depFlight = "#flightNo1#0#flightNo2#"/>
</cfif>
<cfset rez441 = REFindNoCase("APT\s+MRU.*([A-Z]{2})\s+([0-9]{3})",#CurrLine#, 1, "True")>
<cfif #ArrayLen(rez441.len)# gt 1>
<cfset flightNo1 = trim(mid(#CurrLine#, rez441.pos[2], rez441.len[2]))>
<cfset flightNo2 = trim(mid(#CurrLine#, rez441.pos[3], rez441.len[3]))>
<cfset reservation.depFlight = "#flightNo1##flightNo2#"/>
</cfif>
<cfset rez90 = REFind("MIN\s+en\s+Minibus",#CurrLine#)>
<cfif #rez90# neq 0>
<cfset reservation.deptransport = "STANDARD-MINIBUS"/>
</cfif>
<cfset rez91 = REFind("HEL\s+en\s+Hélicoptère",#CurrLine#)>
<cfif #rez91# neq 0>
<cfset reservation.deptransport = "HELICO-HELICO"/>
</cfif>
<cfset rez92 = REFind("PRI\s+en\s+Voiture\s+Privée",#CurrLine#)>
<cfif #rez92# neq 0>
<cfset reservation.deptransport = "PRIVATE-CAR"/>
</cfif>
<cfset rez93 = REFind("MIP\s+en\s+Minibus",#CurrLine#)>
<cfif #rez93# neq 0>
<cfset reservation.deptransport = "CLUB-FAMILY"/>
</cfif>
</cfif><!--- END OF READ DEPART --->
<cfset partContent = "#partContent#<br>#CurrLine#"/>
</CFLOOP>
</cfloop>
<cfdump var="#global#">
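The two month-lookup loops in read.cfm turn a token such as 06DEC10 into DD-MM-YYYY. For comparison only, a Java sketch (it uses java.time, so Java 8 or later, newer than the JVM under CF8) parses the same token directly. The class name is hypothetical, and it assumes two-digit years fall in 2000-2099, as the rez55 branch does with '20#y#':

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TransferDate {
    // Same shape as the rez41 pattern: "le" followed by ddMMMyy, e.g. "le 06DEC10".
    private static final Pattern LE_DATE =
            Pattern.compile("\\s+le\\s+([0-9]{2}[A-Z]{3}[0-9]{2})", Pattern.CASE_INSENSITIVE);

    // English month abbreviations matched case-insensitively ("DEC" vs "Dec");
    // "yy" maps two-digit years onto 2000-2099.
    private static final DateTimeFormatter IN = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("ddMMMyy")
            .toFormatter(Locale.ENGLISH);

    private static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    // Returns the date as DD-MM-YYYY, or null when the line has no "le ddMMMyy" token.
    static String extractDate(String line) {
        Matcher m = LE_DATE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return LocalDate.parse(m.group(1), IN).format(OUT);
    }

    public static void main(String[] args) {
        System.out.println(extractDate("PRI en Voiture Priv\u00E9e le 06DEC10")); // 06-12-2010
    }
}
```

A formatter avoids maintaining the month array by hand, at the cost of depending on the English locale data for the abbreviations.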
2. Create a folder called 'file' in the same directory as read.cfm.
3. Place a text file, for example msg1.txt, in the 'file' folder with the following content:
Transfert : Aéroport-Hôtel Paradis
Siège enfant/bébé
PRI en Voiture Privée le 06DEC08
Destination: Paradis Hôtel & Golf Club
4. Run read.cfm in a browser. It should extract the data with the normal French accents, but instead the accents are replaced by symbols.
I think that by the time CF opens the text file and reads the data, the French accents have already been transformed.
Is there any other way to overcome this, without the cfprocessingdirective tag?
Thanks, all
David
On 2/24/2011 2:13 PM, diditin said:
>
>> try "ISO-8859-1" as the pageEncoding
>
> Using "ISO-8859-1", nothing appears, as if it doesn't extract anything from
> the text file. whereas if i use utf-8, data are extracted but still with the
> symbols.
Strange. If the encoding declared in CF doesn't match the encoding of the file (at least as far as a BOM goes), CF should throw an error.
I can reproduce your issue by mangling the text file's encoding (copying the text in this email, pasting it into Notepad, and saving the file as ANSI). I can also solve the issue by saving the same text with a BOM. If you're sure the text encoding is UTF-8, try saving a test case with Notepad as UTF-8; that should add a BOM.
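The BOM test above can be made explicit in code. Notepad's "UTF-8" option writes the byte order mark EF BB BF at the start of the file; the Java sketch below (hypothetical class name) uses that mark as the hint and otherwise falls back to a caller-chosen legacy charset, ISO-8859-1 being an assumption. It strips the BOM by hand, because Java's UTF-8 decoder keeps it as a leading U+FEFF character.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomSniffer {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True when the data starts with the UTF-8 byte order mark.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3 && Arrays.equals(Arrays.copyOf(data, 3), UTF8_BOM);
    }

    // With a BOM, decode the rest as UTF-8 (dropping the BOM itself);
    // without one, use the assumed legacy charset.
    static String decode(byte[] data, Charset fallback) {
        if (hasUtf8Bom(data)) {
            return new String(data, 3, data.length - 3, StandardCharsets.UTF_8);
        }
        return new String(data, fallback);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasUtf8Bom(data) ? "UTF-8 with BOM" : "no BOM, assuming ISO-8859-1");
        System.out.println(decode(data, StandardCharsets.ISO_8859_1));
    }
}
```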
> Here is my complete read.cfm:
Too much information. If you want folks to help, try to reduce the problem to its simplest form; if that fails, then examine all the code.
And as much as I hate machine translation, maybe you can save some of this effort by first running the text through Google Translate (http://bit.ly/gCUD9L) and then parsing the translated text? It often produces gibberish for complex phrasing in many languages (like Thai), but it does an OK job for languages like French as long as the phrasing isn't too complex.
Hi,
Thanks for your support; it finally works now.
Summary of the solution:
1. Changed the server language to French.
2. Added two lines of code at the beginning of the page.
The ColdFusion server language was initially set to English with UTF-8.
1. cd /etc/sysconfig/ and create a backup of the original file, named i18n.
2. nano i18n
3. Change the contents to:
LANG="fr_FR.ISO-8859-1"
SYSFONT="latarcyrheb-sun16"
4. Save and exit the file.
5. Restart the server with the command 'reboot', then restart the services.
6. Add these lines at the beginning of the ColdFusion file you use to read the file with French accents:
<cfcontent type="text/html; charset=ISO-8859-1">
<cfprocessingdirective pageencoding="ISO-8859-1">
7. Run your ColdFusion page; it should display the French accents.
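Changing LANG presumably worked because, on the JVMs of that era, the default charset used by java.io.FileReader is derived from the operating-system locale, so the whole server switched to ISO-8859-1. A narrower alternative, not something tested in this thread, is to convert each incoming text file to UTF-8 once, so it matches the server's original UTF-8 default. A minimal Java sketch, assuming the source files are ISO-8859-1 (the class name is hypothetical):

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Transcode {
    // Decode the whole file with the source charset, then re-encode it
    // with the target charset (for example ISO-8859-1 to UTF-8).
    static void transcode(Path source, Charset from, Path target, Charset to) throws IOException {
        String text = new String(Files.readAllBytes(source), from);
        Files.write(target, text.getBytes(to));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Transcode msg123.txt msg123-utf8.txt
        transcode(Paths.get(args[0]), StandardCharsets.ISO_8859_1,
                  Paths.get(args[1]), StandardCharsets.UTF_8);
    }
}
```

This keeps the server locale untouched and confines the encoding assumption to one place, at the cost of an extra step for each incoming file.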
I hope this helps anyone reading it.
Good luck, and thanks.
Message was edited by: diditin