Known Participant
November 11, 2010
Question

XML parsing huge file


Hi,

I have a 36 MB XML file I need to parse, and I'm new to XML.

I usually get a 200 KB file in CSV format from most of my clients, which they transfer into their account; I then simply update the MSSQL database from the CSV file at midnight on my server. But now 74 of my clients have been grouped together, and they send me a single XML file.

When I run it using the sample they gave me it works fine, but on the 36 MB file I get a JRun error. Then I found out that:

<cffile action="read" variable="xmlfile" file="c:\mypath\#clientfile#.xml" charset="utf-8">
<cfset xmlObj = xmlParse(xmlfile)>

doesn't work on big files because it runs out of memory.

I need a way to parse that file using Java. I downloaded xmlsax.js, but I don't know how to use it to parse the file and then get my parsed variable back from it. Can anyone help me, please?

I got the file here: http://xmljs.sourceforge.net/website/sampleApplications-sax.html

Thank you


3 replies

dave_jf
Participating Frequently
November 11, 2010

Have you tried to have xmlParse() read in the file and not use cffile? I have processed XML files with CF that were over 100 MB, so I am not convinced that your file size is the issue.

--Dave

Gates_001Author
Known Participant
November 11, 2010

Have you tried to have xmlparse read in the file and not use cffile?

I don't know how; that's what I want to know :-)

Inspiring
November 11, 2010

Did you start by reading the docs for xmlParse()?

http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-6e90.html

XmlParse(xmlText [, caseSensitive [, validator]])

xmlText

Any of the following:

  • A string containing XML text.

  • The name of an XML file.

  • The URL of an XML file; valid protocol identifiers include http, https, ftp, and file.

[etc]

--

Adam

Inspiring
November 11, 2010

You might consider moving the XML processing and data import tasks off of ColdFusion and into MS SQL Server.

Some MS SQL options for importing data from XML:

http://msdn.microsoft.com/en-us/library/ms190936(SQL.90).aspx

Gates_001Author
Known Participant
November 11, 2010

You might consider moving the XML processing and data import tasks off of ColdFusion and into MS SQL Server.

You want me to insert the entire 36 MB file into a field on my MSSQL server?

Owainnorth
Inspiring
November 11, 2010

No, that's not what he was getting at. Doing that would get you nowhere, other than getting yourself slapped upside the head by any web developers in the vicinity.

Bob was suggesting that you use one of Microsoft's tools for importing directly from XML into MSSQL, which would be infinitely quicker:

http://msdn.microsoft.com/en-us/library/ms191184%28v=SQL.90%29.aspx

Inspiring
November 11, 2010

<cffile> shouldn't struggle with a 36MB file, because that's not really a terribly big file.  It's a swagload of XML, sure, but it's not "big" as far as files go.

If you're running out of memory just reading the file, I'd be looking at your JVM memory settings, in case they are "suboptimal".
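
For what it's worth, a quick way to see how much heap the JVM is actually allowed is to ask the runtime directly. This is only a sketch: `HeapReport` is a made-up class name, and the ceiling it reports is whatever the JVM was started with (the standard `-Xmx` option), nothing specific to ColdFusion.

```java
final class HeapReport {

    // Upper bound on the heap this JVM will try to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently in use (allocated minus free), in megabytes.
    static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB):  " + maxHeapMegabytes());
        System.out.println("Used heap (MB): " + usedHeapMegabytes());
    }
}
```

If the maximum turns out to be only a few hundred megabytes, a DOM-style parse of a multi-megabyte XML file can plausibly exhaust it, since the parsed tree is usually many times larger than the file on disk.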

The other thing I note is that you say you need to parse this with Java, but then go on to talk about a JavaSCRIPT parser.  Which is it?  Java and JavaScript are two completely unrelated things.  Well, they're related in that they're both programming languages, but beyond that they're more different than alike.

I think you'd better clarify your requirement here.

--
Adam

Owainnorth
Inspiring
November 11, 2010

I must admit my initial thought was that you're clearly doing something stupid here, as there's no way CF can't parse a file that size. So I tried it myself.

50 MB XML file - fileRead(), xmlParse(). No CFDUMP, just parse. CF shoots up from 500 MB to 1100 MB then explodes with an out of memory error.

That is truly tragic performance, I must say. Java libraries for the win, I'd suspect.
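
To make the Java route concrete, here is a minimal sketch of streaming the file with the SAX parser that ships with the JDK (`javax.xml.parsers`). SAX hands you one element at a time instead of building the whole document in memory. The class name `XmlRecordCounter` and the `client` element name used below are assumptions for illustration only; the real element names depend on the file being sent.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class XmlRecordCounter {

    // Streams the XML and counts elements whose name equals recordName.
    // No DOM is built, so memory use stays roughly flat as the file grows.
    static int countRecords(InputStream in, final String recordName) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so match on qName.
                if (recordName.equals(qName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    // Usage: java XmlRecordCounter <path-to-xml> <element-name>
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countRecords(in, args[1]));
        }
    }
}
```

This only counts records. In a real import you would presumably collect each record's fields in the handler (via `characters()` and `endElement()`) and write a row to the database as each record closes, rather than holding everything until the end.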

Inspiring
November 11, 2010
50 MB XML file - fileRead(), xmlParse(). No CFDUMP, just parse. CF shoots up from 500 MB to 1100 MB then explodes with an out of memory error.

But was it the fileRead() or the xmlParse() that did that?  I mean... the former is not much use without the latter, but I suspect it's the xmlParse() that's doing it (and - wow - is it doing it), not the read.

--
Adam