XML parsing huge file

New Here ,
Nov 11, 2010

Hi,

I have a 36 MB XML file I need to parse, and I'm new to XML.

I usually get a 200 KB file in CSV format from most of my clients, which they transfer into their account; I then simply update the MSSQL database with the CSV file at midnight on my server. But now I have 74 clients that have been grouped together, and they send me one XML file.

When I run it using the sample they gave me it works fine, but on the 36 MB file I get a JRun error. Then I found out that:

<CFFile action="READ" variable="xmlfile" file="c:\mypath\#clientfile#.xml" charset="utf-8">
<cfset xmlObj = xmlParse(xmlfile)>

doesn't work on big files because it runs out of memory.

I need a way to parse that file using Java. I downloaded xmlsax.js, but I don't know how to use it to parse the file and then get my parsed variable back from it. Can anyone help me, please?

I got the file here: http://xmljs.sourceforge.net/website/sampleApplications-sax.html

Thank you

TOPICS
Database access
Guide ,
Nov 12, 2010

Aah, it's okay we're safe there. Shared building you see; we pay our service charges but sometimes alas you have to put your faith in some kind of contractor, whose standards just aren't on the same level as your own.

If it were my responsibility it'd be like a Kleenex factory up there.

LEGEND ,
Nov 12, 2010

If it were my responsibility it'd be like a Kleenex factory up there.

So you really like your coding then?

--
Adam

Guide ,
Nov 12, 2010

I just heard the phrase "monolithic entities" on a Microsoft presentation and damn near exploded.

LEGEND ,
Nov 12, 2010

[you have to picture me grimacing, and clicking my ruby slippers together repeating "there's no place like home, there's no place like home"]


--

Adam

Enthusiast ,
Nov 12, 2010

In response to Owain North's comments about DOM parsing.

I'm not sure if the memory issues are the fault of the DOM parsing method being used or if the problem is in how CF converts XML text into the CF objects (arrays, structs) that the XML text represents. It's possible that the CF objects are responsible for using excessive amounts of memory. Either way, it sounds like CF's XML parsing capabilities aren't appropriate for larger (large being a relative term) XML files.

It might be an interesting experiment to use third party Java components (such as Xerces2) to parse some XML files and see what the performance and memory usage look like.

I will re-state my original advice. The poster needs to import data from XML files into tables on MS SQL Server. Bulk import tasks, such as from XML or CSV files, are generally better handled in MS SQL Server. Some options include: a job that executes T-SQL, an Integration Services package, or the Bulk Copy Program (BCP) utility.

From: Owain North
Sent: Fri 11/12/2010 8:57 AM
Subject: XML parsing huge file

Couldn't agree more, and to be honest I can't believe this hasn't come up before. To me, the thought that something like CF should have to be bypassed when you get to files of a few megs is utterly ridiculous. I haven't looked into the different methods of parsing XML as it's really not my thing, but are we saying that DOM parsing is necessary for CF to be able to perform the functions it does on the resulting XML object? Or does one create the same result, just through a different method?

Owain North

Code Monkey

Titan Internet Ltd

http://www.titaninternet.co.uk

Owain North is a mildly overweight computer programmer who likes to sit in the corner of a darkened room tapping away on his keyboard whilst wearing a massive set of headphones to avoid human contact where possible. He particularly likes to avoid natural light and salad.

In his spare time he likes to pet his dog and work on his track car: http://www.306gti6.com/forum/showthread.php?id=124722&page=1

The other day he went up to the toilets upstairs and there were no hand towels left! Bad times.

It's Filthy Friday, so we all got Dominos for lunch. Large (obviously) half and half Mighty Meaty and American Hot. Good it was, especially as one of the other guys didn't want his garlic & herb dip = win.

At the moment, he's having to look into WCF for a new project on server monitoring. He doesn't know anything about it yet but after a quick session on Amazon with the company credit card and some extortionate delivery fees he's well on his way to writing his first WCF service.

In case you're interested - in the end, he just had to dry his hands on his jeans.

New Here ,
Nov 12, 2010

OK, I get it: my file is not huge or big; let's call it an average-size file. It kind of makes ColdFusion parsing sound even weaker.

I will check your solution out on Monday. Thanks for the tip, Bob.

Community Expert ,
Nov 12, 2010

Right, but it's not a simple matter of 500 MB into 1.5 GB. JVM memory is a complicated thing. You have different generations, each generation has limits, etc, etc.

As for DOM vs SAX parsing, I don't think it would be easy for Adobe to just have a switch that lets you choose one or the other. SAX parsing is fundamentally different from DOM parsing. It doesn't let you go up and down a tree of nodes like DOM does. It forces you, the developer, to keep track of everything beyond where you're currently located in the XML document hierarchy.
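
To illustrate the bookkeeping described above, here is a minimal Java sketch using the SAX parser that ships with the JDK (javax.xml.parsers). It is only an illustration under assumptions: the element names "client" and "name" and the class name SaxSketch are hypothetical and not taken from the poster's file. The handler has to remember for itself where it is in the document, here with a stack of open element names, because SAX only reports events as they stream past.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {

    // Collects the text of every <name> element whose parent is <client>.
    // Both element names are hypothetical placeholders.
    static class ClientNameHandler extends DefaultHandler {
        // Stack of currently open elements: SAX does not track this for us.
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            path.push(qName);
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so append.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            path.pop();
            // After the pop, peek() is the parent of the element just closed.
            if ("name".equals(qName) && "client".equals(path.peek())) {
                names.add(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    // Streams the document through the handler; no tree is built in memory.
    static List<String> clientNames(InputSource source) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ClientNameHandler handler = new ClientNameHandler();
        parser.parse(source, handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<clients><client><name>Acme</name></client>"
                   + "<client><name>Globex</name></client></clients>";
        System.out.println(clientNames(new InputSource(new StringReader(xml))));
        // prints [Acme, Globex]
    }
}
```

For a file on disk, the same handler could be fed from a java.io.FileInputStream wrapped in an InputSource. In a real import you would probably write each record to the database as its closing tag arrives rather than accumulate a list, so memory use stays flat regardless of file size.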

Dave Watts, CTO, Fig Leaf Software

http://www.figleaf.com/

http://training.figleaf.com/

Fig Leaf Software is a Veteran-Owned Small Business (VOSB) on GSA Schedule, and provides the highest caliber vendor-authorized instruction at our training centers, online, or onsite.

Read this before you post:

http://forums.adobe.com/thread/607238

Dave Watts, Eidolon LLC
Enthusiast ,
Nov 11, 2010

You can pass the name of your XML file to the XmlParse function.

Here is a link to the documentation:

http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-6e90.html
