I am trying to process large Excel files (65,000 rows) into the DB. So I'm using the new cfspreadsheet tag to read in the files 5,000 rows at a time and then process the query into the tables. The process works fine when reading the pre-2003 format ("xls") files, but not when reading the new "xlsx" type of files. The server runs out of memory trying to read those files for some reason. Is this a known bug in ColdFusion 9.0? Has anyone experienced similar issues?
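For context, the read-5,000-rows-then-insert loop described above amounts to a batching pattern. Here's a rough sketch of just the batched-insert side, in Python with SQLite rather than CFML (the thread's actual CFML isn't shown, and the table/column names here are made up for illustration):

```python
import sqlite3

def insert_in_batches(conn, rows, batch_size=5000):
    """Insert rows in fixed-size batches so only one chunk is held at a time."""
    cur = conn.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            cur.executemany("INSERT INTO data (a, b) VALUES (?, ?)", batch)
            conn.commit()
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany("INSERT INTO data (a, b) VALUES (?, ?)", batch)
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b TEXT)")
# Feed rows from a generator so the full 65,000-row set never sits in memory.
insert_in_batches(conn, ((i, f"row{i}") for i in range(65000)), batch_size=5000)
print(conn.execute("SELECT COUNT(*) FROM data").fetchone()[0])  # 65000
```

The point of batching is that the insert side stays memory-flat; as the rest of the thread shows, the trouble here is on the *read* side.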
The docs claim all versions of Excel from 1997-2007 are supported.
Are you getting this with •any• .xlsx file, or just large ones?
Do you need CF sitting in the middle of this process? Can you not get the DB to import the Excel file directly?
I need to do some processing first on the data that comes from the spreadsheet. The problem seems to be with the very large xlsx-type spreadsheets. The older xls format (up to 2003) works great, even with 65,000 rows of data. So I think this is something to do with xlsx and very large spreadsheets.
That it runs OK with a large .xls but has problems with a large .xlsx doesn't demonstrate that it's specifically a problem with the *size* of the .xlsx; it could be a problem with .xlsx files in general. And which of those it is dictates what the solution might be.
Test (and compare performance) on a smaller file.
Yah, I think you are right that there is a problem reading xlsx files with the cfspreadsheet tag.
When reading an xlsx file, ColdFusion uses much more JVM memory than when reading an xls file, and it's also noticeably slower. So I think there is a problem reading the xlsx format in general, but as long as the file is small it works fine. Once you get to files with 40,000+ rows, though, memory usage grows exponentially and the server runs out of memory.
If you've got time, grab Ben Nadel's POI wrapper (http://www.bennadel.com/projects/poi-utility.htm) and run a test with that instead of CFSPREADSHEET. Just to find out if it's a POI issue (not likely, admittedly), or CF's usage thereof.
And if using POI directly works OK... you can just use that instead, maybe...
if it's a POI issue (not likely, admittedly), or CF's
If you check the POI archives, you will see a LOT of memory issues with large xlsx files, usually related to how the file is processed. IIRC, the standard way of loading a spreadsheet, i.e. new XSSFWorkbook(name), reads the whole file into memory in one shot. With all the bloated XML and the parsing going on, it is very prone to OOM errors.
For big files, they actually recommend using the event API instead because of its lower memory footprint. I used it a couple of months ago for a side project and it made a huge difference! I did some basic profiling, and the standard method (usermodel) was definitely more prone to big memory spikes and OOM errors. In contrast, memory usage with the event API was far lower and more stable.
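The usermodel-vs-event-API difference described above boils down to DOM-style loading versus SAX-style streaming of the sheet XML inside the xlsx zip. Here's a rough stdlib-only illustration of the streaming side, in plain Python against a made-up worksheet-like XML fragment (not actual POI code): handle each row as its end tag arrives, then discard it, so memory stays roughly flat regardless of row count.

```python
import io
import xml.etree.ElementTree as ET

def count_rows_streaming(xml_file):
    """SAX-style pass: process each <row> as it ends, then free it immediately."""
    rows = 0
    for event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "row":
            rows += 1
            elem.clear()  # drop the element's children so the tree never grows
    return rows

# Tiny stand-in for a worksheet; real xlsx sheet XML is similar in spirit.
sheet = io.BytesIO(
    b"<sheet>" + b"".join(b"<row><c>%d</c></row>" % i for i in range(10000)) + b"</sheet>"
)
print(count_rows_streaming(sheet))  # 10000
```

A DOM-style load (ET.parse, or POI's XSSFWorkbook) would instead build all 10,000 row objects before you touch the first one, which is where the OOM behavior with big xlsx files comes from.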
Reason for bringing all of this up...? Well, depending on how CF actually reads/loads spreadsheets internally, maybe the problem is a little bit of both...? Just my $0.02.
*Oh Crap* Disregard my last response. I just realized I am mixing email threads, and this one is not about reading/converting to csv ...
Well, my job here is done. Off to find a cool libation ... 😉