• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

cfspreadsheet and reading the xlsx files issue

New Here ,
Jun 09, 2011 Jun 09, 2011

Copy link to clipboard

Copied

I am trying to process large excel files (65,000 rows) into the DB.  So I try to use the new cfspreadsheet tag to read in the files by 5000 rows at a time and then process the query into the tables.  The process is working fine when reading the (pre 2003 format "xls" files) but not when reading the new "xlsx" type of files.  The server is running out of memory trying to read those files for some reason.  Is this a known bug in ColdFusion 9.0??  Has anyone experiansed similar issues?

Thanks

Views

5.9K

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jun 10, 2011 Jun 10, 2011

Copy link to clipboard

Copied

The docs claim all versions of Excel from 1997-2007 are supported.

Are you getting this with •any• .xlsx file, or just large ones?

Do you need CF sitting in the middle of this process? Can you not get the DB to import the Excel file directly?

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Jun 10, 2011 Jun 10, 2011

Copy link to clipboard

Copied

I need to do some processing on the data first that comes from the spreadsheet.  This seems to be the problem with the very large xlsx type spreadsheets.  The older format xls (up to 2003) is working great even with 65,000 rows of data.  So I think this is something to do with xlsx and very large spreadsheets.

- Pavel

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jun 10, 2011 Jun 10, 2011

Copy link to clipboard

Copied

That it runs OK with a large .xls & have problems with large .xlsx files does not demonstrate it's specifically a problem with the size of the .xlsx, rather that being a problem with .xlsx files in general. And whether it's one way or the other dictates what the solution might be.

Test (and compare performance) on a smaller file.

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Jun 10, 2011 Jun 10, 2011

Copy link to clipboard

Copied

Yah, I think you are right on the fact that there is a problem when reading the xlsx files with cfspreadsheet tag.

When reading the xlsx file the ColdFusion is using much more JVM memory then reading xls file and its also noticably slower, so I think that there is a problem when reading the xlsx file format in general but as long as the file is small its working fine. But once you get to files with 40,000 + rows the memory issue grows exponantially and server runs out of memory.

- Pavel

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Jun 10, 2011 Jun 10, 2011

Copy link to clipboard

Copied

If you've got time, grab Ben Nadel's POI wrapper (http://www.bennadel.com/projects/poi-utility.htm) and run a test with that instead of CFSPREADSHEET. Just to find out if it's a POI issue (not likely, admittedly), of CF's usage thereof.

And if using POI directly works OK... you can just use that instead, maybe...

--

Adam

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Jun 10, 2011 Jun 10, 2011

Copy link to clipboard

Copied

if it's a POI issue (not likely, admittedly), of CF's

usage thereof.

If you check the POI archives, you should see a LOT of memory issues with large xlsx files. Usually related to how the file is processed. IIRC the standard way of loading a spreadsheet ie new XSSFWorkbook(name) just reads the whole file into memory in one shot. So with all the bloated xml and parsing going on it is very prone to OOM errors.

For big files, they actually recommend using the event api instead, because of its lower footprint. I used it a couple months ago for a side project and it made a huge difference!  I did some basic profiling and the standard method (usermodel) was definitely more prone to big spikes in memory and oom errors. In contrast, memory usage with event api was far lower and more stable.

Reason for bringing this all of this up ...? Well depending on how CF actually reads/loads spreadsheets internally, maybe the problem is a little bit of both .. ?  Just my $0.02

-Leigh

Message was edited by: -==cfSearching==-

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Jun 10, 2011 Jun 10, 2011

Copy link to clipboard

Copied

LATEST

*Oh Crap*  Disregard my last response.  I just realized I am mixing email threads, and this one is not about reading/converting to csv ...

Well, my job here is done. Off to find a cool libation ...  😉

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation