Exit
  • Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
  • 한국 커뮤니티
0

cfspreadsheet file size problem

Explorer ,
Sep 09, 2011 Sep 09, 2011

When I am reading in a list of Excel spreadsheets, sorted by size,

with <cfspreadsheet>, everything goes smoothly until the tag hits a file

bigger than 2MB.  That is when I get this error:

An error occurred while reading the Excel: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0xFF left 6 bytes remaining still to be read..

This happens consistantly at a file size of 2MB.  The files are Excel 2003.  The only important factor I can see is that I have quite a few columns in my spreadsheet.  The columns go out all the way to "CX".

Does <cfspreadsheet> have some sort of limitation when it comes to file size and/or columns?

TIA

Zach

4.7K
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

correct answers 1 Correct answer

Valorous Hero , Sep 09, 2011 Sep 09, 2011

I do not know what is causing the problem. But having read much bigger files, with more columns, I am not yet convinced it is solely a size problem.

- Can you read any files over 2MB or ones with a similar number of rows and columns?

- Any odd characters or charsets involved?

- Any idea what program produced the files? Either from viewing the metadata or the headers?

When I first saw this problem a few months back, I poured

hours into searching it up and down on Google and found very

little. 

Did yo

...
Translate
Valorous Hero ,
Sep 09, 2011 Sep 09, 2011

Hard to say. But I tend to doubt size is the cause seeing as how it is only 2MB.

- Can you post a sample file?

- Were all of the files created by Excel 2003 or a different program?

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 09, 2011 Sep 09, 2011

Unfortunately, I cannot post a sample because it is chock full of

Personal Health Information (PHI).  I wish I could.  Even going through and trying to replace the PHI with dummy info would take quite a bit of time, not to mention dangerous if I missed something.

The files are all created in Excel 2003 format by one of the State of Colorado's websites.  (Actually, the quote from the website is "This report is available in Excel format.")  The files are not ".xlsx" for 2007 and 2010.  We just download them from their site.

The reason I have leaned towards file size (or size in general) is because of the error message talking about "6 bytes".  If I remove records from the file to reduce its total size below 2MB, it loads just fine.  Is this error message familiar to anyone?

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 09, 2011 Sep 09, 2011

not to mention dangerous if

I missed something.

Kudos on a very wise decision.

 

The files are all created in Excel 2003 format by one of

the State of Colorado's websites.  (Actually, the quote

from the website is "This report is available in Excel

format.")  The files are not ".xlsx" for 2007 and 2010. 

We just download them from their site.

Hmm... I wonder what they used to produce the files. Any clues if you dump the spreadsheetinfo details? I did a brief search for that error on the POI lists. A few threads mentioned malformed files can cause that error. I do not know if it applies to your files though.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 09, 2011 Sep 09, 2011

The reason I have leaned towards file size (or size in general) is because of the error message talking about "6 bytes".  If I remove records from the file to reduce its total size below 2MB, it loads just fine.  Is this error message familiar to anyone?

Good analysis.  Does it matter which records you remove?  If you're chopping them from the "edges" of the file, you might just be removing some data that is causing grief.

I've got no special knowledge to go on, but 2MB doesn't sound like a very big Excel file to me.  That being the case, if POI had a problem with it, then we'd've heard about it.

Did you google this?  Anything?

--

Adam

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 09, 2011 Sep 09, 2011

I can remove records from the beginning, middle, or end, and it won't load until the file is smaller than 2MB.  And yes, you are right, 2MB isn't very many records (just over 2000, in my case).  Remember, that the records have a length of the last column being "CX", which is very long.

As far as malformed files, that logic would mean that I wouldn't be able to use cfspreadsheet on any of the files.  The tag works great on the files less than 2MB, that I downloaded directly from the portal.

As it stands right now, I have to chop up the files into 1000-record "bite site" smaller files.  After doing that, cfspreadsheet breezes right through them.

When I first saw this problem a few months back, I poured hours into searching it up and down on Google and found very little.  What I did find was mainly about problems and bugs in the Java code (nothing having to do with CF, just the Java function(s) that CFSpreadsheet uses apparently).  All of these hits on Google being years old.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 09, 2011 Sep 09, 2011

I do not know what is causing the problem. But having read much bigger files, with more columns, I am not yet convinced it is solely a size problem.

- Can you read any files over 2MB or ones with a similar number of rows and columns?

- Any odd characters or charsets involved?

- Any idea what program produced the files? Either from viewing the metadata or the headers?

When I first saw this problem a few months back, I poured

hours into searching it up and down on Google and found very

little. 

Did you ever try running the files through POI's BiffViewer?

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 13, 2011 Sep 13, 2011

I generated a 10MB xls file which had cols A-CX populated, and 2000 rows.  Each cell had a UUID in it.

It loads fine (taking between 500-1500ms).

I can remove records from the beginning, middle, or end, and it won't load until the file is smaller than 2MB.  And yes, you are right, 2MB isn't very many records (just over 2000, in my case).  Remember, that the records have a length of the last column being "CX", which is very long.

CX isn't actually that wide, on reflection.  It's only 101 columns.

I still think you've got some bung data in there.  Can you replace [any character] with "A" (ie: so it's got the same volume of data, but just no weird characters in there), and then try to read it in again.  See how you go.

--

Adam

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 13, 2011 Sep 13, 2011

Try one of POI's dev classes like the biff viewer. That will tell you very quickly what it is choking on.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 13, 2011 Sep 13, 2011

I am not a full-fledged Java Dev (yet).  Would you post some CF and/or CFScript code, explaining how to use the Biff Viewer?  Thank you!

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 13, 2011 Sep 13, 2011

You can run it via CF if you prefer. Just supply the "--out" parameter and full path to your file. The output file name will be the same plus ".out".

The file can get pretty big. It will contain a LOT of really low level stuff. Most of which you would have to be a POI developer to understand. I am guessing the viewer will not make it through your file and will throw an exception. But that should at least tell us where it is choking.

Let me know if the forum mangles this snippet. Fix up the tags as needed.

{cfset viewer = createObject("java", "org.apache.poi.hssf.dev.BiffViewer")}

{cfset args = ["--out", "c:/path/to/myFile.xls"]}

{cfset viewer.main(args)}

-Leigh

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 13, 2011 Sep 13, 2011

>> I am guessing the viewer will not make it through your file and will

>> throw an exception.

Which would confirm the problem is with POI. If for some strange reason it did not throw any exceptions, that would suggest the problem may be with the CF layer instead.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 13, 2011 Sep 13, 2011

I will try to fill a spreadsheet with similar, non-PHI data, and see if I can get that shoved through.  If I can't get that through, I will post the spreadsheet.  I will post the results either way.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 13, 2011 Sep 13, 2011

I will try to fill a spreadsheet with similar, non-PHI

data, and see if I can get that shoved through.

Okay, good. That would answer my earlier question #1. Any updates on #2 and #3?

ie

#2-Any odd characters or charsets involved?

#3- Any idea what program produced the files? Either from viewing the metadata or the headers?

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 16, 2011 Sep 16, 2011

OK, hidden "crappy" data or something 'else' is the answer.  In messing around with the data, I decided to just take one of the files, directly downloaded from the State, and do this:

- Open it up in Excel (2010)

- Push the Save button

- Close Excel

I didn't save it in any other format, didn't change anything, etc.  Just 'saved' it.  The result was a file that was a few KB less, but not significant.

I ran CFSpreadsheet over the 2MB file again...and it loaded.  Jeez.  SO, it looks like the State's processing of these "Excel" files is adding a little something that was bungeing up this tag.  After I did this to every >2MB file, my code zipped through them all like a charm!

Thank you to everyone for helping me with this!  Sorry so anticlimatic.

Zach

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 16, 2011 Sep 16, 2011

OK, hidden "crappy" data or something 'else' is the

answer. 

Out of curiosity, did the poi dump crash or tell you anything you could report to the POI team?

 

Thank you to everyone for helping me with this!  Sorry so

anticlimatic.

lol. Someone actually mentioned using that exact same fix on one of the POI threads. I was going to suggest that, but thought you had already tried it in your previous experiments. Wish I had suggested it now! But at least you have temporarily worked around the problem

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 16, 2011 Sep 16, 2011

To tell you the truth, after having the epiphany of just re-saving the files in Excel, I never got around to doing the Biff Viewer testing.  One of these days I might.  Heaven knows the State won't be fixing their side of this equation soon...

Thanks again!

Zach

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Sep 16, 2011 Sep 16, 2011
LATEST

Heaven knows the State won't be fixing their side of this

equation soon...

Yep, when I heard the word "State" that was a pretty much a given in my mind I was thinking in terms of finding a work-around. Or at least leaving a few more clues for the next guy. So if you do discover any more clues in the future, feel free to post back. But in any case, I am glad to hear you found something that works for you.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources