cfspreadsheet file size problem

  • September 9, 2011
  • Answered
  • 1 reply
  • 4692 views

When I am reading in a list of Excel spreadsheets, sorted by size, with <cfspreadsheet>, everything goes smoothly until the tag hits a file bigger than 2MB. That is when I get this error:

An error occurred while reading the Excel: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0xFF left 6 bytes remaining still to be read..

This happens consistently at a file size of 2MB.  The files are Excel 2003.  The only important factor I can see is that I have quite a few columns in my spreadsheet.  The columns go out all the way to "CX".

Does <cfspreadsheet> have some sort of limitation when it comes to file size and/or columns?

TIA

Zach
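
For reference, here is a minimal sketch of the kind of read loop described above; the directory path, sort order, and variable names are assumptions, not the original code:

```cfml
<!--- List the .xls files, smallest first (path and filter are assumed) --->
<cfdirectory action="list" directory="C:\reports" filter="*.xls"
             sort="size ASC" name="qFiles">

<cfloop query="qFiles">
    <!--- Read each workbook into a query; this is the call that fails
          once a file is bigger than roughly 2MB --->
    <cfspreadsheet action="read"
                   src="#qFiles.directory#\#qFiles.name#"
                   query="qSheetData">
    <cfoutput>#qFiles.name#: #qSheetData.recordCount# rows<br></cfoutput>
</cfloop>
```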

    This topic has been closed for replies.
    Correct answer, from -__cfSearching__-, replying to this follow-up from Zach:

    I can remove records from the beginning, middle, or end, and it won't load until the file is smaller than 2MB.  And yes, you are right, 2MB isn't very many records (just over 2000, in my case).  Remember that each record runs all the way out to column "CX", so the rows themselves are very long.

    As for malformed files, that logic would mean I wouldn't be able to use cfspreadsheet on any of the files.  The tag works great on the files under 2MB that I downloaded directly from the portal.

    As it stands right now, I have to chop the files up into smaller, 1000-record "bite size" files.  After doing that, cfspreadsheet breezes right through them.

    When I first saw this problem a few months back, I poured hours into searching it up and down on Google and found very little.  What I did find was mainly about problems and bugs in the Java code (nothing to do with CF itself, just the Java library that cfspreadsheet apparently uses), and all of those hits were years old.


    I do not know what is causing the problem. But having read much bigger files, with more columns, I am not yet convinced it is solely a size problem.

    - Can you read any files over 2MB or ones with a similar number of rows and columns?

    - Any odd characters or charsets involved?

    - Any idea what program produced the files? Either from viewing the metadata or the headers?


    Did you ever try running the files through POI's BiffViewer?
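
    In case it helps, here is a rough sketch of what running a problem file through BiffViewer could look like (not from this thread; the jar name and all paths are assumptions, and BiffViewer just needs a POI jar on the classpath plus the .xls path as an argument):

    ```cfml
    <!--- Sketch only: dump the BIFF records of one problem file with POI's BiffViewer.
          From a command prompt the equivalent is roughly:
              java -cp poi.jar org.apache.poi.hssf.dev.BiffViewer problem.xls
          Assumes java is on the PATH; jar location, file paths, and timeout are made up. --->
    <cfexecute name="java"
               arguments='-cp "C:\path\to\poi.jar" org.apache.poi.hssf.dev.BiffViewer "C:\reports\problem.xls"'
               outputFile="C:\temp\biffdump.txt"
               timeout="300">
    </cfexecute>
    ```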


    September 9, 2011

    Hard to say. But I tend to doubt size is the cause seeing as how it is only 2MB.

    - Can you post a sample file?

    - Were all of the files created by Excel 2003 or a different program?

    September 9, 2011

    Unfortunately, I cannot post a sample because it is chock full of Personal Health Information (PHI).  I wish I could.  Even going through and trying to replace the PHI with dummy info would take quite a bit of time, not to mention it would be dangerous if I missed something.

    The files are all created in Excel 2003 format by one of the State of Colorado's websites.  (Actually, the quote from the website is "This report is available in Excel format.")  The files are not ".xlsx" files, i.e. not the Excel 2007/2010 formats.  We just download them from their site.

    The reason I have leaned towards file size (or size in general) is the error message's mention of "6 bytes".  If I remove records from the file to reduce its total size below 2MB, it loads just fine.  Is this error message familiar to anyone?

    September 13, 2011

    I generated a 10MB xls file with columns A-CX populated and 2000 rows.  Each cell had a UUID in it.

    It loads fine (taking between 500-1500ms).
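
    For anyone wanting to reproduce that test, a rough approximation (not Adam's actual code; sheet name, paths, and loop bounds are assumptions) of building and timing such a file in CFML:

    ```cfml
    <!--- Build a test workbook: 2000 rows, columns A through CX (102 columns),
          a UUID in every cell.  Paths and names are assumptions. --->
    <cfset testSheet = SpreadsheetNew("test")>
    <cfloop from="1" to="2000" index="r">
        <cfset rowData = arrayNew(1)>
        <cfloop from="1" to="102" index="c">
            <cfset arrayAppend(rowData, createUUID())>
        </cfloop>
        <!--- SpreadsheetAddRow takes a comma-delimited list of cell values --->
        <cfset SpreadsheetAddRow(testSheet, arrayToList(rowData))>
    </cfloop>
    <cfset SpreadsheetWrite(testSheet, "C:\temp\bigtest.xls", true)>

    <!--- Time the read, as described above --->
    <cfset start = getTickCount()>
    <cfspreadsheet action="read" src="C:\temp\bigtest.xls" query="qTest">
    <cfoutput>#qTest.recordCount# rows read in #getTickCount() - start#ms</cfoutput>
    ```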


    CX isn't actually that wide, on reflection.  It's only 102 columns.

    I still think you've got some bung data in there.  Can you replace [any character] with "A" (i.e. so it's got the same volume of data, but no weird characters in there), and then try to read it in again?  See how you go.

    --

    Adam


    I will try to fill a spreadsheet with similar, non-PHI data, and see if I can get that shoved through.  If I can't get that through, I will post the spreadsheet.  I will post the results either way.