Copy link to clipboard
Copied
When I am reading in a list of Excel spreadsheets, sorted by size,
with <cfspreadsheet>, everything goes smoothly until the tag hits a file
bigger than 2MB. That is when I get this error:
This happens consistantly at a file size of 2MB. The files are Excel 2003. The only important factor I can see is that I have quite a few columns in my spreadsheet. The columns go out all the way to "CX".
Does <cfspreadsheet> have some sort of limitation when it comes to file size and/or columns?
TIA
Zach
I do not know what is causing the problem. But having read much bigger files, with more columns, I am not yet convinced it is solely a size problem.
- Can you read any files over 2MB or ones with a similar number of rows and columns?
- Any odd characters or charsets involved?
- Any idea what program produced the files? Either from viewing the metadata or the headers?
When I first saw this problem a few months back, I poured
hours into searching it up and down on Google and found very
little.
Did yo
...Copy link to clipboard
Copied
Hard to say. But I tend to doubt size is the cause seeing as how it is only 2MB.
- Can you post a sample file?
- Were all of the files created by Excel 2003 or a different program?
Copy link to clipboard
Copied
Unfortunately, I cannot post a sample because it is chock full of
Personal Health Information (PHI). I wish I could. Even going through and trying to replace the PHI with dummy info would take quite a bit of time, not to mention dangerous if I missed something.
The files are all created in Excel 2003 format by one of the State of Colorado's websites. (Actually, the quote from the website is "This report is available in Excel format.") The files are not ".xlsx" for 2007 and 2010. We just download them from their site.
The reason I have leaned towards file size (or size in general) is because of the error message talking about "6 bytes". If I remove records from the file to reduce its total size below 2MB, it loads just fine. Is this error message familiar to anyone?
Copy link to clipboard
Copied
not to mention dangerous if
I missed something.
Kudos on a very wise decision.
The files are all created in Excel 2003 format by one of
the State of Colorado's websites. (Actually, the quote
from the website is "This report is available in Excel
format.") The files are not ".xlsx" for 2007 and 2010.
We just download them from their site.
Hmm... I wonder what they used to produce the files. Any clues if you dump the spreadsheetinfo details? I did a brief search for that error on the POI lists. A few threads mentioned malformed files can cause that error. I do not know if it applies to your files though.
Copy link to clipboard
Copied
The reason I have leaned towards file size (or size in general) is because of the error message talking about "6 bytes". If I remove records from the file to reduce its total size below 2MB, it loads just fine. Is this error message familiar to anyone?
Good analysis. Does it matter which records you remove? If you're chopping them from the "edges" of the file, you might just be removing some data that is causing grief.
I've got no special knowledge to go on, but 2MB doesn't sound like a very big Excel file to me. That being the case, if POI had a problem with it, then we'd've heard about it.
Did you google this? Anything?
--
Adam
Copy link to clipboard
Copied
I can remove records from the beginning, middle, or end, and it won't load until the file is smaller than 2MB. And yes, you are right, 2MB isn't very many records (just over 2000, in my case). Remember, that the records have a length of the last column being "CX", which is very long.
As far as malformed files, that logic would mean that I wouldn't be able to use cfspreadsheet on any of the files. The tag works great on the files less than 2MB, that I downloaded directly from the portal.
As it stands right now, I have to chop up the files into 1000-record "bite site" smaller files. After doing that, cfspreadsheet breezes right through them.
When I first saw this problem a few months back, I poured hours into searching it up and down on Google and found very little. What I did find was mainly about problems and bugs in the Java code (nothing having to do with CF, just the Java function(s) that CFSpreadsheet uses apparently). All of these hits on Google being years old.
Copy link to clipboard
Copied
I do not know what is causing the problem. But having read much bigger files, with more columns, I am not yet convinced it is solely a size problem.
- Can you read any files over 2MB or ones with a similar number of rows and columns?
- Any odd characters or charsets involved?
- Any idea what program produced the files? Either from viewing the metadata or the headers?
When I first saw this problem a few months back, I poured
hours into searching it up and down on Google and found very
little.
Did you ever try running the files through POI's BiffViewer?
Copy link to clipboard
Copied
I generated a 10MB xls file which had cols A-CX populated, and 2000 rows. Each cell had a UUID in it.
It loads fine (taking between 500-1500ms).
I can remove records from the beginning, middle, or end, and it won't load until the file is smaller than 2MB. And yes, you are right, 2MB isn't very many records (just over 2000, in my case). Remember, that the records have a length of the last column being "CX", which is very long.
CX isn't actually that wide, on reflection. It's only 101 columns.
I still think you've got some bung data in there. Can you replace [any character] with "A" (ie: so it's got the same volume of data, but just no weird characters in there), and then try to read it in again. See how you go.
--
Adam
Copy link to clipboard
Copied
Try one of POI's dev classes like the biff viewer. That will tell you very quickly what it is choking on.
Copy link to clipboard
Copied
I am not a full-fledged Java Dev (yet). Would you post some CF and/or CFScript code, explaining how to use the Biff Viewer? Thank you!
Copy link to clipboard
Copied
You can run it via CF if you prefer. Just supply the "--out" parameter and full path to your file. The output file name will be the same plus ".out".
The file can get pretty big. It will contain a LOT of really low level stuff. Most of which you would have to be a POI developer to understand. I am guessing the viewer will not make it through your file and will throw an exception. But that should at least tell us where it is choking.
Let me know if the forum mangles this snippet. Fix up the tags as needed.
{cfset viewer = createObject("java", "org.apache.poi.hssf.dev.BiffViewer")}
{cfset args = ["--out", "c:/path/to/myFile.xls"]}
{cfset viewer.main(args)}
-Leigh
Copy link to clipboard
Copied
>> I am guessing the viewer will not make it through your file and will
>> throw an exception.
Which would confirm the problem is with POI. If for some strange reason it did not throw any exceptions, that would suggest the problem may be with the CF layer instead.
Copy link to clipboard
Copied
I will try to fill a spreadsheet with similar, non-PHI data, and see if I can get that shoved through. If I can't get that through, I will post the spreadsheet. I will post the results either way.
Copy link to clipboard
Copied
I will try to fill a spreadsheet with similar, non-PHI
data, and see if I can get that shoved through.
Okay, good. That would answer my earlier question #1. Any updates on #2 and #3?
ie
#2-Any odd characters or charsets involved?
#3- Any idea what program produced the files? Either from viewing the metadata or the headers?
Copy link to clipboard
Copied
OK, hidden "crappy" data or something 'else' is the answer. In messing around with the data, I decided to just take one of the files, directly downloaded from the State, and do this:
- Open it up in Excel (2010)
- Push the Save button
- Close Excel
I didn't save it in any other format, didn't change anything, etc. Just 'saved' it. The result was a file that was a few KB less, but not significant.
I ran CFSpreadsheet over the 2MB file again...and it loaded. Jeez. SO, it looks like the State's processing of these "Excel" files is adding a little something that was bungeing up this tag. After I did this to every >2MB file, my code zipped through them all like a charm!
Thank you to everyone for helping me with this! Sorry so anticlimatic.
Zach
Copy link to clipboard
Copied
OK, hidden "crappy" data or something 'else' is the
answer.
Out of curiosity, did the poi dump crash or tell you anything you could report to the POI team?
Thank you to everyone for helping me with this! Sorry so
anticlimatic.
lol. Someone actually mentioned using that exact same fix on one of the POI threads. I was going to suggest that, but thought you had already tried it in your previous experiments. Wish I had suggested it now! But at least you have temporarily worked around the problem ![]()
Copy link to clipboard
Copied
To tell you the truth, after having the epiphany of just re-saving the files in Excel, I never got around to doing the Biff Viewer testing. One of these days I might. Heaven knows the State won't be fixing their side of this equation soon...
Thanks again!
Zach
Copy link to clipboard
Copied
Heaven knows the State won't be fixing their side of this
equation soon...
Yep, when I heard the word "State" that was a pretty much a given in my mind
I was thinking in terms of finding a work-around. Or at least leaving a few more clues for the next guy. So if you do discover any more clues in the future, feel free to post back. But in any case, I am glad to hear you found something that works for you.
Find more inspiration, events, and resources on the new Adobe Community
Explore Now