Copy link to clipboard
Copied
I have an application that allows users to upload multi-worksheet Excel files. I use CFSpreadsheet to parse these files and import the data. The processing can take anywhere from a minute to several minutes. The first worksheet contains data that dictates which additional worksheets require processing during the import.
While doing some load testing, I noticed that if two users upload files, the second user has to wait for the first user process to finish before it works. I have added CFLOG statements throughout my application, and done some simplified test scenarios (where I just read one worksheet and dump the result) that prove to me there is a blocking situation. Has anyone else seen this situation? Is there a way around this?
Copy link to clipboard
Copied
Can you post a simplified but complete test case that demonstrates the problem?
Also provide your versions of ColdFusion and JAVA.
Copy link to clipboard
Copied
Thanks for the response.
CF Version: 11,0,08,298512
Java Version: 1.8.0_25
I have also tried this on a different server running CF2016, and have witnessed the same behavior.
I have created two test files in the same directory. I have two XLSX files that each have 2 worksheets (so I can read from the second worksheet specifically). These files are ~300k in size and reside in the same directory as the CFM files. The second worksheets each have ~2000 rows x 10 columns of data.
The code of both CFM files is very similar to this (only changing the values of the "thisExecution" and "thisFile" variables.
I tail the application.log as load both of these from the same computer. (separate computers also present the same behavior).
The code: (for some reason isn't showing up - I have tried editing a few times)
<CFSETTING requesttimeout="6000" >
<CFSET private.thisExecution = "Test1" >
<CFSET private.thisFile = "testfile1.xlsx" >
<CFOUTPUT>#private.thisExecution#: #private.thisFile#: #private.dataType#</CFOUTPUT>
<CFLOG text="#private.thisExecution#: Start: Reading File #private.thisFile#" >
<CFSPREADSHEET action="read" src="#expandpath('#private.thisFile#')#" sheet="2" query="getData" headerrow="1" excludeheaderrow="true" >
<CFLOG text="#private.thisExecution#: End: Reading File #private.thisFile#" >
<CFLOG text="#private.thisExecution#: Sleep: 5 seconds" >
<CFSCRIPT>sleep(5000);</CFSCRIPT>
<CFLOG text="#private.thisExecution#: Start: Dump getData" >
<CFDUMP var="#getData#" >
<CFLOG text="#private.thisExecution#: End Dump getData" >
Picture of the code: (since I can't get it to output in this forum)
The log output:
"Information","ajp-bio-8014-exec-104","12/14/18","13:04:47",,"Test1: Start: Reading File ABC SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-87","12/14/18","13:04:48",,"Test2: Start: Reading File ALSIDE SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-104","12/14/18","13:04:58",,"Test1: End: Reading File ABC SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-104","12/14/18","13:04:58",,"Test1: Sleep: 5 seconds"
"Information","ajp-bio-8014-exec-104","12/14/18","13:05:03",,"Test1: Start: Dump getData"
"Information","ajp-bio-8014-exec-104","12/14/18","13:05:45",,"Test1: End: Dump getData"
"Information","ajp-bio-8014-exec-87","12/14/18","13:05:52",,"Test2: End: Reading File ALSIDE SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-87","12/14/18","13:05:52",,"Test2: Sleep: 5 seconds"
"Information","ajp-bio-8014-exec-87","12/14/18","13:05:57",,"Test2: Start: Dump getData"
"Information","ajp-bio-8014-exec-87","12/14/18","13:06:13",,"Test2: End: Dump getData"
They both start reading the files at around the same time. But script 2 waits until script 1 has finished working with the spreadsheet before it continues.
I am thinking CFSpreadsheet is not thread safe. Is there a way to explicitly kill the process after it is done reading the worksheet in the file? Since it creates a query object with the results of the read, I could duplicate the object, kill CFSpreadsheet and free it up for the next instance to use.
Copy link to clipboard
Copied
I won't have time to test this in my environment this week, but as a quick test, comment out the cfspreadsheet line and give GetData an arbitrary value so that the script completes without error.
If the same sequence of events happens, then cfspreadhseet is unlikely to be the cause. If the behavior is different, then there is a possibility that there is a problem.
As another test, you could replace cfspreadsheet with a binary read of the file and see if it affects the order of processing.
Copy link to clipboard
Copied
Thanks for the suggestions. I have already done a bunch of different tests to debug this, noteably, using a query with a similar amount of data from the database. It doesn't behave the same way. What I have provided here is the simplest way I have been able to reproduce and visualize the issue.
Copy link to clipboard
Copied
Erik, you refer to this as the tag being "not thread safe". Instead, it seems the opposite: it's single-threaded, which is the ULTIMATE in thread safety (only one thread at a time could run it). 🙂
And there may be a good explanation for that: are you running CF Standard (as opposed to Enterprise, Trial, or Developer)? If so, note that one of the things "held back" in Standard is that various tags/functions/script statements are in fact single-threaded. The spreadsheet ones (and indeed all those creating Office-style documents), as well as the PDF ones, and so on.
If you could run your test on a trial version of CF (which runs as Enterprise), and it worked as expected, at least you'd have your answer. Let us know what you find.
Copy link to clipboard
Copied
You are correct. I didn't mean to put the word "NOT" there. That was from previous edits.
I will test this out on a developer edition, as you are correct, we are running CF Standard.
Do you know where in Adobe documentation it explicitly says that these functions are limited to single-threads?
I found this document: Buying guide | Adobe ColdFusion Family but don't see an explanation of what "Restricted Functionality" means.
Copy link to clipboard
Copied
That is indeed the only resource and that phrase is explained only in a footnote on the page.
Copy link to clipboard
Copied
Hi erikpinpointdigital​, the execution of your 2 CFM files was not single-threaded. In fact, the threads that ran the pages were, respectively, ajp-bio-8014-exec-104 and ajp-bio-8014-exec-87.
Having said this, I can still imagine a situation where blocking occurs, just as you observed. The situation is when the 2 CFM files share the same value for the src or query attributes in <cfspreadsheet>. There might then be contention when 2 threads attempt to access the same file or the same query object.
If so, then the solution is obvious. Make sure the values of the attributes src and query in the first CFM file differ from the respective values in the second CFM file.
Copy link to clipboard
Copied
BKBK​ thanks. I had a hunch that might be an issue, and as such did indeed make sure it was reading two different files and using two different query attributes. I have even gone so far as to try copying the queries from the spreadsheet read into new variables and deleting the original queries. This really has me stumped. In 20+ years of CF development, this is the first time I haven't been able to find a solution or work around for something that puzzles me.
Copy link to clipboard
Copied
A sleep-time of 5 seconds is too short to enable us to draw any conclusions. Use 60 seconds, for example. I did.
I placed two copies of an Excel file on my desktop. I then opened page1.cfm in the browser and, within several seconds, page2.cfm.
page1.cfm
<cfsetting requesttimeout="66000">
<cfset thisExecution="Execution1">
<cfset thisFile="file1">
<cflog text="#thisExecution#. Start reading file: #thisFile#" >
<cfspreadsheet action="read" src="C:\Users\BKBK\Desktop\shipType.xlsx" query="shipData" sheet="2" headerrow="1" excludeheaderrow="true">
<cflog text="#thisExecution#. End reading file: #thisFile#" >
<cflog text="#thisExecution#. Start sleep" >
<cfset sleep(60000)>
<cflog text="#thisExecution#. End sleep / start dump in #thisExecution#" >
<cfdump var="#shipData#" >
<cflog text="#thisExecution#: end dump in #thisExecution#" >
page2.cfm
<cfsetting requesttimeout="6000">
<cfset thisExecution="Execution2">
<cfset thisFile="file2">
<cflog text="#thisExecution#. Start reading file: #thisFile#" >
<cfspreadsheet action="read" src="C:\Users\BKBK\Desktop\shipType2.xlsx" query="shipData2" sheet="2" headerrow="1" excludeheaderrow="true">
<cflog text="#thisExecution#. End reading file: #thisFile#" >
<cflog text="#thisExecution#: start dump in #thisExecution#" >
<cfdump var="#shipData2#" >
<cflog text="#thisExecution#: end dump in #thisExecution#" >
The result was as expected:
"Information","http-nio-8500-exec-2","12/17/18","21:37:16","FORUM","Execution1. Start reading file: file1"
"Information","http-nio-8500-exec-2","12/17/18","21:37:17","FORUM","Execution1. End reading file: file1"
"Information","http-nio-8500-exec-2","12/17/18","21:37:17","FORUM","Execution1. Start sleep"
"Information","http-nio-8500-exec-10","12/17/18","21:37:28","FORUM","Execution2. Start reading file: file2"
"Information","http-nio-8500-exec-10","12/17/18","21:37:28","FORUM","Execution2. End reading file: file2"
"Information","http-nio-8500-exec-10","12/17/18","21:37:28","FORUM","Execution2: start dump in Execution2"
"Information","http-nio-8500-exec-10","12/17/18","21:37:32","FORUM","Execution2: end dump in Execution2"
"Information","http-nio-8500-exec-2","12/17/18","21:38:17","FORUM","Execution1. End sleep / start dump in Execution1"
"Information","http-nio-8500-exec-2","12/17/18","21:38:20","FORUM","Execution1: end dump in Execution1"
Copy link to clipboard
Copied
Thank you for the continued effort with helping try to figure this out.
I increased my sleep to 60 seconds.
You can see that the behavior between our environments is different. Also, where is your "Start Dump in Execution1" ?
"Information","ajp-bio-8014-exec-156","12/17/18","16:50:22",,"Test1: Start: Reading File ABC SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-155","12/17/18","16:50:24",,"Test2: Start: Reading File ALSIDE SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-156","12/17/18","16:50:38",,"Test1: End: Reading File ABC SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-156","12/17/18","16:50:38",,"Test1: Sleep: 60 seconds"
"Information","ajp-bio-8014-exec-156","12/17/18","16:51:38",,"Test1: Start: Dump getData1"
"Information","ajp-bio-8014-exec-156","12/17/18","16:52:02",,"Test1: End: Dump getData1"
"Information","ajp-bio-8014-exec-155","12/17/18","16:52:03",,"Test2: End: Reading File ALSIDE SUPPLY ROOFING.xlsx"
"Information","ajp-bio-8014-exec-155","12/17/18","16:52:03",,"Test2: Sleep: 60 seconds"
"Information","ajp-bio-8014-exec-155","12/17/18","16:53:03",,"Test2: Start: Dump getData2"
"Information","ajp-bio-8014-exec-155","12/17/18","16:53:24",,"Test2: End: Dump getData2"
Copy link to clipboard
Copied
erikpinpointdigital wrote
...where is your "Start Dump in Execution1" ?
"Information","http-nio-8500-exec-2","12/17/18","21:38:17","FORUM","Execution1. End sleep / start dump in Execution1
erikpinpointdigital wrote
I increased my sleep to 60 seconds.
You can see that the behavior between our environments is different.
I think the difference so far is in what we are testing. I am using just one sleep(), you are using two.
My test was to see if the first request blocked the second. If it did the second request would have had to wait 60 seconds before completing. It didn't.
Copy link to clipboard
Copied
Guys, can you both stop and clarify whether your testing is each on cf Standard or on Enterprise/trial/developer?
If it's on Standard, then no sleeping will help. They will not be able to run concurrently. Again it's CF that is imposing the single threading.
And BK, while you're right that you would indeed SEE both threads listed in a thread dump as "running" an ajp thread at the same time, the proof that it's being single-threaded by CF Standard is that the thread that is held up will be in a method referring to this class:
coldfusion.featurerouter.edition.StandardServerEdition
So can you both start with confirming the edition you are running? And Erik, if you are Standard, and you're seeing a stack trace or thread dump at the time, can you tell us if the request hanging shows being in this class at that moment?
Copy link to clipboard
Copied
Hi Charlie,
I am running the Developer Edition.
Copy link to clipboard
Copied
And Erik?
Copy link to clipboard
Copied
I am running Standard in a production environment. Client can't justify the expense of moving to Enterprise for this solution, so we have discussed going a different route, and creating a processing queue for files that are uploaded by the customers. It was nice to be able to do it realtime, as I had programmed a "console" that showed them the progress (and any potential errors) of their file as I was parsing it. But it is what it is, I suppose.
Copy link to clipboard
Copied
The code you have shown may be run simultaneously on separate CFM pages in ColdFusion 11 Standard. With suitable changes, such as in variable names, of course. If that were not so, then websites built with ColdFusion 11 Standard would only handle one visitor at a time.
Copy link to clipboard
Copied
Well, bkbk, again the request WILL be single-threaded while running the cfspreadsheet. No changing of var names will alter that.
It's a fundamental limitation of cf standard (that some tags are indeed single-threaded) which explains Erik's original problem posted here.
No, it's not that ALL "websites built with ColdFusion Standard would only handle one visitor at a time", but yes it's that any using such an EFR-limited tag /feature will run only when no other request is using that same tag.
So one either accepts that limitation (and the delay it can cause) , or they manage the work (like Erik is proposing), or they move to CF Enterprise (or Lucee, assuming it supports the desired tag/feature).
Or instead of switching their Cf engine over ENTIRELY, one could implement such a alternate engine (even on the same machine as CF) and call it via cfhttp or REST, to have that engine run that tag/feature and return the result to CF.
(It may be possible to find a CF Enterprise license for an old CF version on the secondary market. Since it need not be open to access to other than such inter-process communications, the negative of using that old version beyond its end-of-life even would seemingly be reduced.)
Just offering ideas for someone feeling in a bind.
Copy link to clipboard
Copied
Hi Charlie,
On re-reading my replies I realize they might give the impression that I disagree with your point. Sorry.
With "changing of var names" and "websites built with ColdFusion Standard would only handle one visitor at a time" I meant something else. Namely, the independence of the server's responses to separate CFM requests. Best-practice in encapsulation and client-server architecture require that the client know nothing about how the server handles in-coming requests. Further explanation follows in a moment.
But first, I agree with your point about ColdFusion's Enterprise Feature Router (EFR). In fact, given its importance. I shall repeat it here. Lest there be any confusion.
The EFR restricts ColdFusion Standard Edition to one, single thread when processing a function or tag in any of the following categories:
Asynchronous CFML Gateway
Data Management Service
Data Services Messaging
Directory Watcher Gateway
Enterprise IM Gateways (Sametime and XMPP)
Gateway Architecture
Interface For Publish/Subscribe Using Web Socket
JMS Gateway
Microsoft Office 2010 Excel Support
Online Presentation Generation
PDF Document Manipulation
PDF File Generation From Html
PDF Form Processing
Server-Side Printing
SMS Gateway
Structured Reporting
TCP/IP Socket Gateway
There is no such restriction in ColdFusion Enterprise Edition. There these functions and tags each enjoy multi-threaded access and processing.
Now, to the point I wish to make. The suggestion to use distinct names for variables is to avoid them being overwritten. In spite of the EFR restriction in ColdFusion Standard, multiple CFM requests may access an EFR-restricted function or tag, such as <cfspreadsheet>, simultaneously. ColdFusion Standard does not queue or block the requests. It queues or blocks just that part of the request that invokes the function or tag.
The remaining business of the CFM request is not queued or blocked. The moment ColdFusion finishes processing the EFR-restricted function or tag it hands processing back to the respective request threads. Typically, the remaining business may consume the most execution time in the request-thread. But that wont be because of the EFR restriction.
To conclude, in spite of the EFR restriction, ColdFusion Standard has been optimized to give the semblance of multi-threading.