Participant
April 10, 2025
Question

How to Retrieve All Files from S3 Bucket When There Are More Than 1000 Files Using ColdFusion?


Hello,

I am currently working on an S3 integration using ColdFusion, and I'm facing an issue when trying to retrieve all the files from an S3 bucket. I am using the listAll() method, but it seems that it only returns a maximum of 1000 files, and I need to handle cases where there are more than 1000 files in the bucket.

I understand that Amazon S3 uses pagination for listing objects, and I need to handle the NextContinuationToken to paginate through the results, but I'm unsure how to implement this correctly in ColdFusion.

Has anyone experienced this issue or can offer advice on how to paginate through the list of objects in S3 using ColdFusion to retrieve more than 1000 files?

Here is the code I'm currently using:
<cfset allObjects = bucket.listAll()>

Can someone please guide me on how to modify this to handle pagination when there are more than 1000 objects in the bucket?

Thanks in advance!

    3 replies

    BKBK
    Community Expert
    April 12, 2025
    <!--- Function to paginate an array --->
    <cffunction name="paginateArray" returntype="struct">
    	
    	<cfargument name="inputArray" type="array" required="yes" hint="Array that consists of a list to be paginated. That is, an array you want to be split up into sub-arrays.">	
    	<cfargument name="inputNumberPerPage" type="numeric" required="yes" hint="The maximum number of objects per page. That is, the maximum number of elements per sub-array.">
    
    	<!--- A page = a sub-array of the input array --->
    	<cfset var page = arrayNew(1)>
    	
    	<!--- The data to be returned will be assembled in a struct --->
    	<cfset var returnData = structNew()>
    
    	<cfset var numberOfObjects = arrayLen(arguments.inputArray)>
    	<cfset var numberOfObjectsPerPage = arguments.inputNumberPerPage>
    	<cfset var numberOfPages = ceiling(numberOfObjects/numberOfObjectsPerPage)>
    	
    	<!--- Initialize page counter --->
    	<cfset var np = 0>
    	
    	<cfif numberOfObjects GT 0>
    		<cfloop index="np" from="1" to="#numberOfPages#">
    			<cfif np LT numberOfPages>
    				<!--- Pages that are 'full'. That is, pages each of which contains the max number of objects per page --->
    				<cfset page[np] = arraySlice(arguments.inputArray,1+(np-1)*numberOfObjectsPerPage,numberOfObjectsPerPage)>
    			<cfelse>
    				<!--- This line handles the last page separately. That is because the last page may contain fewer than the max number of objects allowed per page --->
    				<cfset page[np] = arraySlice(arguments.inputArray,1+(np-1)*numberOfObjectsPerPage,numberOfObjects-(np-1)*numberOfObjectsPerPage)>
    			</cfif>
    		</cfloop>		
    	</cfif>
    	
    	<!--- Assemble return data --->
    	<cfset returnData = { "numberOfObjects":numberOfObjects,
    						  "numberOfObjectsPerPage":numberOfObjectsPerPage,
    						  "numberOfPages":numberOfPages,
    						  "page":page }>
    						  
    	<cfreturn returnData>
    	
    </cffunction>

     

    <!--- Test run --->
    <cfset testArray=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q']>
    <cfset numberOfObjectsPerPage=4>
    
    <cfset paginationResult = paginateArray(testArray,numberOfObjectsPerPage)>
    <p>
    	<cfoutput>
    		Number of objects: #paginationResult.numberOfObjects# <br>
    		Number of objects per page: #paginationResult.numberOfObjectsPerPage# <br>
    		Number of pages: #paginationResult.numberOfPages# <br>
    	</cfoutput>
    </p>
    	
    <cfloop index="n" from="1" to="#paginationResult.numberOfPages#">
    	<p>
    		<cfdump var="#paginationResult.page[n]#" label="paginationResult.page[#n#]">
    	</p>
    </cfloop>
    		
    Participating Frequently
    April 11, 2025

    Here is a somewhat crude example. It appends each batch returned for the bucket to an array. The key is to set options.marker to the key of the last object in the previous batch, so that the next request knows where to start.


            <cfscript>
                storage_bucket = "Your Bucket Name";
                prefix = {
                    "prefix" = "PREFIX/NAME/IF/YOU/ONLY/NEEDED/Sub/Objects/"
                };

                allObjects = []; // Array to store all objects
                marker = ""; // Marker to track the last object
                maxObjects = 1000; // S3 returns at most 1000 objects per list request

                do {
                    // Prepare the options for the listAll method
                    options = {
                        "prefix": prefix.prefix
                    };

                    // Add the marker only if it's not blank
                    if (len(marker) > 0) {
                        options.marker = marker;
                    }

                    // Fetch objects with the current options
                    bucketList = s3Obj.bucket(storage_bucket, false).listAll(options);

                    bucketList = bucketList['response'];

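                    // Append the current batch as a single element, so allObjects becomes an array of batches
                    // (pass true as a third argument to arrayAppend to merge the objects into one flat array)
                    arrayAppend(allObjects, bucketList);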

                    // Update the marker to the key of the last object in the current batch
                    if (arrayLen(bucketList) > 0) {
                        marker = bucketList[arrayLen(bucketList)].key;
                    }

                } while (arrayLen(bucketList) == maxObjects); // Continue if the batch size is 1000

                // Output all objects, then stop processing
                writeDump(var=allObjects, abort=true);
            </cfscript>
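
    The same marker loop can be wrapped in a function so that the result is one flat array. This is only a minimal sketch under some assumptions: collectAllObjects, fetchPage, fakeKeys and fakeFetch are hypothetical names used for illustration, and fakeFetch stands in for the real S3 call so that the loop can be tried without a bucket.

    ```cfml
    <cfscript>
        /**
         * Calls fetchPage(marker) repeatedly until a page shorter than pageSize comes back.
         * fetchPage is a hypothetical callback: it receives the last key seen ("" on the
         * first call) and returns an array of structs that each have a "key" field.
         */
        function collectAllObjects(required any fetchPage, numeric pageSize = 1000) {
            var allObjects = [];
            var marker = "";
            var batch = [];

            do {
                batch = arguments.fetchPage(marker);

                // Third argument true merges the batch element by element,
                // giving one flat array instead of an array of batches.
                arrayAppend(allObjects, batch, true);

                // Remember the key of the last object as the next starting point
                if (arrayLen(batch) > 0) {
                    marker = batch[arrayLen(batch)].key;
                }
            } while (arrayLen(batch) == arguments.pageSize);

            return allObjects;
        }

        // Stand-in for the real S3 call (assumption): serves 7 fake keys, 3 per page.
        fakeKeys = ["a", "b", "c", "d", "e", "f", "g"];
        fakeFetch = function(marker) {
            var start = len(marker) ? arrayFind(fakeKeys, marker) + 1 : 1;
            var page = [];
            for (var i = start; i <= min(start + 2, arrayLen(fakeKeys)); i++) {
                arrayAppend(page, { "key": fakeKeys[i] });
            }
            return page;
        };

        result = collectAllObjects(fakeFetch, 3);
        writeDump(result);
    </cfscript>
    ```

    To use it against a real bucket, fetchPage would build the options struct (prefix, plus marker when it is not blank) and return the 'response' array from the listAll(options) call shown above.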

     

    BKBK
    Community Expert
    April 11, 2025

    Let's assume you start off with an array. If your result is not an array, then convert it into one.

     

    You could then use the ColdFusion function ArraySlice for pagination. The following demo is a fully worked-out example.

    <cfset objectsArray=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q']>
    
    <cfset numberOfObjects = arrayLen(objectsArray)>
    <cfset numberOfObjectsPerPage = 4>
    <cfset numberOfPages = ceiling(numberOfObjects/numberOfObjectsPerPage)>
    
    <p>
    	<cfoutput>
    		List: #arrayToList(objectsArray)# <br>
    		Number of objects: #numberOfObjects# <br>
    		Number of objects per page: #numberOfObjectsPerPage# <br>
    		Number of pages: #numberOfPages# <br>
    	</cfoutput>
    </p>
    
    <cfset page = arrayNew(1)>
    
    <cfif numberOfObjects GT 0>
    	<cfloop index="np" from="1" to="#numberOfPages#">
    		<cfif np LT numberOfPages>
    			<!--- Pages that are 'full'. Each contains the max number of objects allowed per page --->
    			<cfset page[np] = arraySlice(objectsArray,1+(np-1)*numberOfObjectsPerPage,numberOfObjectsPerPage)>
    		<cfelse>
    			<!--- Relevant when the last page has fewer than the max number of objects allowed per page --->
    			<cfset page[np] = arraySlice(objectsArray,1+(np-1)*numberOfObjectsPerPage,numberOfObjects-(np-1)*numberOfObjectsPerPage)>
    		</cfif>
    	</cfloop>
    	
    	<cfloop index="np" from="1" to="#numberOfPages#">
    		<p>
    			<cfdump var="#page[np]#" label="Page[#np#]">
    		</p>
    	</cfloop>
    </cfif>
    

    Just plug in your own values for objectsArray and numberOfObjectsPerPage in the above code, and you're done.
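
    As a possible simplification, the separate branch for the last page can be folded into a single arraySlice call by capping the slice length with min(). This is a sketch rather than a replacement for the code above; the names pages, offset and count are introduced here for illustration.

    ```cfml
    <cfscript>
        objectsArray = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q'];
        numberOfObjectsPerPage = 4;
        numberOfObjects = arrayLen(objectsArray);

        pages = [];

        // offset is the 1-based index of the first element of each page
        for (offset = 1; offset <= numberOfObjects; offset += numberOfObjectsPerPage) {
            // The last page may be shorter: never ask for more elements than remain
            count = min(numberOfObjectsPerPage, numberOfObjects - offset + 1);

            // No merge flag here, so each slice is stored as one element (one page)
            arrayAppend(pages, arraySlice(objectsArray, offset, count));
        }

        writeDump(pages);
    </cfscript>
    ```

    Because the loop body never runs when the array is empty, no separate check for numberOfObjects GT 0 is needed in this variant.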