We have a legacy application with a sprawling directory of files. It is our suspicion that a large number of them were created for purposes long since forgotten. Because of the number of files, it would take a lot of time and effort to go through the directories and identify those which are no longer in use. I often use Search >> File to search file names and see if there are any files containing a file name string, but I would like to delete the extraneous files without doing this for each file in the application.
Is there an easy way to identify all of the files which are not linked to other files in the application?
This is, I think, more difficult than it seems it should be! There are so many ways a file might be referenced as part of your application without even having its name specified. Most of those ways aren't good to use, but they're there and there's nothing you can really do about it. For example, let's say you have a file where the user can provide form input, then you use that input in a less-than-obvious manner to reference a file once the form is submitted.
So, my recommendation is to do something different. First, add a CFLOG reference to the top of every file that logs the file's use. You can do this in Application.cfm or Application.cfc, I think, or you can explicitly do it within each file. Then, run a test script that performs ABSOLUTELY EVERY TASK within your application. This is good for other things beyond solving this problem, like being able to test every task within your application! Of course, you'll have to write the test script yourself.
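As a minimal sketch of the logging idea above (not a tested drop-in), an onRequestStart handler in Application.cfc could write each requested page to a custom log. The application name "legacyApp" and the log name "templateUsage" are placeholders, and in an existing application the function would be merged into the Application.cfc already in place.

```cfml
<!--- Application.cfc: sketch only; "legacyApp" and "templateUsage" are placeholder names --->
<cfcomponent output="false">
    <cfset this.name = "legacyApp">

    <cffunction name="onRequestStart" returnType="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true">
        <!--- Writes one line per request to templateUsage.log in the ColdFusion logs directory --->
        <cflog file="templateUsage" type="information" text="#arguments.targetPage#">
        <cfreturn true>
    </cffunction>
</cfcomponent>
```

Note that onRequestStart only sees the page that was requested. Templates pulled in with cfinclude, and CFCs, would still need their own line at the top of each file, for example <cflog file="templateUsage" text="#getCurrentTemplatePath()#">.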
Dave Watts, Eidolon LLC
During onRequestEnd, you could capture the exception data, loop over "tagContext" array and log the data as desired:
<cfset oException = createObject("java","java.lang.Exception").init()>
<cfdump var="#oException#">
This approach would assist in identifying line number, type of include (CFInclude, javaproxy, etc), template path and file type. (This is the opposite of what you requested, but is probably the more pragmatic way of identifying files that aren't used.)
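A rough sketch of the loop described above, replacing the dump with a log line per stack entry. It relies on the exception object exposing tagContext as described in this post, and it assumes the entries carry the template, line and id keys normally found in cfcatch.tagContext. The log name "templateStack" is a placeholder.

```cfml
<!--- Sketch: log every template on the current call stack --->
<cfset oException = createObject("java", "java.lang.Exception").init()>
<cfloop array="#oException.tagContext#" index="ctx">
    <!--- id is assumed to hold the kind of reference (for example CFINCLUDE) --->
    <cflog file="templateStack" type="information"
           text="#ctx.template#:#ctx.line# (#ctx.id#)">
</cfloop>
```

The stack only contains templates that are active at the moment the exception object is created, so where this code is placed determines which files it can see.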
It is possible that some requests may access different internal scripts based on parameters passed and/or user permissions, so you'll want to log the data for an extended period so you can capture a large variation of users and requests.
Here's a similar question on StackOverflow with some alternate solutions:
To further assist you (and my curiosity), I wrote a UDF based on the above methods that returns line, parent, template, function and totaltime.
If you desire to suppress the classic debug output, you could either replace "/WEB-INF/debug/classic.cfm" with an empty script (which is what I am doing) or choose an editable script from https://github.com/joshknutson/coldfusion-debug so that you can specifically enable it for desired IPs in a hard-coded (or request-scoped) variable.
After modifying the debugging script, I was able to output the information as desired without any appended HTML or JS content. (NOTE: The appended debug content is not desirable. It munges JSON and XML output, etc.)
Thanks for all of the earnest help, which has been more sophisticated than expected and probably more than the project requires 😉
I grabbed the script from GitHub and have implemented it in the Application.cfc onRequestEnd method. I am 'dumping' the output at page bottom and I'm working on a way to capture it. So far, I have created a text log file and tried to append to it with CFFILE. I also tried adding the output parameter to CFDUMP, but so far neither has worked. I'm using #getTemplatesUsed()# as my output, but it might need to be converted to text first.
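One possible way to get the result into a text file, sketched under the assumption that getTemplatesUsed() returns a complex value (a query, array or struct) rather than a string: serialize it to JSON first, then append. The file name "templates-used.log" is a placeholder.

```cfml
<!--- Sketch: convert the UDF result to text before appending it to a file --->
<!--- Placed in Application.cfc, this writes next to Application.cfc itself --->
<cfset logPath = getDirectoryFromPath(getCurrentTemplatePath()) & "templates-used.log">
<cffile action="append"
        file="#logPath#"
        output="#serializeJSON(getTemplatesUsed())#"
        addNewLine="yes">
```

A log file inside the web root may be reachable from a browser, so a directory outside the web root is probably a better home for it once this works.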
Our organization has been in the process of securing servers and apps using application security scans (and significant developer elbow grease) for the last couple of years. Now they are doing SAST scanning which will detect code vulnerabilities in the repository and there are thousands, so I'm hoping that chopping off the useless files will bring the number down to hundreds. I have already used 'minimally invasive' techniques to section off superfluous sections of the front end from users and application scans. Now they are testing 'under the hood' though, so I will have to delete the files outright. This method will likely show that all of the areas I have sectioned off are unused and let me know which of the active ones are the real deal. So thanks again for the help--it just might impress my boss. But that's tough to do, so MAYBE!
Allen, I'll throw out one more possibility (and James, I have a question about yours).
First, taking off from Dave's suggestion, note that there are tools (including free ones) which can audit any and all files accessed (by any process or a given process), and that can be logged. You could run such a tool for a day or a week and use its output to help you find what files were never accessed...though again that's only for that timeframe you audit. And of course, the longer you audit the larger the audit log will be. I list some such tools here: https://www.cf411.com/security_file_change_detection (their focus is on files "changed", but some can just watch files "read").
James, back to your first comment, can you clarify what "exception" you're referring to? I'm only now seeing this thread, but neither Allen's nor Dave's comments before yours refer to any exception. Is it perhaps that they've been edited? Or if not, can you clarify the connection between any exceptions and Allen's original post? I'm not asking to be annoying, but just to connect the dots (and to help future readers).
I see how your subsequent comments instead focus on leveraging the debugging output, to the degree that it tracks each file referenced. That said, it will only track CFM or CFC pages (or any extensions that might be cfincluded), but it's not clear if Allen's folder of files is only CFM files or (as Dave was addressing) perhaps also files uploaded or created by users and perhaps accessed with things like cffile, etc, which the debugging would not track.
Still, you both give him ideas to get started, and perhaps one of mine may help if those somehow don't.
I was referring to the data in the object returned from java.lang.Exception. I typically use this when consulting on older third-party spaghetti code projects to determine which templates were used within a request. While this doesn't return all the great information that debugging does, at least it doesn't require that debugging is enabled.
You are correct. The techniques I shared won't identify any non-CFML files that are accessed (log files, file uploads, fileread(), generated XML/JSON, etc).
If using the getTemplatesUsed() UDF, a developer could retain a collection of unique paths (ie, as a server variable) and then log any new paths that are encountered to a file. It's not perfect, but this approach wouldn't require external software or monitoring and would only identify CFM/CFC files that are used so that you can audit against the actual directory to identify unused (or lesser used) templates.
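A sketch of that idea follows. It assumes getTemplatesUsed() returns a query with a "template" column, which is a guess based on the fields listed earlier. The function name logNewTemplates, the server.seenTemplates key and the log name "templatesSeen" are all made up for illustration.

```cfml
<cfscript>
// Sketch: remember each template path once and log only paths not seen before.
// ASSUMPTION: the argument is a query with a "template" column.
function logNewTemplates(required query templates) {
    // Exclusive lock so concurrent requests do not log the same path twice
    lock scope="server" type="exclusive" timeout="5" {
        if (!structKeyExists(server, "seenTemplates")) {
            server.seenTemplates = {};
        }
        for (var row in arguments.templates) {
            if (!structKeyExists(server.seenTemplates, row.template)) {
                server.seenTemplates[row.template] = now();
                writeLog(file="templatesSeen", type="information", text=row.template);
            }
        }
    }
}

// For example, from onRequestEnd:
logNewTemplates(getTemplatesUsed());
</cfscript>
```

The server scope is cleared on restart, so some paths will be logged again afterwards. The log file itself persists and can be de-duplicated before comparing it against the directory listing.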
Cool.. Lots of good ideas for Allen to consider. 🙂
I took a look at your apps list and we've actually tried FuseGuard for vulnerability remediation and it's VeraCode which is reporting our vulnerabilities. Looks like a good overall resource for web development in general, and ColdFusion in particular. My initial thought was to take a client based approach and use Builder or PowerShell to identify/delete. I've seen where PowerShell can do a recursive search through directories and file contents and I was thinking maybe you could go one file at a time, look for references in every other file for the current file name, and delete the file if it was not referenced anywhere else. Then, validate the site and see if any mistakes were made/bring back files that were incorrectly deleted. I thought I also saw a feature which could build a list in the old Dreamweaver if you set it right and refined the process. The other ideas cited here were more 'real time' and based on actual usage, so they might be just a hair better.
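The one-file-at-a-time search could also be sketched in CFML rather than PowerShell. The version below is only an illustration: it reports candidates and deletes nothing, it matches plain file names, and it assumes the script sits in the application root. Dynamically built paths, CFCs referenced by dotted name without the extension, and pages reached only by URL will all show up as false candidates.

```cfml
<!--- Sketch: list files whose name never appears in any other CFM/CFC source --->
<cfset rootDir = expandPath("./")>
<cfdirectory action="list" directory="#rootDir#" recurse="true" type="file" name="allFiles">

<!--- Read every CFML source once, keyed by full path --->
<cfset sources = {}>
<cfloop query="allFiles">
    <cfif listFindNoCase("cfm,cfc", listLast(allFiles.name, "."))>
        <cfset srcKey = allFiles.directory & "/" & allFiles.name>
        <cfset sources[srcKey] = fileRead(srcKey)>
    </cfif>
</cfloop>

<!--- A file is a candidate when no OTHER source mentions its name --->
<cfset candidates = []>
<cfloop query="allFiles">
    <cfset thisPath = allFiles.directory & "/" & allFiles.name>
    <cfset referenced = false>
    <cfloop collection="#sources#" item="srcPath">
        <cfif srcPath neq thisPath and findNoCase(allFiles.name, sources[srcPath])>
            <cfset referenced = true>
            <cfbreak>
        </cfif>
    </cfloop>
    <cfif not referenced>
        <cfset arrayAppend(candidates, thisPath)>
    </cfif>
</cfloop>

<cfdump var="#candidates#">
```

Comparing this list with the usage logs from the run-time approaches above would give two independent signals before anything is actually removed.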