Participant
October 26, 2020
Answered

Identify Unlinked Files


We have a legacy application with a sprawling directory of files. We suspect that many of them were created for purposes long since forgotten. Because there are so many files, going through the directories to identify those no longer in use would take a lot of time and effort. I often use Search >> File to check whether any files contain a given file name, but I would like to delete the extraneous files without repeating that search for every file in the application.

Is there an easy way to identify all of the files which are not linked to other files in the application?


4 replies

Charlie Arehart
Community Expert
October 29, 2020

Allen, I'll throw out one more possibility (and James, I have a question about yours).

First, building on Dave's suggestion: there are tools (including free ones) that can audit and log every file accessed, whether by any process or by a given process. You could run such a tool for a day or a week and use its output to help find the files that were never accessed, though again that only covers the timeframe you audit. And of course, the longer you audit, the larger the audit log will be. I list some such tools here: https://www.cf411.com/security_file_change_detection (their focus is on files being "changed", but some can also just watch for files being "read").

James, back to your first comment, can you clarify what "exception" you're referring to? I'm only now seeing this thread, but neither Allen's nor Dave's comments before yours refer to any exception. Have they perhaps been edited? If not, can you clarify the connection between any exceptions and Allen's original post? I'm not asking to be annoying, just to connect the dots (and to help future readers).

I see how your subsequent comments instead focus on leveraging the debugging output, to the degree that it tracks each file referenced. That said, it will only track CFM or CFC pages (or any other extensions that might be cfincluded), and it's not clear whether Allen's folder holds only CFM files or (as Dave was addressing) also files uploaded or created by users and perhaps accessed with things like cffile, which the debugging would not track.

Still, you both give him ideas to get started, and perhaps one of mine may help if those somehow don't.

/Charlie (troubleshooter, carehart.org)
James Moberg
Inspiring
October 29, 2020

Charlie,

I was referring to the data in the object returned by createObject("java","java.lang.Exception").init(). I typically use this when consulting on older third-party spaghetti-code projects to determine which templates were used within a request. While it doesn't return all the great information that debugging does, at least it doesn't require debugging to be enabled.

You are correct. The techniques I shared won't identify any non-CFML files that are accessed (log files, file uploads, fileRead(), generated XML/JSON, etc.).

If using the getTemplatesUsed() UDF, a developer could retain a collection of unique paths (i.e., in a server-scope variable) and log each newly encountered path to a file. It's not perfect, since it only identifies the CFM/CFC files that are actually used, but it requires no external software or monitoring, and you can audit the resulting list against the actual directory to identify unused (or rarely used) templates.
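A minimal sketch of that idea in CFML, assuming CF 10+ script syntax. The helper names recordTemplates() and findUnusedTemplates(), the server.templatesSeen variable, and the "templatesSeen" log name are all hypothetical, not part of the gist. The input is assumed to be an array of absolute template paths for the current request, for example pulled from the template values of whatever getTemplatesUsed() returns.

```cfml
<cfscript>
// Hypothetical helpers (not from the gist). "paths" is assumed to be an array
// of absolute template paths seen during the current request.
void function recordTemplates(required array paths) {
    // The server scope is shared by every request, so guard writes with a lock
    lock scope="server" type="exclusive" timeout="5" {
        if (!structKeyExists(server, "templatesSeen")) {
            server.templatesSeen = {};
        }
        for (var p in arguments.paths) {
            // Normalise separators; CFML struct keys are case-insensitive,
            // so paths differing only by letter case count as the same file
            var key = replace(p, "\", "/", "all");
            if (!structKeyExists(server.templatesSeen, key)) {
                server.templatesSeen[key] = now();
                // Only the first sighting of each path is written to the log
                writeLog(text = key, type = "information", file = "templatesSeen");
            }
        }
    }
}

// Returns .cfm/.cfc files under rootDir that have not been recorded yet
array function findUnusedTemplates(required string rootDir) {
    var seen = {};
    lock scope="server" type="readonly" timeout="5" {
        if (structKeyExists(server, "templatesSeen")) {
            seen = duplicate(server.templatesSeen);
        }
    }
    var unused = [];
    // Recursive listing of full file system paths under rootDir
    for (var f in directoryList(arguments.rootDir, true, "path")) {
        var ext = lcase(listLast(f, "."));
        if ((ext == "cfm" || ext == "cfc")
            && !structKeyExists(seen, replace(f, "\", "/", "all"))) {
            arrayAppend(unused, f);
        }
    }
    arraySort(unused, "textnocase");
    return unused;
}
</cfscript>
```

recordTemplates() would be called once per request (e.g. from onRequestEnd), and findUnusedTemplates(expandPath("/")) from an admin-only page after a long enough logging period. The server scope is cleared when ColdFusion restarts, which is why each new path is also written to the log file.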

Charlie Arehart
Community Expert
October 30, 2020

Cool. Lots of good ideas for Allen to consider. 🙂

/Charlie (troubleshooter, carehart.org)
James Moberg
Correct answer
Inspiring
October 27, 2020

Here's a similar question on StackOverflow with some alternate solutions:

https://stackoverflow.com/questions/35370081/coldfusion-show-all-included-files-on-load

To further assist you (and satisfy my curiosity), I wrote a UDF based on the methods above that returns line, parent, template, function, and totaltime.

https://gist.github.com/JamoCA/c2b90b5596d0b9c352aafc3a76ef0674

James Moberg
Inspiring
October 27, 2020

If you want to suppress the classic debug output, you could either replace "/WEB-INF/debug/classic.cfm" with an empty template (which is what I am doing) or use an editable template from https://github.com/joshknutson/coldfusion-debug, so that you can enable it only for specific IPs listed in a hard-coded (or request-scoped) variable.

After modifying the debugging template, I was able to output the information as desired without any appended HTML or JS content. (NOTE: That appended content is undesirable because it munges JSON and XML output, etc.)

James Moberg
Inspiring
October 26, 2020

During onRequestEnd, you could create the exception object, loop over its "tagContext" array, and log the data as desired:

<!--- Create (but don't throw) an exception so its tagContext reflects the current call stack --->
<cfset oException = createObject("java","java.lang.Exception").init()>
<cfdump var="#oException#">

This approach helps identify the line number, type of include (CFInclude, javaproxy, etc.), template path, and file type. (It finds the files that are used, which is the opposite of what you requested, but it is probably the more pragmatic way of identifying the files that aren't.)
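A rough sketch of the "loop over tagContext and log it" step, assuming Adobe ColdFusion exposes tagContext on the exception object created with createObject("java","java.lang.Exception").init(), and that each element is a struct with ID, LINE and TEMPLATE keys (as cfdump shows). The helper name logTagContext() and the default log name "templateUsage" are hypothetical. Because the exception records the call stack at the moment it is created, what gets logged depends on where the helper is called from.

```cfml
<cfscript>
// Hypothetical helper: write one log line per frame of the current CFML call
// stack. Assumes tagContext is an array of structs with ID, LINE, TEMPLATE keys.
void function logTagContext(string logName = "templateUsage") {
    // Creating (not throwing) the exception captures the stack at this point
    var oException = createObject("java", "java.lang.Exception").init();
    for (var frame in oException.tagContext) {
        // ID holds the include type (e.g. CFINCLUDE); guard in case it is absent
        var includeType = structKeyExists(frame, "id") ? frame.id : "";
        // Pipe-delimited so the log lines are easy to split later
        writeLog(
            text = includeType & "|" & frame.line & "|" & frame.template,
            type = "information",
            file = arguments.logName
        );
    }
}
</cfscript>
```

For example, `<cfset logTagContext()>` could go in onRequestEnd as described above, or at the bottom of a shared include to record which templates led to it.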

Some requests may access different internal scripts depending on the parameters passed and/or user permissions, so you'll want to log the data over an extended period to capture a wide variety of users and requests.

Community Expert
October 26, 2020

This is, I think, more difficult than it seems it should be! There are so many ways a file might be referenced as part of your application without its name even being specified. Most of those ways aren't good practice, but they exist and there's nothing you can really do about it. For example, say you have a page where the user provides form input, and once the form is submitted you use that input in a less-than-obvious way to reference a file.

So my recommendation is to do something different. First, add a CFLOG call to the top of every file that logs the file's use. You can do this in Application.cfm or Application.cfc, I think, or you can do it explicitly within each file. Then run a test script that performs ABSOLUTELY EVERY TASK within your application. This is useful beyond solving this problem, since it also lets you test every task within your application! Of course, you'll have to write the test script yourself.
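A minimal sketch of the Application.cfc route, assuming the application already uses Application.cfc (the method would be merged into the existing file rather than replacing it). The application name "LegacyApp" and the log name "pageUsage" are placeholders. Note that onRequestStart only receives the page that was requested, so templates pulled in via cfinclude, custom tags, or components would still need their own CFLOG.

```cfml
<!--- Application.cfc (sketch): log the physical path of every requested page.
      "LegacyApp" and "pageUsage" are placeholder names. --->
<cfcomponent output="false">
    <cfset this.name = "LegacyApp">

    <cffunction name="onRequestStart" returnType="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true">
        <!--- targetPage is web-relative (e.g. /admin/report.cfm); expandPath
              converts it to a file system path to compare against the disk --->
        <cflog file="pageUsage" type="information"
               text="#expandPath(arguments.targetPage)#">
        <!--- Returning true lets the request continue normally --->
        <cfreturn true>
    </cffunction>
</cfcomponent>
```

Once the test script has exercised every task, the pageUsage log lists each page it reached, which can then be compared against a directory listing.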

Dave Watts, Eidolon LLC