Parsing & Analyzing Log Files

Question

Afternoon,

Once again, thanks ahead of time for looking over my post. Here is what I am working on this time.

I am working on an email monitor, which is run via a scheduled task on the hour, every hour. This is in ColdFusion, of course. What I need to do is grab the sent e-mails. The only record of email status is in a daily log file on the email server. The log file can be anywhere from 20 KB to 120 MB. The format of the log file itself is a bit random, depending on which step of the email process it is recording.

The file name is saved as sysMMDD.txt, and we have a process running every 20 minutes to check the size of the log file for the current date. If it's larger than 10 MB we rename it sysMMDD_1.txt. This is really irrelevant to my question, but I thought I'd provide all the information.

Going back to the actual format of the log, it looks something like this:

HH:MM SS:MS TYPE(HASH) [IP ADDRESS] etc.

TYPE = type of e-mail or service being called
HASH = a unique hash id of the e-mail, to tie the steps together
etc. = all of the text after [IP ADDRESS]; it has no fixed structure and varies by step

The monitor needs to grab all the sends in this log file within a one-hour span. Remember, the log could contain up to a day's worth of data. As it stands right now, I'm able to do a send count by searching for the number of times 'ldeliver' appears in the log. Does anyone have any suggestions for parsing a log file like this one? I'm worried that the way I'm doing it now, which is a hack, will not suffice, and there is probably a better way to do it.

Basically, right now I'm doing a cfloop using index="line" to go through the file. You can imagine how this performs with large log files, which is why we created the scheduled task above to rename log files. Now if I start adding time extractions as well, I'm pretty sure this process is going to bust.

I know this post is scattered, but it's just one of those days where everything appears to be happening at once.

Does anyone have any other ideas about going about this process? Someone mentioned an ODBC datasource to the text file, but will that work when it's space delimited and only the first "four" chunks have a reliable format?

Any help appreciated!

Adam Cameron. · Accepted Answer

Sorry, yeah. I didn't see you mention that another app generates the log.

Looping over the file line by line does not really add much resource overhead. It does not need to load the whole file into RAM; it only reads each line in turn. I tried looping over a 1GB file on a CF instance with only 512MB of RAM allocated to it, and it churned away quite nicely, processing a few lines per millisecond, and it never broke a sweat. It took about 7 minutes to process a million rows, and never consumed more than a marginal amount of memory.
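To make the streaming idea concrete, here is a rough sketch of the line-by-line loop with the hour filter added. It is written in Python purely to show the logic, not as ColdFusion code, and it assumes only what the question states: each useful line starts with "HH:MM " and a send is any line containing 'ldeliver'. The names count_sends and count_sends_in_file are made up for illustration.

```python
import re
from typing import Iterable

# Assumption from the question: every log line of interest starts with "HH:MM ".
# Only the hour is captured; the rest of the line is treated as free text.
HOUR_PREFIX = re.compile(r"^(\d{2}):\d{2} ")


def count_sends(lines: Iterable[str], hour: int, marker: str = "ldeliver") -> int:
    """Count lines stamped within `hour` (0-23) that contain `marker`."""
    count = 0
    for line in lines:
        match = HOUR_PREFIX.match(line)
        if match is None:
            continue  # no recognisable timestamp: skip the line
        if int(match.group(1)) == hour and marker in line:
            count += 1
    return count


def count_sends_in_file(path: str, hour: int) -> int:
    """Stream the file one line at a time, so memory use stays flat."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return count_sends(handle, hour)
```

The time extraction here is a single anchored match on the start of each line, so it should add little to the cost of a loop that already reads every line. If the log is known to be strictly chronological, the loop could also stop once it passes the wanted hour, but that is an assumption worth checking first.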

Do you actually know that doing it this way will cause you gyp? It doesn't sound like the sort of process that really needs to be lightning quick: it's a background process, isn't it?

I guess if you were concerned about it, you could pump the file through grep at file-system level first to extract just the lines you want, and process that much smaller file. The file system should process files of that size pretty quickly and efficiently.
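As a sketch of that pre-filtering step, the snippet below shells out to grep and hands back only the matching lines. It is in Python for illustration only and assumes grep is installed and on the PATH; from ColdFusion the equivalent would presumably be a cfexecute call. The function name grep_lines and the default 'ldeliver' marker are illustrative, the latter taken from the question.

```python
import subprocess
from typing import List


def grep_lines(path: str, marker: str = "ldeliver") -> List[str]:
    """Return only the lines of `path` that contain `marker`, using grep."""
    # -F treats the marker as a fixed string rather than a regex;
    # "--" stops grep from reading the marker as an option.
    result = subprocess.run(
        ["grep", "-F", "--", marker, path],
        capture_output=True,
        text=True,
    )
    # grep exits with 0 on a match, 1 on no match, and >1 on a real error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

The returned lines could then be fed to whatever per-line parsing is already in place, so the slower code only ever sees the sends.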

I would not bother trying to put this stuff into a DB and then process it: doing that would probably be more work than just looping over the file as you are now.

--

Adam

Adam Cameron. · Answer

It sounds to me like you have two different requirements for this data being logged: one for the full log, another for the SEND data. Why don't you just additionally write the latter data to a separate log at the time you're logging it in the first place?

--

Adam
