Parsing & Analyzing Log Files
Afternoon,
Once again, thanks ahead of time for looking over my post.
Here is what I am working on this time: an email monitor that runs via a scheduled task on the hour, every hour. This is in ColdFusion, of course.
What I need to do is grab the sent e-mails. The only record of email status is in a daily log file on the email server. The log file can be anywhere from 20 KB to 120 MB. The format of each line varies depending on which step of the email process is being logged.
The file name is saved as sysMMDD.txt, and we have a process running every 20 minutes to check the size of the log file for the current date. If it's larger than 10 MB we rename it sysMMDD_1.txt. This is really irrelevant to my question, but I thought I'd provide all the information.
Going back to the actual format of the log, each line looks something like this:
HH:MM SS:MS TYPE(HASH) [IP ADDRESS] etc.
TYPE = type of e-mail or service being called
HASH = a unique hash id of the e-mail, used to tie the steps together
etc. = all of the text after [IP ADDRESS]; it has no fixed structure and varies by step
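To make the layout concrete, here is a minimal sketch of pulling the four reliable fields out of one line. It is written in Python rather than ColdFusion purely for illustration. The sample line, the `parse_line` name, and the assumption that the IP sits inside the brackets without spaces are all hypothetical; the real log may differ.

```python
import re

# Assumed layout: "HH:MM SS:MS TYPE(HASH) [IP ADDRESS] free text".
# Only the first four chunks are treated as structured.
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}) (\d{1,2}):(\d{1,3}) "  # HH:MM SS:MS
    r"(\w+)\(([^)]+)\) "                      # TYPE(HASH)
    r"\[([^\]]+)\]"                           # [IP ADDRESS]
    r"\s*(.*)$"                               # everything else, unstructured
)

def parse_line(line):
    """Return the leading fields as a dict, or None if the line does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hh, mm, ss, ms, type_, hash_, ip, rest = m.groups()
    return {
        "hour": int(hh),
        "minute": int(mm),
        "second": int(ss),
        "millis": int(ms),
        "type": type_,
        "hash": hash_,
        "ip": ip,
        "rest": rest,
    }

# Hypothetical sample line, not taken from a real log.
sample = "14:05 32:118 ldeliver(a1b2c3) [10.0.0.5] delivered to mailbox"
record = parse_line(sample)
```

Lines that do not match the assumed layout come back as `None`, so a caller can skip them instead of failing.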
The monitor needs to grab all the sends in this log file within a one-hour span. Remember, the log could contain up to a day's worth of data. As it stands right now, I'm able to get a send count by counting the number of times 'ldeliver' appears in the log.
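The hourly count described above could be sketched roughly as follows, again in Python for illustration only. It assumes every relevant line starts with a zero-padded HH: prefix and that a send is any line containing 'ldeliver'; the function name `count_sends_in_hour` and the sample lines are hypothetical.

```python
def count_sends_in_hour(lines, hour, marker="ldeliver"):
    """Count lines stamped with the given hour (0-23) that contain the marker.

    `lines` can be any iterable of strings, including an open file object,
    so the log is streamed one line at a time and never held in memory whole.
    """
    prefix = f"{hour:02d}:"  # assumed zero-padded "HH:" at the start of a line
    count = 0
    for line in lines:
        # Cheap prefix test first, so most lines are rejected without a search.
        if line.startswith(prefix) and marker in line:
            count += 1
    return count

# Hypothetical lines standing in for a real log file.
sample_lines = [
    "13:59 58:001 ldeliver(aaa111) [10.0.0.5] delivered\n",
    "14:00 02:450 smtp(bbb222) [10.0.0.6] connection opened\n",
    "14:05 32:118 ldeliver(bbb222) [10.0.0.6] delivered\n",
    "14:47 10:009 ldeliver(ccc333) [10.0.0.7] delivered\n",
    "15:01 00:300 ldeliver(ddd444) [10.0.0.8] delivered\n",
]
sends = count_sends_in_hour(sample_lines, 14)
```

With a real file this would be called as `count_sends_in_hour(open(path), 14)`. Note that it counts matching lines, not raw occurrences of the marker, so a line mentioning 'ldeliver' twice is counted once.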
Does anyone have any suggestions for parsing a log file like this one? I'm worried that the way I'm doing it now, which is a hack, will not suffice and there is probably a better way to do it.
Basically, right now I'm using a cfloop with index="line" to go through the file one line at a time. You can imagine how this performs with large log files, which is why we created the scheduled task above to rename log files. If I start adding time extraction as well, I'm pretty sure this process is going to bust.
I know this post is scattered, but it's just one of those days where everything appears to be happening at once. Does anyone have any other ideas for going about this? Someone mentioned an ODBC datasource pointed at the text file, but will that work when the file is space-delimited and only the first "four" chunks are in a reliable format?
Any help appreciated!
