Sorting out of sequence log Apache log file

Allen K. Smith lazlor at bigboy.lotaris.org
Wed Apr 26 20:03:34 UTC 2006


On Wednesday 26 April 2006 12:43, Ray Van Dolson wrote:
> On Wed, Apr 26, 2006 at 12:13:51PM -0700, Chris W. Parker wrote:
> > Hello,
> > 
> > I had a hiccup with syslog/apache/logrotate recently and as a result
> > some of the Apache log files are out of sequence. This is bad because
> > Webalizer no longer recognizes the out of sequence lines and my
> > reporting results are skewed.
> > 
> > Is there a command line util that will sort the records correctly? I've
> > been looking around through Google without any luck so far.
> 
> I assume by out of sequence you mean the time stamps are all off?
> 
> The following quickie Python hack works for me.  Basically call it as follows:
> 
>  % cat access_log | /path/to/sort_apache.py > sorted_log.log

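Plain sort can do this too (year, month, day, then time; this assumes a
single UTC offset and no earlier '[' on the line):

sort -t'[' -k 2.8,2.11n -k 2.4,2.6M -k 2.1,2.2n -k 2.13,2.20 access_log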
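
If the log mixes UTC offsets, a sort key built from the full timestamp is
safer. What follows is only a minimal sketch, not a tested tool: the names
TIMESTAMP, timestamp_key and sort_log_lines are made up for illustration, and
it assumes the default bracketed time field of the common/combined log format
and English month abbreviations.

```python
import re
from datetime import datetime

# Assumed time field layout: [26/Apr/2006:12:13:51 -0700]
TIMESTAMP = re.compile(
    r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
)


def timestamp_key(line):
    """Return a timezone-aware datetime for the line, or None if none is found."""
    m = TIMESTAMP.search(line)
    if m is None:
        return None
    # %z parses the numeric UTC offset, so different offsets compare correctly.
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")


def sort_log_lines(lines):
    """Sort lines by timestamp; lines without one go last, in input order."""
    dated, undated = [], []
    for line in lines:
        key = timestamp_key(line)
        if key is None:
            undated.append(line)
        else:
            dated.append((key, line))
    # list.sort is stable, so lines with equal timestamps keep their order.
    dated.sort(key=lambda pair: pair[0])
    return [line for _, line in dated] + undated
```

To use it as a filter, something like
sys.stdout.writelines(sort_log_lines(sys.stdin)) should do. Like the script
quoted below, it holds the whole file in memory.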


> 
> If it's a huge logfile the script may run out of memory, because it reads
> every line into memory, sorts the lines by date and time, and then prints
> them in the right order.
> 
> #!/usr/bin/env python3
> #
> # Simple script to sort an Apache log based on its time/date field.
> #
> # Ray Van Dolson <rayvd at digitalpath.net>
> #
> 
> import re
> import sys
> from time import strptime, mktime
> 
> # Matches the bracketed time field, e.g. [26/Apr/2006:12:13:51 -0700].
> TS_RE = re.compile(r".*\[(\d\d/[A-Za-z]{3}/[0-9]{4}:\d{2}:\d{2}:\d{2}) .+?\].*")
> 
> def main():
> 
>   # Maps each timestamp to the lines that carry it, in input order.
>   line_dict = {}
> 
>   for buf in sys.stdin:
>     t = TS_RE.match(buf)
>     if t:
>       # The UTC offset is ignored; lines without a timestamp are dropped.
>       ts = mktime(strptime(t.group(1), "%d/%b/%Y:%H:%M:%S"))
>       line_dict.setdefault(ts, []).append(t.group(0))
> 
>   for entry in sorted(line_dict):
>     for line in line_dict[entry]:
>       print(line)
> 
> if __name__ == '__main__':
>   main()
> 
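
For a log too big to hold in memory, one option is to sort it in chunks,
spill each sorted chunk to a temporary file, and merge the chunks. A rough
sketch of that idea follows; the names sort_key, sorted_chunks and
external_sort and the chunk_lines parameter are hypothetical. Like the script
above it ignores the UTC offset, but unlike it, lines without a timestamp are
kept and sort first.

```python
import heapq
import re
import tempfile
from datetime import datetime
from itertools import islice

# Assumed time field layout: [26/Apr/2006:12:13:51 -0700]; offset is ignored.
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2})")


def sort_key(line):
    """Timestamp as a sortable string (YYYYmmddHHMMSS), or '' if missing."""
    m = TIMESTAMP.search(line)
    if m is None:
        return ""
    parsed = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
    return parsed.strftime("%Y%m%d%H%M%S")


def sorted_chunks(lines, chunk_lines):
    """Sort chunk_lines lines at a time and spill each chunk to a temp file."""
    lines = iter(lines)
    while True:
        # Make sure every line ends with a newline before it is written out.
        chunk = sorted(
            (l if l.endswith("\n") else l + "\n" for l in islice(lines, chunk_lines)),
            key=sort_key,
        )
        if not chunk:
            return
        tmp = tempfile.TemporaryFile(mode="w+")
        tmp.writelines(chunk)
        tmp.seek(0)
        yield tmp


def external_sort(lines, chunk_lines=100_000):
    """Yield lines in timestamp order without loading the whole log at once."""
    chunks = list(sorted_chunks(lines, chunk_lines))
    try:
        # heapq.merge reads the sorted temp files lazily, one line per chunk.
        yield from heapq.merge(*chunks, key=sort_key)
    finally:
        for tmp in chunks:
            tmp.close()
```

Each chunk keeps a temporary file open until the merge finishes, so
chunk_lines should be large enough that the number of chunks stays well below
the open-file limit.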



