Exctractor - exception extractor
 
What is it?
Exctractor is a tool that extracts exceptions from log files and produces statistics from them.
Its sole purpose is to identify the distinct exceptions and count how many times each one has occurred.
Exctractor was written for and tested on log files produced by Java applications, but with some tweaking it should be able to cope with other languages as well.
How does it work?
Exctractor parses all files in one or more directories. Files can be plain text files or bzip archives.
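
The file handling described above could be sketched as follows. This is only an illustration, not Exctractor's actual code: the helper names open_log and iter_log_lines are hypothetical, and it assumes that compressed files are bzip2 archives whose names end in .bz2 and that logs are UTF-8 text.

```python
import bz2
from pathlib import Path


def open_log(path):
    """Open a log file as text, decompressing it when the name ends in .bz2."""
    path = Path(path)
    if path.suffix == ".bz2":
        # Assumption: a .bz2 suffix marks a bzip2 archive.
        return bz2.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def iter_log_lines(directories, pattern="*"):
    """Yield every line of every matching file in the given directories."""
    for directory in directories:
        for path in sorted(Path(directory).glob(pattern)):
            if path.is_file():
                with open_log(path) as handle:
                    for line in handle:
                        yield line.rstrip("\n")
```

Calling iter_log_lines(["/var/log/myapp"], "*.log*") would then yield lines from both plain and compressed files; the directory and pattern here are placeholders.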
 
Here is what Exctractor does:
 
- Looks for text blocks that can potentially be Java exceptions:
    a) block starts with a timestamp
    b) it is followed by one or more lines without a timestamp
 
- Validates that the block is an exception
    a) it must contain the words ‘java’ and ‘exception’
 
- Dissects the block (assuming it’s a “valid” exception) into:
    a) Headline (the line that contains the timestamp)
    b) First stack trace line
    c) Remaining stack trace body
 
- Attempts to match it against grouping rules (see below)
    a) If it matches one or more rules, the first matching rule is used and its group counter is increased
    b) If no matching rule is found:
        b.1) Removes information that usually ‘individualises’ exceptions:
               (Proxy classes and everything between ‘(’ and ‘)’)
        b.2) Creates an MD5 hash of the remaining exception body and assigns a group to it
        b.3) Increases the group counter
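
The steps above, without the grouping rules, can be sketched roughly as follows. This is an illustration under several assumptions, not Exctractor's actual code: all function names are hypothetical, the timestamp pattern is a guess at one common layout, the ‘java’ and ‘exception’ check is done case-insensitively, proxy classes are assumed to look like $Proxy12, and the hash covers the first stack trace line together with the remaining body.

```python
import hashlib
import re
from collections import Counter

# Assumed timestamp layout ("2011-03-01 12:00:00" at the start of a line);
# real logs may need a different pattern.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def candidate_blocks(lines):
    """Yield blocks: a timestamped line followed by 1+ lines without a timestamp."""
    block = []
    for line in lines:
        if TIMESTAMP.match(line):
            if len(block) > 1:
                yield block
            block = [line]
        elif block:
            block.append(line)
    if len(block) > 1:
        yield block


def is_exception(block):
    """Accept a block only if it mentions both 'java' and 'exception'."""
    text = "\n".join(block).lower()  # assumption: case does not matter
    return "java" in text and "exception" in text


def dissect(block):
    """Split a block into (headline, first stack trace line, remaining body)."""
    return block[0], block[1], "\n".join(block[2:])


def fallback_group(first_line, body):
    """Hash the exception after removing details that vary between occurrences."""
    text = first_line + "\n" + body
    text = re.sub(r"\([^)]*\)", "()", text)       # drop everything between ( and )
    text = re.sub(r"\$Proxy\d+", "$Proxy", text)  # drop proxy class numbering
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def count_groups(lines):
    """Count how many times each hashed exception group occurs."""
    counts = Counter()
    for block in candidate_blocks(lines):
        if is_exception(block):
            _headline, first_line, body = dissect(block)
            counts[fallback_group(first_line, body)] += 1
    return counts
```

With this normalisation, two stack traces that differ only in source line numbers or proxy class numbers end up with the same hash and are counted as one group.
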
Group configuration file format
The group configuration file should be called exctractor.xml and should be located in the same directory as the executable script.

The format of the configuration file is as follows:
 
<?xml version="1.0"?>
<config>
    <exception_types>
        <exception logline="<regexp to match the first log line (containing the timestamp)>"
                   headline="<regexp to match the first exception body line>"
                   body="<regexp to match exception body>"
                   group="<group name, any text>"
                   desc="<group description, any text>"
        />
    </exception_types>
</config>
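
As an illustration of how such a file might be read and applied, here is a minimal sketch. It is not Exctractor's actual code: the sample rule, its regular expressions and the helper names load_rules and match_group are made up, and it assumes that a rule matches only when all three expressions are found (using search rather than a full match) and that the first matching rule wins.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration; the expressions and the group name are examples only.
SAMPLE_CONFIG = r"""<?xml version="1.0"?>
<config>
    <exception_types>
        <exception logline="ERROR"
                   headline="java\.lang\.NullPointerException"
                   body="com\.example\."
                   group="NPE in example code"
                   desc="Null pointer raised inside com.example classes"
        />
    </exception_types>
</config>
"""


def load_rules(xml_text):
    """Parse the configuration and compile the regular expressions of each rule."""
    root = ET.fromstring(xml_text)
    rules = []
    for node in root.findall("./exception_types/exception"):
        rules.append({
            "logline": re.compile(node.get("logline", "")),
            "headline": re.compile(node.get("headline", "")),
            "body": re.compile(node.get("body", "")),
            "group": node.get("group"),
            "desc": node.get("desc"),
        })
    return rules


def match_group(rules, logline, headline, body):
    """Return the group of the first rule whose three expressions all match."""
    for rule in rules:
        if (rule["logline"].search(logline)
                and rule["headline"].search(headline)
                and rule["body"].search(body)):
            return rule["group"]
    return None  # no rule matched; the caller would fall back to hashing
```
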
Command line options
 
Usage: exctractor.py [options] LOG_DIR1 [LOG_DIR2 [LOG_DIR3 [...]]]
 
Options:
       -h, --help          show this help message and exit
       -p FILE_PATTERN     Pattern for log file name matching
       -f FORMAT           Output format: CSV (default) or TEXT
       -g                  Include group statistics in the report (TEXT mode only)
       -v                  Verbose output (TEXT mode only)
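
To show how these options combine, the sketch below rebuilds an equivalent command line with Python's argparse and parses a sample invocation. It is an assumption-laden illustration rather than the script's real option handling: the function name build_parser and the attribute names (file_pattern, format, groups, verbose, log_dirs) are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="exctractor.py")
    parser.add_argument("-p", dest="file_pattern", metavar="FILE_PATTERN",
                        help="Pattern for log file name matching")
    parser.add_argument("-f", dest="format", metavar="FORMAT",
                        choices=["CSV", "TEXT"], default="CSV",
                        help="Output format: CSV (default) or TEXT")
    parser.add_argument("-g", dest="groups", action="store_true",
                        help="Include group statistics in the report (TEXT mode only)")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="Verbose output (TEXT mode only)")
    parser.add_argument("log_dirs", nargs="+", metavar="LOG_DIR",
                        help="One or more directories containing log files")
    return parser


# Equivalent to: exctractor.py -f TEXT -g logs/app1 logs/app2
args = build_parser().parse_args(["-f", "TEXT", "-g", "logs/app1", "logs/app2"])
```

Here the report would be produced in TEXT format with group statistics for the two placeholder directories logs/app1 and logs/app2.
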
Download
You can download the latest version of Exctractor from sourceforge.net.
Author
Exctractor is written by Rytis Sileika (reachable via: rytis.sileika [AT] gmail.com)