How does it work?
Exctractor parses all files in one or more directories. Files can be flat files or bzip archives.
Here is what exctractor does:
- Looks for text blocks that can potentially be Java exceptions:
a) block starts with a timestamp
b) it is followed by one or more lines without a timestamp
- Validates that the block is an exception:
a) it must contain the words ‘java’ and ‘exception’
- Dissects the block (assuming it’s a “valid” exception) into:
a) Headline (the line that contains the timestamp)
b) First stack trace line
c) Remaining stack trace body
- Attempts to match it against grouping rules (see below)
a) If it matches one or more rules, the first matching rule wins and its group counter is increased
b) If no matching rules were found:
b.1) Removes information that usually ‘individualises’ exceptions:
(Proxy classes and everything between ‘(’ and ‘)’)
b.2) Creates an MD5 hash of the remaining exception body and assigns the exception to the group identified by that hash
b.3) Increases group counter
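
The steps above can be sketched roughly as follows. This is an illustration only, not exctractor's actual code. The timestamp format, the `$Proxy<N>` pattern for proxy classes, the case-insensitive check for ‘java’ and ‘exception’, the choice to hash the first stack trace line together with the body, and the representation of rules as (name, regex) pairs are all assumptions, and every function name here is hypothetical.

```python
import hashlib
import re
from collections import Counter

# Assumption: a timestamp looks like "2024-01-31 12:34:56" at the start of a line.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
# Assumption: proxy classes appear as "$Proxy123" in stack traces.
PROXY = re.compile(r"\$Proxy\d+")
# Everything between "(" and ")", including the parentheses themselves.
PARENS = re.compile(r"\([^)]*\)")


def split_blocks(lines):
    """Yield blocks: one timestamped line followed by 1+ lines without a timestamp."""
    block = []
    for line in lines:
        if TIMESTAMP.match(line):
            if len(block) > 1:
                yield block
            block = [line]
        elif block:
            block.append(line)
    if len(block) > 1:
        yield block


def is_exception(block):
    """A block is treated as an exception if it mentions 'java' and 'exception'."""
    text = "\n".join(block).lower()  # assumption: the check ignores case
    return "java" in text and "exception" in text


def dissect(block):
    """Split a block into headline, first stack trace line, and remaining body."""
    return block[0], block[1], block[2:]


def fallback_group(first_trace, body):
    """Strip 'individualising' details, then hash what is left with MD5."""
    text = "\n".join([first_trace, *body])  # assumption: the headline is not hashed
    text = PROXY.sub("", text)
    text = PARENS.sub("", text)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def group_exceptions(lines, rules=()):
    """Count exceptions per group; rules is a sequence of (group_name, compiled_regex)."""
    counts = Counter()
    for block in split_blocks(lines):
        if not is_exception(block):
            continue
        _headline, first_trace, body = dissect(block)
        full_text = "\n".join(block)
        for name, pattern in rules:
            if pattern.search(full_text):
                counts[name] += 1  # first matching rule wins
                break
        else:
            counts[fallback_group(first_trace, body)] += 1
    return counts
```

In this sketch, two stack traces that differ only in line numbers such as `(Foo.java:10)` and `(Foo.java:22)` end up in the same MD5 group, because everything inside the parentheses is removed before hashing.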