Parsing Log records with regular expressions


I have a text-based log file whose records come in two formats, as in
the following sample:
`
A|B|C|D\n
A|B|C|D|E\n
\n
Exception\n
\n
\tstack trace line1\n
\tstack trace line2\n
\tstack trace line3\n
\n
A|B|C|D\n`

The first form (A|B|C|D) has statically defined columns delimited by a
pipe symbol. The second form ends with an extra column "E", which marks
an exception record. If it is an exception record, the information about
the exception follows. The exception information starts with a blank
line and the line "Exception", followed by another blank line and the
stack trace on multiple lines. Each stack trace element starts with a tab.

I am parsing this file with ruby. Currently I am reading line by line
and building the log records. This is working fine.

I am wondering if I could rely on regular expressions to do it instead
of reading line by line - I could read a chunk of the file and apply two
regular expressions to see if there is a match and if I find the match
process the record and move to the next record. If there is no match,
then I combine multiple chunks until I find a match. Is this approach a
valid consideration? Is this doable with Ruby? If there are any open
source projects that do something like this, can someone point me to
them? Also, any thoughts on which approach is more efficient, and why? I
appreciate any feedback.

-- 
Posted via http://www.ruby-forum.com/.

Reply Kris 2/3/2011 9:38:24 PM

On Thu, Feb 3, 2011 at 10:38 PM, Kris K. <iamkrisko@gmail.com> wrote:
> I have a log file which is text based which has records in two formats
> of the following form
> `
> A|B|C|D\n
> A|B|C|D|E\n
> \n
> Exception\n
> \n
> \tstack trace line1\n
> \tstack trace line2\n
> \tstack trace line3\n
> \n
> A|B|C|D\n`
>
> The first form (A|B|C|D) has statically defined columns delimited by a
> pipe symbol. The second form has the last character "E" which implies an
> exception record. If it is an exception record the information about the
> exception follows. The exception information starts with a line
> "Exception", followed by another newline and stacktrace on multiple
> lines. Each stacktrace element starts with a tab.
>
> I am parsing this file with ruby. Currently I am reading line by line
> and building the log records. This is working fine.
>
> I am wondering if I could rely on regular expressions to do it instead
> of reading line by line - I could read a chunk of the file and apply two
> regular expressions to see if there is a match and if I find the match
> process the record and move to the next record. If there is no match,
> then I combine multiple chunks until I find a match. Is this approach a
> valid consideration?

The question is: why do you want to do that?  Line-based parsing is
simple and has the advantage that you always get complete lines, so a
record is never cut in half.  Note also that Ruby uses buffered reading
underneath, in case you are wondering about IO efficiency.

> Is this doable with Ruby?

Yes, certainly.
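
For illustration, here is a minimal sketch of the chunk-based approach
described in the question. It is not taken from any existing project.
The names ChunkRecord, RECORD_RE and parse_chunks are made up for this
example, and it assumes that every exception block is closed by a blank
line, as in the sample. A line that fits neither format would stall the
buffer, which is one reason the line-based version is easier to get
right.

```ruby
require "stringio"

# Hypothetical record type for this sketch: four columns plus the raw
# stack trace text (nil for plain records).
ChunkRecord = Struct.new(:a, :b, :c, :d, :trace)

# One pattern for both record types, anchored at the start of the buffer.
# It only matches when a complete record is present.
RECORD_RE = /
  \A([^|\n]*)\|([^|\n]*)\|([^|\n]*)\|([^|\n]*)  # four pipe-delimited columns
  (?:
    \n                       # plain record: the line ends here
  |
    \|E\n \n Exception\n \n  # exception marker surrounded by blank lines
    ((?:\t[^\n]*\n)+)        # one or more tab-indented stack trace lines
    \n                       # assumed closing blank line
  )
/x

# Reads fixed-size chunks and extracts every complete record from the
# front of the buffer; an incomplete tail waits for the next chunk.
# Note: io.read(n) returns binary strings, so the captures are binary too.
def parse_chunks(io, chunk_size = 4096)
  records = []
  buffer = String.new
  while (chunk = io.read(chunk_size))
    buffer << chunk
    while (m = RECORD_RE.match(buffer))
      records << ChunkRecord.new(m[1], m[2], m[3], m[4], m[5])
      buffer.slice!(0, m.end(0)) # drop the consumed record
    end
  end
  warn "Unparsed trailing data: #{buffer.inspect}" unless buffer.empty?
  records
end

sample = "A|B|C|D\nA|B|C|D|E\n\nException\n\n" \
         "\tstack trace line1\n\tstack trace line2\n\nA|B|C|D\n"
parse_chunks(StringIO.new(sample), 16).each { |rec| p rec }
```

The small chunk size in the last line is only there to force records to
straddle chunk boundaries; with a real file you would pass the File
object and a much larger size.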

> If there are any open source
> projects, that do something like this, can someone point me to it? Also
> any thoughts which one is more efficient and why? Appreciate any
> feedback.

My implementation of this would use a single regular expression with
an optional part for the "|E".  That way you need to match only once
and you can immediately distinguish record types.

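# untested
Record = Struct.new :a, :b, :c, :d, :e

def parse
  # These must live inside the method: a def cannot see local
  # variables defined outside of it.
  last = nil   # most recent record, not yet yielded
  ex = false   # truthy while "last" is an exception record

  ARGF.each do |line|
    # [^|\n] keeps the trailing newline out of the last column
    if %r{^([^|\n]*)\|([^|\n]*)\|([^|\n]*)\|([^|\n]*)(\|E)?} =~ line
      ex = $5
      r = Record.new $1, $2, $3, $4
      r.e = String.new if ex

      yield last if last

      last = r
    elsif ex
      # blank lines, "Exception" and stack trace lines go here
      last.e << line
    else
      warn "Dunno what to do with line #{line.inspect}"
    end
  end

  yield last if last
end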

parse do |rec|
  p rec
end

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Reply Robert 2/4/2011 9:30:22 AM


Thanks for the prompt response. I appreciate your taking the time to
respond with sample code. I have just started on this as a pet project
to learn Ruby. The task is to build a log analysis web application. The
log file is not a standard one, in the sense that it is dynamically
constructed and some columns are optional, but all of them are
separated by the '|' character.  Initially I am starting with reading a
static file, but at some point my plan is to use SSH to read the live
file contents and provide real-time information. So I was considering
what other alternatives might work well in the real-time scenario as well.
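
To make the real-time idea concrete, here is a rough sketch of a
push-style parser that is fed one line at a time, so it does not care
whether the lines come from a static file or a live stream. The names
LiveRecord and LiveParser are invented for this example, and the SSH
transport is left out entirely. It assumes that a record line contains
at least one '|' and never starts with a tab, and that a last column of
exactly "E" always marks an exception record.

```ruby
# Hypothetical record type: a variable number of columns plus an
# array of stack trace lines (nil for plain records).
LiveRecord = Struct.new(:fields, :trace)

class LiveParser
  # The block is called once for every completed record.
  def initialize(&on_record)
    @on_record = on_record
    @pending = nil # exception record still collecting its stack trace
  end

  # Feed a single line, with or without its trailing newline.
  def feed(line)
    text = line.chomp
    if text.include?("|") && !text.start_with?("\t")
      flush # a new record line closes any pending exception record
      fields = text.split("|", -1) # -1 keeps empty trailing columns
      if fields.last == "E"
        @pending = LiveRecord.new(fields[0...-1], [])
      else
        # plain records are complete as soon as the line has arrived
        @on_record.call(LiveRecord.new(fields, nil))
      end
    elsif @pending && text.start_with?("\t")
      @pending.trace << text[1..-1] # store the line without its tab
    end
    # blank lines and the "Exception" marker carry no data and are skipped
  end

  # Emit a pending exception record, e.g. at end of input.
  def flush
    @on_record.call(@pending) if @pending
    @pending = nil
  end
end
```

The intended use is to call feed for each line as it arrives and flush
once at the end of input. Plain records are reported immediately, while
an exception record is reported only when the next record line shows up
or flush is called, because nothing else marks the end of a stack trace.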

Reply Kris 2/4/2011 6:59:37 PM
