Any regular expression wizards around here?

I want to parse a log file of the form

Code:
Some header information (over several lines)
------------------------------------------------
Revision 1.2
date: 2004/03/30 11:57:58; author: pcb; state: Exp; lines: +35 -5
Some comment here (multiline)

------------------------------------------------
Revision 1.3
date: 2004/03/30 12:50:58; author: pcb; state: Exp; lines: +35 -5
Some comment here
------------------------------------------------

I want to extract the Revision numbers and their associated comments.
I could easily enough just hard-code it, but I thought it would be more elegant to use a regular expression. The problem I keep running into is making it stop matching at the end of the comment.

Any ideas? Unfortunately I was doing this at home and so don't have the expression I had gotten so far. I think it was something along the lines of
\n\s*revision\s\d(\.\d)+\s*\n.*\n*
which I think will match on the Revision header (and cope with revisions of the form 1.2.3). The tricky bit is getting it to stop on the --------


CC
 
Penguin!

:)

What I am doing is some vbscripts to automate some standard things.
E.g. I want to do a diff between my current working file and a file that is checked into source control. I have a script that does this, but I would like it to present me with a list of revision numbers and their associated comments so that I can pick the revision I want. To do that I need to parse the log file.

CC
 
Captain Chickenpants said:
Any ideas? Unfortunately I was doing this at home and so don't have the expression I had gotten so far. I think it was something along the lines of
\n\s*revision\s\d(\.\d)+\s*\n.*\n*
which I think will match on the Revision header (and cope with revisions of the form 1.2.3). The tricky bit is getting it to stop on the --------

You have lots of possible workarounds. If you're using Perl5 regexp, you can do a zero-width negative lookahead assertion, e.g.

\n\s*revision\s\d(\.\d)+\s*\n.*\n*(?!-----)

You can also use non-greedy regexps, so you can include the ----

\n\s*revision\s\d(\.\d)+\s*\n.*\n*(------)??

This will consume the '----', but only one of them.

You can also try specifically to consume only one

\n\s*revision\s\d(\.\d)+\s*\n.*\n*(------){1,1}

There are some other variations too that can "negate" a string of consecutive -----'s, but they run much slower.
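
One possible reading of such a "negating" variation is a negative lookahead checked at the start of every comment line. The sketch below uses Python's re module purely for illustration (the thread itself is about VBScript and Perl-style regexps); the sample text and the names SAMPLE and PATTERN are made up, and re.IGNORECASE is assumed because the log writes "Revision" with a capital R.

```python
import re

# Made-up sample: one revision whose comment lines themselves start with '-'.
SAMPLE = (
    "\nRevision 1.3\n"
    "date: 2004/03/30 12:50:58; author: pcb\n"
    "- fixed the parser\n"
    "- updated docs\n"
    + "-" * 48 + "\n"
)

PATTERN = re.compile(
    r"\nrevision\s(\d+(?:\.\d+)+)\s*\n"  # "Revision 1.3" line; group 1 is the number
    r".*"                                # the "date: ..." line
    r"((?:\n(?!-{5}).*)*)",              # group 2: lines that do not start with five dashes
    re.IGNORECASE,
)

match = PATTERN.search(SAMPLE)
revision = match.group(1)
comment = match.group(2).strip()
```

Because the lookahead only rejects a run of five dashes, a comment line that begins with a single '-' is still kept, at the cost of one extra check per line.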

Don't forget many regexp libs differentiate between singleline and multiline match modes.
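
As a rough illustration of that difference, here is how the two modes behave in Python's re module (an assumption for the example only; other libraries name and default these flags differently). The variable names are hypothetical.

```python
import re

text = "Revision 1.2\ndate: 2004/03/30\nSome comment"

# Default mode: '.' does not match a newline, so '.*' stops at the end of the line.
default_match = re.search(r"Revision.*", text).group(0)

# DOTALL (what Perl calls single-line mode): '.' also matches newlines,
# so '.*' runs on to the end of the text.
dotall_match = re.search(r"Revision.*", text, re.DOTALL).group(0)

# MULTILINE: '^' matches at the start of every line, not only at the start of the text.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
```

This is why a pattern built on '.*' can either stop too early or swallow the whole log, depending on which mode the library uses.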

But your original code looks like it will work if you simply write

\n\s*revision\s\d(\.\d)+\s*\n.*(\n[^\-].*)*\n

This breaks down as follows:
match the first line (Revision): \n\s*revision\s\d(\.\d)+\s*\n
gobble up the date: line: .*
gobble up ZERO or more lines that do not start with a '-' character: (\n[^\-].*)*
gobble up the final \n before the '-': \n
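
A minimal sketch of this pattern pulling out revision numbers and comments, written in Python only so it can be run and checked (the original question is about VBScript). LOG, REVISION_RE and parse_revisions are made-up names, and the pattern is adjusted in four ways that are assumptions on my part: capture groups are added, re.IGNORECASE is set because the log says "Revision" while the pattern says "revision", \d+ is used so that a revision such as 1.10 would match, and the class is written [^\-\n] because a negated class also matches a newline, which would otherwise let a blank line pull the ---- line into the comment.

```python
import re

SEP = "-" * 48

# Sample log modelled on the one in the question (comment text is made up).
LOG = (
    "Some header information\n"
    f"{SEP}\n"
    "Revision 1.2\n"
    "date: 2004/03/30 11:57:58; author: pcb; state: Exp; lines: +35 -5\n"
    "First comment line\n"
    "second comment line\n"
    "\n"
    f"{SEP}\n"
    "Revision 1.3\n"
    "date: 2004/03/30 12:50:58; author: pcb; state: Exp; lines: +35 -5\n"
    "Some comment here\n"
    f"{SEP}\n"
)

REVISION_RE = re.compile(
    r"\n\s*revision\s(\d+(?:\.\d+)+)\s*\n"  # "Revision 1.2" line; group 1 is the number
    r".*"                                    # the "date: ..." line
    r"((?:\n[^\-\n].*)*)"                    # group 2: non-blank lines not starting with '-'
    r"\n",                                   # the newline that ends the comment
    re.IGNORECASE,
)


def parse_revisions(log_text):
    """Return a list of (revision, comment) pairs found in log_text."""
    return [(rev, comment.strip()) for rev, comment in REVISION_RE.findall(log_text)]


revisions = parse_revisions(LOG)
```

With this version a comment ends at the first blank line or the first line starting with '-', so a comment containing either would be cut short.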
 