Searching over multiple lines - how to find blocks?

rsf77 · Jun 12, 2013

Trying to find the following scenario in multiple files.
Line 1 starts with REF*D9 (* is part of the data, not a wildcard), followed by 1 or more lines, then a line that starts with K3.
I want to find the data from the REF*D9 line up to and including the line starting with K3.
I do NOT want any lines that start with REF*D9 that are not followed by a K3 before another REF*D9 occurs.

I have tried %REF?D9?++[^p]++K3 to no avail.
Looking for suggestions. Thanks. Richard.

Mofi · Jun 13, 2013

Such a complex search is not possible with the UltraEdit regular expression engine.

But it can be done with the Perl regular expression engine using the search string ^REF\*D9.*\r\n(?:(?:(?!REF\*D9).)*$\r\n)+K3.*$

The part (?:(?!REF\*D9).)*$\r\n is the expression from the topic "How to delete all lines NOT containing specific word or string or expression?". It matches a DOS-terminated line that does not contain, in this case, the string REF*D9. The enclosing (?: ... )+ repeats it, so one or more such lines are matched. Finally, ^REF\*D9.*\r\n and K3.*$ match the first and the last line of the block to find.

Use \r\n instead of $ at the end of the search string if the DOS line termination of the line starting with K3 should also be matched by the expression.
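To check the block logic outside UltraEdit, here is a minimal Python sketch using the standard re module. Python's engine treats line ends differently from the Perl engine in UltraEdit: its . also matches a carriage return, and its $ only matches before \n. So this translation (an assumption of the sketch) writes . as [^\r\n] and relies on the explicit \r\n instead of $. The helper name find_blocks and the sample segments are hypothetical.

```python
import re

# Python translation of ^REF\*D9.*\r\n(?:(?:(?!REF\*D9).)*$\r\n)+K3.*$
# Assumes DOS (CRLF) line endings; "." is written as [^\r\n] because
# Python's "." would also consume the "\r".
BLOCK = re.compile(
    r"^REF\*D9[^\r\n]*\r\n"              # first line starts with REF*D9
    r"(?:(?:(?!REF\*D9)[^\r\n])*\r\n)+"  # 1+ lines not containing REF*D9
    r"K3[^\r\n]*",                       # last line starts with K3
    re.MULTILINE,                        # ^ matches at every line start
)


def find_blocks(text):
    """Return every REF*D9 ... K3 block in text (hypothetical helper)."""
    return BLOCK.findall(text)


sample = "\r\n".join([
    "ISA*header",
    "REF*D9*111",  # another REF*D9 comes before any K3: not a block start
    "N1*aaa",
    "REF*D9*222",  # start of the only valid block
    "N1*bbb",
    "DTM*ccc",
    "K3*end1",     # end of the valid block
    "REF*D9*333",  # no K3 follows: not matched
    "N1*ddd",
]) + "\r\n"

for block in find_blocks(sample):
    print(block.replace("\r\n", " | "))
# prints: REF*D9*222 | N1*bbb | DTM*ccc | K3*end1
```

Because the group ends with )+, a K3 line directly after the REF*D9 line is not matched, which fits the requirement of 1 or more lines in between.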

rsf77 · Jun 13, 2013

Thank you so much. You are truly the Master!

jmdme · May 20, 2014

Hello all.

I have been trying to accomplish this for quite some time now and could use some advice.
Essentially, I have a log file that records a device's communication with a host system.
There is a signature start to the sequence, and a handful of signature endings.
Obviously, there will be multiple matches per log.

An example of the start sequence is a socket connecting.
An example of the terminating sequence is the literal text <CR><LF><LF>.
I need to match everything from the beginning of the sequence to the end, which is likely to span multiple lines.

Here is an example of the format of the file. The lines to match run from the BSD: Connecting line through the first line containing <CR><LF><LF>.

....Lines of data
(4/25/14 3:09:43 PM CDT) 2398421: BSD: Connecting
(4/25/14 3:09:43 PM CDT) 2398426: BSD: Sending data via socket from file CORECONFIG
(4/25/14 3:09:43 PM CDT) 2398429: BSD: Data Sent: prTaskLUTCoreConfiguration('04-25-14 15:09:39','572504070','
(4/25/14 3:09:43 PM CDT) 2398429: BSD: Data Sent: 00124','en_US','Test','CT-42-03-097')<CR><LF><LF>
(4/25/14 3:09:43 PM CDT) 2398642: BSD: Data Received: hem,00124,0,0,<CR><LF><LF><LF>
(4/25/14 3:09:43 PM CDT) 2398644: BSD: Socket got file CORECONFIG 24 bytes, wrote as CORECONFIG.lut 22 bytes
(4/25/14 3:09:43 PM CDT) 2398648: BSD: Closing Socket - LUT Transfer
(4/25/14 3:09:43 PM CDT) 2398648: BSD: Command Successful [Total Successful = 4, Total Failures = 0]
(4/25/14 3:09:43 PM CDT) 2398651: CORECONFIG LUT request sent 1
(4/25/14 3:09:43 PM CDT) 2398651: CORECONFIG LUT received
(4/25/14 3:09:43 PM CDT) 2398675: PHD: Waiting for LUT send response
(4/25/14 3:09:43 PM CDT) 2398677: [VCMainThread] Got PMS_MSG_ACTION_COMMAND
(4/25/14 3:09:43 PM CDT) 2398681: BSD: LUT Retries is OFF
....More lines of data

So I need the match to stop when the terminating characters are encountered.

I am attempting to use the following, but it matches beyond the terminating sequence:

.*BSD: Connecting(?:\r\n.*)+<CR><LF><LF>

Any help is greatly appreciated.

Thank you!

Mofi · May 20, 2014

No problem, use the Perl regular expression (?s)^[^\r\n]*?BSD: Connecting.+?<CR><LF><LF>[^\n]*\n as the search string.

(?s) at the beginning of the search string changes the behavior of . (dot). Normally . does not match carriage return and line-feed, but with (?s) at the beginning of the search string it also matches line terminators. See the topic "." (dot) in Perl regular expressions doesn't include CRLFs? for details.

^ ... starts the match at the beginning of a line.

[^\r\n]*? ... because . now also matches line terminators, the text before BSD: Connecting needs a different expression than .*?, one that does not match line terminators. Therefore a negated character set is used, which matches any character except carriage return and line-feed, 0 or more times, and as few characters as possible.

.+? ... this expression now matches any character, including line terminators, 1 or more times, non-greedily, so the match stops at the first <CR><LF><LF>.

[^\n]*\n ... matches the remaining characters on the last line of the block up to the line-feed, and then the line-feed itself.
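The difference between the greedy attempt from the question and the non-greedy pattern above can be reproduced with a short Python sketch (standard re module). The log lines are a trimmed copy of the excerpt in the question with the timestamps dropped. Passing re.MULTILINE is a Python-only adjustment: without it, Python's ^ matches only at the start of the whole text, not at every line start.

```python
import re

# Trimmed copy of the log excerpt from the question (timestamps dropped),
# with DOS (CRLF) line endings.
log = "\r\n".join([
    "....Lines of data",
    "2398421: BSD: Connecting",
    "2398426: BSD: Sending data via socket from file CORECONFIG",
    "2398429: BSD: Data Sent: prTaskLUTCoreConfiguration("
    "'04-25-14 15:09:39','572504070','",
    "2398429: BSD: Data Sent: 00124','en_US','Test','CT-42-03-097')"
    "<CR><LF><LF>",
    "2398642: BSD: Data Received: hem,00124,0,0,<CR><LF><LF><LF>",
    "2398644: BSD: Socket got file CORECONFIG 24 bytes, "
    "wrote as CORECONFIG.lut 22 bytes",
]) + "\r\n"

# Attempt from the question: (?:\r\n.*)+ is greedy, so the engine runs to
# the end of the text and backtracks only to the LAST <CR><LF><LF>.
greedy = re.search(r".*BSD: Connecting(?:\r\n.*)+<CR><LF><LF>", log)

# Suggested pattern: (?s) lets "." cross line breaks and .+? is lazy,
# so the match stops at the FIRST <CR><LF><LF>.
lazy = re.search(
    r"(?s)^[^\r\n]*?BSD: Connecting.+?<CR><LF><LF>[^\n]*\n", log, re.MULTILINE
)

print("Data Received" in greedy.group())  # True: ran past the terminator
print("Data Received" in lazy.group())    # False: stopped at the first one
```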

Alternatively, use the Perl regular expression ^.*?BSD: Connecting[\S\s]+?<CR><LF><LF>[^\n]*\n as the search string.

[\S\s]+? ... matches any non-whitespace or whitespace character, 1 or more times, non-greedily. [\S\s] behaves like . with (?s) at the beginning of the search string: it simply matches any character.
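A quick way to confirm that the two search strings are interchangeable is to run both through Python's re module on the same text. The short CRLF sample below is hypothetical, modeled on the log in the question, and re.MULTILINE again stands in for UltraEdit's line-based ^.

```python
import re

# Short hypothetical CRLF sample modeled on the log excerpt in the question.
log = (
    "2398421: BSD: Connecting\r\n"
    "2398429: BSD: Data Sent: ('CT-42-03-097')<CR><LF><LF>\r\n"
    "2398642: BSD: Data Received: hem,00124,0,0,<CR><LF><LF><LF>\r\n"
)

# Variant 1: (?s) lets "." match line terminators.
dotall = re.compile(
    r"(?s)^[^\r\n]*?BSD: Connecting.+?<CR><LF><LF>[^\n]*\n", re.MULTILINE
)

# Variant 2: no (?s). [\S\s] means "non-whitespace OR whitespace", i.e. any
# character including \r and \n, while ^.*? stays on the first line.
anychar = re.compile(
    r"^.*?BSD: Connecting[\S\s]+?<CR><LF><LF>[^\n]*\n", re.MULTILINE
)

print(dotall.findall(log) == anychar.findall(log))  # True
```

Scoping "match anything" to the single [\S\s] spot, instead of changing . for the whole expression, keeps the ^.*? prefix restricted to one line without needing the [^\r\n] workaround.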

The expression ^.*?BSD: Connecting[^<]+?<CR><LF><LF>[^\n]*\n also works for your example, although I would not use it: any < in the block before <CR> would prevent the block from being matched.
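The caveat about < can be demonstrated with two hypothetical blocks in Python's re module: one without any extra <, and one with an extra < before the terminator. The [\S\s] version is included for comparison.

```python
import re

# [^<] also matches line breaks, but it can never step over a "<".
no_lt = re.compile(
    r"^.*?BSD: Connecting[^<]+?<CR><LF><LF>[^\n]*\n", re.MULTILINE
)
# The [\S\s] version from the previous post, for comparison.
anychar = re.compile(
    r"^.*?BSD: Connecting[\S\s]+?<CR><LF><LF>[^\n]*\n", re.MULTILINE
)

# Hypothetical blocks; the second has an extra "<" before the terminator.
plain = "1: BSD: Connecting\r\n2: BSD: Data Sent: abc<CR><LF><LF>\r\n"
with_lt = "1: BSD: Connecting\r\n2: BSD: Data Sent: a<b<CR><LF><LF>\r\n"

print(bool(no_lt.search(plain)), bool(no_lt.search(with_lt)))      # True False
print(bool(anychar.search(plain)), bool(anychar.search(with_lt)))  # True True
```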

jmdme · May 20, 2014

That works perfectly and I appreciate the explanation and detail you gave.

Thank you so much Mofi!

YSLGuru · Mar 13, 2015

I've been over the posts here and have tried to use what is listed, but either I've missed something or what I need is not listed. I need to locate a file (via FIND IN FILES) where the set of words I'm searching for spans many lines, and there may be 1 or more white spaces or tabs between the words.

EXAMPLE:

I want a regular expression that will find this:

SELECT T.sCol1, T.sCol2, T.sCol3 AS 'My3rdColumn'

But the above may not be on 1 line. Each word could be on a new line, preceded by 1 or many white spaces or even tabs. So it could look like this:

SELECT T.sCol1,
T.sCol2,
T.sCol3 AS 'My3rdColumn'

Or like this:

SELECT
  T.sCol1,
                                            T.sCol2,
T.sCol3   AS 'My3rdColumn'

The first word (SELECT) may not be the first character(s) in the file. I have a SQL query to locate and I know the order of the words (shown in the example above), but there's no telling whether it's all on 1 line or many lines, or whether there are 1 or many spaces or other non-visible characters like tabs between them.

Any ideas? BTW: I have created a test *.txt file containing the query so that I can verify with certainty whether the regex I'm using works.

Thanks

Ovg · Mar 13, 2015

Try this with Regular expressions: Perl selected:

SELECT\s*T\.sCol1,\s*T\.sCol2,\s*T\.sCol3\s*as\s+'My3rdColumn'
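To try this kind of whitespace-tolerant pattern outside UltraEdit, here is a small Python sketch with the standard re module. It uses \s+ after AS, so a tab or line break there is also accepted, and re.IGNORECASE, so as and AS both match (Python's re is case-sensitive by default); both are assumptions of this sketch. The sample strings are modeled on the examples in the question.

```python
import re

# Whitespace-tolerant query pattern: \s* / \s+ absorb spaces, tabs and line
# breaks between the known words; IGNORECASE is an assumption of this sketch.
QUERY = re.compile(
    r"SELECT\s*T\.sCol1,\s*T\.sCol2,\s*T\.sCol3\s*AS\s+'My3rdColumn'",
    re.IGNORECASE,
)

samples = [
    "SELECT T.sCol1, T.sCol2, T.sCol3 AS 'My3rdColumn'",
    "SELECT T.sCol1,\nT.sCol2,\nT.sCol3 AS 'My3rdColumn'",
    "SELECT\n  T.sCol1,\n\t\t T.sCol2,\nT.sCol3   AS 'My3rdColumn'",
    "-- query not at start of file\n"
    "select T.sCol1, T.sCol2, T.sCol3 as\t'My3rdColumn'",
]

for text in samples:
    print(bool(QUERY.search(text)))  # True for every sample
```

search() rather than match() is used because SELECT may not be at the start of the file.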

YSLGuru · Mar 13, 2015

Thanks Ovg, that worked. Now I have one additional requirement. What if there is an unknown character or set of characters between the known words?

For example, what if I search for the same set of words but want to allow one or more unknown characters between parts of it, like this?

SEARCH PHRASE:

SELECT T.sCol1, T.sCol2, T.sCol3 AS 'My3rdColumn'

It should match this, which contains AS 'Col2' even though that is not part of my search string:

SELECT T.sCol1, T.sCol2 AS 'Col2', T.sCol3 AS 'My3rdColumn'

Thanks again

Ovg · Mar 13, 2015

SELECT\s*T\.sCol1,\s*T\.sCol2.*\s*T\.sCol3\s*as\s+'My3rdColumn'
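One limit of .* here: as explained earlier in the thread, . does not match line terminators, so the unknown text must be on the same line as T.sCol2. The Python sketch below (standard re module, with \s+ after AS and re.IGNORECASE as assumptions) shows this. It also includes a hypothetical variant that uses the lazy [\S\s]*? from the earlier posts so the unknown text may span line breaks.

```python
import re

# ".*" after T.sCol2 absorbs unknown text such as " AS 'Col2',", but "."
# does not match "\n", so that text must sit on the T.sCol2 line.
SAME_LINE = re.compile(
    r"SELECT\s*T\.sCol1,\s*T\.sCol2.*\s*T\.sCol3\s*AS\s+'My3rdColumn'",
    re.IGNORECASE,
)

# Hypothetical variant: lazy [\S\s]*? (any character) lets the unknown text
# span line breaks as well.
ANY_LINES = re.compile(
    r"SELECT\s*T\.sCol1,\s*T\.sCol2[\S\s]*?T\.sCol3\s*AS\s+'My3rdColumn'",
    re.IGNORECASE,
)

one_line = "SELECT T.sCol1, T.sCol2 AS 'Col2', T.sCol3 AS 'My3rdColumn'"
spread = "SELECT T.sCol1, T.sCol2\n  AS 'Col2',\n  T.sCol3 AS 'My3rdColumn'"

print(bool(SAME_LINE.search(one_line)), bool(SAME_LINE.search(spread)))  # True False
print(bool(ANY_LINES.search(one_line)), bool(ANY_LINES.search(spread)))  # True True
```

The trade-off of the [\S\s]*? variant is that it can also bridge a large part of a file between a T.sCol2 and a much later T.sCol3, so the stricter same-line form may be preferable when the extra text is known to stay on one line.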