How to define a case-sensitive function string search?

pepemosca · May 28, 2010 · #1 · 2010-05-28T00:22+00:00

Hello,
I would like to define a function string for this case:

END
DATES
/
1 'JUL' 2010
/
DATES
  1 'JUL' 2007  /
  1 'JAN' 2008  /  4. year
  1 'JUL' 2008  /
  1 'JAN' 2009  /  5. year
  1 'JUL' 2009  /

I'm trying to edit my .uew file and add this:

/TGBegin "Keyword"
/TGFindStr = "%[A-Z]"
/TGFindBEnd = "/"
/TGEnd

What I want is for UltraEdit to recognize the UPPERCASE words and add them as Keywords.
So the question is: how do I tell UE to recognize only UPPERCASE, that is, not to treat [A-Z] as [a-zA-Z]?

Thanks!

Mofi · May 28, 2010 · #2 · 2010-05-28T05:47+00:00

pepemosca · May 28, 2010 · #3 · 2010-05-28T13:53+00:00

Thanks, thanks!

But there is an error here:

/TGFindStr = "^(\u+)\>(?<!EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)"

It still lists the words "END", "GRID", and so on under "Keywords". Only "EDIT", the first word in the list, is excluded.

This works better:

/TGBegin "Keyword"
/TGFindStr = "^(\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"
/TGEnd

Is there a better way to do it?

bulgrien · May 28, 2010 · #4 · 2010-05-28T14:59+00:00

Have you tried:

/TGFindStr = "^(\u+)\>(?<!(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY))"

Sorry if it doesn't work; it is an untested guess.

Mofi · May 28, 2010 · #5 · 2010-05-28T16:05+00:00

A quick test of

/TGFindStr = "^(\u+)\>(?<!(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY))"

showed me that this does not help. I'm not a Perl expert and therefore don't know why. However, it looks like

/TGFindStr = "^(\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"

works quite well, so there is no real need to search for a better one.

bulgrien · May 28, 2010 · #6 · 2010-05-28T19:23+00:00

In Perl (?<!regex) is a zero-width negative lookbehind which means a match will only occur if not immediately preceded by the specified string.
The reason why (?<!regex|regex|regex) doesn't work is because the | (or) operator short-circuits whenever the evaluation is true.

If the preceding value is one of the later strings in the list it will never be detected. The regular expression short-circuits when the preceding value does not match the first string in the list since, logically, !false=true.

I was hoping that (?<!(regex|regex|regex)) would work, but frankly am not surprised that it does not. Unlike other languages, Perl's use of parentheses in the form of (regex) does not establish order-of-operations...it simply attempts to group the enclosed portion of the expression for the purpose of creating a back-reference. The zero-width negative lookbehind, therefore, is still evaluated the same way and still short-circuits.

In short, Mofi is right. There is no better way to code a multiple value negative lookbehind. Every value must be evaluated independently to eliminate the short-circuiting of the | (or) construct.
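
The chained form can be tried outside UltraEdit as a rough sketch. The snippet below uses Python's re module, which is an assumption made purely for illustration: it is not the engine UltraEdit uses, it has no \u or \> tokens, so [A-Z]+ and \b stand in for them, and the sample text is made up. It only shows that a keyword is reported when every one of the chained lookbehinds passes. As a side note, Python refuses to compile the single-lookbehind alternation at all because the alternatives differ in length, which may or may not be related to what was observed in UltraEdit.

```python
import re

# Section names taken from the thread.
SECTIONS = ["EDIT", "END", "GRID", "OPTIMIZE", "PROPS",
            "REGIONS", "RUNSPEC", "SCHEDULE", "SOLUTION", "SUMMARY"]

# One negative lookbehind per excluded word; all of them must pass.
# [A-Z]+ replaces \u+ and \b replaces \> (Python has neither token).
chained = re.compile(
    r"^([A-Z]+)\b" + "".join(f"(?<!{w})" for w in SECTIONS),
    re.MULTILINE,
)

# Hypothetical sample lines, not taken from a real .DATA file.
sample = "RUNSPEC\nDATES\nGRID\nWELSPECS\nEND\n"
keywords = chained.findall(sample)
print(keywords)

# The alternation form inside a single lookbehind: Python rejects it
# because the alternatives have different widths.
try:
    re.compile(r"^([A-Z]+)\b(?<!EDIT|END)")
    alternation_compiles = True
except re.error:
    alternation_compiles = False
print(alternation_compiles)
```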

pepemosca · May 29, 2010 · #7 · 2010-05-29T14:48+00:00

I'll leave it like this:

/Regexp Type = Perl
/TGBegin "Section"
/TGFindStr = "(?-i)^(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)\>"
/TGEnd
/TGBegin "Keyword"
/TGFindStr = "^(\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"
/TGEnd

But I was thinking it would be nice to allow a space, tab or line break at the beginning of the line.

I've tried to add:

[ \t\r\n]

at the beginning of the expression, but it doesn't work.

Something like:

/Regexp Type = Perl
/TGBegin "Section"
/TGFindStr = "(?-i)^*[ \t\r\n](EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)\>"
/TGEnd
/TGBegin "Keyword"
/TGFindStr = "^*[ \t\r\n](\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"
/TGEnd

I need to have the [ \t\r\n] or nothing at the beginning of the line in my .DATA files.

Any ideas?

By the way: bulgrien and Mofi, thanks.

Mofi · May 29, 2010 · #8 · 2010-05-29T18:25+00:00

Thanks, bulgrien, for explaining why my expression did not really work. Good to know for the future. pepemosca, you could use:

/Regexp Type = Perl
/TGBegin "Section"
/TGFindStr = "(?-i)^[ \t]*(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)\>"
/TGEnd
/TGBegin "Keyword"
/TGFindStr = "^[ \t]*(\u+)\>(?<!\<EDIT)(?<!\<END)(?<!\<GRID)(?<!\<OPTIMIZE)(?<!\<PROPS)(?<!\<REGIONS)(?<!\<RUNSPEC)(?<!\<SCHEDULE)(?<!\<SOLUTION)(?<!\<SUMMARY)"
/TGEnd

[ \t]* means that 0 or more spaces or tabs can occur between start of line and the first word in uppercase letters.

\r\n should not be inside the square brackets because ^ means start of a line. And where is the start of a line? It is the first character AFTER \r (MAC), \n (UNIX) or \r\n (DOS). So \r or \n can never be at the start of a line; they always define the end of a line.

\> is very important because it defines where to end the search. Without \> something like ENDBOX would be matched by the section regular expression and on the other hand END would be returned as a valid result from the keyword regular expression.
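
A small sketch of the effect of the leading [ \t]* and the trailing word-end anchor, again using Python's re module as a stand-in (an assumption; \b replaces \>, the section list is shortened, and the sample lines are invented):

```python
import re

# \b stands in for \>; Python's re has no \< or \> anchors.
with_boundary = re.compile(r"^[ \t]*(EDIT|END|GRID)\b", re.MULTILINE)
without_boundary = re.compile(r"^[ \t]*(EDIT|END|GRID)", re.MULTILINE)

# Hypothetical lines: two real sections (one indented) and two longer words.
sample = "END\n  GRID\n\tENDBOX\nGRIDFILE\n"

sections_ok = with_boundary.findall(sample)
sections_loose = without_boundary.findall(sample)
print(sections_ok)     # only whole words END and GRID
print(sections_loose)  # also picks END out of ENDBOX and GRID out of GRIDFILE
```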

Additionally as I created some test examples I found out that for the keyword regular expression it is necessary to insert \< before every word in the lookbehind expressions. Otherwise the expression would not find something like FEND because this word ends with the lookbehind word END. The lookbehind expression is evaluated from right to left on the string found by the preceding expression. Therefore just \< is required, but not \>, to avoid wrong exclusions.
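
The FEND case can be reproduced with a short sketch in Python's re module (an assumption for illustration only; \b replaces both \< and \>, and only END is excluded to keep it short):

```python
import re

# Lookbehind without a word-start anchor: anything ending in END is dropped.
plain = re.compile(r"^[ \t]*([A-Z]+)\b(?<!END)", re.MULTILINE)
# Lookbehind with a word-start anchor: only the exact word END is dropped.
anchored = re.compile(r"^[ \t]*([A-Z]+)\b(?<!\bEND)", re.MULTILINE)

sample = "END\nFEND\nENDBOX\n"

plain_hits = plain.findall(sample)
anchored_hits = anchored.findall(sample)
print(plain_hits)     # FEND is lost along with END
print(anchored_hits)  # FEND is kept, END is still excluded
```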

pepemosca · May 30, 2010 · #9 · 2010-05-30T00:28+00:00

Mofi, OK I understand everything except...

In Keyword, why not add \> to the end of the words that I want to exclude?
Like:

/TGBegin "Keyword"
/TGFindStr = "^[ \t]*(\u+)\>(?<!\<EDIT\>)(?<!\<END\>)(?<!\<GRID\>)(?<!\<OPTIMIZE\>)(?<!\<PROPS\>)(?<!\<REGIONS\>)(?<!\<RUNSPEC\>)(?<!\<SCHEDULE\>)(?<!\<SOLUTION\>)(?<!\<SUMMARY\>)"
/TGEnd

Mofi · May 30, 2010 · #10 · 2010-05-30T18:01+00:00

pepemosca wrote: In Keyword, why not add \> to the end of the words that I want to exclude?

You can do that, but it is not necessary. It does not make a difference if \> is used or not. The only difference would be that the search string is longer and therefore the search would be a little bit slower.

I don't have deep insight into the Perl engine, but I think lookbehinds are applied from right to left. Let us simplify the expression to

/TGFindStr = "^[ \t]*(\u+)\>(?<!\<END\>)"

and first think about how it works on the word ENDBOX. The rightmost character of the found string is X. This character is compared with the negative lookbehind string, which is simply END. X from ENDBOX is not equal to D from END, so the found string is surely okay. No further test is necessary.

Now let us think about the found string FEND, whose rightmost character is D. The rightmost character of the negative lookbehind string is also D. So there is a match and the next character to the left must be analyzed. This is N in both strings, again a match; continue with the next character to the left, and once again there is a match for E. If the lookbehind expression were just (?<!END), the lookbehind string would be fully matched here and FEND would be ignored. But the negative lookbehind expression is (?<!\<END). Therefore the Perl regex engine now has to check whether the E is the first character of the found string, which is not the case, or whether the character to its left is a non-word character, which is also not the case because F is a word character. Therefore the lookbehind string does not match for the found string FEND, and FEND is not excluded.

The search expression (\u+)\> finds only entire words, and the negative lookbehinds are always applied from right to left on the found strings, that is, from the end of the words. The first character of the found string that does not match the character at the same position of the negative lookbehind string stops further evaluation. Therefore \> is not necessary in the lookbehind strings: at that position it is always true for every string found here.
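
As a quick check of the claim that the trailing \> changes nothing, the two variants can be compared in Python's re module (an assumption for illustration; \b replaces \< and \>, and the sample lines are invented). This says nothing about speed, only about the result:

```python
import re

head = r"^[ \t]*([A-Z]+)\b"
without_end = re.compile(head + r"(?<!\bEND)", re.MULTILINE)
with_end = re.compile(head + r"(?<!\bEND\b)", re.MULTILINE)

# Hypothetical lines covering the cases discussed above.
sample = "END\nFEND\nENDBOX\n  DATES\nEND /\n"

hits_without_end = without_end.findall(sample)
hits_with_end = with_end.findall(sample)
print(hits_without_end)
print(hits_with_end)  # same list: the main pattern already stops at a word end
```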

pepemosca · May 30, 2010 · #11 · 2010-05-30T22:00+00:00

Mofi, OK. Now I get your point.
Thanks for your explanation.

May 23, 2011 · #12 · 2011-05-23T20:11+00:00

OK, now a new challenge

It's kind of the same... But to be honest: I can't make it work!

Here is my text:

*RUN
*DATE 1990 5 1

*GROUP 'G' *ATTACHTO 'FIELD'

*DATE 1990 5 3

*WELL 'OP' *ATTACHTO 'G'
*BHPDEPTH 'OP' 2600.0

I want to make a list of DATE.

I want to have in the Function List something like this:

DATE
  1990 5 1
  1990 5 3

Ideas? Thanks!

Mofi · May 24, 2011 · #13 · 2011-05-24T05:43+00:00

Well, that is a very simple task.

With using the Perl regular expression engine:

/Regexp Type = Perl
/TGBegin "DATE"
/TGFindStr = "^[ \t]*\*DATE[ \t]+(\d+ +\d+ +\d+)"
/TGEnd

With using the UltraEdit regular expression engine:

/TGBegin "DATE"
/TGFindStr = "%[ ^t]++^*DATE[ ^t]+^([0-9]+ +[0-9]+ +[0-9]+^)"
/TGEnd

Whether [ \t]* (Perl) or [ ^t]++ (UltraEdit) is needed depends on whether whitespace between the start of the line and *DATE is allowed and may actually occur. If it is not allowed, remove that part.
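
The Perl variant can be tried against the sample text from the question in Python's re module (an assumption for illustration; this particular pattern uses no UltraEdit-specific tokens, so it is copied unchanged). The parentheses mark the part that corresponds to the desired output in the question, the three numbers after *DATE:

```python
import re

# Same pattern as the Perl function string above.
date_line = re.compile(r"^[ \t]*\*DATE[ \t]+(\d+ +\d+ +\d+)", re.MULTILINE)

# Sample text from the question.
sample = """*RUN
*DATE 1990 5 1

*GROUP 'G' *ATTACHTO 'FIELD'

*DATE 1990 5 3

*WELL 'OP' *ATTACHTO 'G'
*BHPDEPTH 'OP' 2600.0
"""

dates = date_line.findall(sample)  # findall returns only the captured group
print(dates)
```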

pepemosca · May 24, 2011 · #14 · 2011-05-24T11:21+00:00

But this works better:

/TGBegin "Date"
/TGFindStr = "(?-i)^[ \t]*\*DATE[ \t]+(\d+[ \t]+\d+[ \t]+\d+)"
/TGEnd

As usual, it works perfectly.

But to understand more:
How do I say "show only this part of the match and hide the rest"?

Thanks Mofi!

May 26, 2011 · #15 · 2011-05-26T11:44+00:00

Mofi, I want to make a slight modification to this code:

/TGBegin "Keyword"
/TGFindStr = "(?-i)^[ \t]*\*(\u+)\>(?<!\<TITLE1)(?<!\<GRID)(?<!\<MODEL)(?<!\<ROCKFLUID)(?<!\<INITIAL)(?<!\<NUMERICAL)(?<!\<RUN)(?<!\<STOP)"
/TGEnd

I want this code to find *KEYWORD and KEYWORD, but not **KEYWORD.

I say this, because I could try this code:

/TGBegin "Keyword"
/TGFindStr = "(?-i)^[ \t\*]*(\u+)\>(?<!\<TITLE1)(?<!\<GRID)(?<!\<MODEL)(?<!\<ROCKFLUID)(?<!\<INITIAL)(?<!\<NUMERICAL)(?<!\<RUN)(?<!\<STOP)"
/TGEnd

But it also matches lines with more than one *.

I found that Perl has a syntax for the number of occurrences ({}), but I cannot make it work.

Ideas? Thanks!
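
One possible direction, sketched in Python's re module rather than in UltraEdit (an assumption: \b replaces \>, [A-Z]+ replaces \u+, the lookbehinds are left out for brevity, and the sample lines are invented): an optional single asterisk can be written as \*? or, with the {} occurrence syntax, as \*{0,1}. Because the asterisk sits outside the character class, at most one is consumed and a line starting with ** no longer matches.

```python
import re

# At most one literal * between the leading whitespace and the keyword.
one_star_at_most = re.compile(r"^[ \t]*\*?([A-Z]+)\b", re.MULTILINE)
# Same thing written with the {min,max} occurrence syntax.
counted = re.compile(r"^[ \t]*\*{0,1}([A-Z]+)\b", re.MULTILINE)

# Hypothetical lines: one *, no *, two *, and an indented *.
sample = "*WELL 'OP'\nDATE 1990 5 1\n**COMMENT\n  *GROUP 'G'\n"

found = one_star_at_most.findall(sample)
found_counted = counted.findall(sample)
print(found)          # **COMMENT is skipped
print(found_counted)  # identical result
```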