How to define a case sensitive function string search?

27
Basic User

    May 28, 2010 #1

    Hello,
    I would like to define a function string for this case:

    Code:

    END
    DATES
    /
    1 'JUL' 2010
    /
    DATES
      1 'JUL' 2007  /
      1 'JAN' 2008  /  4. year
      1 'JUL' 2008  /
      1 'JAN' 2009  /  5. year
      1 'JUL' 2009  /

    I'm trying to edit my .uew file and add this:

    Code:

    /TGBegin "Keyword"
    /TGFindStr = "%[A-Z]"
    /TGFindBEnd = "/"
    /TGEnd

    What I'm looking for is for UltraEdit to recognize the UPPERCASE words and list them as Keywords.
    So the question is: how do I tell UE to recognize only UPPERCASE letters? I mean, not to treat [A-Z] as [a-zA-Z].

    Thanks!
    UltraEdit 16.30.0.1003
    UltraCompare Professional 7.20.0.1009
    Windows Vista Enterprise, 64 bits, Spanish

    6,675585
    Grand Master

      May 28, 2010 #2

      What you want is not possible with the default UltraEdit regular expression engine. With the Perl regular expression engine, however, the function strings you need are no problem. Here are the function strings with a short explanation:

      /TGFindStr = "(?-i)^(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)\>"

      (?-i) makes the search case sensitive.
      ^ means start of line.
      (...) tags the string found by the expression inside the round brackets so that only that part of the line is displayed in the function list.
      EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY are the words that must be at the start of the line, combined with a logical OR.
      \> means end of a word.
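Python's `re` module is not UltraEdit's Perl engine, but it is close enough for a quick experiment with the section expression. A minimal sketch, assuming `re.IGNORECASE` stands in for UltraEdit's default case-insensitive search and `\b` stands in for `\>` (Python has no `\>`, and it only accepts the scoped `(?-i:...)` form for switching case-insensitivity off); the sample text is made up:

```python
import re

# Assumption: re.IGNORECASE simulates UltraEdit's default case-insensitive search,
# and \b replaces UltraEdit's \> (end of word), which Python's re does not have.
SECTION_WORDS = "EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY"

# Without a case modifier, the lowercase line "end" is accepted as well.
loose = re.compile(r"^(" + SECTION_WORDS + r")\b", re.IGNORECASE | re.MULTILINE)

# (?-i:...) turns case-insensitivity off for the enclosed part only,
# playing the role of the leading (?-i) in the UltraEdit function string.
strict = re.compile(r"^(?-i:(" + SECTION_WORDS + r"))\b", re.IGNORECASE | re.MULTILINE)

text = "END\nDATES\nend\nENDBOX\nGRID\n"
print(loose.findall(text))   # ['END', 'end', 'GRID']
print(strict.findall(text))  # ['END', 'GRID']
```

ENDBOX is rejected by both patterns because the word-end check fails between D and B.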


      /TGFindStr = "^(\u+)\>(?<!EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)"

      ^ means start of line.
      (...) tags the string found by the expression inside the round brackets so that only that part of the line is displayed in the function list.
      \u+ means find one or more uppercase characters.
      \> means end of a word.
      (?<!EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY) means the preceding found string is NOT one of these strings. We don't want to find the same strings twice.
      Best regards from an UC/UE/UES for Windows user from Austria

      27
      Basic User

        May 28, 2010 #3

        Thanks, thanks!

        But there is an error here:

        Code:

        /TGFindStr = "^(\u+)\>(?<!EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)"

        It still lists the words "END", "GRID", ... under "Keywords". Only "EDIT", the first one, is excluded.

        This works better:

        Code:

        /TGBegin "Keyword"
        /TGFindStr = "^(\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"
        /TGEnd

        Is there a better way to do it?

        901
        Master

          May 28, 2010 #4

          Have you tried:

          Code:

          /TGFindStr = "^(\u+)\>(?<!(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY))"

          Sorry if it doesn't work; it is an untested guess. :|

          6,675585
          Grand Master

            May 28, 2010 #5

            A quick test of

            /TGFindStr = "^(\u+)\>(?<!(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY))"

            showed me that this does not help. I'm not a Perl expert and therefore don't know why. However, it looks like

            /TGFindStr = "^(\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"

            works quite well, so there is no real need to search for a better one.

            901
            Master

              May 28, 2010 #6

              In Perl, (?<!regex) is a zero-width negative lookbehind, which means a match will only occur if it is not immediately preceded by the specified string.
              The reason why (?<!regex|regex|regex) doesn't work is that the | (or) operator short-circuits whenever the evaluation is true.

              If the preceding value is one of the later strings in the list, it will never be detected. The regular expression short-circuits when the preceding value does not match the first string in the list since, logically, !false=true.

              I was hoping that (?<!(regex|regex|regex)) would work, but frankly I am not surprised that it does not. Unlike in other languages, Perl's use of parentheses in the form of (regex) does not establish order of operations; it simply attempts to group the enclosed portion of the expression for the purpose of creating a back-reference. The zero-width negative lookbehind, therefore, is still evaluated the same way and still short-circuits.

              In short, Mofi is right. There is no better way to code a multiple value negative lookbehind. Every value must be evaluated independently to eliminate the short-circuiting of the | (or) construct.
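For comparison, here is a sketch of the chained form in Python's `re`. It is only an analogy for UltraEdit's engine: `[A-Z]+` replaces `\u+`, `\b` replaces `\>`, and the sample lines are made up. Python insists that every lookbehind has a fixed width, so it refuses both alternation forms at compile time, while one lookbehind per word compiles and works:

```python
import re

WORDS = ["EDIT", "END", "GRID", "OPTIMIZE", "PROPS",
         "REGIONS", "RUNSPEC", "SCHEDULE", "SOLUTION", "SUMMARY"]

# Python's re only accepts fixed-width lookbehinds. A single lookbehind holding
# alternatives of different lengths (END has 3 letters, EDIT has 4) is rejected,
# with or without the extra round brackets.
rejected = []
for bad in (r"^([A-Z]+)\b(?<!" + "|".join(WORDS) + ")",
            r"^([A-Z]+)\b(?<!(" + "|".join(WORDS) + "))"):
    try:
        re.compile(bad)
    except re.error:
        rejected.append(bad)
print(len(rejected), "of 2 alternation forms rejected")  # 2 of 2 alternation forms rejected

# One negative lookbehind per excluded word, as in the chained function string.
chained = r"^([A-Z]+)\b" + "".join(f"(?<!{w})" for w in WORDS)
keyword_re = re.compile(chained, re.MULTILINE)

text = "END\nDATES\nGRID\nWELLS\nSUMMARY\n"
print(keyword_re.findall(text))  # ['DATES', 'WELLS']
```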

              27
              Basic User

                May 29, 2010 #7

                I'll leave it like this:

                Code:

                /Regexp Type = Perl
                /TGBegin "Section"
                /TGFindStr = "(?-i)^(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)\>"
                /TGEnd
                /TGBegin "Keyword"
                /TGFindStr = "^(\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"
                /TGEnd

                But I was thinking that it would be nice to allow a space, tab, or newline at the beginning of the line.

                I've tried to add:

                Code:

                [ \t\r\n]

                at the beginning of the expression, but it doesn't work.

                Something like:

                Code:

                /Regexp Type = Perl
                /TGBegin "Section"
                /TGFindStr = "(?-i)^*[ \t\r\n](EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)\>"
                /TGEnd
                /TGBegin "Keyword"
                /TGFindStr = "^*[ \t\r\n](\u+)\>(?<!EDIT)(?<!END)(?<!GRID)(?<!OPTIMIZE)(?<!PROPS)(?<!REGIONS)(?<!RUNSPEC)(?<!SCHEDULE)(?<!SOLUTION)(?<!SUMMARY)"
                /TGEnd

                I need to allow [ \t\r\n] or nothing at the beginning of the line in my .DATA files.

                Any ideas?

                By the way: bulgrien and Mofi, thanks.

                6,675585
                Grand Master

                  May 29, 2010 #8

                  Thanks, bulgrien, for explaining why my expression did not work. Good to know for the future. pepemosca, you could use:

                  Code:

                  /Regexp Type = Perl
                  /TGBegin "Section"
                  /TGFindStr = "(?-i)^[ \t]*(EDIT|END|GRID|OPTIMIZE|PROPS|REGIONS|RUNSPEC|SCHEDULE|SOLUTION|SUMMARY)\>"
                  /TGEnd
                  /TGBegin "Keyword"
                  /TGFindStr = "^[ \t]*(\u+)\>(?<!\<EDIT)(?<!\<END)(?<!\<GRID)(?<!\<OPTIMIZE)(?<!\<PROPS)(?<!\<REGIONS)(?<!\<RUNSPEC)(?<!\<SCHEDULE)(?<!\<SOLUTION)(?<!\<SUMMARY)"
                  /TGEnd

                  [ \t]* means that zero or more spaces or tabs can occur between the start of the line and the first word in uppercase letters.

                  \r\n should not be inside the square brackets, because ^ means start of a line. And where does a line start? The start of a line is the first character AFTER \r (Mac), \n (Unix) or \r\n (DOS). So \r or \n can never be the start of a line; they always define the end of a line.

                  \> is very important because it defines where the search ends. Without \>, something like ENDBOX would be matched by the section regular expression, and for a line starting with END the keyword regular expression would backtrack and return the truncated EN as a valid result.

                  Additionally, while creating some test examples, I found out that for the keyword regular expression it is necessary to insert \< before every word in the lookbehind expressions. Otherwise the expression would not find something like FEND, because this word ends with the lookbehind word END. The lookbehind is checked against the text that ends exactly where the preceding expression stopped, i.e. at the end of the found word. Therefore only \< is required, not \>, to avoid wrong exclusions.
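The FEND pitfall can be reproduced with a small Python sketch. The sample lines and the two-word exclusion list are hypothetical, `[A-Z]+` stands in for `\u+`, and `\b` stands in for both `\<` and `\>`, since Python's `re` has none of those three. Without the word-start anchor inside the lookbehind, FEND is dropped because it ends in END:

```python
import re

# Naive exclusions: any word that merely ENDS in END or GRID is dropped.
naive = re.compile(r"^[ \t]*([A-Z]+)\b(?<!END)(?<!GRID)", re.MULTILINE)

# \b at the start of each lookbehind word plays the role of \< in UltraEdit:
# only a whole word END or GRID is excluded.
anchored = re.compile(r"^[ \t]*([A-Z]+)\b(?<!\bEND)(?<!\bGRID)", re.MULTILINE)

text = "FEND\nEND\n  GRID\nENDBOX\n"
print(naive.findall(text))     # ['ENDBOX']  (FEND is wrongly excluded)
print(anchored.findall(text))  # ['FEND', 'ENDBOX']
```

In the second pattern, the `\b` before END fails inside FEND because F and E are both word characters, so the lookbehind does not match and FEND is kept.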

                  27
                  Basic User

                    May 30, 2010 #9

                    Mofi, OK, I understand everything except...

                    In Keyword, why not to add \> to the end of the words that I want to exclude?
                    Like:

                    Code:

                    /TGBegin "Keyword"
                    /TGFindStr = "^[ \t]*(\u+)\>(?<!\<EDIT\>)(?<!\<END\>)(?<!\<GRID\>)(?<!\<OPTIMIZE\>)(?<!\<PROPS\>)(?<!\<REGIONS\>)(?<!\<RUNSPEC\>)(?<!\<SCHEDULE\>)(?<!\<SOLUTION\>)(?<!\<SUMMARY\>)"
                    /TGEnd

                    6,675585
                    Grand Master

                      May 30, 2010 #10

                      pepemosca wrote: In Keyword, why not to add \> to the end of the words that I want to exclude?
                      You can do that, but it is not necessary. It makes no difference whether \> is used or not. The only difference is that the search string is longer, so the search might be a little bit slower.

                      I don't have deep insight into the Perl engine, but I think lookbehinds are applied from right to left. Let us simplify the expression to

                      /TGFindStr = "^[ \t]*(\u+)\>(?<!\<END\>)"

                      and think first about how it works on the word ENDBOX. The rightmost character of the found string is X. This character is compared with the negative lookbehind expression string, which is simply the string END. X from ENDBOX is not equal to D from END, so the found string is certainly okay. No further test is necessary.

                      Now let us think about the found string FEND, whose rightmost character is D. The rightmost character of the negative lookbehind string is also D. So there is a match, and therefore the next character to the left must be analyzed. This is N in both strings, again a match; continue with the next character to the left, once again a match for E. If the lookbehind expression were just (?<!END), the lookbehind would find END here and FEND would be ignored. But the negative lookbehind expression is (?<!\<END). Therefore the Perl regex engine now has to check whether this E is the first character of the found string (which is not the case) or whether the character to its left is a non-word character (which is also not the case, because F is a word character). So \<END is not found at the end of FEND, and FEND is not excluded.

                      The search expression (\u+)\> finds only entire words, and the negative lookbehinds are always applied from right to left on the found strings, i.e. from the end of the words. The first character of the found string that does not match the character at the same position in the negative lookbehind string stops further evaluation. Therefore \> is not necessary in the lookbehind strings: at that position it would always be true for all strings found here.
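One way to check the claim that `\>` inside the lookbehind changes nothing is to build the keyword expression both ways and compare the results. This is a Python sketch with a made-up word list and sample text, again using `\b` for `\<` and `\>`; it only shows the equivalence on this input and in Python's `re`, not for every regex engine:

```python
import re

WORDS = ["GRID", "END", "RUN", "STOP"]  # hypothetical exclusion list

def keyword_re(tail: str) -> re.Pattern:
    """Chained negative lookbehinds; tail is appended after each excluded word."""
    lookbehinds = "".join(rf"(?<!\b{w}{tail})" for w in WORDS)
    return re.compile(r"^[ \t]*([A-Z]+)\b" + lookbehinds, re.MULTILINE)

text = "FEND\nEND\nENDBOX\nRUNSPEC\n  GRID\nPRERUN\nSTOP\n"
without_end = keyword_re("").findall(text)    # (?<!\bEND) style
with_end = keyword_re(r"\b").findall(text)    # (?<!\bEND\b) style
print(without_end)              # ['FEND', 'ENDBOX', 'RUNSPEC', 'PRERUN']
print(without_end == with_end)  # True
```

The trailing `\b` is redundant because the main expression has already required a word end at exactly the position where every lookbehind stops.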

                      27
                      Basic User

                        May 30, 2010 #11

                        Mofi, OK. Now I get your point.
                        Thanks for your explanation.

                          May 23, 2011 #12

                          OK, now a new challenge :)

                          It's kind of the same... But to be honest: I can't make it work!

                          Here is my text:

                          Code:

                          *RUN
                          *DATE 1990 5 1
                          
                          *GROUP 'G' *ATTACHTO 'FIELD'
                          
                          *DATE 1990 5 3
                          
                          *WELL 'OP' *ATTACHTO 'G'
                          *BHPDEPTH 'OP' 2600.0

                          I want to make a list of the DATE entries.

                          I want to have something like this in the Function List:

                          Code:

                          DATE
                            1990 5 1
                            1990 5 3

                          Ideas? Thanks!

                          6,675585
                          Grand Master

                            May 24, 2011 #13

                            Well, that is a very simple task.

                            Using the Perl regular expression engine:

                            /Regexp Type = Perl
                            /TGBegin "DATE"
                            /TGFindStr = "^[ \t]*\*DATE[ \t]+(\d+ +\d+ +\d+)"
                            /TGEnd


                            Using the UltraEdit regular expression engine:

                            /TGBegin "DATE"
                            /TGFindStr = "%[ ^t]++^*DATE[ ^t]+^([0-9]+ +[0-9]+ +[0-9]+^)"
                            /TGEnd


                            Whether [ \t]* (or [ ^t]++ respectively) is necessary or should be removed depends on whether whitespace between the start of the line and *DATE is allowed and may occur.
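Applied to the sample text from the question, the Perl variant can be checked with a Python sketch. Python's `re` stands in for UltraEdit here; `\d`, the escaped `\*` and the capturing group should behave the same way in both syntaxes:

```python
import re

# Sample text from the question above.
text = """*RUN
*DATE 1990 5 1

*GROUP 'G' *ATTACHTO 'FIELD'

*DATE 1990 5 3

*WELL 'OP' *ATTACHTO 'G'
*BHPDEPTH 'OP' 2600.0
"""

# \* is a literal asterisk, [ \t]* allows leading whitespace, and findall
# returns only the part inside the round brackets.
date_re = re.compile(r"^[ \t]*\*DATE[ \t]+(\d+ +\d+ +\d+)", re.MULTILINE)
print(date_re.findall(text))  # ['1990 5 1', '1990 5 3']
```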

                            27
                            Basic User

                              May 24, 2011 #14

                              But this works better ;)

                              Code:

                              /TGBegin "Date"
                              /TGFindStr = "(?-i)^[ \t]*\*DATE[ \t]+(\d+[ \t]+\d+[ \t]+\d+)"
                              /TGEnd

                              As usual, it works perfectly.

                              But... to understand more:
                              How do I say "show me this part, but not the rest"?

                              Thanks Mofi!
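In these function strings, the round brackets tag the part that is shown in the function list, as explained for the very first expressions in this thread; everything outside them is matched but not displayed. In Python terms (an analogy, not UltraEdit's API), that is the difference between group 0 and group 1:

```python
import re

# Pattern from the post above, minus the leading (?-i): Python's re is
# case-sensitive unless re.IGNORECASE is passed.
date_re = re.compile(r"^[ \t]*\*DATE[ \t]+(\d+[ \t]+\d+[ \t]+\d+)")

m = date_re.search("*DATE 1990 5 3")
print(m.group(0))  # *DATE 1990 5 3   (everything the expression matched)
print(m.group(1))  # 1990 5 3         (only the part inside the brackets)
```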

                                May 26, 2011 #15

                                Mofi, I want to make a slight modification to this code:

                                Code:

                                /TGBegin "Keyword"
                                /TGFindStr = "(?-i)^[ \t]*(\u+)\>(?<!\<TITLE1)(?<!\<GRID)(?<!\<MODEL)(?<!\<ROCKFLUID)(?<!\<INITIAL)(?<!\<NUMERICAL)(?<!\<RUN)(?<!\<STOP)"
                                /TGEnd

                                I want this code to find *KEYWORD and KEYWORD but not **KEYWORD.

                                I ask because I could try this code:

                                Code:

                                /TGBegin "Keyword"
                                /TGFindStr = "(?-i)^[ \t\*]*(\u+)\>(?<!\<TITLE1)(?<!\<GRID)(?<!\<MODEL)(?<!\<ROCKFLUID)(?<!\<INITIAL)(?<!\<NUMERICAL)(?<!\<RUN)(?<!\<STOP)"
                                /TGEnd

                                But it matches one or more *.

                                I found that Perl regular expressions have a quantifier for the number of occurrences ({}), but I cannot make it work.

                                Ideas? Thanks!
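The `{}` quantifier takes a minimum and a maximum count, so `\*{0,1}` means "zero or one asterisk", the same as the shorter `\*?`. A Python sketch with made-up lines and `[A-Z]+` in place of `\u+`; it is an assumption that UltraEdit's Perl mode treats these standard quantifiers the same way:

```python
import re

text = "*RUN\nGRID\n**COMMENT\n  *WELL\n"

# \*{0,1} and \*? both allow zero or one asterisk. On **COMMENT, after at most
# one '*' the next character must be a letter for [A-Z]+, so the line is skipped.
braces = re.compile(r"^[ \t]*\*{0,1}([A-Z]+)\b", re.MULTILINE)
question = re.compile(r"^[ \t]*\*?([A-Z]+)\b", re.MULTILINE)

print(braces.findall(text))    # ['RUN', 'GRID', 'WELL']
print(question.findall(text))  # ['RUN', 'GRID', 'WELL']
```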
