greedy [ ]+ ?

    Dec 27, 2007 #1

    I have a problem like this:

    Suppose I have this text (the part between the dashed lines):

    --------------------------------------------------------------
    1 ENTRY --- one white space between "1" and "ENTRY"
    2 TEST --- one white space between "2" and "TEST"
    3  ENTRY --- two white spaces between "3" and "ENTRY"
    4  TEST --- two white spaces between "4" and "TEST"
    --------------------------------------------------------------

    The Perl-style regular expression

    [0-9][ \t]+[^E]

    matched lines 2, 3 and 4 but not line 1. I expected it to match only lines 2 and 4.

    Why did line 3 get matched?

    For line 3, it seems that UltraEdit's Perl-style engine used [ \t]+ to match one white space and then used [^E] to match the second white space.
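
    The guess above can be checked outside UltraEdit. The sketch below assumes Python's re module behaves like UltraEdit's Perl-style engine for a pattern this simple; the sample lines are typed in from the post with the double spaces restored, and the helper first_matches is a hypothetical name used only for this illustration. It prints the exact text each line matched.

```python
import re

# Sample lines from the question, with the double spaces in lines 3 and 4
# restored (assumption: the forum display collapsed them).
LINES = ["1 ENTRY", "2 TEST", "3  ENTRY", "4  TEST"]

# The original pattern from the question.
ORIGINAL = re.compile(r"[0-9][ \t]+[^E]")


def first_matches(pattern, lines):
    """Return (line_number, matched_text) for each line the pattern matches."""
    hits = []
    for number, line in enumerate(lines, start=1):
        m = pattern.search(line)
        if m:
            hits.append((number, m.group()))
    return hits


for number, text in first_matches(ORIGINAL, LINES):
    # repr() makes the trailing space in line 3's match visible.
    print(number, repr(text))
# prints:
# 2 '2 T'
# 3 '3  '
# 4 '4  T'
```

    On line 3 the match ends with a space: [ \t]+ first grabs both spaces, [^E] fails on the "E", and the engine backtracks so that [ \t]+ keeps only one space and [^E] matches the second one.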

    Does anyone know how to write a Perl-style regular expression that matches only lines 2 and 4 but not lines 1 and 3? That is, it should not match lines where a digit [0-9] is followed by one or more blanks and then "ENTRY".

    Thanks.

      Dec 27, 2007 #2

      I found a solution:

      The following regular expression matches only lines 2 and 4:

      [0-9][ \t]+[^ E]

      Compared to my original regular expression, which matched lines 2, 3 and 4:

      [0-9][ \t]+[^E]

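      I added a space to the negated character class, turning [^E] into [^ E]. The character after the blanks can now be neither a space nor an E, so on line 3 the second space can no longer satisfy it.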
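
      A quick way to compare the two patterns side by side is sketched below, again assuming Python's re module as a stand-in for UltraEdit's Perl-style engine. The sample lines have their double spaces restored, and matched_line_numbers is a hypothetical helper that returns the 1-based numbers of the lines each pattern matches.

```python
import re

# Sample lines from the question (double spaces in lines 3 and 4 restored).
LINES = ["1 ENTRY", "2 TEST", "3  ENTRY", "4  TEST"]

ORIGINAL = re.compile(r"[0-9][ \t]+[^E]")   # pattern from the question
FIXED = re.compile(r"[0-9][ \t]+[^ E]")     # space added to the negated class


def matched_line_numbers(pattern, lines):
    """Return the 1-based numbers of the lines the pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]


print(matched_line_numbers(ORIGINAL, LINES))  # [2, 3, 4]
print(matched_line_numbers(FIXED, LINES))     # [2, 4]
```

      With [^ E], backtracking still happens on line 3, but giving back a space no longer helps, because the freed space is rejected by the class just like the "E".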

        Dec 28, 2007 #3

        You're right: the original regex matches line 3 because [^E] matches the second space character. Greedy does not mean that a quantified part of a regex won't give up part of its match if the overall regex needs that in order to succeed.

        A more robust way to circumvent this would be to use

        [0-9](?>[ \t]+)[^E]
        as your regex. Here the space/tab part is enclosed in a so-called atomic group, which the engine will not backtrack into once it has matched successfully. In other words, the group "uses up" all the spaces/tabs and will not give any of them back if the following character turns out to be an E.
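
        To see why the atomic group is more robust, the sketch below runs both approaches in Python, which is an assumption standing in for UltraEdit's engine. Python's re module only accepts (?>...) from version 3.11, so the sketch uses a swapped-in, widely used emulation, (?=([ \t]+))\1: the lookahead captures the longest run of blanks, and the backreference then consumes exactly that text, so the run cannot be shortened on backtracking. The fifth sample line, with a space followed by a tab, is hypothetical and not from the original post; matched_line_numbers is likewise a hypothetical helper.

```python
import re
import sys

# Sample lines from the thread, plus a hypothetical fifth line (not in the
# original post) with a space followed by a tab before "ENTRY".
LINES = ["1 ENTRY", "2 TEST", "3  ENTRY", "4  TEST", "5 \tENTRY"]

# The space-in-class fix from post #2, for comparison.
CLASS_FIX = re.compile(r"[0-9][ \t]+[^ E]")

# Atomic-group emulation: the lookahead captures the longest run of blanks in
# group 1 and \1 consumes exactly that text. Python does not re-enter a
# lookahead on backtracking, so the run of blanks cannot be shortened.
EMULATED_ATOMIC = re.compile(r"[0-9](?=([ \t]+))\1[^E]")


def matched_line_numbers(pattern, lines):
    """Return the 1-based numbers of the lines the pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]


# Line 5 slips through: [ \t]+ gives back the tab, and [^ E] accepts a tab.
print(matched_line_numbers(CLASS_FIX, LINES))        # [2, 4, 5]
print(matched_line_numbers(EMULATED_ATOMIC, LINES))  # [2, 4]

# Python 3.11+ also accepts the atomic group syntax from this post directly.
if sys.version_info >= (3, 11):
    ATOMIC = re.compile(r"[0-9](?>[ \t]+)[^E]")
    print(matched_line_numbers(ATOMIC, LINES))       # [2, 4]
```

        The class-based fix only rules out a space after the backtracked run, so a mixed space/tab run can still fool it; the atomic version never hands back any blank, so the only question left is whether the character after the whole run is an E.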