Search for lines containing a string and for lines not containing a string

    Oct 08, 2009#1

    Hello,

    I have the following text:
    COLOUR asd colour asd Colour
    asdf asdf asdf asdf asdf asdf
    Colour asdf asdf asdf
    colour asdf asdf

    ^.*(colour|Colour|COLOUR).* is a Perl regular expression that finds all lines containing the word colour.

    How do I replace it with something else? Sorry if this is obvious - I have been trying.

    Also, how do I adapt ^.*(colour|Colour|COLOUR).* to search for all lines that do not contain the word Colour?

    Cheers Joe.

      Oct 08, 2009#2

      OK, a few suggestions:

      - Read the readme topic of this forum. It really helps to get a grip on the rather extensive search/replace capabilities of UE.
      - If you want to learn more about regexes, also look at http://www.regular-expressions.info - great tutorial.

      About your questions:

      First, (colour|Colour|COLOUR) is valid but unnecessary: turning off the "Case sensitive" option does the same job, and the alternation misses other capitalisations (what about colouR?). I'd wager that colou?rs? makes a lot more sense, since it matches both the AmE and BrE spellings (and the plural forms, too).

      Then the question is whether you want to replace the ENTIRE line with something else (well then just type the replacement text into the replace box) or just the word "colour". If the latter, then your regex fails on lines that contain the word colour twice (can you see why?). Much better just to search for colou?rs? only and have that replaced with something else.
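
      To illustrate the difference outside UE, here is a minimal sketch in Python (my choice for the example, not something from the question; its re module is close to Perl syntax for these patterns). The replacement text "hue" is just a placeholder. It replaces the word only, and also counts how often each pattern matches on the first sample line:

```python
import re

text = """COLOUR asd colour asd Colour
asdf asdf asdf asdf asdf asdf
Colour asdf asdf asdf
colour asdf asdf"""

# Search for the word itself, ignoring case, instead of the whole line.
word = re.compile(r"colou?rs?", re.IGNORECASE)
replaced = word.sub("hue", text)  # "hue" is a placeholder replacement
print(replaced)

# The whole-line pattern swallows the complete line in a single match,
# so a line with three occurrences gives one match, not three.
whole_line = re.compile(r"^.*(colour|Colour|COLOUR).*", re.MULTILINE)
first_line = text.splitlines()[0]
whole_line_hits = whole_line.findall(first_line)
word_hits = word.findall(first_line)
print(len(whole_line_hits), len(word_hits))
```

      Because .* on both sides eats the rest of the line, the whole-line pattern can only ever hit once per line, which is why searching for the bare word is the better basis for a word-level replace.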

      You might also want to read up on "backreferences" - ways to remember parts of your match and reuse them later.
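
      As a rough idea of what a backreference does, here is a small Python sketch that turns the BrE spelling into the AmE one while keeping whatever capitalisation was matched. The sample text is made up, and \1 and \2 are Python's replacement notation; the syntax in UE's replace box may differ, so check the help before copying it over.

```python
import re

text = "Colour asdf COLOURS asdf colour"

# Group 1 remembers "colo", group 2 remembers "r" or "rs";
# the "u" between them is matched but not put back.
british = re.compile(r"(colo)u(rs?)", re.IGNORECASE)
american = british.sub(r"\1\2", text)
print(american)
```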

      Lastly, to match a line that does NOT contain the word colour, you need the following Perl regex (which is a bit more difficult):

      ^(?:(?!colou?rs?).)*$\r\n

      or possibly

      ^(?:(?!\bcolou?rs?\b).)*$\r\n

      if you do want to allow lines that contain words like "watercolor" or "colorize", just not "colour" all by itself.
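
      A quick way to compare the two variants is the Python sketch below. It assumes the text uses plain \n line ends, so the trailing \r\n from the patterns above is left out and multi-line mode is used instead; the sample lines are invented for the test.

```python
import re

text = "\n".join([
    "COLOUR asd colour asd Colour",
    "asdf asdf asdf asdf asdf asdf",
    "give me those watercolors",
    "colour asdf asdf",
])

flags = re.MULTILINE | re.IGNORECASE

# Variant 1: reject the letters anywhere, even inside longer words.
no_colour = re.compile(r"^(?:(?!colou?rs?).)*$", flags)

# Variant 2: reject only the standalone word.
no_colour_word = re.compile(r"^(?:(?!\bcolou?rs?\b).)*$", flags)

lines_without = no_colour.findall(text)
lines_without_word = no_colour_word.findall(text)
print(lines_without)
print(lines_without_word)
```

      Only the second variant lets the "watercolors" line through, since there is no word boundary in front of "colors" there.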

      The possibilities are nearly endless, but you need to learn a few basics in order to enjoy them. Tools like RegexBuddy or RegexMagic can be very helpful here, too.


      Little explanation:

      ^ and $ anchor the search at the start and end of the line; \r\n makes sure that the line break characters (CR LF) are also matched.

      (?: starts a non-capturing group - we need the group because we want to repeat it later, but we don't need to remember the character it matches.

      (?!XXX) is a "negative lookahead assertion": it matches (without using up any characters of the text) if it is impossible to match the characters XXX starting at the current position.

      . matches any character except linebreaks.

      * repeats the preceding (non-capturing) group zero or more times, so lines of any length (including empty ones) can match.

      Altogether, the regex steps through the line character by character, making sure that it's impossible from any starting position within the line to match the string colou?rs?.
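
      The same check can be spelled out by hand, which may make the idea easier to see. This is only a sketch in Python; line_is_free_of is a made-up helper name, and the sample lines are invented. It tries the forbidden pattern at every position of a line and compares the answer with what the lookahead regex says:

```python
import re

needle = re.compile(r"colou?rs?", re.IGNORECASE)
lookahead = re.compile(r"^(?:(?!colou?rs?).)*$", re.IGNORECASE)


def line_is_free_of(pattern, line):
    """Return True if pattern cannot match at any position in line."""
    for pos in range(len(line)):
        # pattern.match(line, pos) only tries a match starting exactly at pos.
        if pattern.match(line, pos):
            return False
    return True


samples = [
    "asdf asdf asdf",
    "Colour asdf asdf asdf",
    "give me those watercolors",
    "",
]

# For each line: (line, result of the manual scan, result of the regex).
comparison = [
    (line, line_is_free_of(needle, line), bool(lookahead.match(line)))
    for line in samples
]
for row in comparison:
    print(row)
```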

      \b is another assertion that matches if the regex engine is currently at a word boundary. So \bcolou?rs?\b will match "color" in "the color purple" but not in "give me those watercolors".
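
      The two phrases above can be tried directly; a tiny Python sketch, assuming a case-insensitive search:

```python
import re

word = re.compile(r"\bcolou?rs?\b", re.IGNORECASE)

# A space precedes "color", so there is a word boundary in front of it.
in_phrase = word.search("the color purple")

# Inside "watercolors" the letters "color" follow another letter: no boundary.
in_compound = word.search("give me those watercolors")

print(in_phrase.group() if in_phrase else None)
print(in_compound)
```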

      Cheers,
      Tim