
Why can't UE find text between HTML tags?



    Aug 25, 2006 #1

    Hello

    I've read the online help and archives here, but I don't understand why UE can't run the following regex on the following text (I want to remove all text between the two tags):

    </STRUCTURE>
    BLABLA
    BLABLA
    <GROUP>

    Find = </STRUCTURE>.+?<GROUP>
    Replace =

    I've tried the Unix style, and the PCRE style, with no difference. It doesn't seem like <, /, and > are forbidden characters. Any idea?

    Thank you.


      Aug 26, 2006 #2

      Hi,

      If it were all on ONE line, this would work:
      replace </STRUCTURE>.*<GROUP>
      with </STRUCTURE><GROUP>

      Perl regexp, UE 12.10a

      rds Bego
      Normally using all newest english version incl. each hotfix. Win 10 64 bit


        Aug 26, 2006 #3

        The reason it does not work across multiple lines is explained in UE's help, in the regular expressions article:

        + ... Matches the preceding character one or more times. Does not match repeated newlines.

        . ... Matches any single character except a newline character. Does not match repeated newlines.

        The following Unix style regular expression also works across multiple lines. But be very careful when using it: the [^<]* expression does not stop at line termination characters, so it can lead to unexpected results and sometimes selects MUCH more than you expect.

        Find: </STRUCTURE>[^<]*<GROUP>
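A minimal sketch of the difference between the two patterns, written in Python because UE itself cannot be scripted here. It assumes Python's re module treats "." and negated classes the same way as UE's engines for this case, and the replacement string that keeps both tags is also an assumption (the post above only gives the Find expression).

```python
import re

text = "</STRUCTURE>\nBLABLA\nBLABLA\n<GROUP>\n"

# By default "." does not match a newline, so the lazy pattern finds nothing
# when the two tags sit on different lines.
dot_match = re.search(r"</STRUCTURE>.+?<GROUP>", text)

# A negated class such as [^<] matches any character except "<",
# newlines included, so it spans the line breaks.
neg_pattern = re.compile(r"</STRUCTURE>[^<]*<GROUP>")
cleaned = neg_pattern.sub("</STRUCTURE><GROUP>", text)

# The same class stops at the first "<", so a block that contains
# another tag between the two markers is not matched at all.
with_inner_tag = "</STRUCTURE>\n<B>BLABLA</B>\n<GROUP>\n"
unchanged = neg_pattern.sub("</STRUCTURE><GROUP>", with_inner_tag)

print(dot_match)                    # no match object
print(repr(cleaned))                # text between the tags is gone
print(unchanged == with_inner_tag)  # inner tag blocked the match
```

In this sketch the negated class happily runs over any number of lines as long as it meets no "<", which is why it deserves the caution given above.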

        The best approach would be to use a macro with the commands Find, Find Select and Delete in a loop. There are plenty of macros in the Macro forum that show this method of block deletion.
        Best regards from an UC/UE/UES for Windows user from Austria


          Jul 28, 2010 #4

          Thanks Mofi for the explanations.


            Aug 06, 2010 #5


            Find: (?s)</STRUCTURE>.+?<GROUP>
            Replace: </STRUCTURE><GROUP>
            
            or 
            
            Find: (?si)(</structure>)[^<].*?(<group>)
            Replace: $1$2

            The above Perl style regexps work across multiple lines.

            The inline modifier (?s) makes the search treat the text as one single string, so that . matches any character, including a newline.
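As a rough illustration of the (?s) modifier, here is a small Python sketch. Python's re module is used only as a stand-in for UE's Perl engine (an assumption), and the sample strings are made up for the demonstration. It also shows why the lazy .+? matters once "." can cross line breaks.

```python
import re

text = "</STRUCTURE>\nBLABLA\nBLABLA\n<GROUP>\nkeep me\n"

# (?s) at the start of the pattern switches on dot-all mode, so "."
# also matches the newline characters between the two tags.
inline = re.sub(r"(?s)</STRUCTURE>.+?<GROUP>", "</STRUCTURE><GROUP>", text)

# Python's re.DOTALL flag is the out-of-pattern spelling of the same option.
flagged = re.sub(r"</STRUCTURE>.+?<GROUP>", "</STRUCTURE><GROUP>", text,
                 flags=re.DOTALL)

# With two blocks, the lazy .+? stops at the nearest <GROUP>,
# while a greedy .+ runs on to the last one and swallows "middle".
two_blocks = "</STRUCTURE>\nA\n<GROUP>\nmiddle\n</STRUCTURE>\nB\n<GROUP>\n"
lazy = re.sub(r"(?s)</STRUCTURE>.+?<GROUP>", "</STRUCTURE><GROUP>", two_blocks)
greedy = re.sub(r"(?s)</STRUCTURE>.+<GROUP>", "</STRUCTURE><GROUP>", two_blocks)

print(repr(inline))
print(repr(lazy))
print(repr(greedy))
```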

            The second example adds the i modifier (ignore case), and the parentheses around </structure> and <group> store the matched text in the variables $1 and $2, which are used in the replace. The other addition is the char class [^<], i.e. one character that is not "<". It prevents the regexp from matching across already emptied pairs such as </STRUCTURE><GROUP></STRUCTURE><GROUP> once all the replaces have been made.
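The effect of the [^<] guard shows up when the replace is run a second time. The sketch below again uses Python's re module as a stand-in for UE's Perl engine (an assumption); Python writes the $1$2 replacement as \1\2, and the variable names are only illustrative.

```python
import re

# Second example from the post; "$1$2" becomes r"\1\2" in Python.
guarded = re.compile(r"(?si)(</structure>)[^<].*?(<group>)")
# First example from the post, without the guard.
unguarded = re.compile(r"(?s)</STRUCTURE>.+?<GROUP>")

text = "</STRUCTURE>\nBLABLA\n<GROUP>\n</STRUCTURE>\nBLABLA\n<GROUP>\n"

# First run: both blocks are emptied, the tags themselves are kept.
first_pass = guarded.sub(r"\1\2", text)

# Second run with the guard: the character after </STRUCTURE> is now "<",
# so [^<] fails and nothing changes.
second_pass_guarded = guarded.sub(r"\1\2", first_pass)

# Second run without the guard: .+? needs at least one character, so the
# match runs from the first </STRUCTURE> to the second <GROUP> and a whole
# tag pair disappears.
second_pass_unguarded = unguarded.sub("</STRUCTURE><GROUP>", first_pass)

# The i modifier lets the same pattern match lower-case tags.
lower = guarded.sub(r"\1\2", "</structure>\nblabla\n<group>")

print(repr(first_pass))
print(repr(second_pass_guarded))
print(repr(second_pass_unguarded))
print(repr(lower))
```

One side effect of the guard in this sketch: a block whose first character after </STRUCTURE> is itself a "<" is skipped as well.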