User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Find, replace, find in files, replace in files, regular expressions
6 posts Page 1 of 1
Hi everyone,

I have the following task:

Input
<p class="indented">»Er ist ein <span class="bold">Freund von Stone. Es geht um eine <span class="gray font3">Zimmerrenovierung, <span class="bold">kann also <span class="italic">spät werden, weil meine anderen Termine jetzt</span></span> alle</span> nach hinten <span class="italic">rutschen</span>.«</span></p>
<p class="indented"><span class="italic">Freund von Stone. Es geht um eine Zimmerrenovierung, <span class="bold">kann <span class="gray font4">also <span class="italic">spät werden, weil meine anderen Termine jetzt</span></span> alle</span> nach hinten <span class="italic">rutschen</span>.«</span></p>

Output
<p class="indented">»Er ist ein <b>Freund von Stone. Es geht um eine <span class="gray font3">Zimmerrenovierung, <b>kann also <i>spät werden, weil meine anderen Termine jetzt</i></b> alle</span> nach hinten <i>rutschen</i>.«</b></p>
<p class="indented"><i>Freund von Stone. Es geht um eine Zimmerrenovierung, <b>kann <span class="gray font4">also <i>spät werden, weil meine anderen Termine jetzt</i></span> alle</b> nach hinten <i>rutschen</i>.«</i></p>

How can I replace the tag <span class="bold"> with <b> and <span class="italic"> with <i> (together with their matching closing </span> tags), as in the Adobe example above?
Hi Samir,

try this Perl regex:
(?s)(<(span) class="(?(3)[^"]++|(?|(b)old|(i)talic))">((?>(?:(?!<\2\b)(?!</\2\b).)++|(?1))*+)</\2>)

and replace:
<\3>\4\</\3>

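Unfortunately, you have to repeat Replace All until nothing more is replaced, because of the nested blocks: each match consumes its whole outer block, so bold/italic spans nested inside it are only converted on the next pass.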

BR, Fleggy
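For checking the expected output outside UltraEdit, here is a minimal Python sketch. It is not Fleggy's regex: Python's built-in re module supports neither recursion such as (?1) nor branch reset groups (?|...), so the sketch swaps in a stack-based scan over the <span> tags instead. Because the stack remembers which closing tag belongs to each open span, a single pass is enough. The names convert_spans, CLASS_TO_TAG, SPAN_TAG and CLASS_ATTR are hypothetical, and the sketch assumes properly nested span tags.

```python
import re

# Hypothetical mapping: which class value becomes which short tag.
CLASS_TO_TAG = {"bold": "b", "italic": "i"}

# An opening <span ...> tag (group 1 = its attributes) or a closing </span> (group 1 = None).
SPAN_TAG = re.compile(r'<span\b([^>]*)>|</span>')
# Only a lone class="..." attribute counts, like <span class="bold"> in the regex.
CLASS_ATTR = re.compile(r'^\s*class="([^"]*)"\s*$')


def convert_spans(html: str) -> str:
    """Turn <span class="bold"> into <b> and <span class="italic"> into <i>,
    rewriting each matching </span> as well. Other spans are kept as-is.
    Assumes well-formed, properly nested span tags."""
    out = []
    stack = []  # the closing tag to emit for each currently open span
    pos = 0
    for m in SPAN_TAG.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        if m.group(1) is None:
            # Closing </span>: emit the closing tag chosen by its opener.
            out.append(stack.pop())
            continue
        cls = CLASS_ATTR.match(m.group(1))
        tag = CLASS_TO_TAG.get(cls.group(1)) if cls else None
        if tag:
            out.append(f'<{tag}>')
            stack.append(f'</{tag}>')
        else:
            out.append(m.group(0))  # e.g. <span class="gray font3"> stays
            stack.append('</span>')
    out.append(html[pos:])
    return ''.join(out)


if __name__ == "__main__":
    print(convert_spans('<span class="bold">kann <span class="gray font4">also</span></span>'))
    # prints <b>kann <span class="gray font4">also</span></b>
```

Applied to the two input lines from the first post, this sketch should produce the two output lines shown there.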
Thanks, Fleggy.
This pattern worked very well.
I don't fully understand it, though. Could you please explain it in detail?
Hi Samir

I hope this helps. I am not a very good teacher :)

(?s)                          -- the dot also matches CR/LF
(                             -- the 1st group begins; it is the target of the recursion (?1)
  <(span) class="             -- match <span class=" and capture span in the 2nd group
  (?(3)                       -- test whether group 3 has already been captured
    [^"]++                    -- YES: we are inside the recursion, so accept any class value up to the closing "
    |                         -- NO: we are at the top level, so only bold or italic can match (the outermost tag must be bold or italic)
    (?|                       -- the branch reset group begins
      (b)old|(i)talic         -- match bold or italic and capture the first letter in the 3rd group
    )                         -- end of the branch reset group
  )                           -- end of the test
  ">                          -- match the rest of the tag
  (                           -- the 4th group begins. It matches the inner part between <span ...> and </span>
                              -- recursion is used to find the correct closing tag
                              -- the inner part is any mix of text without span tags and nested <span ...>...</span> blocks, and may be empty
    (?>                       -- atomic group for better performance
      (?:                     -- non-capturing group for better performance
        (?!<\2\b)(?!</\2\b).  -- match any character as long as the text here does not start with <span or </span
      )++                     -- repeated possessively (one or more characters)
      |                       -- OR
      (?1)                    -- at <span, match the whole nested block recursively; at </span this fails too, so the loop ends
    )*+                       -- this part can repeat 0 or more times possessively
  )                           -- end of the 4th group
  </\2>                       -- match the closing tag
)                             -- end of the 1st group
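What group 4 and (?1) achieve together is finding the </span> that really closes the opening tag, skipping over nested <span>...</span> blocks. The same idea can be sketched without recursion by counting the nesting depth. The Python helper below (find_matching_close, a hypothetical name, not part of UltraEdit) is only an illustration of that idea and assumes properly nested span tags.

```python
import re

# An opening <span ...> tag (group 1 == "") or a closing </span> tag (group 1 == "/").
SPAN_TAG = re.compile(r'<(/?)span\b[^>]*>')


def find_matching_close(html: str, start: int) -> int:
    """Return the index of the </span> that closes the <span ...> tag at
    position `start`, or -1 if it is never closed. Nested span blocks are
    skipped by counting depth, which is what the recursion (?1) does."""
    first = SPAN_TAG.match(html, start)
    if first is None or first.group(1):
        raise ValueError("no opening <span> tag at position %d" % start)
    depth = 0
    for m in SPAN_TAG.finditer(html, start):
        depth += -1 if m.group(1) else 1  # closing tag: -1, opening tag: +1
        if depth == 0:
            return m.start()
    return -1


sample = ('<span class="bold">Freund <span class="gray font3">Zimmer '
          '<span class="italic">spät</span></span> alle</span> nach hinten')
end = find_matching_close(sample, 0)
# The text between the first '>' and the matching </span> is what group 4 captures.
print(sample[sample.index('>') + 1:end])
```

A lazy pattern such as <span class="bold">(.*?)</span> would stop at the first </span> instead, which is why the regex needs the recursion.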
Hi Samir,

this modified regex is better:

(?s)(<(span)\b(?(3)[^>]*+|(?: class="(?|(b)old|(i)talic))")>((?>(?:(?!<\2\b)(?!</\2\b).)++|(?1))*+)</\2>)

because it now also works when a <span> tag nested inside the <span class="bold"> or <span class="italic"> block has attributes other than class="..." (or none at all).
For example, the previous one fails on this text:

<span class="bold">First part<span class="italic">nested<span style="color:blue"> part</span></span>final part</span>

BR, Fleggy
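The difference lies in how nested opening tags are matched. With the first regex, the recursion requires ` class="` right after `<span`, so when it meets <span style="color:blue"> neither alternative can consume it and the whole match fails; the modified regex accepts `<span\b[^>]*+>`, i.e. any attributes. A quick Python comparison of the two opening-tag patterns on the sample above (Python's re only accepts possessive quantifiers from version 3.11, so the sketch uses plain [^>]*, which matches the same tags here):

```python
import re

# Sample text from the post above.
sample = ('<span class="bold">First part<span class="italic">nested'
          '<span style="color:blue"> part</span></span>final part</span>')

# Nested tags as the first regex accepted them: only <span class="...">.
class_only = re.compile(r'<span class="[^"]*">')
# Nested tags as the modified regex accepts them: <span with any attributes.
any_attrs = re.compile(r'<span\b[^>]*>')

print(class_only.findall(sample))
# ['<span class="bold">', '<span class="italic">']
print(any_attrs.findall(sample))
# ['<span class="bold">', '<span class="italic">', '<span style="color:blue">']
```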
And here is a version that does not need a conditional. I think this regex is easier to understand: at the top level it begins only with <span class="bold"> or <span class="italic">, while the inner recursion is entered after any form of the <span> tag.

(?s)<(span) class="(?|(b)old|(i)talic)">(((?>(?:(?!<\1\b)(?!</\1\b).)++|<\1\b[^>]*+>(?3))*+)</\1>)

and replace with:
<\2>\4\</\2>

BR, Fleggy


PeM
You can simply modify the beginning part

<(span) class="(?|(b)old|(i)talic)">

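to match any other tag, and keep the rest of the regex the same, as long as the beginning part captures the tag name in group 1 and contains exactly two capturing groups (the branch reset group counts as one), because the rest refers back to \1 and recurses into group 3. Otherwise you have to renumber the recursion and the replacement references accordingly.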
For example:

<(div) style="padding:(\d+)px">(((?>(?:(?!<\1\b)(?!</\1\b).)++|<\1\b[^>]*+>(?3))*+)</\1>)

or, without the second capturing group inside the tag (so (?3) becomes (?2) and the inner text is group 3):

<(div) style="padding:\d+px">(((?>(?:(?!<\1\b)(?!</\1\b).)++|<\1\b[^>]*+>(?2))*+)</\1>)
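The renumbering rule can be made explicit with a small string builder: the recursion target is one more than the number of capturing groups in the beginning part, and the inner text (used in the replacement) is two more. The helper build_block_regex below is hypothetical; it only assembles the pattern text to paste into UltraEdit and does not run it, since Python's re has no recursion. It assumes the tag name is captured in group 1.

```python
from typing import Tuple


def build_block_regex(opening: str, groups_in_opening: int) -> Tuple[str, int]:
    """Append the balanced-content part from the posts above to `opening`.

    Assumes `opening` captures the tag name in group 1 (the rest of the
    pattern refers back to group 1) and that `groups_in_opening` is the
    number of group numbers it uses; a branch reset group such as
    (?|(b)old|(i)talic) counts as one. Returns the pattern and the number
    of the group holding the inner text, for use in the replacement.
    """
    block = groups_in_opening + 1  # content + closing tag; target of (?N)
    inner = groups_in_opening + 2  # content only
    pattern = (opening
               + r'(((?>(?:(?!<\1\b)(?!</\1\b).)++|<\1\b[^>]*+>(?'
               + str(block)
               + r'))*+)</\1>)')
    return pattern, inner


pattern, inner = build_block_regex(r'<(span) class="(?|(b)old|(i)talic)">', 2)
print('(?s)' + pattern)  # the conditional-free regex, recursing with (?3)
print(inner)             # 4, as in the replacement <\2>\4\</\2>
```

Passing the div variants with 2 and 1 groups yields the two div patterns above, with (?3) and (?2) respectively.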