User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

I'm trying to find lines in a file that start without a tag (i.e. lines that have for some reason been broken across two or more lines), using something like "^\w+", but the search should ignore anything inside "<math><LaTeX>...</LaTeX></math>". For example:

<para>The EPUB specification does not enforce or suggest a particular DRM scheme.</para>
<para>An ePub publication is delivered as a single file. This file is an unencrypted zipped archive containing
a set of interrelated resources.</para>
<para>Books with synchronized audio narration are created in EPUB 3 by using media overlay documents to describe SMIL.</para>
<math><LaTeX>\begin{align*}
whatever is written\\
a=ba+g
\end{align*}</LaTeX></math>
<para>Anything goes....</para>
<caption>MHTML – a webpage archive format used to combine resources
in a single document</caption>
<para>Some random stuff.</para>
<math><LaTeX>\begin{equation*}
0=a\ fs
\end{equation*}</LaTeX></math>

The search should find:
"a" from the line: a set of interrelated resources.</para>
"in" from the line: in a single document</caption>
and should not find:
"whatever" from the line: whatever is written\\
"a" from the line: a=ba+g
"0" from the line: 0=a\ fs

Can this be done using lookaheads and lookbehinds?
Hi,

As long as no nested tag exists between <LaTeX>...</LaTeX> this pattern should be OK:

(?s)^[^<](?=[^<]++(?!</LaTeX>))

Or, if a tag other than </math> can follow the </LaTeX> tag:

(?s)^[^<](?=[^<]++(?!</LaTeX></math>))

Or, if you want to select everything and not just the first character:

(?s)^[^<]++(?!</LaTeX></math>)

BR, Fleggy
Thanks fleggy. :mrgreen:
Would you mind explaining how the regex works? :|
Hi,

I'll try to explain the simplest pattern :)

(?s)^[^<]++(?!</LaTeX></math>)

  • (?s)
    '.' also matches CR/LF (not necessary here, because this pattern contains no '.')
  • ^
    match the beginning of a line
  • [^<]++
    match possessively all characters which are not '<' (this includes line breaks, so the match can run across several lines up to the next tag)
    the first + means one or more characters
    the second + makes the previous quantifier possessive (the matched characters are never given back, even if a following token fails)
  • (?!</LaTeX></math>)
    a negative lookahead: the matched text must not be followed by the closing tags </LaTeX></math>
And the variant:

(?s)^[^<](?=[^<]++(?!</LaTeX></math>))

is almost the same. Only the first character is matched; the rest of the text is only checked inside a positive lookahead.

BR, Fleggy
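
For anyone who wants to check the explanation above outside UltraEdit, here is a minimal sketch in Python. It is an illustration under a few assumptions, not the way UltraEdit runs the search: Python's re module needs re.MULTILINE for ^ to match at every line start, and because possessive quantifiers only arrived in Python 3.11, the sketch emulates [^<]++ with a capturing lookahead followed by a backreference. The names SAMPLE, GREEDY, POSSESSIVE and first_lines are made up for this example. The plain greedy pattern is included to show why the possessive part matters.

```python
import re

# Shortened copy of the sample from the first post.
SAMPLE = r"""<para>An ePub publication is delivered as a single file. This file is an unencrypted zipped archive containing
a set of interrelated resources.</para>
<math><LaTeX>\begin{align*}
whatever is written\\
a=ba+g
\end{align*}</LaTeX></math>
<caption>MHTML – a webpage archive format used to combine resources
in a single document</caption>
<math><LaTeX>\begin{equation*}
0=a\ fs
\end{equation*}</LaTeX></math>"""

# Plain greedy quantifier: when the lookahead fails, the engine gives back
# one character and tries again, so lines inside <LaTeX> still match.
GREEDY = re.compile(r"^[^<]+(?!</LaTeX></math>)", re.MULTILINE)

# Emulated possessive quantifier (assumption: stands in for [^<]++).
# The lookahead captures the longest run of non-'<' characters and \1 then
# consumes exactly that run, so nothing can be given back afterwards.
POSSESSIVE = re.compile(r"^(?=([^<]+))\1(?!</LaTeX></math>)", re.MULTILINE)


def first_lines(pattern, text):
    """Return the first line of every match of pattern in text."""
    return [m.group(0).splitlines()[0] for m in pattern.finditer(text)]


print(first_lines(POSSESSIVE, SAMPLE))
# ['a set of interrelated resources.', 'in a single document']

# The greedy variant additionally reports lines inside the LaTeX blocks.
print(first_lines(GREEDY, SAMPLE))
```

With the greedy pattern the run is shortened by one character (the closing brace), the lookahead then no longer sees </LaTeX></math>, and the unwanted lines are reported.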
Hi Don,

If you expect operators such as <, >, <=, <>, >> or << in the text, then try this pattern:

(?s)^(?!<[^>]+>)(?:.(?!<[^ <=>]+>))++.(?!</LaTeX></math>$)

Or if you really want to select just the first character:

(?s)^(?!<[^>]+>).(?=(?:.(?!<[^ <=>]+>))++.(?!</LaTeX></math>$))

I added $ to check EOL. Maybe you won't need it.

BR, Fleggy
Thanks a lot again. :mrgreen:
Hi fleggy,

Can you tell me how this expression can be used in UltraEdit v14.10? The "++" quantifier is not supported by the Perl regex engine of this version, so "(?s)^[^<]++(?!</LaTeX></math>)" produces an invalid regex message.
Hi Don,

Sorry, I don't have UE14 to play with. Perhaps this will work:

(?s)^(?!<[^>]+>)(?>(?:.(?!<[^ <=>]+>))+).(?!</LaTeX></math>$)

I replaced the possessive quantifier with an atomic group, and it works on your short sample.
Use the following rule to convert any other Perl search expression containing ++:
X++ -> (?>X+)
e.g.
(?s)^[^<]++(?!</LaTeX></math>)
->
(?s)^(?>[^<]+)(?!</LaTeX></math>)

BR, Fleggy
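
The X++ -> (?>X+) rule above can be checked with a small sketch in Python. This is only an illustration and says nothing about what UltraEdit 14.10 itself accepts: Python's re module supports both X++ and (?>X+) from version 3.11 on, so the sketch skips any variant the running interpreter rejects. The third variant, a capturing lookahead plus backreference, is a commonly used stand-in for engines that have neither syntax. VARIANTS, SAMPLE and run_variants are hypothetical names for this example.

```python
import re

# Tiny sample: one broken <para> line and one LaTeX block.
SAMPLE = r"""<para>Broken
in two</para>
<math><LaTeX>\begin{align*}
a=ba+g
\end{align*}</LaTeX></math>"""

# Three spellings of the same idea; (?m) makes ^ match at every line start.
VARIANTS = {
    "possessive": r"(?m)^[^<]++(?!</LaTeX></math>)",
    "atomic": r"(?m)^(?>[^<]+)(?!</LaTeX></math>)",
    "lookahead": r"(?m)^(?=([^<]+))\1(?!</LaTeX></math>)",
}


def run_variants(text):
    """Map each variant name to its list of matches, or None if unsupported."""
    results = {}
    for name, pattern in VARIANTS.items():
        try:
            compiled = re.compile(pattern)
        except re.error:
            # This interpreter's re module does not know the syntax.
            results[name] = None
            continue
        results[name] = [m.group(0) for m in compiled.finditer(text)]
    return results


for name, matches in run_variants(SAMPLE).items():
    print(name, matches)
```

Every variant that compiles should report the same single match, "in two", and nothing from inside the LaTeX block.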