It looks like you are using the legacy UltraEdit regular expression engine. This engine is not as clearly defined as the modern Perl regular expression engine in case of a search string like the one you tried. What is the problem?
* matches ANY character EXCEPT carriage return and line-feed 0 or more times.
[0-9]+ matches ANY digit 1 or more times.
The problem: ANY digit is a character range which is also included in the character range ANY character.
So we have a character range with an undetermined number of repeats, followed by a second character range that is contained in the first one and also has an undetermined number of repeats.
How should the expression engine know where to stop matching characters for the first character range and start matching characters for the second one, when every character of the second range also belongs to the first?
A software programmer calls such a situation undefined behavior, which means the result is unpredictable.
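The ambiguity can be made visible with an engine where the choice is well defined. The following is only a minimal sketch using Python's re module as a stand-in for a Perl-style engine (an assumption; it is not the UltraEdit engine, and `.` takes the place of the UE `*`). The test string abc123 is made up for illustration.

```python
import re

sample = "abc123"  # hypothetical test string, not taken from the question

# Greedy: ".*" takes as much as it can and leaves a single digit for "[0-9]+".
greedy = re.match(r"(.*)([0-9]+)", sample)

# Non-greedy: ".*?" takes as little as it can, so "[0-9]+" gets all digits.
lazy = re.match(r"(.*?)([0-9]+)", sample)

print(greedy.groups())  # ('abc12', '3')
print(lazy.groups())    # ('abc', '123')
```

Both splits are valid matches of the same two overlapping character ranges; only the greedy or non-greedy rule decides which one is reported.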
There is another method to match ANY character EXCEPT carriage return and line-feed 0 or more times with the UE regular expression engine: ?++
So what happens on using <p>?++[0-9]+-[0-9]+
Well, it looks like the still undefined behavior works for this example. But does it really work?
What about two or even more number ranges within a paragraph, i.e. something like:
<p>See the pages 20-35, the tables 1-3 and the figures 3-5.</p><p>And take note of the comments 5-9.</p>
Most users would expect a match of everything from the paragraph element to 20-35. But this search string matches everything from the first paragraph element to 5-9 in the second paragraph. In the Perl regular expression documentation this matching behavior is described as greedy.
?++ matches as much as possible while still producing a positive match.
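For comparison, here is a rough sketch of the same search on the line above with a Perl-style engine, again using Python's re module as an assumed stand-in rather than the UltraEdit engine. `.*` plays the role of the greedy ?++, and `.*?` is the non-greedy variant.

```python
import re

line = ("<p>See the pages 20-35, the tables 1-3 and the figures 3-5.</p>"
        "<p>And take note of the comments 5-9.</p>")

# Greedy: runs from the first <p> to the last number range on the line.
greedy = re.search(r"<p>.*[0-9]+-[0-9]+", line).group(0)

# Non-greedy: stops at the first number range after the first <p>.
lazy = re.search(r"<p>.*?[0-9]+-[0-9]+", line).group(0)

print(greedy.endswith("comments 5-9"))  # True
print(lazy)                             # <p>See the pages 20-35
```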
Let us look at a simple example of the difference between greedy and non-greedy matching behavior with the UltraEdit regular expression engine, where this behavior cannot really be controlled as it can in Perl. In a file there is a line with a file name with full path.
The UE regex search string [A-Z]:\*\ matches just C:\Temp\ which is non greedy, whereas [A-Z]:\?++\ matches C:\Temp\Test\ which is greedy.
But here there is a definite string consisting of a single character, the backslash, which defines where to stop matching any character except newline characters 0 or more times. Therefore both expressions work and match something.
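The two path expressions can be mimicked in a Perl-style engine as a sketch. Python's re module is used here as an assumption, and the path C:\Temp\Test\file.txt is a hypothetical example, chosen only so that it is consistent with the two matches mentioned above.

```python
import re

path = r"C:\Temp\Test\file.txt"  # hypothetical path, for illustration only

# Non-greedy, comparable to the UE string [A-Z]:\*\ : stop at the first
# backslash after the drive specification.
lazy = re.search(r"[A-Z]:\\.*?\\", path).group(0)

# Greedy, comparable to the UE string [A-Z]:\?++\ : stop at the last
# backslash on the line.
greedy = re.search(r"[A-Z]:\\.*\\", path).group(0)

print(lazy)    # C:\Temp\
print(greedy)  # C:\Temp\Test\
```

In both cases the trailing backslash is the fixed stop condition, so each variant has exactly one result.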
But the UE regex search string <p>*[0-9]+-[0-9]+ is different, as there is no fixed string after * which determines the stop condition for matching any character except newline characters 0 or more times non greedy.
But let us look at the example above with the two paragraphs containing 4 number ranges in total. What should be matched?
- Everything from the beginning of each paragraph to the first number range in each paragraph;
- everything from the beginning of each paragraph to the last number range in each paragraph;
- everything from the beginning of the first paragraph to the last number range in any paragraph on the line in the file.
Well, most likely it would be best to match just each number range within a paragraph, but this is very tricky, as can be seen in this example.
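Outside of UltraEdit, one way to approach this is to separate the paragraphs first and then search for number ranges inside each one. This is only a sketch in Python (an assumption, not something the UE regex engine offers), and all variable names are made up for illustration.

```python
import re

line = ("<p>See the pages 20-35, the tables 1-3 and the figures 3-5.</p>"
        "<p>And take note of the comments 5-9.</p>")

RANGE = r"[0-9]+-[0-9]+"

# Step 1: extract the text of each paragraph (non-greedy up to </p>).
paragraphs = re.findall(r"<p>(.*?)</p>", line)

# Step 2: collect every number range inside each paragraph separately.
ranges_per_paragraph = [re.findall(RANGE, p) for p in paragraphs]

# First and last number range of each paragraph, as in the list above.
first_ranges = [r[0] for r in ranges_per_paragraph if r]
last_ranges = [r[-1] for r in ranges_per_paragraph if r]

print(ranges_per_paragraph)  # [['20-35', '1-3', '3-5'], ['5-9']]
print(first_ranges)          # ['20-35', '5-9']
print(last_ranges)           # ['3-5', '5-9']
```

Splitting the task into two steps avoids the question of where an "any character" expression has to stop, because the search for number ranges never crosses a paragraph boundary.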