
Replace fails when using lookahead and lookbehind

Aug 23, 2007#1

I am using UltraEdit 13.10+1 and attempting to use a Perl-style regex to find the whitespace between a word and a number, and replace the found text with a delimiter. In the file example below I would expect to find the whitespace between the word preceding the date and the date on each line.

# is a placeholder for a tab character.

Preliminary Project Documents Created#6/22/07
Preliminary Project Documents Sent to Client#6/29/07
Conduct Technical Planning Meeting#5/18/07
Customer Orders Hardware#6/12/07

This regex finds and highlights the correct whitespace on each line.

(?<=\w)[\s](?=\d)

When I try to replace the whitespace with "XX" nothing happens. The text is found, I press replace, and the find goes to the next instance. The XX is not written to the file.
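For comparison, here is a minimal sketch of the result I would expect, using Python's `re` module as a stand-in for a Perl-style engine (an assumption; this is not UltraEdit's engine, and `\t` replaces the # placeholder):

```python
import re

# Sample lines from this post; "\t" stands in for the "#" tab placeholder.
lines = [
    "Preliminary Project Documents Created\t6/22/07",
    "Preliminary Project Documents Sent to Client\t6/29/07",
    "Conduct Technical Planning Meeting\t5/18/07",
    "Customer Orders Hardware\t6/12/07",
]

# One whitespace character preceded by a word character and followed by a digit.
pattern = re.compile(r"(?<=\w)\s(?=\d)")

# Only the whitespace itself is consumed, so only it is replaced.
replaced = [pattern.sub("XX", line) for line in lines]
for line in replaced:
    print(line)
```

In an engine where lookaround replaces work, only the tab before each date turns into XX.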

Any ideas?

Thanks,

Pete.

Aug 23, 2007#2

Hi Pete

Try this. Replace

([A-Za-z])\s*([0-9])

with

\1XX\2

rds Bego
Normally using all newest english version incl. each hotfix. Win 10 64 bit

Aug 23, 2007#3

Hi Pete,

This is a known bug in UE's Perl regex engine. Positive lookaround is broken: searches work, replaces don't. Funnily enough, the replace dialog tells you that it performed n replaces and also marks the file as changed, but it doesn't actually do anything. Negative lookaround works fine, by the way.

I have written to IDM support several times about this; they have confirmed the problem each time and said they'd have their technicians look into it. Maybe it'll get boosted on the list of priorities if you send them a mail at support@idmcomp.com. I'd really appreciate it.

As a workaround, and since negative lookaround does work, the following regex works on your sample data; make sure, though, that it won't produce unwanted matches with your actual data:

(?<!\W) (?!\D)

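To illustrate the caveat about unwanted matches, here is a rough sketch in Python's `re` module (an assumption; it is not UE's engine). It assumes the lookbehind is spelled `(?<!\W)` and uses a literal space in the sample line, because the pattern contains one. The double negative also succeeds where there is no neighbouring character at all, which the positive form does not:

```python
import re

# Negative-lookaround workaround: a space NOT preceded by a non-word
# character and NOT followed by a non-digit character.
workaround = re.compile(r"(?<!\W) (?!\D)")
# Positive form from the original question, restricted to a literal space.
positive = re.compile(r"(?<=\w) (?=\d)")

wanted = workaround.sub("XX", "Customer Orders Hardware 6/12/07")

# Edge cases: "not followed by a non-digit" is also true at the end of the
# text, and "not preceded by a non-word character" is true at the start.
trailing = workaround.sub("XX", "Hardware ")
leading = workaround.sub("XX", " 6/12/07")

print(wanted)
print(repr(trailing), repr(positive.sub("XX", "Hardware ")))
print(repr(leading), repr(positive.sub("XX", " 6/12/07")))
```

So a trailing space at the end of the text, or a leading space directly before a digit, gets replaced by the workaround but not by the positive version.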
HTH,
Tim

edit: Hi Bego, you were faster than me; your regex will work too (but slower), and the * should probably be replaced by a + or else it will also replace "B2B" by "BXX2B"...
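A quick sketch of the difference, again using Python's `re` module as a stand-in (an assumption; the sample text is made up for illustration):

```python
import re

# Capture-group variants: keep the letter and the digit, replace what is between.
star = re.compile(r"([A-Za-z])\s*([0-9])")   # zero or more whitespace characters
plus = re.compile(r"([A-Za-z])\s+([0-9])")   # at least one whitespace character
replacement = r"\1XX\2"

# Hypothetical line containing "B2B" plus a tab before the date.
text = "B2B Hardware\t6/12/07"

star_result = star.sub(replacement, text)
plus_result = plus.sub(replacement, text)

print(star_result)
print(plus_result)
```

With `\s*` the letter-digit pair inside "B2B" matches even though there is no whitespace between them; with `\s+` only the tab before the date is replaced.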

Aug 23, 2007#4

Hi Tim,

correct, so the "easy" non-lookaround string looks better like this:

([A-Za-z])\s+([0-9])

rds Bego

Aug 23, 2007#5

You mean \s+ :)

And (if you're using Perl regexes) the replacement string should be \1XX\2 (I don't know the UE/Unix styles).

Aug 23, 2007#6

Oh boy, I shouldn't do 2 things at one time... only women can do this (they say) ;-)

corrected it above.

Aug 23, 2007#7

Thanks for the replies and the alternatives. Sometimes I get focused on a solution that doesn't work when I should look for alternatives.

Thanks to the mod who fixed my spelling as well.

Pete.

Apr 29, 2008#8

Good news: In 14.00a+2, positive lookaround has been fixed. This version isn't yet available for download (April 29th) but surely will be soon. That's a great leap forward for Perl regular expressions and will speed up complex regex operations a lot. Great work, IDM! So keep checking for new hotfixes :)

May 27, 2008#9

Are you sure that this has been fixed?

In UEdit 14.00b, with the following text snippet (newline after the "---"):

---
avast! Antivirus: Inbound message clean.

the Perl regexp:

(avast! Antivirus)(?<!---\r\n)

succeeds, but:

(avast! Antivirus)(?<=---\r\n)

fails.

Does that not mean that lookbehind is still broken?


Alan

May 27, 2008#10

Wait a second, your regex is wrong - the lookbehind should be at the beginning of the regex. But even with the correct regex, UE doesn't match correctly.
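To show why the position matters, here is a small sketch in Python's `re` module (an assumption; it searches the whole buffer and is not UE's engine). A lookbehind placed after the group inspects the characters just before the end of the match, which are "virus" and never "---\r\n":

```python
import re

# The snippet from the previous post, with explicit CRLF line endings.
text = "---\r\navast! Antivirus: Inbound message clean.\r\n"

# Lookbehind at the end: checks the five characters before the END of the match.
trailing_positive = re.search(r"(avast! Antivirus)(?<=---\r\n)", text)
trailing_negative = re.search(r"(avast! Antivirus)(?<!---\r\n)", text)

# Lookbehind at the beginning: checks the five characters before the START.
leading_positive = re.search(r"(?<=---\r\n)(avast! Antivirus)", text)

print(trailing_positive)          # no match is possible with this placement
print(trailing_negative.group())  # trivially succeeds
print(leading_positive.start())   # match begins right after the "---" line
```

The trailing positive form fails in any engine, the trailing negative form succeeds trivially, and only the leading form actually tests for the "---" line.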

That's a more general problem, though: UE's regex engine is line-based. This leads to lookbehind not working beyond line breaks, and to greedy quantifiers losing their greediness if a match is possible on the current line (but the correct match would be beyond a line break). So in most daily use cases, lookaround works, but there are some limitations. I had been hoping for better regex support for a long time (not only for search/replace, but for syntax highlighting, code folding etc.), but have found that most users don't seem to care enough about this for IDM to put this high on their to-do list. If you need really good regex support, try EditPadPro.
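As a toy model only (this is not how UE is actually implemented, just a rough imitation of the symptoms), the two effects can be reproduced in Python by searching each line in isolation instead of the whole buffer. The `start ... end` sample text is made up for illustration:

```python
import re

# 1) Lookbehind across a line break.
text = "---\r\navast! Antivirus: Inbound message clean.\r\n"
lookbehind = re.compile(r"(?<=---\r\n)avast! Antivirus")

whole_buffer = lookbehind.search(text)
# Toy "line-based" search: every line is searched on its own, so the
# preceding "---" line is invisible to the lookbehind.
per_line = [lookbehind.search(line) for line in text.splitlines(keepends=True)]

# 2) Greedy quantifier that should run across a line break.
sample = "start one end\nstart two end\n"
greedy = re.compile(r"start[\s\S]*end")

whole_greedy = greedy.search(sample).group()
line_matches = (greedy.search(line) for line in sample.splitlines())
line_greedy = next(m.group() for m in line_matches if m)

print(whole_buffer is not None, per_line)
print(repr(whole_greedy))
print(repr(line_greedy))
```

Searching the whole buffer finds the lookbehind match and lets the greedy quantifier extend to the last "end"; the per-line imitation finds nothing for the lookbehind and stops the greedy match at the end of the first line.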

May 28, 2008#11

Hi Tim,
my bad copy-and-paste; the lookbehind does come first in the original macro (the snippet is part of a large macro to clean up mbox format emails).

I'm sorry to hear the confirmation that lookaround is still broken. I recently finished a long dialogue with IDM support (just prior to the release of 14.00a) on the performance of UltraEdit Perl RegExps and I thought my problems were over.

Oh well, I'll just have to email Troy again...

Thanks for the input.

Alan

May 28, 2008#12

Well, it's not exactly lookaround that's broken. Since the entire regex engine is line-based, regexes that involve multiple lines can get risky. Mostly, it's "corner cases", but every now and then, you get unexpected/incorrect results. That was my primary reason for switching to EPP; most other users don't seem to mind...