Replace fails when using lookahead and lookbehind

    Aug 23, 2007#1

    I am using UltraEdit 13.10+1 and attempting to use a Perl-style regex to find the whitespace between a word and a number, and replace the found text with a delimiter. In the sample file below I would expect to match the whitespace between the date and the word preceding it on each line.

    # is a place holder for a tab character.

    Preliminary Project Documents Created#6/22/07
    Preliminary Project Documents Sent to Client#6/29/07
    Conduct Technical Planning Meeting#5/18/07
    Customer Orders Hardware#6/12/07

    This regex finds and highlights the correct whitespace on each line.

    (?<=\w)[\s](?=\d)

    When I try to replace the whitespace with "XX" nothing happens. The text is found, I press replace, and the find goes to the next instance. The XX is not written to the file.

    Any ideas?

    Thanks,

    Pete.
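For reference, here is a minimal sketch of what the replace above would be expected to produce in an engine with working positive lookaround. It uses Python's `re` module rather than UltraEdit's engine, and it assumes the `#` placeholder is a real tab (`\t`) in the file.

```python
import re

# Sample lines from the post; "\t" stands in for the "#" tab placeholder.
lines = [
    "Preliminary Project Documents Created\t6/22/07",
    "Preliminary Project Documents Sent to Client\t6/29/07",
    "Conduct Technical Planning Meeting\t5/18/07",
    "Customer Orders Hardware\t6/12/07",
]

# One whitespace character preceded by a word character and followed by a digit.
# The lookarounds consume nothing, so only the whitespace itself is replaced.
pattern = re.compile(r"(?<=\w)[\s](?=\d)")

replaced = [pattern.sub("XX", line) for line in lines]
for line in replaced:
    print(line)
```

Only the tab before each date is replaced; the spaces between words are left alone because they are not followed by a digit.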

      Aug 23, 2007#2

      Hi Pete

      Try this. Replace

      ([A-Za-z])\s*([0-9])

      with

      \1XX\2

      rds Bego
      Normally using all newest english version incl. each hotfix. Win 10 64 bit

        Aug 23, 2007#3

        Hi Pete,

        This is a known bug in UE's Perl regex engine. Positive lookaround is broken: searches work, replaces don't. Funnily enough, the replace dialog reports that it performed n replaces and also marks the file as changed, but it doesn't actually change anything. Negative lookaround works fine, by the way.

        I have written to IDM support several times about this; they have confirmed the problem each time and said they'd have their technicians look into it. Maybe it'll get boosted on the list of priorities if you send them a mail at [email protected] - I'd really appreciate it.

        As a workaround, and since negative lookaround does work, the following regex works on your sample data; make sure, though, that it won't produce unwanted matches with your actual data:

        (?<!\W) (?!\D)
        HTH,
        Tim
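To illustrate the caveat about unwanted matches, here is a rough sketch of the negative-lookaround idea using Python's `re` module (not UltraEdit's engine), written with the standard `(?<!...)` lookbehind syntax. The test strings other than the sample line are made up for illustration. A negative lookaround also succeeds when there is no neighbouring character at all, which the positive form would reject.

```python
import re

# A space NOT preceded by a non-word character and NOT followed by a non-digit.
pattern = re.compile(r"(?<!\W) (?!\D)")

# Intended case: only the space between the last word and the date matches.
ok = pattern.sub("XX", "Customer Orders Hardware 6/12/07")

# Edge cases (hypothetical inputs): nothing before the space, or nothing after it,
# still satisfies a negative assertion.
leading = pattern.sub("XX", " 42 items")
trailing = pattern.sub("XX", "total ")

print(ok)
print(leading)
print(trailing)
```

So a space at the very start of the text before a digit, or at the very end after a word character, would also be replaced.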

        edit: Hi Bego, you were faster than me; your regex will work too (but slower), and the * should probably be replaced by a + or else it will also replace "B2B" by "BXX2B"...
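The difference between `*` and `+` in the capture-group version can be checked quickly. This is a small sketch in Python's `re` module, which accepts the same `\1XX\2` replacement syntax; it is not a test of UltraEdit itself.

```python
import re

star = re.compile(r"([A-Za-z])\s*([0-9])")  # zero or more whitespace characters
plus = re.compile(r"([A-Za-z])\s+([0-9])")  # at least one whitespace character
repl = r"\1XX\2"

# With "*", a letter directly followed by a digit also matches.
star_b2b = star.sub(repl, "B2B")

# With "+", "B2B" is left untouched, and the sample line is still handled.
plus_b2b = plus.sub(repl, "B2B")
plus_sample = plus.sub(repl, "Customer Orders Hardware\t6/12/07")

print(star_b2b, plus_b2b, plus_sample)
```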

          Aug 23, 2007#4

          Hi Tim,

          correct, so the "easy" non-lookaround string looks better like this:

          ([A-Za-z])\s+([0-9])
          rds Bego

            Aug 23, 2007#5

            You mean \s+ :)

            And (if you're using Perl regexes) the replacement string should be \1XX\2 (I don't know the UE/Unix styles).

              Aug 23, 2007#6

              Oh boy, I shouldn't do 2 things at one time... only women can do this (they say) ;-)

              corrected it above.

                Aug 23, 2007#7

                Thanks for the replies and the alternatives. Sometimes I get focused on a solution that doesn't work when I should look for alternatives.

                Thanks to the mod who fixed my spelling as well.

                Pete.

                  Apr 29, 2008#8

                  Good news: In 14.00a+2, positive lookaround has been fixed. This version isn't yet available for download (April 29th) but surely will be soon. That's a great leap forward for Perl regular expressions and will speed up complex regex operations a lot. Great work, IDM! So keep checking for new hotfixes :)

                    May 27, 2008#9

                    Are you sure that this has been fixed?

                    In UEdit 14.00b, with the following text snippet (newline after the "---"):

                    ---
                    avast! Antivirus: Inbound message clean.

                    the Perl regexp:

                    (avast! Antivirus)(?<!---\r\n)

                    succeeds, but:

                    (avast! Antivirus)(?<=---\r\n)

                    fails.

                    Does that not mean that lookbehind is still broken?


                    Alan

                      May 27, 2008#10

                      Wait a second, your regex is wrong - the lookbehind should be at the beginning of the regex. But even with the correct regex, UE doesn't match correctly.

                      That's a more general problem, though: UE's regex engine is line-based. This leads to lookbehind not working beyond line breaks, and to greedy quantifiers losing their greediness if a match is possible on the current line (but the correct match would be beyond a linebreak). So in most daily use cases, lookaround works, but there are some limitations. I had been hoping for better regex support for a long time (not only for search/replace, but for syntax highlighting, code folding etc.), but have found that most users don't seem to care enough about this for IDM to put this high on their to-do list. If you need really good regex support, try EditPadPro.
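The point about lookbehind placement can be sketched with Python's `re` module. Python processes the whole string rather than working line by line, so this shows what a non-line-based engine does with the snippet; it does not reproduce UltraEdit's behaviour. The text is assumed to use `\r\n` line endings, as in the regexes above.

```python
import re

text = "---\r\navast! Antivirus: Inbound message clean.\r\n"

# Lookbehind placed AFTER the literal: it is checked at the position following
# "Antivirus", where the preceding five characters are "virus", not "---\r\n".
trailing_positive = re.search(r"(avast! Antivirus)(?<=---\r\n)", text)

# For the same reason the negative version succeeds trivially.
trailing_negative = re.search(r"(avast! Antivirus)(?<!---\r\n)", text)

# Lookbehind placed BEFORE the literal: it inspects the text preceding "avast!",
# which spans the line break.
leading_positive = re.search(r"(?<=---\r\n)(avast! Antivirus)", text)

print(trailing_positive)
print(trailing_negative.group(1))
print(leading_positive.group(1), leading_positive.start())
```

In an engine that sees the whole buffer, the leading lookbehind matches across the line break; a line-based engine never gets to see the `---\r\n` on the previous line.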

                        May 28, 2008#11

                        Hi Tim,
                        my bad copy-and-paste; the lookbehind does come first in the original macro (the snippet is part of a large macro to clean up mbox format emails).

                        I'm sorry to hear the confirmation that lookaround is still broken. I recently finished a long dialogue with IDM support (just prior to the release of 14.00a) on the performance of UltraEdit Perl RegExps and I thought my problems were over.

                        Oh well, I'll just have to email Troy again...

                        Thanks for the input.

                        Alan

                          May 28, 2008#12

                          Well, it's not exactly lookaround that's broken. Since the entire regex engine is line-based, regexes that involve multiple lines can get risky. Mostly these are corner cases, but every now and then you get unexpected or incorrect results. That was my primary reason for switching to EPP (EditPadPro); most other users don't seem to mind...