Problem with Perl Regex engine?

Advanced User (60)

    Jul 31, 2008 #1

    I am reading a tutorial and following its regular expression examples. In a Perl example, they show that

    [^\d\s]

    is not the same as

    [\D\S]

    However, in UE 14.10.0.1024 the latter matches exactly what the first expression matches, not what I expected it to match.

    Example:
    8x8
    The first regex should match the x, and indeed it does. The second regex, however, should match the first 8, and it does not: it again matches the x. Am I correct that this is a problem in UE's Perl regex engine?

    Master (236)

      Jul 31, 2008 #2

      Seems you're right. Wow, that's a major blunder. I guess you should file a bug report.

      Power User (119)

        Jul 31, 2008 #3

        Interesting. I wonder if it's a UE problem or a bug in the Boost library. I suspect the latter.

        Master (344)

          Jul 31, 2008 #4

          Phew. That is my biggest problem with UE (or whatever underlying DLL is responsible):
          from version to version, fix to fix, something in the regexp handling stops working correctly. Bugs reappear, and so on.
          I can understand Tim's switch to E......d Pro very well...
          Normally using the newest English version, including each hotfix. Win 10 64-bit

          Master (236)

            Jul 31, 2008 #5

            I'm pretty sure that it's a bug in the Boost library; I can't think of any other explanation. In my opinion, IDM should seriously reconsider their business relationship with Boost. They probably won't get JGSoft's engine. Maybe PCRE?

            Advanced User (60)

              Jul 31, 2008 #6

              Here is another regex issue. Against the text "b", this should match:

              (q?)b\1

              and this should not, because when there is no "q" the optional group does not take part in the match, so \1 has nothing to refer to:

              (q)?b\1

              However, they both match!

              I know which regex engine I like so far. Now I am becoming a bit concerned about regex matching in UE.

              Maybe UE could run a few test regexes through the engine and check the results.
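For comparison, here is a minimal sketch that runs both patterns through Python's `re` module. Python is used only as a stand-in reference engine (an assumption: it is not the engine UE uses), but like Perl it treats a backreference to a group that did not take part in the match as a failure. The helper name `matches` is hypothetical.

```python
import re

# (q?)b\1: the group always participates, possibly matching the empty
# string, so \1 refers to "" and a lone "b" matches.
OPTIONAL_CONTENT = re.compile(r"(q?)b\1")

# (q)?b\1: the whole group is optional. With no "q" in the subject the group
# does not participate, and a backreference to a non-participating group
# fails (Perl semantics, which Python's re follows here).
OPTIONAL_GROUP = re.compile(r"(q)?b\1")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for pattern in (OPTIONAL_CONTENT, OPTIONAL_GROUP):
        for subject in ("b", "qbq"):
            print(pattern.pattern, subject, matches(pattern, subject))
```

A Perl-compatible engine should report a match for `(q?)b\1` against "b" and no match for `(q)?b\1` against "b"; both patterns match "qbq".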

              Master (236)

                Jul 31, 2008 #7

                I must say that I'm impressed at the rate that you're uncovering bugs in the regex engine. To be fair, this last example is probably a "corner case" that won't matter to many people. And the biggest show-stopper (inability to use positive lookaround) has been fixed in V14, so for most everyday work UE's regex engine is OK. But there are quite a few inconsistencies like the behaviour of greedy quantifiers when the potential match crosses a newline, or the engine skipping matches because the "transmission" bumps along too far and moves beyond the next correct match. And now these new bugs. Sad.

                Advanced User (60)

                  Jul 31, 2008 #8

                  Amazingly, I am reading a tutorial on regexes with Perl examples, and I am trying to understand how they work and why. The good news is that I understand most of it 8O , OK, some of it. :D It takes many tries and tests, and I keep my examples and expected outputs in files. All the help from people like you has really got me excited about learning this.

                  Newbie (4)

                    Sep 14, 2008 #9

                    sklad2 wrote: I am reading a tutorial and following its regular expression examples. In a Perl example, they show that

                    [^\d\s]

                    is not the same as

                    [\D\S]

                    However, in UE 14.10.0.1024 the latter matches exactly what the first expression matches, not what I expected it to match.

                    Example:
                    8x8
                    The first regex should match the x, and indeed it does. The second regex, however, should match the first 8, and it does not: it again matches the x. Am I correct that this is a problem in UE's Perl regex engine?

                    Am I correct that inside the [...] syntax (a character set), escaped special characters lose their meaning, so that this character set effectively accepts anything except a backslash and the characters 'd' or 's' ('D' or 'S', respectively, in the second regex)?

                    If so, both regexes would match the 'x'. I would be surprised if this character set matched first the '8' (the digit) and then the 'x'. As far as I know, a character set is supposed to match a single character.

                    Master (236)

                      Sep 14, 2008 #10

                      Nope. Inside a character set, escaped characters don't lose their meaning: [\r\n] means "match a CR or an LF". However, many characters that are special outside a character set behave differently inside one: ^ negates the set only when it comes first, - forms a range between two characters, ] closes the set, and characters such as ( and ) are plain literals. A character set does match exactly one character.
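A quick way to see that escapes keep their meaning inside a character set is to run a few sets through Python's `re` module, used here as an assumed stand-in for a Perl-compatible engine. The helper `class_matches` is hypothetical; `re.findall` returns one list element per matched character, which also shows that a set consumes exactly one character per match.

```python
import re


def class_matches(char_class: str, text: str) -> list:
    """Hypothetical helper: every character of text matched by char_class."""
    return re.findall(char_class, text)


if __name__ == "__main__":
    # \r and \n keep their escape meaning inside [...]: CR or LF.
    print(class_matches(r"[\r\n]", "a\r\nb"))
    # \d keeps its meaning too: the set matches digits, not "\" or "d".
    print(class_matches(r"[\d]", r"a1\d2"))
    # ^ negates only as the first character; elsewhere it is a literal.
    print(class_matches(r"[a^]", "a^b"))
```

If escapes really lost their meaning, `[\d]` would match the backslash and the "d" in the second subject; instead it matches only the two digits.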

                      So [^\d\s] means "match any character that is neither a digit nor whitespace". [\D\S], however, means "match any character that is (not a digit) or (not whitespace)", which is true of every single character imaginable: "8" is not whitespace, so it matches; " " is not a digit, so it matches too; "x" is neither, so it matches as well. So UE's regex engine is wrong when it matches "x" but not "8".
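The reasoning above can be checked with Python's `re` module, used here only as an assumed reference for Perl-compatible behaviour (it is not the Boost engine UE uses). `first_match` is a hypothetical helper name.

```python
import re
from typing import Optional

SAMPLE = "8x8"

# [^\d\s]: one character that is neither a digit nor whitespace.
NEITHER = re.compile(r"[^\d\s]")

# [\D\S]: one character that is a non-digit OR a non-whitespace
# character, which every character satisfies.
EITHER_NEGATION = re.compile(r"[\D\S]")


def first_match(pattern: re.Pattern, text: str) -> Optional[str]:
    """Hypothetical helper: the leftmost match, or None if there is none."""
    m = pattern.search(text)
    return m.group() if m else None


if __name__ == "__main__":
    print(first_match(NEITHER, SAMPLE))          # x
    print(first_match(EITHER_NEGATION, SAMPLE))  # 8
    # Digits, letters, spaces and newlines all match [\D\S].
    print(EITHER_NEGATION.findall("8 x\n"))
```

A correct engine finds "x" first for `[^\d\s]` but "8" first for `[\D\S]`, which is exactly the difference the original post expected.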

                      Newbie (4)

                        Sep 21, 2008 #11

                        I submitted a bug report to support about the [\D\S] issue, and they confirmed that it is a bug in the Boost libraries. They told me they now need to submit a bug report to the Boost library maintainers to get it fixed.