Strange unix regex problem

    Apr 28, 2006#1

    Hi all. What am I doing wrong? Unix-style regex chosen, regex is ticked in the search dialog. I'm using Find in Files.

    ^[^\p']*(Connection[.]Execute|RecordSet[.]Open)

    This doesn't work. It only finds "Connection.Execute" when it occurs on a line with no single-quote character before it (i.e. when it is not commented out).

    It should also find instances of "RecordSet.Open". But it doesn't. However,

    ^[^\p']*RecordSet[.]Open

    Works fine. So what am I doing wrong here? Am I screwing up the OR syntax? Been banging my head against this for an hour.

    Using UE 11.10b.

      Apr 28, 2006#2

      Wow, I tried the UE regex syntax. And I got the following really weird problem:

      %[~']*^{Connection.Execute^}^{RecordSet.Open^}

      Using this, I got 639 matches.

      %[~']*^{RecordSet.Open^}^{Connection.Execute^}

      Using this, I got 51 matches. Isn't that insane? The first one caught all of what I'm looking for, the second expression caught only the RecordSet.Open subset.

      Shouldn't they be functionally the same? Those expressions basically have an identical bit at the beginning, and then an A or B expression.

      So when I switch it to B or A, it should still match all the same things. Why isn't it?

        Apr 28, 2006#3

        raddygast,

        What OS are you running? See my recent post regarding strange UE RE behaviour on W2K...

        HTH,
        Paolo
        There is no such thing as an inconsistently correct system...
        Therefore, aim for consistency; in the expectation of reaching correctness!

          Apr 28, 2006#4

          See my post in the topic "Keyword sorting macro" for why

          %[~']*^{Connection.Execute^}^{RecordSet.Open^}

          and

          %[~']*^{RecordSet.Open^}^{Connection.Execute^}

          return different results.

          Maybe the following regular expression with Match Case enabled works better (not tested):

          %[~'][~RC^r^n]+^{RecordSet.Open^}^{Connection.Execute^}

          If this regular expression does not work either, a macro is needed for this search, unless somebody finds a working Unix/Perl regexp for the job.
          Best regards from an UC/UE/UES for Windows user from Austria

            Apr 28, 2006#5

            I'm using XP SP2.

            I'm really not quite sure what macros have to do with this. I take it they're just a workaround?

            Are you saying that my regex "looks ok" but that it must be a bug with the UltraEdit matching engine? Reading your posts, it looks like the OR ability is severely messed up if there are any other conditions in the regex.

            What I don't understand is why my first regex seems to match both -- but I'll read your posts over again and try to verify.

            BTW, are there any shareware tools out there that basically ONLY do regex searches on text files, returning the same kind of result list as UE, and where you can click on all the returned "lines" that match, and it'll fire up your default editor at the right line? Wishful thinking, but I don't mind doing my regex searches outside of UE for now.

              Apr 28, 2006#6

              Yikes. More findings:

              Using this as a test file:

              This is an unmatched line
              
              The next line contains a matched RS
              something something RecordSet.Open something something
              
              The next line contains a matched CE
              something something Connection.Execute something something
              
              The next line contains an RS AND a CE
              something something RecordSet.Open something Connection.Execute something
              
              The next line contains a commented RS
              	'something something RecordSet.Open something something
              
              The next line contains a commented CE
              'something something Connection.Execute something something
              
              The next line contains a commented RS AND a CE
              'something something RecordSet.Open something Connection.Execute something
              
              If I use a very simple or (whether Match Word is checked or not), using UE-style:

              ^{RecordSet.Open^}^{Connection.Execute^}

              I get all six matches.

              When I use:

              %[~'^p]*^{RecordSet.Open^}^{Connection.Execute^}

              Then I match every uncommented RecordSet.Open, but not Connection.Execute (except the one that is contained in the same line as a RecordSet.Open).

              On top of that, I match an additional line, the commented one with RecordSet.Open. That's because that line is indented before the comment. Am I messing up the regex here?

              Anyway, I'm freaking out about this. I think OR expressions break, in that the second expression after the OR doesn't work, when there is anything other than a bare A OR B statement in the regex.

              I tried the expression you provided and it doesn't work.

              I messed around a lot more with Unix-style.

              (^[^'\p]*Connection[.]Execute)|(^[^'\p]*RecordSet[.]Open)

              This one matches only the two uncommented lines that contain Connection.Execute, not the line that contains only a valid uncommented RecordSet.Open. So ORs are really broken for me. :(
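For comparison only, here is a minimal sketch of how a conventional backtracking engine treats the Unix-style patterns above on the same test file. It uses Python's `re` module as a stand-in, not UltraEdit's engine, and it assumes that `\p` in the UE Unix-style syntax means a newline (written `\n` here). The names `TEST_LINES` and `matching_lines` are made up for the illustration.

```python
import re

# The test file from the post above, one list entry per line.
TEST_LINES = [
    "This is an unmatched line",
    "",
    "The next line contains a matched RS",
    "something something RecordSet.Open something something",
    "",
    "The next line contains a matched CE",
    "something something Connection.Execute something something",
    "",
    "The next line contains an RS AND a CE",
    "something something RecordSet.Open something Connection.Execute something",
    "",
    "The next line contains a commented RS",
    "\t'something something RecordSet.Open something something",
    "",
    "The next line contains a commented CE",
    "'something something Connection.Execute something something",
    "",
    "The next line contains a commented RS AND a CE",
    "'something something RecordSet.Open something Connection.Execute something",
]


def matching_lines(pattern):
    """Return the 1-based numbers of the lines the pattern matches from line start."""
    rx = re.compile(pattern)
    return [n for n, line in enumerate(TEST_LINES, start=1) if rx.match(line)]


# Assumption: UE's Unix-style \p is translated to \n.
A_OR_B = r"^[^'\n]*(Connection[.]Execute|RecordSet[.]Open)"
B_OR_A = r"^[^'\n]*(RecordSet[.]Open|Connection[.]Execute)"
SPLIT = r"(^[^'\n]*Connection[.]Execute)|(^[^'\n]*RecordSet[.]Open)"

print(matching_lines(A_OR_B))  # [4, 7, 10]
print(matching_lines(B_OR_A))  # [4, 7, 10]
print(matching_lines(SPLIT))   # [4, 7, 10]
```

In this engine the order of the alternatives makes no difference, and the indented commented line (line 13) is rejected because `[^'\n]*` cannot step over the quote. That supports the suspicion that the pattern itself is fine and the difference lies in how UE 11 evaluates it.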

                Apr 28, 2006#7

                The problem with your UltraEdit-style regex is that you are not thinking in the same way as the search string parser does. You are thinking of it like this:

                Search in the file for a line containing the string "RecordSet.Open" OR "Connection.Execute". If found, check if the line does not start with a '. If this condition is also TRUE - string found.

                How I think the search string parser actually works:

                Find the start of a line. Check that the first character is not a ' - OK. The parser is now at column 2.
                * means match any number of occurrences of any character except newline.
                Well, start matching at the current position (column 2) and stop at ... and that's the problem.

                Where should the * matching stop? Normally when the following string is found or at the line termination. I think the * matching algorithm is designed to stop when the next simple string is found. The string following the * can also be just a single character (a 1-byte string).

                But in your regular expression there is no simple string after the *. There is another regular expression, which can match different strings. The search engine resolves this conflict by simply using only the first string of the OR expression as the stop condition for the * match. And that's the reason why it fails.

                The same problem occurs with something like "A*[~BC]". This does not work either. The * match algorithm needs a simple string or character as its stop condition.
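The stop-condition guess described above can be written down as a toy model. This is only a sketch of the hypothesis, not UltraEdit's real code: it reads `%[~']*^{A^}^{B^}` as "one character that is not a quote, then `*` running up to the first occurrence of the first alternative only". The names `TEST_LINES` and `toy_ue_lines` are invented for the illustration.

```python
# The test file from post #6, one list entry per line.
TEST_LINES = [
    "This is an unmatched line",
    "",
    "The next line contains a matched RS",
    "something something RecordSet.Open something something",
    "",
    "The next line contains a matched CE",
    "something something Connection.Execute something something",
    "",
    "The next line contains an RS AND a CE",
    "something something RecordSet.Open something Connection.Execute something",
    "",
    "The next line contains a commented RS",
    "\t'something something RecordSet.Open something something",
    "",
    "The next line contains a commented CE",
    "'something something Connection.Execute something something",
    "",
    "The next line contains a commented RS AND a CE",
    "'something something RecordSet.Open something Connection.Execute something",
]


def toy_ue_lines(alternatives):
    """Toy model (an assumption, not UE's implementation) of %[~']*^{A^}^{B^}."""
    hits = []
    for number, line in enumerate(TEST_LINES, start=1):
        # [~'] has to consume exactly one character that is not a quote.
        if not line or line[0] == "'":
            continue
        # '*' then runs from column 2 and stops only at the FIRST alternative;
        # the remaining alternatives are never used as a stop string.
        if line.find(alternatives[0], 1) != -1:
            hits.append(number)
    return hits


print(toy_ue_lines(["RecordSet.Open", "Connection.Execute"]))  # [4, 10, 13]
print(toy_ue_lines(["Connection.Execute", "RecordSet.Open"]))  # [7, 10]
```

With RecordSet.Open first, the model reproduces what was reported for the test file: every uncommented RecordSet.Open line, the Connection.Execute only where it shares a line with RecordSet.Open, plus the tab-indented commented line 13, because the tab satisfies `[~']` and `*` then swallows the quote.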

                How can this * stop condition problem be solved? Well, I have already posted a regular expression which is better:

                %[~'][~RC^r^n]+^{RecordSet.Open^}^{Connection.Execute^} with the Match Case option also enabled.

                Running this search on your example file finds matches at lines 4, 7, 10 and 13. If the search string should also ignore preceding whitespace, the following UltraEdit-style regex will do it with the Match Case option enabled:

                %[~' ^t^r^n]+[~RC^r^n]+^{RecordSet.Open^}^{Connection.Execute^}

                But what happens if, instead of "something something", line 4 has a string before the match that contains an upper-case R or C? Right, then this line will not be found either.

                I don't have an idea how to write a regex search string which returns a 100% correct result even for this situation. Wait ... I have an idea ... yes, this works in UltraEdit style:

                %[~' ^t^r^n]+^{*RecordSet.Open^}^{*Connection.Execute^}

                Putting the * match inside every OR expression is the trick that makes this search work.
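As a rough cross-check, the last two UltraEdit-style expressions above can be translated into Python's `re` syntax. The translation is an assumption (`[~...]` becomes `[^...]`, `^t^r^n` becomes `\t\r\n`, `*` becomes `.*`, and `^{A^}^{B^}` becomes `(?:A|B)`), and Python's engine is not UltraEdit's, so this only shows what the patterns mean, not how UE 11 evaluates them. `CLASS_VERSION`, `STAR_INSIDE`, `TEST_LINES` and `TRICKY` are names made up for the sketch.

```python
import re

# The test file from post #6, one list entry per line.
TEST_LINES = [
    "This is an unmatched line",
    "",
    "The next line contains a matched RS",
    "something something RecordSet.Open something something",
    "",
    "The next line contains a matched CE",
    "something something Connection.Execute something something",
    "",
    "The next line contains an RS AND a CE",
    "something something RecordSet.Open something Connection.Execute something",
    "",
    "The next line contains a commented RS",
    "\t'something something RecordSet.Open something something",
    "",
    "The next line contains a commented CE",
    "'something something Connection.Execute something something",
    "",
    "The next line contains a commented RS AND a CE",
    "'something something RecordSet.Open something Connection.Execute something",
]

# UE: %[~' ^t^r^n]+[~RC^r^n]+^{RecordSet.Open^}^{Connection.Execute^}
CLASS_VERSION = re.compile(
    r"^[^' \t\r\n]+[^RC\r\n]+(?:RecordSet\.Open|Connection\.Execute)"
)
# UE: %[~' ^t^r^n]+^{*RecordSet.Open^}^{*Connection.Execute^}
STAR_INSIDE = re.compile(
    r"^[^' \t\r\n]+(?:.*RecordSet\.Open|.*Connection\.Execute)"
)


def matching_lines(rx):
    """Return the 1-based numbers of the test-file lines matched from line start."""
    return [n for n, line in enumerate(TEST_LINES, start=1) if rx.match(line)]


print(matching_lines(CLASS_VERSION))  # [4, 7, 10]
print(matching_lines(STAR_INSIDE))    # [4, 7, 10]

# Hypothetical line with an upper-case C before the call, the case that
# defeats the character-class version.
TRICKY = "rs = Create() : RecordSet.Open"
print(bool(CLASS_VERSION.match(TRICKY)))  # False
print(bool(STAR_INSIDE.match(TRICKY)))    # True
```

One thing the translation makes visible: because the first class excludes spaces and tabs, a line that starts with whitespace is rejected in Python whether or not it is commented. Whether UE behaves the same way on indented code lines is not verified here.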

                By the way: my preferred file manager, Total Commander, supports Perl regex search for text in files (and also regex for the file names) and creates a list of files (not a list of the lines where the string is found). You can click on a file and press F3 to view the file content. Press F3 once again (Find next) and you can see the line where the search string was found. If this line must be edited, go back to the Total Commander window and press F4 to open the file in the editor while the viewer window is still open. This reads as much more complicated than it really is once you have done these steps several hundred times. I nearly always use Total Commander to search for something in files instead of the Find In Files command of UE, because the search in Total Commander is much more powerful, which I have also reported to IDM.

                  May 06, 2006#8

                  Thanks for that, Mofi.

                  Do you think changing the unix-style regex in UE would help as well? It's very bizarre, this problem. You've shown me why it fails... but clearly, the parser is deficient when OR clauses come into the equation.

                  I've taken to using Actual Search & Replace by http://www.divlocsoft.com. It is a fantastic tool; exactly what I need. Unfortunately you have to pay for it, but hopefully I can finish up the work I need to do before the free trial expires.

                  Is the newest version of UE better or worse for regex? It seems it uses perl-style, but there are big problems with this. That's one of the reasons I don't wanna upgrade; that, and the fact that syntax highlighting is apparently horrific in version 12, with multi-language files like .asp files.

                    May 07, 2006#9

                    I have used the UE regex style for as long as I have used UE. Regex was new to me when I started with UE, so I decided to learn the UE regex syntax instead of the Unix syntax. The UE style is still a little bit better than the Unix style because some things can only be done with the UE style.

                    The Perl style has many more capabilities than the UE or Unix style, but there are currently some problems with it. I'm sure the IDM developers will fix all of them within 3 months. If you are willing and able to wait, I suggest waiting for v12.10 of UE. If you have purchased a license for v11, the upgrade to v12.10 will still be free for you.

                    I don't know about problems with the multi-language syntax highlighting of v12. Are there differences compared with the syntax highlighting of v11?

                    However, if you don't like the multi-language syntax highlighting feature, you can disable it by deleting all *_LANG keywords in the wordfile, removing most file extensions from the HTML definition and assigning them to the appropriate language definition. See the topic "Syntax highlighting between different languages".

                      May 08, 2006#10

                      I don't have too many problems with syntax highlighting, though occasionally UE gets confused when I enter stuff outside of ASP code block (close) and (open) tags. Everything goes commented-colored, until I switch to another tab and switch back (then it's ok). Also, some problems with multiple-line strings etc. (also in Perl).

                      However, I read on these forums that v12 highlighting is messed up with multi-lang... which is why I'd rather wait until I hear that all is well.

                      I have to say; I've just been using the "templates" feature of UE and it's a lifesaver for what I'm doing!