Problem with OR expression with using Unix engine


    Apr 14, 2011 #1

    Hi,

    I have a problem with a Find/Regular Expression.

    Version of UltraEdit: v15.00.0.1043
    Regular expression engine: Unix

    In a text file I need to find specific lines where part of the line matches my regular expression.

    Here is my regular expression:

    ;2011(03|04)[0-9]*;[0-9]*;.*;.*;[0-9]*;SIG;ORL;(PDR|RRS);
    My regular expression works well for this example:
    166004;000064168098 - 000064168098;571277388;20;000064168098;20110301133913;20110302101021;MAJ DE DONNEES 2011-03-02CG;BJ4768;20110405080225;SIG;ORL;PDR;M8;RNC 266;1;RF;2;T

    But it doesn't work for this one:
    166004;000064168098 - 000064168098;571277388;20;000064168098;20110301133913;20110302101021;MAJ DE DONNEES 2011-03-02CG;BJ4768;20110405080225;SIG;ORL;RRS;M8;RNC 266;1;RF;2;T

    It seems the OR expression "(PDR|RRS)" doesn't work well for me, at least when the string "RRS" is used... Any idea?

    Thank you!


      Apr 15, 2011 #2

      You have raised an interesting problem with the Unix regular expression engine. Your search string also fails in UE v17.00, but the same search string works with the Perl compatible regular expression engine. I experimented with search strings to find the reason and found something interesting.

      With the Unix engine, the search string ;.*; is non greedy, which means it matches only one data field of your CSV file together with the surrounding semicolons, in other words as little as possible to return a true result. But using ;.*;SIG makes the expression greedy. This search string now matches everything from the first semicolon in a line up to the word SIG, in other words as much as possible to return a true result.
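
To make the difference in matched span concrete, here is a minimal sketch in Python. It is only an illustration: Python's `re` module is not UltraEdit's Unix engine, and in Python `.*` is always greedy, so this cannot reproduce the engine's inconsistent behavior. It only shows how far `;.*;SIG` reaches when `.*` is greedy, compared with the field-bounded `;[^;]*;SIG`. The variable names are chosen for this example.

```python
import re

# Second sample line from the question (the one that fails in UltraEdit).
line = ("166004;000064168098 - 000064168098;571277388;20;000064168098;"
        "20110301133913;20110302101021;MAJ DE DONNEES 2011-03-02CG;BJ4768;"
        "20110405080225;SIG;ORL;RRS;M8;RNC 266;1;RF;2;T")

# Greedy .* can run across semicolons, so the match starts at the first
# semicolon in the line and extends up to ";SIG".
greedy = re.search(r";.*;SIG", line)

# [^;]* cannot cross a semicolon, so the match covers exactly one field.
bounded = re.search(r";[^;]*;SIG", line)

print(greedy.group(0))
print(bounded.group(0))
```

The first print shows a span covering many fields, the second only the single field directly in front of SIG.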

      And that unexpected greedy behavior makes the OR expression at the end fail. Possible workarounds are: using the Perl compatible regular expression engine with your string, which finds both lines (.* is greedy there as well, but the engine backtracks, so the OR expression still matches); using the UltraEdit regular expression engine, where * is available for non greedy and ?+ for greedy search strings; or using the following search string for the Unix engine:

      ;20110[34][0-9]*;[0-9]*;[^;]*;[^;]*;[0-9]*;SIG;ORL;(PDR|RRS);

      As you can see, I have replaced .* with [^;]* so that each wildcard stays within a single field and the expression is no longer greedy. 0[34] is just an optimized form of (03|04), which would also work.
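
As a quick sanity check outside UltraEdit, the rewritten search string can be tried against both sample lines from the question. This is a minimal sketch using Python's `re` module, which is a different engine from UltraEdit's Unix engine, so it only confirms that the expression itself is sound, not how UltraEdit evaluates it. The helper names `sample` and `matched_code` are made up for this example.

```python
import re

# The workaround search string: [^;]* keeps every wildcard inside one field.
PATTERN = re.compile(
    r";20110[34][0-9]*;[0-9]*;[^;]*;[^;]*;[0-9]*;SIG;ORL;(PDR|RRS);"
)

# The two sample lines differ only in the field after ORL.
PREFIX = ("166004;000064168098 - 000064168098;571277388;20;000064168098;"
          "20110301133913;20110302101021;MAJ DE DONNEES 2011-03-02CG;BJ4768;"
          "20110405080225;SIG;ORL;")
SUFFIX = ";M8;RNC 266;1;RF;2;T"


def sample(code):
    """Build a sample line with the given value in the field after ORL."""
    return PREFIX + code + SUFFIX


def matched_code(line):
    """Return the alternative matched by (PDR|RRS), or None if no match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


for code in ("PDR", "RRS", "XYZ"):
    print(code, "->", matched_code(sample(code)))
```

The XYZ line is not from the question. It is added only to show that a line with a different code in that field is rejected.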