Different result on Count All for % in comparison to $


    Oct 12, 2011 #1

    Hi

    Text:

    1
    2
    3
    4
    5
    6
    7
    8
    9


    With UltraEdit Regular expression Engine
    Search -> Find -> Find What: % -> Count All
    The result is 9

    With UltraEdit Regular expression Engine
    Search -> Find -> Find What: $ -> Count All
    The result is 18
    Why is the result not 9?


    Thank you


      Oct 12, 2011 #2

      Neither % nor $ actually finds a string. They are just anchors for a search; in other words, they do not match (select) any characters.

      % means start the search at the beginning of a line, that is, after carriage return + line feed (CRLF) in a DOS file, after line feed (LF) in a UNIX file, or after carriage return (CR) in a MAC file. It does not match the line ending characters themselves. Therefore % should be used only once in a search string, at the beginning of the search string.

      $ means stop the search at the end of a line, that is, before CRLF in a DOS file, before LF in a UNIX file, or before CR in a MAC file. It does not match the line ending characters themselves. Therefore $ should be used only once in a search string, at the end of the search string.

      Because these two special characters only define the position where a search starts or ends and do not match any characters, a tagged expression using ^(...^) should never include % or $. So %^(*^)$ is correct, whereas ^(%*$^) is not correct; it may work sometimes, but this is not guaranteed.
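
The zero-width nature of such anchors can also be illustrated outside UltraEdit. The following is only a minimal sketch in Python, assuming that Python's `^` and `$` with `re.MULTILINE` are reasonable stand-ins for UltraEdit's % and $. UltraEdit itself is not involved, and the sample text is made up for illustration.

```python
import re

# Made-up sample text with UNIX line endings and no terminator on the last line.
text = "one\ntwo\nthree"

# Assumption: Python's ^ and $ (with re.MULTILINE) stand in for UltraEdit's % and $.
starts = [(m.start(), m.end()) for m in re.finditer(r"^", text, re.MULTILINE)]
ends = [(m.start(), m.end()) for m in re.finditer(r"$", text, re.MULTILINE)]

# Every match is empty: start and end offsets are identical,
# so the anchors mark positions but select no characters.
print(starts)  # [(0, 0), (4, 4), (8, 8)]
print(ends)    # [(3, 3), (7, 7), (13, 13)]

# The anchors stay outside the group, analogous to %^(*^)$ in UltraEdit syntax.
lines = re.findall(r"^(.*)$", text, re.MULTILINE)
print(lines)   # ['one', 'two', 'three']
```

The group captures only the line content, which is why nothing is gained by putting the anchors inside it.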

      Because % and $ do not match anything, using one of these two expressions without any other character in the search string is really a misuse. It is possible to search for just % and use as the replace string any string that should be inserted at the beginning of every line (= after every line ending). It works, but it is nevertheless not correct usage. Searching for just $ and using as the replace string any string that should be inserted at the end of a line is also a misuse. This works too, and I have suggested it a few times myself, but it really is a misuse of the end-of-line anchor expression.
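
For comparison, the same "insert at an anchor" trick looks like this in Python. This is a rough sketch under the assumption that Python's `^` and `$` with `re.MULTILINE` behave comparably to % and $; it says nothing about how UltraEdit implements the replace. The sample text and the inserted strings are hypothetical.

```python
import re

# Hypothetical sample text (UNIX line endings, last line not terminated).
text = "alpha\nbeta\ngamma"

# Replacing the empty match at each line start inserts a prefix.
prefixed = re.sub(r"^", "> ", text, flags=re.MULTILINE)

# Replacing the empty match at each line end inserts a suffix.
suffixed = re.sub(r"$", ";", text, flags=re.MULTILINE)

print(prefixed)
print(suffixed)
```

Nothing is actually replaced in either call; the replacement string is simply placed at each anchor position.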

      Both of your searches are also a misuse of these two anchor expressions. Searching for just % does not match anything. But when you manually execute a find for just % several times (Ctrl+F, then F3), you will see that the caret jumps down through the file from the beginning of the current line to the beginning of the next line.

      If you manually execute a find for just $, you will see what happens in a DOS file and why the result is doubled. DOS files use the CRLF pair as line termination, and $ means before a line ending. The first find moves the caret to the end of line 1 because this is the position before the line ending character CR. If you now execute a second find for just $ with F3, you will see that the caret moves one character to the right. It is now positioned between CR and LF. The LF character is also a line ending character, and therefore positioning the caret before LF is correct.

      So while % starts a search after ANY line ending, and therefore treats CRLF like a single LF or a single CR, $ ends a search before ANY line ending character, which results in the CR and the LF of a CRLF pair being interpreted separately.
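
The two counts from the question can be reproduced with a small model of the behavior described above. This is only a sketch in Python, not UltraEdit's actual implementation: it assumes that the $ count equals the number of positions directly before a single CR or LF character, and that the % count equals the number of complete line endings. The function names are hypothetical.

```python
import re

def count_before_each_eol_char(text: str) -> int:
    """Model of '$': one zero-width position before every single CR or LF."""
    return len(re.findall(r"(?=[\r\n])", text))

def count_line_endings(text: str) -> int:
    """Model of '%': a CRLF pair counts as one line ending, like a lone LF or CR."""
    return len(re.findall(r"\r\n|\r|\n", text))

# The nine lines from the question, assumed to be terminated with CRLF (DOS).
dos = "".join(f"{n}\r\n" for n in range(1, 10))
# The same content with LF-only line endings (UNIX).
unix = dos.replace("\r\n", "\n")

print(count_before_each_eol_char(dos), count_line_endings(dos))    # 18 9
print(count_before_each_eol_char(unix), count_line_endings(unix))  # 9 9
```

Under this model the doubling appears only for CRLF files; with LF-only line endings both counts agree.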

      It is hair-splitting to discuss what the correct behavior should be for a search string containing only % or only $ in files with the three types of line terminators, because % and $ should never be used without any other character in a regular expression search string.

      To get the number of lines, an UltraEdit regular expression search or a non-regular-expression search for ^p would be correct for a DOS file, because it counts how many CRLF pairs the file has and it really matches the line termination character pair. Of course, the result is only correct if the last line of the file also has a line termination; otherwise there is just a string at the end of the file that is not really a line.
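
The caveat about the last line can be seen with a small Python sketch that counts CRLF pairs directly, which is what a search for ^p is described as doing on a DOS file. The helper name and the sample strings are hypothetical.

```python
def count_crlf(text: str) -> int:
    """Count CRLF pairs, i.e. the number of terminated lines in a DOS file."""
    return text.count("\r\n")

# Three lines, all terminated: the count equals the number of lines.
terminated = "1\r\n2\r\n3\r\n"

# Last line not terminated: the trailing "3" is not counted.
unterminated = "1\r\n2\r\n3"

print(count_crlf(terminated))    # 3
print(count_crlf(unterminated))  # 2
```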


        Oct 12, 2011 #3

        Thank you very much.