Counting TABs in a line

    Jan 31, 2007#1

    I'd like to create a macro to count tabs in each line of a text file. If the line has 4 tabs (desired), the macro goes to the next line w/o marking. If the line has more or fewer than 4 tabs, the macro marks the line (say "***" at the end of the line) and moves on.

    I've only written a couple of UE macros -- I have some VBA macro experience w/ M$ eXcel and word, but I used to write a lot of macros for WP4.0, 5.1, and 6.x > for Win. Dang, I miss PerfectScript.

    Peace,
    - Sequoia

      Jan 31, 2007#2

      The following macro should do the job. It is designed to run on a DOS file (^p = CRLF). It first checks whether the last line of the file has a line termination and appends one if it does not.

      Next it converts the character » at the start of every line to a special string, because » is used later to mark the lines which have exactly 4 tabs.

      The next 2 regular expressions insert the marker character » at the start of every line which has exactly 4 tabs. The first one handles lines that end with the fourth tab, the second one lines with text after the fourth tab.

      The fourth regular expression then appends "***", as requested, to the end of every line that does not have the marker character » at the start of the line.

      Finally the marker character » at the start of every line is removed, and the special escape string is converted back to the character ». You can remove the first and the last regular expression find and replace all if your file never contains the character » at the start of a line.

      The macro property "Continue if a Find with Replace not found" must be checked for this macro.

      InsertMode
      ColumnModeOff
      HexOff
      UnixReOff
      Bottom
      IfColNum 1
      Else
      "
      "
      EndIf
      Top
      Find RegExp "%»"
      Replace All "MaRkErChAr"
      Find RegExp "%^([~^t^p]++^t[~^t^p]++^t[~^t^p]++^t[~^t^p]++^t^)$"
      Replace All "»^1"
      Find RegExp "%^([~^t^p]++^t[~^t^p]++^t[~^t^p]++^t[~^t^p]++^t[~^t^p]+^)$"
      Replace All "»^1"
      Find RegExp "%^([~»]*^)$"
      Replace All "^1***"
      Find RegExp "%»"
      Replace All ""
      Find MatchCase RegExp "%MaRkErChAr"
      Replace All "»"

      Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default - see search configuration. Macro command UnixReOff sets the regular expression option to UltraEdit style.
      Best regards from an UC/UE/UES for Windows user from Austria
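
For readers more used to Perl-style syntax, the "exactly 4 tabs" test performed by the two middle regular expressions can be sketched outside UltraEdit in Python. This is only an illustration of the idea, not a replacement for the macro; the names FOUR_TABS, mark_bad_lines and the sample lines are made up for this example.

```python
import re

# Exactly four tabs: four groups of "run of non-tab characters, then a tab",
# followed by a final run of non-tab characters (which may be empty).
FOUR_TABS = re.compile(r"(?:[^\t]*\t){4}[^\t]*")


def mark_bad_lines(lines, marker="***"):
    """Append marker to every line that does not contain exactly four tabs.

    lines is assumed to be a list of strings without line terminators.
    """
    return [line if FOUR_TABS.fullmatch(line) else line + marker
            for line in lines]


# Hypothetical sample: 4 tabs with text between, 4 tabs in sequence,
# 1 tab, and 5 tabs.
sample = ["a\tb\tc\td\te", "\t\t\t\tx", "a\tb", "a\t\t\t\t\tb"]
marked = mark_bad_lines(sample)
```

Only the last two sample lines get the marker. Unlike the macro, this needs no temporary » character, because the decision for each line is made in a single step.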

        Jan 31, 2007#3

        Thanks for the code, Mofi, but I should have explained better. I want to evaluate each line of text in the file to make sure that it has a TOTAL of 4 tabs, which may or may not be sequential (i.e., the tabs may be in different positions within the line and may be separated by other text).

        The macro you wrote seems to be designed to find 4 tabs in sequence. Also, it marks some lines that do have the 4 tabs in sequence, doesn't mark others, and leaves an unknown character (looks like a box) at the end of some lines. I've attached a sample file before and after running the macro. I apologize for all the X's, but I had to replace potentially company-confidential text with nonsense quickly. (Note: I tried to attach a file, but it did not work. E-mail me through the forum if you'd like me to send the sample file directly.)

        Does the UE macro facility have variables? I'd like to be able to do something like the following (I know some of the commands below are not UE commands, and furthermore I'm mixing syntax from various programming languages -- it is just for illustration purposes):

        Set Variable TabCount = 0

        While Not at end of file

          Sub: CountTabs
          While NextCharacter is not ^p
             If NextCharacter is ^t
               Set Variable TabCount = (TabCount + 1)
             End If
             Key RIGHT ARROW
          EndWhile
          If TabCount <> 4
               ***
          EndIf
          Key DOWN ARROW
          Set Variable TabCount = 0
          EndSub

         GoSub CountTabs

        EndWhile

        I know that's sloppy, but I hope it gets the point across.

        Thank you so much for your help.

        Peace,
        - Sequoia

          Feb 01, 2007#4

          treedude2525 wrote: I want to evaluate each line of text in the file to make sure that it has a TOTAL of 4 tabs, which may or may not be sequential (i.e., the tabs may be in different positions within the line and may be separated by other text). The macro you wrote seems to be designed to find 4 tabs in sequence.

          No! I wrote the macro to do exactly what you want. I quickly created a test file containing lines with fewer than 4 tabs, lines with more than 4 tabs, and lines with exactly 4 tabs, some in sequence and some with text between them. The macro worked perfectly with UE v11.20b and UE v12.20b+1.

          treedude2525 wrote: Also, it marks some lines that do have the 4 tabs in sequence, doesn't mark others, and leaves an unknown character (looks like a box) at the end of some lines.

          It looks like your version of UltraEdit has a problem with the regex replaces, or your file format is different. I tested it with an ASCII DOS file.

          treedude2525 wrote: Does the UE macro facility have variables?

          No, because the UltraEdit macro interface is not a real scripting language. That is why I built the macro from regex searches. Your code example could also be implemented as a UE macro, but it would be extremely complicated and extremely slow.

          Post some lines of your file enclosed in [code] and [/code] tags, and use for example # as a placeholder for every tab in your example. I can convert the # character back to a real tab with a simple search and replace all. And tell me which format the file has - see the second box in the status bar of the UltraEdit window.

            Feb 01, 2007#5

            Hi Mofi,

            First, I really appreciate your help. Thanks for the forum tip on code/code. Now I know how not to have the spaces "eaten" in my code examples.

            I'm working w/ UE ver. 10.20c -- I know it is a few generations behind the current ver., but my company isn't keen on spending money on software upgrades for "odd-ball" programs (I think I'm the only person in the office who uses UE at all, and one of the few outside of IT who even knows what a text editor is). The file format is DOS. When I reopened my example file, it converted it back to DOS and stripped the "box" characters.

            Here is the example w/ hash marks substituted for tabs:

            // before running macro
            
            XX XXXXXXX (XXX / #XX) XXXXXXXXX XX XXXXXXXX XX XX XX/XX/XXXX####
            XXXXXXXXX#XXXXXXXX#XXXXXXX#XXXXXX#XXXXXX XXXXXXXXXXX
            XXX:####XXXXXXXXXX
            XXXX XXXX:####XXXXXXXXXX
            XXX-XXXXX:####XXXXXXXXXX
            XXXX XXXXX:####XXXXXXXXXX
            XXXX XXXXX:####XXXXXXXXXX
            XXXX XXXXXX XXXXXXXX:####XXXXXXXXXX
            XXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXX XXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXX XXXXXXXXXXX XXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXX XXXXXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX X XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXX XXXXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXXXX XXXXXXXX/XXXXXXXX:####XXXXXXXXXX
            XXXXXXXXXXXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXXXX XXXX ##XXXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX
            XXX#XXXXXXXXX XXXXXXXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXX XXXXXXXXXX XXXXX:####XXXXXXXXXX
            XXXXX XXXXXXX XXXXXXXX:####XXXXXXXXXX
            XXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXXXXXX - XX XXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXX/XXXXX XXXXXXXX:####XXXXXXXXXX
            XXXXXX XXXXXXXX:####XXXXXXXXXX
            XXXXX/XXXXXX XXXXX:###XX/XX/XX#XXXXXXXXXX
            XXXX / XXXXX XXXXXXXXXXXX XXXXXXXX:####XXXXXXXXXX
            XXXXXXXX XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXXX XXXXXXXXXXXX XXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
            XXXXXXXXX XXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX
            XXXXXXXXX XXXXX:##"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
            XXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
            XXXXXXXX XXXXXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
            XXXXXXXXXXXXX XXXXX:####XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX
            XXXXXXXXXXXXX XXXX##XXXX:##XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX
            XXXXX XXXX / XXXXXXX XXXXXXXX:#XX/XX/XX###XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXXX XXX:#####XXXXXXXXXX
            XXXXXXX XXXXX XXX:####XXXXXXXXXX
            XXXXXXX XXXXX XXX:##XXXXXXXXXX
            
            // after running macro
            
            XX XXXXXXX (XXX / #XX) XXXXXXXXX XX XXXXXXXX XX XX XX/XX/XXXX####
            XXXXXXXXX#XXXXXXXX#XXXXXXX#XXXXXX#XXXXXX XXXXXXXXXXX
            XXX:####XXXXXXXXXX***
            XXXX XXXX:####XXXXXXXXXX
            XXX-XXXXX:####XXXXXXXXXX***
            XXXX XXXXX:####XXXXXXXXXX
            XXXX XXXXX:####XXXXXXXXXX***
            XXXX XXXXXX XXXXXXXX:####XXXXXXXXXX
            XXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
            XXX XXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXX XXXXXXXXXXX XXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
            XXXXX XXXXXXXX:###XX/XX/XX#XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
            XXXXXXX XXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX X XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
            XXXXXX XXXXXXXXXXX XXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXXXX XXXXXXXX/XXXXXXXX:####XXXXXXXXXX***
            XXXXXXXXXXXX XXXXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXXXX XXXX ##XXXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX***
            XXX#XXXXXXXXX XXXXXXXXX XXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXX XXXXXXXXXX XXXXX:####XXXXXXXXXX
            XXXXX XXXXXXX XXXXXXXX:####XXXXXXXXXX***
            XXXXXXX XXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXXXXXX - XX XXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX***
            XXX/XXXXX XXXXXXXX:####XXXXXXXXXX
            XXXXXX XXXXXXXX:####XXXXXXXXXX***
            XXXXX/XXXXXX XXXXX:###XX/XX/XX#XXXXXXXXXX
            XXXX / XXXXX XXXXXXXXXXXX XXXXXXXX:####XXXXXXXXXX***
            XXXXXXXX XXXXXXXXXX:####XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXXX XXXXXXXXXXXX XXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"***
            XXXXXXXXX XXXXXXX:####XXXXXXXXX XXXXX - XXXXX XXXXX
            XXXXXXXXX XXXXX:##"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"***
            XXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"
            XXXXXXXX XXXXXXXXXXXX XXXXXXXX:####"XXXXXXXXXXXX XXXXXXX, XX - XXXXX XXXXXX"***
            XXXXXXXXXXXXX XXXXX:####XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX
            XXXXXXXXXXXXX XXXX##XXXX:##XXXXXXXX XXXX XXXXXXXXX - XXXX XXXXXX***
            XXXXX XXXX / XXXXXXX XXXXXXXX:#XX/XX/XX###XXXXXXXX XX XXXXXXXXXXXX - XXXXX XXXXXXX
            XXXXXXX XXXXX XXX:#####XXXXXXXXXX***
            XXXXXXX XXXXX XXX:####XXXXXXXXXX
            XXXXXXX XXXXX XXX:##XXXXXXXXXX***
            
            Thanks again for your help.

            - Sequoia

              Feb 02, 2007#6

              Okay, I solved the problem. I already knew that UE versions prior to v11.10c (I think, I'm not sure) have problems with $ in some replaces. I don't use such an outdated version anymore; I have only archived it for situations like this one. My first posted macro worked perfectly on your example with UE v11.20b.

              Here is a macro which should also work with your version of UE. I have tested it with UE v10.20d. It avoids the problem with $ by temporarily inserting the character « at the end of every line and using this character as the end-of-line indicator.

              The macro property "Continue if a Find with Replace not found" must be checked for this macro.

              InsertMode
              ColumnModeOff
              HexOff
              UnixReOff
              Bottom
              IfColNum 1
              Else
              "
              "
              EndIf
              Top
              Find RegExp "%»"
              Replace All "MaRkErChAr1"
              Find "«"
              Replace All "MaRkErChAr2"
              Find "^p"
              Replace All "«^p"
              Find RegExp "%^([~^t«]++^t[~^t«]++^t[~^t«]++^t[~^t«]++^t«^)"
              Replace All "»^1"
              Find RegExp "%^([~^t«]++^t[~^t«]++^t[~^t«]++^t[~^t«]++^t[~^t«]+«^)"
              Replace All "»^1"
              Find RegExp "%^([~»]*^)«"
              Replace All "^1***«"
              Find RegExp "%»"
              Replace All ""
              Find RegExp "«$"
              Replace All ""
              Find MatchCase RegExp "%MaRkErChAr1"
              Replace All "»"
              Find MatchCase "MaRkErChAr2"
              Replace All "«"

              Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default - see search configuration. Macro command UnixReOff sets the regular expression option to UltraEdit style.

                Feb 03, 2007#7

                Thanks Mofi,

                Works perfectly. Now I just need to get time to study it so I can figger out how to do similar stuff myself. I'm not really familiar with regular expressions -- looks like I'll have to check 'em out a bit more.

                Thanks again and have a great weekend!

                - Sequoia

                  Lines with more or less of a special character

                  Jul 07, 2007#8

                  Hello ...

                  I would like to find all lines of a text file which have more or fewer than 10 "|" characters.

                  I need this method to check CSV text files (separated by "|") before I import them into my database. I use UltraEdit v12.10.

                  Kind regards
                  Ralf

                    Re: Lines with more or less of a special character

                    Jul 07, 2007#9

                    Hi Ralf,

                    Does this work? It's not beautiful or optimized, mostly because I don't know whether empty fields may occur. It does assume that the delimiter | doesn't occur within strings or in escaped form. So

                    ^(?:[^|\r\n]*\|){1,9}[^|]*$|^(?:[^|\r\n]*\|){11,}[^|]*$

                    will match each line except lines 2 and 5 of the following:

                    asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|fg
                    |sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|xcc
                    |sdf|sdf|asd|sdf|sdf
                    asd|sdf|sdf|asd|sdf|sdf|asd|sdf||asd|sdf|sdf|asd|sdf|sdf|
                    ||||||||||
                    ||||||
                    |||||||||||

                    You'll run into trouble if lines may look like

                    asd|sdf|sdf|"as|d"|sdf|sdf|asd|sdf|sdf|asd|xcc

                    where the | in "as|d" is not supposed to count as a delimiter.

                    You need to turn Perl regular expressions on and check the "regular expressions" checkbox in the replace dialog. I tested this with V13.10+3, I hope it also works in V12.
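
The statement about which sample lines match can also be checked outside UltraEdit. The sketch below assumes Python's re module, which accepts this pattern syntax, and applies the unchanged pattern to each sample line separately; the names PATTERN, sample and flagged are chosen only for this example.

```python
import re

# The pattern from the post, unchanged: 1 to 9 delimiters, or 11 and more.
PATTERN = re.compile(
    r"^(?:[^|\r\n]*\|){1,9}[^|]*$|^(?:[^|\r\n]*\|){11,}[^|]*$"
)

# The seven sample lines from the post, without line terminators.
sample = """\
asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|fg
|sdf|sdf|asd|sdf|sdf|asd|sdf|sdf|asd|xcc
|sdf|sdf|asd|sdf|sdf
asd|sdf|sdf|asd|sdf|sdf|asd|sdf||asd|sdf|sdf|asd|sdf|sdf|
||||||||||
||||||
|||||||||||
""".splitlines()

# 1-based numbers of the lines the pattern flags as wrong.
flagged = [number for number, line in enumerate(sample, start=1)
           if PATTERN.search(line)]
print(flagged)  # [1, 3, 4, 6, 7]
```

Lines 2 and 5 contain exactly 10 delimiters and are therefore not flagged. Note that a line with no | at all is not flagged either, because the first alternative requires at least one delimiter.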

                    HTH,
                    Tim

                      Re: Lines with more or less of a special character

                      Jul 07, 2007#10

                      Pietzcker demonstrates here once again the power of the Perl engine.

                      The solution I used was to mark all lines which have the correct number of delimiters, then mark all the remaining, not yet marked lines with a different marker string, and finally delete the first mark from the lines with the correct number of delimiters. What remains are the lines with an incorrect number of delimiters, which are marked with *** at the start of the line.

                      Maybe it would be a good idea to write a macro or script which asks the user for the delimiter character and the correct number of delimiters per line and then runs the search to mark all the lines with the wrong number of delimiters. Such a macro or script could also handle the exception that a delimiter character inside "..." should be ignored.
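
A minimal sketch of such a checker, written in Python rather than as a UE macro or script, could look like the following. It takes the delimiter and the expected count as parameters instead of asking the user, and it assumes that quotes are never nested or escaped; count_delimiters, find_bad_lines and the sample data are hypothetical names used only for illustration.

```python
def count_delimiters(line, delimiter="|", quote='"'):
    """Count delimiter characters in line, skipping those inside quotes."""
    count = 0
    in_quotes = False
    for ch in line:
        if ch == quote:
            in_quotes = not in_quotes  # toggle on every quote character
        elif ch == delimiter and not in_quotes:
            count += 1
    return count


def find_bad_lines(lines, expected, delimiter="|", quote='"'):
    """Return the 1-based numbers of lines with a wrong delimiter count."""
    return [number for number, line in enumerate(lines, start=1)
            if count_delimiters(line, delimiter, quote) != expected]


sample = [
    'asd|sdf|sdf|"as|d"|sdf|sdf|asd|sdf|sdf|asd|xcc',  # 10 outside the quotes
    "|sdf|sdf|asd|sdf|sdf",                            # only 5
    "||||||||||",                                      # exactly 10
]
bad = find_bad_lines(sample, expected=10)
print(bad)  # [2]
```

Counting character by character instead of using one regular expression keeps the quote handling simple: the | inside "as|d" is skipped, so the first sample line is accepted.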

                        Re: Lines with more or less of a special character

                        Aug 06, 2007#11

                        Mofi wrote: Maybe it would be a good idea to write a macro or script which asks the user for the delimiter character and the correct number of delimiters per line and then run the search to mark all the lines with wrong number of delimiters. Such a macro or script could also handle the exception that a delimiter character inside "..." should be ignored.

                        Yes, this would be extremely useful.

                        We deal with a lot of databases and regularly receive txt files of data with delimited fields.

                        Being able to verify that these files are free from structural errors would be fantastic and would save a lot of time.

                        I have been trying to modify your "counting tabs in a line" macro, but have made no real progress to date.