Keyword sorting macro

    Mar 18, 2006 #1

    Hello all.

    I have a large text file that contains over four thousand lines. I badly need a macro that will:

    1. Select lines that contain 'word1' or 'word2' (full word matching only) and copy the lines to a new tab.
    2. Delete the selected lines from the main file, leaving no blank lines.
    3. Save the new tab's file in the folder of the main file under the name 'word1'.
    4. Repeat the above process for another pair of words ....

    If a full solution is not possible, please give some tips on at least the first two parts. I have been using UltraEdit for a long time, but I am quite new to this forum. Any help will be appreciated.

    Thanks and regards.

      Mar 18, 2006 #2

      Here is a macro which should do the job. Enable the macro property Continue if a Find with Replace not found for this macro. The macro works only with UltraEdit style regular expressions, because ^c can be used in a regular expression only with UltraEdit style. Try to understand how the macro works; I think it is not too difficult.

      InsertMode
      ColumnModeOff
      HexOff
      UnixReOff
      Top
      Clipboard 9
      ClearClipboard
      "^{"
      GetString "Enter first word:"
      "^}^{"
      GetString "Enter second word:"
      "^}"
      SelectToTop
      Clipboard 8
      Cut
      Loop
      Find RegExp MatchWord "^c"
      IfFound
      Clipboard 9
      SelectLine
      CutAppend
      Clipboard 8
      Else
      ExitLoop
      EndIf
      EndLoop
      NewFile
      Clipboard 9
      Paste
      ClearClipboard
      Top
      IfEof
      "No line found !!!"
      Clipboard 8
      ClearClipboard
      Else
      Clipboard 8
      Paste
      Key HOME
      Key DEL
      Key DEL
      Find "^}^{"
      Delete
      Find Select "^}"
      Delete
      EndSelect
      ".txt"
      SelectToTop
      Cut
      SaveAs "^c"
      ClearClipboard
      EndIf
      Clipboard 0

      Add UnixReOn or PerlReOn (UE v12 and later) at the end of the macro if you do not use UltraEdit style regular expressions by default (see the search configuration). The macro command UnixReOff sets the regular expression option to UltraEdit style.
      Best regards from an UC/UE/UES for Windows user from Austria
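
For readers who want to cross-check the macro's result outside UltraEdit, the same job can be sketched in Python. This is only an illustration, not part of the macro: the function name `split_by_keywords` is hypothetical, and using `\b` to approximate Match Whole Word Only is an assumption made here.

```python
import re
from pathlib import Path
from typing import Optional


def split_by_keywords(source: Path, word1: str, word2: str) -> Optional[Path]:
    """Move lines containing word1 or word2 (as whole words) to word1.txt.

    Hypothetical helper: it mirrors what the macro achieves, not how it works.
    Returns the new file's path, or None if no line matched.
    """
    # \b...\b approximates whole-word matching; re.escape keeps the
    # entered words literal.
    pattern = re.compile(rf"\b(?:{re.escape(word1)}|{re.escape(word2)})\b")

    lines = source.read_text().splitlines(keepends=True)
    matched = [ln for ln in lines if pattern.search(ln)]
    if not matched:
        return None  # leave the source file untouched

    kept = [ln for ln in lines if not pattern.search(ln)]

    # The new file goes into the folder of the source file, named after word1.
    target = source.with_name(word1 + ".txt")
    target.write_text("".join(matched))
    source.write_text("".join(kept))
    return target
```

Calling `split_by_keywords(Path("main.txt"), "book", "page")` once per word pair corresponds to one run of the macro. Note that `\b` only behaves as expected for words that start and end with a letter, digit or underscore.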

        Mar 18, 2006 #3

        Hey Mofi, thank you so much! It's wonderful to get such a quick and detailed answer.

        The macro works fine. The only problem I noticed is that lines containing 'word2' were not transferred to the new file; they remained in the main file.

        Could you please check? I'm using version 10.10b.

        Thanks and regards.

          Mar 19, 2006 #4

          The macro works with UltraEdit v11.20a. I don't have access to an earlier version on the computer I'm writing this on. Tomorrow I will test the macro on a computer where UltraEdit v10.10c is also installed.

          I know that there are some problems with the OR expression. For example, %*^{word1^}^{word2^}*^p, which should find word1 OR word2 and automatically select the whole line, does not work for lines with word2 in v11.20a.

          Until I have tested the macro with v10.10c, please test the OR expression used by the macro yourself in your version of UltraEdit.

          Open the Find dialog, enter ^{word1^}^{word2^} in the find field, enable List Lines Containing String, Match Whole Word Only and Regular Expressions, and press the Find Next button.

          Then check, in the dialog listing all lines that contain either word1 OR word2, whether there is a line with word2 that does not contain word1. If there is no such line, v10.10b of UltraEdit has a really bad bug with the OR expression. Maybe disabling Match Whole Word Only helps.

          2006-03-20: I have tested the macro with UltraEdit v10.10c and it works perfectly there. With this method you can search only for words, not for phrases containing a space, a '-', etc.

            Mar 28, 2006 #5

            Thank you again.

            Now I have found that your macro is functional in both versions. If I try 'book' and 'page' as word1 and word2 respectively, the macro delivers what is required.

            But if I try 'page' and 'pages' (in either order), only lines containing 'page' are moved to the other file. I've tried it with the latest version too.

            There must be a quick fix for this, but I am unable to find it. Could you please check?

              Mar 29, 2006 #6

              Indeed, this is a bug in UltraEdit. The problem is the option MatchWord. To verify the bug easily, do the following:

              Test file:

              line 1 contains word pages in the middle of the line
              line 2 contains word page in the middle of the line
              line 3 does not contain any of the 2 words
              line 4 contains word 1 at the end of the line - pages
              line 5 contains word 2 at the end of the line - page
              line 6 contains both words but in pagespage the space is missing
              pages line 7 contains word 1 at the start of the line
              page line 8 contains word 2 at the start of the line

              With this test file, run a Find with a regular expression in UltraEdit style, using the search string ^{pages ^}^{page ^} with the option List Lines Containing String enabled.

              First, run the Find with the option Match Whole Word Only disabled. UltraEdit finds the 2 words correctly in all lines except line 3. (But if we run this Find without List Lines Containing String, we can see that it always finds "page" and never "pages"!)

              Second, run the Find with the option Match Whole Word Only enabled. Now it finds only lines 2, 5 and 8, which are the lines containing only the word "page". This is the bug.
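
As a reference point for what a correct whole-word OR search should return on this test file, here is a small Python sketch. It is a comparison only, under the assumption that Python's `\b` word boundary is a fair stand-in for Match Whole Word Only; the helper name `matching_line_numbers` is made up for this example, and the patterns are written without the trailing spaces.

```python
import re
from typing import List

# The eight lines of the test file from this post.
TEST_LINES = [
    "line 1 contains word pages in the middle of the line",
    "line 2 contains word page in the middle of the line",
    "line 3 does not contain any of the 2 words",
    "line 4 contains word 1 at the end of the line - pages",
    "line 5 contains word 2 at the end of the line - page",
    "line 6 contains both words but in pagespage the space is missing",
    "pages line 7 contains word 1 at the start of the line",
    "page line 8 contains word 2 at the start of the line",
]


def matching_line_numbers(pattern: str) -> List[int]:
    """Return the 1-based numbers of the test lines the pattern matches."""
    regex = re.compile(pattern)
    return [n for n, line in enumerate(TEST_LINES, start=1) if regex.search(line)]


# Plain OR search: every line except line 3 matches.
substring_hits = matching_line_numbers(r"pages|page")

# Whole-word OR search: line 6 drops out because "pagespage" is one word.
whole_word_hits = matching_line_numbers(r"\b(?:pages|page)\b")

print(substring_hits)   # [1, 2, 4, 5, 6, 7, 8]
print(whole_word_hits)  # [1, 2, 4, 5, 7, 8]
```

A correct whole-word search should therefore also list lines 1, 4 and 7, which are exactly the lines UltraEdit misses.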

              Please report this bug by email to IDM support. In your email just insert the link to this forum topic. This should be enough for the IDM developers to reproduce the bug.


              Back to the macro:

              In the macro code, remove the option MatchWord from the line Find RegExp MatchWord "^c".

              If word 1 and word 2 always exist as whole words in your file, the problem is solved.
              If word 1 or word 2 can also be part of another word, you can for example enter " pages " and " page " (without the quotes but with the single spaces). But this fails when word 1 or word 2 is at the start or end of a line.
              For this worst case, additional regular expression replaces are needed in the macro code: first add a space at the beginning and end of every line in the source file, then at the end of the macro remove these spaces again from the source file and from all files the macro produced. I will post the modified macro with the full workaround (spaces instead of MatchWord) if you need it.
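
The start-of-line and end-of-line problem with space-delimited words can be seen in a few lines of Python. This is a minimal sketch of the idea only; the two function names are invented for the illustration.

```python
def contains_delimited(line: str, word: str) -> bool:
    """Plain substring test for ' word ' (one space on each side)."""
    return f" {word} " in line


def contains_delimited_padded(line: str, word: str) -> bool:
    """Same test after adding one space at each end of the line."""
    return f" {word} " in f" {line} "


line = "page line 8 contains word 2 at the start of the line"
print(contains_delimited(line, "page"))         # False: no space before "page"
print(contains_delimited_padded(line, "page"))  # True
```

Either version still misses a word that is directly followed by punctuation, such as "page,". That limitation applies to the space-based workaround in general.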

                Mar 29, 2006 #7

                A modified macro with the workaround will be highly appreciated. It will save me hours of repeated keystrokes.

                Regards.

                  Mar 29, 2006 #8

                  Well, here is the macro with the workaround code. It first makes sure that the last line of the file is terminated with a line break. Then it inserts a space at the beginning and end of every DOS terminated line and converts all tabs to a special string with surrounding spaces.

                  Next you can enter the 2 words. The macro automatically adds the surrounding spaces to each of the 2 words, so enter only the words.

                  After all lines are collected, it undoes the 3 special replaces in the source file, pastes the collected lines into the new file and undoes the 3 special replaces there as well.

                  Attention: First copy the following macro code into an edit window in UltraEdit and run Format - Trim Trailing Spaces. Then select all again and copy it into the Edit Macro dialog.

                  InsertMode
                  ColumnModeOff
                  HexOff
                  UnixReOff
                  Bottom
                  IfColNum 1
                  Else
                  "
                  "
                  EndIf
                  Top
                  Find RegExp "%^([~^p]^)"
                  Replace All " ^1"
                  Find "^p"
                  Replace All " ^p"
                  Find "^t"
                  Replace All " TABCHAR0x09 "
                  Clipboard 9
                  ClearClipboard
                  "^{ "
                  GetString "Enter first word:"
                  " ^}^{ "
                  GetString "Enter second word:"
                  " ^}"
                  SelectToTop
                  Clipboard 8
                  Cut
                  Loop
                  Find RegExp "^c"
                  IfFound
                  Clipboard 9
                  SelectLine
                  CutAppend
                  Clipboard 8
                  Else
                  ExitLoop
                  EndIf
                  EndLoop
                  Top
                  Find MatchCase " TABCHAR0x09 "
                  Replace All "^t"
                  Find RegExp "% "
                  Replace All ""
                  Find " ^p"
                  Replace All "^p"
                  NewFile
                  Clipboard 9
                  Paste
                  ClearClipboard
                  Top
                  Find MatchCase " TABCHAR0x09 "
                  Replace All "^t"
                  Find RegExp "% "
                  Replace All ""
                  Find " ^p"
                  Replace All "^p"
                  IfEof
                  "No line found !!!"
                  Clipboard 8
                  ClearClipboard
                  Else
                  Clipboard 8
                  Paste
                  Key HOME
                  Key DEL
                  Key DEL
                  Key DEL
                  Find "^}^{"
                  Delete
                  Find Select "^}"
                  Delete
                  EndSelect
                  Key BACKSPACE
                  ".txt"
                  SelectToTop
                  Cut
                  SaveAs "^c"
                  ClearClipboard
                  EndIf
                  Clipboard 0
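
The pad, search and undo sequence of the macro above can also be written out in Python to make the individual steps easier to follow. This is a sketch under assumptions, not a translation of the macro: the names `pad`, `unpad` and `split_lines` are invented here, and it works on a list of lines without line terminators instead of on the open files.

```python
from typing import List, Tuple

TAB_MARKER = " TABCHAR0x09 "  # the same placeholder string the macro uses


def pad(line: str) -> str:
    """Add a space at both ends and turn each tab into the spaced marker."""
    return (" " + line + " ").replace("\t", TAB_MARKER)


def unpad(line: str) -> str:
    """Undo pad(): restore the tabs, then drop the two added spaces."""
    return line.replace(TAB_MARKER, "\t")[1:-1]


def split_lines(lines: List[str], word1: str, word2: str) -> Tuple[List[str], List[str]]:
    """Return (collected, remaining) using space-delimited matching."""
    # The surrounding spaces are added here, so callers pass bare words.
    needles = (f" {word1} ", f" {word2} ")
    collected: List[str] = []
    remaining: List[str] = []
    for line in lines:
        padded = pad(line)
        hit = any(needle in padded for needle in needles)
        (collected if hit else remaining).append(unpad(padded))
    return collected, remaining
```

For example, `split_lines(["page one", "see\tpages", "homepage", "last page"], "page", "pages")` collects the first, second and fourth line and leaves "homepage" behind. As with the macro, a file that already contains the literal text TABCHAR0x09 surrounded by spaces would not survive the round trip unchanged.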

                    Apr 23, 2006 #9

                    Thanks Mofi. It works like magic.
                    My apologies that I could not get back to the thread sooner.

                    Regards.