Word Frequency Counter

    Mar 29, 2006 #1

    Hello,

    Is there any macro that can count the frequency of each word in a text file?

    Regards.

      Apr 01, 2006 #2

      No! If you want to know how often a certain word occurs in the current file, use the Count All button in the Find dialog. Although a macro could be written to list all words and how often each of them is used in the current file, I don't see a practical use for such a macro.
      Best regards from an UC/UE/UES for Windows user from Austria

        Apr 23, 2006 #3

        A macro would be great. Possible uses are:

        1. Analyzing the writing of particular authors
        2. Keyword research for SEO (what I am looking for)

        Regards.

          May 25, 2006 #4

          Here is a macro set that creates a statistic of which terms (words) exist in a file and how often each one occurs. If you want the statistic for many files, first copy all of them together into a single file. The Windows console command to concatenate all files in a directory is: copy *.* BigFileWithAllContents.txt
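If Python is available, the concatenation step can also be done outside the console. The following is a minimal sketch, not part of the macro set; the function name concatenate_files and the UTF-8 encoding are assumptions. Unlike a plain copy, it adds a line break after any file that does not end with one, so the last word of one file cannot merge with the first word of the next.

```python
from pathlib import Path


def concatenate_files(directory: str, output_name: str) -> Path:
    """Hypothetical helper: join all files of a directory into one text file."""
    folder = Path(directory)
    output = folder / output_name
    parts = []
    for path in sorted(folder.iterdir()):
        # Skip subdirectories and the output file itself (e.g. on a rerun).
        if not path.is_file() or path == output:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Terminate the last line so words of adjacent files stay separate.
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    output.write_text("".join(parts), encoding="utf-8")
    return output
```

Usage: concatenate_files(r"C:\Texts", "BigFileWithAllContents.txt"), then open the result in UltraEdit and run CountTerms on it.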

          The main problem was how to count how often a term occurs in the file. Before I started, I had 2 ideas:

          1) Use FindInFiles with the results output to an edit window. The last line of the results contains the total number of occurrences of the searched string.

          2) Count the occurrences "manually" with suitable macro code and the CountUp macro, which I had already written and posted in the topic How to insert an incrementing number in a file using a counter in a macro?

          Solution 1 is probably faster, but I know there are many problems with FindInFiles when the results go to an edit window, such as the focus problem and the Unicode problem: since v12 of UE and v5.50 of UES the results are written to a Unicode file instead of an ASCII file as in previous versions. But I wanted this macro set to work with previous versions too. Solution 1 would also need many window switches, which slow down macro execution. So I decided on solution 2.

          Because loops cannot be nested in the UE/UES macro language, 3 macros are needed to do the job.
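Python allows nested loops, so the split is not needed there, but a hypothetical Python model with one function per macro may make the division of labour easier to follow: the outer loop lives in the main macro, the inner loop in a submacro, and the increment in a third macro. This sketch assumes the terms are already extracted and case-normalized, uses plain string equality where the macros rely on the SOP>>>/<<<EOP markers, and simplifies count_up to integer arithmetic (the real CountUp edits the digits in the file).

```python
def count_up(number: str) -> str:
    """Stand-in for submacro CountUp: return the count text plus one.

    Simplified to integer arithmetic; the macro edits the digits in place.
    """
    return str(int(number) + 1)


def count_duplicates(terms: list[str], counts: list[str], i: int) -> None:
    """Inner loop, the job of submacro CountDuplicates.

    After sorting, duplicates of terms[i] sit directly below it. Each one is
    deleted and count_up raises the count stored for line i.
    """
    while i + 1 < len(terms) and terms[i + 1] == terms[i]:
        del terms[i + 1]
        counts[i] = count_up(counts[i])


def count_terms(words: list[str]) -> list[tuple[str, int]]:
    """Outer loop, the job of main macro CountTerms."""
    terms = sorted(words)           # SortAsc without removing duplicates
    counts: list[str] = []
    i = 0
    while i < len(terms):           # loop until end of file
        counts.append("1")          # start every new term with count 1
        count_duplicates(terms, counts, i)
        i += 1                      # move to the next (different) term
    return [(term, int(n)) for term, n in zip(terms, counts)]
```

For example, count_terms(["b", "a", "c", "a", "c", "c"]) returns [("a", 2), ("b", 1), ("c", 3)].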

          Enable the macro property "Continue if a Find with Replace not found" or "Continue if search string not found" (whichever your version offers) for all 3 macros. Disable the macro property "Show Cancel Dialog for this macro" for all 3 macros to increase speed. Macro execution can still be interrupted with the ESC key, which must be pressed until the main macro exits.

          I have inserted comments to explain the important steps of the macros. I'm sure every user will have to adapt them a little for his personal needs. You have to delete the comments (the lines starting with //, shown in green in the original post) before copying the code into the Edit Macro dialog.

          The terms/words are sorted and counted ignoring case. If you want case-sensitive counting, remove the IgnoreCase sort parameter (shown in red in the original post) in the main macro CountTerms and add the MatchCase find parameter to the Find command in the submacro CountDuplicates.

          I think the macro set may also be of interest to users who do not need it, because it contains some new macro techniques never posted before.

          Add UnixReOn or PerlReOn (v12+ of UE) at the end of the main macro CountTerms if you do not use UltraEdit-style regular expressions by default (see the search configuration). The macro command UnixReOff sets the regular expression option to UltraEdit style.

          Attention: It is important that you create the macros in the following order: first CountUp, then CountDuplicates and last CountTerms. Otherwise the PlayMacro commands will be removed automatically by UltraEdit/UEStudio, without any warning, when closing/updating the macro.

          The main macro CountTerms:

          InsertMode
          ColumnModeOff
          HexOff
          // Select the whole content of the active file and copy it to a new file.
          // But before pasting it into the new file, make sure the new file is an
          // ASCII file with DOS line terminations. There are configuration options
          // which determine the format of a new file and the macro needs this format.

          SelectAll
          IfSel
          Copy
          EndSelect
          Top
          Else
          ExitMacro
          EndIf
          UnixReOff
          NewFile
          UnicodeToASCII
          UnixMacToDos
          Paste
          // The last line of the file must be terminated with CR/LF. The cursor
          // is now at bottom of the file. If the cursor is not at column 1, the
          // line termination CR/LF must be inserted.

          IfColNum 1
          Else
          "
          "
          EndIf
          Top
          // Back at the top of the file, delete all trailing and leading spaces
          // and tabs. Next, replace every sequence of spaces, page breaks and tabs
          // by a line break. This regular expression replace is the term/word
          // creation part of the macro. If you want to specify other delimiter
          // characters too, insert them into the square brackets of the second
          // Find RegExp command. Escape a delimiter character with ^ if it is
          // also an UltraEdit-style regex character.

          TrimTrailingSpaces
          Find RegExp "%[ ^t]+"
          Replace All ""
          Find RegExp "[ ^b^t]+"
          Replace All "^p"
          // Remove all terms/words which consist of only a single character,
          // like the words I, a, ... They are normally not of interest. Remove
          // the first Find/Replace if you want these single-character terms too.
          // The second regex replace removes punctuation marks from the end of
          // the terms/words. Then the terms are sorted without removing the
          // duplicates, because the macro has to count them.

          Find RegExp "%?^p"
          Replace All ""
          Find RegExp "[.,;:!?^-]+$"
          Replace All ""
          // If you want you can insert here some Find/Replaces to delete
          // words like "the", "and", ... which are normally not of interest.

          SortAsc IgnoreCase 1 -1 0 0 0 0 0 0
          // After the sort all blank lines at top of the file are deleted.
          Loop
          Find "^p^p"
          Replace All ""
          IfNotFound
          ExitLoop
          EndIf
          EndLoop
          Key END
          IfColNum 1
          DeleteLine
          Else
          Key HOME
          EndIf
          // The list of terms can also contain terms which are substrings of
          // other, longer terms. Because the macro cannot use a regular expression
          // in the macro CountDuplicates later, the start and end of each term are
          // marked with special character sequences which hopefully never occur
          // inside a term in a source file.

          Find RegExp "%^(*^)$"
          Replace All "SOP>>>^1<<<EOP"
          // Clipboard 9 always holds the current term whose duplicates have
          // to be counted and removed. Clipboard 8 always contains the current
          // count number. The following loop is executed until the end of the
          // file is reached.

          Clipboard 9
          Loop
          IfEof
          ExitLoop
          EndIf
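          // Select the term/word with the special surrounding marker strings and
          // copy it to user clipboard 9 for use in the submacro CountDuplicates.
          // (Clipboard 9 is active again here because CountDuplicates always
          // exits with clipboard 9 selected.)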

          StartSelect
          Key END
          Copy
          EndSelect
          // Unselect and insert the starting count number 1 at end of the line
          // of the current term and copy this number also to user clipboard 8.

          Key LEFT ARROW
          Key RIGHT ARROW
          Clipboard 8
          "1"
          StartSelect
          Key LEFT ARROW
          Copy
          EndSelect
          Key RIGHT ARROW
          // Now run the submacro which counts the duplicates of the term in
          // clipboard 9 and also deletes them from the file. Then move the cursor
          // to column 1 of the next term if the end of the file is not reached.

          PlayMacro 1 "CountDuplicates"
          Key HOME
          Key DOWN ARROW
          EndLoop
          // All terms/words are counted and the user clipboards are no longer
          // needed. Clear the contents of the 2 clipboards used to free RAM and
          // switch back to the Windows clipboard.

          ClearClipboard
          Clipboard 8
          ClearClipboard
          Clipboard 0
          // Move the cursor back to the top of the file. Delete the 2 special marker
          // strings and move the count number from the end of the line to the start
          // of the line. 21 spaces are also inserted here. Why? Read on.

          Top
          Find RegExp "SOP>>>^(*^)<<<EOP^([0-9]+^)$"
          Replace All "^2 times:                     ^1"
          // Moving the cursor 29 columns to the right on the first line sets it to
          // a column near or exactly on the first character of the first term.
          // All columns before this cursor position are now selected in the whole
          // file. As long as no term is counted more than 999 999 999 999 999 999 999
          // times, the last selected column contains only spaces.

          Loop 29
          Key RIGHT ARROW
          EndLoop
          ColumnModeOn
          StartSelect
          SelectToBottom
          Key UP ARROW
          // The selected columns are now right-aligned. Why? Well, there are
          // numbers with 1 digit, numbers with 2 digits, ... and without right
          // alignment the final statistic would not look very pretty.

          ColumnRightJustify
          EndSelect
          // Back at the top of the file and still in column mode, there is one
          // remaining problem after the right alignment of the numbers: there are
          // too many leading empty columns and the terms are no longer left
          // aligned. So in a loop the macro always selects only the first column
          // and replaces every non-space character in it by itself. This regular
          // expression replace changes nothing, but with this trick IfFound tells
          // whether the column contains any non-space character. If the selected
          // column contains no non-space character (digit), it is deleted. This
          // way the unnecessary leading spaces are deleted, while the leading
          // spaces needed for the right alignment of all numbers remain.

          Loop
          Top
          StartSelect
          SelectToBottom
          Key UP ARROW
          Key RIGHT ARROW
          Find RegExp "^([~ ]^)"
          Replace All SelectText "^1"
          IfFound
          EndSelect
          Top
          ExitLoop
          Else
          Delete
          EndSelect
          EndIf
          EndLoop
          ColumnModeOff
          // Back in normal edit mode, 2 regular expressions are used to left-align
          // the terms with a tab as delimiter. You can also use a comma or ; if
          // you want a CSV file. But to get a valid CSV file with , or ; as
          // delimiter, you have to run some extra find and replaces, because
          // , and ; can still occur inside the terms, whereas a tab cannot.

          Find RegExp "%^( ++1 time^)s: ++"
          Replace All "^1: ^t"
          Find RegExp "%^( ++[0-9]+ times:^) ++"
          Replace All "^1^t"
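For comparison, here is a rough Python model of the output CountTerms produces. It is a sketch under assumptions: the names extract_terms, term_frequencies and format_report are hypothetical, collections.Counter replaces the sort-and-delete counting, str.casefold stands in for UltraEdit's IgnoreCase (so terms are reported in lowercase rather than in the spelling of the first sorted occurrence), and the right alignment is computed directly instead of with the column-mode trick. The delimiters, the order of the single-character and punctuation filters, and the final "N times:<tab>term" / "1 time: <tab>term" layout follow the macro.

```python
import re
from collections import Counter

# Delimiters of the macro: space, tab and page break (form feed).
# Line breaks separate terms as well.
_DELIMS = re.compile(r"[ \t\f\r\n]+")
# Characters removed from the end of a term, like "[.,;:!?^-]+$".
_TRAILING_PUNCT = ".,;:!?-"


def extract_terms(text: str) -> list[str]:
    """Split text into terms roughly the way CountTerms does."""
    terms = []
    for token in _DELIMS.split(text):
        # Single-character (and empty) terms are dropped BEFORE punctuation
        # is stripped, so "a," survives as "a", as in the macro.
        if len(token) <= 1:
            continue
        token = token.rstrip(_TRAILING_PUNCT)
        if token:  # terms made only of punctuation disappear
            terms.append(token)
    return terms


def term_frequencies(text: str, ignore_case: bool = True) -> list[tuple[str, int]]:
    """Return (term, count) pairs sorted by term."""
    terms = extract_terms(text)
    if ignore_case:
        terms = [t.casefold() for t in terms]
    return sorted(Counter(terms).items())


def format_report(rows: list[tuple[str, int]]) -> str:
    """Right-align the counts and separate them from the terms with a tab."""
    width = max((len(str(n)) for _, n in rows), default=1)
    lines = []
    for term, n in rows:
        # "time: " has the same length as "times:", keeping the tab aligned.
        label = "time: " if n == 1 else "times:"
        lines.append(f"{str(n).rjust(width)} {label}\t{term}")
    return "\n".join(lines)
```

For example, format_report(term_frequencies("The cat, the hat.\nA cat!")) yields the lines "2 times:<tab>cat", "1 time: <tab>hat" and "2 times:<tab>the". Passing ignore_case=False corresponds roughly to removing IgnoreCase and adding MatchCase in the macros, although the sort order of mixed-case terms may differ from UltraEdit's.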

          The first submacro is CountDuplicates:

          // This macro is very small and simple. It searches from the current cursor
          // position (the end of the line with the current term and the count
          // number) for a duplicate of the current term, which after the sort in
          // the main macro must be on the next line. If a duplicate is found, its
          // line is deleted and the submacro CountUp increases the count number at
          // the end of the line above (the line with the term) by 1. When no
          // duplicate of the current term is found anymore, this little submacro
          // exits and macro execution continues in the main macro.

          Loop
          Clipboard 9
          Find "^c"
          IfNotFound
          ExitLoop
          Else
          DeleteLine
          Key UP ARROW
          Find RegExp "[0-9]+$"
          Clipboard 8
          PlayMacro 1 "CountUp"
          EndIf
          EndLoop

          The second submacro is CountUp.

          This is the universal i++ macro posted in the topic How to insert an incrementing number in a file using a counter in a macro?
          It is unmodified, so I do not explain it here.

          Paste
          EndSelect
          InsertMode
          "|"
          Key LEFT ARROW
          Key LEFT ARROW
          OverStrikeMode
          Loop
          IfCharIs "0"
          "1"
          ExitLoop
          EndIf
          IfCharIs "1"
          "2"
          ExitLoop
          EndIf
          IfCharIs "2"
          "3"
          ExitLoop
          EndIf
          IfCharIs "3"
          "4"
          ExitLoop
          EndIf
          IfCharIs "4"
          "5"
          ExitLoop
          EndIf
          IfCharIs "5"
          "6"
          ExitLoop
          EndIf
          IfCharIs "6"
          "7"
          ExitLoop
          EndIf
          IfCharIs "7"
          "8"
          ExitLoop
          EndIf
          IfCharIs "8"
          "9"
          ExitLoop
          EndIf
          IfCharIs "9"
          "0"
          Key LEFT ARROW
          IfColNum 1
          InsertMode
          "1"
          ExitLoop
          EndIf
          Key LEFT ARROW
          IfCharIs "0123456789"
          Else
          Key RIGHT ARROW
          InsertMode
          "1"
          ExitLoop
          EndIf
          EndIf
          EndLoop
          InsertMode
          Loop
          IfColNum 1
          ExitLoop
          EndIf
          Key LEFT ARROW
          IfCharIs "0123456789"
          Else
          Key RIGHT ARROW
          ExitLoop
          EndIf
          EndLoop
          StartSelect
          Find Select "|"
          Key LEFT ARROW
          Copy
          EndSelect
          Key RIGHT ARROW
          Key LEFT ARROW
          Key DEL
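The digit-by-digit carry that CountUp performs with IfCharIs and OverStrikeMode can be modeled in Python as follows. This is a sketch that works on the number as a string instead of on text in the file; the function name increment_decimal is hypothetical.

```python
def increment_decimal(number: str) -> str:
    """Add 1 to a non-empty string of ASCII digits, digit by digit."""
    if not number or any(c not in "0123456789" for c in number):
        raise ValueError(f"not a decimal number: {number!r}")
    digits = list(number)
    pos = len(digits) - 1                 # start at the last digit
    while True:
        if digits[pos] != "9":
            # 0..8: overwrite with the next digit and stop.
            digits[pos] = str(int(digits[pos]) + 1)
            break
        digits[pos] = "0"                 # 9 rolls over to 0, carry left
        if pos == 0:
            digits.insert(0, "1")         # carry beyond the first digit
            break
        pos -= 1
    return "".join(digits)
```

For example, increment_decimal("199") returns "200" and increment_decimal("999") returns "1000".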

            May 26, 2006 #5

            Hi Mofi

            Nice work.

            Just for the lazy guys (like me), here is the ultimate "unmofi macro" ;-)
            It cuts out the // comments.
            Just run it on the macros after pasting them into a file in UE.
            Use the same macro settings as for the macros above.

            InsertMode
            ColumnModeOff
            HexOff
            PerlReOn
            Top
            Find RegExp "//.*"
            Replace All ""
            TrimTrailingSpaces
            Loop 
            Find "^p^p"
            Replace All "^p"
            IfNotFound
            ExitLoop
            EndIf
            EndLoop
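The same clean-up can be sketched in Python for anyone who prefers to strip the comments before opening UltraEdit. This is a hypothetical helper, not part of the posted macros; it normalizes CR/LF to LF first, and like the macro it would also cut a // inside a string literal (none of the macros above contain one).

```python
import re


def strip_macro_comments(source: str) -> str:
    """Remove // comments, trailing blanks and empty lines from macro code."""
    text = source.replace("\r\n", "\n")
    # Find RegExp "//.*" + Replace All "": "." does not match a line break.
    text = re.sub(r"//.*", "", text)
    # TrimTrailingSpaces
    text = "\n".join(line.rstrip(" \t") for line in text.split("\n"))
    # Loop: Find "^p^p" + Replace All "^p" until nothing is found.
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    return text
```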
            rds Bego
            Normally using all newest english version incl. each hotfix. Win 10 64 bit