Here is the macro set to create a statistic showing which terms (words) exist in a file and how often. If you want the statistic for many files, first copy all files together into a single file. The Windows console command to copy all files in a directory together is:
copy *.* BigFileWithAllContents.txt
The main problem was how to count how often a term exists in the file. Before I started, I had 2 ideas:
1) Use FindInFiles with results to an edit window. The last line of the results contains the total number of occurrences of the searched string.
2) Count it "manually" with appropriate macro code and the CountUp macro I have already written and posted at "How to insert an incrementing number in a file using a counter in a macro?"
Solution 1 is maybe faster, but I know there are many problems with FindInFiles with results to an edit window, like the focus problem and the problem with Unicode: since v12 of UE and v5.50 of UES the results are listed in a Unicode file and not in an ASCII file as in previous versions. But I wanted this macro set to work with previous versions too. Solution 1 would also need many window switches, which slow down macro execution. So I decided on solution 2.
Because nesting of loops is not possible in the UE/UES macro language, 3 macros are needed to do the job.
Enable the macro property "Continue if a Find with Replace not found" or "Continue if search string not found" (whichever name your version uses) for all 3 macros. Disable the macro property "Show Cancel Dialog for this macro" for all 3 macros to increase speed. The macro execution can still be interrupted with key ESC, which must be kept pressed until the main macro exits.
I have inserted comments to explain the important steps of the macros. I'm sure every user has to adapt the macros a little bit to his personal needs. You have to delete the comments (the lines starting with //) before copying the code to the edit macro dialog.
The terms/words are sorted and counted ignoring case. If you want it case sensitive, remove the IgnoreCase sort parameter in the main macro CountTerms and insert the MatchCase find parameter at the Find command in submacro CountDuplicates.
I think the macro set could also be of interest to users who do not need it, because it contains some new macro techniques never posted before.
Add UnixReOn or PerlReOn (v12+ of UE) at the end of the main macro CountTerms if you do not use UltraEdit style regular expressions by default - see search configuration. Macro command UnixReOff sets the regular expression option to UltraEdit style.
Attention: It is important that you create the macros in the following order: first CountUp, then CountDuplicates and last CountTerms. Otherwise the PlayMacro commands will be automatically removed by UltraEdit/UEStudio without any warning when closing/updating the macro.
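For readers who want to check the expected term list outside UltraEdit, here is a minimal Python sketch of what the term creation part of CountTerms does. It is only an illustration under my own assumptions: the function name extract_terms is hypothetical and not part of the macro set, and it mirrors the order of the macro steps (split, drop single character terms, strip trailing punctuation, sort ignoring case).

```python
import re

def extract_terms(text):
    # Split on runs of spaces, tabs, page breaks and line breaks.
    terms = [t for t in re.split(r"[ \t\f\r\n]+", text) if t]
    # Drop terms consisting of a single character (I, a, ...).
    terms = [t for t in terms if len(t) > 1]
    # Strip punctuation marks from the end of each term.
    terms = [re.sub(r"[.,;:!?-]+$", "", t) for t in terms]
    # A term made only of punctuation is empty now; discard it.
    terms = [t for t in terms if t]
    # Sort ignoring case and keep the duplicates for the counting step.
    return sorted(terms, key=str.lower)

print(extract_terms("The cat, the dog. A cat!"))
```

Use key=None instead of str.lower in the sort if you want the case sensitive variant.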
The main macro CountTerms:
InsertMode
ColumnModeOff
HexOff
// Select the whole content of the active file and copy it to a new file.
// But before pasting it into the new file make sure the new file is an
// ASCII file with DOS line terminations. There are configuration options
// which determine the format of a new file and the macro needs this format.
SelectAll
IfSel
Copy
EndSelect
Top
Else
ExitMacro
EndIf
UnixReOff
NewFile
UnicodeToASCII
UnixMacToDos
Paste
// The last line of the file must be terminated with CR/LF. The cursor
// is now at bottom of the file. If the cursor is not at column 1, the
// line termination CR/LF must be inserted.
IfColNum 1
Else
"
"
EndIf
Top
// Back at top of the file delete all trailing and leading spaces and
// tabs. Next replace a sequence of spaces, page breaks and tabs by a line
// break. This regular expression replace is the term/word creation part
// of the macro. If you want to specify other delimiter characters as well,
// insert them into the square brackets of the second Find RegExp command.
// Escape a delimiter character with ^ if it is also an UltraEdit style
// regex character.
TrimTrailingSpaces
Find RegExp "%[ ^t]+"
Replace All ""
Find RegExp "[ ^b^t]+"
Replace All "^p"
// Remove all terms/words which consist only of a single character like
// the words I, a, ... They are normally not of interest. Remove the first
// Find/Replace if you want these single character terms too. The last
// regex replace removes characters from the end of the terms/words which
// are known as punctuation marks. Then the terms are sorted without
// removing the duplicates because the macro has to count them.
Find RegExp "%?^p"
Replace All ""
Find RegExp "[.,;:!?^-]+$"
Replace All ""
// If you want you can insert here some Find/Replaces to delete
// words like "the", "and", ... which are normally not of interest.
SortAsc IgnoreCase 1 -1 0 0 0 0 0 0
// After the sort all blank lines at top of the file are deleted.
Loop
Find "^p^p"
Replace All ""
IfNotFound
ExitLoop
EndIf
EndLoop
Key END
IfColNum 1
DeleteLine
Else
Key HOME
EndIf
// The list of terms can contain also terms which are substrings of
// other larger terms. Because the macro cannot use a regular expression
// in the macro CountDuplicates later, the start and end of a term must
// be marked with special character sequences which hopefully never exist
// in a source file in a term.
Find RegExp "%^(*^)$"
Replace All "SOP>>>^1<<<EOP"
// Clipboard 9 will always hold the current term whose duplicates have
// to be counted and removed. Clipboard 8 will always contain the current
// count number. The following loop is executed until the end of the file
// is reached.
Clipboard 9
Loop
IfEof
ExitLoop
EndIf
// Select the term/word with the special surrounding marker strings and
// copy it to user clipboard 9 for usage in the submacro CountDuplicates.
StartSelect
Key END
Copy
EndSelect
// Unselect and insert the starting count number 1 at end of the line
// of the current term and copy this number also to user clipboard 8.
Key LEFT ARROW
Key RIGHT ARROW
Clipboard 8
"1"
StartSelect
Key LEFT ARROW
Copy
EndSelect
Key RIGHT ARROW
// Run now the submacro which counts the duplicates of the term in
// clipboard 9 and deletes also the duplicates from the file. Then move
// the cursor to column 1 of the next term if end of file is not reached.
PlayMacro 1 "CountDuplicates"
Key HOME
Key DOWN ARROW
EndLoop
// All terms/words are counted. The macro no longer needs a clipboard. Clear
// the contents of the 2 used clipboards to free RAM and switch back to the
// Windows clipboard.
ClearClipboard
Clipboard 8
ClearClipboard
Clipboard 0
// Move the cursor back to the top of the file. Delete the 2 special marker
// strings and move the counted number from the end of the line to start of
// the line. 21 spaces are also inserted here. Why? Read further.
Top
Find RegExp "SOP>>>^(*^)<<<EOP^([0-9]+^)$"
Replace All "^2 times:                     ^1"
// With a simple cursor move in the first line of the file the cursor is
// set to a column near or exactly on the first character of the first term.
// From this cursor position all columns before it are now selected in the
// whole file. If no term is counted more than 999 999 999 999 999 999 999
// times, the last selected column contains only spaces.
Loop 29
Key RIGHT ARROW
EndLoop
ColumnModeOn
StartSelect
SelectToBottom
Key UP ARROW
// The selected columns will now be right aligned. Why? Well, there are
// numbers with 1 digit, numbers with 2 digits, ... and without right
// alignment the final statistic would not look very pretty.
ColumnRightJustify
EndSelect
// Back to top of the file and still in column mode, there is still one
// problem after the right alignment of the numbers. We have too many
// leading empty columns and the terms are not left aligned anymore.
// So in a loop the macro always selects only the first column. In the
// selected column all non-space characters are replaced by themselves.
// This regular expression find and replace does not change anything. But
// with this trick I get the information whether this column contains any
// non-space character or not. If the selected column does not contain
// any non-space character (digit), it can be deleted. With this trick
// the unnecessary leading spaces are deleted and the leading spaces
// needed for the right alignment of all numbers remain.
Loop
Top
StartSelect
SelectToBottom
Key UP ARROW
Key RIGHT ARROW
Find RegExp "^([~ ]^)"
Replace All SelectText "^1"
IfFound
EndSelect
Top
ExitLoop
Else
Delete
EndSelect
EndIf
EndLoop
ColumnModeOff
// Back in normal edit mode, 2 regular expressions are used to left align
// the terms with a tab as delimiter. You can also use a comma or ; if
// you want a CSV file. But to get a valid CSV file with a , or a ; as
// delimiter you have to run some extra find and replaces because , and
// ; can still exist in the terms, but not a tab.
Find RegExp "%^( ++1 time^)s: ++"
Replace All "^1: ^t"
Find RegExp "%^( ++[0-9]+ times:^) ++"
Replace All "^1^t"
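To make clear what the final statistic lines look like after the right alignment and the two regular expression replaces above, here is a small Python sketch. It is not part of the macro set; the function name format_statistic and the input format (a non-empty list of term and count pairs) are assumptions made only for this illustration.

```python
def format_statistic(counts):
    # The widest count decides how far all numbers are padded on the left.
    width = max(len(str(n)) for _, n in counts)
    lines = []
    for term, n in counts:
        # "time: " and "times:" have the same length, so the tab that
        # follows always starts at the same column.
        label = "time: " if n == 1 else "times:"
        lines.append(f"{n:>{width}} {label}\t{term}")
    return lines

for line in format_statistic([("cat", 12), ("dog", 1)]):
    print(line)
```

The macro reaches the same layout with ColumnRightJustify and the column deletion loop, because the macro language has no string formatting.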
The first submacro is CountDuplicates:
// This macro is very small and simple. It searches from the current cursor
// position - end of the line with the current term and the count number -
// for the duplicate of the current term which after the sort in the main
// macro must be on the next line. If a duplicate is found, the line is
// deleted and with the submacro CountUp the already existing count number
// at the end of the line above with the term is increased by 1. If no
// further duplicate of the current term is found, this little submacro
// exits and the macro execution is continued in the main macro.
Loop
Clipboard 9
Find "^c"
IfNotFound
ExitLoop
Else
DeleteLine
Key UP ARROW
Find RegExp "[0-9]+$"
Clipboard 8
PlayMacro 1 "CountUp"
EndIf
EndLoop
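The counting idea of CountDuplicates together with the loop in CountTerms can be sketched in Python as follows. This is only an illustration with a hypothetical function name, assuming the list is already sorted ignoring case. In the macros the SOP>>> and <<<EOP markers make the non-regex Find match only whole terms (otherwise a search for "the" would also hit "then"); comparing whole strings plays the same role here.

```python
def count_adjacent(sorted_terms):
    counts = []  # entries are [term, count], in sorted order
    for term in sorted_terms:
        # After the sort a duplicate can only be the directly following term.
        if counts and counts[-1][0].lower() == term.lower():
            counts[-1][1] += 1  # duplicate: drop it and count up
        else:
            counts.append([term, 1])  # new term starts with count 1
    return [(t, n) for t, n in counts]

print(count_adjacent(["cat", "cat", "dog", "The", "the"]))
```

As in the macro, the first spelling of a term is the one that is kept when case is ignored.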
The second submacro is CountUp. This is the universal i++ macro posted at "How to insert an incrementing number in a file using a counter in a macro?" Nothing is modified, so I do not explain it here.
Paste
EndSelect
InsertMode
"|"
Key LEFT ARROW
Key LEFT ARROW
OverStrikeMode
Loop
IfCharIs "0"
"1"
ExitLoop
EndIf
IfCharIs "1"
"2"
ExitLoop
EndIf
IfCharIs "2"
"3"
ExitLoop
EndIf
IfCharIs "3"
"4"
ExitLoop
EndIf
IfCharIs "4"
"5"
ExitLoop
EndIf
IfCharIs "5"
"6"
ExitLoop
EndIf
IfCharIs "6"
"7"
ExitLoop
EndIf
IfCharIs "7"
"8"
ExitLoop
EndIf
IfCharIs "8"
"9"
ExitLoop
EndIf
IfCharIs "9"
"0"
Key LEFT ARROW
IfColNum 1
InsertMode
"1"
ExitLoop
EndIf
Key LEFT ARROW
IfCharIs "0123456789"
Else
Key RIGHT ARROW
InsertMode
"1"
ExitLoop
EndIf
EndIf
EndLoop
InsertMode
Loop
IfColNum 1
ExitLoop
EndIf
Key LEFT ARROW
IfCharIs "0123456789"
Else
Key RIGHT ARROW
ExitLoop
EndIf
EndLoop
StartSelect
Find Select "|"
Key LEFT ARROW
Copy
EndSelect
Key RIGHT ARROW
Key LEFT ARROW
Key DEL
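For those who do not want to read the other topic: the core of CountUp is an increment done on the decimal digits as text, from right to left with a carry, because the macro language has no arithmetic. A minimal Python sketch of that idea follows. The function name count_up is hypothetical, and the sketch leaves out the clipboard and cursor handling of the real macro.

```python
def count_up(digits):
    chars = list(digits)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "9":
            # No carry needed: replace the digit by the next one and stop.
            chars[i] = str(int(chars[i]) + 1)
            return "".join(chars)
        # A 9 becomes 0 and the carry moves one digit to the left.
        chars[i] = "0"
        i -= 1
    # All digits were 9, so a leading 1 must be inserted.
    return "1" + "".join(chars)

print(count_up("199"))
```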