Is there a simple way to search the files of a directory for a list of words, but with some rules?

Newbie

    Aug 31, 2021 #1

    A folder was created to hold individual files, all of which are plain text files. Find in Files works great and easily shows which file contains a "word" or an "exact sentence".

    How would one use a regular expression (or a plain search without regex) to find a list of words in cases like the following?

    Searching for the word Help finds it in all files in the specified directory, no problem. But it finds Help, Helping, Helped, Helps, Helper, etc. Would it be simple to exclude one or more of them, e.g. find Help and Helped but not Helping and Helps?

    How would one search for a special symbol, for example a colon (:)? Currently Find in Files finds lots of other special characters in addition to the lines containing a colon.

    How would one search for a simple pattern like Help in Find in Files (i.e. find only Help and Helps, delimited to a maximum of 5 characters)?

    Grand Master

      Aug 31, 2021 #2

      The legacy UltraEdit and Unix regular expression engines support only an OR with two arguments, i.e. word1 OR word2. Pressing F1 while the floating or docked Find and Replace window has the input focus opens a help page containing a link to the help page Regular Expressions, which explains the syntax of the legacy UltraEdit and Unix regular expression engines.

      The far more powerful Perl regular expression engine supports an OR expression with multiple arguments. The list of arguments is not unlimited because of stack memory limitations, but 50 or even more arguments (words, phrases or expressions) are generally no problem.
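      As a rough illustration of such a multi-word OR expression, the following Python sketch builds a \b(?:word1|word2|...)\b pattern from a list of words. It uses Python's re module, whose syntax for these constructs is Perl-like but which is not the engine used by UltraEdit; the function name build_word_alternation and the extra words Assist and Support are hypothetical.

```python
import re


def build_word_alternation(words):
    """Build a pattern that matches any of the given words as a whole word."""
    # Longer words first, so "Helped" is tried before "Help". This only
    # matters if the pattern is used without the surrounding \b anchors.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape() makes characters with a special regex meaning literal.
    alternatives = "|".join(re.escape(word) for word in ordered)
    # Non-capturing group (?:...) holds the OR list, \b marks word boundaries.
    return r"\b(?:" + alternatives + r")\b"


pattern = build_word_alternation(["Help", "Helped", "Assist", "Support"])
print(pattern)
print(re.findall(pattern, "Help Helping Helped Support"))
```

      For plain words like these, the printed pattern string uses only the constructs explained further below, so it should also be usable as a Perl search string.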

      After clicking the gearwheel button to show the advanced options, checking the option Regular expressions and selecting Perl, the Regular expression builder button becomes available. In UltraEdit for Windows v28.10.0.154 it shows the symbol .* (Perl syntax for any character except a newline character, 0 or more times). Clicking this button above the Find what or the Replace with field displays a list of regular expressions that can be used in the search or the replace string for the currently selected regular expression engine. The lists are complete for the UltraEdit and the Unix regular expression engines, as their capabilities are really limited. For the Perl regular expression engine, the two lists contain the most frequently needed expressions. Clicking an item in a regular expression builder list inserts the expression at the current caret position in the search or the replace string.

      The UltraEdit help also contains a link to the page Perl Regular Expressions, which describes the most frequently needed Perl regular expressions. The announcement topic Readme for the Find/Replace/Regular Expressions forum contains lots of links to pages, or even entire websites, explaining the usage of regular expressions, as well as to third-party tools that can also be used from within UltraEdit via a user tool to help build a complex regular expression for a specific find/replace task.

      So with the Perl regular expression engine, the search expression could be either \<(?:Helped|Help)\> or \b(?:Helped|Help)\b.

      The expressions mean:

      \< ... beginning of a word

      (?:...) ... a non-marking / non-capturing group used here for the OR expression.

      | ... means OR here.

      \> ... end of a word.

      \b ... any word boundary (beginning or end of a word).
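      The following Python sketch checks which of the words from the question are found by \b(?:Helped|Help)\b. Python's re module does not support \< and \>, so only the \b variant is shown; the sample text is made up for illustration.

```python
import re

# \b variant of the search expression from the post.
pattern = re.compile(r"\b(?:Helped|Help)\b")

text = "Help Helping Helped Helps Helper"
matches = pattern.findall(text)
print(matches)  # ['Help', 'Helped']
```

      Helping, Helps and Helper are skipped because after Help the next character is still a word character, so the closing \b cannot match there.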

      The Unicode Consortium has defined which characters are word characters, and the Perl regular expression engine uses that definition to determine which sequences of characters form a word.

      Case sensitivity is controlled by the option Match case. With the Perl regular expression engine it is also possible to control case sensitivity within the search expression itself, but I won't explain that here because you are new to regular expressions and I don't want to confuse you.
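      For experimenting outside UltraEdit, a Python analogy: the re.IGNORECASE flag roughly plays the role of an unchecked Match case option, while Python's default behaviour corresponds to Match case being checked. This is an analogy only, with made-up sample text; in UltraEdit the setting is the checkbox in the Find and Replace window.

```python
import re

text = "help Help HELP Helping"

# Match case checked: case-sensitive search (Python's default).
case_sensitive = re.findall(r"\b(?:Helped|Help)\b", text)

# Match case unchecked: roughly corresponds to the re.IGNORECASE flag.
case_insensitive = re.findall(r"\b(?:Helped|Help)\b", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['Help']
print(case_insensitive)  # ['help', 'Help', 'HELP']
```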

      In this case, searching for just the two words Help and Helped, it would also be possible to use the Perl regular expression search string \<Help(?:ed)?\> or the search string \bHelp(?:ed)?\b, which means: find the word Help, optionally with ed appended, i.e. also Helped. The question mark after the non-capturing group means that the expression inside the group (here a simple string) must match either 0 times or exactly once for a positive match.
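      A small Python sketch can confirm that \bHelp(?:ed)?\b and \b(?:Helped|Help)\b find the same words in a made-up sample text. As before, only the \b variants are used because Python's re module lacks \< and \>.

```python
import re

# Both search strings from the post, \b variants.
optional_suffix = re.compile(r"\bHelp(?:ed)?\b")  # Help, optionally followed by ed
alternation = re.compile(r"\b(?:Helped|Help)\b")  # Helped OR Help

text = "Help Helping Helped Helps Helper"

found_optional = optional_suffix.findall(text)
found_alternation = alternation.findall(text)

print(found_optional)                       # ['Help', 'Helped']
print(found_optional == found_alternation)  # True
```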

      The colon : has no special meaning on its own for any regular expression engine. In Perl regular expressions it can have a special meaning depending on the character to its left, as can be seen in the (?: groups of the search expressions above. So using Find in Files with just : as the search string finds all lines containing a colon anywhere in the line and displays those lines in the output window or the results window (together with a number of lines before and after each found line when those additional options are used).
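      To illustrate that a colon is matched literally, here is a hypothetical Python imitation of a Find in Files search. The function find_in_files, the *.txt file filter, the UTF-8 default and the sample files are all assumptions for this sketch; it is not how UltraEdit implements the feature. Unless use_regex=True is passed, the search string goes through re.escape(), so every character, including :, is taken literally.

```python
import re
import tempfile
from pathlib import Path


def find_in_files(directory, search, use_regex=False, encoding="utf-8"):
    """Return (file name, line number, line text) for every matching line.

    Hypothetical stand-in for a Find in Files search: unless use_regex is
    True, the search string is escaped so it is matched literally.
    """
    pattern = re.compile(search if use_regex else re.escape(search))
    results = []
    # Assumption: only *.txt files below the directory are searched.
    for path in sorted(Path(directory).rglob("*.txt")):
        if not path.is_file():
            continue
        # Assumption: files are UTF-8; undecodable bytes become U+FFFD.
        with path.open(encoding=encoding, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    results.append((path.name, number, line.rstrip("\n")))
    return results


# Demo with two throw-away files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Note: first line\nno colon here\n", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Time 10:30\n", encoding="utf-8")
    hits = find_in_files(tmp, ":")

for hit in hits:
    print(hit)
```

      Only the two lines that actually contain a colon are reported; the line without one is skipped.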
      Best regards from an UC/UE/UES for Windows user from Austria

      Newbie

        Aug 31, 2021 #3

        Thank you.

        Can I fix the results shown in the edit window? Find in Files outputs found strings that look like this:

        Let’s read
        ...
        It’s found at

        The original file contains the clean text strings ... Lets read and Its found at.

        Grand Master

          Sep 01, 2021 #4

          It looks like the files you are searching are UTF-8 encoded. I suggest explicitly enabling the option Use encoding at the bottom of the Find in Files tab in the Find and Replace window, and selecting either Auto-detect at the top of the list or 65001 (UTF-8) at the bottom of the list. Using 65001 (UTF-8) makes Find in Files faster than Auto-detect, especially with search expressions containing characters with a code value greater than 127 decimal, which are encoded in UTF-8 with multiple bytes.
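          The following Python sketch shows a likely mechanism behind such garbled results. It assumes the text contains the typographic apostrophe ’ (U+2019) and that the UTF-8 bytes are misread with the Windows-1252 code page; which single-byte code page is actually involved on a given system is an assumption here. The apostrophe has a code value greater than 127, so UTF-8 stores it in three bytes, and decoding those bytes with a single-byte code page turns one character into three.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, often used as a typographic apostrophe.
text = "Let\u2019s read"

utf8_bytes = text.encode("utf-8")
print(len("\u2019".encode("utf-8")))  # 3 (code value > 127 needs several bytes)

# Decoding the UTF-8 bytes with the correct encoding restores the text.
print(utf8_bytes.decode("utf-8"))     # Let’s read

# Decoding the same bytes as Windows-1252 (a single-byte ANSI code page)
# produces the typical garbage characters instead.
print(utf8_bytes.decode("cp1252"))    # Letâ€™s read
```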