Extract RegExp Matches from Doc to Clipboard


    Aug 03, 2011#1

    This script performs a regular expression (regexp) search of the active document, writing unique matches to the Windows clipboard.
    • This script was written and tested using UEStudio version 10.20.0.1001 running on Windows Server 2008 R2 Enterprise.

      RegExp is enabled before the search is executed, activating your default regexp engine (I use the UE engine). You can force the script to use the UE, Perl, or Unix engine by uncommenting/commenting a couple of lines near the search execution command.

      Each regexp match is placed on the Windows clipboard sorted (ascending, case-insensitive) and with duplicate matches removed. The duplicate removal can be changed by commenting/uncommenting lines in this script's sortDoc() function. You can also modify any of the sorting option lines to suit your needs.
    Remember that this script is designed to extract unique RegExp matches to a list. If you enter a search term whose matches are all identical, you will get a list of one item. For example, if you searched this script for the term

    Code: Select all

    index
    the result would be one line on the clipboard:

    Code: Select all

    index
    Although there are more than a few occurrences of "index" in this script, duplicates are removed before the list is written to the clipboard. By default the sort is case-insensitive; ergo a single line is left when (case-insensitive) duplicates are removed.
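To make the "sort, then drop duplicates" behaviour concrete, here is a minimal sketch in plain JavaScript. It does not use the UltraEdit API at all: `uniqueMatches` is a hypothetical helper, and it relies on JavaScript's built-in RegExp engine rather than the UE, Perl, or Unix engine, so treat it as an illustration of the idea and not as what UltraEdit does internally.

```javascript
// Collect every match of a pattern, sort the matches case-insensitively,
// and drop case-insensitive duplicates.
// Hypothetical helper for illustration; not part of the UltraEdit API.
function uniqueMatches(text, pattern) {
    // "g" = find all matches, "i" = ignore case (like matchCase=false).
    var re = new RegExp(pattern, "gi");
    var matches = text.match(re) || [];

    // Sort ascending, ignoring case.
    matches.sort(function (a, b) {
        var x = a.toLowerCase(), y = b.toLowerCase();
        return x < y ? -1 : (x > y ? 1 : 0);
    });

    // After sorting, case-insensitive duplicates sit next to each other,
    // so comparing each entry with its predecessor is enough.
    var unique = [];
    for (var i = 0; i < matches.length; i++) {
        if (i === 0 || matches[i].toLowerCase() !== matches[i - 1].toLowerCase()) {
            unique.push(matches[i]);
        }
    }
    return unique;
}

// Four occurrences of "index" in different case collapse to one entry.
var sample = "tabindex = i; Index of the tab; return tabindex; INDEX";
var result = uniqueMatches(sample, "index");
console.log(result.length);
```

Which spelling survives ("index", "Index" or "INDEX") depends on the order the sort leaves equal entries in, which is why the list in the example above is best read as "one line", not as a particular capitalisation.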

    Ok, so how is this script useful?

    I use UE extensively to write PowerShell scripts. I often create XML configuration files for automated product installation. These installations usually require a number of Windows domain user accounts which must be embedded in these XML files. In addition, these accounts are sometimes coded in more than one place.

    As part of installation setup I must create each user account. To this end I use a list of user accounts which I run against a batch file. I use this script to create the account list by running the script against the populated XML configuration file using this regexp:

    Code: Select all

    domainName\\[a-zA-Z][a-zA-Z0-9]++
    The result is a list of unique user account names as I use them in my account creation process.
    • Note: This particular regexp may not work for you as Windows allows characters not specified here and there is a length limit to account names...I just don't need those details for my environment.
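For readers who want to try the account-list idea outside UltraEdit, the following is a rough sketch in plain JavaScript. Everything in it is made up for illustration: the `CONTOSO` domain, the account names and the XML layout are hypothetical sample data, and the pattern is my translation into JavaScript regexp syntax, assuming that `++` in the UltraEdit engine means "zero or more of the preceding item" (written `*` in JavaScript).

```javascript
// Hypothetical sample data; the domain, accounts and XML layout are invented.
var xml =
    '<install>\n' +
    '  <service account="CONTOSO\\svcWeb01" />\n' +
    '  <service account="CONTOSO\\svcSql" />\n' +
    '  <pool identity="CONTOSO\\svcWeb01" />\n' +
    '</install>';

// JavaScript spelling of the pattern: a literal backslash is written \\ in a
// regex literal, and * means "zero or more" of the preceding character class.
var accountRe = /CONTOSO\\[a-zA-Z][a-zA-Z0-9]*/g;

// Keep the first occurrence of each account, ignoring case.
var seen = {};
var accounts = [];
(xml.match(accountRe) || []).forEach(function (name) {
    var key = name.toLowerCase();
    if (!seen[key]) {
        seen[key] = true;
        accounts.push(name);
    }
});

// Sort ascending, ignoring case, as the script's sortDoc() is set up to do.
accounts.sort(function (a, b) {
    var x = a.toLowerCase(), y = b.toLowerCase();
    return x < y ? -1 : (x > y ? 1 : 0);
});

// One account per line, ready to feed to a batch file.
console.log(accounts.join("\r\n"));
```

The account that appears twice in the sample XML ends up in the list only once, which is the whole point when the same account is coded in more than one place.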
    I have subsequently used this script to help me document scripts and other source code. For example, if you run the following regexp against this script file:

    Code: Select all

    var [a-zA-Z][a-zA-Z0-9\\-_]++
    you get a list of formally defined variables:

    Code: Select all

    var callerDocIdx
    var callerIdx
    var frMatchCase
    var frMatchWord
    var frMode
    var frRegExp
    var frSearchAscii
    var frSearchDown
    var frSearchInColumn
    var homeDocIdx
    var i
    var outputDocIdx
    var searchUserName
    var tabindex
    Here is the complete script:

    Code: Select all

    ////////////////////////////////////////////////////////
    /// extractRegExpResults.js
    ///
    /// Searches a document and saves results to the
    /// Windows clipboard.
    ///
    /// o The search is a regular expression. RegExp is enabled before the search is executed, so it
    ///   uses the default UE RegExp engine. You can use the Perl or Unix engine by uncommenting
    ///   lines in this script (near the search execution).
    ///
    /// o Each RegExp match is placed on the Windows clipboard: Sorted (ascending) and duplicates
    ///   removed. The duplicate removal can be changed by uncommenting line(s) in the sortDoc() 
    ///   function below. You can also modify any of the sorting option lines to suit your needs.
    ///
    /////////////////////////////////////////////////////////////////////////////////////////////////
    
    ////////////////////////////////////////////////////
    // main function - called by last line in this file
    ////////////////////////////////////////////////////
    function main() {
    
        var searchUserName = UltraEdit.getString("Enter a search expression.",1);   // Get the search expression.
        
        var homeDocIdx = getActiveDocumentIndex();  // Remember the target document index
        var outputDocIdx = UltraEdit.document.length;  // Remember temp document index
        UltraEdit.newFile();                                          // Create a temp document (becomes the active document)
        UltraEdit.document[homeDocIdx].setActive();    // Reset source document to active after newFile()
        UltraEdit.activeDocument.top();                        // Start the search at the beginning of the document
        
        // Save current UE search settings
        var frMode                 = UltraEdit.activeDocument.findReplace.mode;
        var frMatchCase         = UltraEdit.activeDocument.findReplace.matchCase;
        var frMatchWord        = UltraEdit.activeDocument.findReplace.matchWord;
        var frRegExp             = UltraEdit.activeDocument.findReplace.regExp;
        var frSearchAscii        = UltraEdit.activeDocument.findReplace.searchAscii;
        var frSearchDown      = UltraEdit.activeDocument.findReplace.searchDown;
        var frSearchInColumn = UltraEdit.activeDocument.findReplace.searchInColumn;
        
        UltraEdit.activeDocument.findReplace.mode=0;                // Set search options
        UltraEdit.activeDocument.findReplace.matchCase=false;
        UltraEdit.activeDocument.findReplace.matchWord=false;
        UltraEdit.activeDocument.findReplace.regExp=true;
        UltraEdit.activeDocument.findReplace.searchAscii=false;
        UltraEdit.activeDocument.findReplace.searchDown=true;
        UltraEdit.activeDocument.findReplace.searchInColumn=false;
        
        // Uncomment the appropriate line for the regExp engine you want to use:
        //UltraEdit.perlReOn();   // Perl
        UltraEdit.ueReOn();       // UltraEdit
        //UltraEdit.unixReOn();   // Unix
        
        // Find all regExp matches and write them to the temporary document
        while(UltraEdit.activeDocument.findReplace.find(searchUserName)) {
            
            UltraEdit.document[outputDocIdx].write(UltraEdit.activeDocument.selection + "\r\n");    
        }
        
        sortDoc(outputDocIdx);                                              // Sort the results in the temporary document
        copyDocToClipboard(outputDocIdx, 0);                                // Copy to clipboard - Using Windows clipboard.
        UltraEdit.closeFile(UltraEdit.document[outputDocIdx].path,2);       // Dispose of the temporary document
        
        // Restore the original settings for UE search
        UltraEdit.activeDocument.findReplace.mode           = frMode;
        UltraEdit.activeDocument.findReplace.matchCase      = frMatchCase;
        UltraEdit.activeDocument.findReplace.matchWord      = frMatchWord;
        UltraEdit.activeDocument.findReplace.regExp         = frRegExp;
        UltraEdit.activeDocument.findReplace.searchAscii    = frSearchAscii;
        UltraEdit.activeDocument.findReplace.searchDown     = frSearchDown;
        UltraEdit.activeDocument.findReplace.searchInColumn = frSearchInColumn;
        
        UltraEdit.activeDocument.top();
        
        return  // Ends main() - effectively exits the script.
    }
    
    // ////////////////////////////////////////////////////////
    // // sub functions
    // ////////////////////////////////////////////////////////
    
    // //////////////////////////////////////
    // sortDoc()
    // //////////////////////////////////////
    // Sorts specified document by line and ascending;
    // removes duplicate lines. Target document is 
    // referenced by index number.
    function sortDoc(targetIdx) {
        
        var callerIdx = getActiveDocumentIndex();           // Remember the caller's active document
        UltraEdit.document[targetIdx].setActive();          // Set the sort target 
            
        UltraEdit.activeDocument.sort.ascending=true;       // Set sort options
        UltraEdit.activeDocument.sort.col1Start=1;
        UltraEdit.activeDocument.sort.col1End=-1;
        UltraEdit.activeDocument.sort.col2Start=0;
        UltraEdit.activeDocument.sort.col2End=0;
        UltraEdit.activeDocument.sort.col3Start=0;
        UltraEdit.activeDocument.sort.col3End=0;
        UltraEdit.activeDocument.sort.col4Start=0;
        UltraEdit.activeDocument.sort.col4End=0;
        UltraEdit.activeDocument.sort.ignoreCase=true;
        UltraEdit.activeDocument.sort.removeDuplicates=2;
        UltraEdit.activeDocument.sort.remKey1=true;
        UltraEdit.activeDocument.sort.remKey2=true;
        UltraEdit.activeDocument.sort.remKey3=true;
        UltraEdit.activeDocument.sort.remKey4=true;
        UltraEdit.activeDocument.sort.type=0;
        
        UltraEdit.activeDocument.sort.sort();               // Sort the target document
        UltraEdit.document[callerIdx].setActive();          // Restore the caller's active document
        
        return // done.
    }
    
    // //////////////////////////////////
    // copyDocToClipboard()
    // //////////////////////////////////
    // Copy entire document (specified by index number)
    // to clipboard (also specified by index number)
    function copyDocToClipboard(documentIdx, clipIdx) {
        
        var callerDocIdx = getActiveDocumentIndex();                    // Remember the current active document
        try { 
            UltraEdit.document[documentIdx].setActive()                 // Activate copy source document
        }
        catch (e) { 
            throw "copyDocToClipboard: Invalid document index.";
        }
             
        if (clipIdx < 0) { throw "Clipboard index out of range."; }     // Check clipboard index range.
        if (clipIdx > 9) { throw "Clipboard index out of range."; }
        
        UltraEdit.selectClipboard(clipIdx);                             // Select and clear the active clipboard
        UltraEdit.clearClipboard();
        UltraEdit.activeDocument.selectAll();                           // Select the text, copy to active clipboard
        UltraEdit.clipboardContent = UltraEdit.activeDocument.selection;
        
        UltraEdit.document[callerDocIdx].setActive();                   // Restore the caller's active document    
        
        return // done.    
    }
    
    // Get the index for the active document.
    // I lifted this from the UE user forum. Thank
    // you jorrasdk,
    function getActiveDocumentIndex() {
       var tabindex = -1; /* start value */
    
       for (var i = 0; i < UltraEdit.document.length; i++)
       {
          if (UltraEdit.activeDocument.path==UltraEdit.document[i].path) {
             tabindex = i;
             break;
          }
       }
       return tabindex;
    }
    
    main()
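A side note on the save/restore part of main(): the seven "remember" lines and seven "restore" lines could be folded into one helper. Below is a minimal sketch in plain JavaScript. `withTemporarySettings` is a hypothetical helper, and `fakeFindReplace` is only a stand-in object so the sketch runs anywhere. Whether `UltraEdit.activeDocument.findReplace` can be read and written through bracket notation like an ordinary object is an assumption I have not tested in UE.

```javascript
// Apply temporary property values to target, run fn(), then put the original
// values back, even if fn() throws.
// Hypothetical helper for illustration; not part of the UltraEdit API.
function withTemporarySettings(target, settings, fn) {
    var saved = {};
    var name;
    for (name in settings) {
        if (Object.prototype.hasOwnProperty.call(settings, name)) {
            saved[name] = target[name];       // remember the current value
            target[name] = settings[name];    // apply the temporary value
        }
    }
    try {
        return fn();
    } finally {
        for (name in saved) {
            if (Object.prototype.hasOwnProperty.call(saved, name)) {
                target[name] = saved[name];   // restore the original value
            }
        }
    }
}

// Stand-in for UltraEdit.activeDocument.findReplace (assumption, see above).
var fakeFindReplace = { mode: 1, matchCase: true, regExp: false, searchDown: false };

// Inside the callback the temporary settings are active.
var regExpDuringSearch = withTemporarySettings(
    fakeFindReplace,
    { mode: 0, matchCase: false, regExp: true, searchDown: true },
    function () { return fakeFindReplace.regExp; }
);

console.log(regExpDuringSearch, fakeFindReplace.regExp);
```

The try/finally means the user's search settings come back even when the search loop fails halfway, which the straight-line version in main() does not guarantee.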


      Dec 15, 2011#2

      Thank you very much for this script. :) It was almost what I was looking for.

      I commented out a few lines so the temporary document isn't closed and focus isn't returned to the original document, because I'd like to see the found matches after the script has run.

      Instead of only searching the active document I would have liked to search all files matching a pattern, for example "C:\dummy\*.txt". My scriptwriting ability is not that good, and so far I've only added one line:

      var searchFiles = UltraEdit.getString("Enter files to search.",1); // Search all files matching this pattern

      Would it be hard to alter the rest of the script? Or is there perhaps an easier way to achieve what I need?
      To avoid confusion, maybe I should point out that I only want one resulting list of unique matches, not one list per file.
      Meanwhile, I'll see if I can find any more hints in the forum to guide me.

      Best Regards
      UltraFanatic


        Dec 15, 2011#3

        What bmatsoukas surely needs to know: Are the files matching the entered pattern already open in UltraEdit, or must the script search for files matching the pattern and open them?

        Another question:

        Why not use the Find in Files command and then run a few regular expression replaces to strip what you don't want (file names, line numbers, how often the search string was found, etc.) from the results written to the output window or, better in this case, to a new document window?

        Using Find in Files is much quicker for your task than a script solution. Of course you could code a script which runs Find in Files with an entered file name pattern and an entered search string, and which removes the unwanted information automatically in the new file.


          Dec 15, 2011#4

          Thanks for your reply Mofi. :)

          My intention was that the script would find and open files too. However, your suggestion of using find in files seems to work well too.

          First I perform the regular expression search on all files, writing the result to an edit window, and then run bmatsoukas's script with the same regular expression to extract the unique strings from the result window. Is that how you meant?

          I've set the Find in Files parameter "Found Line" to $S. I can't see that there is a $-command to only write "found text" in the search result instead of the full line, but I may have missed it. Otherwise I could have just used File - Sort and remove duplicates from the search result.

          Take care
          UltraFanatic


            Dec 15, 2011#5

            UltraFanatic wrote:Is that how you meant?
            Yes, that is one method. Of course you can also modify the script written by bmatsoukas so that it runs Find in Files itself, with results written to an edit window, right at the beginning. Just move the block

            Code: Select all

                // Uncomment the appropriate line for the regExp engine you want to use:
                //UltraEdit.perlReOn();   // Perl
                UltraEdit.ueReOn();       // UltraEdit
                //UltraEdit.unixReOn();   // Unix
            to the top of function main() so that the selected regular expression engine is also used by the Find in Files command, which may now run first. Then insert below the line

            Code: Select all

            var searchUserName = UltraEdit.getString("Enter a search expression.",1);   // Get the search expression.
            the following block:

            Code: Select all

                var sFilePattern = UltraEdit.getString("Enter file search pattern:",1);     // Get the file pattern.
                if (sFilePattern != "") {
                   UltraEdit.frInFiles.searchSubs=false;
                   UltraEdit.frInFiles.directoryStart="";
                   UltraEdit.frInFiles.searchInFilesTypes=sFilePattern;
                   UltraEdit.frInFiles.filesToSearch=0;
                   UltraEdit.frInFiles.matchCase=false;
                   UltraEdit.frInFiles.matchWord=false;
                   UltraEdit.frInFiles.regExp=true;
                   UltraEdit.frInFiles.unicodeSearch=false;
                   UltraEdit.frInFiles.reverseSearch=false;
                   UltraEdit.frInFiles.useOutputWindow=false;
                   if (typeof(UltraEdit.frInFiles.openMatchingFiles) == "boolean")
                       UltraEdit.frInFiles.openMatchingFiles=false;
                   UltraEdit.frInFiles.find(searchUserName);
                }
            If you enter nothing for the file search pattern, the script simply runs on the active file as designed by bmatsoukas. Otherwise the Find in Files command is executed first, with results written to an edit window.

            It would perhaps be better to clean up the results list, at the end of the IF block posted above, before the rest of the script runs on it. That is not needed for real regular expression searches, but if you search for just a simple word, the results usually also contain header and summary lines which quote the searched string, and then the original script will always find the searched word even when it was not found in any file.

            Well, it looks like you have disabled all options to get as little information as possible in the results file. Therefore you don't need to run regular expression replaces to remove unwanted lines, the file names and the line numbers from the results of the Find in Files command.

            Because Find in Files always returns lines containing a found string there is no variable for just the found string.


              Dec 18, 2011#6

              Mofi wrote:Because Find in Files always returns lines containing a found string there is no variable for just the found string.
              Thanks for confirming that and for your addition to the script.
              After the line with "var homeDocIdx" I remove duplicates from the file when a file pattern has been given, to minimize the data for the second part of the script to process. In one case the number of lines dropped from 11783 to 3149.

              Code: Select all

              if (sFilePattern != "") {
                // Sort the search results and remove duplicates
                sortDoc(homeDocIdx);
              }
              I saw in another thread that you recommended to store values in a string before writing them to a document, so I altered that part too.

              Code: Select all

              var sFoundLines = "";
              // Find all regExp matches and add them to the string followed by a linebreak
              while(UltraEdit.activeDocument.findReplace.find(searchUserName)) {
                sFoundLines += UltraEdit.activeDocument.selection + "\r\n";
              }
              
              // Write found matches to the temporary document
              UltraEdit.document[outputDocIdx].write(sFoundLines);
              It would have been handy if the output format could be set temporarily in the script, so I wouldn't have to go to Advanced - Configuration - Set Find Output Format to uncheck the header and summaries and change Found Line to $S, and then go back and reset them after running the script. But I'm not complaining, it just would have been convenient since I mostly prefer using the default values.
              I guess I could keep the default setting and add code to the script that cleans up the search result as you suggest, but then I might lose more time on the script running than it takes me to set the output format manually. ;)


                Dec 19, 2011#7

                Why do you change the find output format before running the script? It would be much easier to let the script delete the unneeded lines, as I suggested. For example, for the default English find output format, the lines of no interest for this script could be removed with the following commands, inserted below the line UltraEdit.frInFiles.find(searchUserName); provided the UltraEdit regular expression engine is selected at the start of the script.

                Code: Select all

                       UltraEdit.activeDocument.findReplace.mode=0;
                       UltraEdit.activeDocument.findReplace.matchCase=true;
                       UltraEdit.activeDocument.findReplace.matchWord=false;
                       UltraEdit.activeDocument.findReplace.regExp=false;
                       UltraEdit.activeDocument.findReplace.searchDown=false;
                       if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
                           UltraEdit.activeDocument.findReplace.searchInColumn=false;
                       // Delete find summary and everything below.
                       if (UltraEdit.activeDocument.findReplace.find("Search complete, found")) {
                          UltraEdit.activeDocument.selectToBottom();
                          UltraEdit.activeDocument.deleteText();
                       }
                       UltraEdit.activeDocument.top();
                       UltraEdit.activeDocument.findReplace.regExp=true;
                       UltraEdit.activeDocument.findReplace.searchDown=true;
                       UltraEdit.activeDocument.findReplace.mode=0;
                       UltraEdit.activeDocument.findReplace.preserveCase=false;
                       UltraEdit.activeDocument.findReplace.replaceAll=true;
                       UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
                       // Delete headers and file summaries.
                       UltraEdit.activeDocument.findReplace.replace("%---*^p", "");
                       UltraEdit.activeDocument.findReplace.replace("%F[iou]+nd*^p", "");
                       // Delete the file names at start of every found line.
                       UltraEdit.activeDocument.findReplace.replace("%*([0-9]+): ", "");
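To see what these clean-up steps are meant to leave behind, here is a rough equivalent in plain JavaScript that works on a string instead of an UltraEdit document. The sample text only approximates the default English find output format; its exact layout is my assumption, inferred from the search strings used above, and the real output may differ between versions. `cleanFindInFilesResults` is a hypothetical name, and the patterns are JavaScript translations of the UltraEdit-syntax expressions, not the originals.

```javascript
// Strip Find in Files decoration from a results text, leaving only the found lines.
// Hypothetical helper for illustration; works on a string, not on a UE document.
function cleanFindInFilesResults(text) {
    // Delete the find summary and everything below it.
    var summaryPos = text.indexOf("Search complete, found");
    if (summaryPos >= 0) {
        text = text.slice(0, summaryPos);
    }
    // Delete separator lines starting with "---" (UE syntax: %---*^p).
    text = text.replace(/^---.*\r?\n/gm, "");
    // Delete "Find ..." headers and "Found ..." file summaries (UE: %F[iou]+nd*^p).
    text = text.replace(/^F[iou]+nd.*\r?\n/gm, "");
    // Delete "file name(line number): " at the start of every found line.
    text = text.replace(/^.*?\(\d+\): /gm, "");
    return text;
}

// Assumed shape of the results; paths and contents are made-up sample data.
var results = [
    "----------------------------------------",
    "Find 'svc' in 'C:\\dummy\\a.txt':",
    "C:\\dummy\\a.txt(3): account=svcWeb01",
    "C:\\dummy\\a.txt(9): account=svcSql",
    "Found 'svc' 2 time(s).",
    "Search complete, found 'svc' 2 time(s). (1 file(s))."
].join("\r\n") + "\r\n";

var cleaned = cleanFindInFilesResults(results);
console.log(cleaned);
```

After the clean-up only the two found lines remain, so a plain word search no longer hits the header and summary lines that quote the search string.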