Find all lines of an open file in a directory containing multiple files?

Advanced User

    Apr 06, 2017 #1

    I have a *.txt file containing multiple regex patterns, one per line, for example:

    Code:

    aff id="[a-z]+
    <author-footnotes>^p<sup>[0-9]+-[a-z]+</sup>
    <copyright-statement>[~&]
    <label>^p^p</label>
    <caption><p>^(*^)</p></caption>
    -graphic-[0-9]+[a-z]

    I'm trying to create a script which goes through the regex patterns line by line and checks whether each pattern has one or more matches in any of the *.xml files in a user-defined directory, and then writes the patterns that matched to a new file. For example, if <copyright-statement>[~&] and <caption><p>^(*^)</p></caption> do not match anything in the files, then the output file should contain:

    Code:

    aff id="[a-z]+
    <author-footnotes>^p<sup>[0-9]+-[a-z]+</sup>
    <label>^p^p</label>
    -graphic-[0-9]+[a-z]

    Below is the script code I'm currently using:

    Code:

    if (UltraEdit.document.length > 0)
    {
       UltraEdit.insertMode();
       if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
       else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
       UltraEdit.activeDocument.hexOff();
    
       UltraEdit.activeDocument.selectAll();
       if (UltraEdit.activeDocument.isSel())
       {
          var asLines = UltraEdit.activeDocument.selection.split("\r\n");
          UltraEdit.activeDocument.top();
          
          var sDirectory = UltraEdit.getString("Enter full directory path:",1);
          if (sDirectory[sDirectory.length-1] != '\\') sDirectory += '\\';
    
          UltraEdit.frInFiles.filesToSearch=0;
          UltraEdit.ueReOn();
          UltraEdit.frInFiles.directoryStart=sDirectory;
          UltraEdit.frInFiles.searchInFilesTypes="*.xml";
          UltraEdit.frInFiles.useEncoding=false;
          UltraEdit.frInFiles.ignoreHiddenSubs=true;
          UltraEdit.frInFiles.matchCase=false;
          UltraEdit.frInFiles.reverseSearch=false;
          UltraEdit.frInFiles.matchWord=false;
          UltraEdit.frInFiles.openMatchingFiles=false;
          UltraEdit.frInFiles.displayLinesDoNotMatch=false;
          UltraEdit.frInFiles.useOutputWindow=false;
          UltraEdit.frInFiles.searchSubs=true;
          UltraEdit.frInFiles.regExp=true;
    
          for (var nLineNum = 0; nLineNum < asLines.length; nLineNum++)
          {
             if (!asLines[nLineNum].length) continue;
             UltraEdit.frInFiles.find(asLines[nLineNum]);
          }
          UltraEdit.activeDocument.top();
          UltraEdit.insertMode();
          if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
          else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
          UltraEdit.activeDocument.hexOff();
          UltraEdit.ueReOn();
          UltraEdit.activeDocument.findReplace.matchCase=false;
          UltraEdit.activeDocument.findReplace.matchWord=false;
          UltraEdit.activeDocument.findReplace.regExp=true;
          UltraEdit.activeDocument.findReplace.searchDown=true;
          UltraEdit.activeDocument.findReplace.searchInColumn=false;
          UltraEdit.activeDocument.findReplace.preserveCase=false;
          UltraEdit.activeDocument.findReplace.replaceAll=true;
          UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
          UltraEdit.activeDocument.findReplace.replace("----------------------------------------", "");
          UltraEdit.activeDocument.findReplace.replace("%[~F]*$", "");
          UltraEdit.activeDocument.findReplace.replace("Found '", "");
          UltraEdit.activeDocument.findReplace.replace("' * time(s).", "");
          UltraEdit.activeDocument.findReplace.replace("%[^t ]++[^r^n]+", "");
          UltraEdit.activeDocument.sortAsc(0,false,true,1,-1);
       }
    }

    But the script has some issues:
    • I want the new output file to show the matched patterns exactly as they appear in the first file, i.e. <label>^p^p</label> should be written into the new file as <label>^p^p</label> on one line.
    • The script takes too much time to run. Can it be made more efficient?
    • I have approximately 100 patterns to search for, and some of them use Perl regex. Is it possible to tell the script to use UltraEdit regex from line 1 to, say, line 60 and Perl regex from line 61 to the end? If so, how?

    Grand Master

      Apr 26, 2017 #2

      Here is a script for this very unusual task:

      Code:

      // Definition of variables used in main script code as well as in functions.
      var g_asXmlFiles;    // List of XML file names with full path found in
                           // specified directory tree ignoring hidden directories.
      
      var g_asRegExps;     // List of UltraEdit and Perl regular expression find
                           // strings loaded from active file on script start.
      
      var g_sDirectory=""; // Directory path entered by user of script.
                           // It is also possible to define the path here.
      
      // The function OutputMessage opens output window and prints a message into it.
      
      function OutputMessage (sTextToOutput)
      {
         UltraEdit.outputWindow.clear();
         UltraEdit.outputWindow.write(sTextToOutput);
         if (UltraEdit.outputWindow.visible == false)
         {
            UltraEdit.outputWindow.showWindow(true);
         }
         UltraEdit.outputWindow.showStatus=false;
      }
      
      
      // The function GetRegExps validates that the active file has the expected
      // header and reads the regular expressions from it. It returns false if
      // an error occurred which makes it impossible to execute this script further.
      
      function GetRegExps ()
      {
         // Is there no file opened in UltraEdit?
         if (UltraEdit.document.length < 1)
         {
            OutputMessage("There is no file with regular expressions opened in UltraEdit.");
            return false;
         }
      
         // Select insert mode and turn off column mode.
         UltraEdit.insertMode();
         if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
         else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
      
         // Get line and column of current caret position.
         var nLine = UltraEdit.activeDocument.currentLineNum;
         var nColumn = UltraEdit.activeDocument.currentColumnNum;
         if (typeof(UltraEdit.activeDocumentIdx) == "undefined") nColumn++;
      
         // Is the active file an empty file?
         UltraEdit.activeDocument.top();
         if (UltraEdit.activeDocument.isEof())
         {
            OutputMessage("Active file is empty.");
            return false;
         }
      
         // Load first line of active file into a string variable.
         UltraEdit.activeDocument.selectLine();
         var sFirstLine = UltraEdit.activeDocument.selection;
      
         // If the following replace can't be done successfully because the first
         // line does not contain the expected information about the regular
         // expression type, the string variable sRegExpType is a copy of the
         // string variable sFirstLine instead of "okay".
         var sRegExpType = sFirstLine.replace(/^# (?:UltraEdit|Perl):\s+$/i,"okay");
      
         // Is the first line of active file not the expected header line?
         if (sRegExpType != "okay")
         {
            UltraEdit.activeDocument.gotoLine(nLine,nColumn);
            OutputMessage("First line of active file starts neither with\n\n# UltraEdit:\n\nnor with\n\n# Perl:\n\nor there are no regular expressions.");
            return false;
         }
      
         // Select everything in active file.
         UltraEdit.activeDocument.selectAll();
      
         // Load the selected lines into an array of strings.
         var sLineTerm = sFirstLine.replace(/^[^\r\n]+([\r\n]+)$/,"$1");
         g_asRegExps = UltraEdit.activeDocument.selection.split(sLineTerm);
      
         // Remove the last string if being an empty string
         // because of file ends with a line termination.
         if (!g_asRegExps[g_asRegExps.length-1].length) g_asRegExps.pop();
      
         // Restore initial caret position in active file.
         UltraEdit.activeDocument.gotoLine(nLine,nColumn);
      
         // A simple check if the active file contains at least 1 line with a regular expression.
         for (var nIndex = 1; nIndex < g_asRegExps.length; nIndex++)
         {
            // Is the string not empty?
            if (g_asRegExps[nIndex].length)
            {
               // Is the string neither "# UltraEdit:" nor "# Perl:"?
               if (g_asRegExps[nIndex].search(/^# (?:UltraEdit|Perl):[\t ]*$/i) < 0)
               {
                  return true;   // This string is a regular expression string.
               }
            }
         }
         OutputMessage("There are no regular expressions in active file.");
         return false;
      }  // End of function GetRegExps
      
      
      // The function FindXmlFiles is a customized version of the publicly
      // published function GetListOfFiles. It gets the file names with full
      // path of all XML files found in the specified directory and its non-hidden
      // subdirectories. It returns false if no XML file could be found at all.
      
      function FindXmlFiles ()
      {
         var sSummaryInfo = "Search complete, found";
         var sResultsDocTitle = "** Find Results ** ";
      
         UltraEdit.ueReOn();
         UltraEdit.frInFiles.directoryStart=g_sDirectory;
         UltraEdit.frInFiles.filesToSearch=0;
         UltraEdit.frInFiles.matchCase=false;
         UltraEdit.frInFiles.matchWord=false;
         UltraEdit.frInFiles.regExp=false;
         UltraEdit.frInFiles.ignoreHiddenSubs=true;
         UltraEdit.frInFiles.searchInFilesTypes="*.xml";
         UltraEdit.frInFiles.searchSubs=true;
         UltraEdit.frInFiles.useEncoding=false;
         UltraEdit.frInFiles.unicodeSearch=false;
         UltraEdit.frInFiles.useOutputWindow=false;
         UltraEdit.frInFiles.openMatchingFiles=false;
         UltraEdit.frInFiles.find("");
      
         var bListCreated = false;
         if (UltraEdit.activeDocument.path == sResultsDocTitle) bListCreated = true;
         else
         {
            for (var nDocIndex = 0; nDocIndex < UltraEdit.document.length; nDocIndex++)
            {
               if (UltraEdit.document[nDocIndex].path == sResultsDocTitle)
               {
                  UltraEdit.document[nDocIndex].setActive();
                  bListCreated = true;
                  break;
               }
            }
         }
      
         if (bListCreated == true && sSummaryInfo.length)
         {
            UltraEdit.activeDocument.findReplace.searchDown=false;
            UltraEdit.activeDocument.findReplace.matchCase=true;
            UltraEdit.activeDocument.findReplace.matchWord=false;
            UltraEdit.activeDocument.findReplace.regExp=false;
            UltraEdit.activeDocument.findReplace.find(sSummaryInfo);
            bListCreated = UltraEdit.activeDocument.isFound();
         }
      
         UltraEdit.activeDocument.findReplace.searchDown=true;
         if (bListCreated == false)
         {
            OutputMessage("There is a problem with frInFiles command or the strings of the script variables\n\"sSummaryInfo\" or \"sResultsDocTitle\" are not adapted to your version of UE/UES!");
            return false;
         }
      
         if (sSummaryInfo.length) UltraEdit.activeDocument.deleteLine();
      
         UltraEdit.activeDocument.top();
         UltraEdit.activeDocument.key("RIGHT ARROW");
      
         if (UltraEdit.activeDocument.currentPos > 1) UltraEdit.activeDocument.unicodeToASCII();
         else UltraEdit.activeDocument.top();
      
         if (UltraEdit.activeDocument.isEof())
         {
            UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
            OutputMessage("There are no *.xml files found in:\n\n" + g_sDirectory);
            return false;
         }
      
         // Determine line termination of find in files results file and
         // load the found file names with path into an array of strings.
         UltraEdit.activeDocument.selectLine();
         var sLineTerm = UltraEdit.activeDocument.selection.replace(/^[^\r\n]+([\r\n]+)$/,"$1");
         UltraEdit.activeDocument.selectAll();
         g_asXmlFiles = UltraEdit.activeDocument.selection.split(sLineTerm);
         UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
         g_asXmlFiles.pop();
      
         return true;
      }  // End of function FindXmlFiles
      
      
      // The function RegExpSearch runs an UltraEdit or Perl regular expression
      // Find in Files on each XML file in list until either a matching string
      // was found in an XML file or nothing found in any XML file. The return
      // value is true if a string could be found in an XML file, otherwise false.
      // The code expects Find in Files summary is written to results file with
      // the string "(0 file(s))" in case of searched string not found in XML file.
      
      function RegExpSearch (sSearchRegExp)
      {
         for (var nXmlIndex = 0; nXmlIndex < g_asXmlFiles.length; nXmlIndex++)
         {
            UltraEdit.frInFiles.searchInFilesTypes = g_asXmlFiles[nXmlIndex];
            UltraEdit.frInFiles.find(sSearchRegExp);
            if(!UltraEdit.activeDocument.findReplace.find("(0 file(s))"))
            {
               return true;
            }
            UltraEdit.activeDocument.selectAll();
            UltraEdit.activeDocument.deleteText();
         }
         return false;
      }
      
      
      // Here starts main code of script.
      
      if(GetRegExps())
      {
         // Let script user enter the directory path which must be a non empty string,
         // except the directory path is already defined at top of this script.
         while (!g_sDirectory.length)
         {
             g_sDirectory = UltraEdit.getString("Enter full directory path:",1);
         }
      
         // Make sure the directory path ends with a backslash as
         // required for find in files used later by the script.
         if (g_sDirectory[g_sDirectory.length-1] != '\\') g_sDirectory += '\\';
      
         // Get list of XML files in that directory and its non hidden subdirectories.
         if (FindXmlFiles())
         {
            // Define once all Find in Files parameters which
            // stay the same for every search executed below.
            UltraEdit.frInFiles.directoryStart="";
            UltraEdit.frInFiles.filesToSearch=0;
            UltraEdit.frInFiles.matchCase=false;
            UltraEdit.frInFiles.matchWord=false;
            UltraEdit.frInFiles.regExp=true;
            UltraEdit.frInFiles.ignoreHiddenSubs=false;
            UltraEdit.frInFiles.searchSubs=false;
            UltraEdit.frInFiles.useEncoding=false;
            UltraEdit.frInFiles.unicodeSearch=false;
            UltraEdit.frInFiles.useOutputWindow=false;
            UltraEdit.frInFiles.openMatchingFiles=false;
      
            // Define once the parameters for the Find executed on results
            // file in upwards direction to examine the summary information.
            UltraEdit.activeDocument.findReplace.mode=0;
            UltraEdit.activeDocument.findReplace.matchCase=true;
            UltraEdit.activeDocument.findReplace.matchWord=false;
            UltraEdit.activeDocument.findReplace.regExp=false;
            UltraEdit.activeDocument.findReplace.searchDown=false;
            UltraEdit.activeDocument.findReplace.searchInColumn=false;
      
            // Process the regular expressions by running each of them on one
            // XML file after the other to find out as fast as possible if the
            // regular expression matches any string in any XML file.
            var nRegExpCount = 0;
            var nRegExpDeleted = 0;
            for (var nRegExpIndex = 0; nRegExpIndex < g_asRegExps.length; nRegExpIndex++)
            {
               var sRegExp = g_asRegExps[nRegExpIndex];
               // Skip empty strings.
               if (!sRegExp.length) continue;
      
               // Set appropriate regular expression on string being either
               // "# UltraEdit:" or "# Perl:" and continue array processing.
               if (sRegExp.search(/^# (?:UltraEdit|Perl):[\t ]*$/i) == 0)
               {
                  if ((sRegExp[2] == 'U') || (sRegExp[2] == 'u'))
                  {
                     UltraEdit.ueReOn();
                  }
                  else
                  {
                     UltraEdit.perlReOn();
                  }
                  continue;
               }
      
               // This string is a regular expression string to count and process.
               nRegExpCount++;
      
               // Does this regular expression not match anything in any XML file?
               if (!RegExpSearch(sRegExp))
               {
                  // Remove this regular expression string from array.
                  g_asRegExps.splice(nRegExpIndex,1);
                  nRegExpDeleted++;
                  nRegExpIndex--;
               }
            }
      
            // Close the Find in Files results file without saving.
            UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      
            // Has no regular expression matched any string in any file.
            var sPluralS = (nRegExpCount > 1) ? "s" : "";
            if (nRegExpCount == nRegExpDeleted)
            {
               OutputMessage("The "+nRegExpCount+" regular expression"+sPluralS+
                             " did not match any string in any XML file.");
            }
            else
            {
               g_asRegExps.push("");   // For last line termination in file.
      
               // Join the strings to a block and escape each occurrence of
               // ^b, ^c, ^n, ^p, ^r, ^t, ^s, ^^ with inserting an additional
               // caret before each caret to get those special character
               // sequences written correct into the new file.
               var sBlock = g_asRegExps.join("\r\n");
               sBlock = sBlock.replace(/\^\^/g,"^^^^");
               sBlock = sBlock.replace(/(\^[bcnprst])/g,"^$1");
      
               // Create a new file with DOS line terminations
               // and write the block into this new file.
               UltraEdit.newFile();
               UltraEdit.activeDocument.unixMacToDos();
               UltraEdit.activeDocument.write(sBlock);
      
               // Finally output a summary information.
               var nRegExpMatch = nRegExpCount - nRegExpDeleted;
               OutputMessage(nRegExpMatch+" from "+nRegExpCount+" regular expression"+
                             sPluralS+" matched a string in an XML file.");
            }
         }
      }
      
      The script requires that the first line in the active file is either # UltraEdit: or # Perl: to indicate that all lines below it are UltraEdit or Perl regular expression search strings, respectively. These two header lines can appear multiple times in the active file to switch the regular expression engine for the search strings that follow.
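
The header convention described above can be sketched outside UltraEdit as plain JavaScript. This is only an illustration under stated assumptions: the function name parsePatternList and the returned object shape are hypothetical and not part of the script, and splitting on any line terminator is a simplification of how the script detects the terminator from the first line.

```javascript
// Hypothetical helper, not part of the script above: splits the text of a
// pattern file into entries tagged with the engine named by the most recent
// "# UltraEdit:" or "# Perl:" header line.
function parsePatternList(text) {
  const headerRe = /^# (UltraEdit|Perl):[\t ]*$/i;
  const entries = [];
  let engine = null; // stays null until the first header line is seen

  for (const line of text.split(/\r\n|\r|\n/)) {
    if (!line.length) continue;          // skip empty lines
    const header = headerRe.exec(line);
    if (header) {
      engine = header[1].toLowerCase();  // "ultraedit" or "perl"
      continue;
    }
    if (engine === null) {
      throw new Error("Pattern found before any engine header line");
    }
    entries.push({ engine: engine, pattern: line });
  }
  return entries;
}

const samplePatternFile =
  "# UltraEdit:\r\n[0-9]+\r\nTest[0-9]+\r\n# Perl:\r\n\\d+\r\nTest\\d+\r\n";
console.log(parsePatternList(samplePatternFile));
```

Each entry carries its engine, so a header line can appear as often as needed and no line count has to be maintained.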

      I thought this would be easier to manage than a number at the top of the file telling the script how many regular expression strings are in UE syntax, with the rest in Perl syntax. That number would have to be updated correctly whenever strings are added to or removed from the file, which is more difficult to handle than a line telling the script the type of the regular expressions below it.

      How the script improves efficiency and what it does to write a regular expression string with ^p correctly into the output file can be read in the comments of the script.
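
As a rough illustration of the caret handling, the two replacements the script applies before writing the block can be wrapped in a standalone function. The name escapeCaretsForWrite is hypothetical; the two regular expressions are copied from the script.

```javascript
// Same two replacements as in the script above, wrapped in a function.
// The function name is made up for this sketch.
function escapeCaretsForWrite(text) {
  return text
    .replace(/\^\^/g, "^^^^")           // double each existing ^^ pair
    .replace(/(\^[bcnprst])/g, "^$1");  // prefix ^b ^c ^n ^p ^r ^s ^t with a caret
}

console.log(escapeCaretsForWrite("<label>^p^p</label>"));
// prints "<label>^^p^^p</label>"
```

A string without any caret, such as -graphic-[0-9]+[a-z], passes through unchanged.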
      Best regards from an UC/UE/UES for Windows user from Austria

      Advanced User

        Jul 09, 2017 #3

        Dear Mofi,
        The script seems to work. However, when I put the "ultraedit" patterns first and the "perl" patterns second, for some reason all "perl" patterns are shown in the result window even though not all of them match. The script also takes quite some time to run.

        Sorry for the late reply. I did not get any response within 7 days of posting this topic, so I forgot about it.
        I do appreciate your help :)

        Grand Master

          Jul 09, 2017 #4

          First I created a file Test.txt as an ANSI file with the following lines for quickly testing the script:

          Code:

          # UltraEdit:
          [0-9]+
          Test[0-9]+
          # Perl:
          \d+
          Test\d+

          Next I created a UTF-8 encoded XML file named Test.xml with the following lines:

          Code:

          <?xml version="1.0" encoding="UTF-8"?>
          <Test>
             <Number>1</Number>
             <String>Test</String>
          </Test>

          The XML file was closed after creating it; the file Test.txt remained open.

          Then I executed the script from the Script List and entered the directory path to the folder containing all 3 files. The result was a new file with the lines:

          Code:

          # UltraEdit:
          [0-9]+
          # Perl:
          \d+

          That is absolutely correct for UltraEdit and for Perl regular expression search strings.

          When I change <String>Test</String> to <String>Test23</String> in the XML file, the new file contains the same lines as Test.txt, which is again the right result.
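
The test above can be approximated outside UltraEdit with a small sketch. It rests on assumptions: JavaScript's RegExp stands in for both UltraEdit engines (the four test patterns happen to mean the same in JavaScript), the file content is held in memory instead of being searched with Find in Files, and the name filterMatchingPatterns is made up for illustration.

```javascript
// Illustrative only: keeps the patterns that match in at least one file.
function filterMatchingPatterns(patterns, fileContents) {
  return patterns.filter(function (pattern) {
    const re = new RegExp(pattern, "i"); // case-insensitive like matchCase=false
    // some() stops at the first file with a match, similar to the early
    // return in RegExpSearch.
    return fileContents.some(function (content) {
      return re.test(content);
    });
  });
}

const testXml = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  "<Test>",
  "   <Number>1</Number>",
  "   <String>Test</String>",
  "</Test>"
].join("\r\n");

const testPatterns = ["[0-9]+", "Test[0-9]+", "\\d+", "Test\\d+"];
console.log(filterMatchingPatterns(testPatterns, [testXml]));
// prints [ '[0-9]+', '\\d+' ]
```

Replacing Test</String> with Test23</String> in testXml makes all four patterns pass the filter, which agrees with the second result described above.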

          Advanced User

            Jul 09, 2017 #5

            Thanks :)