Script to find multiple items and output them

tahoejunk · May 31, 2012 22:30 UTC

I am looking for a script or macro to find two text strings, like below, in a large file:

***2012
Item1
Item
Item3
Line XYZ Bad Note
Item7
Item9
Line XYZ Good Note
Item10

***2012
Item7
Item
Item5
Line XYZ Bad Note
Item6
Item3
Line XYZ Good Note
Item3

I want to find every section that contains both "Line XYZ Bad Note" and "Line XYZ Good Note" and copy all the sections found into a new file.
Each section always starts with ***2012, and a section consists of that line plus the 8 lines below it.
Would it also be possible to get a count of the sections found that contain both strings?
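The selection rule described above can be sketched in plain JavaScript, run with Node.js rather than inside UltraEdit, on an in-memory copy of the sample data. The function name findSections and the constants are hypothetical, and the sketch assumes the two note lines match the search strings exactly, with no extra text on the line.

```javascript
// Plain JavaScript sketch for Node.js, not an UltraEdit script.
// Hypothetical names; the header text and note lines come from the sample.
const SECTION_START = "***2012";   // line that opens a section
const LINES_AFTER_HEADER = 8;      // a section is the header plus 8 more lines
const REQUIRED = ["Line XYZ Bad Note", "Line XYZ Good Note"];

function findSections(text) {
  const lines = text.split(/\r?\n/);   // accept DOS or UNIX line endings
  const sections = [];
  lines.forEach((line, i) => {
    if (line !== SECTION_START) return;
    // Header plus the next 8 lines (fewer if the file ends early).
    const block = lines.slice(i, i + 1 + LINES_AFTER_HEADER);
    // Keep the section only if both note lines appear in it verbatim.
    if (REQUIRED.every((note) => block.includes(note))) {
      sections.push(block.join("\r\n"));
    }
  });
  return sections;
}

const sample = [
  "***2012", "Item1", "Item", "Item3", "Line XYZ Bad Note",
  "Item7", "Item9", "Line XYZ Good Note", "Item10", "",
  "***2012", "Item7", "Item", "Item5", "Line XYZ Bad Note",
  "Item6", "Item3", "Line XYZ Good Note", "Item3",
].join("\r\n");

const found = findSections(sample);
console.log("Found " + found.length + " sections");   // Found 2 sections
console.log(found.join("\r\n\r\n"));
```

The count is simply the length of the returned array, so the same pass answers both parts of the question.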

Mofi · Jun 01, 2012 07:24 UTC

The following script works for your example. I suspect it will not work for your real data, as those are probably laid out differently, but it already demonstrates the technique to use.

// Is any file currently open?
if (UltraEdit.document.length > 0) {

   UltraEdit.insertMode();
   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
   UltraEdit.activeDocument.hexOff();

   UltraEdit.activeDocument.bottom();
   // Is the last line of the file terminated with a line ending?
   if (UltraEdit.activeDocument.isColNumGt(1)) {
      // No, insert missing line ending (DOS, UNIX or MAC).
      UltraEdit.activeDocument.insertLine();
      // If auto indent is enabled and last line starts with white-spaces,
      // delete the automatically inserted whitespaces in the new last line.
      if (UltraEdit.activeDocument.isColNumGt(1)) {
         UltraEdit.activeDocument.deleteToStartOfLine();
      }
   }

   UltraEdit.activeDocument.top();
   var nActiveClipboard = UltraEdit.clipboardIdx;
   UltraEdit.selectClipboard(9);
   UltraEdit.clearClipboard();

   UltraEdit.perlReOn();
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
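
   var nCount = 0;
   // Each Find selects one complete section: the ***20xx line, the Bad Note
   // line 4 lines below it, the Good Note line 3 lines below that, and one
   // more line. The expression expects DOS line endings (CR+LF).
   while( UltraEdit.activeDocument.findReplace.find("^\\*\\*\\*20\\d\\d(?:.*\\r\\n){4}Line.+Bad Note(?:.*\\r\\n){3}Line.+Good Note.*\\r\\n.*\\r\\n")) {
      // Append the selected section to clipboard 9, followed by a blank line.
      UltraEdit.activeDocument.copyAppend();
      UltraEdit.clipboardContent += "\r\n";
      nCount++;
   }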

   UltraEdit.activeDocument.top();
   if (nCount) {
      UltraEdit.newFile();
      UltraEdit.activeDocument.paste();
      UltraEdit.activeDocument.top();
      UltraEdit.clearClipboard();
      UltraEdit.messageBox("Found "+nCount+" section"+(nCount==1?"":"s")+" with a bad and a good note.");
   } else UltraEdit.messageBox("There is no section containing a bad and a good note!");
   UltraEdit.selectClipboard(nActiveClipboard);
} else UltraEdit.messageBox("You should have a file open when you run this script!");
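To experiment with the search expression outside UltraEdit, the same pattern can be run in plain JavaScript with Node.js. This is a sketch, not UltraEdit API: the regex literal is the script's search string without the JavaScript string escaping, collectSections is a hypothetical helper that mimics the copyAppend loop, and DOS (CR+LF) line endings are assumed, as in the script.

```javascript
// Plain JavaScript (Node.js) check of the search expression used by the
// UltraEdit script above. Assumes DOS (CR+LF) line endings.
const fixedLayout =
  /^\*\*\*20\d\d(?:.*\r\n){4}Line.+Bad Note(?:.*\r\n){3}Line.+Good Note.*\r\n.*\r\n/gm;

// Hypothetical helper: collect all matches and build the new file's content.
function collectSections(text, pattern) {
  const blocks = text.match(pattern) || [];
  // Like copyAppend plus the extra "\r\n": a blank line after each section.
  const output = blocks.map((block) => block + "\r\n").join("");
  return { count: blocks.length, output: output };
}

// Sample data from the question; the last line is terminated, as the script ensures.
const sample = [
  "***2012", "Item1", "Item", "Item3", "Line XYZ Bad Note",
  "Item7", "Item9", "Line XYZ Good Note", "Item10", "",
  "***2012", "Item7", "Item", "Item5", "Line XYZ Bad Note",
  "Item6", "Item3", "Line XYZ Good Note", "Item3",
].join("\r\n") + "\r\n";

const result = collectSections(sample, fixedLayout);
console.log("Found " + result.count + " section" + (result.count === 1 ? "" : "s") +
            " with a bad and a good note.");
process.stdout.write(result.output);
```

Because the line counts {4} and {3} are fixed, a section whose Bad Note sits on a different line is not matched at all.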

tahoejunk · Jun 01, 2012 14:10 UTC

Thank you, this works for most items. However, I should have mentioned some additional variations in the data. One is that "Line XYZ Bad Note" is not always 4 lines after ***2012. Can the script be modified to find "Line XYZ Bad Note" first, then search upward for ***2012 to mark the start of the section, and then count 8 lines down from there to end the section?
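For comparison, the approach asked for here (find a Bad Note line, search upward for the ***2012 header, then take the header and the 8 lines below it) can be sketched in plain JavaScript for Node.js. This does not use UltraEdit's find commands; the function name and the sample sections are hypothetical, and the sketch assumes the Bad Note must lie within those 8 lines while the Good Note may appear anywhere in the same window.

```javascript
// Plain JavaScript (Node.js) sketch; sectionsAroundBadNote is a hypothetical name.
const HEADER = "***2012";
const LINES_AFTER_HEADER = 8;

function sectionsAroundBadNote(text) {
  const lines = text.split(/\r?\n/);
  const reported = new Set();   // header line indexes already copied
  const sections = [];
  lines.forEach((line, i) => {
    if (!line.includes("Bad Note")) return;
    // Search upward from the Bad Note line for the nearest header.
    let start = i;
    while (start >= 0 && !lines[start].startsWith(HEADER)) start--;
    // Skip if there is no header above, the note lies outside the 8-line
    // window, or this section was already copied.
    if (start < 0 || i - start > LINES_AFTER_HEADER || reported.has(start)) return;
    const block = lines.slice(start, start + 1 + LINES_AFTER_HEADER);
    if (block.some((l) => l.includes("Good Note"))) {
      reported.add(start);
      sections.push(block.join("\r\n"));
    }
  });
  return sections;
}

// Made-up sample: Bad Note at different positions, last section has no Good Note.
const sample = [
  "***2012", "Item1", "Line XYZ Bad Note", "Item3", "Item4",
  "Item7", "Line XYZ Good Note", "Item9", "Item10", "",
  "***2012", "Item7", "Item", "Item5", "Line XYZ Bad Note",
  "Item6", "Item3", "Line XYZ Good Note", "Item3", "",
  "***2012", "Item2", "Line XYZ Bad Note", "Item4",
].join("\r\n");

const sections = sectionsAroundBadNote(sample);
console.log("Found " + sections.length + " sections");   // Found 2 sections
```

Because the window is always 8 lines, a shorter section would borrow lines from the next one, which is one reason a block that ends at the next blank line, as in the following reply, is more robust.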

Mofi · Jun 02, 2012 09:59 UTC

The script could be rewritten to search for Bad Note, then search upward for ***2012, and use one more Find to select the block to copy. But that would make the script much slower. I think you get the same result much faster by using the following Perl regular expression as the search string of the findReplace.find call in the script above:

"^\\*\\*\\*20\\d\\d.*\\r\\n(?:[^*\\r\\n].*\\r\\n)*.*Bad Note.*\\r\\n(?:[^*\\r\\n].*\\r\\n)*.*Good Note.*\\r\\n(?:[^*\\r\\n].*\\r\\n)*"

Without the escaping required inside a JavaScript string, this rather complex Perl regular expression reads:

"^\*\*\*20\d\d.*\r\n(?:[^*\r\n].*\r\n)*.*Bad Note.*\r\n(?:[^*\r\n].*\r\n)*.*Good Note.*\r\n(?:[^*\r\n].*\r\n)*"

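It matches a block that starts with a line beginning with ***20 followed by two more digits, and that contains the string Bad Note and, on a later line, the string Good Note. The block ends at the next blank line or the next line starting with an asterisk, so it no longer matters on which line of the section each note appears.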
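To see what the escaped string turns into and how the flexible expression behaves on sections of different lengths, here is a plain JavaScript check for Node.js. JavaScript's RegExp engine stands in for UltraEdit's Perl engine here; the constructs used (character classes, non-capturing groups, greedy quantifiers) are assumed to behave the same for this expression, which is worth confirming in UltraEdit itself. The sample sections are made up.

```javascript
// The search string exactly as written for the UltraEdit script (escaped for a
// JavaScript string literal); RegExp() turns it into the unescaped pattern.
const searchString =
  "^\\*\\*\\*20\\d\\d.*\\r\\n(?:[^*\\r\\n].*\\r\\n)*.*Bad Note.*\\r\\n(?:[^*\\r\\n].*\\r\\n)*.*Good Note.*\\r\\n(?:[^*\\r\\n].*\\r\\n)*";
const flexible = new RegExp(searchString, "gm");

// Made-up sections of different lengths, separated by blank lines (CR+LF).
// The third one has Good Note before Bad Note and must not match.
const sample = [
  "***2012", "Item1", "Line XYZ Bad Note", "Line XYZ Good Note", "",
  "***2012", "Item7", "Item", "Item5", "Item6", "Line XYZ Bad Note",
  "Item3", "Item4", "Item8", "Line XYZ Good Note", "Item3", "",
  "***2012", "Line XYZ Good Note", "Line XYZ Bad Note", "Item2", "",
].join("\r\n");

const blocks = sample.match(flexible) || [];
console.log(flexible.source);                        // pattern without string escaping
console.log("Found " + blocks.length + " sections"); // Found 2 sections
```

Each match stops right before the blank separator line, so appending "\r\n" after every copyAppend, as the script does, still leaves one blank line between the copied sections.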