How to copy all lines in data file A containing a string in list file B to a new file?

    Dec 16, 2015 #1

    As a new user I'm still challenged by the many things this tool can do but I've run into a need that I haven't been able to figure out yet.

    Is it possible to create a script that takes records from file A and copies them to a new file C, but only if a text string in the record matches one of the values listed in file B?

    For example:

    File A contains:

    Code:

    DTL123456              John Doe, etc         (hundred thousand rows+)

    File B contains:

    Code:

    123456                                       (under a few thousand rows)

    And I'd like to create File C with the full record from A where positions 4-23 match positions 1-20 on any one of the records in file B.

    I've tried searching through the forum topics but didn't really see anything that addressed this.

    Any advice would be appreciated.
    Thanks

      Dec 16, 2015 #2

      This is not the first time I have written a script to copy / extract / grep data from a data file to a new file depending on data in a list file.

      One example is the topic "How to find lines containing a number stored in a list into another file?"

      But this task can't be done with a general script because there are often special requirements.

      Please read the comments in the script before using it the first time.

      Requirements for script execution:
      • English UltraEdit is used, or the 4 string values at the top are adjusted as explained in the comments.
      • The first opened file must be the data file, which can't be a new, unsaved file.
      • The second opened file must be the list file with the numbers, which can also be a new, unsaved file.
      • Other files are ignored by the script, with the exception of an unsaved Find in Files results file, which
        is automatically closed by the script without saving it.

      The third file could be the script file itself, with the code posted below saved as an ASCII/ANSI file with DOS line terminators under a name like FindAndCopyLinesFromList.js and executed by clicking on Run Active Script in menu Scripting.

      Code:

      // This script is written for English UltraEdit.
      
      // For other languages adapt the strings defined below.
      
      // Run any Find in Files with results written to an output window to
      // get the localized strings of document title for results file, the
      // file header, the file summary and the find summary information at
      // bottom of results file. Click in menu "Advanced" on "Configuration"
      // and navigate in tree to "Search - Set Find Output Format" which must
      // match to script code below and the variables.
      
      var sResultsDocTitle = "** Find Results ** ";  // Note the space at end!
      var sFindSummaryInfo = "Search complete, found";
      // Perl regular expression search strings for header and file summary.
      var sFileHeaderInfo  = "Find '.+' in '.+':";
      var sFileSummaryInfo = "Found '.+' [0-9]+ time[(s).]+";
      
      // Define maximum number of data for each search, i.e. strings from list
      // file in one OR expression in Perl regular expression search string.
      // The search string can't be of unlimited length which is the reason
      // for this constant which should be set according to length of data
      // in list file. The longer the lines, the smaller should be the value.
      var nMaxDataPerSearch = 100;
      
      // Close Find in Files results file without saving if existing at all.
      
      for (var nDocIndex = 0; nDocIndex < UltraEdit.document.length; nDocIndex++)
      {
         if (UltraEdit.document[nDocIndex].path == sResultsDocTitle)
         {
            UltraEdit.closeFile(UltraEdit.document[nDocIndex].path,2);
            break;
         }
      }
      
      if (UltraEdit.document.length >= 2)  // Are at least 2 files opened?
      {
         // Define the environment for the script.
         UltraEdit.insertMode();
         if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
         else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
         UltraEdit.perlReOn();
      
         // The first opened file - most left in open file tabs bar - must be the
         // named data file (new, unsaved file not possible) containing the lines
         // which should be searched for. From this data file is needed just the
         // file name with full path.
      
         var sNameOfDataFile = UltraEdit.document[0].path;
      
         // The second opened file must be the list file containing the numbers
         // to search for in data file. This can be also a new, unnamed file. It
         // must be small enough to load it completely into memory as an array
         // of strings. It is expected that it does not contain empty lines.
      
         UltraEdit.document[1].selectAll();
         if (UltraEdit.document[1].isSel())
         {
            var sLineTerm;
            if (UltraEdit.document[1].lineTerminator <= 0) sLineTerm = "\r\n";
            else if (UltraEdit.document[1].lineTerminator == 1) sLineTerm = "\n";
            else sLineTerm = "\r";
      
            // Get the selected lines as an array of strings.
            var asSearchData = UltraEdit.document[1].selection.split(sLineTerm);
            UltraEdit.document[1].top();  // Just for discarding the selection.
      
            // If last string in array is an empty string because the list file
            // ends with a line termination, remove this empty string from array.
            if (!asSearchData[asSearchData.length-1].length) asSearchData.pop();
      
            // Define parameters for the Find in Files executed below in a
            // loop for the lines with the data to search for in data file.
            UltraEdit.frInFiles.filesToSearch=0;               // Search in a directory.
            UltraEdit.frInFiles.directoryStart="";             // Search in just 1 named file.
            UltraEdit.frInFiles.searchInFilesTypes=sNameOfDataFile;
            UltraEdit.frInFiles.useEncoding=false;             // Run an ANSI search.
            UltraEdit.frInFiles.ignoreHiddenSubs=true;         // Ignore hidden subdirectories.
            UltraEdit.frInFiles.matchCase=true;                // Run a case sensitive search.
            UltraEdit.frInFiles.reverseSearch=false;           // Do not find files not containing searched string.
            UltraEdit.frInFiles.matchWord=false;               // Search for strings and not entire words.
            UltraEdit.frInFiles.openMatchingFiles=false;       // Do not open files with string found.
            UltraEdit.frInFiles.displayLinesDoNotMatch=false;  // Do not find lines not containing search string.
            UltraEdit.frInFiles.useOutputWindow=false;         // Output find results to edit window.
            UltraEdit.frInFiles.searchSubs=false;              // Do not search in subdirectories.
            UltraEdit.frInFiles.regExp=true;                   // Run a regular expression search.
      
            // Make sure max data per search is at least 1 (no OR expression used).
            if (nMaxDataPerSearch < 1) nMaxDataPerSearch = 1;
      
            var sFindExp;
            var nDataIndex = 0;
      
            while(nDataIndex < asSearchData.length)
            {
               var nOrCount = asSearchData.length - nDataIndex;
               if (nOrCount > nMaxDataPerSearch) nOrCount = nMaxDataPerSearch;
      
               // The lines to find contain the data to search for
               // at start of the line after 3 other characters.
               sFindExp = "^...";
               if (nOrCount == 1)   // Just one search data left?
               {
                  sFindExp += asSearchData[nDataIndex];
                  nDataIndex++;
               }
               else  // Use an OR expression because of multiple search data.
               {
                  sFindExp += "(?:" + asSearchData[nDataIndex];
                  nDataIndex++;
                  for (nOrCount--; nOrCount > 0; nOrCount--)
                  {
                     sFindExp += "|" + asSearchData[nDataIndex];
                     nDataIndex++;
                  }
                  sFindExp += ")";
               }
               // Make sure the searched number is not found within a larger number.
               sFindExp += "\\b";
               UltraEdit.frInFiles.find(sFindExp);
            }
      
            // The results file is the active file now. Move caret to top
            // of this file and convert the file from Unicode to ASCII/ANSI.
            UltraEdit.activeDocument.top();
            UltraEdit.activeDocument.unicodeToASCII();
      
            // Clean up the results file by removing all headers, all file and
            // find summaries and all occurrences of file name with line number
            // at beginning of each found line.
      
            // Commenting out the lines below makes it possible to see in the
            // results file which search strings were used on each Find in Files.
      
            if (UltraEdit.activeDocument.lineTerminator <= 0) sLineTerm = "\r\n";
            else if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
            else sLineTerm = "\r";
      
            UltraEdit.activeDocument.findReplace.mode=0;
            UltraEdit.activeDocument.findReplace.matchCase=true;
            UltraEdit.activeDocument.findReplace.matchWord=false;
            UltraEdit.activeDocument.findReplace.regExp=true;
            UltraEdit.activeDocument.findReplace.searchDown=true;
            UltraEdit.activeDocument.findReplace.searchInColumn=false;
            UltraEdit.activeDocument.findReplace.preserveCase=false;
            UltraEdit.activeDocument.findReplace.replaceAll=true;
            UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
      
            sFindExp = "^(?:-{40}|" +             // first line of header
                       sFileHeaderInfo + "|" +    // second line of header
                       sFileSummaryInfo + "|" +   // file summary information
                       sFindSummaryInfo + ".+)" + // find summary information
                       sLineTerm;                 // line termination
      
            UltraEdit.activeDocument.findReplace.replace(sFindExp,"");
      
            // Remove also name of file with path with line number in parentheses
            // and a colon and a space from start of all the remaining lines.
            // Each backslash in file name with path must be escaped with one
            // more backslash for the Perl regular expression search string.
            sFindExp = "^" + sNameOfDataFile.replace(/\\/g,"\\\\") + "\\(\\d+\\): ";
      
            UltraEdit.activeDocument.findReplace.replace(sFindExp,"");
         }
      }
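
The loop that builds the search strings can also be tried outside UltraEdit. What follows is only a minimal sketch in plain JavaScript and is not part of the script above. The function names buildFindExpressions and escapeForRegExp are made up for illustration, and the escaping step is an added assumption for list files that might contain characters such as . or ( which have a special meaning in a Perl regular expression. The script above inserts the list strings unescaped, which is fine for pure numbers.

```javascript
// Hypothetical helper (not in the script above): put a backslash in front
// of every character with a special meaning in a regular expression.
function escapeForRegExp(sText) {
   return sText.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split the list data into chunks of at most
// nMaxDataPerSearch strings and build one search expression per chunk,
// in the same form as the while loop in the script above.
function buildFindExpressions(asSearchData, nMaxDataPerSearch) {
   // At least one string per search (no OR expression in that case).
   if (nMaxDataPerSearch < 1) nMaxDataPerSearch = 1;

   var asExpressions = [];
   for (var nDataIndex = 0; nDataIndex < asSearchData.length;
        nDataIndex += nMaxDataPerSearch) {
      var asChunk = asSearchData
         .slice(nDataIndex, nDataIndex + nMaxDataPerSearch)
         .map(escapeForRegExp);

      // A single string needs no non-capturing OR group.
      var sAlternatives = (asChunk.length == 1)
         ? asChunk[0]
         : "(?:" + asChunk.join("|") + ")";

      // Three arbitrary characters at line start, then the data, then a
      // word boundary so a number is not found within a larger number.
      asExpressions.push("^..." + sAlternatives + "\\b");
   }
   return asExpressions;
}

// Example with three list entries and at most 2 entries per search.
var asResult = buildFindExpressions(["123456", "234567", "345678"], 2);
```

With these three list entries and a maximum of 2 per search, asResult should hold the two expressions ^...(?:123456|234567)\b and ^...345678\b, which corresponds to two Find in Files runs in the script.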
      
      Best regards from an UC/UE/UES for Windows user from Austria