Pulling Common Lines from 4 Files to Make a New One

gutzy · Feb 19, 2012#12012-02-19T00:02+00:00

I have common data in four files, call them A.csv, B.csv, C.csv & D.csv, that I want to move to a new file User.csv and repeat for all instances of "user". The files look like:

"User","Sal","First","Last","Attn","Country",/ /,"Ffapplied_","Group"
"CCD","","","NEW HOUSING","","",/ /,"",""
"CCD","","BOB & JANE","SMITH","","",/ /,"",""
"CCD","","","CAMBRIA HEIGHTS","C/O THE CORRESP","",/ /,"",""
"Evan","","BILL & PAT","ALEX","","",/ /,"",""
"Evan","","RICHARD & BARBARA","ANDERSON","","",/ /,"",""
"Halls","","SANDY","CRICKET","","",/ /,"",""
"Halls","MRS","FERN","MAXWELL","TAZOR HILLS","",/ /,"",""
"Halls","","HOWARD & GLADYS","TESLA","","",/ /,"",""
"Richard Bob","","","1040 WINDOW INC","","",/ /,"","no email"
"Richard Bob","","","CHAPEL UMC","","",/ /,"",""
"Richardson","","SCOTT & CAROL","BASSEL","","",/ /,"","email"
"Richardson","","","BEACON COMMUNITY CHURCH","","",/ /,"",""
"Richardson","","","FIRST STREET PLUMBING","C/O MR ROBERTS","",/ /,"",""

The above list is from file A.csv which has all 5 instances that will be in the output files to be created (CCD.csv, Evan.csv, etc). The other three files will NOT have all instances, i.e. B.csv will only have Evan and Richardson. All files are sorted by the first field "user" with the exception of the header in line one, which is present in all files.

The resultant files will look like: filename = CCD.csv

From File A.csv
"User","Sal","First","Last","Attn","Country",/ /,"Ffapplied_","Group"
"CCD","","","NEW HOUSING","","",/ /,"",""
"CCD","","BOB & JANE","SMITH","","",/ /,"",""
"CCD","","","CAMBRIA HEIGHTS","C/O THE CORRESP","",/ /,"",""

From File B.csv
"User","Sal","First","Last","Attn","Country",/ /,"Ffapplied_","Group"

From File C.csv
"User","Sal","First","Last","Attn","Country",/ /,"Ffapplied_","Group"
"CCD","","","Old HOUSING","","",/ /,"",""

From File D.csv
"User","Sal","First","Last","Attn","Country",/ /,"Ffapplied_","Group"
"CCD","","BOB","JONES","","",/ /,"",""
"CCD","MRS","","MAXWELL","","",/ /,"",""

So far I have the first line deleted from the four source files and can add the extra text above each header. Where I'm stuck is with file manipulation and selecting the blocks of text to be extracted. I can select CCD as my desired search text and destination filename. How do I select all the lines in a file that start with "CCD" (CCD may exist anywhere else in the line but only if it start with CCD should it be copied. All lines starting with "CCD" are together in a file.)?

I plan to open A.csv and run the macro. It will open B.csv, C.csv, & D.csv. End results the four initial files will be empty and I'll have created CCD.csv, Evan.csv, Halls.csv, Richard Bob.csv, and Richardson.csv. Other than using PreviousDocument and NextDocument is there a better way to manipulate the files?

Thanks in advance!!

Mofi · Feb 19, 2012#22012-02-19T18:48+00:00

Although that could be done with an UltraEdit macro, it could be much easier done with an UltraEdit script.

Which version of UltraEdit do you use?

See Help - About dialog. Starting with UE v13.00 scripts are supported and for this task the limited scripting commands list of UE v13.00 would be already enough.

Next question: How large are the 4 files in MB?

If they are small (not several hundred MB) the script could do everything in memory making it very fast.

gutzy · Feb 20, 2012#32012-02-20T01:06+00:00

I'm using UltraEdit 15. File A.csv is 4.7MB (37K lines) but the other three are more like 100KB.

Mofi · Feb 22, 2012#42012-02-22T07:01+00:00

I decided to do not all in memory because that makes it more difficult to code the script. Instead a Perl regular expression Find is used to find the lines starting with same string in every file.

You have to open all *.csv input files with the largest one being the first file (most left on open file tabs bar). You can run the script either from the Scripts List after adding the script file to the list, or you open the script file as last file (most right) and execute it with Scripting - Run Active Script.

On first run test it with 4 small files and have the output window open in case an error message is printed to the output window. I have tested the script only with 4 small files containing the lines you posted with beta 2 of UE v18.00. So I don't know if a script property or command is used not available in your version of UltraEdit.

Code: Select all

if (UltraEdit.document.length > 1)  // At least 2 documents must be open.
{
   // Define the environment for the script.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();
   UltraEdit.perlReOn();
   var nCsvFileCnt = UltraEdit.document.length;;
   // If the last opened file is this script file, ignore that file.
   if (UltraEdit.document[nCsvFileCnt-1].path.indexOf(".js") > 0) nCsvFileCnt--;

   // Create already a new file to avoid display updates.
   UltraEdit.newFile();

   // Make sure that every CSV file ends with a line terminator
   // and set caret to start of line 2 in all CSV files.
   for (var nFileIndex = 0; nFileIndex < nCsvFileCnt; nFileIndex++)
   {
      UltraEdit.document[nFileIndex].bottom();
      if (UltraEdit.document[nFileIndex].isColNumGt(1))
      {
         UltraEdit.document[nFileIndex].insertLine();
      }
      UltraEdit.document[nFileIndex].gotoLine(2,1);
      // Define once the find parameters use for all finds and make
      // the Perl regular expression engine active for the finds.
      UltraEdit.document[nFileIndex].findReplace.mode=0;
      UltraEdit.document[nFileIndex].findReplace.matchCase=true;
      UltraEdit.document[nFileIndex].findReplace.matchWord=false;
      UltraEdit.document[nFileIndex].findReplace.regExp=true;
      UltraEdit.document[nFileIndex].findReplace.searchDown=true;
   }

   // Get header line from first file.
   UltraEdit.document[0].top();
   UltraEdit.document[0].selectLine();
   var sHeader = UltraEdit.document[0].selection;
   var sLineTerm = "\r\n";

   // In the main loop the first string at start at line is selected
   // with a regular expression search. This loop is executed until
   // there is no line anymore with a double quoted string at start
   // of the line (= end of file reached).
   while (UltraEdit.document[0].findReplace.find('^".*?"'))
   {
      var sName = UltraEdit.document[0].selection;
      var_dump(sName);
      UltraEdit.document[0].gotoLine(1,1);
      var sList = UltraEdit.document[0].path + sLineTerm + sHeader;

      // The Perl regular expression below selects multiple lines
      // starting with same string. Usually a single search should
      // be enough to find all lines starting with that string if the
      // lines are sorted. But for security the search is executed
      // within a loop to really find and select all lines starting
      // with that string although this makes the script slower.
      var sSearch = "^(:?" + sName + ".*\\r\\n)+";

      // Get all lines starting with current name from all files. It
      // is important the lines in all files are sorted alphabetically.
      var nFileIndex = 0;
      do
      {
         while (UltraEdit.document[nFileIndex].findReplace.find(sSearch))
         {
            sList += UltraEdit.document[nFileIndex].selection;
         }
         // Discard the selection in active file.
         UltraEdit.document[nFileIndex].key("HOME");
         // Prepare output for the next file.
         if (++nFileIndex < nCsvFileCnt)
         {
            sList += sLineTerm + UltraEdit.document[nFileIndex].path + sLineTerm + sHeader;
         }
      }
      while (nFileIndex < nCsvFileCnt);

      // Write into new file the results list and save the file.
      UltraEdit.activeDocument.write(sList);
      var sFileName = sName.substr(1,sName.length-2) + ".csv";
      UltraEdit.saveAs(sFileName);
      // Close the just saved file with the results of current run.
      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      UltraEdit.newFile();
   }
   // After processing all lines of first file close
   // the reaining not needed new file without saving it.
   UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
}