In other words, you want to search a large data file for every string in a smaller list file and get a report of which strings from the list file are not found in the data file.
A file comparison tool like UltraCompare can't be used for this task unless the strings are sorted alphabetically in both files. Even then, with lots of very similar lines, the comparison result could be wrong because the comparison tool does not know that it should always compare entire lines only. A text comparison tool is not designed to do the job of a database application, which is what is usually used for such tasks.
A text editor like UltraEdit is also not designed for database tasks like this one. But UltraEdit can open text files of any size and has built-in scripting support, which makes it possible to code a small script for this "check each line from list file against each line in data file" task.
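Outside of UltraEdit, the core of this check can be sketched in a few lines of plain JavaScript. This is only an illustration of the logic, not the script for UltraEdit. It assumes both files are already in memory as arrays of lines, which a data file with millions of rows may not allow. The function name findMissingLines and the sample data are made up for this example.

```javascript
// Hypothetical helper: returns the lines of the list which do not
// occur as an entire line in the data, in their original order.
function findMissingLines(asDataLines, asListLines)
{
   // A Set gives a fast, exact (case sensitive) lookup of entire lines.
   var oDataSet = new Set(asDataLines);
   return asListLines.filter(function (sLine)
   {
      // Empty lines in the list are skipped and never reported.
      return sLine.length > 0 && !oDataSet.has(sLine);
   });
}

// Made-up sample data for demonstration only.
var asData = ["OID-1001", "OID-1002", "OID-10020", "OID-1003"];
var asList = ["OID-1002", "OID-100", "", "OID-1004"];
var asMissing = findMissingLines(asData, asList);
console.log(asMissing.join("\n"));
```

Note that OID-100 is reported as missing although it is a substring of several data lines, because only entire lines count as a match.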
Requirements for script execution in UltraEdit:
- First opened file must be the large data file.
- Second opened file must be the small list file.
- Other files are ignored by the script.
A third file could be the script file itself, with the code posted below saved as an ASCII/ANSI file with DOS line terminators under a name like FindUniqueLinesInList.js and executed by clicking on Run Active Script in menu Scripting.
A new file with all lines from the second file not found in the first file is created only if at least one line from the list file is not found in the data file. Otherwise a message box is displayed with the information that all object identifiers in the list file were found in the data file.
if (UltraEdit.document.length >= 2)  // Are at least 2 files opened?
{
   // Define the environment for the script.
   UltraEdit.insertMode();
   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
   UltraEdit.perlReOn();

   // The first opened file - most left in open file tabs bar - must
   // be the large data file with the millions of rows (lines).
   // Define the parameters for the case sensitive Perl regular expression
   // finds executed in the loop below for each line in list file.
   UltraEdit.document[0].findReplace.mode=0;
   UltraEdit.document[0].findReplace.matchCase=true;
   UltraEdit.document[0].findReplace.matchWord=false;
   UltraEdit.document[0].findReplace.regExp=true;
   UltraEdit.document[0].findReplace.searchDown=true;
   UltraEdit.document[0].findReplace.searchInColumn=false;
   // Move caret in data file to top of file.
   UltraEdit.document[0].top();

   // The second opened file must be the list file containing the lines
   // to search for in data file. It must be small enough to load it
   // completely into memory as an array of strings for this script.
   // This file is made active which avoids display updates on first file
   // if document windows are displayed maximized on script start. Frequent
   // display updates result in a much longer script execution time.
   UltraEdit.document[1].setActive();
   UltraEdit.document[1].selectAll();
   if (UltraEdit.document[1].isSel())
   {
      var sLineTerm;
      if (UltraEdit.document[1].lineTerminator <= 0) sLineTerm = "\r\n";
      else if (UltraEdit.document[1].lineTerminator == 1) sLineTerm = "\n";
      else sLineTerm = "\r";

      // Get the selected lines into an array of strings.
      var asSearchData = UltraEdit.document[1].selection.split(sLineTerm);
      UltraEdit.document[1].top();  // Just for discarding the selection.

      // The finds in first file are done with the strings from list file
      // in reverse order to make it easy to remove all strings found in
      // data file from the array.
      var nDataIndex = asSearchData.length;
      while(nDataIndex > 0)
      {
         nDataIndex--;
         // If the next search string is empty, remove it from the array.
         if (asSearchData[nDataIndex].length == 0)
         {
            asSearchData.splice(nDataIndex,1);
         }
         else
         {  // A Perl regular expression is used to make sure to search
            // always for entire lines and not just substrings. This
            // requires data strings which do not contain characters with
            // special meaning in Perl regular expression search strings.
            var sFindExp = "^" + asSearchData[nDataIndex] + "$";
            if (UltraEdit.document[0].findReplace.find(sFindExp))
            {
               // This line is found in data file, remove it from list
               // and move the caret back to top for the next search.
               asSearchData.splice(nDataIndex,1);
               UltraEdit.document[0].top();
            }
         }
      }

      // Are there lines from list file not found in data file?
      if (asSearchData.length)
      {
         // Append an empty string to have finally the last
         // line in new file also with a line termination.
         asSearchData.push("");
         // Create a new file and determine type of line termination.
         UltraEdit.newFile();
         if (UltraEdit.activeDocument.lineTerminator <= 0) sLineTerm = "\r\n";
         else if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
         else sLineTerm = "\r";
         // Write all not found lines into new file as one block.
         UltraEdit.activeDocument.write(asSearchData.join(sLineTerm));
         UltraEdit.activeDocument.top();
      }
      else
      {
         UltraEdit.messageBox("All object identifiers from list file found in data file.","Object Identifiers Check");
      }
   }
}
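
As noted in the comments of the script, the lines in the list file must not contain characters with a special meaning in Perl regular expressions. If that cannot be guaranteed, one option is to escape such characters before building the search expression. The following is a minimal sketch in plain JavaScript. The function name escapeForRegExp is hypothetical, and the set of escaped characters is an assumption which should be verified against the regular expression engine of the UltraEdit version in use.

```javascript
// Hypothetical helper: puts a backslash in front of every character
// which is assumed to have a special meaning in a Perl regular expression.
function escapeForRegExp(sText)
{
   // "$&" in the replacement string stands for the matched character.
   return sText.replace(/[\\^$.|?*+()\[\]{}]/g, "\\$&");
}

// Made-up list line containing dots and parentheses.
var sLine = "1.3.6.1(test)";
// Build the whole-line search expression like the script does.
var sFindExp = "^" + escapeForRegExp(sLine) + "$";
console.log(sFindExp);
```

In the script, the line building sFindExp would then pass asSearchData[nDataIndex] through such a function instead of using the raw string.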