I'm creating a script to find all files that are missing the final line terminator at the end of the file. The files are a mix of pipe-delimited and tab-delimited files. This script will eventually become the core of a recursive pass through a folder structure that does the following for each file:
1) Open a file (read-only if necessary)
2) Go to the bottom line and check for a final "\r\n"
3) Select all 'complete' rows of text and read them into an array
4) Get the length of the array
5) Write the file name, the number of 'good' rows (the array length), and whether "\r\n" is missing from the last row, as a CSV line, to a new tab.
6) Close the open file without saving.
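Steps 2 to 4 boil down to two facts per file: does the text end with "\r\n", and how many rows are followed by "\r\n". A minimal sketch in plain JavaScript, assuming the file's text is already available as a string; it does not use the UltraEdit API, and the name `analyzeRows` is hypothetical. It uses `slice` rather than `endsWith` on the assumption that UltraEdit's script engine may predate ES2015.

```javascript
// Hypothetical helper: report the number of complete rows (rows followed by
// "\r\n") and whether the final "\r\n" is missing, for one file's text.
function analyzeRows(text) {
  var terminator = "\r\n";
  // Step 2: an empty file has nothing missing; otherwise check the last 2 chars.
  var missingTerminator = text.length > 0 &&
      text.slice(-terminator.length) !== terminator;
  // Steps 3-4: split() returns one more piece than there are terminators.
  // The extra piece is either "" (text ends with "\r\n") or the unterminated
  // last row, so complete rows = pieces - 1 in both cases.
  var completeRows = text.split(terminator).length - 1;
  return { completeRows: completeRows, missingTerminator: missingTerminator };
}
```

For example, `analyzeRows("a\tb\r\nc\td")` returns `{ completeRows: 1, missingTerminator: true }`. Because the count is the same whether or not the last row is terminated, no separate selection of 'complete' rows is needed in this form.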
The end result should be a CSV file listing every file in the folder structure, with its number of lines and whether the final line terminator is present. The 'core' script works fine as long as the file is not too large: files with around 500,000 rows are no problem.
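For the CSV output, a file name can itself contain a comma or a double quote, which would break a naively joined line. A minimal sketch of building one record, assuming the RFC 4180 quoting convention; `toCsvField`, `toCsvLine` and the "missing"/"present" labels are hypothetical choices, not part of the original script.

```javascript
// Hypothetical helper: quote a CSV field only when needed (RFC 4180 style):
// fields containing a comma, double quote, CR or LF are wrapped in double
// quotes, and any embedded double quote is doubled.
function toCsvField(value) {
  var s = String(value);
  if (/[",\r\n]/.test(s)) {
    return '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}

// Hypothetical record layout: file name, complete row count, terminator flag.
function toCsvLine(fileName, completeRows, missingTerminator) {
  return [fileName, completeRows, missingTerminator ? "missing" : "present"]
    .map(toCsvField)
    .join(",") + "\r\n";
}
```

Each returned line already ends with "\r\n", so the lines can be appended to the output tab one after another.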
THE PROBLEM: large files (around 900,000 rows and more) cause UltraEdit to crash, sometimes creating a dump file and sometimes not.
Here is the 'core' script. I have intentionally stripped out all of the file-check conditional logic to keep things down to the core functionality.
UltraEdit.insertMode();
if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
UltraEdit.activeDocument.hexOff();
UltraEdit.activeDocument.findReplace.searchDown=true;
UltraEdit.activeDocument.findReplace.matchCase=false;
UltraEdit.activeDocument.findReplace.matchWord=false;
var strings;                    // holds the selected text
var stringArray = new Array();  // holds the lines after splitting
var arrayLength = 0;            // initializing this makes no difference to how many lines can be parsed
var lineTerminator = "\r\n";    // line terminator character sequence
var rCount = 0;                 // number of unterminated ('bad') rows found
UltraEdit.activeDocument.bottom();
// Is the last line of the file terminated with a line ending?
if (UltraEdit.activeDocument.isColNumGt(1)) {
    // The cursor is past column 1, so the last line has no \r\n terminator: count it
    rCount++;
    UltraEdit.outputWindow.write("Found "+rCount+" bad row.");
    // Go to the beginning of the last line so the selection ends after the last good terminator
    UltraEdit.activeDocument.cancelSelect();
    UltraEdit.activeDocument.key("HOME");
}
UltraEdit.activeDocument.startSelect();
// Now select only 'full' lines, based on the cursor placement above
UltraEdit.activeDocument.selectToTop();
// Get the selection
strings = UltraEdit.activeDocument.selection;
// Split the string at the line terminator characters
stringArray = strings.split(lineTerminator);
// Assign the array length to a variable
arrayLength = stringArray.length;
// The selection always ends with "\r\n", so split() returns one extra empty
// string after the last terminator; subtract 1 to get the number of lines.
var totalLines = (arrayLength - 1);
UltraEdit.outputWindow.write("Found "+totalLines+" lines in array.");
To check whether the size of the file itself is causing the problem, I modified the selection pattern to grab only a few rows (5-10) from the bottom of a file with more than 1 million rows, and the script ran without complaint. So my guess is that either the undo buffer or the array is being overrun by such a large split.
BEAR IN MIND: I'm not committed to using this approach. I've toyed with the idea of doing a
if (UltraEdit.activeDocument.findReplace.regExp("%[A-Za-z0-9]"))
or something similar to count the first character of each line and 'manually' iterate through the lines in the selected range, but I can't get that conditional logic to work. From what I read on the scripting commands page, the .regExp property of findReplace is a boolean, but the error I get back seems to indicate that this usage is not valid. My point is: I'm open to any approach that *works* reliably for large files.
Any insight/help/pointers would be greatly appreciated. Thanks in advance!
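If the guess is right that the 900,000-element array produced by `split()` is what overruns, the row count does not need an array at all. A minimal sketch in plain JavaScript that walks the string with `indexOf` and keeps only an integer, assuming the text is already in a JavaScript string; `countCompleteRows` is a hypothetical name. Note that it still needs the whole text in memory, so it removes the cost of the array but not the cost of copying a huge selection into the script.

```javascript
// Hypothetical helper: count complete rows by counting line terminators,
// without building an array of lines. Each terminator ends exactly one row.
function countCompleteRows(text, terminator) {
  // Default to "\r\n"; this also guards against an empty terminator,
  // which would otherwise make the loop below never advance.
  terminator = terminator || "\r\n";
  var count = 0;
  var pos = text.indexOf(terminator);
  while (pos !== -1) {
    count++;
    // Resume the search right after the terminator just found.
    pos = text.indexOf(terminator, pos + terminator.length);
  }
  return count;
}
```

For any text, this returns the same value as `text.split("\r\n").length - 1`, so it can replace the split-and-subtract step without changing the result.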