The script for "How to find and replace two consecutive link separated by a comma?" was already not easy to code, but the script for this task was a really hard challenge. I needed more than 15 hours to code it, mostly for thinking about how to handle all the various inputs and possible combinations of cross-references and produce suitable output.
Here is the code of the script:
function outputDebugInfo(sDebugInfo)
{
// UltraEdit.outputWindow.write(sDebugInfo);
}
if (UltraEdit.document.length > 0) // Is any file opened?
{
// Define environment for this script.
UltraEdit.insertMode();
UltraEdit.columnModeOff();
// Define from which file to read the formula identifiers.
var oFormulaDoc = UltraEdit.activeDocument;
var nRefIndex;
var nFormulaIndex;
var anFormulas = new Array();
// Select the entire file with the formula identifiers.
// There is nothing selected if the file is an empty file.
oFormulaDoc.selectAll();
if (oFormulaDoc.isSel())
{
// Get all formula identifiers with a single number and convert each
// number from string to integer. The formula identifier numbers are
// stored in a two-dimensional array. For a single number, the first
// and second column of the row hold the same number.
var asFormulas = oFormulaDoc.selection.match(/<formula id="dqn\d+"/g);
if (asFormulas != null)
{
for (nFormulaIndex = 0; nFormulaIndex < asFormulas.length; nFormulaIndex++)
{
var nFormula = parseInt(asFormulas[nFormulaIndex].replace(/^.+dqn(\d+).$/,"$1"),10);
anFormulas.push([nFormula,nFormula]);
}
}
// Get all formula identifiers with a number range and convert
// the two numbers of each number range from string to integer.
// The FROM number is stored in first column of formula row and
// the TO number is stored in second column.
asFormulas = oFormulaDoc.selection.match(/<formula id="dqn\d+-\d+"/g);
if (asFormulas != null)
{
for (nFormulaIndex = 0; nFormulaIndex < asFormulas.length; nFormulaIndex++)
{
var nFrom = parseInt(asFormulas[nFormulaIndex].replace(/^.+dqn(\d+)-.+$/,"$1"),10);
var nTo = parseInt(asFormulas[nFormulaIndex].replace(/^.+dqn\d+-(\d+).$/,"$1"),10);
anFormulas.push([nFrom,nTo]);
}
}
// Cancel the selection and move caret to top of file.
oFormulaDoc.top();
}
// The formula references can only be checked and updated
// if at least 1 formula identifier was found before.
if (anFormulas.length > 0)
{
// Dump the formula identifiers table to the output window
// (requires a var_dump function, which this script does not define).
// var_dump(anFormulas);
var nRefsFound = 0; // Counts the number of found formula reference sequences.
var nRefsModified = 0; // Counts the number of modified formula reference sequences.
// Define start tag with the attributes and end tag of a formula reference.
var sRefStart = '<xref ref-type="d-formula" rid="dqn';
var sRefEnd = '</xref>';
// Define some search and replace strings used in the main loop below.
var sFindRefs = sRefStart + '[\\d\\-]+">.+?' + sRefEnd + '(?:[\\s,-]*(?:–\\s*)?' +
sRefStart + '[\\d\\-]+">.+?' + sRefEnd + ')*';
outputDebugInfo("sFindRefs = " + sFindRefs);
var sReplRange = ')' + sRefEnd + '#' + sRefStart + '$1">($1';
outputDebugInfo("sReplRange = " + sReplRange);
var sRemoveRef = sRefStart + '\\d+">\\((\\d+)\\)' + sRefEnd;
outputDebugInfo("sRemoveRef = " + sRemoveRef);
var oRemoveRef = new RegExp(sRemoveRef,"g");
// A Perl regular expression find is used in a loop to find sequences
// of 1 or more formula references, which are processed and updated if
// needed. The formula references are always searched and processed in
// the active file.
UltraEdit.perlReOn();
UltraEdit.activeDocument.findReplace.mode=0;
UltraEdit.activeDocument.findReplace.matchCase=true;
UltraEdit.activeDocument.findReplace.matchWord=false;
UltraEdit.activeDocument.findReplace.regExp=true;
UltraEdit.activeDocument.findReplace.searchDown=true;
UltraEdit.activeDocument.findReplace.searchInColumn=false;
UltraEdit.activeDocument.top();
while (UltraEdit.activeDocument.findReplace.find(sFindRefs))
{
nRefsFound++;
var sReferencesFound = UltraEdit.activeDocument.selection;
outputDebugInfo("\nsReferencesFound = " + sReferencesFound);
// Replace - and – with or without surrounding whitespace between
// > and < (tags) or between ) and ( (reference numbers) by a hash.
// The hash character is used because, unlike the dash, which also
// occurs in the attribute values (d-formula, dqn2-6), it appears
// nowhere else in the reference sequence string, and # is not a
// regular expression metacharacter.
var sRefModified = sReferencesFound.replace(/([)>])\s*(?:-|–)\s*([(<])/g,"$1#$2");
// Remove all whitespaces and commas between the tags.
sRefModified = sRefModified.replace(/>[\s,]+</g,"><");
// Replace already existing rid attribute values with a range
// by rid attribute value with a single reference number.
sRefModified = sRefModified.replace(/\d+-\d+\">\((\d+)\)/g,'$1">($1)');
// Convert a reference number range like (5)#(10) into a complete
// reference range like <xref ref-type="d-formula" rid="dqn5">
// (5)</xref>#<xref ref-type="d-formula" rid="dqn10">(10)</xref>
sRefModified = sRefModified.replace(/\)#\((\d+)/g,sReplRange);
outputDebugInfo("sRefModified (1) = " + sRefModified);
// Remove the reference start and end tags to get just the reference
// single numbers and reference number ranges separated by comma.
sRefModified = sRefModified.replace(oRemoveRef,"$1,");
// Insert a comma after each hash character used to mark a range.
sRefModified = sRefModified.replace(/#/g,"#,");
// Remove the comma at the end of the string.
sRefModified = sRefModified.substr(0,sRefModified.length-1);
outputDebugInfo("sRefModified (2) = " + sRefModified);
// Split up the reference numbers and reference ranges
// for cross-checking them with the formula identifiers.
var asReferences = sRefModified.split(",");
// Convert the number strings into integer numbers and also replace
// the ranges by the corresponding sequences of integer numbers.
var nRefNumber;
var anRefNumbers = new Array();
for (nRefIndex = 0; nRefIndex < asReferences.length; nRefIndex++)
{
if (asReferences[nRefIndex] != '#')
{
nRefNumber = parseInt(asReferences[nRefIndex],10);
anRefNumbers.push(nRefNumber);
}
else
{
var nEndNumber = parseInt(asReferences[++nRefIndex],10);
while (nRefNumber < nEndNumber) anRefNumbers.push(++nRefNumber);
}
}
outputDebugInfo("anRefNumbers = " + anRefNumbers.join(","));
// Append the value 0 to the array so that the next loop, which
// processes the formula references, always ends any open sequence.
anRefNumbers.push(0);
/* Rebuild the reference(s) by putting together as many formula
references as possible.
The base rule for putting a sequence of references together
is: The reference numbers are one after the other with each
reference number being +1 of previous reference number and
no gap between.
There are three types of ranges:
1. A reference number range within a single reference because the
formula identifiers are already put together. E.g. there is
<formula id="dqn2-6">...</formula>
and the reference is
<xref ref-type="d-formula" rid="dqn2-6">(3)–(5)</xref>
For this type of range, the end number can also be just the
start number plus 1, for example
<xref ref-type="d-formula" rid="dqn2-6">(5)–(6)</xref>
2. A formula reference range because the formula identifiers
are still separated. For example there are
<formula id="dqn7">...</formula>
<formula id="dqn8">...</formula>
<formula id="dqn9">...</formula>
and the matching reference range is
<xref ref-type="d-formula" rid="dqn7">(7)</xref>
<xref ref-type="d-formula" rid="dqn8"/>–
<xref ref-type="d-formula" rid="dqn9">(9)</xref>
3. The third range type is a combination of the first and second types, like
<xref ref-type="d-formula" rid="dqn2-6">(3)</xref>
<xref ref-type="d-formula" rid="dqn7"/>
<xref ref-type="d-formula" rid="dqn8"/>–
<xref ref-type="d-formula" rid="dqn9">(9)</xref>
Not joined with a dash, however, is a sequence of exactly
2 references with different formula identifiers for the first
and second reference, like
<formula id="dqn7">...</formula>
<formula id="dqn8">...</formula>
referenced with
<xref ref-type="d-formula" rid="dqn7">(7)</xref>,
<xref ref-type="d-formula" rid="dqn8">(8)</xref>
*/
var nFormulaStart = -1;
var nFormulaEnd = -1;
var nReferenceStart = -1;
var nReferenceEnd = -1;
var nPreviousNumber = -1;
nRefIndex = 0;
sRefModified = "";
while(true) // This loop is exited when a reference number
{ // with the value 0 is loaded from the array.
var bFormulaFound = false;
nRefNumber = anRefNumbers[nRefIndex];
if(nRefNumber > 0)
{
// Find the matching formula identifier for this reference.
nFormulaIndex = 0;
do
{
if ((nRefNumber >= anFormulas[nFormulaIndex][0]) &&
(nRefNumber <= anFormulas[nFormulaIndex][1]))
{
bFormulaFound = true;
break;
}
}
while(++nFormulaIndex < anFormulas.length);
}
// Is there no formula matching the reference number or is the
// current reference number not +1 of previous reference number
// and the reference is not the first one of a possible range?
if((!bFormulaFound || (nRefNumber != (nPreviousNumber+1))) && (nPreviousNumber > 0))
{
// Append single or start formula reference.
if (nFormulaStart >= 0)
{
// Append a comma and a space if the rebuilt
// reference string is not empty anymore.
if (sRefModified.length != 0) sRefModified += ", ";
sRefModified += sRefStart + anFormulas[nFormulaStart][0];
if (anFormulas[nFormulaStart][0] != anFormulas[nFormulaStart][1])
{
sRefModified += "-" + anFormulas[nFormulaStart][1];
}
sRefModified += '">(' + nReferenceStart + ')';
// Is there a reference range processing in progress?
if(nFormulaEnd >= 0)
{
// Is this a reference range of first type?
if (nFormulaStart == nFormulaEnd)
{
sRefModified += '–(' + nReferenceEnd + ')';
}
// If there are only 2 references and they don't reference the
// same formula identifier, keep them separated with a comma
// instead of joining them into a formula reference range with a dash.
else if ((nReferenceStart+1) == nReferenceEnd)
{
sRefModified += sRefEnd + ", " + sRefStart + anFormulas[nFormulaEnd][0];
if (anFormulas[nFormulaEnd][0] != anFormulas[nFormulaEnd][1])
{
sRefModified += "-" + anFormulas[nFormulaEnd][1];
}
sRefModified += '">(' + nReferenceEnd + ')';
}
else // More than 2 references with different
{ // start and end formula identifiers.
sRefModified += sRefEnd;
// Append the empty formula references between start and
// end reference number depending on existing formulas.
var nSequenceRef = anFormulas[nFormulaStart][1];
while (++nSequenceRef < anFormulas[nFormulaEnd][0])
{
for(var nSequenceIndex = 0; nSequenceIndex < anFormulas.length; nSequenceIndex++)
{
if ((nSequenceRef >= anFormulas[nSequenceIndex][0]) &&
(nSequenceRef <= anFormulas[nSequenceIndex][1]))
{
sRefModified += sRefStart + anFormulas[nSequenceIndex][0];
if (anFormulas[nSequenceIndex][0] != anFormulas[nSequenceIndex][1])
{
nSequenceRef = anFormulas[nSequenceIndex][1];
sRefModified += "-" + nSequenceRef;
}
sRefModified += '"/>';
break;
}
}
}
// Append end formula reference.
sRefModified += '–' + sRefStart + anFormulas[nFormulaEnd][0];
if (anFormulas[nFormulaEnd][0] != anFormulas[nFormulaEnd][1])
{
sRefModified += "-" + anFormulas[nFormulaEnd][1];
}
sRefModified += '">(' + nReferenceEnd + ')';
}
nFormulaEnd = -1;
nReferenceEnd = -1;
}
nFormulaStart = -1;
nReferenceStart = -1;
sRefModified += sRefEnd;
}
}
// Are all reference numbers in array processed?
if (nRefNumber < 1) break;
// Is there no formula for the current formula reference number?
if (!bFormulaFound)
{
// This case should never occur, but it must nevertheless be
// handled to avoid corrupting the file content.
// Append a comma and a space if the rebuilt
// reference string is not empty anymore.
if (sRefModified.length != 0) sRefModified += ", ";
// Append a formula reference as if the formula really existed.
sRefModified += sRefStart + nRefNumber + '">(' + nRefNumber + ')' + sRefEnd;
nPreviousNumber = -1;
// Output a warning message to output window and make the output
// window visible if not already visible on running the script.
if (UltraEdit.outputWindow.visible == false)
{
UltraEdit.outputWindow.showWindow(true);
}
UltraEdit.outputWindow.write("WARNING: Found no formula for formula reference " + nRefNumber);
}
else
{
// Is the current reference number +1 of previous number?
if (++nPreviousNumber == nRefNumber)
{
nReferenceEnd = nRefNumber;
nFormulaEnd = nFormulaIndex;
}
else // First reference of a possible new range.
{
nPreviousNumber = nRefNumber;
nReferenceStart = nRefNumber;
nFormulaStart = nFormulaIndex;
}
}
nRefIndex++; // Next reference from array of references.
}
if (sRefModified != sReferencesFound)
{
nRefsModified++;
// Overwrite the found reference sequence by rebuilt string.
UltraEdit.activeDocument.write(sRefModified);
outputDebugInfo("sRefModified (3) = " + sRefModified);
}
}
UltraEdit.activeDocument.top();
UltraEdit.messageBox("Number of reference sequences found / modified: " + nRefsFound + " / " + nRefsModified);
}
}
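For readers who want to try the first step outside UltraEdit, here is a minimal plain JavaScript sketch of how the formula identifiers could be collected with a single regular expression and how a reference number is matched against them. The function names parseFormulaIds and findFormulaIndex are hypothetical and not part of the script. Unlike the script, which stores all single-number identifiers before all range identifiers, this sketch keeps them in document order; for the containment test this should not matter as long as the identifiers do not overlap.

```javascript
// Hypothetical helper: collect formula identifiers such as
// <formula id="dqn5" or <formula id="dqn2-6" as [from, to] pairs.
// A single number is stored as [n, n], like in the script.
function parseFormulaIds(sText)
{
   var anFormulas = [];
   // Group 1: first number; group 2: optional second number of a range.
   var oFormulaId = /<formula id="dqn(\d+)(?:-(\d+))?"/g;
   var asMatch;
   while ((asMatch = oFormulaId.exec(sText)) !== null)
   {
      var nFrom = parseInt(asMatch[1], 10);
      // Group 2 did not take part in the match for a single number.
      var nTo = asMatch[2] ? parseInt(asMatch[2], 10) : nFrom;
      anFormulas.push([nFrom, nTo]);
   }
   return anFormulas;
}

// Hypothetical helper: index of the formula row whose range contains
// nRef, or -1 if there is none (the script writes a warning in that case).
function findFormulaIndex(anFormulas, nRef)
{
   for (var nIndex = 0; nIndex < anFormulas.length; nIndex++)
   {
      if ((nRef >= anFormulas[nIndex][0]) && (nRef <= anFormulas[nIndex][1])) return nIndex;
   }
   return -1;
}

var anFormulas = parseFormulaIds('<formula id="dqn1">..</formula>' +
   '<formula id="dqn2-6">..</formula><formula id="dqn7">..</formula>');
console.log(JSON.stringify(anFormulas));        // [[1,1],[2,6],[7,7]]
console.log(findFormulaIndex(anFormulas, 4));   // 1
console.log(findFormulaIndex(anFormulas, 9));   // -1
```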
Please study the script line by line from top to bottom and ask if something is unclear. I'm quite sure this is the last script requiring more than 2 hours of coding that I will write for your company. I suggest that your company hire a programmer for scripting tasks like this one; I'm sure there is enough such work in your company for a full-time job.
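One part that may be easier to study in isolation is how a found reference sequence is reduced to a plain list of reference numbers. The script does this with a hash marker, several string replacements and split. The following plain JavaScript sketch uses a simpler, different technique: it scans for the displayed numbers like (7) and treats two neighboring numbers as a range when the text between them, with all tags removed, contains a hyphen-minus or an en dash. The function name expandReferenceNumbers is hypothetical, and the sketch does not handle self-closing xref elements.

```javascript
// Hypothetical helper: list the reference numbers of a found reference
// sequence and expand dash ranges, e.g. (3)–(5) becomes 3, 4, 5.
function expandReferenceNumbers(sRefs)
{
   var anNumbers = [];
   var oNumber = /\((\d+)\)/g;   // a displayed reference number like (7)
   var asMatch;
   var nLastEnd = 0;             // position after the previous (n) in sRefs
   while ((asMatch = oNumber.exec(sRefs)) !== null)
   {
      var nNumber = parseInt(asMatch[1], 10);
      if (anNumbers.length > 0)
      {
         // Text between the two numbers without any tags, for example
         // the dash in (3)–(5) or ", " between two separate references.
         var sBetween = sRefs.substring(nLastEnd, asMatch.index).replace(/<[^>]*>/g, "");
         if (/[-\u2013]/.test(sBetween))
         {
            // Range: add the numbers between previous and current number.
            for (var n = anNumbers[anNumbers.length - 1] + 1; n < nNumber; n++)
            {
               anNumbers.push(n);
            }
         }
      }
      anNumbers.push(nNumber);
      nLastEnd = oNumber.lastIndex;
   }
   return anNumbers;
}

var sStart = '<xref ref-type="d-formula" rid="dqn';
console.log(expandReferenceNumbers(sStart + '2-6">(3)\u2013(5)</xref>, ' +
   sStart + '7">(7)</xref>').join(","));   // 3,4,5,7
```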
Line 5 of the output file produced from the input file differs from the expected output posted above. Instead of creating
<xref ref-type="d-formula" rid="dqn2-6">(3)</xref><xref ref-type="d-formula" rid="dqn2-6">(4)</xref><xref ref-type="d-formula" rid="dqn2-6">(5)</xref>
the script creates
<xref ref-type="d-formula" rid="dqn2-6">(3)–(5)</xref>
The explanation for this merging behavior can be found in the large block comment in the script, which describes the 3 different types of cross-reference ranges handled by the script.
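The rules in that block comment can be summarized as follows: consecutive reference numbers that all have a formula form one run; a run inside a single formula identifier becomes a type 1 range, exactly two references to different formulas stay separated by a comma, and longer runs across formulas become type 2 or 3 ranges with empty references for the formulas in between. The following plain JavaScript sketch applies these rules to a list of reference numbers and a formula table with [from, to] rows. It is a restructured illustration under these assumptions, not a drop-in replacement for the main loop of the script; the names buildReferences, findFormulaIndex and formulaId are hypothetical.

```javascript
var REF_START = '<xref ref-type="d-formula" rid="dqn';
var REF_END = '</xref>';

// Index of the formula row [from, to] containing nRef, or -1.
function findFormulaIndex(anFormulas, nRef)
{
   for (var nIndex = 0; nIndex < anFormulas.length; nIndex++)
   {
      if ((nRef >= anFormulas[nIndex][0]) && (nRef <= anFormulas[nIndex][1])) return nIndex;
   }
   return -1;
}

// Identifier text of a formula row: "7" or "2-6".
function formulaId(anFormula)
{
   return (anFormula[0] == anFormula[1]) ? String(anFormula[0]) : anFormula[0] + "-" + anFormula[1];
}

// Rebuild a reference sequence string from a list of reference numbers.
function buildReferences(anRefNumbers, anFormulas)
{
   var asParts = [];
   var nIndex = 0;
   while (nIndex < anRefNumbers.length)
   {
      var nStart = anRefNumbers[nIndex];
      var nFormulaStart = findFormulaIndex(anFormulas, nStart);
      if (nFormulaStart < 0)
      {
         // No formula: keep a plain reference (the script also writes a warning).
         asParts.push(REF_START + nStart + '">(' + nStart + ')' + REF_END);
         nIndex++;
         continue;
      }
      // Extend the run while the next number is +1 and has a formula.
      var nEnd = nStart;
      var nFormulaEnd = nFormulaStart;
      while ((nIndex + 1 < anRefNumbers.length) && (anRefNumbers[nIndex + 1] == nEnd + 1))
      {
         var nNextFormula = findFormulaIndex(anFormulas, nEnd + 1);
         if (nNextFormula < 0) break;
         nEnd++;
         nFormulaEnd = nNextFormula;
         nIndex++;
      }
      nIndex++;
      var sStart = REF_START + formulaId(anFormulas[nFormulaStart]) + '">(' + nStart + ')';
      var sEnd = REF_START + formulaId(anFormulas[nFormulaEnd]) + '">(' + nEnd + ')' + REF_END;
      if (nStart == nEnd)                      // single reference
      {
         asParts.push(sStart + REF_END);
      }
      else if (nFormulaStart == nFormulaEnd)   // type 1: range within one formula identifier
      {
         asParts.push(sStart + '\u2013(' + nEnd + ')' + REF_END);
      }
      else if (nStart + 1 == nEnd)             // exactly 2 references: keep the comma
      {
         asParts.push(sStart + REF_END + ", " + sEnd);
      }
      else                                     // types 2 and 3
      {
         var sRange = sStart + REF_END;
         // Empty references for the formulas between start and end formula.
         var nSeq = anFormulas[nFormulaStart][1] + 1;
         while (nSeq < anFormulas[nFormulaEnd][0])
         {
            var nSeqFormula = findFormulaIndex(anFormulas, nSeq);
            if (nSeqFormula >= 0)
            {
               sRange += REF_START + formulaId(anFormulas[nSeqFormula]) + '"/>';
               nSeq = anFormulas[nSeqFormula][1];
            }
            nSeq++;
         }
         asParts.push(sRange + '\u2013' + sEnd);
      }
   }
   return asParts.join(", ");
}

var anFormulas = [[1, 1], [2, 6], [7, 7], [8, 8], [9, 9]];
console.log(buildReferences([3, 4, 5], anFormulas));
// <xref ref-type="d-formula" rid="dqn2-6">(3)–(5)</xref>
```

The first branch order matters: a run inside one formula identifier is checked before the two-reference rule, so (5) and (6) of dqn2-6 still become (5)–(6), as described in the block comment.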
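Uncomment the line in function outputDebugInfo (the third line of the script) if you want to see in the output window what happens during script execution. You could also uncomment the line var_dump(anFormulas); to see the values of the two-dimensional array holding the identifiers of the formulas/equations as integer numbers. Note that var_dump is not defined in the script above, so this function must be made available (for example, from another script file) before that line is uncommented.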
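Since var_dump is not defined in the script itself, a small stand-in could be used instead. The following is a minimal sketch with the hypothetical function name dumpFormulaTable; it only formats the two-dimensional array as text, which could then be written to the output window with UltraEdit.outputWindow.write, as outputDebugInfo does.

```javascript
// Hypothetical stand-in for var_dump(anFormulas): one line per row of
// the formula table with the FROM and TO number of the identifier.
function dumpFormulaTable(anFormulas)
{
   var asLines = [];
   for (var nIndex = 0; nIndex < anFormulas.length; nIndex++)
   {
      asLines.push("anFormulas[" + nIndex + "] = [" + anFormulas[nIndex][0] +
                   ", " + anFormulas[nIndex][1] + "]");
   }
   return asLines.join("\n");
}

console.log(dumpFormulaTable([[1, 1], [2, 6]]));
// anFormulas[0] = [1, 1]
// anFormulas[1] = [2, 6]
```

With the function added next to outputDebugInfo at the top of the script, the var_dump line could be replaced by outputDebugInfo(dumpFormulaTable(anFormulas)); so that the table is written only when the debug output is enabled.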
If the file with the formulas/equations (FileA) is different from the file with the formula references/links (FileB), first modify the line
var oFormulaDoc = UltraEdit.activeDocument;
to
var oFormulaDoc = UltraEdit.document[0];
Save the script file, close it, and add the script to the Script List. Then open FileA first and FileB second, so that FileA is UltraEdit.document[0], and run the script from the Script List with FileB as the active file.
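Relying on UltraEdit.document[0] requires that FileA is the first opened file. A possible alternative, sketched below under the assumption that each UltraEdit document object provides its full file name in a path property, is to look up FileA by its file name. The function findDocumentIndex and the file name FileA.xml are hypothetical; the function uses only plain JavaScript, so it is shown here with mock document objects.

```javascript
// Hypothetical helper: index of the first document whose file name
// (part of the path after the last backslash or slash) equals sFileName,
// compared case-insensitively as on Windows, or -1 if not found.
// Assumption: each document object has a "path" string property.
function findDocumentIndex(aoDocuments, sFileName)
{
   var sWanted = sFileName.toLowerCase();
   for (var nIndex = 0; nIndex < aoDocuments.length; nIndex++)
   {
      var sPath = aoDocuments[nIndex].path.toLowerCase();
      var nNameStart = Math.max(sPath.lastIndexOf("\\"), sPath.lastIndexOf("/")) + 1;
      if (sPath.substring(nNameStart) == sWanted) return nIndex;
   }
   return -1;
}

// Mock documents for trying the function outside UltraEdit.
var aoDocs = [{ path: "C:\\Work\\FileB.xml" }, { path: "C:\\Work\\FileA.xml" }];
console.log(findDocumentIndex(aoDocs, "filea.xml"));   // 1
console.log(findDocumentIndex(aoDocs, "FileC.xml"));   // -1

// Assumed usage in the UltraEdit script:
// var nFormulaDoc = findDocumentIndex(UltraEdit.document, "FileA.xml");
// var oFormulaDoc = (nFormulaDoc >= 0) ? UltraEdit.document[nFormulaDoc] : UltraEdit.activeDocument;
```

Falling back to the active document keeps the original single-file behavior when FileA is not open.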