I originally wrote that it is impossible to use a regular expression search to find any string of any length which exists in both of two consecutive paragraphs, because regular expression searches are based on clearly defined rules and such vague requirements make it impossible to define those rules. I was wrong with this statement, as fleggy demonstrates below.
You may use the following script. Please read the comments in the script and adjust the value of the variable nMinimalEqualLength to your requirements.
Code:
if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script. This script is designed to
   // run a search from current position in active file to end of file.
   UltraEdit.insertMode();
   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();

   // Minimal number of equal characters at beginning of two consecutive
   // reference paragraphs to interpret them as duplicate data.
   var nMinimalEqualLength = 4;
   var bDuplicateFound = false;

   // Get current caret position in active file.
   var nLine = UltraEdit.activeDocument.currentLineNum;
   var nColumn = UltraEdit.activeDocument.currentColumnNum;

   // Move caret to beginning of current line respectively to
   // first non whitespace character in current line depending
   // on configuration setting Home key always goes to column 1.
   UltraEdit.activeDocument.key("HOME");

   // Define the parameters to find two consecutive reference paragraphs
   // using a case-sensitive UltraEdit regular expression search.
   UltraEdit.ueReOn();
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
   {
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
   }

   // Run this find for two consecutive reference paragraphs until either
   // nothing found anymore up to end of file or duplicate data found at
   // beginning of two found reference paragraphs.
   while (UltraEdit.activeDocument.findReplace.find('<p class="ref">*</p>^r^n<p class="ref">*</p>'))
   {
      // Get from the two found reference paragraphs just the text without the
      // tags <p class="ref"> and </p> and split the two lines up to two strings.
      var asParagraphsText = UltraEdit.activeDocument.selection.replace(/<p class="ref">|<\/p>/g,"").split("\r\n");

      // Run a case-sensitive character by character comparison in a
      // loop from beginning of each paragraph text until either end of
      // first found paragraph text or end of second found paragraph text
      // is reached or the currently compared characters are not equal.
      var nCharIndex = 0;
      while ((nCharIndex < asParagraphsText[0].length) && (nCharIndex < asParagraphsText[1].length))
      {
         if (asParagraphsText[0][nCharIndex] != asParagraphsText[1][nCharIndex]) break;
         nCharIndex++;
      }

      // Are there at least as many equal characters at beginning of the
      // found reference paragraphs as defined at top of this script?
      if (nCharIndex >= nMinimalEqualLength)
      {
         // Set caret in file to beginning of current line with second paragraph.
         UltraEdit.activeDocument.key("HOME");
         // Search case-sensitive for the equal string to select it and
         // exit script. Each ^ in the search string is escaped with ^^
         // because ^ is a special character also in a non-regular
         // expression find of UltraEdit.
         UltraEdit.activeDocument.findReplace.regExp=false;
         UltraEdit.activeDocument.findReplace.find(asParagraphsText[0].substring(0,nCharIndex).replace(/\^/g,"^^"));
         bDuplicateFound = true;
         break;
      }

      // There are not enough equal characters at beginning of the two
      // found reference paragraphs. So move caret in file to beginning
      // of current line with second paragraph and run regular expression
      // find once again.
      UltraEdit.activeDocument.key("HOME");
   }

   // Set caret to initial position if there could not be found any
   // duplicate string with at least nMinimalEqualLength at beginning
   // of two consecutive reference paragraphs.
   if (!bDuplicateFound)
   {
      // Older versions of UltraEdit without property activeDocumentIdx
      // count the columns from 0 instead of 1.
      if (typeof(UltraEdit.activeDocumentIdx) == "undefined") nColumn++;
      UltraEdit.activeDocument.gotoLine(nLine,nColumn);
   }
}
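The core of the script is the character by character comparison at the beginning of the two paragraph texts. Here is a minimal sketch of just this part, which runs in any JavaScript engine without UltraEdit. The function names commonPrefixLength and stripRefTags are my own hypothetical helpers for illustration and are not used by the script above.

```javascript
// Hypothetical helper, not part of the UltraEdit script: returns the number
// of equal characters at beginning of two strings (case-sensitive).
function commonPrefixLength(sFirst, sSecond)
{
   var nCharIndex = 0;
   while ((nCharIndex < sFirst.length) && (nCharIndex < sSecond.length))
   {
      if (sFirst.charAt(nCharIndex) != sSecond.charAt(nCharIndex)) break;
      nCharIndex++;
   }
   return nCharIndex;
}

// Hypothetical helper: removes the tags <p class="ref"> and </p>
// with the same expression as used in the UltraEdit script.
function stripRefTags(sLine)
{
   return sLine.replace(/<p class="ref">|<\/p>/g,"");
}

var sFirst  = stripRefTags('<p class="ref">Doe, Jone. University of Oxford, Special Collections. 2014, pp. 20-13.</p>');
var sSecond = stripRefTags('<p class="ref">Doe, Jane. University of California, Los Angeles, Special Collections.</p>');

// Number of equal characters and the equal text at beginning of both paragraphs.
var nEqual  = commonPrefixLength(sFirst, sSecond);
var sPrefix = sFirst.substring(0, nEqual);
```

The two paragraphs are interpreted as duplicate data only if nEqual is greater than or equal to the value of nMinimalEqualLength.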
The script was tested on the following data:
Code:
<p class="ref">Craig, Robert Fenton. University of California, Los Angeles, Special Collections.</p>
<p class="ref">Craig, Robert Fenton. University of California, Los Angeles, Special Collections.</p>
<p class="ref">Craig, Robert Fenton. University of Oxford, Special Collections. 2014, pp. 20-13.</p>
<p class="ref">Craig, Robert Fenton. University of California, Los Angeles</p>
<p class="ref">Doe, Jone. University of Oxford, Special Collections. 2014, pp. 20-13.</p>
<p class="ref">Doe, Jane. University of California, Los Angeles, Special Collections.</p>
<p class="ref">Asterix, gaul</p>
<p class="ref">Asterix and Obelix, gauls</p>
<p class="ref">Miraculix, gaul</p>
The first script run, started from top of the file, selects the entire string within the paragraph tags in line 2.
The second script run selects "Craig, Robert Fenton. University of " in line 3, the text equal at the beginning of lines 2 and 3.
The third script run selects "Craig, Robert Fenton. University of " in line 4.
The fourth script run selects "Doe, J" in line 6.
The fifth script run selects "Asterix" in line 8.
The sixth script run finds no further duplicate, which results in canceling the selection and keeping the caret positioned in line 8 column 23.
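The sequence of script runs can also be reproduced without UltraEdit. The following is only a sketch under some assumptions: every line is a reference paragraph, the tags are already removed, and each run starts in the line in which the previous run selected text. The names runOnce, aoResults and commonPrefixLength are hypothetical and exist only in this sketch.

```javascript
// Test data from above without the tags <p class="ref"> and </p>.
var asLines = [
   "Craig, Robert Fenton. University of California, Los Angeles, Special Collections.",
   "Craig, Robert Fenton. University of California, Los Angeles, Special Collections.",
   "Craig, Robert Fenton. University of Oxford, Special Collections. 2014, pp. 20-13.",
   "Craig, Robert Fenton. University of California, Los Angeles",
   "Doe, Jone. University of Oxford, Special Collections. 2014, pp. 20-13.",
   "Doe, Jane. University of California, Los Angeles, Special Collections.",
   "Asterix, gaul",
   "Asterix and Obelix, gauls",
   "Miraculix, gaul"
];

// Hypothetical helper: number of equal characters at beginning of both strings.
function commonPrefixLength(sFirst, sSecond)
{
   var nCharIndex = 0;
   while ((nCharIndex < sFirst.length) && (nCharIndex < sSecond.length))
   {
      if (sFirst.charAt(nCharIndex) != sSecond.charAt(nCharIndex)) break;
      nCharIndex++;
   }
   return nCharIndex;
}

// Hypothetical helper imitating one script run. It compares pairs of
// consecutive lines beginning at zero-based index nStartIndex and returns
// the one-based number of the line with the second paragraph together
// with the equal text, or null if no pair has enough equal characters.
function runOnce(asLines, nStartIndex, nMinimalEqualLength)
{
   for (var nIndex = nStartIndex; nIndex + 1 < asLines.length; nIndex++)
   {
      var nEqual = commonPrefixLength(asLines[nIndex], asLines[nIndex + 1]);
      if (nEqual >= nMinimalEqualLength)
      {
         return { nLine: nIndex + 2, sText: asLines[nIndex].substring(0, nEqual) };
      }
   }
   return null;
}

// Imitate six script runs with nMinimalEqualLength = 4.
var aoResults = [];
var nStartIndex = 0;
for (var nRun = 1; nRun <= 6; nRun++)
{
   var oResult = runOnce(asLines, nStartIndex, 4);
   aoResults.push(oResult);
   if (oResult == null) break;
   // The next run starts in the line containing the selected text.
   nStartIndex = oResult.nLine - 1;
}
```

Each element of aoResults holds the line number and the text which would be selected by the corresponding script run. The last element is null for the run which finds nothing anymore.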