User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Help with writing and running scripts
Hi everyone

How can I create a list of duplicate ids found across several files?

If there is no duplicate id in the files, the file name should be inserted into each href.

The process, step by step:

  1. Enter the file path (in a message box).
  2. Enter the file extension (in a message box).
  3. Run Replace in Files to replace href="[^<>\r\n"]*?#([^<>\r\n"]*?)" with href="@@@.xhtml#$1".
  4. Search for id=".+?" in all files in the path and create a file in the same path with a list of the duplicate identifiers.
  5. If there is no duplicate id, search for (?<= href="@@@.xhtml#)([^<>\r\n"]*?)(?=") and update the hypertext reference so that it consists of the name (without path) of the file containing the id, followed by the id.
This task description does not make sense to me.

Why should all existing references to local anchors/identifiers, whether in the same or in another file, be replaced by the general file name string @@@.xhtml#anchor/identifier in all files of a specified folder matching a specified file extension pattern? Doing this loses the cross-references between the files. From my point of view this is a very bad idea.

And why should it not be allowed to have, for example, <p id="example1">Example 1: ....</p> in a file and reference this example multiple times with: See <a href="#example1">example 1</a>?

An anchor name or identifier must be unique within a file, but it need not be unique across all files. File A can have id="example1" and file B can also have id="example1". It is a common technique to give a section such as a menu, header or footer an identifier that is unique within each file but the same in all files, and to define appropriate CSS properties for this identifier in a *.css file embedded by all HTML/XHTML files of a project.

And references to anchor names or identifiers can of course exist as often as needed. References need not be unique.

What would make much more sense is running a Find in Files with the Perl regular expression search string \b(?:id|name)=(["']).+?\1 and then reformatting the results file with several regular expression replaces, so that the list finally contains only the identifier strings that are duplicated within the same file. Attention: a line listed in the Find in Files results can contain one or more anchor names/identifiers.
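
The per-file duplicate check can be sketched in plain JavaScript, outside of UltraEdit. This is only an illustration under assumptions: findDuplicateIds is a hypothetical helper name, the file content is passed in as a string, and the expression above is extended by a second capturing group around the value. An id and a name attribute with the same value are counted as duplicates of each other here.

```javascript
// Hypothetical helper, not part of the UltraEdit scripting API.
// Returns the id/name values that occur more than once in one file's text.
function findDuplicateIds(sFileText)
{
   // Attribute id or name, the quote character captured in group 1,
   // the value in group 2 and the same quote character again.
   var rAttribute = /\b(?:id|name)=(["'])(.+?)\1/g;
   var oCounts = {};
   var asDuplicates = [];
   var aMatch;
   while ((aMatch = rAttribute.exec(sFileText)) !== null)
   {
      // The prefix avoids clashes with built-in property names.
      var sKey = "id:" + aMatch[2];
      oCounts[sKey] = (oCounts[sKey] || 0) + 1;
      // Report each duplicated value only once.
      if (oCounts[sKey] === 2) asDuplicates.push(aMatch[2]);
   }
   return asDuplicates;
}

// A line can contain more than one identifier, as noted above.
var sExample = '<p id="example1">A</p> <a href="#example1">ref</a>\n' +
               '<p id="example2">B</p> <p id=\'example1\'>C</p>';
var asResult = findDuplicateIds(sExample);   // ["example1"]
```

The references to #example1 are not counted, only the definitions are.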
Best regards from Austria
Hi Mofi,

Thanks for your comments.

I want to write an automation script that inserts the correct file name into every reference to an identifier.

I know
An anchor name or identifier must be defined unique within a file, but not defined unique within all files. File A can have id="example1" and file B can also have id="example1". It is a common technique to give a section like a menu or header or footer an identifier unique within each file and define appropriate CSS properties for this identifier in a *.css file embedded by all HTML/XHTML files of a project

But I do not use the same name/identifier in more than one file.

For example, like this:
Code:
chapter1.xhtml identifier "ch1" and chapter2.xhtml identifier "ch2" not "ch1"
and not like this:
Code:
chapter1.xhtml identifier "ch1" and chapter2.xhtml identifier "ch1"

Or, for footnotes, like this:
Code:
chapter1.xhtml footnote identifier "fn1_1","fn1_2" and chapter2.xhtml footnote identifier "fn2_1","fn2_2"
and not like this:
Code:
chapter1.xhtml footnote identifier "fn1","fn2" and chapter2.xhtml footnote identifier "fn1","fn2"

I want to replace all #anchor/identifier references with @@@.xhtml#anchor/identifier because a reference may contain a wrong file name.

For example, several files may contain any of these:
Code:
href="#ch1" or href="##.html#ch1" or href="chapter2.xhtml#ch1" or href="chapter1.xhtml#ch1"

I want the correct file name to be inserted again for every #anchor/identifier reference in all files, like this:
Code:
href="chapter1.xhtml#ch1" or href="chapter1.xhtml#ch1" or href="chapter1.xhtml#ch1" or href="chapter1.xhtml#ch1" in several files

The automation script should work as follows:

  • If the same name/identifier is used in several files in one folder, create a list of the duplicates and stop the script.
  • I then correct the duplicate identifiers manually.
  • Once the identifiers in all files have been corrected based on the duplicate list, I run the script again.
  • The script then inserts the file name into every #anchor/identifier reference.
Mofi, could you please provide one or two scripts that create the duplicate list and automatically insert the file name into all #anchor/identifier references for all files in one folder, based on my attached input and output?
Below is the script code, written in about 240 minutes. It runs the Replace in Files, searches for all identifiers in all files and checks them for duplicates.

The list of duplicate identifiers is not written into a file but to the output window. That makes it possible to double click on a line to open the file and correct the identifier, or to use the output window commands Next Message and Previous Message via the context menu of the output window or, even better, by their hotkeys Ctrl+Shift+Down Arrow and Ctrl+Shift+Up Arrow from within the current document window.

If duplicate identifiers are found, the script stores the entered directory and file extension in user clipboard 9 and reloads them from user clipboard 9 on the next run. This avoids the need to enter the same directory path and file extension more than once until the entire job is done.
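
The round trip through user clipboard 9 can be sketched in plain JavaScript as follows. packState and unpackState are hypothetical names used only for this illustration, and the clipboard is modelled as an ordinary string. The script itself works directly on UltraEdit.clipboardContent.

```javascript
// Hypothetical helpers mirroring the text format kept in user clipboard 9.
function packState(sDirectory, sExtension)
{
   return "DuplicateIdDirectory=" + sDirectory +
          "\r\nDuplicateIdExtension=" + sExtension + "\r\n";
}

function unpackState(sClipboard)
{
   // Both values are on their own line terminated with CR+LF.
   var rState = /^DuplicateIdDirectory=(.+?)\r\nDuplicateIdExtension=(.+?)\r\n$/;
   var aMatch = rState.exec(sClipboard);
   if (aMatch === null) return null;   // The clipboard holds something else.
   return { directory: aMatch[1], extension: aMatch[2] };
}

var oState = unpackState(packState("C:\\Temp\\Book\\", "*.xhtml"));
// oState.directory is "C:\Temp\Book\" and oState.extension is "*.xhtml".
```

Checking for the DuplicateIdDirectory= prefix is what keeps unrelated clipboard content from being mistaken for stored settings.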

Code:
// Anything else than an empty string skips the appropriate user input query.
var g_sDirectory = "";
var g_sExtension = "";
var g_asIDsNames;

// === Function GetDirectoryExtension ========================================

function GetDirectoryExtension ()
{
   // If there is no directory and no file extension predefined, check if
   // user clipboard 9 contains directory and file extension entered by the
   // user from a previous run of this script which found duplicate ids.
   if ((g_sDirectory.length == 0) && (g_sExtension.length == 0))
   {
      var nActiveClipboardIndex = UltraEdit.clipboardIdx;
      UltraEdit.selectClipboard(9);
      if (UltraEdit.clipboardContent.substr(0,21) == "DuplicateIdDirectory=")
      {
         g_sDirectory = UltraEdit.clipboardContent.replace(/^DuplicateIdDirectory=(.+?)\r\n.+\r\n$/,"$1");
         g_sExtension = UltraEdit.clipboardContent.replace(/^.+?\r\nDuplicateIdExtension=(.+?)\r\n$/,"$1");
         UltraEdit.clearClipboard();
      }
      UltraEdit.selectClipboard(nActiveClipboardIndex);
   }

   // The code below makes sure to have at least 1 file opened in UE/UES. This
   // is necessary for UE for Windows < v22.20.0.37 and UES < v15.30.0.12 as
   // otherwise the function getString() does not return the string entered by
   // the user of the script.
   var bNewFileCreated = false;
   if ((UltraEdit.document.length < 1) && ((g_sDirectory.length == 0) || (g_sExtension.length == 0)))
   {
      bNewFileCreated = true;
      UltraEdit.newFile();
   }

   while (g_sDirectory.length == 0)
   {
      g_sDirectory = UltraEdit.getString("Please enter directory path:",1);
   }

   // There are lots of Windows users not knowing that the directory separator
   // on Windows is \ and not /. So make sure the directory path has backslashes.
   g_sDirectory = g_sDirectory.replace(/\//g,"\\");
   // And the directory path must end with a backslash to be correct interpreted
   // later on Find in Files and Replace in Files.
   if (g_sDirectory[g_sDirectory.length-1] != '\\') g_sDirectory += '\\';

   while (g_sExtension.length == 0)
   {
      g_sExtension = UltraEdit.getString("Please enter file extension:",1);
   }

   // The finally used file extension must start with an asterisk.
   if (g_sExtension[0] != '*') g_sExtension = '*' + g_sExtension;
   // And there must be a point after the asterisk.
   if (g_sExtension[1] != '.') g_sExtension = "*." + g_sExtension.substr(1);

   // Close active file if there was one created before just
   // for entering the directory path and the file extension.
   if (bNewFileCreated) UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
}


// === Function GetIdentifiersList ===========================================

function GetIdentifiersList ()
{
   UltraEdit.outputWindow.clear();

   UltraEdit.perlReOn();
   UltraEdit.frInFiles.searchInFilesTypes=g_sExtension;
   UltraEdit.frInFiles.directoryStart=g_sDirectory;
   UltraEdit.frInFiles.openMatchingFiles=false;
   UltraEdit.frInFiles.ignoreHiddenSubs=true;
   UltraEdit.frInFiles.filesToSearch=0;
   UltraEdit.frInFiles.useEncoding=true;
   UltraEdit.frInFiles.encoding=65001;  // The files are UTF-8 encoded!
   UltraEdit.frInFiles.useOutputWindow=false;
   UltraEdit.frInFiles.matchCase=true;
   UltraEdit.frInFiles.matchWord=false;
   UltraEdit.frInFiles.preserveCase=false;
   UltraEdit.frInFiles.searchSubs=true;
   UltraEdit.frInFiles.regExp=false;
   UltraEdit.frInFiles.find('id="');

   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=false;
   UltraEdit.activeDocument.findReplace.searchDown=false;
   UltraEdit.activeDocument.findReplace.searchInColumn=false;
   if (UltraEdit.activeDocument.findReplace.find(" 0 time(s)."))
   {
      // There was no id attribute found at all. No duplicate check needed.
      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      return true;
   }
   if (UltraEdit.activeDocument.findReplace.find("Search complete, found"))
   {
      UltraEdit.activeDocument.gotoLine(0,1);
      UltraEdit.activeDocument.selectToBottom();
      UltraEdit.activeDocument.deleteText();
   }

   UltraEdit.activeDocument.top();
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   UltraEdit.activeDocument.findReplace.preserveCase=false;
   UltraEdit.activeDocument.findReplace.replaceAll=true;
   UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
   UltraEdit.activeDocument.findReplace.replace("^(?:(?:-----|F[io]u?nd).+\\r\\n)+","");

   UltraEdit.activeDocument.top();
   // Remove the characters between file name with line number and first id attribute.
   UltraEdit.activeDocument.findReplace.replace('^(.+?\\):).+?(id=".+)$',"\\1 \\2");
   // Split up lines with multiple id attributes as often as needed.
   while (UltraEdit.activeDocument.findReplace.replace('^(.+?\\):)( id=".+?").+?id=',"\\1\\2\\r\\n\\1 id="))
   {
      UltraEdit.activeDocument.top();
   }
   // Remove the characters after the id attribute up to end of line.
   UltraEdit.activeDocument.findReplace.replace('^(.+?id=".+?").+$',"\\1");

   // Exchange id and file name with line number in each line.
   UltraEdit.activeDocument.findReplace.replace('^(.+?\\):) (id=.+)$',"\\2 \\1");

   // Sort the lines according to identifier string.
   UltraEdit.activeDocument.sortAsc(0,false,false,1,-1);

   // Mark all lines where next line has same identifier.
   var bDuplicatesFound = false;
   while (UltraEdit.activeDocument.findReplace.find('^(id=".+?").+\\r\\n(?:\\1.+\\r\\n)+(?!\\1)'))
   {
      UltraEdit.activeDocument.findReplace.selectText=true;
      UltraEdit.activeDocument.findReplace.replace("^id","#id");
      UltraEdit.activeDocument.findReplace.selectText=false;
      UltraEdit.activeDocument.key("END");
      bDuplicatesFound = true;
   }

   UltraEdit.outputWindow.clear();
   if (UltraEdit.outputWindow.visible == false)
   {
      UltraEdit.outputWindow.showWindow(true);
   }

   if(!bDuplicatesFound)
   {
      UltraEdit.activeDocument.selectAll();
      // Declare the variable locally instead of creating an implicit global.
      var sIDsNames = UltraEdit.activeDocument.selection;
      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      // Remove file paths and line numbers.
      sIDsNames = sIDsNames.replace(/\" .+\\(.+)\(\d+\):/g,"\" $1");
      // Split large string up into individual lines.
      g_asIDsNames = sIDsNames.split("\r\n");
      // Remove the last element which is an empty string.
      g_asIDsNames.pop();
      return true;
   }

   // Remember in user clipboard 9 the directory and file extension for
   // next run of this script in same instance of UltraEdit/UEStudio.
   var nActiveClipboardIndex = UltraEdit.clipboardIdx;
   UltraEdit.selectClipboard(9);
   UltraEdit.clipboardContent = "DuplicateIdDirectory=" + g_sDirectory +
                            "\r\nDuplicateIdExtension=" + g_sExtension +
                            "\r\n";
   UltraEdit.selectClipboard(nActiveClipboardIndex);

   // Remove all lines with unique identifier.
   UltraEdit.activeDocument.top();
   UltraEdit.activeDocument.findReplace.replace("^(?:[^#].+\\r\\n)+","");

   // Insert an empty line after each block with duplicate identifiers.
   UltraEdit.activeDocument.findReplace.replace('^(#id=".+?")( .+\\r\\n)(?!\\1)',"\\1\\2\\r\\n");

   // Exchange identifier and file name with line number once again.
   UltraEdit.activeDocument.findReplace.replace('^#(id=".+?") (.+)$',"\\2 \\1");

   // Copy the list into output window for manual processing.
   UltraEdit.activeDocument.selectAll();
   var sList = UltraEdit.activeDocument.selection.replace(/\r/g,"");
   sList = "The duplicate identifiers are:\n\n" + sList.replace(/\n+$/,"");
   UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
   UltraEdit.outputWindow.showStatus=false;
   UltraEdit.outputWindow.write(sList);
   return false;
}

// === Function GetListOfFiles ===============================================

// Tailor made version of http://www.ultraedit.com/files/scripts/GetListOfFiles.js

function GetListOfFiles ()
{
   // Run a Find In Files with an empty search string to get the
   // list of files stored in the specified directory in an edit
   // window and delete the last line with the summary info.
   UltraEdit.frInFiles.directoryStart=g_sDirectory;
   UltraEdit.frInFiles.filesToSearch=0;
   UltraEdit.frInFiles.matchCase=false;
   UltraEdit.frInFiles.matchWord=false;
   UltraEdit.frInFiles.regExp=false;
   UltraEdit.frInFiles.searchInFilesTypes=g_sExtension;
   UltraEdit.frInFiles.searchSubs=true;
   UltraEdit.frInFiles.unicodeSearch=false;
   UltraEdit.frInFiles.useOutputWindow=false;
   if (typeof(UltraEdit.frInFiles.openMatchingFiles) == "boolean")
   {
      UltraEdit.frInFiles.openMatchingFiles=false;
   }
   UltraEdit.frInFiles.find("");

   // Search for the summary info at bottom of the results and delete it.
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=false;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=false;
   UltraEdit.activeDocument.findReplace.searchDown=false;
   if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
   {
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
   }
   if (UltraEdit.activeDocument.findReplace.find("Search complete, found"))
   {
      UltraEdit.activeDocument.deleteLine();
   }

   // Convert the file into an ASCII text file for better handling of the
   // file names. Unicode file names are not supported by this function.
   UltraEdit.activeDocument.top();
   UltraEdit.activeDocument.unicodeToASCII();
}

// === Function GetNameOfFile ================================================

// Tailor made version of function GetNameOfFile in script file
// http://www.ultraedit.com/files/scripts/FileNameFunctions.js

function GetNameOfFile (sFullFileName)
{
   // Search for the last backslash which is used normally
   // as directory/file delimiter on Windows platforms.
   var nLastDirDelim = sFullFileName.lastIndexOf("\\");

   // The name of the file is everything from string index after the
   // last directory delimiter to end of the full name of the file.
   nLastDirDelim++;
   var sNameOfFile = sFullFileName.substring(nLastDirDelim);
   return sNameOfFile;
}

// === Function ReplaceReferences ============================================

function ReplaceReferences ()
{
   UltraEdit.outputWindow.clear();

   UltraEdit.perlReOn();
   UltraEdit.frInFiles.searchInFilesTypes=g_sExtension;
   UltraEdit.frInFiles.directoryStart=g_sDirectory;
   UltraEdit.frInFiles.openMatchingFiles=false;
   UltraEdit.frInFiles.ignoreHiddenSubs=true;
   UltraEdit.frInFiles.filesToSearch=0;
   UltraEdit.frInFiles.useEncoding=true;
   UltraEdit.frInFiles.encoding=65001;  // The files are UTF-8 encoded!
   UltraEdit.frInFiles.logChanges=true;
   UltraEdit.frInFiles.matchCase=true;
   UltraEdit.frInFiles.matchWord=false;
   UltraEdit.frInFiles.preserveCase=false;
   UltraEdit.frInFiles.searchSubs=true;
   UltraEdit.frInFiles.regExp=true;
   UltraEdit.frInFiles.replace('href="[^<>\\r\\n"]*?#([^<>\\r\\n"]*?)"','href="@@@.xhtml#\\1"');

   // Copy Replace in Files results from output window into user clipboard 9.
   var nActiveClipboardIndex = UltraEdit.clipboardIdx;
   UltraEdit.selectClipboard(9);
   UltraEdit.outputWindow.copy();

   // Was there any reference replaced at all?
   var bReplaceResult = true;
   if (UltraEdit.clipboardContent.indexOf("0 items replaced") >= 0)
   {
      // No! Show output window with Replace in Files results and
      // exit the script as there is nothing to do for this script.
      bReplaceResult = false;
      UltraEdit.outputWindow.showStatus=false;
      UltraEdit.outputWindow.write("");
      UltraEdit.outputWindow.write("Searched for " + g_sExtension + " in " + g_sDirectory);
      if (UltraEdit.outputWindow.visible == false)
      {
         UltraEdit.outputWindow.showWindow(true);
      }
   }

   // Clear user clipboard 9 and select previously active clipboard.
   UltraEdit.clearClipboard();
   UltraEdit.selectClipboard(nActiveClipboardIndex);
   return bReplaceResult;
}

// === Main routine of the script ============================================

// Get directory path and file extension from user clipboard 9 stored there
// from a previous script run or from user running the script if not already
// predefined at top of this script.
GetDirectoryExtension();

// Replace all existing hypertext references with a local anchor in reference
// string by a template reference in all files in specified directory (tree)
// matching the specified file extension.
if (ReplaceReferences())   // Was any reference replaced at all?
{
   // Get all id attributes from all files, sort them and check for duplicates.
   if (GetIdentifiersList())
   {
      // No duplicate identifiers found. Replace all placeholder references
      // in all files by the correct references if the referenced identifier
      // exists in any file.
      var nActiveClipboardIndex = UltraEdit.clipboardIdx;
      UltraEdit.selectClipboard(9);

      UltraEdit.frInFiles.directoryStart=g_sDirectory;
      UltraEdit.frInFiles.openMatchingFiles=false;
      UltraEdit.frInFiles.ignoreHiddenSubs=true;
      UltraEdit.frInFiles.filesToSearch=0;
      UltraEdit.frInFiles.useEncoding=true;
      UltraEdit.frInFiles.encoding=65001;  // The files are UTF-8 encoded!
      UltraEdit.frInFiles.logChanges=true;
      UltraEdit.frInFiles.matchCase=true;
      UltraEdit.frInFiles.matchWord=false;
      UltraEdit.frInFiles.preserveCase=false;
      UltraEdit.frInFiles.searchSubs=true;
      UltraEdit.frInFiles.regExp=false;
      UltraEdit.frInFiles.searchInFilesTypes=g_sExtension;

      var nTotalReplaces = 0;
      var nIndexID = 0;
      var nCountID = g_asIDsNames.length;

      while(nIndexID < g_asIDsNames.length)
      {
         var sSearch  = g_asIDsNames[nIndexID].replace(/id="(.+?)".*$/,'"@@@.xhtml#$1"');
         var sReplace = g_asIDsNames[nIndexID].replace(/id="(.+?)" (.*)$/,'"$2#$1"');
         UltraEdit.frInFiles.replace(sSearch,sReplace);

         // Copy Replace in Files results from output window into user clipboard 9.
         UltraEdit.outputWindow.copy();

         // Get total number of replaces for current identifier.
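         // \b makes sure the whole number is captured. Without it the greedy
         // [\s\S]* leaves only the last digit for the group, so 10 becomes 0.
         var sReplacedItems = UltraEdit.clipboardContent.replace(/[\s\S]*\b(\d+) items replaced in \d+ files[\s\S]+/,"$1");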
         var nReplacedItems = parseInt(sReplacedItems,10);

         if(nReplacedItems)
         {
            nTotalReplaces += nReplacedItems;
            g_asIDsNames.splice(nIndexID,1);
         }
         else
         {
            nIndexID++;
         }
         UltraEdit.outputWindow.clear();
      }

      UltraEdit.frInFiles.useOutputWindow=true;
      UltraEdit.frInFiles.find("@@@.xhtml");
      UltraEdit.outputWindow.copy();
      // Search complete, found '@@@.xhtml' 1200 time(s). (5 file(s)).

      // Get total number of placeholder references not updated.
      var sFindSummary = UltraEdit.clipboardContent.replace(/[\s\S]*Search complete,.+.xhtml\' (\d+) time.* \((\d+) file[\s\S]+/,"$1 $2");
      var sNotReplaced = sFindSummary.replace(/^(\d+) .+$/,"$1");
      var sNoFiles = sFindSummary.replace(/^\d+ (\d+)$/,"$1");

      // Get list of all file names with path.
      GetListOfFiles();
      UltraEdit.activeDocument.selectAll();
      var asFileNames = UltraEdit.activeDocument.selection.split("\r\n");
      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      while(asFileNames[asFileNames.length-1] == "") asFileNames.pop();

      // Run on each file a Replace in Files to replace "filename.xhtml# by "#.
      UltraEdit.frInFiles.directoryStart="";
      UltraEdit.frInFiles.searchSubs=false;

      var nFileCount = asFileNames.length;
      for (var nFileName = 0; nFileName < nFileCount; nFileName++)
      {
         UltraEdit.frInFiles.searchInFilesTypes = asFileNames[nFileName];
         UltraEdit.frInFiles.replace('"' + GetNameOfFile(asFileNames[nFileName]) + '#','"#');
      }

      UltraEdit.clearClipboard();
      UltraEdit.selectClipboard(nActiveClipboardIndex);

      UltraEdit.outputWindow.clear();
      UltraEdit.outputWindow.showWindow(true);

      // Write to output window some information about update process.
      UltraEdit.outputWindow.write("Processed " + nFileCount.toString(10) + " " + g_sExtension + " file" +
                                   ((nFileCount != 1) ? "s" : "") + " in directory: " + g_sDirectory);

      UltraEdit.outputWindow.write("Found " + nCountID.toString(10) + " unique identifier" + ((nCountID != 1) ? "s." : "."));

      var nFoundIDs = nCountID - g_asIDsNames.length;
      UltraEdit.outputWindow.write("Updated " + nFoundIDs.toString(10) + " identifier reference" + ((nFoundIDs != 1) ? "s." : "."));
      UltraEdit.outputWindow.write("Missed " + sNotReplaced + " identifier reference" + ((sNotReplaced != "1") ? "s in " : " in ") +
                                   sNoFiles + " file" + ((sNoFiles != "1") ? "s." : "."));

   }
}
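
The core idea of the script, looking up the file that defines an identifier and writing that file name into every reference, can also be sketched in plain JavaScript on in-memory strings. This is only a simplified model and not how the script works internally: buildIdIndex and resolveReferences are hypothetical names, the files are given as an object mapping file name to content, and the template step and the final update are merged into one replace.

```javascript
// Hypothetical in-memory model: oFiles maps file name -> file content.
function buildIdIndex(oFiles)
{
   var oIndex = {};         // "id:<identifier>" -> name of the defining file
   var asDuplicates = [];   // identifiers defined more than once
   for (var sFileName in oFiles)
   {
      var rId = /\bid="([^"]+)"/g;
      var aMatch;
      while ((aMatch = rId.exec(oFiles[sFileName])) !== null)
      {
         var sKey = "id:" + aMatch[1];
         if (oIndex.hasOwnProperty(sKey)) asDuplicates.push(aMatch[1]);
         else oIndex[sKey] = sFileName;
      }
   }
   return { index: oIndex, duplicates: asDuplicates };
}

// Rewrites every href containing # so that it names the defining file.
// References to unknown identifiers keep the placeholder @@@.xhtml.
function resolveReferences(sText, oIndex)
{
   var rHref = /href="[^<>\r\n"]*?#([^<>\r\n"]*?)"/g;
   return sText.replace(rHref, function (sAll, sId)
   {
      var sKey = "id:" + sId;
      var sTarget = oIndex.hasOwnProperty(sKey) ? oIndex[sKey] : "@@@.xhtml";
      return 'href="' + sTarget + '#' + sId + '"';
   });
}

var oFiles = {
   "chapter1.xhtml": '<h1 id="ch1">One</h1> <a href="#ch2">next</a>',
   "chapter2.xhtml": '<h1 id="ch2">Two</h1> <a href="wrong.xhtml#ch1">back</a> <a href="#nowhere">?</a>'
};
var oResult = buildIdIndex(oFiles);
var sChapter2 = resolveReferences(oFiles["chapter2.xhtml"], oResult.index);
// sChapter2 now references chapter1.xhtml#ch1 and @@@.xhtml#nowhere.
```

As in the script, nothing should be rewritten as long as oResult.duplicates is not empty, because the defining file of a duplicated identifier is ambiguous.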
Best regards from Austria
Hi Mofi,

First of all, thank you for this awesome script. It works extremely well and quickly across the files.

If possible, could you update the script a little bit? If the reference and the identifier are in the same file, the file name should be omitted from the reference.

For your reference, I have attached two .png screenshots.

[Attachment: after_run_script.png]

This was generated by your latest script. In the highlighted portion the file name is present, although the identifier and the reference are in the same file.

[Attachment: actual_output_files.png]

The output should look like this. In the highlighted portion of this screenshot the file name is omitted because the identifier and the reference are in the same file.
I extended and updated the script above on 2017-03-04.
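
The added step for references within the same file can be sketched in plain JavaScript as follows. stripOwnFileName is a hypothetical helper name and the file content is assumed to be available as a string. The script does the equivalent with one non-regex Replace in Files per file.

```javascript
// Hypothetical helper: shorten references that point into the file itself.
function stripOwnFileName(sFileName, sText)
{
   // Plain string replacement of "filename# by "#. split/join is used so
   // that the dot in the file name is not treated as a regex metacharacter.
   return sText.split('"' + sFileName + '#').join('"#');
}

var sBefore = '<a href="chapter1.xhtml#fn1_1">1</a> <a href="chapter2.xhtml#fn2_1">2</a>';
var sAfter = stripOwnFileName("chapter1.xhtml", sBefore);
// sAfter: <a href="#fn1_1">1</a> <a href="chapter2.xhtml#fn2_1">2</a>
```

Only references to the processed file lose the file name. References to other files stay untouched.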
Best regards from Austria
Perfect ............

Just Awesome
