User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Help with writing and running scripts
Hi to all,
Dear Mofi,

I have a directory named Text. It contains many files like:
Code: Select all
01_Inhalt.xhtml
02_Kapitel01.xhtml
03_Kapitel02.xhtml
04_Kapitel03.xhtml
05_Kapitel04.xhtml
index.xhtml

The file index.xhtml contains:
Code: Select all
<p class="bib"><b>A</b>igen <a href="../Text/@@@.xhtml#index_1">44</a></p>
<p class="bib">Alter Markt <a href="../Text/@@@.xhtml#index_2">15</a> f., <a href="../Text/@@@.xhtml#index_4">21</a></p>
<p class="bib">Andr&#x00E4;kirche <a href="../Text/@@@.xhtml#index_5">23</a>, <a href="../Text/@@@.xhtml#index_6">36</a></p>
<p class="bib"><i>Anreise <a href="../Text/@@@.xhtml#index_7">81</a> f.</i></p>
<p class="bib">Antretterhaus <a href="../Text/@@@.xhtml#index_8">16</a></p>
<p class="bib">AugustinerBr&#x00E4;u <a href="../Text/@@@.xhtml#index_9">7</a>, <a href="../Text/@@@.xhtml#index_10">56</a></p>
<p class="bib"><i>Auskunft <a href="../Text/@@@.xhtml#index_11">82</a> f.</i></p>

The numbers 44, 15, 21, 23, 36, 81, ... are page numbers. Each page is marked by a page tag like <a id="page_44"></a>.
These page tags are located in one of the files 02_Kapitel01.xhtml, 03_Kapitel02.xhtml, 04_Kapitel03.xhtml or 05_Kapitel04.xhtml.

The script should do the following:

  1. First open the file which contains the page tag, for example <a id="page_44"></a>.
    Then search for the word Aigen.
    If the word is found, insert the link tag <a id="index_1"></a> in front of it, resulting in <a id="index_1"></a>Aigen.
    If the word is not found, insert the link tag directly after the page tag, resulting in <a id="page_44"></a><a id="index_1"></a>.

  2. For the 2nd line:
    First open the files containing the page tags <a id="page_15"></a> and <a id="page_21"></a>.
    Then search for the phrase Alter Markt.
    If the phrase is found, insert the link tag in front of it, for example <a id="index_2"></a>Alter Markt.
    If the phrase is not found, insert the link tag directly after the page tag, for example <a id="page_15"></a><a id="index_2"></a>.

This process should continue for all lines in index.xhtml.
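
The rule described above can be sketched on a plain string, independent of UltraEdit. This is only an illustration of the requested logic, not the script posted in this thread: the function name insertIndexAnchor and the sample sentence are made up, and the term search here is case-sensitive and limited to one string.

```javascript
// Sketch of the insertion rule on a plain string (no UltraEdit API involved).
// insertIndexAnchor is a hypothetical helper name used only for this example.
function insertIndexAnchor(sContent, nPage, nIndex, sTerm)
{
   var sPageTag  = '<a id="page_' + nPage + '"></a>';
   var sIndexTag = '<a id="index_' + nIndex + '"></a>';

   // The page tag must exist in this content, otherwise nothing is changed.
   var nPagePos = sContent.indexOf(sPageTag);
   if (nPagePos < 0) return sContent;

   // Look for the indexed term after the page tag.
   var nAfterPageTag = nPagePos + sPageTag.length;
   var nTermPos = sContent.indexOf(sTerm, nAfterPageTag);

   // Term found: insert in front of the term, otherwise directly after the page tag.
   var nInsertPos = (nTermPos >= 0) ? nTermPos : nAfterPageTag;
   return sContent.substring(0, nInsertPos) + sIndexTag + sContent.substring(nInsertPos);
}

// Made-up sample content of a chapter file.
var sChapter = '<p><a id="page_44"></a>Der Stadtteil Aigen liegt im Osten.</p>';
var sWithTerm = insertIndexAnchor(sChapter, 44, 1, "Aigen");
var sWithoutTerm = insertIndexAnchor(sChapter, 44, 1, "Alter Markt");
```

With this sample, sWithTerm has the link tag in front of Aigen, and sWithoutTerm has it directly after the page tag.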

I hope the problem is clear.
I look forward to your reply.
Three questions regarding this coding task:

  1. Does the file index.xhtml really contain just <a href="../Text/@@@.xhtml#index_1">, or is @@@ used only in your example block as a placeholder for a file name such as 04_Kapitel03.xhtml?
  2. Should the script remove all tags matched by </?.*> (a Perl regular expression for start or end tags) from the string between <p class="bib"> and the first <a href=, and also trim whitespace on both sides, before using the remaining string as the search string in the opened file(s)?
  3. If a string like Andr&#x00E4;kirche exists in the referenced *.xhtml file at all, is the umlaut ä always encoded there in the same way?
Best regards from Austria
Dear Mofi,

Answer to the 1st question:

Before processing, index.xhtml really contains <a href="../Text/@@@.xhtml#index_1">. When processing is complete, @@@.xhtml should be replaced by the name of the file that contains <a id="index_1"></a>. No text is missing.
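
To illustrate the expected result, here is a small sketch of how one placeholder could be filled in for a single link. It follows the description in this answer and keeps the ../Text/ prefix. The helper name fillPlaceholder and the chosen file name are assumptions for this example only.

```javascript
// fillPlaceholder is a hypothetical helper name. It only touches the link with
// the given index number and leaves all other links in the line unchanged.
function fillPlaceholder(sLine, nIndex, sFileName)
{
   // The closing double quote prevents index_2 from also matching index_21.
   var sOld = '@@@.xhtml#index_' + nIndex + '"';
   var sNew = sFileName + '#index_' + nIndex + '"';
   var nPos = sLine.indexOf(sOld);
   if (nPos < 0) return sLine;
   return sLine.substring(0, nPos) + sNew + sLine.substring(nPos + sOld.length);
}

var sLine = '<p class="bib">Alter Markt <a href="../Text/@@@.xhtml#index_2">15</a> f., <a href="../Text/@@@.xhtml#index_4">21</a></p>';
// The file name is assumed here; in practice it is the file containing page_21.
var sUpdated = fillPlaceholder(sLine, 4, "03_Kapitel02.xhtml");
```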

Answer to the 2nd question:

Formatting tags like <b>, <i> and <em> need to be removed from the text taken from index.xhtml before searching, but the file index.xhtml itself must not be changed by this. Dear Mofi, please keep in mind that the tags should only be removed in memory for the search.
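
A minimal sketch of what removing the tags only in memory could look like: the line from index.xhtml is copied into a string variable, and only that copy is cleaned. The function name getIndexedTerm is an assumption for this example.

```javascript
// getIndexedTerm is a hypothetical helper name. It works on a copy of the line,
// so the text in index.xhtml itself is not modified.
function getIndexedTerm(sLine)
{
   // Take the text between <p class="bib"> and the first <a href=.
   var asMatch = /<p class="bib">(.+?)<a href=/.exec(sLine);
   if (asMatch === null) return "";

   // Remove start and end tags such as <b>, </b>, <i> or <em>.
   var sTerm = asMatch[1].replace(/<\/?[^>]+>/g, "");

   // Trim spaces and tabs on both sides.
   return sTerm.replace(/^[\t ]+|[\t ]+$/g, "");
}

var sIndexLine = '<p class="bib"><b>A</b>igen <a href="../Text/@@@.xhtml#index_1">44</a></p>';
var sTerm = getIndexedTerm(sIndexLine);
```

Here sTerm is Aigen, while sIndexLine still contains the <b> tags.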

Answer to the 3rd question:

All .xhtml files contain only hexadecimal character entities. No file contains non-ASCII characters directly.

Mofi, I have attached a ZIP archive (deleted later, after Mofi wrote the scripts). The files in the archive show the result the script should produce. We made these changes manually, which takes a lot of time, hence the request to automate the task with a script.

Thanks in advance.
Here is the script for this task.

It requires English UltraEdit for Windows v16.00 or later as it is currently written.

The function GetFilePath must be copied into the script file, too.

Code: Select all
function OpenFile (sPageIdentifier)
{
   UltraEdit.outputWindow.clear();
   // Search in all *.xhtml files in folder of opened file for the page identifier.
   UltraEdit.frInFiles.find(sPageIdentifier);

   // The output is written into the output window. Copy the output window
   // content into user clipboard 9 and search for the results summary to
   // determine if the searched page identifier was found at all in any file.
   UltraEdit.outputWindow.copy();
   if(UltraEdit.clipboardContent.indexOf("0 time(s). (0 file(s)).") >= 0)
   {
      return -1;
   }

   // Get name of file with the page number with complete path.
   var sFileNameWithPath = UltraEdit.clipboardContent.replace(/^[\s\S]*([A-Za-z]:\\.+?\.xhtml)[\s\S]*$/,"$1");

   // Search in list of already opened files for this file. If the
   // file is already opened, move caret in this file to top of file.
   for(var nDocIndex = 0; nDocIndex < UltraEdit.document.length; nDocIndex++)
   {
      if(UltraEdit.document[nDocIndex].path == sFileNameWithPath)
      {
         UltraEdit.document[nDocIndex].top();
         return nDocIndex;
      }
   }

   // The file must be opened now.
   UltraEdit.open(sFileNameWithPath);
   return (UltraEdit.document.length - 1);
}


if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();

   // Move caret to top of the active file.
   UltraEdit.activeDocument.top();

   // Find and select the block with the lines <p class="bib">...</p>.
   UltraEdit.perlReOn();
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   UltraEdit.activeDocument.findReplace.searchInColumn=false;

   // Do nothing if there is not at least one line.
   if(UltraEdit.activeDocument.findReplace.find('(?:<p class="bib">.+</p>[\\t ]*\r\n)+'))
   {
      // Get document index of active file.
      var nIndexFileDocIndex = UltraEdit.activeDocumentIdx;

      // Get path to folder of active file and initialize once all parameters
      // for Find in Files used to find the page numbers in all *.xhtml files.
      var sDirectory = GetFilePath();
      UltraEdit.frInFiles.filesToSearch=0;
      UltraEdit.frInFiles.directoryStart=sDirectory;
      UltraEdit.frInFiles.searchInFilesTypes="*.xhtml";
      UltraEdit.frInFiles.displayLinesDoNotMatch=false;
      UltraEdit.frInFiles.openMatchingFiles=false;
      UltraEdit.frInFiles.ignoreHiddenSubs=true;
      UltraEdit.frInFiles.useOutputWindow=true;
      UltraEdit.frInFiles.reverseSearch=false;
      UltraEdit.frInFiles.unicodeSearch=false;
      UltraEdit.frInFiles.useEncoding=false;
      UltraEdit.frInFiles.searchSubs=false;  // Enable this option for folder tree search.
      UltraEdit.frInFiles.matchWord=false;
      UltraEdit.frInFiles.matchCase=true;
      UltraEdit.frInFiles.regExp=false;

      // Select user clipboard 9 as active clipboard.
      UltraEdit.selectClipboard(9);

      // Get the selected lines into an array of strings.
      var asLines = UltraEdit.activeDocument.selection.split("\r\n");

      var bModifiedFile = false;

      // Ignore the last string which is of length 0 on further processing.
      var nLineCount = asLines.length - 1;

      for(var nLine = 0; nLine < nLineCount; nLine++)
      {
         // Does this line contain any page reference?
         if(asLines[nLine].search(/<a href=".+#index_\d+">\d+<\/a>/) < 0) continue;

         // Get the indexed string without leading and trailing
         // spaces/tabs and without any HTML tags in string.
         var sIndexedString = asLines[nLine].replace(/^.*<p class="bib">(.+?)<a href=.+$/,"$1");
         sIndexedString = sIndexedString.replace(/<\/?.+?>/g,"");
         sIndexedString = sIndexedString.replace(/^[\t ]*(.+?)[\t ]*$/,"$1");

         // Get all page references into an array of strings.
         var asPages = asLines[nLine].match(/<a href=".+?#index_\d+">\d+<\/a>/g);

         // Process each page reference.
         for(var nPage = 0; nPage < asPages.length; nPage++)
         {
            // Get index number and reformat it to an index identifier.
            var sIndexIdentifier = asPages[nPage].replace(/^.+#index_(\d+).+$/,'<a id="index_$1"></a>');

            // Get page identifier and open the file containing this page identifier.
            var sPageIdentifier = asPages[nPage].replace(/^.+>(\d+)<\/a>/,'<a id="page_$1"></a>');
            var nPageDocIndex = OpenFile(sPageIdentifier);

            // Skip this page reference if no file contains this page identifier.
            if(nPageDocIndex < 0) continue;

            // Get path of file with the page number relative to path of index
            // file which means here usually just the file name without any path.
            var sFileName = UltraEdit.document[nPageDocIndex].path.replace(sDirectory,"");
            // Replace all backslashes, not just the first one (subfolder search).
            sFileName = sFileName.replace(/\\/g,"/");

            // Update the reference to the page in line from index file.
            var sPageRef = asPages[nPage].replace(/\".+?#/,'"'+sFileName+'#');
            if(sPageRef != asPages[nPage])
            {
               asLines[nLine] = asLines[nLine].replace(asPages[nPage],sPageRef);
               bModifiedFile = true;
            }

            // Search in opened file from top for the page number identifier.
            UltraEdit.document[nPageDocIndex].findReplace.mode=0;
            UltraEdit.document[nPageDocIndex].findReplace.matchCase=true;
            UltraEdit.document[nPageDocIndex].findReplace.matchWord=false;
            UltraEdit.document[nPageDocIndex].findReplace.regExp=false;
            UltraEdit.document[nPageDocIndex].findReplace.searchDown=true;
            UltraEdit.document[nPageDocIndex].findReplace.searchInColumn=false;
            UltraEdit.document[nPageDocIndex].findReplace.find(sPageIdentifier);

            // Insert the index identifier either before indexed string or
            // after the page identifier if not already existing in the file.
            UltraEdit.document[nPageDocIndex].findReplace.matchCase=false;
            if(UltraEdit.document[nPageDocIndex].findReplace.find(sIndexedString))
            {
               // The column number read from the file should always be the column
               // at the end of the found string where the caret is blinking in the
               // file. But if the current file is not the active file, the returned
               // column number is wrong unless the caret is moved once in the file.
               UltraEdit.document[nPageDocIndex].key("LEFT ARROW");
               UltraEdit.document[nPageDocIndex].key("RIGHT ARROW");
               var nColumn = UltraEdit.document[nPageDocIndex].currentColumnNum;
               // Move caret to beginning of current line in current file.
               UltraEdit.document[nPageDocIndex].gotoLine(0,1);
               UltraEdit.document[nPageDocIndex].findReplace.regExp=true;
               // Does this line contain already the index identifier before indexed string?
               var sSearchRegExp = sIndexIdentifier + '(?:<a id="index_\\d+"></a>)*' + sIndexedString;
               if(!UltraEdit.document[nPageDocIndex].findReplace.find(sSearchRegExp))
               {
                  // No, then insert it before indexed string.
                  nColumn -= sIndexedString.length;
                  UltraEdit.document[nPageDocIndex].gotoLine(0,nColumn);
                  UltraEdit.document[nPageDocIndex].write(sIndexIdentifier);
                  bModifiedFile = true;
               }
            }
            else
            {
               UltraEdit.document[nPageDocIndex].key("LEFT ARROW");
               UltraEdit.document[nPageDocIndex].key("RIGHT ARROW");
               var nColumn = UltraEdit.document[nPageDocIndex].currentColumnNum;
               UltraEdit.document[nPageDocIndex].gotoLine(0,1);
               UltraEdit.document[nPageDocIndex].findReplace.regExp=true;
               // Does this line contain already the index identifier after page identifier?
               var sSearchRegExp = sPageIdentifier + '(?:<a id="index_\\d+"></a>)*' + sIndexIdentifier;
               if(!UltraEdit.document[nPageDocIndex].findReplace.find(sSearchRegExp))
               {
                  // No, then insert it after page identifier.
                  UltraEdit.document[nPageDocIndex].gotoLine(0,nColumn);
                  UltraEdit.document[nPageDocIndex].write(sIndexIdentifier);
                  bModifiedFile = true;
               }
            }
         }
      }

      UltraEdit.outputWindow.clear();
      if(bModifiedFile)
      {
         // Overwrite selection in index file with block with updated references.
         UltraEdit.document[nIndexFileDocIndex].write(asLines.join("\r\n"));
         UltraEdit.saveAll();
         UltraEdit.outputWindow.write("At least 1 file was updated.");
      }
      else
      {
         UltraEdit.outputWindow.write("No file was updated.");
      }

      // Clear user clipboard 9 and select Windows clipboard as active clipboard.
      UltraEdit.clearClipboard();
      UltraEdit.selectClipboard(0);
   }
}
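
One point to be aware of: the script builds sSearchRegExp by appending sIndexedString to a Perl regular expression. If an indexed term contains characters like . or ( these are interpreted as regular expression operators. A possible helper for escaping such a term is sketched below. It is not part of the posted script, and the name escapeForRegExp and the sample term are made up.

```javascript
// escapeForRegExp is a hypothetical helper name, not part of the posted script.
function escapeForRegExp(sText)
{
   // Put a backslash in front of every character which has
   // a special meaning in a regular expression.
   return sText.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Made-up sample term containing a dot and round brackets.
var sEscaped = escapeForRegExp("St. Peter (Friedhof)");
```

The escaped string could then be used instead of sIndexedString when building sSearchRegExp, while the non-regular-expression find before it keeps using the unescaped term.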
Best regards from Austria
The script posted above works, but it is inefficient because it runs a Find in Files on all *.xhtml files in the folder of the opened index.xhtml for each page number. It is better to run Find in Files once on all *.xhtml files to build a list (array) of the found page numbers and file names, and to look up this array while processing the lines in index.xhtml. The script below uses this method. It also requires the function GetFilePath to be copied into the script file.

Code: Select all
function GetFileNames (sDirectoryPath)
{
   // Select user clipboard 9 as active clipboard and clear output window.
   UltraEdit.selectClipboard(9);
   UltraEdit.outputWindow.clear();

   // Initialize all parameters for Find in Files used
   // to find the page identifiers in all *.xhtml files.
   UltraEdit.frInFiles.filesToSearch=0;
   UltraEdit.frInFiles.directoryStart=sDirectoryPath;
   UltraEdit.frInFiles.searchInFilesTypes="*.xhtml";
   UltraEdit.frInFiles.displayLinesDoNotMatch=false;
   UltraEdit.frInFiles.openMatchingFiles=false;
   UltraEdit.frInFiles.ignoreHiddenSubs=true;
   UltraEdit.frInFiles.useOutputWindow=true;
   UltraEdit.frInFiles.reverseSearch=false;
   UltraEdit.frInFiles.unicodeSearch=false;
   UltraEdit.frInFiles.useEncoding=false;
   UltraEdit.frInFiles.searchSubs=false;  // Enable this option for folder tree search.
   UltraEdit.frInFiles.matchWord=false;
   UltraEdit.frInFiles.matchCase=true;
   UltraEdit.frInFiles.regExp=true;

   // Search in all *.xhtml files in folder of opened file for the page identifiers.
   UltraEdit.frInFiles.find('<a id="page_\\d+"></a>');

   // The output is written into the output window. Copy the output window
   // content into user clipboard 9 and search for the results summary to
   // determine if the searched page identifier was found at all in any file.
   UltraEdit.outputWindow.copy();
   if(UltraEdit.clipboardContent.indexOf("0 time(s). (0 file(s)).") >= 0)
   {
      UltraEdit.clearClipboard();
      UltraEdit.selectClipboard(0);
      UltraEdit.outputWindow.showStatus=false;
      return false;
   }

   var asFileNames = UltraEdit.clipboardContent.match(/[A-Za-z]:\\.+?\.xhtml.+<a id=\"page_\d+/g);

   // Clear user clipboard 9 and select Windows clipboard as active clipboard.
   UltraEdit.clearClipboard();
   UltraEdit.selectClipboard(0);
   UltraEdit.outputWindow.clear();

   // Fill the global array with the page numbers and in which file it was found.
   for(var nFileIndex = 0; nFileIndex < asFileNames.length; nFileIndex++)
   {
      var nPageNumber = parseInt(asFileNames[nFileIndex].replace(/^.+<a id=\"page_(\d+)$/,"$1"),10);
      g_asFileNames[nPageNumber] = asFileNames[nFileIndex].replace(/^(.+?\.xhtml).+$/,"$1");
   }
   return true;
}


function OpenFile (sPageIdentifier)
{
   var nPageNumber = parseInt(sPageIdentifier.replace(/^<a id="page_(\d+)".+$/,"$1"),10);
   if(g_asFileNames[nPageNumber] == null)
   {
      return (-1);
   }
   var sFileNameWithPath = g_asFileNames[nPageNumber];

   // Search in list of already opened files for this file. If the
   // file is already opened, move caret in this file to top of file.
   for(var nDocIndex = 0; nDocIndex < UltraEdit.document.length; nDocIndex++)
   {
      if(UltraEdit.document[nDocIndex].path == sFileNameWithPath)
      {
         UltraEdit.document[nDocIndex].top();
         return nDocIndex;
      }
   }

   // The file must be opened now.
   UltraEdit.open(sFileNameWithPath);
   return (UltraEdit.document.length - 1);
}


if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();

   // Move caret to top of the active file.
   UltraEdit.activeDocument.top();

   // Find and select the block with the lines <p class="bib">...</p>.
   UltraEdit.perlReOn();
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   UltraEdit.activeDocument.findReplace.searchInColumn=false;

   // Do nothing if there is not at least one line.
   if(UltraEdit.activeDocument.findReplace.find('(?:<p class="bib">.+</p>[\\t ]*\r\n)+'))
   {
      // Get document index of active file.
      var nIndexFileDocIndex = UltraEdit.activeDocumentIdx;

      // Get path to folder of active file and fill array with page
      // identifiers and in which file the page identifier was found.
      var g_asFileNames = [];
      var sDirectory = GetFilePath();
      if(GetFileNames(sDirectory))
      {
         // Get the selected lines into an array of strings.
         var asLines = UltraEdit.activeDocument.selection.split("\r\n");

         var bModifiedFile = false;

         // Ignore the last string which is of length 0 on further processing.
         var nLineCount = asLines.length - 1;

         for(var nLine = 0; nLine < nLineCount; nLine++)
         {
            // Does this line contain any page reference?
            if(asLines[nLine].search(/<a href=".+#index_\d+">\d+<\/a>/) < 0) continue;

            // Get the indexed string without leading and trailing
            // spaces/tabs and without any HTML tags in string.
            var sIndexedString = asLines[nLine].replace(/^.*<p class="bib">(.+?)<a href=.+$/,"$1");
            sIndexedString = sIndexedString.replace(/<\/?.+?>/g,"");
            sIndexedString = sIndexedString.replace(/^[\t ]*(.+?)[\t ]*$/,"$1");

            // Get all page references into an array of strings.
            var asPages = asLines[nLine].match(/<a href=".+?#index_\d+">\d+<\/a>/g);

            // Process each page reference.
            for(var nPage = 0; nPage < asPages.length; nPage++)
            {
               // Get index number and reformat it to an index identifier.
               var sIndexIdentifier = asPages[nPage].replace(/^.+#index_(\d+).+$/,'<a id="index_$1"></a>');

               // Get page identifier and open the file containing this page identifier.
               var sPageIdentifier = asPages[nPage].replace(/^.+>(\d+)<\/a>/,'<a id="page_$1"></a>');
               var nPageDocIndex = OpenFile(sPageIdentifier);

               // Skip this page reference if no file contains this page identifier.
               if(nPageDocIndex < 0) continue;

               // Get path of file with the page number relative to path of index
               // file which means here usually just the file name without any path.
               var sFileName = UltraEdit.document[nPageDocIndex].path.replace(sDirectory,"");
               // Replace all backslashes, not just the first one (subfolder search).
               sFileName = sFileName.replace(/\\/g,"/");

               // Update the reference to the page in line from index file.
               var sPageRef = asPages[nPage].replace(/\".+?#/,'"'+sFileName+'#');
               if(sPageRef != asPages[nPage])
               {
                  asLines[nLine] = asLines[nLine].replace(asPages[nPage],sPageRef);
                  bModifiedFile = true;
               }

               // Search in opened file from top for the page number identifier.
               UltraEdit.document[nPageDocIndex].findReplace.mode=0;
               UltraEdit.document[nPageDocIndex].findReplace.matchCase=true;
               UltraEdit.document[nPageDocIndex].findReplace.matchWord=false;
               UltraEdit.document[nPageDocIndex].findReplace.regExp=false;
               UltraEdit.document[nPageDocIndex].findReplace.searchDown=true;
               UltraEdit.document[nPageDocIndex].findReplace.searchInColumn=false;
               UltraEdit.document[nPageDocIndex].findReplace.find(sPageIdentifier);

               // Insert the index identifier either before indexed string or
               // after the page identifier if not already existing in the file.
               UltraEdit.document[nPageDocIndex].findReplace.matchCase=false;
               if(UltraEdit.document[nPageDocIndex].findReplace.find(sIndexedString))
               {
                  // The column number read from the file should always be the column
                  // at the end of the found string where the caret is blinking in the
                  // file. But if the current file is not the active file, the returned
                  // column number is wrong unless the caret is moved once in the file.
                  UltraEdit.document[nPageDocIndex].key("LEFT ARROW");
                  UltraEdit.document[nPageDocIndex].key("RIGHT ARROW");
                  var nColumn = UltraEdit.document[nPageDocIndex].currentColumnNum;
                  // Move caret to beginning of current line in current file.
                  UltraEdit.document[nPageDocIndex].gotoLine(0,1);
                  UltraEdit.document[nPageDocIndex].findReplace.regExp=true;
                  // Does this line contain already the index identifier before indexed string?
                  var sSearchRegExp = sIndexIdentifier + '(?:<a id="index_\\d+"></a>)*' + sIndexedString;
                  if(!UltraEdit.document[nPageDocIndex].findReplace.find(sSearchRegExp))
                  {
                     // No, then insert it before indexed string.
                     nColumn -= sIndexedString.length;
                     UltraEdit.document[nPageDocIndex].gotoLine(0,nColumn);
                     UltraEdit.document[nPageDocIndex].write(sIndexIdentifier);
                     bModifiedFile = true;
                  }
               }
               else
               {
                  UltraEdit.document[nPageDocIndex].key("LEFT ARROW");
                  UltraEdit.document[nPageDocIndex].key("RIGHT ARROW");
                  var nColumn = UltraEdit.document[nPageDocIndex].currentColumnNum;
                  UltraEdit.document[nPageDocIndex].gotoLine(0,1);
                  UltraEdit.document[nPageDocIndex].findReplace.regExp=true;
                  // Does this line contain already the index identifier after page identifier?
                  var sSearchRegExp = sPageIdentifier + '(?:<a id="index_\\d+"></a>)*' + sIndexIdentifier;
                  if(!UltraEdit.document[nPageDocIndex].findReplace.find(sSearchRegExp))
                  {
                     // No, then insert it after page identifier.
                     UltraEdit.document[nPageDocIndex].gotoLine(0,nColumn);
                     UltraEdit.document[nPageDocIndex].write(sIndexIdentifier);
                     bModifiedFile = true;
                  }
               }
            }
         }

         UltraEdit.outputWindow.clear();
         if(bModifiedFile)
         {
            // Overwrite selection in index file with block with updated references.
            UltraEdit.document[nIndexFileDocIndex].write(asLines.join("\r\n"));
            UltraEdit.saveAll();
            UltraEdit.outputWindow.write("At least 1 file was updated.");
         }
         else
         {
            UltraEdit.outputWindow.write("No file was updated.");
         }
      }
   }
}
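
The lookup table built by GetFileNames can be tried outside of UltraEdit with the sketch below. It applies the same regular expressions to a sample string. The sample output text is only an assumption about how the Find in Files result lines look (full file name first, then the matching line), and buildPageTable is a made-up name.

```javascript
// Assumed sample of Find in Files output: one result line per hit,
// starting with the full file name. The real output may look different.
var sOutput =
   'C:\\Temp\\Text\\02_Kapitel01.xhtml(12): <p><a id="page_15"></a>Text</p>\r\n' +
   'C:\\Temp\\Text\\03_Kapitel02.xhtml(7): <p><a id="page_44"></a>Text</p>\r\n';

// buildPageTable is a hypothetical name for this standalone example.
function buildPageTable(sFindOutput)
{
   var asPageFiles = [];
   // One match per result line: file name up to the page number.
   var asHits = sFindOutput.match(/[A-Za-z]:\\.+?\.xhtml.+<a id="page_\d+/g);
   if (asHits === null) return asPageFiles;

   for (var nHit = 0; nHit < asHits.length; nHit++)
   {
      // The page number is used as array index, the file name as value.
      var nPageNumber = parseInt(asHits[nHit].replace(/^.+<a id="page_(\d+)$/, "$1"), 10);
      asPageFiles[nPageNumber] = asHits[nHit].replace(/^(.+?\.xhtml).+$/, "$1");
    }
   return asPageFiles;
}

var asTable = buildPageTable(sOutput);
```

With this sample, asTable[15] and asTable[44] hold the two file names and all other elements are undefined. Note that with this expression only the last page tag on a result line is recorded if a line contains more than one page tag.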
Best regards from Austria
Thanks Mofi.
You are such a Miracle Man.