How to find and replace two consecutive links separated by a comma?

    Jul 01, 2016#1

    Is it possible to find and replace two consecutive links separated by a comma?

    Random sample:

    I am jones, <xref type="bibr" rid="ref2">[2]</xref>, <xref type="bibr" rid="ref3">[3]</xref>, <xref type="bibr" rid="ref4">[4]</xref>, <xref type="bibr" rid="ref12">[12]</xref> and my age is <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref>. How are you <xref type="bibr" rid="ref15">[5]</xref>, <xref type="bibr" rid="ref61">[61]</xref>

    After the macro has run, it should be:

    I am jones, <xref type="bibr" rid="ref2">[2]</xref>&#x2013;<xref type="bibr" rid="ref3"/><xref type="bibr" rid="ref4">[4]</xref>, <xref type="bibr" rid="ref12">[12]</xref> and my age is <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref>. How are you <xref type="bibr" rid="ref15">[5]</xref>, <xref type="bibr" rid="ref61">[61]</xref>

    The replacement should only happen where 3 or more links with consecutive reference numbers occur. That is why <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref> is unchanged, as is <xref type="bibr" rid="ref15">[5]</xref>, <xref type="bibr" rid="ref61">[61]</xref>

    Pardon my English writing skills :mrgreen:

      Jul 02, 2016#2

      Is it possible to code this with a script instead of a macro?

      UE/UES macros do not support variables, which would make this task very tricky to code as a macro. It is not really easy to code in JavaScript as a UE/UES script either, but it would definitely be much easier thanks to the support for variables and the built-in string manipulation functions.
      Best regards from an UC/UE/UES for Windows user from Austria

        Jul 02, 2016#3

        Unfortunately, I don't have any knowledge of scripting of any sort :(
        Is it possible to make a search pattern which only finds consecutive strings like <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref> and does not find non-consecutive strings like <xref type="bibr" rid="ref15">[5]</xref>, <xref type="bibr" rid="ref61">[61]</xref>?

          Jul 02, 2016#4

          Checking that the reference number in the second link is +1 of the reference number in the first link, that the reference number in the third link is +1 of the reference number in the second link, and so on requires additional code. No regular expression search can convert part of a found string to an integer using the decimal system, increment this number, convert the number back to a string, check whether this incremented number is present in the next link, and do this in a loop for each reference link found in a sequence separated by a comma and 1 or more whitespaces.
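
The check described above can be sketched in a few lines of plain JavaScript. This is only an illustration, assuming the reference number of interest is the one in square brackets; `extractRefNumbers` and `isContinuous` are hypothetical helper names and not part of UltraEdit's scripting API.

```javascript
// Hypothetical helper: collect the numbers shown in square brackets,
// converted to integers using the decimal system.
function extractRefNumbers(text) {
  var numbers = [];
  var pattern = /\[(\d+)\]/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    numbers.push(parseInt(match[1], 10));
  }
  return numbers;
}

// Hypothetical helper: true if there is at least one number and every
// number is exactly +1 of the number before it.
function isContinuous(numbers) {
  for (var i = 1; i < numbers.length; i++) {
    if (numbers[i] !== numbers[i - 1] + 1) {
      return false;
    }
  }
  return numbers.length > 0;
}

var continuous = '<xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref>';
var notContinuous = '<xref type="bibr" rid="ref15">[5]</xref>, <xref type="bibr" rid="ref61">[61]</xref>';

console.log(isContinuous(extractRefNumbers(continuous)));    // true
console.log(isContinuous(extractRefNumbers(notContinuous))); // false
```

The increment-and-compare step is ordinary integer arithmetic, which is exactly the part a search pattern alone cannot express.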

          The Perl regular expression search string

          (?:<xref type="bibr" rid="ref\d+">\[\d+\]</xref>\s*,\s*){2,}<xref type="bibr" rid="ref\d+">\[\d+\]</xref>

          finds 3 or more links separated by commas. But a regular expression search cannot check whether each reference number, which occurs twice in each link and is matched by \d+, is +1 of the previous reference number. An indefinite number of replacements within the found string, as would be required here, is also not possible with a single regular expression search.
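
To illustrate what this search string does and does not do, here is a small sketch in plain JavaScript, whose regular expression syntax accepts the same pattern (with `/` escaped inside the literal). The `xref` helper and the variable names are made up for this example, and the sample text follows the one from the first post.

```javascript
// Hypothetical helper to build one reference link for the sample text.
function xref(rid, label) {
  return '<xref type="bibr" rid="ref' + rid + '">[' + label + ']</xref>';
}

// The search string from above written as a JavaScript regular expression.
var pattern = /(?:<xref type="bibr" rid="ref\d+">\[\d+\]<\/xref>\s*,\s*){2,}<xref type="bibr" rid="ref\d+">\[\d+\]<\/xref>/g;

var sample =
  'I am jones, ' + [xref(2, 2), xref(3, 3), xref(4, 4), xref(12, 12)].join(', ') +
  ' and my age is ' + [xref(5, 5), xref(6, 6)].join(', ') +
  '. How are you ' + [xref(15, 5), xref(61, 61)].join(', ');

var matches = sample.match(pattern) || [];

// Only the run of 4 links is found; the two pairs are too short to match.
console.log(matches.length);               // 1
// The found string still contains the link with [12], because the
// expression cannot test whether the numbers are continuous.
console.log(matches[0].split(',').length); // 4
```

So the expression is useful for locating candidate runs, while the continuity test has to happen in code afterwards.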

          Such requirements go beyond finding and replacing text. The Perl script interpreter, not to be confused with regular expressions in Perl syntax, has special evaluation functions to process a found string with an integer reference and to perform mathematical operations on each found string for validation or for determining the replace string. But even in a Perl script this would not be an easy task to code using those evaluation functions.

            Jul 10, 2016#5

            Here is an UltraEdit script for this replacement, which is definitely not easy to code. It replaces comma-separated continuous references with a from/to reference range in the active file.

            function updateReferences(nFirst, nLast)
            {
               // On first reference of a continuous reference series remove
               // all trailing whitespaces and append an EN DASH character.
               g_asReferences[nFirst] = g_asReferences[nFirst].replace(/\s+$/,"");
               g_asReferences[nFirst] += "&#x2013;";
            
               // On all references of the continuous reference series between first
               // and last reference remove preceding and trailing whitespaces, the
               // reference number in square brackets and change the element with a
               // start and end tag to an empty element.
               for (var nModIndex = nFirst+1; nModIndex < nLast; nModIndex++)
               {
                  g_asReferences[nModIndex] = g_asReferences[nModIndex].replace(/^[^<]*(<.+)>\[\d+\]<\/xref>\s*/,"$1/>");
               }
            
               // On last reference of a continuous reference series
               // remove the preceding whitespaces.
               g_asReferences[nLast] = g_asReferences[nLast].replace(/^[^<]*(<.+)$/,"$1");
            }
            
            if (UltraEdit.document.length > 0)  // Is any file opened?
            {
               // Define environment for this script.
               UltraEdit.insertMode();
               UltraEdit.columnModeOff();
            
               // Move caret to top of the active file.
               UltraEdit.activeDocument.top();
            
               // A case-sensitive Perl regular expression is used to find 3 or more
               // comma separated references executed in a loop up to end of file.
               UltraEdit.perlReOn();
               UltraEdit.activeDocument.findReplace.mode=0;
               UltraEdit.activeDocument.findReplace.matchCase=true;
               UltraEdit.activeDocument.findReplace.matchWord=false;
               UltraEdit.activeDocument.findReplace.regExp=true;
               UltraEdit.activeDocument.findReplace.searchDown=true;
               if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
               {
                  UltraEdit.activeDocument.findReplace.searchInColumn=false;
               }
            
               while(UltraEdit.activeDocument.findReplace.find('(?:<xref type="bibr" rid="ref\\d+">\\[\\d+\\]</xref>\\s*,\\s*){2,}<xref type="bibr" rid="ref\\d+">\\[\\d+\\]</xref>'))
               {
                  // Copy the references into a globally referenced list of strings.
                  var g_asReferences = UltraEdit.activeDocument.selection.split(',');
            
                  var nNumber = -1;
                  var nRefIndex = 1;
                  var nFirstIndex = 0;
                  var bModified = false;
            
                  // Get first reference number from string inside the square
                  // brackets converted to an integer number using decimal system.
                  var nRefNumber = parseInt(g_asReferences[0].replace(/^.*\[(\d+)\].*$/,"$1"),10);
            
                  // Now process in a loop all references from list except the first one.
                  while (nRefIndex < g_asReferences.length)
                  {
                     // Get the reference number of current reference as integer.
                     nNumber = parseInt(g_asReferences[nRefIndex].replace(/^.*\[(\d+)\].*$/,"$1"),10);
            
                     // Is this reference number NOT +1 of previous reference number?
                     if (nNumber != (nRefNumber+1))
                     {
                        // Are there at least 3 references with each reference number
                        // being +1 of the reference number of previous reference.
                        if ((nRefIndex - nFirstIndex) > 2)
                        {
                           bModified = true;
                           // The previous reference is the last one of the reference
                           // series and so 1 must be subtracted from current index.
                           // And there is one more reference and so a comma must be
                           // appended on last reference of the series.
                           updateReferences(nFirstIndex,nRefIndex-1);
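                        }
                        else if ((nRefIndex - nFirstIndex) == 2)
                        {
                           // Just two continuous references were found before
                           // the current one. Restore the comma between them
                           // which was removed by split(',') on the selection.
                           g_asReferences[nFirstIndex] += ",";
                        }
                        // Append a comma to previous reference string.
                        g_asReferences[nRefIndex-1] += ",";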
                        // The current reference is the first one of a perhaps new series.
                        nFirstIndex = nRefIndex;
                        nRefNumber = nNumber;
                     }
                     else
                     {
                        nRefNumber++;  // Reference series continues.
                     }
                     nRefIndex++;      // Process next reference.
                  }
            
                  // Does the last reference belong to a continuous reference series?
                  if ((nRefIndex - nFirstIndex) > 2)
                  {
                     bModified = true;
                     // The previous reference is the last one of the reference
                     // series and so 1 must be subtracted from current index.
                     updateReferences(nFirstIndex,nRefIndex-1);
                  }
                  else if (nFirstIndex != (nRefIndex-1))
                  {
                     // Append a comma to last but one reference string if just
                     // the last two references have continuous reference numbers.
                     g_asReferences[nRefIndex-2] += ",";
                  }
            
                  // Was anything in the list of references modified at all?
                  if (bModified)
                  {
                     UltraEdit.activeDocument.write(g_asReferences.join(""));
                  }
               }
               UltraEdit.activeDocument.top();
            }
            
            The input file used for testing this script was:

            I am jones, <xref type="bibr" rid="ref2">[2]</xref>, <xref type="bibr" rid="ref3">[3]</xref>, <xref type="bibr" rid="ref4">[4]</xref>, <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref12">[12]</xref>, <xref type="bibr" rid="ref14">[14]</xref> and my age is <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref>. How are you <xref type="bibr" rid="ref8">[8]</xref>, <xref type="bibr" rid="ref9">[9]</xref>, <xref type="bibr" rid="ref10">[10]</xref>, <xref type="bibr" rid="ref61">[61]</xref>, <xref type="bibr" rid="ref62">[62]</xref>?
            
            I am jones, <xref type="bibr" rid="ref2">[2]</xref>, <xref type="bibr" rid="ref3">[3]</xref>, <xref type="bibr" rid="ref4">[4]</xref>, <xref type="bibr" rid="ref5">[5]</xref> and my age is <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref>, <xref type="bibr" rid="ref7">[7]</xref>. How are you <xref type="bibr" rid="ref15">[5]</xref>, <xref type="bibr" rid="ref63">[63]</xref>?
            
            The output file produced by the script was:

            I am jones, <xref type="bibr" rid="ref2">[2]</xref>&#x2013;<xref type="bibr" rid="ref3"/><xref type="bibr" rid="ref4"/><xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref12">[12]</xref>, <xref type="bibr" rid="ref14">[14]</xref> and my age is <xref type="bibr" rid="ref5">[5]</xref>, <xref type="bibr" rid="ref6">[6]</xref>. How are you <xref type="bibr" rid="ref8">[8]</xref>&#x2013;<xref type="bibr" rid="ref9"/><xref type="bibr" rid="ref10">[10]</xref>, <xref type="bibr" rid="ref61">[61]</xref>, <xref type="bibr" rid="ref62">[62]</xref>?
            
            I am jones, <xref type="bibr" rid="ref2">[2]</xref>&#x2013;<xref type="bibr" rid="ref3"/><xref type="bibr" rid="ref4"/><xref type="bibr" rid="ref5">[5]</xref> and my age is <xref type="bibr" rid="ref5">[5]</xref>&#x2013;<xref type="bibr" rid="ref6"/><xref type="bibr" rid="ref7">[7]</xref>. How are you <xref type="bibr" rid="ref15">[5]</xref>, <xref type="bibr" rid="ref63">[63]</xref>?
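
For trying the logic outside UltraEdit, the same transformation can be sketched in plain JavaScript using `String.prototype.replace` with a callback function. This is a simplified sketch and not the script above: `collapseReferenceRuns`, `collapseRun` and `xref` are hypothetical names, and unlike the script it rewrites the separator between unchanged links inside a found string to a comma plus one space.

```javascript
var RUN_PATTERN = /(?:<xref type="bibr" rid="ref\d+">\[\d+\]<\/xref>\s*,\s*){2,}<xref type="bibr" rid="ref\d+">\[\d+\]<\/xref>/g;
var LINK_PATTERN = /<xref type="bibr" rid="ref\d+">\[(\d+)\]<\/xref>/g;

// Build the from/to range for 3 or more continuous links: the first link,
// an EN DASH entity, empty elements for the links in between, the last link.
function collapseRun(links) {
  var result = links[0].text + "&#x2013;";
  for (var i = 1; i < links.length - 1; i++) {
    result += links[i].text.replace(/>\[\d+\]<\/xref>$/, "/>");
  }
  return result + links[links.length - 1].text;
}

function collapseReferenceRuns(text) {
  return text.replace(RUN_PATTERN, function (found) {
    // Split the found string into links with their bracket numbers.
    var links = [];
    var match;
    LINK_PATTERN.lastIndex = 0;
    while ((match = LINK_PATTERN.exec(found)) !== null) {
      links.push({ text: match[0], number: parseInt(match[1], 10) });
    }

    // Cut the list into series of continuous numbers and collapse
    // only those series with at least 3 links.
    var parts = [];
    var start = 0;
    for (var i = 1; i <= links.length; i++) {
      var continues = i < links.length &&
                      links[i].number === links[i - 1].number + 1;
      if (!continues) {
        var series = links.slice(start, i);
        if (series.length >= 3) {
          parts.push(collapseRun(series));
        } else {
          parts.push(series.map(function (link) { return link.text; }).join(", "));
        }
        start = i;
      }
    }
    return parts.join(", ");
  });
}

// Hypothetical helper to build one reference link for the sample text.
function xref(number) {
  return '<xref type="bibr" rid="ref' + number + '">[' + number + ']</xref>';
}

var input = 'See ' + [xref(2), xref(3), xref(4), xref(5), xref(12)].join(', ') +
            ' and ' + [xref(5), xref(6)].join(', ') + '.';
var output = collapseReferenceRuns(input);
console.log(output);
```

With this input, the links 2 to 5 become a range, while the link 12 and the pair 5, 6 stay as they are. A pair of continuous references in front of a longer series, such as 5, 6, 8, 9, 10, keeps its comma as well.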
            
            Best regards from an UC/UE/UES for Windows user from Austria