How to replace strings from an outside text file

5
Newbie
5

    Apr 18, 2014#1

    The Macro function is extremely helpful except for its character limit (somewhere within 300 lines). As an alternative, a script may work, but unfortunately I know little about scripting. Your generous help would be much appreciated.
    I have prepared a source.txt containing thousands of tab-separated lines. I also have a text open in UC, ranging from 10 lines to a few hundred. My goal is to replace words or strings in the text with the corresponding ones from source.txt. How can I do this? A thousand thanks in advance.

    e.g.
    --------------------------- source.txt ---------------------

    a b
    c d
    e f
    g h
    i j


    --------------------------- text running in UC ---------------------
    …a c e…

    --------------------------- desired text in UC ---------------------
    …b d f…

    6,686585
    Grand Master
    6,686585

      Apr 18, 2014#2

      Scripts can be executed only on files opened in UltraEdit or UEStudio, not on files opened in UltraCompare. UC does not include a script interpreter.

      Here is the code for a script file that runs a search and replace on the first file for all words listed line by line in a tab-delimited CSV file (the second file).

      The script file with the code below must be an ASCII/ANSI file.

      Code:

      // The first file - the leftmost one on the open file tabs bar - must be
      // the file which should be modified.
      
      // The file with the list of words to search for and to replace with
      // must be the second file currently opened.
      
      // The third file could be this script file which can be executed when
      // it is the active file by using command "Run Active Script" from menu
      // "Scripting". Or this script is added to the scripts list and executed
      // from this list or by the assigned hotkey/chord.
      
      if (UltraEdit.document.length > 1)  // Are 2 files opened?
      {
         // Define environment for this script.
         UltraEdit.insertMode();
         if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
         else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
      
         // Determine the line terminator type of the list file.
         var sLineTerm = "\r\n";    // Default is DOS/Windows line termination.
         if(UltraEdit.document[1].lineTerminator == 1)      sLineTerm = "\n"; // Unix
         else if(UltraEdit.document[1].lineTerminator == 2) sLineTerm = "\r"; // Mac
      
         // Get entire contents of the list file into an array of
         // strings whereby each string is a line from the file.
         UltraEdit.document[1].selectAll();
         if (UltraEdit.document[1].isSel())  // The list file should not be empty!
         {
            var asLines = UltraEdit.document[1].selection.split(sLineTerm);
            UltraEdit.document[1].top();     // Just to cancel the selection.
      
            // Remove the last string from the array which is most likely an empty
            // string caused by the line termination on last line in the list file.
            if (!asLines[asLines.length-1].length) asLines.pop();
      
            // Move caret to top of first file, the file to modify.
            UltraEdit.document[0].top();
      
            // Define all parameters for a case-sensitive, standard (non regular
            // expression) Replace All matching and replacing only whole words.
            // This prevents the letters of a short word from being replaced
            // where they are just part of a longer word.
      
            // For example, a list file containing
      
            //    today tomorrow
            //    to from
      
            // and "today" and "to" existing also in the file to modify would
            // result in "frommorrow" and "from" without matching whole words
            // only. But "fromday" and "from" would also be possible if the
            // list file contains the 2 lines in the following order.
      
            //    to from
            //    today tomorrow
      
            // Yes, replacing words is not as easy as it looks at first sight.
      
            UltraEdit.ueReOn();
            UltraEdit.document[0].findReplace.mode=0;
            UltraEdit.document[0].findReplace.matchCase=true;
            UltraEdit.document[0].findReplace.matchWord=true;
            UltraEdit.document[0].findReplace.regExp=false;
            UltraEdit.document[0].findReplace.searchDown=true;
            UltraEdit.document[0].findReplace.searchInColumn=false;
            UltraEdit.document[0].findReplace.preserveCase=false;
            UltraEdit.document[0].findReplace.replaceAll=true;
            UltraEdit.document[0].findReplace.replaceInAllOpen=false;
      
            for (var nLine = 0; nLine < asLines.length; nLine++)
            {
               // Get position of horizontal tab character in line.
               var nTabPosition = asLines[nLine].indexOf('\t');
      
               // Ignore the line if it does not contain a tab character.
               if (nTabPosition < 0) continue;
      
               // The string left of tab character is the search string.
               var sSearch = asLines[nLine].substring(0,nTabPosition);
      
               // The string right of tab character is the replace string.
               var sReplace = asLines[nLine].substring(++nTabPosition);
      
               // Run the Replace All with these 2 strings from list file.
               UltraEdit.document[0].findReplace.replace(sSearch,sReplace);
            }
         }
      }
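
The core of the script above (split each list line at its first tab, then run a whole-word Replace All per pair) can be tried outside UltraEdit. The following is only a minimal sketch in plain JavaScript, not the forum script itself. The function names parseListLine, escapeRegExp, replaceWholeWord and applyList are made up for illustration, and a JavaScript RegExp with \b merely stands in for UltraEdit's "match whole word" option, which may treat word boundaries differently, for example for non-ASCII text.

```javascript
// Plain JavaScript sketch; no UltraEdit objects are used here.

// Split one list line at its first tab into a search/replace pair.
// Returns null for lines without a tab, which the script also skips.
function parseListLine(sLine) {
   var nTabPosition = sLine.indexOf("\t");
   if (nTabPosition < 0) return null;
   return {
      search: sLine.substring(0, nTabPosition),
      replace: sLine.substring(nTabPosition + 1)
   };
}

// Escape characters with a special meaning in a JavaScript RegExp.
function escapeRegExp(sText) {
   return sText.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Emulate a case-sensitive, whole-word Replace All for one pair.
// A function is used as replacement so that "$" is taken literally.
function replaceWholeWord(sText, sSearch, sReplace) {
   var rSearch = new RegExp("\\b" + escapeRegExp(sSearch) + "\\b", "g");
   return sText.replace(rSearch, function () { return sReplace; });
}

// Apply all pairs of the list, line by line, in list order.
function applyList(sText, sList) {
   var asLines = sList.split(/\r\n|\n|\r/);
   for (var nLine = 0; nLine < asLines.length; nLine++) {
      var oPair = parseListLine(asLines[nLine]);
      if (oPair) sText = replaceWholeWord(sText, oPair.search, oPair.replace);
   }
   return sText;
}

// The example from the first post: "a c e" becomes "b d f".
var sResult = applyList("a c e", "a\tb\r\nc\td\r\ne\tf\r\n");
```

Because only whole words are matched, a list with "to" before "today" leaves "today" untouched by the "to" pair in this sketch, which is the effect the comments in the script describe.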
      
      Best regards from an UC/UE/UES for Windows user from Austria

      5
      Newbie
      5

        Apr 21, 2014#3

        Thank you, Mofi. It works unexpectedly well. Its remarkable performance prompted me to extend its power, and of course I'm looking to you for help (because I failed after a few attempts).

        I need to process texts containing both regular and irregular strings, for example phrases such as "attach great/too much/a lot of importance to", which should be converted to "attach importance to" or anything else I want. With my laborious, inefficient macros, I know I can use the expression "\b(attach|attaches|attached|attaching) .{1,30}? importance to\b" (which matches "…attach great/too much/a lot of importance to…") to achieve the result.

        Your script works perfectly and fast on regular strings. I'm confident that a few modifications would make it work on irregular ones as well. I hope it won't put you to too much trouble.

        6,686585
        Grand Master
        6,686585

          Apr 21, 2014#4

          By changing the code block

          Code:

                UltraEdit.ueReOn();
                UltraEdit.document[0].findReplace.mode=0;
                UltraEdit.document[0].findReplace.matchCase=true;
                UltraEdit.document[0].findReplace.matchWord=true;
                UltraEdit.document[0].findReplace.regExp=false;
          to

          Code:

                UltraEdit.perlReOn();
                UltraEdit.document[0].findReplace.mode=0;
                UltraEdit.document[0].findReplace.matchCase=true;
                UltraEdit.document[0].findReplace.matchWord=false;
                UltraEdit.document[0].findReplace.regExp=true;
          the replaces become case-sensitive regular expression replaces using the Perl regular expression engine, instead of case-sensitive, non regular expression replaces that match only whole words with the UltraEdit regular expression engine selected.

          Now the list file can contain Perl regular expression search and replace strings separated by a single tab on each line. With this modification the search string can be

          \b(?:attach|attaches|attached|attaching) .{1,30}? importance to\b

          or

          \battach(?:es|ed|ing)? .{1,30}? importance to\b

          But be careful now with simple words in the same list file. They now need \b at the beginning and end of the word, or \< at the beginning of a single word and \> at the end of the word.
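
The effect of a missing \b can be seen in a small sketch. It uses JavaScript's own RegExp as a stand-in for the Perl regular expression engine of UltraEdit, which is an assumption made only for illustration; \b behaves the same way for this simple ASCII example, while \< and \> are not available in JavaScript and are therefore not shown. The sample sentence is invented.

```javascript
// Invented sample text containing "to" as a word and as part of words.
var sText = "Go to town today.";

// Without word boundaries every "to" is replaced, also inside words.
var sWithout = sText.replace(/to/g, "from");

// With \b on both sides only the standalone word "to" is replaced.
var sWith = sText.replace(/\bto\b/g, "from");

// sWithout is "Go from fromwn fromday."
// sWith    is "Go from town today."
```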

          Alternatively, the script can be enhanced to change the options automatically depending on the search string.

          For example, the script can activate the Perl regular expression options if the search string contains a non word character, as the modified script below does.

          Code:

          // The first file - the leftmost one on the open file tabs bar - must be
          // the file which should be modified.
          
          // The file with the list of words to search for and to replace with
          // must be the second file currently opened.
          
          // The third file could be this script file which can be executed when
          // it is the active file by using command "Run Active Script" from menu
          // "Scripting". Or this script is added to the scripts list and executed
          // from this list or by the assigned hotkey/chord.
          
          if (UltraEdit.document.length > 1)  // Are 2 files opened?
          {
             // Define environment for this script.
             UltraEdit.insertMode();
             if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
             else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
          
             // Determine the line terminator type of the list file.
             var sLineTerm = "\r\n";    // Default is DOS/Windows line termination.
             if(UltraEdit.document[1].lineTerminator == 1)      sLineTerm = "\n"; // Unix
             else if(UltraEdit.document[1].lineTerminator == 2) sLineTerm = "\r"; // Mac
          
             // Get entire contents of the list file into an array of
             // strings whereby each string is a line from the file.
             UltraEdit.document[1].selectAll();
             if (UltraEdit.document[1].isSel())  // The list file should not be empty!
             {
                var asLines = UltraEdit.document[1].selection.split(sLineTerm);
                UltraEdit.document[1].top();     // Just to cancel the selection.
          
                // Remove the last string from the array which is most likely an empty
                // string caused by the line termination on last line in the list file.
                if (!asLines[asLines.length-1].length) asLines.pop();
          
                // Move caret to top of first file, the file to modify.
                UltraEdit.document[0].top();
          
                // Define all parameters for a case-sensitive, standard (non regular
                // expression) Replace All matching and replacing only whole words
                // by default. This prevents the letters of a short word from being
                // replaced where they are just part of a longer word.
          
                // For example, a list file containing
          
                //    today tomorrow
                //    to from
          
                // and "today" and "to" existing also in the file to modify would
                // result in "frommorrow" and "from" without matching whole words
                // only. But "fromday" and "from" would also be possible if the
                // list file contains the 2 lines in the following order.
          
                //    to from
                //    today tomorrow
          
                // Yes, replacing words is not as easy as it looks at first sight.
                // This type of replace is used for simple words in the list. A Perl
                // regular expression replace is used for all other search strings,
                // i.e. those with any non word character (a character which is not
                // a letter, digit or underscore).
          
                UltraEdit.perlReOn();
                UltraEdit.document[0].findReplace.mode=0;
                UltraEdit.document[0].findReplace.matchCase=true;
                UltraEdit.document[0].findReplace.searchDown=true;
                UltraEdit.document[0].findReplace.searchInColumn=false;
                UltraEdit.document[0].findReplace.preserveCase=false;
                UltraEdit.document[0].findReplace.replaceAll=true;
                UltraEdit.document[0].findReplace.replaceInAllOpen=false;
          
                for (var nLine = 0; nLine < asLines.length; nLine++)
                {
                   // Get position of horizontal tab character in line.
                   var nTabPosition = asLines[nLine].indexOf('\t');
          
                   // Ignore the line if it does not contain a tab character.
                   if (nTabPosition < 0) continue;
          
                   // The string left of tab character is the search string.
                   var sSearch = asLines[nLine].substring(0,nTabPosition);
          
                   // The string right of tab character is the replace string.
                   // Modify the replace string by inserting character ¿ after
                   // every character of the replace string. This avoids finding
                   // an already inserted string once again on subsequent replaces.
                   var sReplace = asLines[nLine].substring(++nTabPosition).replace(/(.)/g,"$1¿");
          
                   // Does the search string contain a character which is
                   // neither a letter, nor a digit, nor an underscore?
                   if (sSearch.search(/\W/) < 0)
                   {
                      // No, run a normal replace matching only whole words.
                      UltraEdit.document[0].findReplace.matchWord=true;
                      UltraEdit.document[0].findReplace.regExp=false;
                   }
                   else
                   {
                      // Yes, run a Perl regular expression replace.
                      UltraEdit.document[0].findReplace.matchWord=false;
                      UltraEdit.document[0].findReplace.regExp=true;
                   }
          
                   // Run the Replace All with these 2 strings from list file.
                   UltraEdit.document[0].findReplace.replace(sSearch,sReplace);
                }
                // Remove all ¿ characters in entire file.
                UltraEdit.document[0].findReplace.matchWord=false;
                UltraEdit.document[0].findReplace.regExp=false;
                UltraEdit.document[0].findReplace.replace("¿","");
             }
          }
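
The decision between the two replace types in the script above depends only on the test sSearch.search(/\W/). The sketch below isolates that test in a helper with the made-up name usesPerlRegExp, so it can be checked which list entries would be run as Perl regular expressions. One consequence worth knowing, shown here for a standard JavaScript engine: \W also matches a space and non-ASCII letters, so a plain multi-word phrase or a word like "café" would be handled as a regular expression. Whether the script engine built into UltraEdit behaves identically for non-ASCII letters is an assumption.

```javascript
// Hypothetical helper wrapping the test used in the script:
// true means the search string contains a non word character
// and would be run as a Perl regular expression replace.
function usesPerlRegExp(sSearch) {
   return sSearch.search(/\W/) >= 0;
}

var bSimpleWord = usesPerlRegExp("attach");        // false
var bExpression = usesPerlRegExp("\\battach(?:es|ed|ing)? .{1,30}? importance to\\b"); // true
var bPhrase     = usesPerlRegExp("too much");      // true, because of the space
var bAccented   = usesPerlRegExp("caf\u00e9");     // true, é is not matched by \w
```

For phrases that should be matched literally, characters with a special meaning in regular expressions would have to be escaped in the list file.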
          
          Update on 2014-04-23: The script code was modified slightly to meet the new requirement described below.
          Best regards from an UC/UE/UES for Windows user from Austria

          5
          Newbie
          5

            Apr 22, 2014#5

            Yes. Once again, your work exceeds my expectations. Thanks a lot.
            Except for a minor glitch, it would be perfect and would outperform the specialized software (AntConc for corpus analysis) I'm currently using in many respects.
            The tiny problem is this:
            If source.txt (the one on the right) contains lines such as:


            abbreviatetab(abbreviate|abbreviates|abbreviated|abbreviating)
            abbreviatestab(abbreviate|abbreviates|abbreviated|abbreviating)
            abbreviatedtab(abbreviate|abbreviates|abbreviated|abbreviating)

            (all lines are tab-delimited.)

            And if "abbreviate", "abbreviates" and "abbreviated" are to be replaced, only the result for the last word "abbreviated" is as expected. The results for the first two are mingled with replacements from the lines that follow. I guess it's because the loop (not sure of the term) runs two or more times, so it replaces parts of earlier results with matching replacements from later lines. You can try it and see what I mean.

            6,686585
            Grand Master
            6,686585

              Apr 23, 2014#6

              Well, I wrote in the comments of the script that such a double replacement can occur. Up to now I thought that the script should
              • replace a single word by another single word, or
              • replace 1 or more words using a Perl regular expression by a single word.
              Your example, replacing a single word by a list of words and then searching for another word which is in the list of words just inserted by a previous replace, is a new requirement.

              "Do not replace something in a string already inserted by the script" is usually not a trivial requirement. Different solutions are possible for this requirement.
              • The list file can be created in a manner which avoids such a replace in an already inserted replacement string.
              • Another solution for this problem can be seen in script posted at Script to act as word censor - replace words in file A by words in file B in active file.
              • The third solution is the one I used now in the updated script above. By temporarily manipulating the replace string, the script makes it very unlikely that a string inserted by one replace operation is found again.
              So take the updated script code from my updated post above and run it. If you want to see why a replace string is very unlikely to be found again on subsequent replaces, press Ctrl+Z to undo the last Replace All and look at the contents of the modified file. I assume that the file to modify does not contain the character ¿ (decimal code value 191 in Latin I code page) anywhere.
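
The marker technique can also be reproduced outside UltraEdit. The following is a minimal sketch in plain JavaScript under stated assumptions: the function name applyPairs and its bMark switch are made up, JavaScript's RegExp with \b stands in for the whole-word replace of UltraEdit, and the search strings are simple words that need no escaping. With bMark set, every character of the replace string is followed by ¿ (written as \u00BF), as in the updated script, and all markers are removed at the end.

```javascript
var sMarker = "\u00BF";   // The character ¿ used as temporary marker.

// Apply all pairs one after the other. With bMark set to true, each
// inserted string is interleaved with the marker so that later
// searches cannot match inside text inserted by an earlier replace.
function applyPairs(sText, aoPairs, bMark) {
   for (var nPair = 0; nPair < aoPairs.length; nPair++) {
      var sReplace = aoPairs[nPair].replace;
      if (bMark) sReplace = sReplace.replace(/(.)/g, "$1" + sMarker);
      var rSearch = new RegExp("\\b" + aoPairs[nPair].search + "\\b", "g");
      sText = sText.replace(rSearch, function () { return sReplace; });
   }
   // Remove all markers from the entire text.
   if (bMark) sText = sText.replace(new RegExp(sMarker, "g"), "");
   return sText;
}

var sGroup = "(abbreviate|abbreviates|abbreviated|abbreviating)";
var aoPairs = [
   { search: "abbreviate",  replace: sGroup },
   { search: "abbreviates", replace: sGroup },
   { search: "abbreviated", replace: sGroup }
];
var sInput = "abbreviate abbreviates abbreviated";

// Without markers, later pairs match inside already inserted groups.
var sMingled = applyPairs(sInput, aoPairs, false);

// With markers, each word is replaced exactly once by the group.
var sClean = applyPairs(sInput, aoPairs, true);
```

In this sketch sClean consists of the group three times, separated by single spaces, while sMingled contains groups nested inside groups. The approach relies on the marker character not occurring in the text or in the list.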
              Best regards from an UC/UE/UES for Windows user from Austria