Create a line number report for words found in lines based on a word list


    Mar 04, 2012#1

    Hi all. Here is an example for my question. I have a text file like this (Edit1):

    Code: Select all

    Line1 a b b c d
    Line2 c d d e
    Line3 a b f f
    Line4 b d f g
    Line5 a c e f
    and a second file like this (Edit2):

    Code: Select all

    a
    b
    f
    g
    Can somebody please help me write a script that produces the following in a new file:

    Code: Select all

    a Line1 Line3 Line5
    b Line1 Line3 Line4
    f Line3 Line4 Line5
    g Line4
    Thank you.


      Mar 04, 2012#2

      I suppose the characters are separated by a horizontal tab character and not by a space character.

      Are there really only single characters on every line?

      The script could be much simpler and faster if the lines really contained only single characters separated by tabs. But I suppose that there are words on every line, and therefore I wrote the script for word comparisons and not for character searches. That makes the script slower, but it works for both characters and words.
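
If the file turns out to be separated by spaces rather than tabs, the line splitting would have to change. Here is a minimal sketch of how that could look, assuming words never contain spaces themselves. The function name `splitLineIntoWords` is hypothetical and not part of the script in this thread.

```javascript
// Hypothetical helper: split one line into words on runs of
// tab and/or space characters instead of on a single tab.
function splitLineIntoWords(sLine) {
   var asParts = sLine.split(/[\t ]+/);
   // Drop empty strings caused by leading or trailing whitespace.
   var asWords = [];
   for (var i = 0; i < asParts.length; i++) {
      if (asParts[i].length) asWords.push(asParts[i]);
   }
   return asWords;
}
```

In the script, such a helper would take the place of the `split("\t")` call on each line.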

      The first open file (leftmost on the open file tabs bar) must be the file to search for the words or characters.

      The second open file must be the file with the words to find.

      The third open file can be the script file, executed with the command Scripting - Run Active Script.

      Code: Select all

      if (UltraEdit.document.length > 1)
      {
         // Define the environment for the script.
         UltraEdit.insertMode();
         UltraEdit.columnModeOff();
         UltraEdit.document[0].hexOff();  // File to search in for words.
         UltraEdit.document[1].hexOff();  // File with the words to find.
      
         // Select all and load the file contents into an array of lines.
         UltraEdit.document[0].selectAll();
         if (UltraEdit.document[0].isSel())
         {
            var asLines = UltraEdit.document[0].selection.split("\r\n");
            UltraEdit.document[0].top();  // Discards the selection.
      
            UltraEdit.document[1].selectAll();
            if (UltraEdit.document[1].isSel())
            {
               var asWords = UltraEdit.document[1].selection.split("\r\n");
               UltraEdit.document[1].top();  // Discards the selection.
               // Remove the last string if it is empty, which happens
               // when the second file ends with a line termination.
               if (asWords[asWords.length-1] == "") asWords.pop();
      
               // Create a new empty array for the results.
               var asResults = new Array(asWords.length);
      
               // Search in every line for the defined words. The words
               // are separated by a single tab character on every line.
               for (var nLineNum = 0; nLineNum < asLines.length; nLineNum++)
               {
                  if (!asLines[nLineNum].length) continue;  // Ignore empty lines.
      
                  // Convert line number to decimal string.
                  var sLineNum = (nLineNum+1).toString();
      
                  // Split the line up into an array of word strings.
                  var asStrings = asLines[nLineNum].split("\t");
      
                  // Run a case sensitive word comparison from first to last
                  // word in the words array on the strings of current line.
                  for (var nWord = 0; nWord < asWords.length; nWord++)
                  {
                     var sWord = asWords[nWord];
                     // Search in strings array for the current word.
                     for (var nString = 0; nString < asStrings.length; nString++)
                     {
                        if (asStrings[nString] == sWord)
                        {
                           // Was this word found already once on any line?
                           if (asResults[nWord] != null)
                           {
                              // Yes, append the line number information.
                              asResults[nWord] += "\t" + sLineNum;
                           }
                           else  // This word is found the first time. Create
                           {     // the results string with word and line number.
                              asResults[nWord] = sWord + "\t" + sLineNum;
                           }
                           // Remove this word from the strings array for reducing
                           // the number of compares on the following words.
                           asStrings.splice(nString,1);
                           break;  // Exit comparison loop.
                        }
                     }  // Continue with next word on current line.
                  }     // Continue with next line.
               }
               // Write the results into a new file.
               UltraEdit.newFile();
               UltraEdit.activeDocument.unixMacToDos();
               var sResult = asResults.join("\r\n") + "\r\n";
               UltraEdit.activeDocument.write(sResult);
               UltraEdit.activeDocument.top();
            }
         }
      }
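
For anyone who wants to try the matching logic outside of UltraEdit, the loops of the script above can be pulled into a plain function. This is only a sketch: the function name `buildLineNumberReport` and the sample arrays are made up for illustration, and the sample data assumes tab-separated lines as in the script.

```javascript
// Same comparison logic as the script above, without any UltraEdit calls.
// asLines: lines of the file to search, asWords: words to find.
function buildLineNumberReport(asLines, asWords) {
   var asResults = new Array(asWords.length);
   for (var nLineNum = 0; nLineNum < asLines.length; nLineNum++) {
      if (!asLines[nLineNum].length) continue;  // Ignore empty lines.
      var sLineNum = (nLineNum + 1).toString();
      var asStrings = asLines[nLineNum].split("\t");
      for (var nWord = 0; nWord < asWords.length; nWord++) {
         for (var nString = 0; nString < asStrings.length; nString++) {
            if (asStrings[nString] == asWords[nWord]) {
               if (asResults[nWord] != null) {
                  asResults[nWord] += "\t" + sLineNum;
               } else {
                  asResults[nWord] = asWords[nWord] + "\t" + sLineNum;
               }
               // A word counts only once per line.
               asStrings.splice(nString, 1);
               break;
            }
         }
      }
   }
   return asResults;
}

// Sample data from the first post, written with tabs as separators.
var asSampleLines = [
   "Line1\ta\tb\tb\tc\td",
   "Line2\tc\td\td\te",
   "Line3\ta\tb\tf\tf",
   "Line4\tb\td\tf\tg",
   "Line5\ta\tc\te\tf"
];
var asSampleWords = ["a", "b", "f", "g"];
var asReport = buildLineNumberReport(asSampleLines, asSampleWords);
// asReport[0] is "a\t1\t3\t5", asReport[3] is "g\t4"
```

The result holds line numbers (1, 3, 5) and not the strings at the start of each line, because the line position in the file is what gets recorded.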


        Mar 05, 2012#3

        Thank you, Mofi. Your script is correct, but there was a small mistake in my example: Line1, Line2, ... are not line numbers but the names of the lines. Can you fix it? Thanks so much.


          Mar 05, 2012#4

          So Line1, Line2, ... are just placeholders for individual line identifying strings.

          Well, just a few small changes are required to interpret the first string on every line as the line identifier.

          Code: Select all

          if (UltraEdit.document.length > 1)
          {
             // Define the environment for the script.
             UltraEdit.insertMode();
             UltraEdit.columnModeOff();
             UltraEdit.document[0].hexOff();  // File to search in for words.
             UltraEdit.document[1].hexOff();  // File with the words to find.
          
             // Select all and load the file contents into an array of lines.
             UltraEdit.document[0].selectAll();
             if (UltraEdit.document[0].isSel())
             {
                var asLines = UltraEdit.document[0].selection.split("\r\n");
                UltraEdit.document[0].top();  // Discards the selection.
          
                UltraEdit.document[1].selectAll();
                if (UltraEdit.document[1].isSel())
                {
                   var asWords = UltraEdit.document[1].selection.split("\r\n");
                   UltraEdit.document[1].top();  // Discards the selection.
                   // Remove the last string if it is empty, which happens
                   // when the second file ends with a line termination.
                   if (asWords[asWords.length-1] == "") asWords.pop();
          
                   // Create a new empty array for the results.
                   var asResults = new Array(asWords.length);
          
                   // Search in every line for the defined words. The words
                   // are separated by a single tab character on every line.
                   for (var nLineNum = 0; nLineNum < asLines.length; nLineNum++)
                   {
                      if (!asLines[nLineNum].length) continue;  // Ignore empty lines.
          
                      // Split the line up into an array of word strings.
                      var asStrings = asLines[nLineNum].split("\t");
          
                      // Run a case sensitive word comparison from first to last
                      // word in the words array on the strings of current line.
                      for (var nWord = 0; nWord < asWords.length; nWord++)
                      {
                         var sWord = asWords[nWord];
                         // Search in strings array for the current word.
                         for (var nString = 1; nString < asStrings.length; nString++)
                         {
                            if (asStrings[nString] == sWord)
                            {
                               // Was this word found already once on any line?
                               if (asResults[nWord] != null)
                               {
                                  // Yes, append the line identifier information.
                                  asResults[nWord] += "\t" + asStrings[0];
                               }
                               else  // This word is found the first time. Create
                               {     // the results string with word and line identifier.
                                  asResults[nWord] = sWord + "\t" + asStrings[0];
                               }
                               // Remove this word from the strings array for reducing
                               // the number of compares on the following words.
                               asStrings.splice(nString,1);
                               break;  // Exit comparison loop.
                            }
                         }  // Continue with next word on current line.
                      }     // Continue with next line.
                   }
                   // Write the results into a new file.
                   UltraEdit.newFile();
                   UltraEdit.activeDocument.unixMacToDos();
                   var sResult = asResults.join("\r\n") + "\r\n";
                   UltraEdit.activeDocument.write(sResult);
                   UltraEdit.activeDocument.top();
                }
             }
          }
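
For large word lists, the nested comparison loops could be replaced by a lookup object, so that every string on a line is checked only once. The following is a sketch of that different approach, not the script above. The names `buildIdentifierReport` and `hasOwn` are hypothetical. Two behaviors are assumptions that differ from the script above: words never found are left out of the result (the script above writes an empty line for them, because `join()` outputs an empty string for unset array elements), and a word listed twice in the word list is reported only once.

```javascript
// Hypothetical helper to test for own properties safely.
function hasOwn(oObject, sKey) {
   return Object.prototype.hasOwnProperty.call(oObject, sKey);
}

// Lookup-based variant: the first string on every line is the line identifier.
function buildIdentifierReport(asLines, asWords) {
   // Map every word to its position in the word list.
   var oWordIndex = {};
   for (var nWord = 0; nWord < asWords.length; nWord++) {
      if (!hasOwn(oWordIndex, asWords[nWord])) oWordIndex[asWords[nWord]] = nWord;
   }
   var asResults = new Array(asWords.length);
   for (var nLine = 0; nLine < asLines.length; nLine++) {
      if (!asLines[nLine].length) continue;  // Ignore empty lines.
      var asStrings = asLines[nLine].split("\t");
      var oSeen = {};  // Words already reported for this line.
      for (var nString = 1; nString < asStrings.length; nString++) {
         var sWord = asStrings[nString];
         if (!hasOwn(oWordIndex, sWord) || hasOwn(oSeen, sWord)) continue;
         oSeen[sWord] = true;
         var nIndex = oWordIndex[sWord];
         if (asResults[nIndex] != null) {
            asResults[nIndex] += "\t" + asStrings[0];
         } else {
            asResults[nIndex] = sWord + "\t" + asStrings[0];
         }
      }
   }
   // Keep only the words found at least once, in word list order.
   var asFound = [];
   for (var nResult = 0; nResult < asResults.length; nResult++) {
      if (asResults[nResult] != null) asFound.push(asResults[nResult]);
   }
   return asFound;
}

var asIdLines = [
   "Line1\ta\tb\tb\tc\td",
   "Line2\tc\td\td\te",
   "Line3\ta\tb\tf\tf",
   "Line4\tb\td\tf\tg",
   "Line5\ta\tc\te\tf"
];
var asIdReport = buildIdentifierReport(asIdLines, ["a", "b", "z", "f", "g"]);
// asIdReport[0] is "a\tLine1\tLine3\tLine5"; "z" is not found and is left out.
```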


            Mar 05, 2012#5

            Yes, thank you. You're my idol :D