Find all lines in active file in all files within a directory

Newbie

    Feb 09, 2012 (#1)

    Hello everyone, I am new here, so I came to ask about something I could not find in this nice software.

    What I basically need is to compare a file against multiple other files in another directory. Let's say we have a file test.txt containing:

    lksdfksdkfjklsdjkfjlsdkj
    mnbmvmbmvnb
    mmosaatyewtyetw
    xbvcvbcbv
    polqw536749


    and I want to check whether any of the, let's say, 10 text files in dir/x contains any of the same lines.

    Is it possible to do that with this software?

    Grand Master

      Feb 14, 2012 (#2)

      Yes, it is possible, but it requires coding a script or macro. If you need this only once and there are not many strings in the base file, I suggest using the Find in Files command from the Search menu. Select a string to search for and then click on the command in the menu. A dialog opens in which the selected string is already set as the search string. Define the other parameters like file type (*.*) and directory, and run the search. The output window then shows in which files the string was found, if it was found at all.

      For many lines to find, here is a script which runs a Find in Files for every line in the active document on script start. The results of all Find in Files runs are written to an edit window. In the script you have to change the directory path C:\\Temp\\ (it must end with a backslash, and every backslash must be escaped with an additional backslash) and probably also the file type *.*.

      The format of the results can be changed either with additional script code running one or more regular expression replaces (best method), or by modifying the options at Advanced - Configuration - Search - Set Find Output Format. I don't know which version of UltraEdit you use (especially which language version) and how the results file should look. Therefore I have not added any code to reformat the results file.
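As an illustration of the regular expression route, the sketch below post-processes the results as an ordinary JavaScript string with String.replace, rather than with UltraEdit's own replace commands on the results document. The patterns assume the result layout shown later in this thread (a Find '...' in '...': header per file, a per-file Found line, and a Search complete summary) and would need adapting to any other output format. The name reformatResults is hypothetical.

```javascript
// Hypothetical post-processing of Find in Files results held in a string.
// The replaces are applied in the order listed.
var aReplaces = [
   // Remove header lines like: Find 'abc' in 'C:\Temp\Test2.txt':
   [/^Find '.*' in '.*':\r?\n/gm, ""],
   // Remove per-file count lines like: Found 'abc' 1 time(s).
   [/^Found '.*' \d+ time\(s\)\.\r?\n/gm, ""],
   // Shorten the summary line of each search.
   [/^Search complete, found '(.*)' (\d+) time\(s\)\. \((\d+) file\(s\)\)\.$/gm,
    "Found '$1' $2 time(s) in $3 file(s)."]
];

function reformatResults(sResults)
{
   for (var nIndex = 0; nIndex < aReplaces.length; nIndex++)
   {
      // String.replace resets lastIndex of a global regex, so reuse is safe.
      sResults = sResults.replace(aReplaces[nIndex][0], aReplaces[nIndex][1]);
   }
   return sResults;
}
```

In JavaScript regular expressions with the m flag, ^ and $ also match next to a carriage return, so the patterns work on results with DOS line terminators as well.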

      Code:

      if (UltraEdit.document.length > 0)
      {
         // Define the environment for the script.
         UltraEdit.insertMode();
         if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
         else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
         UltraEdit.activeDocument.hexOff();
      
         // Select all and load the file contents into an array of lines.
         UltraEdit.activeDocument.selectAll();
         if (UltraEdit.activeDocument.isSel())
         {
            // The following command works only for files with DOS line terminators!
            var asLines = UltraEdit.activeDocument.selection.split("\r\n");
            UltraEdit.activeDocument.top();
      
            // Define parameters for the Find in Files executed below in a loop for every line.
            UltraEdit.frInFiles.filesToSearch=0;               // Search in a directory.
            UltraEdit.frInFiles.directoryStart="C:\\Temp\\";   // This is the directory.
            UltraEdit.frInFiles.searchInFilesTypes="*.*";      // Search in these files.
            UltraEdit.frInFiles.useEncoding=false;             // Run an ANSI search.
            UltraEdit.frInFiles.ignoreHiddenSubs=true;         // Ignore hidden subdirectories.
            UltraEdit.frInFiles.matchCase=true;                // Run a case sensitive search.
            UltraEdit.frInFiles.reverseSearch=false;           // Do not find files not containing searched string.
            UltraEdit.frInFiles.matchWord=false;               // Search for strings and not entire words.
            UltraEdit.frInFiles.openMatchingFiles=false;       // Do not open files with string found.
            UltraEdit.frInFiles.displayLinesDoNotMatch=false;  // Do not find lines not containing search string.
            UltraEdit.frInFiles.useOutputWindow=false;         // Output find result to edit window.
            UltraEdit.frInFiles.searchSubs=false;              // Do not search in subdirectories.
            UltraEdit.frInFiles.regExp=false;                  // Run a non regular expression search.
      
         // Run a Find in Files for all lines in the active document. This find
         // does not make sure that the found string is really an entire line in
         // the searched files. So lines may also be found which contain the
         // searched string plus additional characters on the left and/or right.
            for (var nLineNum = 0; nLineNum < asLines.length; nLineNum++)
            {
               if (!asLines[nLineNum].length) continue;  // Ignore empty lines.
               UltraEdit.frInFiles.find(asLines[nLineNum]);
            }
            // The results file is the active file now. Move caret to top
            // of this file and convert the file from Unicode to ASCII/ANSI.
            UltraEdit.activeDocument.top();
            UltraEdit.activeDocument.unicodeToASCII();
         }
      }
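The comments in the script note two limitations: the split handles only DOS line terminators, and Find in Files also reports lines that merely contain the searched string. Both checks can be sketched in plain JavaScript on strings in memory (this does not use UltraEdit's scripting API, and the function names are hypothetical): split on any common line terminator, then compare entire lines for equality.

```javascript
// Split text into lines, accepting DOS (CR LF), UNIX (LF) and MAC (CR) terminators.
function splitLines(sText)
{
   return sText.split(/\r\n|\r|\n/);
}

// Hypothetical helper: returns the 1-based numbers of the lines in sText which
// are exactly equal to sLine. A line containing sLine plus additional
// characters on the left or right is not counted.
function findEntireLine(sLine, sText)
{
   var asLines = splitLines(sText);
   var anFound = [];
   for (var nIndex = 0; nIndex < asLines.length; nIndex++)
   {
      if (asLines[nIndex] === sLine) anFound.push(nIndex + 1);
   }
   return anFound;
}
```

The alternation lists \r\n first so that a DOS line terminator is treated as one terminator and not as two.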

      Newbie

        Feb 16, 2012 (#3)

        I created a text file named find.js on my desktop and pasted in what you wrote above. Then I opened Scripting - Scripts and added it. Next I opened Search - Find in Files, but when I paste a few lines that exist in a few files in a directory, it doesn't find them.

        Can you please tell me what I am missing? Do I need any settings?

        I made the following modifications in the script because I need to search only in text files:

        Code:

                  UltraEdit.frInFiles.directoryStart="C:\\Temp\\";   // This is the directory.
                  UltraEdit.frInFiles.searchInFilesTypes=".txt";      // Search in these files.
        Please describe the steps so I can test.

        Thank you for spending time on this. I did not find anything online that does this kind of search.

        Grand Master

          Feb 16, 2012 (#4)

          1. Create a new file in UltraEdit by pressing Ctrl+N if a new file is not already displayed after starting UltraEdit.
          2. Make sure the new file is an ASCII file with DOS line terminators. If the third box of the status bar at the bottom of the UltraEdit window shows just DOS, the new file is an ASCII file with DOS line terminators. Otherwise, use the commands in submenu File - Conversions to convert the file to ASCII with DOS line terminators.
          3. Select the script code in your browser window and press Ctrl+C.
          4. Switch back to UltraEdit and press Ctrl+V to paste the code into the new file.
          5. Go to the lines with the folder and file type specifications.
          6. First, change the file type specification to *.txt. The asterisk is important!
          7. Second, change the path to the folder containing the *.txt files if you have not moved the files into folder C:\Temp\. If you modify the path, you must enter 2 backslashes for every backslash in the path, and the path must end with 2 backslashes.
          8. Press F12 to open the Save As dialog and save the script file to any folder you want. A good place is usually the Scripts folder in the UltraEdit program files directory, if your account has write access to this folder. But you can also use any other folder.
          9. Open Scripting - Script List and add the just saved script to this list.
          10. Open the file containing the lines you want to find in the other files, or, for a first test of the script, create a new file and enter some lines which exist in the *.txt files in the specified folder.
          11. Open the Scripting menu and click on the name of the script file.
          12. The script is now executed, and you should see a new file with the results of the Find in Files commands which the script executes for every line of the active file.
          13. If the output window is not open, open it with Window - Output Window and check that you see Script succeeded there and not an error message.
          That's it. As I already wrote, you can change the format of the results either with regular expression replaces from within the script, or by defining the format of the results at Advanced - Configuration - Search - Set Find Output Format before running the script on the file with the lines to find. I can help you with the regular expression replaces, but I would need an example of how the results file looks after script execution and how it should look, so that I can code the replaces for the output format you want.
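Step 7 refers to the escaping rule for string literals in the script source. To illustrate it, here is a minimal plain-JavaScript sketch that converts a directory path as it is written in Windows into the literal to type into the script: it appends the trailing backslash if missing and doubles every backslash. The helper name toScriptLiteral is hypothetical and not part of UltraEdit's scripting API.

```javascript
// Hypothetical helper: converts a directory path into the string literal that
// has to be written into the script source, e.g. C:\Temp -> "C:\\Temp\\".
function toScriptLiteral(sPath)
{
   // The directory path must end with a backslash.
   if (sPath.charAt(sPath.length - 1) != "\\") sPath += "\\";
   // Inside a JavaScript string literal every backslash must be escaped
   // with an additional backslash.
   return '"' + sPath.replace(/\\/g, "\\\\") + '"';
}

// The source text "C:\\Temp" below is the 7-character path C:\Temp.
console.log(toScriptLiteral("C:\\Temp"));   // prints "C:\\Temp\\"
```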

          Newbie

            Feb 19, 2012 (#5)

            It doesn't do the job. Can you have a look remotely?

            If you can, please send me your Skype, MSN, Yahoo or other chat contact by PM ... thanks

            Grand Master

              Feb 19, 2012 (#6)

              I can't help you remotely, nor do I use Skype or any chat tool.

              I have packed a slightly modified version of the script into a ZIP file and uploaded it as an attachment to this post.

              With this modification you can open the script file in UltraEdit together with the text file containing the lines to search for in the other files.

              No files other than this script file and the list file should be open in UltraEdit.

              Edit the directory path on line 23 if necessary and save the script file.

              Use the command Run Active Script from the Scripting menu. You should briefly see a switch to the list file with the lines to search for, and then a results document window should appear listing the results.
              Attachment: FindLinesInTextFiles.zip (1.24 KiB)
              Script file to use for searching lines in *.txt files in an entered directory.

              Newbie

                Feb 19, 2012 (#7)

                Now it is working fine.

                But is it possible to make it search within a directory I select? Do I need to paste the files into C:\Temp every time?

                Is it possible to code the script so that I can select the folder to search in?

                For example, I open the file I want to check, and all other files which I want to check it against are in dir/whatever ... so I just set that directory?

                Thanks for the help.

                Grand Master

                  Feb 19, 2012 (#8)

                  It is possible for the script to ask you for the full path of the directory to search in. But you have to type the full directory path manually, or paste the full path copied from the address bar of Windows Explorer into the edit field. There is no scripting command which opens a "browse for directory" dialog and returns the selected directory path as a string to the script. UltraEdit scripts are primarily designed to automate regularly needed (file modification) actions without user interaction. There are lots of programming and script languages designed for coding applications with user interaction. I replaced the ZIP file above with a new version which asks you for the directory path on execution.
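A directory path entered at runtime needs no doubled backslashes (those are only required inside string literals in the script source), but it still has to end with a backslash. A minimal sketch of cleaning up such input, assuming the prompt returns the typed or pasted text as a plain string; normalizeDirectory is a hypothetical helper and the quote stripping is only a defensive assumption in case the path was pasted with surrounding quotes.

```javascript
// Hypothetical helper: cleans up a directory path typed or pasted by the user.
function normalizeDirectory(sInput)
{
   // Remove surrounding white-space and, if present, surrounding double quotes.
   var sPath = sInput.replace(/^\s+|\s+$/g, "").replace(/^"(.*)"$/, "$1");
   // The directory path must end with a backslash (single, not doubled, at runtime).
   if (sPath.length && sPath.charAt(sPath.length - 1) != "\\") sPath += "\\";
   return sPath;
}
```

An empty input is returned unchanged, so the calling script can still detect that no path was entered.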

                  It would also be possible for the script to use the path of the list file as the directory path. But then the list file with the lines to search for should not be in the same directory as the *.txt files, or it should have a different file extension. Otherwise all lines of a list file with file extension TXT are surely found in a *.txt file in the directory of the list file, namely in the list file itself, making the results less useful.
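If the script used the list file's own directory, it could at least detect the problem described above. A minimal sketch, assuming the full path of the list file is available as a string and the searched file type is given as an extension without the *. prefix; both helper names are hypothetical.

```javascript
// Hypothetical helper: returns the directory of a full file path including the
// trailing backslash, e.g. C:\Temp\Test.lst -> C:\Temp\
function directoryOf(sFullPath)
{
   return sFullPath.substring(0, sFullPath.lastIndexOf("\\") + 1);
}

// Hypothetical helper: true if the list file has the searched extension and
// would therefore be searched itself, so all its lines would be "found".
function listFileIsSearched(sListPath, sExtension)
{
   var sName = sListPath.substring(sListPath.lastIndexOf("\\") + 1);
   var nDot = sName.lastIndexOf(".");
   var sExt = (nDot < 0) ? "" : sName.substring(nDot + 1);
   // Windows file names are not case sensitive.
   return sExt.toLowerCase() === sExtension.toLowerCase();
}
```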

                  Newbie

                    Mar 13, 2012 (#9)

                    Hello again, it works now. But does it also check the lines within the same file against each other?

                    Example:

                    1234567890
                    hop6709984
                    1234567890


                    Will it now detect that line 1 is the same as line 3?

                    Thank you again for this great plugin.

                    Grand Master

                      Mar 13, 2012 (#10)

                      The script as it is does not eliminate duplicate entries in the source file before searching for the lines in all files of a directory. Nor does it remove lines found several times in one of the files.

                      If you want to remove duplicate lines from the source file before searching for them in the files, it is best to run a sort of entire lines with duplicate removal on the source file from within the script. I don't know what the output currently looks like and how it should look, and therefore can't suggest a method to remove duplicates in the output file.
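What sorting with duplicate removal does to the list of lines can be sketched in plain JavaScript as follows. This is not UltraEdit's sort command; note that Array.prototype.sort without a compare function compares UTF-16 code units, so the resulting order can differ from UltraEdit's sort (for example regarding case or locale). The name sortUnique is hypothetical.

```javascript
// Sorts a copy of the lines and keeps only the first line of each run of
// identical lines. The original order of the lines is lost.
function sortUnique(asLines)
{
   var asSorted = asLines.slice().sort();
   var asResult = [];
   for (var nIndex = 0; nIndex < asSorted.length; nIndex++)
   {
      // After sorting, identical lines are neighbors.
      if (nIndex === 0 || asSorted[nIndex] !== asSorted[nIndex - 1])
      {
         asResult.push(asSorted[nIndex]);
      }
   }
   return asResult;
}
```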

                      Several macros and also some scripts have been posted demonstrating how to remove duplicate lines without sorting. That is slower than simply sorting with duplicate removal, but sometimes necessary because the order of the lines must be kept.
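For keeping the original order, one common approach in plain JavaScript is to remember the lines already seen in an object used as a lookup table. A minimal sketch, written in ES5 style on the assumption that older UltraEdit script engines may lack newer features such as Set; the function name is hypothetical.

```javascript
// Removes duplicate lines while keeping the first occurrence of each line
// at its original position.
function removeDuplicatesKeepOrder(asLines)
{
   var oSeen = {};
   var asResult = [];
   for (var nIndex = 0; nIndex < asLines.length; nIndex++)
   {
      // Prefix keys with # so that lines such as __proto__ cannot clash
      // with properties inherited by a plain object.
      var sKey = "#" + asLines[nIndex];
      if (!oSeen.hasOwnProperty(sKey))
      {
         oSeen[sKey] = true;
         asResult.push(asLines[nIndex]);
      }
   }
   return asResult;
}
```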


                      If I should adapt the script to find and ignore duplicate lines in the source file before running the Find in Files, or to report lines found more than once in one of the searched files, I need many more details.

                      What exactly should the script do? And which file should the new code work on: the source file with the lines to search for, or the generated output file?

                      Please post a block of lines as it looks before running the new script code and how this block should look after running it. Enclose both blocks in Code BBCode tags by selecting each block and clicking on the Code button above the edit area.

                      Newbie

                        Mar 15, 2012 (#11)

                        All the script does now is check all lines in the source file against all files in C:\Temp.

                        I need the script to also check the lines in the source file for duplicates.

                        Source file example:

                        Code:

                        11111-22222-33333-44444-55555
                        aaaaa-bbbbb-ccccc-ddddd-eeeee
                        11111-22222-33333-44444-55555
                        As you can see, lines 1 and 3 are duplicates here. The script should alert me about this, tell me how many there are, and whether there are other duplicates.

                        But will it also match lines like these:

                        Code:

                        1111122222333334444455555
                        aaaaabbbbbcccccdddddeeeee
                        1111122222333334444455555
                        that is, without dashes?

                        Or lines like these:

                        Code:

                        11112222333344445555
                        aaaabbbbccccddddeeee
                        11112222333344445555
                        that is, with fewer characters per line?

                        I need it NOT to do that (a line must be skipped if it does not have the same character count or has no dashes).

                        You can write the report to a different file, which contains, for example:

                        11112222333344445555 was found in this and that file, or in the source file and this and that file; if it was found nowhere, just report it as now: 0 times in 0 files + 0 times in source file.

                        These changes should be made to the existing script and not to a new one. I would like to do everything with just one click, as it is now.
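A minimal sketch of such a per-line report in plain JavaScript, under stated assumptions: the file contents are already available as strings (the hypothetical aoFiles array of {name, text} objects stands in for the *.txt files of the directory), lines are compared for exact equality instead of running Find in Files, and the wording only approximates the example above. Each distinct source line is reported once; buildReport is a hypothetical name.

```javascript
// Hypothetical report builder. asSourceLines are the lines of the source file;
// aoFiles stands in for the searched files as {name, text} objects.
function buildReport(asSourceLines, aoFiles)
{
   var asReport = [];
   var oReported = {};   // Lines already reported, keys prefixed with "#".
   for (var i = 0; i < asSourceLines.length; i++)
   {
      var sLine = asSourceLines[i];
      if (!sLine.length || oReported.hasOwnProperty("#" + sLine)) continue;
      oReported["#" + sLine] = true;

      // Further occurrences of this line in the source file itself.
      var nInSource = -1;
      for (var j = 0; j < asSourceLines.length; j++)
      {
         if (asSourceLines[j] === sLine) nInSource++;
      }

      // Exact (entire line) matches in the other files.
      var nTimes = 0;
      var asNames = [];
      for (var f = 0; f < aoFiles.length; f++)
      {
         var asLines = aoFiles[f].text.split(/\r\n|\r|\n/);
         var nHere = 0;
         for (var k = 0; k < asLines.length; k++)
         {
            if (asLines[k] === sLine) nHere++;
         }
         if (nHere)
         {
            nTimes += nHere;
            asNames.push(aoFiles[f].name);
         }
      }
      asReport.push(sLine + " was found " + nTimes + " time(s) in " +
                    asNames.length + " file(s)" +
                    (asNames.length ? " [" + asNames.join(", ") + "]" : "") +
                    " + " + nInSource + " time(s) in source file");
   }
   return asReport;
}
```

Because the comparison is exact, lines without dashes or with fewer characters are not counted as matches.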

                        Thank you again for the good support.

                        Grand Master

                          Mar 17, 2012 (#12)

                          I'm now really confused and no longer understand what you want. Here is an example of what I understand by a detailed description.

                          The file open on script start is C:\Temp\Test.lst, containing the following lines:

                          Code:

                          11111-22222-33333-44444-55555
                          11112222333344445555
                          aaaaa-bbbbb-ccccc-ddddd-eeeee
                          11111-22222-33333-44444-55555
                          1111122222333334444455555
                          11112222333344445555
                          11111-22222-33333-44444-55555
                          The script is executed with C:\Temp\ entered as the directory. This directory contains 3 *.txt files.

                          C:\Temp\Test1.txt contains the lines:

                          Code:

                          1111122222333334444455555
                          aaaaabbbbbcccccdddddeeeee
                          1111122222333334444455555
                          C:\Temp\Test2.txt contains the lines:

                          Code:

                          11111-22222-33333-44444-55555
                          aaaaa-bbbbb-ccccc-ddddd-eeeee
                          1111122222333334444455555
                          C:\Temp\Test3.txt contains just the line:

                          Code:

                          88811-22222-33333-44444-55555
                          For these files, the script currently produces:

                          Code:

                          ----------------------------------------
                          Find '11111-22222-33333-44444-55555' in 'C:\Temp\Test2.txt':
                          C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555
                          Found '11111-22222-33333-44444-55555' 1 time(s).
                          Search complete, found '11111-22222-33333-44444-55555' 1 time(s). (1 file(s)).
                          Search complete, found '11112222333344445555' 0 time(s). (0 file(s)).
                          ----------------------------------------
                          Find 'aaaaa-bbbbb-ccccc-ddddd-eeeee' in 'C:\Temp\Test2.txt':
                          C:\Temp\Test2.txt(2): aaaaa-bbbbb-ccccc-ddddd-eeeee
                          Found 'aaaaa-bbbbb-ccccc-ddddd-eeeee' 1 time(s).
                          Search complete, found 'aaaaa-bbbbb-ccccc-ddddd-eeeee' 1 time(s). (1 file(s)).
                          ----------------------------------------
                          Find '11111-22222-33333-44444-55555' in 'C:\Temp\Test2.txt':
                          C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555
                          Found '11111-22222-33333-44444-55555' 1 time(s).
                          Search complete, found '11111-22222-33333-44444-55555' 1 time(s). (1 file(s)).
                          ----------------------------------------
                          Find '1111122222333334444455555' in 'C:\Temp\Test1.txt':
                          C:\Temp\Test1.txt(1): 1111122222333334444455555
                          C:\Temp\Test1.txt(3): 1111122222333334444455555
                          Found '1111122222333334444455555' 2 time(s).
                          ----------------------------------------
                          Find '1111122222333334444455555' in 'C:\Temp\Test2.txt':
                          C:\Temp\Test2.txt(3): 1111122222333334444455555
                          Found '1111122222333334444455555' 1 time(s).
                          Search complete, found '1111122222333334444455555' 3 time(s). (2 file(s)).
                          Search complete, found '11112222333344445555' 0 time(s). (0 file(s)).
                          ----------------------------------------
                          Find '11111-22222-33333-44444-55555' in 'C:\Temp\Test2.txt':
                          C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555
                          Found '11111-22222-33333-44444-55555' 1 time(s).
                          Search complete, found '11111-22222-33333-44444-55555' 1 time(s). (1 file(s)).
                          And the output window contains the lines:

                          Code:

                          Running script: C:\Program Files\IDM Computer Solutions\UltraEdit\scripts\FindLinesInTextFiles.js
                          ========================================================================================================
                          Script succeeded.
                          The script should first detect duplicate lines in C:\Temp\Test.lst. If there are no duplicate lines in the active file on script start, the output window is simply not modified.

                          But if any duplicate line is found during script execution, the output window should automatically be made visible and list the duplicate lines as follows:

                          Found 3 duplicate lines in input list file. The duplicate lines are:

                          C:\Temp\Test.lst(1): 11111-22222-33333-44444-55555
                          C:\Temp\Test.lst(4): 11111-22222-33333-44444-55555
                          C:\Temp\Test.lst(7): 11111-22222-33333-44444-55555

                          C:\Temp\Test.lst(2): 11112222333344445555
                          C:\Temp\Test.lst(6): 11112222333344445555
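The duplicate check described above could be sketched in plain JavaScript as follows; it only builds the message text, and making the output window visible and writing to it is left out. The name reportDuplicates is hypothetical, empty lines are ignored, and the count is the number of occurrences beyond the first of each line, which gives 3 for the example list file.

```javascript
// Hypothetical: builds the duplicate-line message in the format sketched
// above. Returns an empty string if there are no duplicates, so the output
// window can be left untouched in that case.
function reportDuplicates(sFileName, asLines)
{
   var oNumbers = {};   // "#" + line -> array of 1-based line numbers
   var asOrder = [];    // distinct lines in order of first occurrence
   for (var i = 0; i < asLines.length; i++)
   {
      if (!asLines[i].length) continue;   // Ignore empty lines.
      var sKey = "#" + asLines[i];
      if (!oNumbers.hasOwnProperty(sKey))
      {
         oNumbers[sKey] = [];
         asOrder.push(asLines[i]);
      }
      oNumbers[sKey].push(i + 1);
   }

   var nDuplicates = 0;
   var asOut = [];
   for (var j = 0; j < asOrder.length; j++)
   {
      var anNumbers = oNumbers["#" + asOrder[j]];
      if (anNumbers.length < 2) continue;
      nDuplicates += anNumbers.length - 1;   // Occurrences after the first.
      asOut.push("");                        // Blank line before each group.
      for (var k = 0; k < anNumbers.length; k++)
      {
         asOut.push(sFileName + "(" + anNumbers[k] + "): " + asOrder[j]);
      }
   }
   if (!nDuplicates) return "";
   return "Found " + nDuplicates + " duplicate lines in input list file." +
          " The duplicate lines are:\n" + asOut.join("\n");
}
```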


                          The script should ignore the duplicate lines in the input file and reformat the list of found lines to finally show the following:

                          Code:

                          ----------------------------------------
                          
                          C:\Temp\Test2.txt(1): 11111-22222-33333-44444-55555
                          
                          Found '11111-22222-33333-44444-55555' 1 time(s) in 1 file(s).
                          
                          ----------------------------------------
                          
                          Found '11112222333344445555' 0 time(s) in 0 file(s).
                          
                          ----------------------------------------
                          
                          C:\Temp\Test2.txt(2): aaaaa-bbbbb-ccccc-ddddd-eeeee
                          
                          Found 'aaaaa-bbbbb-ccccc-ddddd-eeeee' 1 time(s) in 1 file(s).
                          
                          ----------------------------------------
                          
                          C:\Temp\Test1.txt(1): 1111122222333334444455555
                          C:\Temp\Test1.txt(3): 1111122222333334444455555
                          
                          C:\Temp\Test2.txt(3): 1111122222333334444455555
                          
                          Found '1111122222333334444455555' 3 time(s) in 2 file(s).

                          Something like that would make it clear to a script developer what the script should do and how to test it.
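A minimal sketch of formatting one block of the target layout in plain JavaScript, assuming the hits for one searched line are already available as hypothetical {file, line, text} objects, grouped by file in the order Find in Files lists them. Collecting those hits from the real results is not shown; formatBlock is a hypothetical name.

```javascript
// Separator line like the one in the results above.
var SEPARATOR = new Array(41).join("-");

// Hypothetical formatter for one block of the target layout. aoHits holds the
// hits for one searched line as {file, line, text} objects, grouped by file.
function formatBlock(sSearched, aoHits)
{
   var asOut = [SEPARATOR, ""];
   var nFiles = 0;
   for (var i = 0; i < aoHits.length; i++)
   {
      if (i === 0 || aoHits[i].file !== aoHits[i - 1].file)
      {
         if (i > 0) asOut.push("");   // Blank line between files.
         nFiles++;
      }
      asOut.push(aoHits[i].file + "(" + aoHits[i].line + "): " + aoHits[i].text);
   }
   if (aoHits.length) asOut.push("");   // Blank line before the summary.
   asOut.push("Found '" + sSearched + "' " + aoHits.length + " time(s) in " +
              nFiles + " file(s).");
   return asOut.join("\n");
}
```

Blocks for the individual searched lines would then be joined with a blank line between them, matching the layout above.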