How to find a match between two txt files?

Newbie

    12:12 - Jan 28 #1

    Greetings,

    I have two text files with over 10000 lines: fileA.txt and fileB.txt.

    I want to find the lines that are the same in both files. It should work like this: the script takes line 1 of fileA.txt and searches for it in fileB.txt. If there is no match, it goes on to the next line (line 2) of fileA.txt.

    Example fileA.txt:

    Code:

    orange
    apple
    pear
    banana

    So it should search for orange in fileB.txt, and if it does not find orange, the script should go on to the next line, which is apple, and so on.

    Thanks for the help.

    Grand Master

      17:53 - Jan 29 #2

      I have posted several UltraEdit scripts that loop over each line of file A, search for it in all lines of file B, and list the lines found (duplicates) in file C. I even wrote UltraEdit macros for such a task before the UltraEdit scripting feature became available. But before I look for them in the scripts and macros forums: why not run the following command in a command prompt window?

      Code:

      findstr.exe /L /X /G:"fileA.txt" "fileB.txt" >"Duplicates.txt"

      The results file Duplicates.txt then contains every line of fileB.txt that exactly matches an entire line of fileA.txt. The string comparison is case-sensitive. The option /I can be added for a case-insensitive search.

      The following command could be run as well:

      Code:

      findstr.exe /L /V /X /G:"fileA.txt" "fileB.txt" >"OnlyFileB.txt"

      The results file OnlyFileB.txt then contains all lines that exist only in fileB.txt.
      Best regards from an UC/UE/UES for Windows user from Austria
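
For readers who want to see the logic behind the two FINDSTR commands in the post above, here is a rough Python sketch of the same idea: a literal, whole-line comparison that is case-sensitive unless asked otherwise. It only approximates what FINDSTR does and is not a reimplementation of it. The names `split_by_presence`, `read_lines` and `ignore_case` are made up for this illustration.

```python
def split_by_presence(lines_a, lines_b, ignore_case=False):
    """Split the lines of B into those that also occur in A and the rest.

    The first list roughly corresponds to Duplicates.txt (/X),
    the second one to OnlyFileB.txt (/V /X).
    """
    # Hypothetical helper: normalise a line before comparing it.
    def key(line):
        return line.lower() if ignore_case else line

    # A set keeps each lookup cheap, even with 10000+ lines per file.
    wanted = {key(line) for line in lines_a}

    duplicates, only_b = [], []
    for line in lines_b:  # the order of file B is preserved
        (duplicates if key(line) in wanted else only_b).append(line)
    return duplicates, only_b


def read_lines(path, encoding="utf-8"):
    """Read a text file into a list of lines without line terminators."""
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\r\n") for line in handle]


file_a = ["orange", "apple", "pear", "banana"]
file_b = ["apple", "bananas", "one green pear", "Orange"]

duplicates, only_b = split_by_presence(file_a, file_b)
print(duplicates)  # ['apple']
print(only_b)      # ['bananas', 'one green pear', 'Orange']
```

With real files, the two lists passed in would come from something like `read_lines("fileA.txt")` and `read_lines("fileB.txt")`, and passing `ignore_case=True` plays the role of the /I option.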

      Newbie

        4:01 - 26 days ago #3

        Thanks, Mofi, for your reply.

        However, I haven't found any thread similar to my case.

        I'd be grateful if you could help me find one or write a script for me.

        My background is not in programming, so I don't understand what you said about using the command prompt. What I want is to take each line from fileA, search for it in fileB, and, if a duplicate is found, write it to fileC.

        Thanks

        Grand Master

          6:16 - 26 days ago #4

          See 10 Ways to Open the Command Prompt in Windows 10 or Windows 11. Use the command cd to change the current working directory to the directory containing the two files. Then run the first command from my previous post. Running findstr /? in the command prompt window outputs the usage help for the Windows command FINDSTR.

          The command can also be executed directly from within UltraEdit by clicking the item Run DOS command on the ribbon tab Advanced. In this case, either specify the three file names as fully qualified file names (drive + path + name + extension), or use the button Browse to select as Working directory the directory containing the two files, in which the third file will then be created too.

          Using the Windows command FINDSTR is definitely the best approach for this case. It is of course also possible to write an UltraEdit script that does the same as FINDSTR, but that script would take much longer to search for all lines of fileA.txt in fileB.txt, and more information about the real strings in fileA.txt would be needed.

          Searching for an entire line requires a regular expression search. So it must be known whether the lines in fileA.txt can also contain characters like ^\(){} and others, which could be misinterpreted as regular expression characters when a line read from fileA.txt is used in an UltraEdit or Perl regular expression search in fileB.txt to find only entire lines. If that is possible because the lines in fileA.txt are not only fruit names, a different code design is necessary: the script has to search literally in fileB.txt for the line read from fileA.txt and then check whether a positive match is indeed an entire line and not just part of a line. For example, if fileA.txt contains the line orange1 and fileB.txt contains the lines orange1 and orange10, only the line orange1 should be found in fileB.txt.

          For an UltraEdit script solution it would also be necessary to know whether the search should be case-sensitive or case-insensitive. In other words, a lot more information is needed to code an UltraEdit script that really works with the real file contents, and I am still sure that FINDSTR is the best choice for this task.
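
The two pitfalls described in the post above (regular expression characters inside the lines, and partial matches such as orange1 inside orange10) are easier to see in a small example. The sketch below uses Python's `re` module purely for illustration. It is not UltraEdit script code, and the helper name `whole_line_pattern` as well as the sample lines are hypothetical.

```python
import re


def whole_line_pattern(line, ignore_case=False):
    """Build a regex matching `line` literally and only as an entire line."""
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    # re.escape() neutralises characters such as ^ \ ( ) { } in the line;
    # ^ and $ with re.MULTILINE anchor the match to line start and line end.
    return re.compile("^" + re.escape(line) + "$", flags)


file_b = "orange10\norange1\nprice (kg)\nprice kg\n"

# Pitfall 1: a plain substring search also hits the line "orange10".
substring_hits = [line for line in file_b.splitlines() if "orange1" in line]

# A literal whole-line search finds only the exact line.
exact_hits = whole_line_pattern("orange1").findall(file_b)

# Pitfall 2: unescaped, "(kg)" is a group, so the pattern means "price kg".
unescaped_hits = [m.group(0)
                  for m in re.finditer(r"^price (kg)$", file_b, re.MULTILINE)]
escaped_hits = whole_line_pattern("price (kg)").findall(file_b)

print(substring_hits)  # ['orange10', 'orange1']
print(exact_hits)      # ['orange1']
print(unescaped_hits)  # ['price kg']
print(escaped_hits)    # ['price (kg)']
```

The `ignore_case` flag stands in for the open question of whether the search should be case-sensitive or case-insensitive.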

          Newbie

            13:13 - 24 days ago #5

            Thanks for your reply.

            I followed your instructions and ran:
            findstr.exe /L /X /G:"fileA.txt" "fileB.txt" >"Duplicates.txt"

            I inserted one matching line in both files, but I can't find it in Duplicates.txt.

            I am not sure what went wrong. Do I need to install a specific app to run findstr? I am using Windows 10 64-bit.

            I also ran findstr /L /X /G:"fileA.txt" "fileB.txt" >"Duplicates.txt" (without the .exe), because I referred to the parameters that you pointed out, and did not get a result either.

            Grand Master

              14:25 - 24 days ago #6

              FINDSTR is a Windows command, as the referenced Microsoft documentation page explains. It is an executable installed with Windows (any 32-bit and 64-bit Windows) with the fully qualified file name C:\Windows\System32\findstr.exe if Windows is installed to C:\Windows, as it is by default.

              Here is a Windows batch file demonstrating that the FINDSTR command line works:

              Code:

              @echo off
              setlocal EnableExtensions DisableDelayedExpansion
              cls
              rem Creating fileA.txt
              (echo orange
               echo apple
               echo pear
               echo banana
               echo cherries
              ) 1>"%TEMP%\fileA.txt"
              rem Creating fileB.txt
              (echo The fruit dish contains:
               echo apple
               echo bananas
               echo plums
               echo cherries
               echo one green pear
               echo bunch of grapes
               echo five oranges
              ) 1>"%TEMP%\fileB.txt"
              echo fileA.txt contains the lines:
              echo(
              type "%TEMP%\fileA.txt"
              echo(
              echo fileB.txt contains the lines:
              echo(
              type "%TEMP%\fileB.txt"
              echo on
              %SystemRoot%\System32\findstr.exe /L /X /G:"%TEMP%\fileA.txt" "%TEMP%\fileB.txt" 1>"%TEMP%\Duplicates.txt"
              @echo off
              echo(
              echo Duplicates.txt contains the lines:
              echo(
              type "%TEMP%\Duplicates.txt"
              echo(
              rem Delete the three created text files.
              del "%TEMP%\fileA.txt" "%TEMP%\fileB.txt" "%TEMP%\Duplicates.txt"
              setlocal EnableExtensions EnableDelayedExpansion & for /F "tokens=1,2" %%G in ("!CMDCMDLINE!") do endlocal & if /I "%%~nG" == "cmd" if /I "%%~H" == "/c" pause
              endlocal

              Copy and paste these lines into a new ANSI encoded file and save the file with the name Test.cmd in a folder of your choice, like C:\Temp. Then execute the batch file in one of three ways: with a double click in Windows File Explorer; from within a command prompt window by entering its fully qualified file name enclosed in ", like "C:\Temp\Test.cmd"; or in UltraEdit using Run DOS Command... with "%f" for Command (replaced by UltraEdit on execution with the full name of the active file Test.cmd enclosed in "), nothing for Working directory, the option Show DOS box checked, and ANSI selected for Handle output as. The output is:

              Code:

              fileA.txt contains the lines:
              
              orange
              apple
              pear
              banana
              cherries
              
              fileB.txt contains the lines:
              
              The fruit dish contains:
              apple
              bananas
              plums
              cherries
              one green pear
              bunch of grapes
              five oranges
              
              C:\Temp>C:\Windows\System32\findstr.exe /L /X /G:"C:\Users\username\AppData\Local\Temp\fileA.txt" "C:\Users\username\AppData\Local\Temp\fileB.txt"  1>"C:\Users\username\AppData\Local\Temp\Duplicates.txt"
              
              Duplicates.txt contains the lines:
              
              apple
              cherries

              FINDSTR does not support Unicode text files with UTF-16 encoding. The text files must be ANSI or UTF-8 encoded without a byte order mark (BOM). The file with the duplicates is then also ANSI or UTF-8 encoded without BOM.
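
If one of the input files turns out to be UTF-16 encoded, it has to be converted before running FINDSTR. As a rough sketch, assuming Python is available and the file starts with a UTF-16 byte order mark, the conversion to UTF-8 without BOM could look like this. The function name `utf16_to_utf8` and the file names are placeholders chosen for this example.

```python
import os
import tempfile


def utf16_to_utf8(source_path, target_path):
    """Rewrite a UTF-16 text file as UTF-8 without a byte order mark."""
    # The "utf-16" codec uses the BOM to detect the byte order and drops it.
    # newline="" keeps the line terminators exactly as they are in the source.
    with open(source_path, encoding="utf-16", newline="") as source:
        text = source.read()
    # Plain "utf-8" (not "utf-8-sig") writes no BOM.
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(text)


# Small self-check with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "fileA_utf16.txt")
    dst = os.path.join(folder, "fileA.txt")
    with open(src, "w", encoding="utf-16", newline="") as handle:
        handle.write("orange\r\napple\r\n")
    utf16_to_utf8(src, dst)
    with open(dst, "rb") as handle:
        converted = handle.read()

print(converted)  # b'orange\r\napple\r\n'
```

The converted file contains only the bytes of the text itself, which is the form the post above says FINDSTR needs.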

              Some UltraEdit scripts for similar tasks:

              Please note that writing matching lines to a third file is just the inverse of writing unique lines to a third file.

              Master

                13:14 - 22 days ago #7

                Hi utoyoy111,

                This task should be doable using a Perl regexp on the joined files. If you do not insist on a script solution, I can post a draft.

                BR, Fleggy

                EDIT: Sorry, the UE Perl regexp engine implementation is very buggy. Frankly speaking, I have not been able to do serious work in UE for several years because of that :(
                But I can post the solution if you don't mind using Notepad++ (or another editor with Perl regexp support).