Removing dupe lines (again, sorry!)

Newbie

    Apr 09, 2011 #1

    I have the following case:

    NEW FILE
    blabla blabla blable ItemID=94374042 blabla blablublibla blable other_data
    blabla blabla blable ItemID=91087082 blabla blabla blbleblaable other_data
    blabla blabla blable ItemID=92415300 blabla blabla blablingble other_data
    bplofla blabla blable ItemID=91584918 blabplbangofla blabla blable other_data
    blabla blabla blable ItemID=95484087 blabla blapowbla blaplofble other_data
    bhahalabla blabla blable ItemID=93881915 blabla blabla blable other_data
    blablabli blabla blable ItemID=93391409 blabla blabla blable other_data
    blabla blblublzabla blable ItemID=94508261 blabla blabandla blable other_data

    OLD FILE 1
    blabplofla blabla blable ItemID=95709167 splashlabla blabla blable other_data
    blabla blabla blable ItemID=94889375 blabla blabbingla blable other_data
    blabla blabla blable ItemID=91087082 blabla blabla blable other_data
    bpifflabla blabla blable ItemID=93989584 blabla blabla blable other_data
    blabla blabla blable ItemID=91930654 blabla blabla bangblable other_data
    blabla blabla blable ItemID=93621288 blabla blabla blasockble other_data
    blabla blabla blable ItemID=96507582 blabla blabla blable other_data
    blablabli blabla blable ItemID=92221673 blabla blabla blable other_data
    blabla blablabluble blable ItemID=93391409 blabla blabla blable other_data
    blabla bllololoaei blable ItemID=93775797 blabla blabla blable other_data

    OLD FILE 2
    blabplofla blabla blable ItemID=91424876 blabplofla blabla blable other_data
    blabplofla blabla blable ItemID=93272698 blabplofla bingblabla blable other_data
    blabfla blabla blable ItemID=94407207 blabplofla blabla blable other_data
    bplofla blabla blable ItemID=91584918 blabplbangofla blabla blable other_data
    blabbliplo blabla blable ItemID=95498779 blabplofla blabla blable other_data
    blabploblu blabla blable ItemID=91634932 blabplofboffla blabla blable other_data
    blabplofla blabla blable ItemID=90366946 blabplofla blabla blwowable other_data
    bleepbplofla blabla blable ItemID=92169269 blabplofla blabla blable other_data

    I need to remove the lines from the NEW file that contain an "ItemID" already present in one of the OLD files.
    The rest of the line's content does not matter.
    There are only a few OLD files at the moment, all in the same folder and guaranteed not to contain dupes, about 1,500 lines in all.
    Eventually they should grow in size and number to a total of about 30,000 lines or more. I don't mind waiting a few seconds for the dupe check to complete.
    Question: what happens to the execution of the macro if the NEW file consists solely of duplicate ItemIDs? Toward the end of my work this might occur.
    I have not been able to get either ReplInFiles or FindInFiles to give me what I need, but that's probably due to my non-knowledge (commonly called ignorance :oops: ).

    Thank you for your help!!
    DoduEd

    Grand Master

      Apr 09, 2011 #2

      Here is a macro to delete the duplicate lines. It needs the macro property "Continue if search string not found" checked. In the macro you have to modify the directory containing all the old files and the file type specification, so that FindInFiles finds all lines with ItemID=[0-9]+ in all the old files in this directory. Open the new file, and only this file, before executing the macro. The new file must be stored in a different folder or must have a different file extension. In other words, the FindInFiles command must not find the lines with ItemID=[0-9]+ in the new file, otherwise the new file is empty after running the macro. The new file must use DOS line terminators.

      InsertMode
      ColumnModeOff
      HexOff
      Bottom
      IfColNumGt 1
      InsertLine
      EndIf
      Top
      UltraEditReOn
      FindInFiles MatchCase RegExp "C:\Temp\" "*.txt" "ItemID=[0-9]+"
      Top
      UnicodeToASCII
      Loop 0
      Find MatchCase RegExp "ItemID=[0-9]+"
      IfFound
      Find MatchCase RegExp AllFiles "%*^s*^p"
      Replace All ""
      Else
      ExitLoop
      EndIf
      EndLoop
      CloseFile NoSave


      How does it work? First, the macro makes sure that the last line of the opened file has a line termination, just for safety. The FindInFiles command then copies all lines containing ItemID=[0-9]+ from all the old files in the specified directory into a new file, the results file of the search. This file is converted to ASCII. Next, from top to bottom of the results file, every ItemID=[0-9]+ is selected with an UltraEdit regular expression search in a loop until none is found anymore. For every ItemID found, an UltraEdit regular expression Replace All in all open files deletes all lines containing the selected ItemID, both in the new file and in the active results file. After the loop has finished, the results file is closed without saving, and what remains is a new file not containing any ItemID already present in one of the old files.
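
For anyone who wants to check the logic outside UltraEdit, the steps described above can be sketched in Python. This is only an illustration, not part of the macro. The function names `collect_old_ids`, `remove_known` and `dedupe_file` are made up for this example, and the sketch assumes the files can be read with the platform's default text encoding. One deliberate difference: the macro deletes every line that contains the selected text anywhere in the line, while this sketch compares whole `ItemID=<digits>` tokens.

```python
import re
from pathlib import Path

# Same pattern the macro searches for.
ITEM_ID = re.compile(r"ItemID=[0-9]+")


def collect_old_ids(old_lines):
    """Return the set of ItemID tokens found in the given lines."""
    ids = set()
    for line in old_lines:
        ids.update(ITEM_ID.findall(line))
    return ids


def remove_known(new_lines, old_ids):
    """Keep only the lines whose ItemID tokens are all absent from old_ids."""
    kept = []
    for line in new_lines:
        if not any(token in old_ids for token in ITEM_ID.findall(line)):
            kept.append(line)
    return kept


def dedupe_file(new_file, old_dir, pattern="*.txt"):
    """Rewrite new_file without lines whose ItemID occurs in old_dir/pattern.

    Hypothetical helper: the paths and the pattern are supplied by the caller.
    newline="" keeps the existing line terminators untouched.
    """
    new_path = Path(new_file).resolve()
    old_ids = set()
    for old in sorted(Path(old_dir).glob(pattern)):
        if old.resolve() == new_path:
            continue  # never treat the new file as one of the old files
        with open(old, newline="") as fh:
            old_ids |= collect_old_ids(fh)
    with open(new_path, newline="") as fh:
        kept = remove_known(fh, old_ids)
    with open(new_path, "w", newline="") as fh:
        fh.writelines(kept)
    return len(kept)
```

With the sample data from the first post, the lines with ItemID 91087082, 93391409 and 91584918 would be dropped from the new file and the other five kept. If every line of the new file is a duplicate, `remove_known` returns an empty list and the sketch writes an empty file.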

      Newbie

        Apr 10, 2011 #3

        Perfect...
        Thank you for your time and knowledge!
        This is solved, hurrah!
        DoduEd