Delete part of a line's duplicate content

    Nov 02, 2007#1

    Hi all,

    I've been trying to remove duplicate data from a file. What I need is for the data on the left side to appear only once, listed below all of the data on the right side that it originally matched. My before and after examples probably make more sense. I tried modifying DelDupLineInfo, but I keep hitting a wall.

    Any help is appreciated.
    Thanks, Max


    Here is what my data looks like:

    Code:

    G10002-XXX-01785-REV-IR.xml	 <?FRAME ID='50' TITLE='xx' TOCLEVEL='1'>
    G10003-XXX-01785-REV-IR.xml	 <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
    G10003-XXX-01785-REV-IR.xml	 <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    G10003-XXX-01785-REV-IR.xml	 <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    G10008-XXX-01785-REV-IR.xml	 <?FRAME ID='100' TITLE='xxxxxxxx' TOCLEVEL='1'>
    G10009-XXX-01785-REV-IR.xml	 <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
    G10004-XXX-01785-REV-IR.xml	 <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    G10004-XXX-01785-REV-IR.xml	 <?FRAME ID='250' TITLE='dveg xxx ts' TOCLEVEL='1'>
    G10004-XXX-01785-REV-IR.xml	 <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
    G10001-XXX-01785-REV-IR.xml	 <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    G10001-XXX-01785-REV-IR.xml	 <?FRAME ID='300' TITLE='wwwwwwwwwwww' TOCLEVEL='2'>
    G10001-XXX-01785-REV-IR.xml	 <?FRAME ID='350' TITLE='draft' TOCLEVEL='2'>
    
    This is what I'm trying to get:

    Code:

    <?FRAME ID='50' TITLE='xx' TOCLEVEL='1'>
    G10002-XXX-01785-REV-IR.xml	 
    <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    G10003-XXX-01785-REV-IR.xml	 
    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    G10008-XXX-01785-REV-IR.xml	 
    <?FRAME ID='100' TITLE='xxxxxxxx' TOCLEVEL='1'>
    G10009-XXX-01785-REV-IR.xml	 
    <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    <?FRAME ID='250' TITLE='dveg xxx ts' TOCLEVEL='1'>
    G10004-XXX-01785-REV-IR.xml	 
    <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
    <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
    <?FRAME ID='300' TITLE='wwwwwwwwwwww' TOCLEVEL='2'>
    <?FRAME ID='350' TITLE='draft' TOCLEVEL='2'>
    G10001-XXX-01785-REV-IR.xml

      Nov 02, 2007#2

      I've gotten a little further. Below is the macro I wrote to check for a duplicate and, if one is found, insert DUPLICATE at the beginning of the line. I figure this will at least give me a marker to delete on. But I'm still having trouble with my loop: currently it only works once.

      Code:

      InsertMode
      ColumnModeOff
      HexOff
      UnixReOff
      Loop 
      Find RegExp "%[A-Z]"
      StartSelect
      Find Select ".xml"
      Copy 
      EndSelect
      Key HOME
      Key DOWN ARROW
      Find MatchCase "^c"
      IfFound
      Key HOME
      "DUPLICATE"
      IfNotFound
      ExitLoop
      EndIf
      

        Nov 02, 2007#3

        The macro below produces the following result:

        Code:

        <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
        <?FRAME ID='300' TITLE='wwwwwwwwwwww' TOCLEVEL='2'>
        <?FRAME ID='350' TITLE='draft' TOCLEVEL='2'>
        G10001-XXX-01785-REV-IR.xml
        <?FRAME ID='50' TITLE='xx' TOCLEVEL='1'>
        G10002-XXX-01785-REV-IR.xml
        <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
        <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
        G10003-XXX-01785-REV-IR.xml
        <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
        <?FRAME ID='200' TITLE='xxxxxxxxx xxxxxxxx' TOCLEVEL='1'>
        <?FRAME ID='250' TITLE='dveg xxx ts' TOCLEVEL='1'>
        G10004-XXX-01785-REV-IR.xml
        <?FRAME ID='100' TITLE='xxxxxxxx' TOCLEVEL='1'>
        G10008-XXX-01785-REV-IR.xml
        <?FRAME ID='150' TITLE='ttttttttt bbbbbbbbbb' TOCLEVEL='1'>
        G10009-XXX-01785-REV-IR.xml

        As you can see, it is nearly what you want. The difference is that the "G*.xml" lines are sorted before reformatting, which the macro below requires, so the output is also sorted by XML file name. The sort also removes 100% identical lines before the content is reformatted.
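
        For anyone who wants to cross-check the result outside UltraEdit, here is a minimal Python sketch of the same regrouping. It is not part of the macro; the function name group_frames and the sort_files switch are made up for illustration, and it assumes the file name never contains whitespace. With sort_files=True it should mirror the sorted output above; with sort_files=False it keeps the order in which the file names first appear, which is closer to the originally requested result.

```python
def group_frames(lines, sort_files=True):
    """List each file's unique FRAME entries, followed by the file name."""
    groups = {}  # file name -> unique FRAME strings, in order of appearance
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace (the tab/space separator).
        name, frame = line.split(None, 1)
        frames = groups.setdefault(name, [])
        if frame not in frames:  # drop duplicate FRAME entries per file
            frames.append(frame)

    # Dicts keep insertion order, so list(groups) is first-appearance order.
    names = sorted(groups) if sort_files else list(groups)
    result = []
    for name in names:
        frames = sorted(groups[name]) if sort_files else groups[name]
        result.extend(frames)
        result.append(name)
    return result
```

        Note that Python's sorted() compares plain strings, which may not match UltraEdit's sort options (case sensitivity, for example) on other data.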

        The macro property "Continue if a Find with Replace not found" or "Continue if search string not found" must be checked for this macro.

        InsertMode
        ColumnModeOff
        HexOff
        UnixReOff
        Bottom
        IfColNumGt 1
        InsertLine
        EndIf
        Top
        TrimTrailingSpaces
        SortAsc RemoveDup 1 -1 0 0 0 0 0 0
        Find RegExp "%^(G*.xml^)[ ^t]++^(<*^)$"
        Replace All "^2#|#^1"
        Loop
        Find RegExp "#|#*$"
        IfNotFound
        ExitLoop
        EndIf
        Cut
        Find "^c"
        Replace All ""
        Find "#|#"
        IfFound
        Key HOME
        Else
        Bottom
        EndIf
        Paste
        "
        "
        Key UP ARROW
        Delete
        Delete
        Delete
        EndLoop
        Top
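
        For readers less used to UltraEdit-style regular expressions, the Find RegExp / Replace All pair near the top of the macro swaps the two columns and joins them with the marker #|#. In UltraEdit syntax % is start of line, ^( ^) is a tagged group, * is any run of characters except newline, ^t is a tab and ++ means zero or more. Below is a rough Python equivalent, given only as a sketch; the function name swap_columns is hypothetical and the pattern is my translation, not something UltraEdit produces.

```python
import re

# Translation of the UltraEdit expression  %^(G*.xml^)[ ^t]++^(<*^)$
# group 1: file name starting with G and ending in .xml
# group 2: the FRAME instruction starting with <
PATTERN = re.compile(r"^(G.*\.xml)[ \t]*(<.*)$", re.MULTILINE)


def swap_columns(text):
    """Rewrite 'file<whitespace>frame' lines as 'frame#|#file'."""
    # Equivalent of the macro's replace string "^2#|#^1".
    return PATTERN.sub(r"\2#|#\1", text)
```

        Lines that do not match the pattern are left unchanged, as they would be by Replace All.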

        Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default (see the search configuration). The macro command UnixReOff sets the regular expression option to UltraEdit style.
        Best regards from an UC/UE/UES for Windows user from Austria