Sort feature - removing duplicates if only line parts are identical

2
Newbie

    Jul 15, 2008 #1

    For several years now I have suggested a change to UltraEdit sort feature. Sort allows the removal of duplicate records but it only considers a full record match. It would be far more useful if the exclusion of duplicates could be made based on the sort key criteria.

    Does anyone else see this as a useful enhancement? I have been asking IDM for several years now, but they have not implemented it. Perhaps if there were interest among the user community, this could get done.

    Thanks
    Dan

    236
    Master

      Jul 15, 2008 #2

      I think that's something you could easily do in a macro (doing a sort and then a regex-search/replace to weed out duplicates in whatever way you choose) or a script (prompting the user for input as to which "rules" should apply for duplicate removal). Do you have an example of what you're trying to do?
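
To illustrate the "sort, then regex search/replace" idea outside UltraEdit, here is a minimal Python sketch. The function name `sort_and_collapse` and its parameters `key_start` and `key_len` are made up for this illustration and are not part of UltraEdit. It assumes 1-based columns and leaves lines that are too short to contain the full key untouched.

```python
import re

def sort_and_collapse(text, key_start=21, key_len=61):
    """Sort lines from column key_start on, then collapse runs of lines
    whose key columns are identical, keeping the first line of each run.
    (Hypothetical helper; names and defaults are assumptions.)"""
    lines = text.splitlines()
    # Sort by everything from the key column to the end of the line.
    lines.sort(key=lambda line: line[key_start - 1:])
    # Re-join with a newline after every line so the regex can rely on it.
    joined = "".join(line + "\n" for line in lines)
    skip = key_start - 1
    # Group 1: first line of a run; group 2: its key columns.
    # The repeated non-capturing group matches following lines with the same key.
    pattern = re.compile(
        r"^(.{%d}(.{%d}).*\n)(?:.{%d}\2.*\n)+" % (skip, key_len, skip),
        re.MULTILINE,
    )
    return pattern.sub(r"\1", joined)
```

For example, `sort_and_collapse("b-x\na-1\na-2\n", key_start=1, key_len=1)` keeps one line per leading character and returns `"a-1\nb-x\n"`.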

      2
      Newbie

        Jul 15, 2008 #3

        I work with many different files and, depending on the analysis I am doing, I may wish to remove dupes based on different columns. Currently I am using SAS to remove dupes. To me the implementation currently in UE is useless, and I would love to see it changed. UE is a great tool and I would like to stay in one environment rather than bounce between programs.

        61
        Advanced User

          Jul 16, 2008 #4

          I agree about having sort options that allow you to pick criteria and use those criteria to eliminate dupes.

          3
          Newbie

            Oct 15, 2008 #5

            I strongly agree that this is a major shortcoming of UE. If I find other software that does this, I'll probably junk UE. It is obvious that those designing this product did not ask programmers for their input on this. It is also obvious that, since the company did not add your suggestion, it is NOT responsive to the user community. One should NOT have to write a script or anything else to delete duplicate records based on the sort key.

            3
            Newbie

              Apr 26, 2009 #6

              Could someone please help me write a macro which removes the complete line if there are duplicates within col 21 - col 81?

              236
              Master

                Apr 26, 2009 #7

                What do you mean? Remove two consecutive lines if the contents of columns 21-81 are identical?

                3
                Newbie

                  Apr 26, 2009 #8

                  I have a database file with my customers' addresses, phone numbers etc.
                  In col 21-80 I have the company name. Some names are the same because the company has several addresses but identical phone numbers.

                  I would like to remove all of the identical company names, leaving me with only one line for that company.

                  In plain terms: locate and delete duplicated company names within col 21-81, along with the rest of the line. This leaves me with only one entry for that company rather than XX entries because of several addresses.

                  Hope this makes sense :)

                  236
                  Master

                    Apr 27, 2009 #9

                    OK. I guess that means that "duplicate entries" within your file are not necessarily on consecutive lines.

                    The following macro (try it on a copy of your data first!) will do the following:

                    1. Ensure that the last line in the file is CRLF terminated.
                    2. Sort the file according to columns 21 and up (this step is not undoable!)
                    3. Remove all lines where columns 21-81 (both included, i.e. 61 characters!) are identical, leaving only the first occurrence.

                    Code: Select all

                    InsertMode
                    ColumnModeOff
                    HexOff
                    Key Ctrl+END
                    IfColNumGt 1
                    "
                    "
                    EndIf
                    Top
                    SortAsc 21 -1 0 0 0 0 0 0
                    PerlReOn
                    Find RegExp "^(.{20}(.{61}).*)\r\n(.{20}\2.*\r\n)+"
                    Replace All "\1\r\n"
                    Tested on UE V15.00.0.1043
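
For comparison, here is a minimal Python sketch of a different technique that reaches the same goal without sorting: it remembers which keys have already been seen and keeps only the first line for each key, so the original line order is preserved. This is not what the macro does. The function name `dedupe_keep_order` and its parameters are hypothetical, and columns are assumed to be 1-based and inclusive as in the description above. Unlike the regular expression in the macro, lines shorter than the key range are compared on whatever part of the key they contain.

```python
def dedupe_keep_order(lines, start=21, end=81):
    """Keep the first line for each distinct value of columns start..end.
    (Hypothetical helper; names and defaults are assumptions.)"""
    seen = set()   # keys encountered so far
    kept = []      # lines to keep, in original order
    for line in lines:
        # Convert 1-based inclusive columns to a Python slice.
        key = line[start - 1:end]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

For example, `dedupe_keep_order(["zzCD3", "xxAB1", "yyAB2"], start=3, end=4)` returns `["zzCD3", "xxAB1"]`, because the third line repeats the key `AB`.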

                    3
                    Newbie

                      Apr 28, 2009 #10

                      This seems to be working great. Thanks a lot for your time :)

                      6,682583
                      Grand Master

                        Jul 14, 2009 #11

                        Starting with UE v15.10.0, it is possible to sort a file and remove lines when only parts of the lines are identical; see the power tip "Advanced file sort" for details.
                        Best regards from an UC/UE/UES for Windows user from Austria