Sort by line (column) length

    May 04, 2006#1

    I'm wondering if there is a way using UltraEdit (v12) to sort a file based on the length of each line in the file. I'd like to end up with a file that is sorted from shortest line to longest line with same length lines in no particular order.

    Thoughts on how to accomplish this task?

      May 04, 2006#2

      The following macro will do the job. How it works:

      First it makes sure that the last line of the file is terminated with a line ending (CRLF for DOS, LF for Unix, CR for Mac). This is important; otherwise the result will not be what you want.

      Next it moves the cursor to the top and marks the start of every non-blank line with the string "STARTOFLINE", so that leading whitespace characters are not removed by the macro later.

      Then the macro inserts a line with a specified number of spaces at the top of the file (see the number 500 in the Loop 500 command).

      Note: It's important that this number is higher than the length of the longest line you have. Modify it if you have a line longer than 500 characters.

      This line is now copied to the bottom of the file. The cursor is then at the end of this line of spaces, which is the longest line in the file and is always at the end of the file. SelectToTop in column mode then selects the whole file content as a rectangular block. ColumnRightJustify right-justifies all lines, adding leading spaces to each line of the file.

      Next the added spaces on the last line are deleted and, back at the top of the file, the longest line is made 1 character longer by inserting an additional space character.

      A simple sort in ascending order with start column 1 now sorts the lines according to line length, because a shorter line has more leading spaces than a longer line.

      The RemoveDup option of the SortAsc command removes blank lines and other duplicate lines. If you do not have blank lines, you can remove this option. If you have blank lines and do not use the RemoveDup option, all blank lines will be at the top of the file after macro execution because they are the shortest lines.

      To delete the blank lines at the top of the file if RemoveDup cannot be used, the Loop ... EndLoop block after the DeleteLine command can be used. Remove this loop if you use RemoveDup or your file does not contain blank lines.

      The longest line, containing only spaces, is still at the top of the file and is now removed.

      Finally the leading spaces added by ColumnRightJustify, together with the start-of-line marker string, and the trailing spaces are removed, and you should have the result you want.

      Note: Make sure your file does not contain hard tabs. Convert the tabs to spaces if the file contains tabs, or the result will not look as you expect. You can also do this conversion with the macro command TabsToSpaces at the start of the macro, after the command UnixReOff.

      InsertMode
      ColumnModeOff
      HexOff
      UnixReOff
      Bottom
      IfColNum 1
      Else
      "
      "
      EndIf
      Top
      TrimTrailingSpaces
      Find RegExp "%^([~^r^n]^)"
      Replace All "STARTOFLINE^1"
      "
      "
      Key UP ARROW
      Loop 500
      " "
      EndLoop
      SelectToTop
      Clipboard 9
      Copy
      Bottom
      Paste
      ClearClipboard
      Clipboard 0
      ColumnModeOn
      SelectToTop
      ColumnRightJustify
      ColumnModeOff
      EndSelect
      Bottom
      DeleteToStartofLine
      Top
      " "
      SortAsc IgnoreCase RemoveDup 1 -1 0 0 0 0 0 0
      DeleteLine
      Loop
      Find RegExp "% +^p"
      Replace All ""
      IfNotFound
      ExitLoop
      EndIf
      EndLoop

      Find MatchCase RegExp "% ++STARTOFLINE"
      Replace All ""
      TrimTrailingSpaces

      Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default - see search configuration. Macro command UnixReOff sets the regular expression option to UltraEdit style.
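
For readers who want to see the padding idea outside UltraEdit, here is a minimal Python sketch of the same technique: mark non-blank lines, right-justify everything to a fixed width, do a plain text sort, then strip the padding and the marker. The function name and the `width` parameter are illustrative assumptions, not UltraEdit commands. Unlike the macro's SortAsc IgnoreCase RemoveDup, this sketch compares case-sensitively and keeps duplicate lines.

```python
def sort_by_length_via_padding(lines, width=500):
    """Sort lines (without line endings) by length using right-justification."""
    marker = "STARTOFLINE"
    # Non-blank lines get the marker so that their own leading spaces survive.
    marked = [(marker + line) if line else "" for line in lines]
    longest = max((len(m) for m in marked), default=0)
    if longest > width:
        raise ValueError("width must be at least the length of the longest marked line")
    # Right-justify: shorter lines receive more leading spaces.
    padded = [m.rjust(width) for m in marked]
    # A plain text sort now orders by length, because a space sorts before "S".
    padded.sort()
    result = []
    for p in padded:
        stripped = p.lstrip(" ")
        if stripped.startswith(marker):
            stripped = stripped[len(marker):]
        result.append(stripped)
    return result
```

Lines of equal length end up in alphabetical order here, which is compatible with the original request for "no particular order" among same-length lines.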

      Edited on 2008-02-06: Fixed the macro to also work correctly for large files with several thousand lines. The previous version of the macro made the rectangular selection of the whole file content in column mode incorrectly because of a bug in UltraEdit that occurs with large files.
      Best regards from an UC/UE/UES for Windows user from Austria

        May 05, 2006#3

        Mofi, thanks for the reply. However, the macro is not sorting the file as expected. It's actually not sorting it at all. The first line of 500 spaces is added, then the macro seems to quit, as the file remains in the original order. I made sure the file has a trailing LF and no tabs. I did remove the RemoveDup option and the loop that deletes blank lines, as the file has no blank lines (it is a log file).

        Thoughts to troubleshoot?

        -dtb

        Edited on 2008-02-06 by Mofi: The first version of the macro failed because of the file size in combination with a bug in UltraEdit. The second version of the macro contains a workaround for that bug.

          Feb 06, 2008#4

          Does this macro still work?

          The macro only works for small subsets of the file I have to work with. The complete file has about 45000 lines, and all lines have fewer than 100 columns. It only works correctly with subsets of 3124 lines or fewer.

            Feb 06, 2008#5

            For a task like this, I think a scripting language is better suited. The following Python program should do the job:

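            # -*- coding: iso-8859-1 -*-
            # Read all lines; each line keeps its line terminator.
            with open("yourfilenamehere.txt", "r") as infile:
                lines = infile.readlines()
            # Sort by line length, shortest first.
            lines.sort(key=len)
            with open("output.txt", "w") as outfile:
                outfile.writelines(lines)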

            No error checking to keep things simple. The program reads the entire file into memory, sorts it according to line length and then writes it back into the same directory under the name output.txt. Leading and trailing spaces count. I've tried it on a file with 66000 lines of up to 1000 characters each; it took my laptop about a fifth of a second.
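
One detail worth illustrating: with `readlines()` each line keeps its terminator, so a final line without a trailing newline measures one character shorter and, once sorted into the middle, gets written without a line break. The following is a minimal sketch of a variant that measures lines without their terminators and always writes one. The function name is an illustrative assumption, and the sketch normalizes all line endings to LF.

```python
def sort_text_by_line_length(text):
    """Return text with its lines sorted shortest first, each ending in LF."""
    # splitlines() drops the terminators (CRLF, LF, CR and a few rarer
    # separators), so they do not count toward the line length.
    lines = text.splitlines()
    # list.sort() is stable: lines of equal length keep their original order.
    lines.sort(key=len)
    return "".join(line + "\n" for line in lines)
```

Usage might look like `open("output.txt", "w", newline="").write(sort_text_by_line_length(open("yourfilenamehere.txt", newline="").read()))`, using the same file names as the program above.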

              Feb 06, 2008#6

              Thanks gracibf for telling me how large your file is. The first version of the macro really did fail on large files: UltraEdit made a different column mode selection, with the result that ColumnRightJustify did nothing.

              I have now created a test file with 72,631 lines (file size 3,959,892 bytes = 3.78 MiB), analyzed the problem and developed a workaround. The updated version of the macro now sorts the 72,631 lines correctly. I have tested it with UltraEdit v13.20a+1 and v11.20b.

              Tim's Python script is surely much, much faster.

                Feb 07, 2008#7

                I agree that beyond a certain size this should be done with a scripting language or even a simple C program. But since I need to do other kinds of transformations on the text, and some of them are semi-manual, I don't mind waiting a bit for UltraEdit.

                Mofi, thank you very much for the updated macro. It works very well. I've seen your posts here and they're very helpful and useful. Thanks again.

                pietzcker, thanks to you too for the Python code. If I ever need to work on larger files I will use it.

                  Nov 24, 2009#8

                  I use this macro posted by Mofi a LOT. Sometimes I need one that does the opposite: sort lines from longest to shortest. I've tried, but no luck.

                  If someone has a chance, could they modify this macro or make one that does that, if possible?

                  Thanks in advance.

                  Using latest version of UE.

                    Dec 08, 2009#9

                    Sorting in the opposite order, from longest to shortest, is quite simple to achieve. You only have to replace SortAsc with SortDes to sort the lines in descending order, and additionally remove the DeleteLine command below the sort command, because after the sort the line at the top of the file is the longest line and not a blank line. That's it.
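
For anyone using the Python approach from earlier in this topic instead of the macro, the same reversal can be sketched as follows. The function name is an illustrative assumption; `reverse=True` is a standard argument of Python's `sorted()` and keeps lines of equal length in their original relative order.

```python
def sort_longest_first(lines):
    """Return a new list with the longest lines first."""
    # reverse=True flips the order by length but keeps the sort stable.
    return sorted(lines, key=len, reverse=True)
```

In the earlier program this corresponds to replacing `lines.sort(key=len)` with `lines.sort(key=len, reverse=True)`.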

                      Dec 11, 2009#10

                      Thanks again Mofi, it works great!