How to sort HTML URLs by their displayed text?

Newbie

    May 03, 2006 #1

    I've read the help file and searched this forum, but I still have no idea how Advanced Sort works.

    Here's an example of my content:

    <a href="/html/filename.html">Tomatoes</a>
    <a href="/html/odd_length_name">Bananas</a>
    <a href="/html/another_weird_name.html">Apples</a>

    I need to sort A-Z by the text after the ">", so that I get this:

    <a href="/html/another_weird_name.html">Apples</a>
    <a href="/html/odd_length_name">Bananas</a>
    <a href="/html/filename.html">Tomatoes</a>

    I've tried adding tabs after the "> to delimit the columns:

    <a href="/html/filename.html"> ^t Tomatoes</a>
    <a href="/html/odd_length_name"> ^t Bananas</a>
    <a href="/html/another_weird_name.html"> ^t Apples</a>

    That didn't work either, so I get the impression that UltraEdit will only sort fixed-width columns?

    Thanks for any help...

    Grand Master

      May 03, 2006 #2

      Yes, the sort is designed to work only on fixed columns. To sort your links by the visible link name, you need a macro. The following macro first uses an UltraEdit-style regular expression Replace All to move the hyperlink tag of every line to the end of the line, so that the link name is at the start of every line. Then the file is sorted starting at column 1. After the sort, the hyperlink tag is moved back to the start of the line, which gives the result you want. The Bottom ... Top sequence makes sure that the last line of the file is terminated with a CRLF (DOS), LF (Unix) or CR (Mac) line ending.

      InsertMode
      ColumnModeOff
      HexOff
      UnixReOff
      Bottom
      IfColNum 1
      Else
      "
      "
      EndIf
      Top
      TrimTrailingSpaces
      Find RegExp "%^(<a href=*>^)^(*^)$"
      Replace All "^2^1"
      SortAsc IgnoreCase RemoveDup 1 -1 0 0 0 0 0 0
      Find RegExp "%^(*</a>^)^(<a href=*>^)$"
      Replace All "^2^1"

      If you do not use UltraEdit-style regular expressions by default (see the search configuration), add UnixReOn, or PerlReOn in UltraEdit v12 and later, at the end of the macro to restore your setting. The macro command UnixReOff sets the regular expression option to UltraEdit style.
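For readers who want to check the swap, sort, swap-back idea outside UltraEdit, here is a minimal Python sketch of the same steps. It is not part of the macro and does not run in UltraEdit. It assumes one link per line and that the macro's UltraEdit-style * matches as few characters as possible, which is what [^>]* imitates here; the function name sort_links_by_text is made up for illustration. Duplicate removal only drops exact duplicate lines, since the post does not say how RemoveDup treats lines that differ only in case.

```python
import re

# Opening tag first, then the visible text: '<a href="x">Text</a>'.
# "[^>]*" stops at the first ">", imitating the macro's "<a href=*>".
TAG_THEN_TEXT = re.compile(r'^(<a href=[^>]*>)(.*)$')
# Visible text first, then the opening tag: 'Text</a><a href="x">'.
TEXT_THEN_TAG = re.compile(r'^(.*?</a>)(<a href=[^>]*>)$')


def sort_links_by_text(lines):
    """Sort one-link-per-line HTML by link text: swap, sort, swap back."""
    # Trim trailing whitespace (like TrimTrailingSpaces), then the first
    # Replace All "^2^1": the opening tag moves behind the link text.
    swapped = [TAG_THEN_TEXT.sub(r"\2\1", line.rstrip()) for line in lines]
    # Like SortAsc IgnoreCase RemoveDup: drop exact duplicates (first wins),
    # then sort ascending ignoring case; sorted() keeps ties in input order.
    ordered = sorted(dict.fromkeys(swapped), key=str.lower)
    # Second Replace All "^2^1": the opening tag moves back in front.
    return [TEXT_THEN_TAG.sub(r"\2\1", line) for line in ordered]


links = [
    '<a href="/html/filename.html">Tomatoes</a>',
    '<a href="/html/odd_length_name">Bananas</a>',
    '<a href="/html/another_weird_name.html">Apples</a>',
]
for line in sort_links_by_text(links):
    print(line)
```

With the three lines from the first post, it prints them in the order Apples, Bananas, Tomatoes, each with its original href.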
      Best regards from an UC/UE/UES for Windows user from Austria

      Newbie

        May 04, 2006 #3

        Wow, EXTREMELY helpful!

        I knew swapping the columns would be one way to do it, but I didn't want to have to get into the regex... I'm sure I'll learn a lot figuring this out as I add it to my own macro. It's for building an HTML link list / A-Z site map that uses special tags in the HTML pages rather than the <TITLE> tags most software uses. I'll be sure to share.

        My enormous thanks!

          May 09, 2006 #4

          OK, I'm almost there with this script, but I've hit another snag.

          (This is in regex land now, so maybe the post should be moved.)

          After the lines are sorted by link text (and presumably before I put them back together), I need to add code for a letter and an anchor reference at only the FIRST occurrence of each new letter:

          apples</a><a href="html/apples.html">
          avacados</a><a href="html/avacados.html">
          bananas</a><a href="html/bananas.html">
          burritos</a><a href="html/burritos.html">
          cantalopes</a><a href="html/cantalopes.html">
          grapes</a><a href="html/grapes.html">
          
          becomes

          
          <a href="#" name="#A">A</a>
          apples</a><a href="html/apples.html">
          avacados</a><a href="html/avacados.html">
          
          <a href="#" name="#B">B</a>
          bananas</a><a href="html/bananas.html">
          burritos</a><a href="html/burritos.html">
          
          <a href="#" name="#C">C</a>
          cantalopes</a><a href="html/cantalopes.html">
          
          <a href="#" name="#G">G</a>
          grapes</a><a href="html/grapes.html">
          
          
          which then (using a variation on the rest of the macro you wrote above) would become

          <a href="#" name="#A">A</a>
          <a href="html/apples.html">apples</a>
          <a href="html/avacados.html">avacados</a>
          
          <a href="#" name="#B">B</a>
          <a href="html/bananas.html">bananas</a>
          <a href="html/burritos.html">burritos</a>
          
          <a href="#" name="#C">C</a>
          <a href="html/cantalopes.html">cantalopes</a>
          
          <a href="#" name="#G">G</a>
          <a href="html/grapes.html">grapes</a>

          This regex gets me part of the way:

          replace
          %^(?^)^(*$^)

          with
          <a href="#" name="#^1">^1</a>^p
          ^1^2

          but I don't understand how to make it conditional, that is, to add the new anchor line and paragraph break only IF the first letter of the current line differs from the first letter of the line before it. As it is, I unfortunately get:

          
          <a href="#" name="#A">A</a>
          <a href="html/apples.html">apples</a>
          
          <a href="#" name="#A">A</a>
          <a href="html/avacados.html">avacados</a>
          
          <a href="#" name="#B">B</a>
          <a href="html/bananas.html">bananas</a>
          
          <a href="#" name="#B">B</a>
          <a href="html/burritos.html">burritos</a>
          
          <a href="#" name="#C">C</a>
          <a href="html/cantalopes.html">cantalopes</a>
          
          <a href="#" name="#G">G</a>
          <a href="html/grapes.html">grapes</a>
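A regex replace looks at one line at a time and has no memory of the previous line, which is what makes the conditional part awkward. Outside the editor, the condition is just a comparison with the letter of the previous group. The Python sketch below illustrates that logic only and is not an UltraEdit solution. It assumes the lines are already sorted and put back into <a href="...">text</a> form, uses the anchor markup from this post with the closing quote after the name, and the function name add_letter_anchors is hypothetical.

```python
import re

# Visible text of a line in '<a href="...">text</a>' form.
LINK_TEXT = re.compile(r'^<a href=[^>]*>(.*?)</a>$')


def add_letter_anchors(sorted_lines):
    """Put a letter anchor in front of the first link of each initial letter.

    Assumes the lines are already sorted by link text, ignoring case."""
    out = []
    previous = None  # initial letter of the last group that got an anchor
    for line in sorted_lines:
        match = LINK_TEXT.match(line)
        letter = match.group(1)[:1].upper() if match else ""
        # The conditional part: act only when the letter differs from before.
        if letter and letter != previous:
            if out:
                out.append("")  # blank line between groups, not before the first
            out.append(f'<a href="#" name="#{letter}">{letter}</a>')
            previous = letter
        out.append(line)
    return out


fruit = [
    '<a href="html/apples.html">apples</a>',
    '<a href="html/avacados.html">avacados</a>',
    '<a href="html/bananas.html">bananas</a>',
    '<a href="html/burritos.html">burritos</a>',
    '<a href="html/cantalopes.html">cantalopes</a>',
    '<a href="html/grapes.html">grapes</a>',
]
print("\n".join(add_letter_anchors(fruit)))
```

For the six example lines it produces the grouped list the post asks for, with one anchor per letter instead of one per line.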

          Grand Master

            May 10, 2006 #5

            No problem. Add the following single (not Replace All) regex replaces at the end of the macro above. Each one inserts the appropriate internal link in front of the first occurrence of a link name starting with that letter (not case-sensitive). The solution is not very elegant, but it works.

            The macro now needs the macro property "Continue if a Find with Replace not found" checked, because some letters may not occur in the file. If your HTML file is a Unix file opened in Unix mode (not converted to DOS, either manually or automatically, and still saved as Unix), use ^n instead of ^p.

            Find RegExp "%^(*>[0-9]^)"
            Replace "^p<a href="#" name="#0">0-9</a>^p^1"
            Find RegExp "%^(*>A^)"
            Replace "^p<a href="#" name="#A">A</a>^p^1"
            Find RegExp "%^(*>B^)"
            Replace "^p<a href="#" name="#B">B</a>^p^1"
            :
            and so on
            :
            Find RegExp "%^(*>Y^)"
            Replace "^p<a href="#" name="#Y">Y</a>^p^1"
            Find RegExp "%^(*>Z^)"
            Replace "^p<a href="#" name="#Z">Z</a>^p^1"
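Each single Replace changes only the first match after the cursor, and because the file is already sorted, that first match is the first line of its letter; letters that do not occur are simply not found. The Python sketch below imitates that behaviour outside UltraEdit, for illustration only. The HEADINGS table and the function name insert_first_occurrence_anchors are hypothetical, each anchor is preceded by an empty line as the ^p...^p replacement produces, and line endings (^p for DOS, ^n for Unix) are left to whatever writes the list back to a file.

```python
import re
import string

# Character right after the first ">" on a line; for one-link lines such as
# '<a href="...">apples</a>' that is the first character of the link text.
AFTER_FIRST_GT = re.compile(r'^[^>]*>(.)')

# (anchor name, anchor label, characters that belong to it), in the order of
# the Find/Replace pairs above: "0-9" first, then A to Z.
HEADINGS = [("0", "0-9", string.digits)] + [
    (letter, letter, letter) for letter in string.ascii_uppercase
]


def insert_first_occurrence_anchors(sorted_lines):
    """Mimic one single (not Replace All) replace per heading on sorted lines."""
    anchors = {}  # index of first matching line -> anchor line for it
    for name, label, chars in HEADINGS:
        for index, line in enumerate(sorted_lines):
            match = AFTER_FIRST_GT.match(line)
            # Case-insensitive comparison, as the post says the match is.
            if match and match.group(1).upper() in chars:
                anchors[index] = f'<a href="#" name="#{name}">{label}</a>'
                break  # only the first occurrence is replaced
        # No match: the heading is skipped, like a Find that is not found.
    out = []
    for index, line in enumerate(sorted_lines):
        if index in anchors:
            out.extend(["", anchors[index]])  # empty line, then the anchor
        out.append(line)
    return out


sample = [
    '<a href="html/7up.html">7up</a>',
    '<a href="html/apples.html">apples</a>',
    '<a href="html/avacados.html">Avacados</a>',
    '<a href="html/grapes.html">grapes</a>',
]
print("\n".join(insert_first_occurrence_anchors(sample)))
```

Here the 7up line gets the 0-9 anchor, only apples (not Avacados) gets the A anchor, and grapes gets the G anchor.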