[html] convert all links to lowercase

Basic User

    Jul 20, 2006#1

    Hi
    I'm the new webmaster of a website (not built by me) and I have to fix some broken links. They are broken because they contain uppercase characters and the server (Linux) is case-sensitive.

    Example of dead link:

    <a href ="../TEST/file.htm" class="super" target="_blank">
    I want to find them and correct them manually.
    My regular expression is:

    <a\shref\s=\s"[a-z0-9]*[A-Z]+"
    but it is not working. Could you please help me correct it?

    Regards
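For the find-only use case in the question, a pattern has to allow for three things the original does not: optional whitespace around `=` (the original `\s=\s` demands exactly one whitespace character on each side), characters such as `.` and `/` in the path, and uppercase letters anywhere in the value rather than only right before the closing quote. A minimal sketch in Python follows. Python is used here only to make the pattern testable; the names `UPPER_HREF` and `has_uppercase_href` are illustrative, and the pattern would still need to be checked in UltraEdit's Perl regex mode with case-sensitive matching enabled.

```python
import re

# Hypothetical find-only pattern: <a href="..."> whose double-quoted value
# contains at least one uppercase ASCII letter. [^"]* keeps the match inside
# the quoted value.
UPPER_HREF = re.compile(r'<a\s+href\s*=\s*"[^"]*[A-Z][^"]*"')

def has_uppercase_href(line: str) -> bool:
    """Return True if the line contains an <a href="..."> with uppercase in it."""
    return UPPER_HREF.search(line) is not None

samples = [
    '<a href ="../TEST/file.htm" class="super" target="_blank">',  # from the question
    '<a href="../test/file.htm" class="super" target="_blank">',   # already lowercase
]

for line in samples:
    print(has_uppercase_href(line), line)
```

The search has to be case-sensitive, otherwise `[A-Z]` also matches lowercase letters and every link is reported.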

    Newbie

      Jul 21, 2006#2

      First, make a backup using tar or whatever.

      Since you're using Linux, you should have Perl available. Below is a shell command which will:
      1.) find all plain files with the extension .html or .htm in and below the directory /home/siteroot,
      2.) escape any non-word characters, such as spaces, in the pathname,
      3.) filter out those files which do not contain 'href',
      4.) do an in-place, one-liner Perl search-and-replace on <a href... tags and <link ...href... tags, lowercasing only what lies between href= (with an optional quote) and the first occurrence of '>', '?', or another quote.

      Note: a backup file, with the extension .bak, is created for each file

      find /home/siteroot -type f \( -name '*.html' -o -name '*.htm' \) | perl -lne "print quotemeta" | xargs grep -li href | xargs perl -i'.bak' -pe 's/(<(?:a|link)(?:(?!href).)+href\s*=\s*[\x27"]?)([^>?\x27"]+)/$1\L$2/igs'

      <eom>
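For readers without Perl at hand, the four steps above can be sketched in Python. This is an approximation under stated assumptions, not a drop-in equivalent. The function names `lowercase_hrefs` and `fix_tree` are made up for this example. Files are handled as bytes so that only ASCII letters are lowercased and line endings are preserved. Unlike `perl -i'.bak'`, a `.bak` copy is written only for files that actually change.

```python
import re
import shutil
from pathlib import Path

# Port of the Perl pattern. Group 1: the tag up to and including href=
# (plus an optional quote). Group 2: the value up to '>', '?' or a quote.
HREF = re.compile(
    rb'(<(?:a|link)(?:(?!href).)+href\s*=\s*[\x27"]?)([^>?\x27"]+)',
    re.IGNORECASE | re.DOTALL,
)

def lowercase_hrefs(data: bytes) -> bytes:
    """Lowercase href values line by line, as perl -p would."""
    def fix(m):
        return m.group(1) + m.group(2).lower()
    return b"".join(HREF.sub(fix, line) for line in data.splitlines(keepends=True))

def fix_tree(root: str) -> None:
    """Rewrite every .htm/.html file under root that contains 'href'."""
    for path in Path(root).rglob("*"):
        if not (path.is_file() and path.suffix in (".htm", ".html")):
            continue
        data = path.read_bytes()
        if b"href" not in data.lower():  # same filter as grep -li href
            continue
        fixed = lowercase_hrefs(data)
        if fixed != data:
            # Assumption: back up only files that change (Perl backs up all).
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_bytes(fixed)
```

A call such as `fix_tree("/home/siteroot")` would process the same directory as the shell command. As with the Perl version, anything after a `?` in the link is left untouched, so query strings keep their case.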

      Basic User

        Jul 24, 2006#3

        Thanks for your answer, but I use Linux only on the server. I want a regular expression for UltraEdit, not sed, please.

        Grand Master

          Jul 24, 2006#4

          It seems the Perl users are all on holiday, so I, an UltraEdit-style regex user, was forced to work out the Perl regex myself.

          The following Perl regular expression replace makes the value of every <a href="..." lowercase:

          Find: <a[ \t\r\n]*href[ \t\r\n]*=[ \t\r\n]*"([^"]*)"
          Replace: <a href="\L\1\E"

          You should be able to adapt it for <img src="..." and <link rel="..." href="..." on your own.

          Test input:

          <a href ="../TEST/file.htm" class="super" target="_blank">
          
          <a href= "../TEST/file.htm" class="super" target="_blank">
          
          <a href="../TEST/file.htm" class="super" target="_blank">
          
          <a href = "../TEST/FILE.HTM" class="super" target="_blank">
          
          <a
            href ="../TEST/file.htm" class="super" target="_blank">
          
          <a href =
          "../TEST/file.htm" class="super" target="_blank">
          Output of the replace:

          <a href="../test/file.htm" class="super" target="_blank">
          
          <a href="../test/file.htm" class="super" target="_blank">
          
          <a href="../test/file.htm" class="super" target="_blank">
          
          <a href="../test/file.htm" class="super" target="_blank">
          
          <a href="../test/file.htm" class="super" target="_blank">
          
          <a href="../test/file.htm" class="super" target="_blank">
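The replace above can be reproduced outside the editor to check it against the test input. The sketch below is in Python, whose `re` module has no `\L...\E` in replacement strings, so a replacement function lowercases the captured group instead. The name `normalize_a_href` is illustrative. The value is matched with `[^"]*`, which stops at the first closing quote. That is a deliberate choice here, since a greedy `.*` can run on to the last quote on the line and lowercase other attributes as well.

```python
import re

# <a, optional whitespace, href, optional whitespace around =, quoted value.
A_HREF = re.compile(r'<a[ \t\r\n]*href[ \t\r\n]*=[ \t\r\n]*"([^"]*)"')

def normalize_a_href(html: str) -> str:
    """Emulate the replacement <a href="\\L\\1\\E" with a function."""
    return A_HREF.sub(lambda m: '<a href="' + m.group(1).lower() + '"', html)

# Four of the variants from the test input, including the multi-line ones.
test_input = '''<a href ="../TEST/file.htm" class="super" target="_blank">
<a href = "../TEST/FILE.HTM" class="super" target="_blank">
<a
  href ="../TEST/file.htm" class="super" target="_blank">
<a href =
"../TEST/file.htm" class="super" target="_blank">'''

print(normalize_a_href(test_input))
```

Each of the four tags comes out as `<a href="../test/file.htm" class="super" target="_blank">`, with the whitespace and line breaks around `href` and `=` collapsed.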
          But maybe the following would be better:

          Find: [ \t\r\n]*href[ \t\r\n]*=[ \t\r\n]*"([^"]*)"
          Replace: <space>href="\L\1\E" (where <space> stands for a single space character)

          Why? It also works for href="..." in <link> and for <a name="..." href="...">.
          Best regards from an UC/UE/UES for Windows user from Austria
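The attribute-only variant can be sketched the same way. This is an illustration in Python, not UltraEdit syntax. The name `lowercase_any_href` and the sample file names are made up. One deliberate deviation is that the pattern requires at least one whitespace character before `href` (`+` instead of `*`), an assumption made here so that an attribute whose name merely ends in `href` is not split by the inserted space.

```python
import re

# Any href="..." attribute preceded by whitespace, regardless of the tag.
ANY_HREF = re.compile(r'[ \t\r\n]+href[ \t\r\n]*=[ \t\r\n]*"([^"]*)"')

def lowercase_any_href(html: str) -> str:
    """Lowercase the value of every href attribute, leaving other attributes alone."""
    return ANY_HREF.sub(lambda m: ' href="' + m.group(1).lower() + '"', html)

print(lowercase_any_href('<link rel="stylesheet" href="../CSS/Main.CSS">'))
print(lowercase_any_href('<a name="Top" href="INDEX.HTM">'))
```

In the second sample the `name` value keeps its case because only the captured `href` value is lowercased.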