Reformatting a tab delimited text file

MikeW

    Jul 19, 2014 #1

    Really didn't know how to title this thread.

    The following is how I am going to receive a tab-delimited file. There's no hope of getting it properly formatted. This is all on one line. There are any number of names separated by semi-colons in one cell, followed by the corresponding addresses, also separated by semi-colons. In the sample below, there are 4 names and 4 addresses:

    GPO; Upper Dorset Street Post Office; Parnell Street Post Office; Summerhill Post Office <TAB> O'Connell Street Lower, Dublin 1; 58 Upper Dorset Street, Dublin 1; 97 Parnell Street, Dublin 1; Summerhill Parade, Dublin 1

    I marked the single tab in the line with <TAB>. There are a minimum of 200 lines like this, and up to 400 or so.

    I need to figure out a Find/Replace (F/R) or a script that will reformat this list like the following:

    GPO <TAB> O'Connell Street Lower, Dublin 1
    Upper Dorset Street Post Office <TAB> 58 Upper Dorset Street, Dublin 1
    Parnell Street Post Office <TAB> 97 Parnell Street, Dublin 1
    Summerhill Post Office <TAB> Summerhill Parade, Dublin 1

    In reality, there is no space before or after the tab, but I thought the spaces made it a little clearer to read.
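For anyone who would rather script this outside UltraEdit, here is a minimal Python sketch of the transformation described above. It uses a different technique from the regular-expression approach in this thread: it splits each line at its single tab, splits both halves on semi-colons, and pairs the pieces up in order. It assumes exactly one tab per line and the same number of names as addresses; `reformat_line` is a hypothetical name.

```python
def reformat_line(line: str) -> list[str]:
    """Hypothetical helper: 'n1; n2<TAB>a1; a2' -> ['n1<TAB>a1', 'n2<TAB>a2']."""
    # Assumes exactly one tab separating the names cell from the addresses cell.
    names_part, addresses_part = line.rstrip("\r\n").split("\t")
    # Split each cell on semi-colons and drop the spaces around each item.
    names = [n.strip() for n in names_part.split(";")]
    addresses = [a.strip() for a in addresses_part.split(";")]
    if len(names) != len(addresses):
        raise ValueError(f"{len(names)} names but {len(addresses)} addresses: {line!r}")
    # Pair the n-th name with the n-th address, joined by a tab.
    return [f"{name}\t{address}" for name, address in zip(names, addresses)]


sample = ("GPO; Upper Dorset Street Post Office; Parnell Street Post Office; "
          "Summerhill Post Office\tO'Connell Street Lower, Dublin 1; "
          "58 Upper Dorset Street, Dublin 1; 97 Parnell Street, Dublin 1; "
          "Summerhill Parade, Dublin 1")

for row in reformat_line(sample):
    print(row)
```

For a whole file, the same function would be applied to each input line in turn, writing out every returned row.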

    The closest I can come is a block of names followed by a block of addresses. That works, because once the data is back in Excel I can manually move them to their proper places, but I was hoping I wouldn't need to do it that way.

    Thank you for any pointers on how to proceed.

    Mike

    Mofi

      Jul 19, 2014 #2

      This is no problem using a small UltraEdit macro:

      InsertMode
      ColumnModeOff
      HexOff
      Top
      PerlReOn
      Find MatchCase RegExp " *([\t;]) *"
      Replace All "\1"
      Loop 0
      Find MatchCase RegExp ";([^\r\n\t;]+)(\t.*);(.+)$"
      Replace All "\2\r\n\1\t\3"
      IfNotFound
      ExitLoop
      EndIf
      EndLoop

      The first Perl regular expression Replace All just removes all spaces to the left or right of a semi-colon or tab character.
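If you want to check what this first replace does without UltraEdit, Python's `re` module accepts the same pattern. A minimal sketch; the names `SPACES_AROUND_SEPARATOR` and `remove_separator_spaces` are hypothetical:

```python
import re

# Same pattern as the macro's first Find: optional spaces, a tab or
# semi-colon (captured as group 1), then optional spaces.
SPACES_AROUND_SEPARATOR = re.compile(r" *([\t;]) *")


def remove_separator_spaces(text: str) -> str:
    # Replacing the whole match with group 1 keeps only the separator.
    return SPACES_AROUND_SEPARATOR.sub(r"\1", text)


line = ("GPO; Upper Dorset Street Post Office\tO'Connell Street Lower, Dublin 1; "
        "58 Upper Dorset Street, Dublin 1")
print(remove_separator_spaces(line))
```

Note that the space after the comma in "Street Lower, Dublin 1" is untouched, because only spaces next to a tab or semi-colon match.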

      More interesting is the second Perl regular expression Replace All, which is executed in a loop until nothing is replaced anymore, as it does the main job.

      ;([^\r\n\t;]+)(\t.*);(.+)$ finds
      • a semi-colon, which is removed by the replace;
      • a string not containing a carriage return, line feed, horizontal tab or semi-colon, captured for backreference \1 in the replace string;
      • a horizontal tab as the next character. Because of this, the preceding expression can never match a string between two semi-colons or between the last semi-colon and the end of the line; it always matches only the string between the last semi-colon before the horizontal tab and the tab itself;
      • the string from that horizontal tab up to the last semi-colon in the line, captured together with the tab for backreference \2;
      • the last semi-colon in the line, which is removed by the replace;
      • and finally the string after the last semi-colon up to the end of the line, captured for backreference \3.
      With the replace string \2\r\n\1\t\3, the matching pair of strings is moved to a new line below, with a tab character inserted between them.
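To see what the three capture groups hold for the example line, the pattern can be tried in Python's `re` module, which understands this syntax too. This is a sketch: `MOVE_LAST_PAIR` is a hypothetical name, `re.MULTILINE` is used so that `$` matches at the end of every line, and `\n` stands in for the macro's `\r\n`.

```python
import re

# The macro's second pattern. "." never matches "\n", so each match
# stays within a single line.
MOVE_LAST_PAIR = re.compile(r";([^\r\n\t;]+)(\t.*);(.+)$", re.MULTILINE)

# The example line after the first replace removed the spaces.
line = ("GPO;Upper Dorset Street Post Office;Parnell Street Post Office;"
        "Summerhill Post Office\tO'Connell Street Lower, Dublin 1;"
        "58 Upper Dorset Street, Dublin 1;97 Parnell Street, Dublin 1;"
        "Summerhill Parade, Dublin 1")

m = MOVE_LAST_PAIR.search(line)
print(repr(m.group(1)))  # last name before the tab
print(repr(m.group(2)))  # the tab plus all addresses except the last one
print(repr(m.group(3)))  # last address

# One Replace All pass with the macro's replace string.
print(MOVE_LAST_PAIR.sub(r"\2\n\1\t\3", line))
```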

      So the macro first converts your example line to:

      GPO;Upper Dorset Street Post Office;Parnell Street Post Office;Summerhill Post Office<TAB>O'Connell Street Lower, Dublin 1;58 Upper Dorset Street, Dublin 1;97 Parnell Street, Dublin 1;Summerhill Parade, Dublin 1

      After the first run of the second regular expression, the file contains:

      GPO;Upper Dorset Street Post Office;Parnell Street Post Office<TAB>O'Connell Street Lower, Dublin 1;58 Upper Dorset Street, Dublin 1;97 Parnell Street, Dublin 1
      Summerhill Post Office<TAB>Summerhill Parade, Dublin 1

      After the second run of the second regular expression, the file contains:

      GPO;Upper Dorset Street Post Office<TAB>O'Connell Street Lower, Dublin 1;58 Upper Dorset Street, Dublin 1
      Parnell Street Post Office<TAB>97 Parnell Street, Dublin 1
      Summerhill Post Office<TAB>Summerhill Parade, Dublin 1

      And after the third run, the file contains:

      GPO<TAB>O'Connell Street Lower, Dublin 1
      Upper Dorset Street Post Office<TAB>58 Upper Dorset Street, Dublin 1
      Parnell Street Post Office<TAB>97 Parnell Street, Dublin 1
      Summerhill Post Office<TAB>Summerhill Parade, Dublin 1

      The fourth run of the second regular expression Replace All does not replace anything, and the loop exits.
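The whole macro, both Replace All commands plus the loop that stops once nothing is replaced, can be mimicked in Python as a sketch. `reformat_text` is a hypothetical name; `re.subn` returns the number of replacements made, which plays the role of `IfNotFound`. The sketch assumes `\n` line endings, whereas the macro inserts `\r\n`.

```python
import re

SPACES_AROUND_SEPARATOR = re.compile(r" *([\t;]) *")
MOVE_LAST_PAIR = re.compile(r";([^\r\n\t;]+)(\t.*);(.+)$", re.MULTILINE)


def reformat_text(text: str) -> str:
    # First Replace All: drop spaces next to tabs and semi-colons.
    text = SPACES_AROUND_SEPARATOR.sub(r"\1", text)
    # Loop 0 ... IfNotFound ExitLoop: repeat the second Replace All
    # until it makes no replacement.
    while True:
        text, replaced = MOVE_LAST_PAIR.subn(r"\2\n\1\t\3", text)
        if replaced == 0:
            return text


sample = ("GPO; Upper Dorset Street Post Office; Parnell Street Post Office; "
          "Summerhill Post Office\tO'Connell Street Lower, Dublin 1; "
          "58 Upper Dorset Street, Dublin 1; 97 Parnell Street, Dublin 1; "
          "Summerhill Parade, Dublin 1")
print(reformat_text(sample))
```

Reading a file in Python's default text mode already converts `\r\n` to `\n`, so the file contents can be passed to `reformat_text` as read. Every input line is handled in the same pass, just like Replace All in the editor.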

      Of course, you can also run the two replaces manually: run the first Replace All once, then press the Replace All button for the second regular expression repeatedly until a message says that nothing could be found.
      Best regards from an UC/UE/UES for Windows user from Austria

      MikeW

        Jul 19, 2014 #3

        Works a treat. Thank you very much, Mofi.

        Mike