Change to uppercase only in 3rd column in a pipe delimited file

Change to uppercase only in 3rd column in a pipe delimited file

11
Basic UserBasic User
11

    May 02, 2010#1

    Hi everyone,

    Looking for some help with the following.

    I have a file of 30 columns separated by pipes "|" and I want to have a macro to convert to upper case only first
    letter of every word but only within 3rd column. The sample file only of 4 columns would be:

    Code: Select all

    PRODUCT No.|DESCRIPTION|SCIENCE BOOKS|Some other data|
    Product 1|Blah blah blah|PHILOSOPHIAE NATURALIS PRINCIPIA MATHEMATICA (1687)|Some other data|
    Product 2|blah blah blah|Dialogue concerning the two chief world systems (1632)|Some other data|
    Product 3|blah blah blah|De Revolutionibus Orbium Coelestium (1543)|Some other data|
    Product 4|blah blah blah|the voyage of the beagle (1845)|Some other data|
    The desired output is:
    PRODUCT No.|DESCRIPTION|SCIENCE BOOKS|SOME OTHER DATA|
    Product 1|Blah blah blah|Philosophiae Naturalis Principia Mathematica (1687)|Some other data|
    PRODUCT 2|Blah blah blah|Dialogue Concerning The Two Chief World Systems (1632)|Some other data|
    PRODUCT 3|Blah blah blah|De Revolutionibus Orbium Coelestium (1543)|Some other data|
    PRODUCT 4|Blah blah blah|The Voyage Of The Beagle (1845)|Some other data|

    My idea is:
    1-) Match the text between "|" in column 3;
    2-) Convert to lower case all text matched in column 3; (Once text is matched I would apply "ToLower" over the regexp)
    2-) Convert to lower case only first letter of every word within 3rd column. (Once text is matched I would apply "ToUpper" over the regexp)

    I now the regexp \b[A-Za-z] matches first letter of every word, but I don't know how to match only the text in 3rd column.

    I think this coud be in script, but I would like to do it through a macro to learn how using the correct Regexp, because I've been trying
    to build a good regexp, but I can get it.

    I'm trying in Perl style regexp with "\|([^|]*)\|" but this look that matches every 2 columns and not only third one.

    Any help would be very appreciated.

    Regards.

    236
    MasterMaster
    236

      May 02, 2010#2

      Well, perhaps there is another way to achieve what you want:

      You could use the command "Column - Convert to Fixed Column" to convert your pipe delimited file to equal column size, then activate Column mode, select the third column, and press F5 (Format/To Caps). Then convert the file back to pipe delimited (Column - Convert to Character Delimited).

      A single regex that matches the third column only is not possible in UltraEdit because the Perl regex engine doesn't support variable length lookbehind assertions.

      Therefore, if you need a macro solution, you'd have to step through each line like this:

      Code: Select all

      InsertMode
      ColumnModeOff
      HexOff
      Top
      Key DOWN ARROW
      Loop 0
      PerlReOn
      Find RegExp "\|[^|]*"
      Key RIGHT ARROW
      Find RegExp "[^|]*"
      ToCaps
      Key END
      IfEof
      ExitLoop
      EndIf
      Key RIGHT ARROW
      EndLoop

      11
      Basic UserBasic User
      11

        May 02, 2010#3

        Hi pietzcker,

        Thanks for your answer.

        I've tested yor macro and run fine in the sample file, but when I use with a file of 30 columns and more than 500 lines, only changes the
        first line below the headers.

        I followed the method you suggest (using F5 Format/To Caps) and works like you said, but this method isn't recorded with Ultraedit macro recorder. I using UE version 15.10.0.1017.

        Thanks in advance for any other advice.

        Best regards.

        6,603548
        Grand MasterGrand Master
        6,603548

          May 03, 2010#4

          Although macro from pietzcker worked fine on your example with UltraEdit v16.00.0.1040 with configuration setting Allow positioning beyond line end not enabled, I suggest a different macro which is much faster.

          This macro moves the third data column to start of every line and additionally inserts lots of spaces after the text of third data column. That is the only weak point of this macro. You must know how many spaces are at least required to make the shortest text in third data column longer than the longest text in the third data column. The macro as posted here inserts 80 spaces. You can double the amount of spaces or use 500 spaces. The exact number of spaces is not important. Important is only that the number of spaces are enough to make the shortest text of third data column longer than longest text in third data column.

          Next the cursor is moved to last line and positioned there on end of the inserted block with spaces. A selection in column editing mode up to start of second line selects now the data of originally third data column (and all the inserted spaces).

          On this rectangular selection the command to capitalize the words is executed.

          A final regular expression replace moves the reformatted third data column back to correct position in the CSV file and deletes the inserted spaces.

          Code: Select all

          InsertMode
          ColumnModeOff
          HexOff
          PerlReOn
          GotoLine 2 1
          Find RegExp "^(.*?\|.*?\|)(.*?\|)"
          Replace All "\2#-#                                                                                #'#\1"
          Bottom
          IfColNum 1
          Key UP ARROW
          Else
          Key HOME
          EndIf
          Find "#'#"
          ColumnModeOn
          GotoLineSelect 2 1
          ToCaps
          EndSelect
          ColumnModeOff
          GotoLine 2 1
          Find RegExp "^(.*?\|)#-# +#'#(.*?\|.*?\|)"
          Replace All "\2\1"
          BTW: Your CSV file has 30 data columns. That means that each data row has lots of characters. Before loading the CSV file you should increase the value of configuration setting Maximum columns before line wraps at Advanced - Configuration - Editor Display - Miscellaneous from default value 4096 to maximum value of 20000 to avoid soft wrapping because of too long lines. If some of the data rows have more than 20000 characters, we have a problem and we would need a more complicated solution.
          Best regards from an UC/UE/UES for Windows user from Austria

          11
          Basic UserBasic User
          11

            May 03, 2010#5

            Mofi,

            It works great. I've tested with the bigger file I have to apply this routine. It is a file with 40 columns and more than 10,000 lines (13 MB) and the work was done in 5 second!

            Many thanks for your always great support, thanks both for the help.

            Best regards