How to find line breaks/newlines in fields of a CSV file and remove it?

How to find line breaks/newlines in fields of a CSV file and remove it?

9
NewbieNewbie
9

    Dec 05, 2008#1

    I have a situation wherein my delimiters on a file are quote marks ("). The end of every line should be a " and then a hard return. However, some lines have inserted hard returns that should not be present. I need to move those lines up to the end of the previous line. I have posted example attachments (deleted later) of what I have and what I need to get.

    In the Example 2 Search - look at end of lines 5 and 11 for where the error occurs.

    My initial thought was to find any line that does not end with a " mark which I can do with [~"]^p but then I'm stuck on how to move next line up. Any ideas or assistance would be appreciated!

    Cheers,
    Lori

    236
    MasterMaster
    236

      Dec 05, 2008#2

      UE 14.20.1, Perl regular expression

      Search for (?<!")\r\n and replace all with nothing or (probably better) a space character.

      This will find a line break that is not preceded (negative lookbehind) by a " character.

      9
      NewbieNewbie
      9

        Dec 05, 2008#3

        Fantastic! Worked like a charm. I wasn't going Perl, which may have been my problem. Thanks for the quick and easy solution. Much appreciated.

        Cheers,
        Lori

        236
        MasterMaster
        236

          Dec 05, 2008#4

          Oh, I'm sure you could have done it with a Unix/UE style regex, too (using capturing groups), but I'm not familiar with those regex styles. Plus, it's a good idea to learn Perl regexes since they are the most powerful and concise of the bunch :)

          6,681583
          Grand MasterGrand Master
          6,681583

            Dec 05, 2008#5

            With the UltraEdit regular expression engine:

            Find What: ^([~"^p]^)^p
            Replace With: ^1space

            With the Unix or Perl compatible regular expression engine:

            Find What: ([^"\r\n])\r\n
            Replace With: \1space

            Run the replace all in a loop until the search string was not found anymore because for something like

            "field1","field2 start

            continue field2
            end field2"


            only 1 run is not enough.
            Best regards from an UC/UE/UES for Windows user from Austria

            9
            NewbieNewbie
            9

              Dec 05, 2008#6

              Thanks for that. Looks like I was on the right track with the UE, just missing a couple of parameters!

              Good weekend to you both.

              6,681583
              Grand MasterGrand Master
              6,681583

                Jul 22, 2009#7

                Using the Perl regular expression search string (?<!")\r\n|(?<!\r)\n|\r(?!\n) removes:
                • a DOS line ending (carriage return + line-feed) when there is no double quote character preceding,
                • a UNIX line ending (only line-feed without a preceding carriage return),
                • a MAC line ending (only carriage return without a following line-feed).
                This regular expression can be used for most CSV files with DOS line endings.

                Please note that this regular expression does not remove DOS line breaks in double quoted CSV fields where a double quote character exists at end of a line inside the field like in following example:

                Code: Select all

                row 1 column 1,"row 1 column 2 with a ""double quoted string""
                and a line break",row 1 column 3,"row 1 column 4 containing also a ""double quoted string"" but no line break"
                row 2 column 1,row 2 column 2,row 2 column 3,row 2 column 4
                Please read also WikiPedia - Comma-separated values very carefully to understand the CSV file format.

                Normally when a line break in a field in Excel is entered with ALT+RETURN and the Excel file is saved as CSV file, Excel will double quote this field and use a line-feed without preceding carriage return character as line break, whereas the row itself is terminated with carriage return + line-feed. Example:

                "row 1 column 1 text in line 1LFtext in line 2LFtext in line 3",row 1 column 2 text,row 1 column 3 valueCRLF
                "row 2 column 1 text,"row 2 column 2 text in line 1LFtext in line 2",row 2 column 3 valueCRLF

                Using the second expression in the Perl OR expression is therefore normally enough to delete all line breaks in fields when the CSV file was opened with the configuration settings Never prompt to convert files to DOS format and Only recognize DOS terminated lines (CR/LF) as new lines for editing enabled at Advanced - Configuration - File Handling - DOS/UNIX/MAC Handling. The line-feeds in the CSV file without a preceding carriage return are displayed with this configuration setting as rectangles.

                Many thanks to pietzcker who posted all 3 expressions which I have just combined here into 1 OR expression.
                Best regards from an UC/UE/UES for Windows user from Austria

                262
                MasterMaster
                262

                  Jul 29, 2009#8

                  Maybe such a problem should ideally be solved from inside Excel ?

                  I went through these steps to remove carriage return (CR, char(13)) and line feeds (LF, char(10)) in column A of an Excel file, replacing them with blanks.

                  Using MS Excel 2002 SP3 (on WinXP) I entered this formula in cell B1:

                  =SUBSTITUTE(SUBSTITUTE(A1;CHAR(13);" ");CHAR(10);" ")

                  It is a nested SUBSTITUTE. The innermost replaces CR with blank and the outermost replaces LF with blanks.

                  Next repeat this formula in the entire B column ("Edit/Fill/Down" - ctrl-D)

                  Then select the entire B column and select copy. Then use "paste special" and tick paste as "values".

                  You can finish with deleting the A column entirely and then promoting B column to A column.

                  You now have your text all in column A but without CR and LF as these are replaced by spaces. (Note: If you do not want "double spaces" as a result of CRLF pairs, then change the SUBSTITUTE for CHAR(13) which is " " (one space) to "" (empty string).

                  (You might even be able to make this a Excel VB macro (working directly on column A) and assign a hot key for easy reuse - but I am not skilled in VB - but try googling a solution).

                  2

                    How to find wrong placed line-feeds in a CSV file and remove them?

                    Apr 27, 2017#9

                    New to UltraEdit. I tried using ^p in the find dialog to do the multiline search but that failed to find the offending rows.

                    The problem is one of the fields in a source database has some sort of character that is being interpreted as an "end of record". It always appears in the middle of the word (case as shown) AlGOrithm between the O and the r, so the truncated record ends with O and the next record begins with r. What I want is for the "end of record" to go away.

                    Is there a way to make find/replace work? macro?

                    Edit: Possibly making the logic easier: In the *.csv file all values are text and surrounded by double quotes. That means the simplest logic would be "find all rows that do not start with double quote and join with previous row". Hope this helps.

                    Thanks

                    19476
                    MasterMaster
                    19476

                      Re: How to find wrong placed line-feeds in a CSV file and remove them?

                      Apr 28, 2017#10

                      Hi David,

                      you can use Perl regex Replace All:

                      Find what:\r?\n(?!")(?=.)
                      Replace with:<leave it empty>

                      BR, Fleggy

                      2

                        Re: How to find wrong placed line-feeds in a CSV file and remove them?

                        Apr 28, 2017#11

                        Saved my hide. Thank you VERY much.