Using the Perl regular expression search string
(?<!")\r\n|(?<!\r)\n|\r(?!\n) removes:
- a DOS line ending (carriage return + line-feed) when there is no double quote character preceding,
- a UNIX line ending (only line-feed without a preceding carriage return),
- a MAC line ending (only carriage return without a following line-feed).
This regular expression can be used for most CSV files with DOS line endings.
Please note that this regular expression does not remove DOS line breaks in double quoted CSV fields where a double quote character exists at end of a line inside the field like in following example:
Code: Select all
row 1 column 1,"row 1 column 2 with a ""double quoted string""
and a line break",row 1 column 3,"row 1 column 4 containing also a ""double quoted string"" but no line break"
row 2 column 1,row 2 column 2,row 2 column 3,row 2 column 4
Please read also
WikiPedia - Comma-separated values very carefully to understand the CSV file format.
Normally when a line break in a field in Excel is entered with ALT+RETURN and the Excel file is saved as CSV file, Excel will double quote this field and use a line-feed without preceding carriage return character as line break, whereas the row itself is terminated with carriage return + line-feed. Example:
"row 1 column 1 text in line 1
LFtext in line 2
LFtext in line 3",row 1 column 2 text,row 1 column 3 value
CRLF
"row 2 column 1 text,"row 2 column 2 text in line 1
LFtext in line 2",row 2 column 3 value
CRLF
Using the second expression in the Perl OR expression is therefore normally enough to delete all line breaks in fields when the CSV file was opened with the configuration settings
Never prompt to convert files to DOS format and
Only recognize DOS terminated lines (CR/LF) as new lines for editing enabled at
Advanced - Configuration - File Handling - DOS/UNIX/MAC Handling. The line-feeds in the CSV file without a preceding carriage return are displayed with this configuration setting as rectangles.
Many thanks to pietzcker who posted all 3 expressions which I have just combined here into 1 OR expression.