CSV file display differently in UE vs. MS Excel?

CSV file display differently in UE vs. MS Excel?

3
NewbieNewbie
3

    May 20, 2008#1

    Hello everyone:

    Is there a reason why a CSV file shows up differently in Excel and UE - the only possible catch I can think of is that there are some "paragraph" things in the second column. In UE, all these paragraph signs (not sure which one it is) will show up as a new line... I am always very confused about these things (in Perl regex): \p \n \r maybe this is the difference?

    Is there something I can do so that I can get UE to display the file in a similar way (or maybe just replace those "paragraph" things in those cells with a space, so that it'll just appear as one line?

    I have been struggling with this for a long time... and I'll be really grateful for any suggestion you can give me.

    Thank you!

    Richard

    119
    Power UserPower User
    119

      May 20, 2008#2

      richardl wrote:Is there a reason why a CSV file shows up differently in Excel and UE
      Yes. UE shows you the raw text. Excel parses the CSV and attempts to do an "intelligent" interpretation of the data. (It's easy to screw up your file in Excel, even if you just open and re-save it. Fields like "007" will be changed to "7". Empty trailing fields will be deleted. It's all very irritating.)

      The reason that Excel treats the embedded newlines as being part of a single cell is because the entire field is quoted. The reason UE doesn't is because it's a text editor. There's no way to make UE display the file in a similar manner to Excel without parsing it and rendering it as a plain-text table. At that point it wouldn't be a valid CSV file anymore.

      6,604548
      Grand MasterGrand Master
      6,604548

        May 20, 2008#3

        You should read the Wikipedia article about Comma-separated values. Your specific problem is that you have cells with line breaks. It is allowed in CSV files to embed a line break. In such cases a table row spreads over multiple lines in the CSV file. But text editors interpret a line break always as a line break. Text editors don't know that when a line break exists within "..." that it is a line break of the text inside a cell and not the line break after last cell of a row and this line break should be ignored when displaying the CSV file.

        However, you can even see CSV files with embedded line breaks in a text cell correct, if you know some more details. MS Excel saves spreadsheets as CSV always with 0D 0A (carriage return + line feed) as terminator of a table row (= line) which is also called DOS line ending. But MS Excel saves line breaks inside a cell with 0A (line feed) only which is normally used for Unix files. So there is a difference between end of a table row and end of a line of a cell text. Normally such files are interpreted as mixed DOS/Unix files and most programs like UltraEdit or a browser when downloading a CSV file automatically corrects the 0A without the 0D by inserting 0D so that the whole file contains only DOS line endings. Such automatic corrections are also the reason why sample text files should be always packed with ZIP, RAR, etc. and the archive should be uploaded for the others and not the plain text file.

        Okay, how does this difference in saving of "line endings vs. line break" may help you? UltraEdit has at Advanced - Configuration - File Handling - DOS/UNIX/MAC Handling the setting Only recognize DOS terminated lines (CR/LF) as new lines for editing. If you enable this setting and have additionally selected above Never prompt to convert files to DOS format you will see now the first table row also as a single line in UltraEdit. You will notice that this single 0A is displayed in the text with a rectangle. You can now use a regular expression replace to convert those line feeds without a preceding carriage returns to a normal text. I use for example following:

        Search: ^([~^r]^)^n
        Replace: ^1\n

        This regular expression in UltraEdit syntax replaces the single line feed (hex code 0A) by the text \n. When I now see a \n in the text I know here is a line break in the table cell.
        Best regards from an UC/UE/UES for Windows user from Austria

        3
        NewbieNewbie
        3

          May 20, 2008#4

          Thank you mjcarman and mofi! I'll give it a try. I never quite understood what the difference is between HEX/DOC/MAC but should probably find out more about it. Many thanks!

          Richard
          Running UE15.00 on Windows 7.