Tapatalk

UC - Two CSV files comparison - phantom data columns issue

UC - Two CSV files comparison - phantom data columns issue

5
NewbieNewbie
5

PostDec 04, 2025#1

I am not new to UltraCompare but I cannot recall an issue with phantom columns in two CSV files compared in table mode.
When looking at the two files in Microsoft Excel they both have the exact same number of columns A to Z with the column headings matching in both files.
I have researched and found that non-printable characters can cause the issue, but I cannot find an easy method to find the non-printable characters. I know UltraEdit in hex mode shows all characters but that is not practical when the spreadsheet has 55000 rows of data. Or at least I have not stumbled onto a nifty way to search for non-printable characters in hex mode.
I have about twenty-three sets of files to compare in table mode and all of them contain phantom columns in various locations.
In simple cases where there is just one blank column, setting UltraCompare to ignore a column will align both file columns but of course this causes the highlighting to not happen.
I tried searching “non-printable characters,” “invisible columns,” “column alignment issues,” etc. and found nothing here.
If someone can refer me to a prior post or give a suggestion that deals with this issue I would appreciate it.

6,823625
Grand MasterGrand Master
6,823625

PostDec 05, 2025#2

There could be used in UltraEdit a Perl regular expression Find in Files with the search expression [\x00-\x08\x0B\x0C\x0E-\x1F]+ to find one or more control characters with the exception of horizontal tab (hexadecimal code 09), line feed (hexadecimal code 0A) and carriage return (hexadecimal code 0D).

Are the CSV files 100% valid according to the CSV specification as described by the Wikipedia article Comma-separated values?

This historical version of the Wikipedia article has a chapter Basic rules explaining the CSV specification on examples.

5
NewbieNewbie
5

PostDec 05, 2025#3

Thanks for responding so fast Mofi.

These files are produced by a web server converting XML to CSV using XSLT with output defined as Text UTF-8. Both files have header data in columns A to K.

Drop File Left.csv on the left of UltraCompare and File Right.csv on the ride side. You will see what I am calling a phantom column E on the left for File Left.csv and phantom columns G and H on File Right.csv using Table Mode.

ZIP file created with 7-Zip attached (removed by Mofi).

Thank you again.

PostDec 05, 2025#4

I searched all 46 CSV files I am comparing for Regular expressions / Perl [\x00-\x08\x0B\x0C\x0E-\x1F]+ and UltraEdit found nothing.
I read the WIKI reference you supplied.
Not all files but a few do have the double quotes around field values.
The sample files I provided do not contain double quotes, but almost all files in which the field values are enclosed in double quotes still generate phantom columns.

6,823625
Grand MasterGrand Master
6,823625

PostDec 05, 2025#5

Thanks for the ZIP archive with the two CSV files. I can see the same as you with those two files using table comparison mode of UltraCompare 2024.1.0.5. I deleted the ZIP archive from your post to keep the data private.

I suggest to click in UltraCompare on ribbon tab Home in second group Current Session on first item Set compare options. In toolbar/menu mode click in menu Session on the menu item Session properties… In the opened Current Session Properties dialog check for both CSV files the option Use row 1 for column headers and check the option Ignore separator in quotes for correct interpreting line 237 in both files with "Finance, Planning & Analysis","50000057 Finance, Planning & Analysis". Then click on button OK and Run.

I have not yet found out in ten minutes why does the data column determination without checked Use row 1 for column headers resulting in analyzing the entire file produce the phantom data columns. I will look for the cause on weekend and report here the result of my further investigation of the issue.

The two CSV files look absolutely valid and do not contain control characters.

5
NewbieNewbie
5

PostDec 05, 2025#6

Mofi, your suggestion worked.

I was under the gun to do a presentation this morning, and your advice really helped in preventing me from explaining to a non-technical audience why columns are not aligning. Sincere thanks.

I did find a sort of workaround for the issue.
Initially, I wanted to compare raw files with no modifications but some XSLT code is crummy in that the output contained thousands of rows of data with nothing but fifty commas after the last row with good data, causing UltraCompare to strain in the compare session. I discovered that Excel will automatically clean up CSV non-printable characters upon opening a CSV file. To start I was just opening the files in Excel to simply look at them and then closed without saving to preserve the raw file.

This morning I started saving the files as MS-DOS CSV, and this took care of a lot of my phantom column issues. So, the Excel workaround and your solution is getting me 80% to 90% there.
However, using your method plus the Excel workaround is not resolving all the phantom columns so I intend to push on until I resolve 100%. ( I have over one hundred identical processes running on two separate servers where I need to compare the output and explain the differences.)

I will report back on how it turns out.

Thanks again Mofi.

6,823625
Grand MasterGrand Master
6,823625

PostDec 07, 2025#7

I analyzed the table comparison with the provided two CSV files modified and saved by me in several variants all with DOS/Windows instead of Unix line endings. I could not find out the reason for the empty phantom data columns. I created therefore an issue report as it can be read below and sent it by email to support of UltraEdit, Inc. The RAR archive file is not attached here to keep the data in the CSV files private.

Subject: UC - Empty phantom data columns on table comparing two CSV files

There is an issue with empty phantom data columns on comparing two CSV files with table comparison depending on the data with 64-bit UltraCompare Prof. 2024.1.0.5 as detected first by an UltraCompare user asking for help in the UltraCompare Text Compare and Merge forum, see (url of this topic).

The (not here) attached RAR archive file CSV_comparison.rar contains several CSV files and screenshots for the explanation of this issue which I analyzed without finding out the reason.

There is executed a table comparison with a checked option Ignore separators in quotes in Current Session Properties dialog window as captured by me to the attached image CSV_comparison_properties.webp. This option must be checked on comparing the two CSV files File_Left_1.csv and File_Right_1.csv because of line 237 containing two additional commas in double quoted values in the data columns A and B. The line 237 has therefore in total 12 commas while all other lines have 10 commas and no double quoted field values. However, the line 237 is of no importance for the empty phantom data columns issue as it turned out with further tests.

There can be seen on image CSV_comparison_1.webp that there is on left side the data column E which does not contain any data in any row and on right side the data column H also not containing any data in any row. The reason for the two empty phantom data columns is unclear.

There was created by me the files File_Left_2.csv and File_Right_2.csv by deleting in both files all lines from the line 22 to the end of each file and saving both CSV files. In UltraCompare is replaced in both file names the number 1 by 2 and refreshed the table comparison. The result is captured to the image CSV_comparison_2.webp. The table comparison is now with the eleven data columns without any empty phantom columns.

I created next the files File_Left_3.csv and File_Right_3.csv by keeping from File_Left_1.csv and File_Right_1.csv the first 22 lines, deleting in both files all lines from the line 23 to end of each file and saving both CSV files. In UltraCompare is replaced in both file names the number 2 by 3 and refreshed the table comparison. The result is captured to the image CSV_comparison_3.webp. The table comparison is now again with empty phantom columns. There is again an empty phantom data column E on left side. On right side is now an empty phantom data column G instead of H as on comparing the CSV files File_Left_1.csv and File_Right_1.csv. This comparison behavior is again not understandable for me on looking on the 21 data rows in comparison to the former comparison with 20 data rows.

Does the issue occur always on having more than 20 data rows?

No, as I could find out with the next two CSV files comparisons.

The CSV files File_Left_4.csv and File_Right_4.csv were created by
  • opening the two files File_Left_1.csv and File_Right_1.csv,
  • deleting in both files all lines from line 22 to end of the file,
  • selecting and copying the lines 2 to 21 (the 20 data rows),
  • pasting the 20 lines twice at the end of each file,
  • saving both CSV files as File_Left_4.csv and File_Right_4.csv.
The table comparison is fine in UltraCompare with File_Left_4.csv and File_Right_4.csv without phantom data columns, see the image file CSV_comparison_4.webp.

The CSV files File_Left_5.csv and File_Right_5.csv were created next by
  • opening the two files File_Left_1.csv and File_Right_1.csv,
  • deleting in both files the lines from line 22 to line 49,
  • keeping line 50 (now line 22) and deleting all other lines below,
  • saving both CSV files as File_Left_5.csv and File_Right_5.csv.
The table comparison is fine in UltraCompare with File_Left_5.csv and File_Right_5.csv without phantom data columns, see the image file CSV_comparison_5.webp.

Last were compared again in UltraCompare the files File_Left_1.csv and File_Right_1.csv with opening again the session properties, checking for both files the option Use row 1 for column headers, and clicking on button OK and Run. The image file CSV_comparison_header.webp shows that the two initial CSV files are compared now also without empty phantom data columns.

That is all very strange. I think, only an UltraCompare developer can find out with a debugger why there are empty phantom data columns on comparing File_Left_1.csv and File_Right_1.csv and File_Left_3.csv and File_Right_3.csv without checked options Use row 1 for column headers and why there is a phantom data column H on comparing *_1.csv and a phantom data column G on comparing *_3.csv on right side with in both use cases a phantom data column E on left side.

The solution is at least for the CSV files in the attached RAR archive using the option Use row 1 for column headers for both compared CSV files.

5
NewbieNewbie
5

PostDec 08, 2025#8

Thank you Mofi.
You are going over and above.