Request clarity for "Smart" compare

Newbie

    May 27, 2022 #1

    The documentation implies that "Smart" compare treats all files as text files!

    Does Smart compare still treat files that have binary mode (based on the file extension) as binary? 

    I suggest that the documentation be clearer about this.

    Thank you

    Grand Master

      May 27, 2022 #2

      The help of the English UltraCompare for Windows v2022.20.0.22 describes the option Smart on the page titled Folder (Configuration - Compare) as follows:
      UC help wrote: If this option is selected a Folder Compare will include a text compare of all files included in compared folders and will also obey ignore options configured for text compares.
      You are right that this description does not make clear how a smart folder comparison handles binary files in the compared folder(s). So I will try to clarify.

      A text file is either a Unicode text file encoded in UTF-16 or UTF-32, which contains null bytes because of the encoding, or a file containing only bytes with a decimal value ≥ 32 (normal space) plus some bytes with a decimal value < 32 that represent control characters used in text files: 9 (horizontal tab), 10 (line-feed), 11 (vertical tab), 12 (form-feed, page break) and 13 (carriage return). The second group includes ANSI, OEM, UTF-7, UTF-8 and ASCII Escaped Unicode encoded text files. All other files are binary files, which usually contain multiple null bytes. The presence of null bytes is the criterion most often used to identify a file as a binary file.
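
The text/binary distinction described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not UltraCompare's actual detection code, which is not documented here. The function name looks_like_text and the use of a byte order mark to recognize UTF-16/UTF-32 files are assumptions; UTF-16/UTF-32 files without a byte order mark would be misclassified by this sketch.

```python
import codecs

# Control bytes that are normal in text files: tab, line-feed,
# vertical tab, form-feed and carriage return.
TEXT_CONTROL_BYTES = {9, 10, 11, 12, 13}

# Byte order marks of the encodings that legitimately contain null bytes.
# Assumption: the file starts with a BOM; BOM-less UTF-16/32 is not handled.
UNICODE_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def looks_like_text(data: bytes) -> bool:
    """Return True if the bytes match the description of a text file."""
    if data.startswith(UNICODE_BOMS):
        return True
    # Every byte must be >= 32 or one of the allowed control characters.
    # A null byte (value 0) therefore marks the data as binary.
    return all(b >= 32 or b in TEXT_CONTROL_BYTES for b in data)
```

A C source file with tabs and CR+LF line endings passes this check, while the first bytes of a typical executable fail it because of the null bytes.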

      A smart folder comparison is usually run on folders with text files when the user knows that the files differ in size, or differ in content despite having the same size, in ways that do not matter to the user or to the compiler, script interpreter or other program processing these text files. It is used most often by programmers, but can also be useful for other users depending on the text file contents. The goal of a smart folder comparison is to find out which of the files with a different size, or with different contents despite the same size, really differ in the meaning of the text they contain.

      For which reasons could files be treated as equal even though they have different file sizes, or different contents despite the same file size?
      1. The compared text files use different character encodings. For example, the folder on the left side contains the text files with UTF-8 encoding and the folder on the right side contains them with ANSI encoding. The file sizes can be different in this case although a text file with name X contains the same characters in both folders, just encoded differently.
      2. The compared text files use different line endings. For example, the folder on the left side contains the text files with DOS/Windows line endings (carriage return + line-feed) and the folder on the right side contains them with Unix line endings (just line-feed). The file sizes are different in this case although a text file with name X contains the same visible characters in both folders.
      3. The compared text files differ in leading or trailing spaces/tabs or in alignment spaces/tabs inside a line. For many programs processing text files such whitespace differences do not matter, but they result in different file sizes or different file contents when just the bytes are compared.
      4. The compared text files differ in the number of empty or blank lines between lines with text. The file sizes are different although the differences do not matter for the program processing the text files.
      5. The compared text files differ in the case of letters. The two compared files have the same file size although the file contents are different. If the text files contain the code of a program or script, it depends on whether the language is case-sensitive or case-insensitive whether such differences matter.
      6. The compared text files differ in their headers, for example a copyright notice with different years in a comment block like "Copyright (c) 2012-2021" versus "Copyright (c) 2012-2022". In such a case the text files have the same file size and are not binary equal, but the difference in file contents does not matter for the compiler or script interpreter.
      In such use cases a smart folder comparison is useful to find out which files have differences that really matter for a program, script, website, etc.
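
To make the first five reasons concrete, here is a minimal Python sketch of a comparison that decodes both files and normalizes them before comparing. It only illustrates the idea of ignore options; the names normalize and smart_equal and the option names are hypothetical and do not correspond to UltraCompare settings. The encodings are passed in explicitly because this sketch does not attempt to detect them, and header differences such as copyright years (reason 6) are not covered.

```python
from typing import List


def normalize(text: str, ignore_whitespace: bool = True,
              ignore_blank_lines: bool = True,
              ignore_case: bool = False) -> List[str]:
    """Split text into lines and strip the differences to be ignored."""
    result = []
    # splitlines() treats CR+LF, LF and CR alike, so the line ending
    # style never shows up in the result.
    for line in text.splitlines():
        if ignore_whitespace:
            # Remove all spaces and tabs, wherever they occur in the line.
            line = "".join(line.split())
        if ignore_case:
            line = line.lower()
        if ignore_blank_lines and not line.strip():
            continue
        result.append(line)
    return result


def smart_equal(left: bytes, left_encoding: str,
                right: bytes, right_encoding: str, **options) -> bool:
    """Compare decoded and normalized text instead of raw bytes."""
    return (normalize(left.decode(left_encoding), **options)
            == normalize(right.decode(right_encoding), **options))


left = "Größe = 1;\r\n\r\nint b;\r\n".encode("utf-8")
right = "Größe\t=\t1;\nint b;\n".encode("cp1252")
print(len(left), len(right))                         # sizes differ
print(smart_equal(left, "utf-8", right, "cp1252"))   # text is equal
```

The two byte strings differ in encoding, line endings, whitespace and blank lines, so a byte comparison reports them as different, while the normalized comparison reports them as equal.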

      Okay, it is hopefully clear now how a smart folder comparison compares text files using the text compare ignore options configured by the user of UltraCompare.

      How are binary files compared on using a smart folder comparison?

      I think binary files are usually not present in the folders on which a smart folder comparison is run, but the compared folders can of course also contain binary files, to which the text ignore options cannot really be applied.

      I did the following to find out how the smart folder comparison behaves on a folder containing binary and text files:
      1. I created two empty folders - C:\Temp\Test1 and C:\Temp\Test2.
      2. I copied a real binary file (an executable) into both folders with the name BinaryFile.bin.
      3. I copied a real text file (a C source code file) into both folders with the name TextFile.txt.
      4. I modified the binary file BinaryFile.bin in C:\Temp\Test1 by replacing three bytes with normal spaces (hexadecimal value 20) at a specific offset in the middle of the file using UltraEdit in hex edit mode.
      5. I modified the binary file BinaryFile.bin in C:\Temp\Test2 by replacing three bytes with horizontal tabs (hexadecimal value 09) at exactly the same offset as before using UltraEdit in hex edit mode.
      6. I saved both binary files with the command Save all and made sure that both binary files have exactly the same last modification time.
      7. I modified the text file TextFile.txt in C:\Temp\Test1 by inserting three normal spaces on an empty line using UltraEdit in text edit mode.
      8. I modified the text file TextFile.txt in C:\Temp\Test2 by inserting three horizontal tabs on exactly the same empty line as before using UltraEdit in text edit mode.
      9. I saved both text files with the command Save all and made sure that both text files have exactly the same last modification time.
      10. I started UltraCompare and ran a folder comparison with the compare option Smart (includes text compare with ignore options) selected and the Text Ignore Options Ignore spaces, Ignore tabs and Ignore line terminators checked.
        The folder comparison result was that BinaryFile.bin was indicated as different and TextFile.txt as equal, as I expected.
      11. I opened the Session properties, unchecked the Text Ignore Options Ignore spaces and Ignore tabs and clicked on OK and Run.
        The folder comparison result was that both BinaryFile.bin and TextFile.txt were indicated as different, as I expected.
      Conclusion: The text ignore options are not applied when comparing two files which UltraCompare detects as binary files. The binary files are just compared byte by byte.
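
For anyone who wants to repeat this test without a hex editor, steps 1 to 9 can be approximated with a short Python script. Everything in it is a stand-in: the binary file is a handful of bytes with null bytes instead of a real executable, the text file is a tiny C source, the folders are created in a temporary directory instead of C:\Temp, and the function names are hypothetical. The byte-by-byte check uses filecmp.cmp with shallow=False and only shows what a plain binary comparison sees; it says nothing about UltraCompare internals.

```python
import filecmp
import os
import tempfile
from pathlib import Path
from typing import Dict, Tuple

# Stand-in contents (assumptions, not the files used in the post).
BINARY_TEMPLATE = b"MZ\x90\x00\x03\x00" + bytes(range(32)) + b"\xff\xfe"
TEXT_LINES = ["int main(void)", "{", "", "    return 0;", "}"]
PATCH_OFFSET = 16          # offset of the three replaced bytes
FIXED_MTIME = 1653645600   # arbitrary timestamp shared by all files


def make_binary(fill: bytes) -> bytes:
    """Overwrite three bytes at the same offset, keeping the size."""
    end = PATCH_OFFSET + len(fill)
    return BINARY_TEMPLATE[:PATCH_OFFSET] + fill + BINARY_TEMPLATE[end:]


def make_text(fill: str) -> bytes:
    """Put the fill characters on the empty third line."""
    lines = list(TEXT_LINES)
    lines[2] = fill
    return ("\n".join(lines) + "\n").encode("ascii")


def build_test_folders(root: Path) -> Tuple[Path, Path]:
    """Create Test1 (spaces) and Test2 (tabs) below root."""
    for name, fill in (("Test1", b"   "), ("Test2", b"\t\t\t")):
        folder = root / name
        folder.mkdir()
        (folder / "BinaryFile.bin").write_bytes(make_binary(fill))
        (folder / "TextFile.txt").write_bytes(make_text(fill.decode("ascii")))
        for path in folder.iterdir():
            # Give every file the identical last modification time.
            os.utime(path, (FIXED_MTIME, FIXED_MTIME))
    return root / "Test1", root / "Test2"


def byte_compare(test1: Path, test2: Path) -> Dict[str, bool]:
    """Map each file name to True if both copies are byte-identical."""
    return {p.name: filecmp.cmp(p, test2 / p.name, shallow=False)
            for p in sorted(test1.iterdir())}


with tempfile.TemporaryDirectory() as tmp:
    left, right = build_test_folders(Path(tmp))
    print(byte_compare(left, right))
```

Both file pairs have the same size and the same timestamp but are not byte-identical, which corresponds to the result of step 11 with the whitespace ignore options unchecked.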

      The next test case was for finding out whether the file comparison mode per extension affects the smart comparison. I did the following:
      1. I opened the File Extensions window. In toolbar/menu mode this is done by clicking in menu Options on menu item Set mode per extension. In ribbon mode Set mode per extension is the third command in the last group Configure on the second ribbon tab Home.
        With Text files selected I could see the file extension *.txt with the description Text Document, as that is a preconfigured extension.
        With Binary files selected I could see the file extension *.bin with the description Extension was added by UltraCompare.
      2. I added *.hlp with the description Help file to the list of Text files and *.tmp with the description Temporary file to the list of Binary files.
      3. In both folders I copied the binary file BinaryFile.bin within the same folder to the new name BinaryFile.hlp.
      4. In both folders I copied the text file TextFile.txt within the same folder to the new name TextFile.tmp.
      5. I opened the Session properties, checked the Text Ignore Options Ignore spaces and Ignore tabs again and clicked on OK and Run.
        The folder comparison result was that BinaryFile.bin and TextFile.tmp were indicated as different and BinaryFile.hlp and TextFile.txt as equal.
      The binary file BinaryFile.hlp, for which the text comparison mode is explicitly configured, is indeed compared using a text comparison. The three different bytes in the binary file (three normal spaces with hexadecimal value 20 versus three horizontal tabs with hexadecimal value 09) are interpreted according to the text ignore options defined for the smart folder comparison. The text file TextFile.tmp, for which the binary comparison mode is explicitly configured, is indeed compared using a binary comparison, which means UltraCompare does not ignore the difference caused by the three spaces/tabs.

      Conclusion: For files with a file extension for which a comparison mode is already defined, the smart comparison uses the defined file comparison mode, even if the configured mode does not match the contents of the compared files.
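
The observed behavior can be summarized as a small decision rule, sketched here in Python. The extension lists mirror the ones configured in this test. The fallback for extensions in neither list (treat a file as binary if it contains a null byte) is an assumption, since that case was not tested above, and the function names comparison_mode and files_equal are hypothetical.

```python
from pathlib import PurePath

# Extension lists as configured in the test above.
TEXT_EXTENSIONS = {".txt", ".hlp"}
BINARY_EXTENSIONS = {".bin", ".tmp"}


def comparison_mode(file_name: str, left: bytes, right: bytes) -> str:
    """Pick the mode: a configured extension wins over the contents."""
    extension = PurePath(file_name).suffix.lower()
    if extension in TEXT_EXTENSIONS:
        return "text"
    if extension in BINARY_EXTENSIONS:
        return "binary"
    # Assumption: without a configured extension, fall back to
    # a null byte check on the contents.
    return "binary" if b"\x00" in left or b"\x00" in right else "text"


def files_equal(file_name: str, left: bytes, right: bytes) -> bool:
    """Compare with spaces and tabs ignored only in text mode."""
    if comparison_mode(file_name, left, right) == "text":
        # translate(None, ...) deletes the listed bytes.
        return left.translate(None, b" \t") == right.translate(None, b" \t")
    return left == right


binary_1 = b"MZ\x00\x03" + b"   " + b"\x00\xff"
binary_2 = b"MZ\x00\x03" + b"\t\t\t" + b"\x00\xff"
text_1 = b"int a;\n   \nint b;\n"
text_2 = b"int a;\n\t\t\t\nint b;\n"

for name, a, b in (("BinaryFile.bin", binary_1, binary_2),
                   ("BinaryFile.hlp", binary_1, binary_2),
                   ("TextFile.tmp", text_1, text_2),
                   ("TextFile.txt", text_1, text_2)):
    print(name, "equal" if files_equal(name, a, b) else "different")
```

With these stand-in contents the sketch reports BinaryFile.bin and TextFile.tmp as different and BinaryFile.hlp and TextFile.txt as equal, the same pattern as in the folder comparison result of step 5.
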
      Best regards from an UC/UE/UES for Windows user from Austria

      Newbie

        May 27, 2022 #3

        Thank you. Have a great day!