Is comparing text files with completely ignoring line terminators possible?

24
Basic User

    Dec 10, 2010#1

    Why do the File Ignore Options (IGNORE BLANK LINES, IGNORE BLANK SPACE, IGNORE CASE & IGNORE LINE TERMINATORS) not work when comparing 2 files of the same type (text in this case) where the only differences between them are the number of spaces and/or the case and/or the line breaks?

    For example if file 1 has this:

    See the brown dog run over the Red Moon

    And file 2 has this:

      SEE   the BROWN DOG 
      run 
    over THE    RED 
    Moon

    With all 4 File Ignore Options checked, these 2 files should logically be the same to me, since the only differences between them are 1 or more of the 4 things that I told it to ignore. And yet UltraCompare does not see these 2 files as identical.

    Does anyone know why?

    Thanks

    6,675585
    Grand Master

      Dec 11, 2010#2

      That can be answered by reading the first sentence on the help page Ignore Line Terminators (Options menu):

      This item may be selected to allow the active compare to ignore line terminator differences (DOS/UNIX/MAC) when comparing files for differences.

      A text compare in UltraCompare is always a line-by-line comparison. It is not possible to compare text with ignoring the line terminators completely and read/compare the entire file as single line text. The option Ignore Line Terminators only makes it possible to compare files with different types of line terminators, for example a DOS text file with carriage return + line-feed as line terminator against a UNIX text file with just line-feed as line terminator.

      Best regards from an UC/UE/UES for Windows user from Austria
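
To make the distinction concrete, here is a minimal Python sketch of a line-by-line comparison that ignores only the type of line terminator. It is not UltraCompare's implementation; the function name and the sample strings are made up for illustration.

```python
def same_ignoring_terminator_type(text_a: str, text_b: str) -> bool:
    """Line-by-line comparison that ignores only the kind of line terminator."""
    # str.splitlines() splits on \r\n, \r and \n (among other separators)
    # and drops the terminator, so DOS, Mac and UNIX files give the same lines.
    return text_a.splitlines() == text_b.splitlines()


# Hypothetical sample data: same lines, different terminator types.
dos_text = "See the brown dog\r\nrun over the Red Moon\r\n"
unix_text = "See the brown dog\nrun over the Red Moon\n"
# Same words, but the line break sits at a different place.
rewrapped = "See the brown dog run\nover the Red Moon\n"

print(same_ignoring_terminator_type(dos_text, unix_text))   # True
print(same_ignoring_terminator_type(unix_text, rewrapped))  # False
```

The second result is the case from the question: the words match, but the lines do not, so a line-based compare reports a difference.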

      24
      Basic User

        Jan 11, 2011#3

        Mofi,

        "It is not possible to compare text with ignoring the line terminators completely and read/compare the entire file as single line text."

        UC might not be able to do this, but that is by design and therefore intentional, and not simply because it is something that can’t be done, period. I'm not disputing the problems with comparing 2 different file types, but when you have 2 files of identical type with simply different text, you should be able to compare them and ignore the line terminators, especially in a program you pay for as opposed to free or shareware.

        While asking about this limitation in UC may seem like a dumb inquiry, it is not, and I doubt I am the only person to ever wonder why UC cannot do this. One would think an application for comparison would be able to do this kind of thing.

        Thank you for your time

        901
        Master

          Jan 12, 2011#4

          YSLGuru wrote: Why do the File Ignore Options (IGNORE BLANK LINES IGNORE BLANK SPACE, IGNORE CASE & IGNORE LINE TERMINATORS ) not work...

          Actually, I can tell you why. Most compare tools, including UltraCompare, do a line-by-line comparison and will not recognize a single line on one side being spread over multiple lines on the other side. "IGNORE LINE TERMINATORS" doesn't actually mean to ignore the line terminators (pretend that they are not there). It means "ignore different kinds of line terminators". In other words, if one file has DOS line terminators (CRLF) and another file has Unix line terminators (LF without the CR), then UltraCompare will not consider the files to be different simply because of the different line terminators.
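
For comparison, "pretend that they are not there" can be sketched outside UltraCompare in a few lines of Python. This is only an illustration of the idea, using the two sample files from the first post; the function name is made up, and nothing like it exists as an UltraCompare option.

```python
def words_ignoring_layout(text: str) -> list:
    """Return the words of a text with case, spacing and line breaks ignored."""
    # str.split() without arguments splits on any run of whitespace
    # (spaces, tabs, CR, LF), so line terminators vanish entirely.
    # casefold() removes the case differences.
    return text.casefold().split()


# The two sample files from the first post.
file_1 = "See the brown dog run over the Red Moon\n"
file_2 = "  SEE   the BROWN DOG \n  run \nover THE    RED \nMoon\n"

print(words_ignoring_layout(file_1) == words_ignoring_layout(file_2))  # True
```

This compares word streams instead of lines, which is why it sees the two files as identical while a line-by-line tool does not.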

          1
          Newbie

            Jan 12, 2011#5

            So what you are saying is that two copies of the same document whose lines are wrapped at different places will always show differences on every line. To get a better compare, each paragraph should be unwrapped. Is that the only solution?

            6,675585
            Grand Master

              Jan 19, 2011#6

              To compare text files paragraph by paragraph, it is necessary to remove the line breaks inside the paragraphs, so that each paragraph is a paragraph not only visually for humans, but also for word processing, text editing and text compare applications. Text files with line breaks inside paragraphs are often produced by copying text from word processing applications (MS Word, OpenOffice Writer) or from PDF readers into a text file, or by saving *.doc, *.pdf, etc. as a text file, so that the paragraphs in the text file look like the soft-wrapped text in the *.doc, *.pdf, etc.

              That can be done in UltraEdit quite easily. Open both files (or copies of them), first execute Format - Trim Trailing Spaces to get rid of whitespace at the end of lines and in visually empty lines, then run Format - Convert CR/LFs to Wrap. Additionally, you might want to remove multiple spaces/tabs inside the paragraphs, which you can do with a regular expression replace as posted at Remove double or extra spaces. Finally, the single space at the end of the file should be replaced by a line terminator so that the last paragraph is terminated as well: press Ctrl+End, Shift+LEFT ARROW and RETURN.

              I quickly wrote a UE/UES macro for that job with some additional commands. The macro property Continue if search string not found must be checked for this macro to convert a badly formatted text file with line breaks inside paragraphs into a well formatted text file.

              InsertMode
              ColumnModeOff
              HexOff
              UnixReOff
              Top
              TrimTrailingSpaces
              ReturnToWrap
              TabsToSpaces
              Find RegExp "  +"
              Replace All " "
              Find RegExp "% +"
              Replace All ""
              Bottom
              IfColNumGt 1
              Key LEFT ARROW
              IfCharIs 32
              Delete
              EndIf
              InsertLine
              EndIf
              Top

              Of course, this UltraEdit/UEStudio macro is useful only for files containing real text, not for files containing program/script code or lists.
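
For anyone without UltraEdit at hand, the same preparation can be roughly sketched in Python. This is not a translation of the macro. It assumes that paragraphs are separated by at least one empty or whitespace-only line, and it writes each paragraph as a single line; the function name and the sample texts are made up for illustration.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the lines of each paragraph into one line (assumed layout, see above)."""
    paragraphs, current = [], []
    for line in text.splitlines():
        stripped = line.strip()          # drop leading and trailing whitespace
        if stripped:
            current.append(stripped)     # line belongs to the current paragraph
        elif current:
            paragraphs.append(current)   # empty line closes the paragraph
            current = []
    if current:
        paragraphs.append(current)
    # Join with single spaces, then collapse runs of spaces/tabs to one space.
    joined = [re.sub(r"[ \t]+", " ", " ".join(p)) for p in paragraphs]
    # Terminate every paragraph, including the last one.
    return "".join(p + "\n" for p in joined)


# Hypothetical sample: the same two paragraphs, wrapped differently.
copy_a = "The quick  brown \nfox jumps\n\n  over the\nlazy dog"
copy_b = "The quick brown fox\njumps\n\nover the lazy\ndog\n"

print(unwrap_paragraphs(copy_a) == unwrap_paragraphs(copy_b))  # True
```

After both files are unwrapped this way, a line-by-line compare effectively becomes a paragraph-by-paragraph compare. As with the macro, this only makes sense for real text, not for code or lists.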

              24
              Basic User

                Dec 06, 2011#7

                Thanks for the suggestion, Mofi. I realize I'm very slow to reply, but I am just getting back to dealing with this quirk in UC.


                Thanks again