Carriage return only character hidden in mainly DOS text file


    Dec 06, 2017#1

    Hi guys,

    I have a file which has mainly got normal DOS line breaks in it (CRLF), but from time to time there are carriage return only characters (CR). 

    The problem is that UltraEdit now (I am pretty sure it did not in the past) ignores these characters: it does not treat them as line breaks and does not display them at all.

    If I open the same file in Notepad++ I can see the carriage return quite clearly.

    So, is there a way I can change the view options so I can see the CR character, and not just the CRLF characters in DOS edit mode?

    See attached file for my example to compare to Notepad++

    Cheers,
    Paul
    Ult.png (49.07KiB)

    Grand Master

      Dec 06, 2017#2

      I suppose that by "UltraEdit now" you mean UltraEdit for Windows v24.20.0.44.

      The default settings at Advanced - Settings or Configuration - File handling - DOS/Unix/Mac handling are:
      • Default file type for new files: DOS
      • Unix/Mac file detection/conversion: Never prompt to convert files to DOS format
      • Only recognize DOS terminated lines (CR/LF) as new lines for editing: not checked
      • Save file as input format (Unix/Mac/DOS): checked
      • Status bar shows original line terminator format (on disk): not checked
      With these settings, opening a file containing all three types of line terminators in UE v24.20.0.44 shows the following:

      The line ending with CRLF (DOS) and line ending displayed with .
      The line ending with only LF (UNIX) and line ending displayed with ¬.

      But the line ending with only CR (MAC) is not recognized as a separate line. It is merged on display with the next line, and the carriage return is not displayed at all. However, on making a selection that includes the carriage return, or on selecting text to the right of the carriage return in that line, strange things happen: a space suddenly appears somewhere in the selection. This issue with the display of unprintable characters in selected text is fixed in UE v24.20.0.51.

      A line ending with only LF is likewise merged on display with the next line, without the LF being shown at all, when the option Only recognize DOS terminated lines (CR/LF) as new lines for editing is enabled. In that case there is no issue on making a selection that includes the LF or on selecting text to the right of the LF in the line.

      With the same default settings for DOS/Unix/Mac handling, the display is a bit different in UE v22.20 and UE v23.20. DOS and UNIX terminated lines are displayed as in UE v24.20. The MAC terminated line is also merged on display with the next line in the file, but the carriage return is at least displayed like a space. The command Character Properties shows that this whitespace character has the code value 13 decimal. There is no issue in UE v22.20 and v23.20 on making a selection that includes the carriage return or the text after it in the same line.

      According to my tests, the change in the display of a lone carriage return in a file with mixed line endings happened in UE v24.00, the version in which UltraEdit became a fully Unicode aware application, which resulted in lots of changes in the code.

      I have not analyzed further with even older versions of UltraEdit whether a lone CR in a file with mainly CRLF line endings was ever displayed differently in the past.

      I agree that interpreting a single carriage return like a normal whitespace character in a line is not what a user expects. And of course not displaying the carriage return at all, together with the strange display on making a selection, is definitely wrong behavior that should be reported by email to IDM support.

      I could also see another definitely wrong behavior, with both UE v22.20 and v24.20 and with Automatically convert to DOS format selected. My test file has a first line ending with carriage return + line-feed, a second line ending with only line-feed, a third line ending with only carriage return, and more than 2000 further lines all ending with CR+LF. Opening it results in the single carriage return at the end of the third line being ignored and all 0D 0A being converted to 0D 0D 0A.

      So at the moment I can only agree that handling of a file with mixed line endings does not work as expected from UE v22.20 to v24.20, depending on configuration. I have reported everything I could see with my test file (changes.txt with 5 lines inserted at top, with the second and third line ending as described) by email to IDM support.

      At the moment I have no other suggestion for editing a file with some lines ending with only CR than using a regular expression replace to find them and replace them with nothing, a space or CR+LF. The Perl regular expression search string \r(?!\n) finds a carriage return whose next character is not a line-feed.
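For checking a file outside of UltraEdit, the same lookahead can be tried in a few lines of Python. This is only a sketch: it uses Python's re module rather than UltraEdit's Perl regular expression engine, and the function name find_bare_cr and the sample bytes are made up for illustration.

```python
import re

# A "bare" CR is a carriage return whose next byte is not a line-feed.
# Same idea as the search string \r(?!\n), applied to raw bytes.
BARE_CR = re.compile(rb"\r(?!\n)")


def find_bare_cr(data: bytes) -> list[int]:
    """Return the byte offset of every CR that is not followed by LF."""
    return [match.start() for match in BARE_CR.finditer(data)]


# Hypothetical sample: one CRLF line, one LF line, one CR-only line.
sample = b"first\r\nsecond\nthird\rfourth\r\n"
print(find_bare_cr(sample))
```

Working on bytes rather than decoded text avoids any newline translation by the file layer, so the offsets refer to the bytes as they are on disk.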
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        Dec 21, 2017#3

        My suggestion would be to perform a character replacement to rationalize the line endings by replacing bare CR and LF characters with DOS newline sequences (CR/LF). Though it's been a while since I did so, I recall that I used a Perl regular expression along the lines of ".\r." to find carriage returns, and ".\n." to find line feeds. For both, the replacement can be the standard UltraEdit line break token, ^p.


          Dec 22, 2017#4

          TXWizard, ^p is interpreted as carriage return + line-feed only in a non regular expression or an UltraEdit regular expression search/replace. It should not be used in a Perl or Unix regular expression. UltraEdit does replace ^p in a Perl/Unix regular expression replace string with the two bytes with hexadecimal values 0D 0A before execution, so ^p works by chance in the replace string as well. But ^p cannot be used in a Perl/Unix search string with the meaning CR+LF. Your search strings are also not good: . in a Perl regular expression matches any character except a newline character, so the suggested replaces would delete two characters, the one before and the one after the line ending. Further, the suggested search expressions would not work for multiple CR or multiple LF in series in a file containing mainly CR+LF.
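The character loss can be reproduced with Python's re module, in which . also matches any character except a line-feed. This is a sketch only: UltraEdit's Perl engine may differ in details, and naive_fix is a name invented for this illustration.

```python
import re


def naive_fix(text: str) -> str:
    """Apply the flawed replace: search for .\\r. and replace with CR+LF."""
    return re.sub(r".\r.", "\r\n", text)


# The characters on both sides of the bare CR are part of the match,
# so "b" and "c" are deleted together with the carriage return.
print(repr(naive_fix("ab\rcd\r\nef")))

# Two CR in series: the second CR is consumed by the trailing dot,
# so only one line break is produced and "x" is lost as well.
print(repr(naive_fix("x\r\ry")))
```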

          What really works to replace a line-feed without a carriage return before it, or a carriage return without a line-feed after it, is a Perl regular expression replace searching for (?<!\r)\n|\r(?!\n) and replacing all found occurrences with \r\n.
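The same search and replace can be sketched in Python for use outside of UltraEdit. The lookbehind and lookahead syntax is the same in Python's re module; the function name to_crlf and the sample bytes are assumptions made for this example.

```python
import re

# LF not preceded by CR, or CR not followed by LF.
MIXED_EOL = re.compile(rb"(?<!\r)\n|\r(?!\n)")


def to_crlf(data: bytes) -> bytes:
    """Replace every bare LF and every bare CR with CR+LF."""
    return MIXED_EOL.sub(b"\r\n", data)


# Hypothetical sample with CRLF, bare LF, bare CR and runs of each.
mixed = b"a\r\nb\nc\rd\r\re\n\nf"
print(to_crlf(mixed))
```

Because the lookarounds consume no characters, existing CR+LF pairs are left alone and runs of several CR or several LF each become their own CR+LF, so running the replace a second time changes nothing.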

          By the way: Before I reported this issue to IDM support by email, I also ran tests with older versions down to UE v11.20b. A CR without LF afterwards in a file containing mainly CR+LF was never interpreted as a line ending by any version of UltraEdit. Only an LF without CR before it in a file containing mainly CR+LF was always recognized by every version of UltraEdit. A CR without LF in a file containing mainly CR+LF is very, very, very rare.


            Dec 22, 2017#5

            Thanks for correcting me on that oversight. As I said, it's been a good while since I've needed to perform such a cleanup. Moreover, you are absolutely right about bare carriage return characters in the wild being very rare. Unfortunately, bare line feeds are much more common, especially in files that regularly pass back and forth between Unix and Windows hosts over FTP.


              Jan 19, 2018#6

              Hi Guys,

              Thanks for your reply, and yes I can confirm I am using "UltraEdit Text/Hex Editor (x64) Version 24.20.0.51"

              The problem with trying any replace command is that it doesn't detect the CR character at all. The character is ignored, so the replace finds nothing to replace.

              This probably doesn't happen too often for me, and for now I can use Notepad++ to get around the issue (as the CR does show in there). I will pass this information on to the UltraEdit team.

              Cheers,
              Paul

              P.S. I tried to attach the TXT file as a sample to this chat but was unable to.


                Jan 19, 2018#7

                Paul, the Perl regular expression posted by me above worked fine. I did the following to reproduce this issue and run the Perl regular expression replace:
                1. I copied changes.txt from the UltraEdit program files directory to C:\Temp.
                2. I started UltraEdit with C:\Temp\changes.txt.
                3. I changed the configuration setting for Unix/Mac file detection/conversion to Never prompt to convert files to DOS format at Advanced - Settings or Configuration - File handling - DOS/Unix/Mac handling. I usually use Automatically convert to DOS format and have Save file as input format (Unix/Mac/DOS) and Status bar shows original line terminator format (on disk) checked.
                4. I switched to HEX edit mode and modified the line endings of the history lines of UE v24.20 starting with - Addressed by deleting 0D on some lines and 0A on some other lines. Then I saved and closed the file.
                5. I re-opened the file from the recently opened files list. The lines ending with just 0A are displayed correctly, while the lines ending with just 0D are displayed wrongly: the carriage return is not interpreted as a line ending and is additionally not displayed at all. The character is nevertheless there. This can be seen on positioning the caret at the end of a line ending with just a carriage return and moving the caret one character to the right with key RIGHT ARROW: the caret position does not change on the first key press, only on the second one.
                6. I moved the caret with Ctrl+Home to the top of the file and started the Perl regular expression replace as posted in my previous post. UltraEdit finds \r without \n following and \n without \r before and replaces them with \r\n.
                I reported this issue by email to IDM support with similar steps to reproduce. IDM support could reproduce this issue and added the report to their issue database for investigation by a developer.
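A comparable test file can also be produced and checked without a hex editor. The following Python sketch writes a small file with all three line ending types into a temporary directory, counts the endings, applies the replace and counts again. The file name mixed.txt, the line contents and the helper names are made up for this illustration and are not the changes.txt used above.

```python
import re
import tempfile
from pathlib import Path

# CRLF must come first in the alternation so it is counted as one ending.
ANY_EOL = re.compile(rb"\r\n|\r|\n")
BARE_EOL = re.compile(rb"(?<!\r)\n|\r(?!\n)")
NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def count_eols(data: bytes) -> dict[str, int]:
    """Count CRLF, bare LF and bare CR line endings in the given bytes."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    for match in ANY_EOL.finditer(data):
        counts[NAMES[match.group()]] += 1
    return counts


def normalize_file(path: Path) -> None:
    """Rewrite the file with every bare LF or bare CR replaced by CR+LF."""
    path.write_bytes(BARE_EOL.sub(b"\r\n", path.read_bytes()))


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "mixed.txt"  # hypothetical test file
    # Line 2 ends with LF only, line 3 with CR only, all others with CRLF.
    sample.write_bytes(b"line 1\r\nline 2\nline 3\rline 4\r\nline 5\r\n")
    before = count_eols(sample.read_bytes())
    normalize_file(sample)
    after = count_eols(sample.read_bytes())
    print("before:", before)
    print("after: ", after)
```

Reading and writing in binary mode matters here, because text mode would translate line endings on the way in or out and hide exactly the bytes under test.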

                PS: *.txt files cannot be attached to a post, but *.zip and *.rar files can if they are not too large. So if you want to share your file with us, compress it into a ZIP or RAR archive file which you can attach to your next post.