Why is Unicode (UTF-8) encoding info saved for a file only when set via status bar?

    May 25, 2016#1

    Hi!

    I just noticed that when I save a document as UTF-8, then close and reopen that same document, the encoding reverts to 1252 ANSI. It only sticks when I change the encoding type in the bottom bar and save again.

    What gives?

    I'm using UE 20.00.

    Thank you!

      May 26, 2016#2

      If a text file with UTF-8 encoding is saved without a byte order mark (BOM) and contains no character with a code value greater than decimal 127, i.e. no non-ASCII character (within the first 64 KB when using UltraEdit for Windows < v24.10 or UEStudio < v17.10), the UTF-8 encoded file is binary identical to an ASCII file with the same content. In other words, with only ASCII characters in a file and no BOM, UltraEdit and any other text editor must assume the file is encoded in ASCII, as nothing declares the file content as UTF-8 encoded.
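The byte-level identity described above can be checked with a short Python sketch. The `guess_encoding` helper is a hypothetical, simplified heuristic written for illustration only; it is not UltraEdit's actual detection logic and ignores the 64 KB limit mentioned above.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Simplified guess at a file's encoding (illustrative assumption only)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"          # the BOM explicitly declares UTF-8
    if all(b < 128 for b in data):
        return "ascii"                # nothing distinguishes this from UTF-8
    try:
        data.decode("utf-8")
        return "utf-8 (no BOM)"       # valid multi-byte UTF-8 sequences found
    except UnicodeDecodeError:
        return "ansi"                 # some single-byte code page, e.g. 1252

text = "Hello, world!"                        # ASCII characters only
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")             # UTF-8 without BOM
utf8_bom_bytes = text.encode("utf-8-sig")     # UTF-8 with BOM prepended

# With ASCII-only content and no BOM, the two encodings are byte-identical.
same = ascii_bytes == utf8_bytes
```

Here `same` is true, so a reader of the file has no way to tell the no-BOM UTF-8 file apart from a plain ASCII one; only the BOM variant or a non-ASCII character changes the guess.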

      For more details about encoding see the IDM power tip Working with Unicode in UltraEdit/UEStudio and the following forum topics:
      Best regards from an UC/UE/UES for Windows user from Austria

        May 26, 2016#3

        Thanks, Mofi. As always, you're a great resource.

        Your links mostly answer my question, but I'm still wondering why my documents open up as UTF-8 when changing the encoding in the bottom bar and saving, but when I merely Save As... in UTF-8, they reopen as ANSI.

        Any ideas?

        Thank you again!

          May 26, 2016#4

          The configuration settings
          • Write UTF-8 BOM header to all UTF-8 files when saved and
          • Write UTF-8 BOM on new files created within this program (if above is not set)
          determine if the file is saved with or without BOM after using
          • File - Conversions - ASCII to UTF-8 (Unicode Editing) or
          • File - Conversions - UNICODE/UTF-8 to UTF-8 (Unicode Editing) or
          • the code page/encoding selector item Unicode - UTF-8 in status bar.
          When the encoding is changed to a Unicode encoding via the code page/encoding selector in the status bar, UltraEdit additionally adds to section [Open As Unicode] of its INI file the name of the file with full path, followed by an equal sign and the number specifying the type of Unicode encoding. Section [Open As Unicode] was introduced with UltraEdit for Windows v21.10.

          But no such additional information is stored in the INI file of UltraEdit for a file saved using Save As with either UTF-8 or UTF-8 - NO BOM selected in that dialog, or saved using the command Save after using the appropriate conversion command from File - Conversions.

          UltraEdit also remembers a code page set manually for a non-Unicode file, via the code page/encoding selector in the status bar or the command Set Code Page in menu View, by adding to section [File Code Page] the name of the file with full path, followed by an equal sign and the number of the code page. Section [File Code Page] was introduced with UltraEdit for Windows v12.10.
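To see which files have a remembered encoding, the two INI sections described above could be read with a small Python sketch like the one below. The sample INI text is an assumption made up for illustration: the file paths are hypothetical, and the numeric values (`2` and `1252`) are placeholders, not documented UltraEdit values. The function name `remembered_encodings` is likewise hypothetical.

```python
import configparser

# Hypothetical excerpt of an INI file; paths and numbers are made up.
SAMPLE_INI = r"""
[Open As Unicode]
C:\Temp\notes.txt=2

[File Code Page]
C:\Temp\legacy.txt=1252
"""

def remembered_encodings(ini_text: str) -> dict:
    """Return the file-name/number pairs stored in both sections."""
    # Only '=' may act as delimiter, because Windows paths contain ':'.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str          # keep the case of file names as written
    parser.read_string(ini_text)
    result = {}
    for section in ("Open As Unicode", "File Code Page"):
        if parser.has_section(section):
            result[section] = dict(parser.items(section))
    return result

entries = remembered_encodings(SAMPLE_INI)
for section, files in entries.items():
    for path, number in files.items():
        print(f"[{section}] {path} -> {number}")
```

A file saved only via Save As or a conversion command would simply not appear in either section, which matches the behavior described above.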

          When the enhanced status bar, with its list boxes to change code page/encoding and syntax highlighting for the active file, was introduced with UltraEdit for Windows v19.00, [File Code Page] was no longer enough, as it was now also necessary to remember the encoding for Unicode files, especially for UTF-8 and ASCII Escaped Unicode. Therefore, from v19.00 to v21.00, the code page/encoding information after a manual switch by the user is remembered in the INI file in section [SBar Code Page].

          But this combined storage of the code page for files encoded with one byte per character and the type of Unicode encoding for Unicode files was not well designed, because the code page information then had two sections in the INI file: [File Code Page] and [SBar Code Page]. I suppose this was the reason why [SBar Code Page] was discarded in UE for Windows v21.10, [Open As Unicode] was introduced for the encoding information of Unicode files, and [File Code Page] is used again as the only storage for a manually selected code page.

          A conversion of character encoding done via the commands in Conversions or in the Save As dialog was never, and still is not, stored in the INI file. (UE v23.10.0.1 is the current version.) This makes sense, as the conversion commands can also be used from within a macro or script which converts, for example, thousands of files. I'm quite sure that nobody wants all the file names from a scripted conversion stored in the INI file.

          But a code page or Unicode encoding set manually by the user via the status bar should be remembered for a file, as most users would not be happy if the code page/encoding once selected via the status bar were different after a restart of UltraEdit, whether the files are reloaded automatically or re-opened manually from the history lists.

          The sections with the file names are completely removed when the button Clear history in the Toolbars / Menus - Miscellaneous configuration dialog is used.
          Best regards from an UC/UE/UES for Windows user from Austria

            May 26, 2016#5

            And, again, thank you for all the detailed information. :)