Why is the character 'µ' inserted with code value 0xE6 instead of 0xB5?

Basic User

    May 08, 2017 #1

    Hello, UltraEdit world!

    UltraEdit is so awesome that I've been using it for years and have never had to use the forum.

    Until now. This one's got me stumped. :?:

    On an ASUS ZenBook UX31E with an AZERTY keyboard, running Windows 7 with ANSI code page 1252, region set to France, and 32-bit UltraEdit v14.00:

    When I press the 'µ' key, a corresponding 'µ' appears in my current file in UltraEdit.
    However, in hex mode the byte for 'µ' is E6 (230 decimal).
    This of course causes problems when I copy the text to MSVC# and try to compile.
    Inserting the character via the ASCII table gives the correct value B5 for code page 1252, but in text mode I then see 'Á', which in code page 1252 would be decimal 193 = C1. :?:

    Would a character genius please come extirpate me from this despicable predicament.

    Grand Master

      May 09, 2017 #2

      You have configured UltraEdit for editing files with code page OEM 850 instead of Windows-1252. Code page OEM 850 is used by Windows in the command prompt window (console) when the region is configured to France. That setting is useful when writing batch files which output French text into a console window, but it is definitely the wrong code page for writing C# code.

      In menu View, click on OEM Character Set to toggle off editing with code page OEM 850 and enable editing with code page Windows-1252.
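
      A quick way to reproduce the mismatch outside UltraEdit is with Python's built-in cp1252 and cp850 codecs. The sketch below only illustrates the byte mappings of the two code pages; the helper names byte_for and char_for are made up for this example and have nothing to do with UltraEdit itself:

```python
# Compare how the same character maps to bytes in the two code pages
# discussed in this thread, using Python's built-in codecs.
MICRO = "\u00b5"  # MICRO SIGN, displayed as 'µ'

def byte_for(char: str, codepage: str) -> int:
    """Return the single byte value that encodes `char` in `codepage`."""
    encoded = char.encode(codepage)
    if len(encoded) != 1:
        raise ValueError("expected a single-byte code page")
    return encoded[0]

def char_for(byte: int, codepage: str) -> str:
    """Return the character that `byte` represents in `codepage`."""
    return bytes([byte]).decode(codepage)

if __name__ == "__main__":
    print(f"{MICRO} in cp1252: 0x{byte_for(MICRO, 'cp1252'):02X}")  # 0xB5
    print(f"{MICRO} in cp850:  0x{byte_for(MICRO, 'cp850'):02X}")   # 0xE6
    # Byte 0xB5 (e.g. inserted via the ASCII table) displayed with OEM 850:
    print(f"0xB5 in cp850: {char_for(0xB5, 'cp850')}")              # Á
    # The OEM 850 byte 0xE6 read back as Windows-1252:
    print(f"0xE6 in cp1252: {char_for(0xE6, 'cp1252')}")            # æ
```

      So with OEM 850 active, typing µ stores byte E6, and byte B5 is displayed as Á (capital A with acute), while Windows-1252 stores µ as B5.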

      For more details about OEM Character Set see the topics:
      By the way: It is not advisable to insert characters with a code value greater than 127 decimal into a string in a C# source code file as ANSI-encoded characters, because the result then depends on which code page the compiler uses on compilation, i.e. which region is configured in Windows for the current user account. It is better to insert such characters in strings with hexadecimal escape notation, e.g. \xB5 for µ. The command Character Properties in menu Search displays the hexadecimal value of the character at the caret position (the caret blinks left of the character) in the file.
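
      To illustrate why the escape notation is safer: the bytes of a literal 'µ' in a source file depend on the code page used to save (or read) it, while an escape sequence is plain ASCII and therefore identical in every ASCII-compatible code page. The sketch uses Python rather than C#, and the variable names are made up; Python's "\xB5" escape denotes the same character U+00B5 as C#'s \xB5 in a string literal. Note that in C# the \x escape accepts one to four hex digits, so a following hex digit is swallowed into the escape ("\xB5A" is the single character U+0B5A); \u00B5 avoids that ambiguity.

```python
# A string literal containing a raw 'µ' is stored as different bytes
# depending on the code page, while an escape sequence is pure ASCII.
LITERAL_SOURCE = 'unit = "\u00b5m"'  # source text with a raw µ character
ESCAPED_SOURCE = 'unit = "\\xB5m"'   # source text with the escape spelled out

def source_bytes(source: str, codepage: str) -> bytes:
    """Encode source text the way an editor would save it in `codepage`."""
    return source.encode(codepage)

if __name__ == "__main__":
    for cp in ("cp1252", "cp850"):
        print(cp, source_bytes(LITERAL_SOURCE, cp), source_bytes(ESCAPED_SOURCE, cp))
    # The escape itself always denotes U+00B5, whatever the file's code page:
    print("\xB5" == "\u00b5")  # True
```
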
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        May 10, 2017 #3

        Greetings Grand Master Mofi,
        Thanks very much for the quick reply to "the problem with µ".
        View - Set Code Page shows "1252 (ANSI - Latin I)" (copied directly with Ctrl+C from UE), but the problem with µ still persists.
        Could you please let me know exactly in which configuration file UltraEdit stores this information?
        If it's stored as a string somewhere, maybe I can change it manually.
        Furthermore, OEM 850 does not even appear as an option in View - Set Code Page.
        Also, what would be the effect of setting the policies back to default?
        Also, what about Advanced - Define Code Page in the main menu bar?
        At what point does the code page get changed? On restart? Even for files originally encoded in OEM 850? Can different code pages be loaded for different tabs, or for different instances of UE running on different thread sets in the same Windows 7 machine/session?

        :idea: PERHAPS SOME SORT OF OVERRIDE INHERITED FROM WINDOWS 7 SETTINGS? IS OEM 850 SOME FRENCH WINDOWS 7 DEFAULT? :idea:

        Much obliged, Grand Master Mofi. I'm usually a math person, but here I need to do industrial-scale character matching/replacement, so this is the first time I'm really looking at character sets closely.

        Grand Master

          May 11, 2017 #4

          See the attached screenshot from UltraEdit v14.00, which I think you are using.

          On the left side you see the contents of a new file displayed with Windows-1252 because OEM Character Set is not enabled in menu View.

          On the right side you see the contents of the same new file displayed with OEM 850 because OEM Character Set is now enabled in menu View.

          The code page option was not used between taking the left and right screenshots; only OEM Character Set was enabled by clicking on this menu item.

          The state of OEM Character Set (disabled, which is the default, or enabled) is stored in the INI file of UltraEdit in section [Settings] as Force OEM=0 or 1 for file extension group Default. There are also Force OEM2= to Force OEM11= for possibly customized file extension groups, to enable OEM Character Set editing only for specific file types like *.bat, *.cmd and *.nfo files.

          So you could also open %APPDATA%\IDMComp\UltraEdit\uedit32.ini with Windows Notepad while UltraEdit is not running and change all Force OEM values to 0.
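
          If you prefer to script that edit instead of doing it in Notepad, a minimal sketch follows. It assumes the INI file is single-byte ANSI text (cp1252 here, an assumption) with entries of the form Force OEM=1, Force OEM2=1 and so on, as described above; it only rewrites those entries. The function names are hypothetical. As with the Notepad edit, run it only while UltraEdit is not running, and keep a backup of the INI file.

```python
import re

# Matches "Force OEM=<n>" and "Force OEM2=<n>" ... "Force OEM11=<n>" at the
# start of a line. Only the key and value are consumed, so the original line
# ending (CRLF or LF) is left untouched.
FORCE_OEM_ENTRY = re.compile(r"^(Force OEM\d*)=\d+", re.MULTILINE)

def disable_force_oem(ini_text: str) -> str:
    """Return the INI text with every Force OEM* value set to 0."""
    return FORCE_OEM_ENTRY.sub(r"\1=0", ini_text)

def patch_ini_file(path: str, encoding: str = "cp1252") -> None:
    """Rewrite the INI file in place (assumes a single-byte ANSI INI file)."""
    # newline="" preserves the file's existing line endings on read and write.
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(disable_force_oem(text))
```

          On Windows, the path from above can be expanded with os.path.expandvars(r"%APPDATA%\IDMComp\UltraEdit\uedit32.ini") and passed to patch_ini_file.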

          After opening View - Set Code Page I can see code page 850 in the list; see the attached screenshot of this dialog (made on German Windows XP). However, it does not matter which code page is set for the active file when OEM Character Set is enabled in menu View, because in this case UltraEdit uses the code page defined by Windows according to the region setting for console applications, which I think is code page 850 for France. Open a command prompt window and run the command chcp (change code page) without any parameter to display which code page Windows defines for the console according to your Region and Language settings in Windows Control Panel.

          Clicking on OEM Character Set toggles off this option, which is currently enabled in your configuration. Then the code page defined by Windows for graphical user interface applications like UltraEdit according to the Region and Language settings, which is Windows-1252, becomes active again.

          For non-Unicode files, in which text is always encoded with 1 byte per character, a text editor like UltraEdit can't reliably detect automatically which code page was used when the text was written. The exceptions are HTML and XHTML files with a charset meta tag and XML files with an encoding attribute. If such a declaration is present in the file and correctly matches the encoding/code page actually used for the file, UltraEdit can determine the code page of non-Unicode HTML/XHTML/XML files automatically.
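
          A rough sketch of how such a declaration can be picked up from the raw bytes: look for an XML encoding="..." attribute or an HTML charset=... value near the start of the file, and fall back to a default code page otherwise. This is only an illustration under simplified assumptions (ASCII-compatible encodings, declaration within the first 1024 bytes, simple regular expressions instead of a real parser); it is not UltraEdit's detection logic, and the function name is hypothetical.

```python
import re

# Simplified patterns; real HTML/XML parsers handle many more cases.
XML_DECL = re.compile(
    rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', re.IGNORECASE)
HTML_CHARSET = re.compile(
    rb'<meta[^>]*\bcharset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)

def declared_codepage(data: bytes, default: str = "cp1252") -> str:
    """Return the encoding declared in an XML/HTML header, else `default`."""
    head = data[:1024]  # assumption: the declaration appears near the top
    for pattern in (XML_DECL, HTML_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return default
```

          For a C# or plain text file no declaration is found, so the result is simply the default, which is exactly why the code page of such files has to come from a setting rather than from the file content.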

          The code page in UE v14.00, which can be set individually for each opened file, is mainly used for correctly copying/pasting text to/from the clipboard and for converting to/from Unicode. That conversion is often needed implicitly when copying/pasting text to/from the clipboard, not only when using the file conversion commands in menu File - Conversions. When changing the code page for a file in UE v14.00, the user must also select a font which supports the newly set code page.
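
          Conceptually, copying text out of a non-Unicode file means decoding its bytes with the code page set for that file into Unicode, and pasting means encoding Unicode text back into the target file's code page. The sketch below models only that conversion step with Python strings (which are Unicode); the function names are hypothetical and no real clipboard is involved.

```python
def copy_as_unicode(file_bytes: bytes, codepage: str) -> str:
    """Model 'copy': decode the file's bytes with the code page set for the file."""
    return file_bytes.decode(codepage)

def paste_from_unicode(text: str, codepage: str) -> bytes:
    """Model 'paste': encode Unicode text into the target file's code page.

    Characters the code page cannot represent raise UnicodeEncodeError here;
    an editor might instead substitute a replacement character.
    """
    return text.encode(codepage)

if __name__ == "__main__":
    oem_bytes = b"1 \xe6m"  # '1 µm' as saved with OEM code page 850
    print(copy_as_unicode(oem_bytes, "cp850"))   # correct code page: 1 µm
    print(copy_as_unicode(oem_bytes, "cp1252"))  # wrong code page:   1 æm
    # Pasting the correctly decoded text into a Windows-1252 file:
    print(paste_from_unicode(copy_as_unicode(oem_bytes, "cp850"), "cp1252"))  # b'1 \xb5m'
```

          The same bytes yield different text depending on which code page is assumed, which is why the per-file code page matters for clipboard operations.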

          Later versions of UltraEdit have improvements such as the encoding selector in the status bar at the bottom, which makes it possible to change from one text encoding to any other text encoding directly with 3 left mouse button clicks. And since UltraEdit v24.00 the user no longer needs to manually select a font suitable for the encoding/code page set for the active file, because UltraEdit does this automatically.

          The setting Advanced - Set Code Page/Locale defines the default code page used for a non-Unicode file after opening it when UltraEdit can't determine the code page from a charset or encoding declaration because the file is not an HTML/XHTML/XML file. With the default settings ("C" Default Locale/Code Page - Previously Used), the code page defined by Windows for GUI applications according to the region setting of the current user is used, unless OEM Character Set is enabled, which results in using the code page defined by Windows according to the region setting for the console. This setting makes it possible to override the default code page defined by the Windows region setting for ANSI-encoded files opened in UltraEdit.
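
          The precedence described in the last few paragraphs can be summarized as a small decision function. This is a simplified model of the behavior as described in this thread, not UltraEdit's actual code; the parameter names are invented for illustration, per-file code page history is left out, and the Windows code pages are passed in rather than queried (cp1252 and cp850 stand for the GUI and console code pages of a French Windows installation).

```python
from typing import Optional

def effective_codepage(
    oem_character_set: bool,
    declared: Optional[str],
    locale_override: Optional[str],
    windows_ansi: str = "cp1252",  # GUI code page for the region (assumed)
    windows_oem: str = "cp850",    # console code page for the region (assumed)
) -> str:
    """Pick the code page for a non-Unicode file (simplified model)."""
    if oem_character_set:
        # View - OEM Character Set enabled: the console code page is used.
        return windows_oem
    if declared:
        # HTML/XHTML charset meta tag or XML encoding attribute.
        return declared
    if locale_override:
        # Advanced - Set Code Page/Locale set to a non-default code page.
        return locale_override
    # Default: code page Windows uses for GUI applications in this region.
    return windows_ansi
```

          The first post's situation corresponds to effective_codepage(True, None, None), which yields the console code page; toggling OEM Character Set off falls through to the Windows-1252 default.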

          The setting Advanced - Set Code Page/Locale is also important when using Sort with option Use Locale enabled, because then the locale configured in the dialog opened with Advanced - Set Code Page/Locale is used. For example, with the default code page/locale settings, sorting lines which each contain a French word with accents produces the same order whether option Use Locale is checked or not, because the sort is strictly "C" based, i.e. a < á. But with code page set to 1252 and locale set to French, a == á for a sort with option Use Locale checked, resulting in the sort order native French speakers expect.
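
          The difference between the two sort modes can be approximated in Python. Sorting plain strings compares code points, like the strict "C" sort; the second function strips accents with unicodedata before comparing, so that a and á compare equal, with code point order only as a tie-breaker. Accent stripping is an approximation chosen for this example: real French collation has more rules, and this is not how UltraEdit implements Use Locale.

```python
import unicodedata

def c_sort(words: list[str]) -> list[str]:
    """Strict code point order, like a "C" sort: 'a' < 'z' < 'á'."""
    return sorted(words)

def strip_accents(word: str) -> str:
    """Remove combining marks after canonical decomposition (é -> e)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_insensitive_sort(words: list[str]) -> list[str]:
    """Approximate locale sort: compare without accents, break ties by code point."""
    return sorted(words, key=lambda w: (strip_accents(w).casefold(), w))

if __name__ == "__main__":
    words = ["zèbre", "élève", "abricot"]
    print(c_sort(words))                   # ['abricot', 'zèbre', 'élève']
    print(accent_insensitive_sort(words))  # ['abricot', 'élève', 'zèbre']
```

          In the "C" order, élève lands after zèbre because é (U+00E9) has a higher code point than z, which is exactly the kind of result a French reader would not expect.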

          Set Code Page is a per-file setting, so multiple opened files can be encoded with different code pages. But in UE < v24.00 the user has to manually choose a font which supports all code pages set for the opened files. A manually set code page for a file is remembered by UltraEdit in its INI file for the next time this file is opened.

          The button Clear History at Advanced - Configuration - Toolbars / Menus - Miscellaneous usually deletes all file/search/replace histories, including the file-based encoding/code page history. But that is not the case in UE v14.00. So to clear the file code page history when using UE < v18.10.0.1010, it is necessary to open the INI file of UltraEdit with Windows Notepad while UltraEdit is not running and delete the entire section [File Code Page].
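
          A scripted version of that manual edit could look like the sketch below. It assumes the usual INI layout in which a section runs from its [Name] header line up to the next [...] header; the function name is hypothetical. As with the Notepad edit, run it only while UltraEdit is not running and keep a backup of the INI file.

```python
def remove_ini_section(ini_text: str, section: str = "File Code Page") -> str:
    """Return `ini_text` without the named section (header and its entries)."""
    kept = []
    skipping = False
    for line in ini_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Every section header starts a new section; skip only the target.
            # INI section names are compared case-insensitively.
            skipping = stripped[1:-1].strip().lower() == section.lower()
        if not skipping:
            kept.append(line)
    return "".join(kept)
```

          Because splitlines(keepends=True) keeps each line's own ending, the CRLF line endings of a Windows INI file are preserved.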

          See also the topics:
          Attachment oem_character_set_option.png: OEM Character Set menu option of UE v14.00 and how display and editing change with this option disabled/enabled.
          Attachment set_code_page.png: Code Page Selection dialog of UE v14.00 on German Windows XP.

          Basic User

            Thanks for the incredibly comprehensive reply, oh great Mofi

            May 13, 2017 #5

            All right, Mr. Super Mofi (I really mean it: I asked for a super character guru and YOU ARE THE SUPER CHAR GURU 8O)!
            Unbelievably comprehensive and precise. Now I need to digest it all and will post back this week. Totally instructive on my quest to understand the character set zoo. :roll:

              Super Mofi nails it. No longer any problem with µ (= hex B5, as it should be in ANSI code page 1252)

              May 18, 2017 #6

              Greetings Super Mofi and other interested UE users. 8)

              So, to start from the beginning: what settings must be set (and where) to get an "Edit1" UE tab (Ctrl+N, starting from scratch, virgin) using ANSI 1252 encoding, so that I can fill it in and check whether it copies correctly via the clipboard (Ctrl+C/Ctrl+V) to MSVC#? :?:

              Well, I asked myself that question, and by using Mofi's suggestion to set "Force OEM=0" (it was Force OEM=1 in %APPDATA%\IDMComp\UltraEdit\Uedit32.ini, i.e. C:\Users\admin\AppData\Roaming\IDMComp\UltraEdit\Uedit32.ini) and starting a virgin tab, I was able to get my UE v14 working with ANSI 1252. Screenshot proof attached.

              So thanks oh incredibly great Mr. super Mofi, you saved the day once again! :mrgreen:
              Attachment B5_µ.gif: Hex proof of µ = hex B5, and of the ASCII character table matching ANSI 1252 after setting "Force OEM=0" as Mofi suggested.