Cannot display Unicode plane 1 code points

3
Newbie
3

    Jan 08, 2021 #1

    Unicode display works great for plane 0, but plane 1 code points are not displayed correctly.
    Example: U+12009 is encoded in UTF-8 as the bytes F0 92 80 89. Modern browsers display it correctly, for example when it is embedded in XML; see the attached example file.
    UltraEdit just displays the 'question mark in rectangle' symbol.
    write002.xml (65 Bytes)
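
The byte sequence quoted above can be double-checked without relying on any editor's rendering. The following is a minimal Python sketch that encodes U+12009 by hand using the four-byte UTF-8 layout and compares the result with Python's built-in encoder. The function name `utf8_encode_supplementary` is made up for illustration and is not part of any tool mentioned in this topic.

```python
def utf8_encode_supplementary(code_point: int) -> bytes:
    """Encode a code point from planes 1 to 16 (U+10000..U+10FFFF) as 4 UTF-8 bytes."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point outside the Basic Multilingual Plane")
    return bytes([
        0xF0 | (code_point >> 18),           # lead byte: 11110xxx
        0x80 | ((code_point >> 12) & 0x3F),  # continuation byte: 10xxxxxx
        0x80 | ((code_point >> 6) & 0x3F),   # continuation byte: 10xxxxxx
        0x80 | (code_point & 0x3F),          # continuation byte: 10xxxxxx
    ])


encoded = utf8_encode_supplementary(0x12009)
print(encoded.hex(" "))                          # f0 92 80 89
print(encoded == chr(0x12009).encode("utf-8"))   # True
```

Every code point above U+FFFF takes exactly four bytes in UTF-8, so a file containing such characters is still valid UTF-8 even when an editor cannot draw the glyphs.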

    6,685587
    Grand Master
    6,685587

      Jan 08, 2021 #2

      As far as I know, the currently latest version, UltraEdit v27.10.0.164, fully supports only the basic multilingual plane (UCS-2). I learned this from IDM support after asking a related question triggered by the forum topic "Why must many emojis be deleted in a Unicode encoded text file by pressing twice key BACKSPACE?" See also the topic "Why are characters in a Unicode file displayed with a rectangle or a question mark?" Modern browsers support emoji, other pictographic sets and game symbols by other means. They do not depend on a font installed in the Windows fonts folder together with the related Windows library functions, as UltraEdit does.

      So I can only suggest using a different text editor if you really want to edit XML files with characters and symbols from planes other than the basic multilingual plane and want them displayed correctly in the text editor too. UltraEdit can edit all Unicode encoded files, but that does not mean it can also correctly display all characters and symbols defined in the Unicode specification.
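
As background to the BACKSPACE topic mentioned above: in UTF-16, every code point outside the basic multilingual plane is stored as two 16-bit code units, a so-called surrogate pair. Whether this is the actual reason for the behaviour in UltraEdit is an assumption and is not confirmed anywhere in this topic. The minimal Python sketch below shows the split for U+12009; the helper name `utf16_surrogates` is made up for illustration.

```python
def utf16_surrogates(code_point: int) -> tuple:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogate."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low


high, low = utf16_surrogates(0x12009)
print(f"{high:04X} {low:04X}")                      # D808 DC09
print(chr(0x12009).encode("utf-16-be").hex(" "))    # d8 08 dc 09
```

An application that treats each 16-bit unit as one character would count such a symbol as two characters, which would be consistent with having to press BACKSPACE twice.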

      xml_display_firefox.png (1.07KiB)
      Display of the attached XML file by Firefox v84.0.2 on my Windows 7 machine.
      Best regards from an UC/UE/UES for Windows user from Austria

      3
      Newbie
      3

        Jan 09, 2021 #3

        Many thanks for the detailed answer!
        Basically, I wanted to make sure I was not missing a way to display Unicode symbols of the higher planes correctly.
        Basic editing in fact works well, even if a symbol is not displayed properly. I can live with that.

        I am writing an input scanner that has to accept the full set of UTF-8 encoded files. Some code tests would be a little bit easier to write if I could just see in UE that the symbols are correctly displayed for test files I generate.
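
One way to check generated test files without depending on an editor's font rendering is to list the higher-plane code points programmatically. The following is a minimal Python sketch, not the scanner described above. The function name `supplementary_code_points` and the sample bytes are made up for illustration, and the sample is not the content of the attached file.

```python
import unicodedata


def supplementary_code_points(data: bytes) -> list:
    """Return (character index, code point) for every character above U+FFFF."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# Hypothetical test content: one plane 1 character inside an XML element.
sample = b"<a>\xf0\x92\x80\x89</a>"

for index, cp in supplementary_code_points(sample):
    # The name lookup depends on the Unicode database bundled with Python.
    name = unicodedata.name(chr(cp), "<no name>")
    print(f"char {index}: U+{cp:04X} (plane {cp >> 16}) {name}")
```

A report like this confirms that a test file contains the intended code points even when the editor only shows a replacement symbol.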

        There are not many editors out there capable of displaying Unicode characters of the higher planes. Eclipse (my main dev environment) cannot do it either. Interestingly, MS Visual Studio Code handles it well:
        unicode.jpg (22.79KiB)

        6,685587
        Grand Master
        6,685587

          Jan 09, 2021 #4

          It is interesting that MS Visual Studio Code can display this symbol. If you ever find out which font MS Visual Studio Code uses, if it uses a font at all, please let me know. I would be very interested in that font and would install it on my computer to run some tests with it in various applications.

          3
          Newbie
          3

            Jan 10, 2021 #5

            I have no idea how they do it. The standard Character Map tool in Windows 10 lets you search for Unicode characters, but only with four hex digits, so up to something like U+FFFC (object replacement character). My guess is that VS Code mimics browser behaviour and bypasses the Windows API. Or the Windows API does support the higher Unicode planes, but the Character Map tool has not been updated yet to make use of it.