Why are characters in a Unicode file displayed with a rectangle or a question mark?

Basic User

    May 05, 2018 #1

    Thanks for the macro and script posted in "How to check for duplicate characters in a single line of text?"

    I did try the macro after reading up about it. The one I created by myself just kept deleting more and more characters.

    The other problem I noticed is that UltraEdit won't show the characters correctly. Many of the Unicode characters are showing as "squares".

    characters_display_ultraedit.jpg (44.23KiB)
    Display of the characters in UltraEdit for Windows version 22.

    Maybe it's a problem with UltraEdit 22 (the Open As option is set to Auto Detect ASCII/Unicode).
    Notepad displays them correctly (as does another text editor).

    characters_display_notepad.jpg (27.51KiB)
    Display of the characters in Windows Notepad.

    (And I can only install UltraEdit 22 on Windows XP.)

    Grand Master

      May 06, 2018 #2

      In UltraEdit for Windows version 22.xx, the configured font determines how characters are displayed in the document window. The character encoding detected by UltraEdit is shown in the status bar at the bottom. But even if UltraEdit detects the character encoding correctly, characters can be displayed incorrectly if the font configured with View - Set Font (or with View - Set HEX/Column Mode Font for hex/column editing mode) has no glyph defined for a character to display.

      You have to configure a different font like Courier New, Consolas or Arial Unicode MS, which supports many more characters from the Unicode character table than the font you currently use.
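
      To illustrate the point about missing glyphs, here is a minimal Python sketch. It assumes the font's coverage is already available as a set of code points; the set `latin_only_font` and the sample string are made up for illustration and do not describe any real font or the attached file.

```python
import unicodedata


def missing_glyphs(text, supported_codepoints):
    """Return the distinct characters of text that the font cannot draw.

    supported_codepoints is a hypothetical set of code points for which
    the configured font defines a glyph.
    """
    missing = []
    for ch in text:
        if ch.isspace():
            continue  # whitespace needs no visible glyph
        if ord(ch) not in supported_codepoints and ch not in missing:
            missing.append(ch)
    return missing


def describe(chars):
    """Format characters as 'U+XXXX NAME' strings for easier reading."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, 'UNKNOWN')}" for c in chars]


# Hypothetical coverage: a font that only knows Basic Latin and Latin-1.
latin_only_font = set(range(0x20, 0x100))

# Made-up sample: U, U with acute, U with caron, U with dot below.
sample = "U \u00da \u01d3 \u1ee4"
for line in describe(missing_glyphs(sample, latin_only_font)):
    print(line)
```

      Every character reported by such a check would show up as a rectangle or question mark in an editor that draws all text with that one font.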
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        May 07, 2018 #3

        Mofi wrote:
        "... the characters can be displayed wrong if the font configured with View - Set Font or with View - Set HEX/Column Mode Font for hex/column editing mode has no glyph defined for a character to display.

        You have to configure a different font like Courier New, Consolas or Arial Unicode MS which supports much more characters from Unicode character table than the font currently used by you."

        In Notepad, the font I use is Verdana (and the file is saved as Unicode):

        verdana_notepad.jpg (67.63KiB)
        Display in Notepad with Verdana

        I tried changing both the Font and the HEX/Column Mode Font. With Courier New, Arial Unicode & Arial Unicode MS, it didn't make any difference (apart from displaying "?" instead of "squares"). I also tried the various Unicode options (UTF-8, UTF-16LE & UTF-16BE). Again, no difference:

        courier_new_ultraedit.jpg (74.66KiB)
        Display in UltraEdit with Courier New

        verdana_ultraedit.jpg (72.34KiB)
        Display in UltraEdit with Verdana

        Since the Verdana font shows the characters correctly in Notepad, I'd have thought Verdana should work in UltraEdit too.

        If anyone wishes to try displaying these sample characters correctly, I've attached a file. The first 5 "U" characters display for me, then it goes downhill after that...
        Characters Sample.txt (319 Bytes)

        Grand Master

          May 07, 2018 #4

          Sorry, but your assumption is wrong. Verdana (verdana.ttf, version 5.05 with 191,344 bytes) supports only a very limited set of characters; even version 5.31 has just 1391 glyphs. Courier New (cour.ttf, version 5.13 with 711,092 bytes; 3458 glyphs in version 6.80) and Arial (arial.ttf, version 5.22 with 774,236 bytes; 3988 glyphs in version 6.80) support many more. Of these fonts, Arial Unicode MS (arialuni.ttf, version 1.01 with 23,275,812 bytes) supports the most Unicode characters. The number of glyphs is not specified on Microsoft's documentation page for Arial Unicode MS, but the font file size of more than 22 MiB strongly indicates that this font supports many more characters than most other fonts in the Windows fonts directory, which have a file size below 1 MiB.

          characters_display_verdana.png (7.55KiB)
          Unicode text display with UE v22.20.0.49 on having font Verdana configured.

          It can be seen with UltraEdit for Windows v22.20.0.49 running on Windows 7 SP1 that the font Verdana does not support most characters of the UTF-8 with BOM encoded file (basic status bar).

          characters_display_courier_new.png (7.84KiB)
          Unicode text display with UE v22.20.0.49 on having font Courier New configured.

          It can be seen with UltraEdit for Windows v22.20.0.49 running on Windows 7 SP1 that the font Courier New does support all characters of the UTF-8 with BOM encoded file (standard status bar).
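
          Note that the encoding itself is not the problem here: a file with a byte order mark is easy to identify. As a rough illustration only (this is not UltraEdit's actual detection logic, and the function name is made up), a BOM check for the three encodings mentioned in this topic could look like this in Python:

```python
import codecs

# BOM byte sequences for the encodings discussed in this topic.
# UTF-32 is deliberately left out of this sketch.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def encoding_from_bom(data):
    """Return the encoding name indicated by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# The first bytes of a file are enough for this check.
print(encoding_from_bom(b"\xef\xbb\xbfUnicode text"))
```

          Once the encoding is known, whether a character is drawn properly depends only on the glyphs available in the font.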

          But why are all characters displayed in Windows Notepad as expected on using font Verdana?

          Windows Notepad does not use the font Verdana for all characters. For each character not supported by Verdana, a different font that does support this character is used. That is a Windows feature available to all fully Unicode-aware applications.
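
          The idea of such a per-character font fallback can be sketched in Python as follows. This is a simplified model, not the real Windows mechanism, which is more involved; the function name and the coverage tables are hypothetical and do NOT reflect the real glyph sets of these fonts.

```python
def assign_fonts(text, preferred, fallbacks, coverage):
    """Pick a font per character: the preferred font when it has a glyph,
    otherwise the first fallback font that does, otherwise None."""
    chain = [preferred] + list(fallbacks)
    result = []
    for ch in text:
        chosen = None
        for font in chain:
            if ord(ch) in coverage.get(font, set()):
                chosen = font
                break
        result.append((ch, chosen))
    return result


# Hypothetical coverage tables, invented for this example only.
coverage = {
    "Verdana": set(range(0x20, 0x180)),
    "Arial Unicode MS": set(range(0x20, 0x3000)),
}

# U, U with macron (U+016A), U with dot below (U+1EE4)
text = "U\u016a\u1ee4"
for ch, font in assign_fonts(text, "Verdana", ["Arial Unicode MS"], coverage):
    print(f"U+{ord(ch):04X} -> {font}")
```

          An application without this fallback corresponds to an empty list of fallback fonts: every character missing in the configured font ends up with no font at all and is drawn with the font's default glyph.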

          UltraEdit for Windows has also been a fully Unicode-aware application since v24.00. So UltraEdit v25.00.0.68 with Verdana configured as the font for the document window area also displays all characters correctly, although Verdana does not have glyphs for most of the characters in the UTF-8 encoded file.

          Here is a screenshot that I made from UE v25.00.0.68 with the UTF-8 encoded file opened and with my preferred bitmap font Dina configured. Dina is available only in the font sizes 8, 9 and 10 with a very limited list of supported characters.

          characters_display_dina_arial_mixed.png (13.71KiB)
          Unicode text display with UE v25.00.0.68 on having font Dina configured.

          Dina is not installed with Windows. I downloaded and installed this bitmap font manually. I configured a much larger font size than this bitmap font supports to make the difference clearly visible. The first 8 characters are supported by Dina. For these characters, the font rendering engine of Windows takes the bitmap and enlarges it to the required size, which results in a blocky look. The other characters are displayed much better because they are drawn with glyphs of the font Arial, a vector font that scales smoothly to any font size, even unusual ones like 24.39 points.

          Internet browsers do the same nowadays. If the font defined by HTML and CSS does not support a character, a Unicode fallback font is used for this character so that it is displayed correctly nevertheless. This approach has advantages and disadvantages, as always when an application auto-corrects something for its user. A user may think that Verdana supports all Unicode characters and that it must be a bug of an application if characters are displayed with the default glyph defined inside the font for an unsupported character. Web authors do not notice that for some characters it would actually be necessary to add a SPAN element and specify a different font for the character(s) enclosed by the SPAN than the font usually used for the text.

          Basic User

            May 14, 2018 #5

            Mofi wrote:
            "But why are all characters displayed in Windows Notepad as expected on using font Verdana?

            Windows Notepad does not use font Verdana for all characters. For each character not supported by Verdana a different font is used which supports this character. That is a Windows feature available for all full Unicode aware applications."

            Right. Well that makes sense.

            I appreciate your help, but I'm stumped. Courier New just won't display the characters correctly.
            And upgrading to a new version of UltraEdit is out of the question as my OS won't support it.

            Grand Master

              May 14, 2018 #6

              The font Courier New is installed by default in version 2.90 on Windows XP with SP3 with the following files in %SystemRoot%\Fonts:
              1. cour.ttf ... Courier New Regular ... 303,296 bytes
              2. courbd.ttf ... Courier New Bold ... 312,920 bytes
              3. courbi.ttf ... Courier New Bold Italic ... 236,148 bytes
              4. couri.ttf ... Courier New Italic ... 245,032 bytes
              Version 2.90 of Courier New has the same limited character support as Verdana version 2.43, which is also installed by default on Windows XP.

              I made the screenshot of UltraEdit v22.20.0.49 on Windows 7 with SP1, which has Courier New version 5.11 installed by default. That's not the latest version of Courier New, which would be version 6.80, but version 5.11 has many more glyphs than version 2.90, as can also be seen from the sizes of the four files:
              1. cour.ttf ... Courier New Regular ... 709,600 bytes
              2. courbd.ttf ... Courier New Bold ... 710,192 bytes
              3. courbi.ttf ... Courier New Bold Italic ... 530,336 bytes
              4. couri.ttf ... Courier New Italic ... 618,240 bytes
              Courier New version 5.11 (or the newer version 5.13 as installed on another of my Windows 7 machines, or the currently newest version 6.80) can also be used on Windows XP. It is just necessary to
              • copy those four files from %SystemRoot%\Fonts on a Windows 7 / 8 / 10 machine to a temporary folder on the Windows XP machine,
              • next open Control Panel - Fonts on Windows XP,
              • select the four Courier New fonts and uninstall them by pressing the DEL key,
              • click in menu File on menu item Install New Font, browse to the temporary folder with the newer version of Courier New, select all four files and install them,
              • delete the temporary folder once the four newer Courier New files are installed in %SystemRoot%\Fonts and correctly registered in the Windows registry by Windows.
              This procedure can also be used for other fonts which are newer on Windows 7 / 8 / 10 than on Windows XP.
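
              To check which version is installed before and after this procedure, the file sizes can be compared with the sizes listed in this post. Here is a small Python sketch for that, assuming Python is available on the machine; the function name is made up, and the fallback path C:\Windows is an assumption for the case that %SystemRoot% is not set.

```python
import os

# The four Courier New font files named in this post.
COURIER_NEW_FILES = ["cour.ttf", "courbd.ttf", "courbi.ttf", "couri.ttf"]


def font_file_sizes(fonts_dir, names=COURIER_NEW_FILES):
    """Map each font file name to its size in bytes, or None if missing."""
    sizes = {}
    for name in names:
        path = os.path.join(fonts_dir, name)
        sizes[name] = os.path.getsize(path) if os.path.isfile(path) else None
    return sizes


if __name__ == "__main__":
    # %SystemRoot% is usually C:\Windows; the fallback is an assumption.
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    fonts_dir = os.path.join(system_root, "Fonts")
    for name, size in font_file_sizes(fonts_dir).items():
        print(name, "missing" if size is None else f"{size:,} bytes")
```

              A cour.ttf with 303,296 bytes would indicate the old version 2.90, while 709,600 bytes would match version 5.11.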

              I wrote this reply after verifying that Characters Sample.txt is displayed in UltraEdit v22.20.0.49 on Windows XP SP3 with Courier New version 5.11 installed the same way as on Windows 7 SP1.

              And I updated Courier New to version 5.13 on my Windows XP machine. So I have installed on my Windows XP machine now:
              1. cour.ttf ... Courier New Regular ... 711,092 bytes
              2. courbd.ttf ... Courier New Bold ... 711,444 bytes
              3. courbi.ttf ... Courier New Bold Italic ... 531,520 bytes
              4. couri.ttf ... Courier New Italic ... 619,776 bytes