Undisplayable characters in UE v14.00+4 VF from code page Windows-1252

Basic User

    Jul 31, 2019#1

    😎 Greetings to all the great people using UE the BEST editing software EVER!
    😎 Special greetings to MOFI to whom we are all (and especially UE programmers) deeply beholden.

    🤓 Check out the attached UE file.

    Interestingly, it is a 16-character by 16-line grid of all 256 possible byte values (along with their graphical representations in UE under code page 1252), written as an actual compilable C# int declaration in which the values that cannot be the first character of a C# variable name are commented out.

    The following hex byte values have no visible graphical representation in UE (code page 1252): 00 01 0C 0D 1C 1D 1E 1F 81 8F 90 A0 AF
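
For readers who want to check what each of the listed bytes is, here is a minimal Python sketch. It is not part of the attached file and not an UltraEdit feature; it only assumes Python's built-in cp1252 codec and the unicodedata module, and the helper name describe is made up for illustration.

```python
import unicodedata

# Byte values listed in the post above as having no visible glyph.
LISTED = "00 01 0C 0D 1C 1D 1E 1F 81 8F 90 A0 AF"


def describe(byte_value: int) -> tuple[str, str]:
    """Return (Unicode category, name) for one byte decoded as Windows-1252."""
    try:
        ch = bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec treats this byte as undefined.
        return ("--", "<unmapped in cp1252>")
    # Control characters have no formal Unicode name, hence the default.
    return (unicodedata.category(ch), unicodedata.name(ch, "<control>"))


for token in LISTED.split():
    category, name = describe(int(token, 16))
    print(f"{token}  {category}  {name}")
```

Category Cc marks control characters, Zs a space separator, and "--" a byte that the codec cannot map at all.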

    😓 Is there a way (ideally) to gather graphical representations from another (code page) font and get the 1252 font I'm using currently in UE to display them when the above hex byte values are encountered?
    Attachment: c#intdeclare.displayable.chcp.1252 (2.94 KiB)

    Grand Master

      Aug 01, 2019#2

      Please refer to the Wikipedia articles ASCII and Windows-1252. On both pages you can click a character to get its description. This is especially useful for characters displayed without a glyph, which happens when the font in use does not support the character at all, when the font has no glyph for it, or when the character by definition has no visible graphical representation, for example because it is a whitespace character.

      Examples:

      The NULL character with hexadecimal value 00 should never exist in a text file. On opening a file that contains one or more NULL bytes but is interpreted as a text file (opened in text edit mode rather than as a binary file in hex edit mode), UltraEdit v14.00 converts them to normal spaces with hexadecimal byte value 20. The configuration setting Allow editing of text files with HEX 00's without converting them to spaces at Advanced - Configuration - Editor - Advanced can be enabled in UltraEdit v14.00 to have a NULL merely displayed as a space without converting it to a normal space on editing the file. This setting no longer exists since UltraEdit v24.00, when UltraEdit became a fully Unicode-aware application. I have never understood why people want to edit files which are obviously binary files in text edit mode. The most likely reason is that those people do not know the difference between a binary file and a text file and know nothing about character encoding. The Unicode standard defines the Symbol for Null (U+2400) so that the NULL character can be written with a graphical symbol in a text file, a document, or an HTML file such as the Wikipedia article about ASCII, while the NULL control character itself has no symbol.
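
To illustrate the Symbol for Null idea, the following Python sketch replaces control characters with the matching symbols from the Unicode Control Pictures block (U+2400 to U+2421). This is a preprocessing step outside of UltraEdit, not something the editor does itself. The function name show_controls is hypothetical, and the result is only useful if it is saved in a Unicode encoding and viewed with a font that has glyphs for this block.

```python
def show_controls(text: str) -> str:
    """Replace C0 control characters and DEL with Unicode Control Pictures."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20:
            # U+2400 is SYMBOL FOR NULL; the block follows the ASCII order.
            out.append(chr(0x2400 + code))
        elif code == 0x7F:
            out.append("\u2421")  # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)


# Bytes 00 and 01 embedded in otherwise normal Windows-1252 text.
sample = b"A\x00B\x01C".decode("cp1252")
print(show_controls(sample))
```

Note that this simple version also replaces tabs and line terminators; a real tool would probably exclude them.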

      The character with hexadecimal value 0C is a form feed, which is used on printing text to make a page break. It has no graphical symbol because this control character also belongs to the group of whitespace characters. The menu View contains the option Show Page Breaks as Lines, which displays a horizontal line from the left to the right edge of the document window for each form feed in a text file, so that pagination is also visible on viewing or editing text files with form feeds.

      The character with hexadecimal value 0D is a carriage return. This is a vertical whitespace character according to Unicode and should not exist at all in a text file except once at the end of a line. Some fonts display this character with the glyph used for characters that the font supports but that have no graphical representation, which is often a rectangle symbol. But it is also valid for a font to represent this control character, when an application does not interpret it as a newline character, with nothing, i.e. with a white area of width 0, which means the character is not displayed at all.

      The hexadecimal values 81, 8F and 90 are not mapped to characters in Windows-1252. It is invalid to have bytes with the decimal values 129, 143 and 144 in a text file encoded with Windows-1252. So no font supporting Windows-1252 will ever display a graphical symbol for these invalid bytes, as there are no characters mapped to them.
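
The unmapped bytes can be found by trying to decode every byte value. The sketch below relies only on Python's built-in cp1252 codec, which raises an error for bytes it considers undefined; the function name unmapped_cp1252_bytes is made up for this example. With that codec the result contains 81, 8F and 90 and additionally 8D and 9D, which the same table also leaves undefined.

```python
def unmapped_cp1252_bytes() -> list[int]:
    """Return all byte values that Python's cp1252 codec cannot decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(value)
    return missing


print(" ".join(f"{value:02X}" for value in unmapped_cp1252_bytes()))
```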

      The character with hexadecimal value A0 is the no-break space. So it is absolutely right to display it as a space, because by definition of the Unicode Consortium it is a space character with the same width as a normal space. The difference is that no automatic line break is allowed between a word character and a no-break space or between a no-break space and a word character. I often use no-break spaces in HTML as well as in Microsoft Word documents, for example between a number and a word or unit like 10 years or 20 s.
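
Because a no-break space looks exactly like a normal space, it can help to locate it programmatically. The following Python sketch decodes a short Windows-1252 byte string and reports where byte A0 occurs; the helper find_nbsp is a hypothetical name, not an UltraEdit function.

```python
import unicodedata


def find_nbsp(text: str) -> list[int]:
    """Return the indices of all no-break spaces (U+00A0) in text."""
    return [i for i, ch in enumerate(text) if ch == "\u00a0"]


# Byte A0 sits between the number and the word, as in the example above.
text = b"10\xa0years".decode("cp1252")
nbsp = text[2]

print(f"U+{ord(nbsp):04X}", unicodedata.name(nbsp))  # code point and formal name
print(nbsp.isspace(), nbsp == " ")  # whitespace, yet not equal to a normal space
print(find_nbsp(text))
```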

      UltraEdit has no feature to display characters with a specific value using a different character or a different font, with the following exceptions:
      • form feed, which can be displayed as a horizontal line depending on View - Show Page Breaks as Lines,
      • line feed at the end of a line in a text file with UNIX line endings (no carriage returns, just line feeds), displayed as ¬ when View - Show Line Endings is enabled and code page Windows-1252, UTF-8 or UTF-16 is used,
      • carriage return at the end of a line in a text file with MAC line endings (no line feeds, just carriage returns), displayed as ± when View - Show Line Endings is enabled and code page Windows-1252, UTF-8 or UTF-16 is used,
      • carriage return plus line feed at the end of a line in a text file with DOS/Windows line endings, displayed as ¶ when View - Show Line Endings is enabled and code page Windows-1252, UTF-8 or UTF-16 is used,
      • horizontal tab, displayed as » when View - Show Spaces/Tabs is enabled and code page Windows-1252, UTF-8 or UTF-16 is used,
      • normal space, displayed as · when View - Show Spaces/Tabs is enabled and code page Windows-1252, UTF-8 or UTF-16 is used.
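
The marker substitutions listed above can be imitated outside the editor. The Python sketch below is only an emulation of that display for illustration and says nothing about how UltraEdit implements it; the names show_whitespace, _PATTERN and _MARKERS are made up.

```python
import re

# Longest alternative first, so CR+LF is matched as one line ending.
_PATTERN = re.compile(r"\r\n|\r|\n|\t| ")
_MARKERS = {"\r\n": "¶", "\n": "¬", "\r": "±", "\t": "»", " ": "·"}


def show_whitespace(text: str) -> str:
    """Replace spaces, tabs and line endings with visible markers."""

    def repl(match: re.Match) -> str:
        token = match.group(0)
        marker = _MARKERS[token]
        # Emit a real newline after a line-ending marker so lines still break.
        return marker + "\n" if token in ("\r\n", "\r", "\n") else marker

    return _PATTERN.sub(repl, text)


print(show_whitespace("a b\tc\r\nd\ne\rf"))
```

When reading a file for this purpose, open it with newline="" so that Python does not translate the line endings before they can be inspected.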
      I have added the characters en dash, em dash and no-break space to a special color group in the syntax highlighting wordfile for HTML files and configured a separate foreground and background color for this color group, so that they are displayed visually different from a hyphen (dash) and a normal space in HTML files containing text copied, for example, from a Microsoft Word document. The wordfile html.uew installed with UltraEdit v26.10 also contains these three characters in the color group Dashes and NBSP, and its /Delimiters = line contains them as well, so that they are always displayed with the specially configured colors in an HTML file.

      See also: Why are characters in a Unicode file displayed with a rectangle or a question mark?
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        Aug 02, 2019#3

        😎 Thank you MOFI for the once again incredibly complete and precise indications on the question of font mapping in UE.
        🤓 To come clean, I am currently working on a pre-compile C# shorthand language where for example '®' would shortcut to the C# FOR statement.
        As such, every character I can get my hands on becomes vital.
        😓 So, to insist, is there really absolutely no possible way to hack the UE 1252 v14 VF font to display at least 0x81, 0x8F, 0x90 which, since unmapped by Windows (and UE), seem to me to be categorizable as "no harm no foul"? UE must be holding this map somewhere (since it easily changes font size), and modifying binary code in my opinion is just this side of pizza and a ball game in terms of a good time. If you can point me in the right direction, it would be my pleasure to keep you informed.

        Grand Master

          Aug 02, 2019#4

          You can of course use a different font for text display in UltraEdit. And you can of course create your own font, for example by editing a standard font like Courier New or Consolas using a font editor. But UltraEdit has no feature to display a specific character (or byte), depending on its code value, with a specific different character.

          I suggest clicking on View - Set Font and selecting Wingdings instead of Courier New. You can then see the effect on text display of using a font not designed for text. Press Alt+C to switch to column mode, which uses the font defined with View - Set HEX/Column Mode Font, and the text is readable again. Press Alt+C once more to disable column mode and the text is again "encrypted". Finally, open View - Set Font and select the previously used font instead of Wingdings.

          So what you need is a font which displays all bytes the way you want, and this can most likely be achieved only by creating your own font for the 256 characters of code page Windows-1252. There are freeware font editors available on the web and they are not difficult to use.