ASCII table not displaying some characters

Hugh · Post #1 · Jun 25, 2019, 02:32 UTC

Character mappings are not shown for values 0x7F through 0x9F. See screen images below. Is there a setting I need to change?

Mofi · Post #2 · Jun 25, 2019, 06:13 UTC

Which code page is active for the current file when you open the ASCII table view?

If you are using code page ISO 8859-1, no characters can be shown for the code value range 7F to 9F, because this code page, like ISO 8859-2 and the other ISO/IEC standardized code pages, defines no printable characters in this range.

The code page Windows-1252, which is not an ISO/IEC standard, and the other Windows code pages defined by Microsoft (which are also supported on operating systems other than Windows) do define characters in that code value range, with a few gaps.
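To illustrate the difference, here is a small Python sketch (not UltraEdit code) that decodes a few single bytes from the range 7F to 9F with Python's built-in codecs "latin-1" (ISO 8859-1) and "cp1252" (Windows-1252). The helper name `describe` is made up for this example. Under ISO 8859-1 every byte in this range maps to a control character, while Windows-1252 maps most of them to printable characters and leaves a few (for example 0x81) undefined.

```python
import unicodedata

def describe(byte_value: int, codec: str) -> str:
    """Describe what a single byte decodes to in the given code page (hypothetical helper)."""
    try:
        ch = bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        # Python's cp1252 codec rejects the bytes Windows-1252 leaves unassigned.
        return "undefined in this code page"
    # General category "Cc" marks a control character, which has no printable glyph.
    if unicodedata.category(ch) == "Cc":
        return f"U+{ord(ch):04X} (control character)"
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for b in (0x7F, 0x80, 0x81, 0x85, 0x9F):
    print(f"0x{b:02X}  latin-1: {describe(b, 'latin-1'):<30} cp1252: {describe(b, 'cp1252')}")
```

Byte 0x7F is the control character DELETE in both code pages; the visible differences start at 0x80.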

Note: "ASCII table" is not really the right term, because ASCII defines characters only for the code value range 00 to 7F. The correct term would be "single-byte encoded character table according to the current code page". But that is rather long and probably not understandable for users who know nothing about character encoding, which is the case for the vast majority of UltraEdit users.

Hugh · Post #3 · Jun 26, 2019, 02:53 UTC

Default encoding for open files = UTF-8.
Default encoding for new files = UTF-8.

On 25 Jun 2019 at 02:13 Mofi wrote:

Note: ASCII table is not really the right term…

Exactly what I was thinking as I wrote my original post on this topic.

Mofi · Post #4 · Jun 26, 2019, 05:59 UTC

And which code page is configured for non-Unicode encoded files? That setting is what the "ASCII" table uses. As far as I know, Unicode is not used for the "ASCII" table.

Even if Unicode were used for the "ASCII" table, the code point 007F would be the control character DELETE, and the code points 0080 to 009F are assigned to other control characters, as listed for example on the Wikipedia page Unicode/Character reference/0000-0FFF. Most fonts define no glyphs for these control characters, most likely because these control characters should never occur in text files.
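This can be checked against the Unicode Character Database bundled with Python. The sketch below assumes nothing beyond the standard `unicodedata` module: every code point from U+007F to U+009F has the general category "Cc" (control), and none of them has a character name that `unicodedata.name()` can return.

```python
import unicodedata

# All 33 code points from U+007F (DELETE) to U+009F are control characters.
control_range = range(0x7F, 0xA0)
categories = {unicodedata.category(chr(cp)) for cp in control_range}
print(categories)  # {'Cc'}

# Control characters have no character name in the database,
# so the supplied default is returned instead.
print(unicodedata.name(chr(0x85), "<no name>"))  # <no name>
```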

Hugh · Post #5 · Jun 26, 2019, 20:40 UTC

I may be misunderstanding your question, but there appears to be no option for setting the encoding of non-Unicode encoded files. The default encoding option actually says "Default encoding for open files (used when auto-detection fails)", and my experience is that Linux/Unix systems are quite good at figuring out file types from the file content, which I assume is what auto-detection refers to.

I'm not sure I'm asking for Unicode to be used in this "ASCII" table. However, when I double-click the 0x85 value, a horizontal ellipsis character ("…") appears. The underlying bytes inserted are 0xE2 0x80 0xA6. This is exactly the behavior I want. I would just like that ellipsis character to be displayed in the so-called ASCII table.

Mofi · Post #6 · Jun 27, 2019, 05:19 UTC

An ellipsis character encoded as a single byte with hexadecimal value 85 means that code page Windows-1252 is in use, not UTF-8 or ISO 8859-1. So it looks like Windows-1252 is used for the characters in the hexadecimal range 7F to 9F even though the file is UTF-8 encoded. As a result, the ellipsis character with Unicode code point 2026 is inserted and finally stored as the three bytes E2 80 A6.

I agree that if double-clicking the character with hexadecimal value 85 inserts an ellipsis character, this character should also be displayed in the ASCII table. Please report this UEX issue to IDM support by email: the ASCII table should display the characters that are inserted into the file on double-click.
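The conversion chain described above can be reproduced in a few lines of Python. This is only a sketch of the assumed behavior (byte interpreted as Windows-1252, character then written to a UTF-8 file), not how UltraEdit implements it.

```python
# Interpret the single byte 0x85 as Windows-1252, as the table appears to do.
single_byte = bytes([0x85])
ch = single_byte.decode("cp1252")

# Store the resulting character the way a UTF-8 encoded file would.
utf8_bytes = ch.encode("utf-8")

print(f"U+{ord(ch):04X}")           # U+2026
print(utf8_bytes.hex(" ").upper())  # E2 80 A6
```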

In UltraEdit for Windows, the ASCII table does display the ellipsis character. The reason is that the configuration setting Default code page (for ANSI encoding) is set to 1252 (ANSI - Latin I). UltraEdit selected this code page as Default code page (for ANSI encoding) after installation because I have configured the country Austria for my user account in Windows, which causes Windows to select Windows-1252 as the code page for non-Unicode text. UltraEdit for Windows loaded this Windows configuration on first start and used it for this configuration option. The user is free to configure a code page other than the one Windows sets for non-Unicode text based on the country configured for the user account. I don't know whether Linux also has a kernel function like Windows to find out which code page is the default for non-Unicode text according to the country and language settings of the user account.

UltraEdit for Windows also has the setting Automatically detect code page for ANSI files. Automatic code page detection for ANSI files works only for HTML, XHTML, and XML files with a valid charset declaration (HTML, XHTML) or encoding declaration (XML) at the top of the file. For all other non-Unicode files, it is impossible for an application to reliably determine which code page the characters are encoded with, because that would require the application to really understand the meaning of the text, find out which language the text is written in, and know which code page is usually used for that language.

If I received text in a foreign language I don't know, I could at best guess the code page, but I would not really know it until somebody told me which language the text in the file is written in. And even knowing the language of the text, there is still a problem: Russian text, for example, could be encoded using Windows-1251, OEM 866, ISO 8859-5, or KOI8-R. It would be impossible for me to find out which code page the Russian text in the file uses, because I don't know a single Russian word.
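As a rough illustration of why declaration-based detection is possible for HTML/XML but not for plain text, here is a hypothetical helper that looks for an XML `encoding="…"` declaration or an HTML `charset` attribute near the start of a file. It is a simplified sketch (it ignores byte order marks, UTF-16 files, and many edge cases) and is not the detection UltraEdit uses. For a plain text file it simply finds nothing and returns `None`.

```python
import re
from typing import Optional

# Matches <?xml ... encoding="name"?> or <meta ... charset=name> (case-insensitive).
DECLARATION = re.compile(
    rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
    rb'|<meta[^>]*\bcharset\s*=\s*["\']?([A-Za-z0-9._-]+)',
    re.IGNORECASE,
)

def declared_encoding(data: bytes, limit: int = 1024) -> Optional[str]:
    """Return the declared encoding name, or None if the file declares none (hypothetical helper)."""
    match = DECLARATION.search(data[:limit])
    if match is None:
        return None
    # Only one of the two alternatives can have matched.
    return (match.group(1) or match.group(2)).decode("ascii")

print(declared_encoding(b'<?xml version="1.0" encoding="windows-1252"?><root/>'))
print(declared_encoding(b"Plain text without any declaration"))
```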
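The Russian example can be demonstrated with Python's built-in codecs. The sketch below encodes the same word with the four code pages mentioned, producing four different byte sequences, and then decodes the Windows-1251 bytes as KOI8-R. That decoding does not fail; it silently yields different text, so the bytes alone do not reveal which code page is correct.

```python
word = "Привет"  # Russian for "Hello"
codepages = ("cp1251", "cp866", "iso8859_5", "koi8_r")

# The same six letters become a different byte sequence in each code page.
encoded = {codec: word.encode(codec) for codec in codepages}
for codec, data in encoded.items():
    print(f"{codec:<10} {data.hex(' ').upper()}")

# Reading Windows-1251 bytes as KOI8-R raises no error, it just gives wrong text.
misread = encoded["cp1251"].decode("koi8_r")
print(misread)
```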