The hex edit mode shows the binary bytes, not the characters, of a file. The ASCII representation of the bytes uses the code page configured for UltraEdit, which is by default the ANSI code page defined by Windows according to the region (country) configured for the user account. You will never see the bytes in hex edit mode interpreted as Unicode according to UTF-7, UTF-8 or UTF-16. The hex edit mode displays the bytes of any type of file and not the characters of a text file. Please read the introductory chapters of the power tip page Unicode text and Unicode files in UltraEdit to get a better understanding of character encoding.
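To illustrate the difference between bytes and characters outside of UltraEdit, here is a minimal Python sketch. The sample text and the helper name `hex_dump` are my own assumptions for illustration. It shows that the same characters are stored as different byte sequences depending on the encoding, and those byte sequences are what a hex view displays.

```python
# Hypothetical sample text: inverted exclamation mark followed by "Hola!".
text = "\u00a1Hola!"

def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in data)

# The same six characters stored with three different encodings.
cp1252_hex = hex_dump(text.encode("cp1252"))
utf8_hex = hex_dump(text.encode("utf-8"))
utf16le_hex = hex_dump(text.encode("utf-16-le"))

print("Windows-1252:", cp1252_hex)
print("UTF-8:       ", utf8_hex)
print("UTF-16 LE:   ", utf16le_hex)
```

The number of characters is always six, but the number of bytes is 6, 7 and 12.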
Viewing a file with any character encoding is very easy with UltraEdit. The status bar at the bottom of the main application window has contained an encoding selector since UltraEdit for Windows v19.00. Since UltraEdit for Windows v24.10, the bytes of the currently displayed file can be interpreted with a different encoding than the one selected automatically on opening the file, which helps when the automatic encoding detection was not correct for the file. In older versions, selecting a different encoding on the status bar could convert the file to the selected encoding instead of just displaying the bytes of the current file according to the selected encoding. I know from IDM support that this change in encoding selector behavior was made after a good deal of internal discussion prompted by messages from users who didn't actually want to convert their files, but simply wanted to change which encoding was used to display the file in certain cases.
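The difference between reinterpreting and converting can be sketched in Python. This is only an illustration of the concept, not of how UltraEdit implements it, and the byte values are an assumed example.

```python
# Assumed example: the bytes of a small file as stored on disk.
raw = bytes([0xA1, 0x48, 0x6F, 0x6C, 0x61, 0x21])

# Reinterpreting: the same bytes are decoded with different code pages.
# Nothing is written, so the stored bytes stay unchanged.
as_cp1252 = raw.decode("cp1252")
as_cp1250 = raw.decode("cp1250")

# Converting: decode with the source encoding, then encode with the
# target encoding. The resulting byte sequence differs from the original.
converted = raw.decode("cp1252").encode("utf-8")

print("first character as Windows-1252: U+%04X" % ord(as_cp1252[0]))
print("first character as Windows-1250: U+%04X" % ord(as_cp1250[0]))
print("bytes before:", raw.hex(" "), "| bytes after conversion:", converted.hex(" "))
```

Only the conversion changes the file content. Reinterpreting just changes which characters are shown for the same bytes.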
The currently used font must support the selected encoding as well, i.e. it must have glyphs defined for the characters of the selected encoding. Most fonts support only the characters of a few code pages. Since v24.00, UltraEdit for Windows automatically chooses a different font for a character not supported by the configured font if the text file is Unicode encoded. That can result in a caret positioning issue because the character widths are always taken from the configured font. So if a different font is used for just some characters in a line and that alternate font defines a different width for those characters, the caret position can be wrong. Internet browsers do the same when displaying Unicode text in which some characters are not supported by the font defined by the web page creator or the user. But Internet browsers don't show a caret at all, and so most users don't notice that some characters are displayed using a different font.
UltraEdit informs the user if the configured font must be changed to support the different code page or encoding. For example, UltraEdit shows the warning when the interpretation of the bytes of a text file is changed from Windows-1252, displayed with a font with script Western selected, to Windows-1250, for which the font must be changed to script Central Europe, if the font supports that code page at all.
The appropriate conversion command can be used after selecting the correct encoding for the currently displayed text file. Alternatively, the command Save as can be used. It has an encoding option to convert the file on saving to UTF-8 with or without a byte order mark (BOM), to UTF-16 Little Endian or Big Endian with or without BOM, or to ANSI according to the ANSI code page defined in UltraEdit for ANSI encoded text files.
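What these save variants mean at byte level can be sketched in Python. The function name `encode_variants`, the sample text and the choice of Windows-1252 as ANSI code page are assumptions for illustration. The code only mimics the result of the encoding options and has nothing to do with UltraEdit's own implementation.

```python
import codecs

# Hypothetical sample text: "¿Qué?" written with escapes.
text = "\u00bfQu\u00e9?"

def encode_variants(s: str) -> dict:
    """Encode s in the variants listed above; a BOM is prepended explicitly."""
    return {
        "UTF-8 without BOM": s.encode("utf-8"),
        "UTF-8 with BOM": codecs.BOM_UTF8 + s.encode("utf-8"),
        "UTF-16 LE without BOM": s.encode("utf-16-le"),
        "UTF-16 LE with BOM": codecs.BOM_UTF16_LE + s.encode("utf-16-le"),
        "UTF-16 BE without BOM": s.encode("utf-16-be"),
        "UTF-16 BE with BOM": codecs.BOM_UTF16_BE + s.encode("utf-16-be"),
        # Assumption: the ANSI code page is Windows-1252.
        "ANSI (Windows-1252 assumed)": s.encode("cp1252"),
    }

variants = encode_variants(text)
for name, data in variants.items():
    print(f"{name:28} {data.hex(' ').upper()}")
```

The BOM is nothing more than two or three extra bytes at the beginning of the file: EF BB BF for UTF-8, FF FE for UTF-16 Little Endian and FE FF for UTF-16 Big Endian.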
Your example text block is definitely not encoded using code page Windows-1250. Please look at the Wikipedia article about code page Windows-1250. The character ¡ is not available in code page Windows-1250. The inverted exclamation mark is available in code page Windows-1252 with hexadecimal code value A1 and has the Unicode code value 00A1. The character ¿ is also not available in code page Windows-1250, while the inverted question mark is available in code page Windows-1252 with hexadecimal code value BF and has the Unicode code value 00BF.
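Whether a character is available in a code page can also be checked quickly with a few lines of Python instead of looking at the code page table. This is a minimal sketch which relies on the `cp1250` and `cp1252` codecs shipped with Python. The helper name `available_in` is my own.

```python
def available_in(char: str, codec: str) -> bool:
    """Return True if char can be encoded with the given code page."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# U+00A1 is the inverted exclamation mark, U+00BF the inverted question mark.
for code_point in (0x00A1, 0x00BF):
    char = chr(code_point)
    for codec in ("cp1250", "cp1252"):
        print(f"U+{code_point:04X} in {codec}: {available_in(char, codec)}")

# Byte values of both characters in code page Windows-1252.
byte_a1 = "\u00a1".encode("cp1252")
byte_bf = "\u00bf".encode("cp1252")
```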
So you made a mistake. You thought the text is ANSI encoded with code page Windows-1250, but it is in fact encoded with code page Windows-1252. That is why the characters are displayed wrong after the bytes of the file, interpreted according to Windows-1250, are converted to Unicode with UTF-8 encoding. The byte with code value A1 is not converted to the Unicode character with code value 00A1 as you expected, but to the Unicode character with code value 02C7 (caron) according to code page Windows-1250.
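The wrong conversion can be reproduced with a short Python sketch. It uses just the two bytes A1 and BF as an assumed stand-in for your file and Python's own codecs, so it shows the principle and not what UltraEdit does internally.

```python
# Assumed stand-in for the file: the two bytes A1 and BF.
raw = bytes([0xA1, 0xBF])

# Wrong assumption: bytes are Windows-1250 encoded.
wrong_text = raw.decode("cp1250")
wrong_utf8 = wrong_text.encode("utf-8")

# Correct assumption: bytes are Windows-1252 encoded.
right_text = raw.decode("cp1252")
right_utf8 = right_text.encode("utf-8")

for label, txt, data in (("Windows-1250", wrong_text, wrong_utf8),
                         ("Windows-1252", right_text, right_utf8)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in txt)
    print(f"{label}: {code_points} -> UTF-8 bytes {data.hex(' ').upper()}")

# A wrongly converted file can be repaired as long as every byte had a
# mapping in Windows-1250: undo the wrong step, then decode correctly.
repaired = wrong_utf8.decode("utf-8").encode("cp1250").decode("cp1252")
```

With the wrong code page the byte A1 becomes U+02C7 and is stored in UTF-8 as CB 87, while with the correct code page it becomes U+00A1 and is stored as C2 A1.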
If you see the characters ¡ and ¿ displayed in the document window of UltraEdit although code page Windows-1250 is selected, you have most likely ignored the warning of UltraEdit that the font must be modified, i.e. the script must be changed from Western to Central Europe. In that case the configured font still displays the bytes as Windows-1252 encoded and not as Windows-1250 encoded.