Users of UltraEdit v17.20 or v17.30 or UEStudio v11.20 should first read "Font message when opening file (fixed)" if this message is displayed only when using the File - Open dialog.
Open Advanced - Configuration and select File Handling - Code Page Detection in the tree. Click the Help button in this configuration dialog for details about the settings there.
With the first setting enabled, the second setting is responsible for displaying the message:
In order to view the document correctly, you may have to change your font and/or script settings in the Font Dialog.
The auto code page detection feature searches HTML, XHTML and XML files for a charset= or encoding= declaration respectively, and displays the message prompt if the declared character set (code page) / encoding requires a different font or script to display the content of the file correctly.
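To illustrate what searching for such a declaration means, here is a minimal Python sketch. The regular expressions and the function name declared_encoding are my own assumptions for illustration and are certainly not what UltraEdit uses internally.

```python
import re

# Patterns for the two declaration styles named in the post. These are
# illustrative assumptions, not the expressions UltraEdit itself uses.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)
ENCODING_RE = re.compile(rb"encoding\s*=\s*[\"']([A-Za-z0-9._:-]+)[\"']", re.IGNORECASE)


def declared_encoding(head: bytes):
    """Return the declared charset/encoding name in lower case, or None."""
    for pattern in (ENCODING_RE, CHARSET_RE):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


html = b'<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">'
xml = b'<?xml version="1.0" encoding="utf-8"?>'
print(declared_encoding(html))
print(declared_encoding(xml))
print(declared_encoding(b"plain text"))
```

The sketch only reads the declaration. Whether the bytes of the file really match the declared encoding is a separate question.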
For example, the line
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
in the head of an HTML file means that the file is encoded (hopefully really) with 1 byte per character using the code page ISO/IEC 8859-1, which is very similar (but not identical) to Windows-1252, the default code page for Western European and North American countries on Windows.
Therefore it is necessary to select a font supporting the script Western in the dialogs opened by Set Font (normal text editing, for which proportional fonts can also be used) and Set HEX/Column Mode Font (hex and column mode editing, which is only possible with a fixed-width font) in menu View.
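The "very similar (but not identical)" remark about ISO/IEC 8859-1 and Windows-1252 can be checked with a few lines of Python. This is only a sketch using Python's built-in latin-1 and cp1252 codecs; it has nothing to do with how UltraEdit handles these code pages.

```python
# Bytes 0x80-0x9F are C1 control codes in ISO/IEC 8859-1 but mostly printable
# characters in Windows-1252; every other byte value maps identically.
differing = [
    b for b in range(256)
    if bytes([b]).decode("latin-1") != bytes([b]).decode("cp1252", errors="replace")
]
print(f"{differing[0]:#04x}..{differing[-1]:#04x}")

euro = b"\x80"
print(repr(euro.decode("cp1252")))   # the euro sign in Windows-1252
print(repr(euro.decode("latin-1")))  # the control character U+0080 in ISO/IEC 8859-1
```

So a file declared as iso-8859-1 that contains, for example, a euro sign or typographic quotes stored as single bytes is in practice a Windows-1252 file.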
<?xml version="1.0" encoding="utf-8"?>
at the top of an XML file means that the file is a Unicode file encoded with UTF-8 (hopefully really). In this case the status bar at the bottom should display UTF-8 or U8-, indicating Unicode editing with UTF-8 as the encoding for storage.
In this case the font script (code page) does not matter. The font used just has to support Unicode. As far as I know, there is no font which supports all characters defined in the Unicode standard. Courier New and Arial support many Unicode characters commonly used in Western European and North American countries. The font Arial Unicode supports a much wider range of Unicode characters. That can be seen by comparing the file size of ARIALUNI.TTF with that of ARIAL.TTF in folder C:\Windows\Fonts: 22730 KB versus 358 KB, a ratio of about 63.5 : 1, is a strong indication that Arial Unicode contains many more characters (glyphs) than Arial.
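The "hopefully really" above is meant seriously: a declaration can claim UTF-8 while the bytes were saved with a single-byte code page. The following Python sketch shows one way to test that. The helper name is_valid_utf8 is made up for this example, and a successful check is only a strong hint, not a proof.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "Größe"
as_utf8 = text.encode("utf-8")      # ö and ß take 2 bytes each
as_latin1 = text.encode("latin-1")  # every character takes 1 byte

print(len(text), len(as_utf8), len(as_latin1))
print(is_valid_utf8(as_utf8), is_valid_utf8(as_latin1))
```

The same five characters need seven bytes in UTF-8 but only five in a single-byte code page, and the single-byte version fails the UTF-8 check.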
<meta http-equiv="content-type" content="text/html; charset=iso-8859-5" />
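in an XHTML file requires the code page 28595 (ISO 8859-5) for Cyrillic text encoded with just 1 byte per character (= non Unicode). Windows-1251 is the other common single-byte Cyrillic code page, but it assigns the letters to different byte values and is therefore the right choice only if the file was really saved with it. In both cases a font supporting Cyrillic characters is needed, so such a font and the script Cyrillic must be selected in the Set Font dialog.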
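ISO 8859-5 (Windows code page 28595) and Windows-1251 both cover Cyrillic, but they are not byte-compatible, so it matters which of the two the file was really saved with. A short Python sketch with the built-in iso8859_5 and cp1251 codecs (the sample word is of course just an example):

```python
text = "Привет"
iso = text.encode("iso8859_5")  # ISO 8859-5, code page 28595 on Windows
win = text.encode("cp1251")     # Windows-1251

print(iso.hex(" "))
print(win.hex(" "))

# Reading ISO 8859-5 bytes with the Windows-1251 code page gives wrong letters.
misread = iso.decode("cp1251", errors="replace")
print(misread == text)
```

Both byte sequences need the script Cyrillic in the font dialog, but only the matching code page converts the bytes to the intended characters.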
For more information about text encoding, see the power tip "Working with Unicode in UltraEdit/UEStudio", which explains what a character set / code page / encoding is and why it is important to know this when editing text files.
In UltraEdit, the menu Advanced contains Set Code Page/Locale. The defaults are:
- For system installed code pages: "C" Default Locale/Code Page - Previously Used
- For system installed locales: "C" Default Locale/Code Page - Previously Used
This means that all new and all opened non-Unicode files are by default interpreted with the code page defined in the region and language settings of Windows, with no locale selected (pure ASCII sort, as is the default in applications written in C/C++).
The code page is important for the conversion of text from/to Unicode or other code pages, for example when converting a file from/to Unicode or when pasting text from the clipboard.
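What such a conversion on paste amounts to can be sketched in Python. The clipboard content here is a hypothetical example and the codecs are Python's, not UltraEdit's, but the principle is the same: Unicode text must be mapped to single bytes of the active code page, and characters the code page does not contain get lost.

```python
clipboard_text = "Rīga"  # hypothetical clipboard content with a Baltic letter

# Windows-1257 (Baltic) contains the letter, Windows-1252 (Western) does not.
baltic = clipboard_text.encode("cp1257")
western = clipboard_text.encode("cp1252", errors="replace")

print(baltic.decode("cp1257"))
print(western.decode("cp1252"))
```

With the Baltic code page the text survives the round trip. With the Western code page the unsupported letter is replaced by a question mark.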
The locale is important when running a sort with the option Use locale (slower) enabled, which for German text, for example, results in ä = a, Ä = A, ö = o, Ö = O, ü = u, Ü = U and ß = ss.
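The difference between a pure code-value sort and a locale-aware sort can be approximated in Python as follows. The folding table is hand-written from the German equivalences listed above and the key also ignores case. Real locale collation has more rules than this, so take it as a rough sketch only.

```python
# Hand-written folding table for the German example; this is an assumption
# for illustration, not the collation Windows or UltraEdit actually applies.
GERMAN_FOLD = str.maketrans({
    "ä": "a", "Ä": "A", "ö": "o", "Ö": "O", "ü": "u", "Ü": "U", "ß": "ss",
})


def german_key(word: str) -> str:
    """Sort key that treats umlauts like their base letters and ß like ss."""
    return word.translate(GERMAN_FOLD).lower()


words = ["Zug", "Ärger", "Apfel", "Öl", "Ofen", "Straße", "Strand"]
plain = sorted(words)                    # ordered by character code only
folded = sorted(words, key=german_key)   # umlauts sorted next to base letters

print(plain)
print(folded)
```

In the plain sort, Ärger and Öl end up after Zug because their first characters have higher code values than any ASCII letter. With the folding key they sort next to Apfel and Ofen.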
These default code page and locale settings should not be changed unless there is a real reason to do so, for example if a French user in France with a French Windows and France - French selected in the region and language settings of Windows works in UltraEdit mainly on non-Unicode Baltic text files for whatever reason.
Selecting "System Default" Locale/Code Page in both lists of the Select Code Page dialog, closing the dialog with the OK button and re-opening it makes the dialog display what is currently configured in the Windows regional and language settings. This is then set permanently for editing non-Unicode files in UltraEdit. This is important if a locale-aware sort (= language-specific sort) should be done on a selection or an entire file.
UltraEdit remembers in uedit32.ini a code page set manually for the active file via View - Set Code Page and sets this code page again if the file is re-opened later in the same or a different instance of UltraEdit. This can result in the message about selecting a font and script matching the code page of the file being displayed again.
If a code page that does not match the system code page (= as defined in the Windows region and language settings) was set for a file by mistake, with an appropriate font script selected in the Set Font dialog, it is necessary to close all files, open Advanced - Configuration - Toolbars / Menus - Miscellaneous, click the Clear button to clear ALL histories including the per-file code page history, and exit the configuration dialog with Cancel.
Encoding type list item in status bar
The encoding type list item in the non-basic status bar is available since UE v19.00 and UES v13.00. Selecting a different code page for a non-Unicode text file is like executing View - Set Code Page for this single-byte-per-character encoded text file. Nothing changes on the display, nor is the file modified in any way. Only the code page used for character conversion is changed.
But the encoding type list item in the status bar also contains 4 Unicode options. Selecting one of these four Unicode options results in a real and immediately executed conversion of the entire file to the selected Unicode encoding, just like using the appropriate command from File - Conversions or from the context menu of the file tab of the active file. For HTML, XHTML and XML files, the charset/encoding declaration in the head must be adapted after changing the Unicode encoding.
Likewise, if the active file is currently a Unicode file, indicated by UTF-8, UTF-16, UTF-16BE or UESC in the status bar, and a code page outside the Unicode list is selected, the entire file is immediately converted from Unicode (multiple bytes per character) to a single-byte-per-character encoding (commonly called ASCII/ANSI on Windows, although that term is not really correct) using the selected code page. For HTML, XHTML and XML files, the charset/encoding declaration in the head must be adapted after changing the encoding. To get the file displayed correctly afterwards, it might be necessary to select a different font or script for this text file (and thereby for all other opened files).
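The difference between just choosing another code page and a real conversion can be sketched in Python. The three sample bytes are arbitrary and the codecs are Python's own, so this only illustrates the principle and is not a description of UltraEdit's internals.

```python
raw = bytes([0xC4, 0xE4, 0xDF])  # three bytes of a single-byte encoded file

# Choosing another code page: the bytes stay the same, only the characters
# they stand for in a later conversion differ.
as_western = raw.decode("cp1252")
as_cyrillic = raw.decode("cp1251")
print(as_western, as_cyrillic)

# Converting to a Unicode encoding: same text, different bytes in the file.
utf8_bytes = as_western.encode("utf-8")
utf16_bytes = as_western.encode("utf-16-le")
print(len(raw), len(utf8_bytes), len(utf16_bytes))

# Converting back from Unicode to the single-byte code page.
back = utf8_bytes.decode("utf-8").encode("cp1252")
print(back == raw)
```

This is also why the code page should be right before converting to Unicode: the conversion uses it to decide which characters the single bytes stand for.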
Summary
The selected code page is important for the conversion of text from/to Unicode or other code pages, as always happens on paste from the clipboard into non-Unicode files.
View - Set Code Page only changes the code page used for converting the characters of the active non-Unicode file from/to Unicode or other code pages, not how the bytes of the file are displayed as text in the document window.
The selected font and the script of the font define how the text of non-Unicode files is displayed in the editor. This setting applies to all opened files unless a specific font is defined for files with a specific file extension at Advanced - Configuration - Word Wrap/Tab Settings.
The locale setting is important only for locale specific sort and nothing else.
PS: See also "Warning message about change of font and/or script settings in font dialog".