Well, the charset specification comes from you, and of course you have to make sure that the characters are really encoded according to the charset declaration at the top of the HTML5 file. The status bar at the bottom of the UltraEdit main window shows which encoding UltraEdit currently uses for a file.
UTF-8 (new status bar in UE v19.00) or U8- (basic status bar in UE v19.00 and all previous versions of UE) indicates a UTF-8 encoding of the file. If the status bar shows only the line terminator type (DOS, UNIX, MAC) or an ANSI code page (new status bar in UE v19.00), the file is ANSI encoded.
Declaring character encodings in HTML on the W3C website explains how the character set, or more precisely the encoding, should be declared in an HTML, XHTML and XML file.
UltraEdit v19.00 detects a UTF-8 encoded file by any of the following:
- UTF-8 BOM at beginning of a file (not recommended for HTML files)
- One of the following four strings is found at top of the file (within the first 1024 bytes):
charset=UTF-8, charset=utf-8, encoding="UTF-8, encoding="utf-8
- Within the first 64 KB at least one byte sequence is found which looks like a UTF-8 character encoding sequence.
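The three checks above can be sketched in Python as follows. This is only an illustration of the rules as listed, not UltraEdit's actual implementation: the function name `looks_like_utf8` is hypothetical, and the byte pattern is a simplified assumption of what "looks like a UTF-8 character encoding sequence" means.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# The four declaration strings listed above (the missing closing quote is intentional).
DECLARATIONS = (
    b"charset=UTF-8",
    b"charset=utf-8",
    b'encoding="UTF-8',
    b'encoding="utf-8',
)

# Simplified assumption: a lead byte followed by the matching number of
# continuation bytes counts as a UTF-8 multi-byte sequence.
MULTIBYTE = re.compile(
    rb"[\xc2-\xdf][\x80-\xbf]"
    rb"|[\xe0-\xef][\x80-\xbf]{2}"
    rb"|[\xf0-\xf4][\x80-\xbf]{3}"
)


def looks_like_utf8(data: bytes) -> Optional[str]:
    """Return the name of the first rule that matches, or None."""
    if data.startswith(UTF8_BOM):
        return "bom"
    if any(decl in data[:1024] for decl in DECLARATIONS):
        return "declaration"
    if MULTIBYTE.search(data[:64 * 1024]):
        return "byte-sequence"
    return None
```

With these rules, a file that contains only the short HTML5 form `<meta charset="utf-8">` and otherwise pure ASCII text matches none of the checks, because the quote after `=` prevents a match with any of the four strings.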
The short character set declaration you use is also valid for HTML5, as can be read at HTML 5.3 - Specifying the document’s character encoding. But because charset="utf-8 is not recognized by UltraEdit for Windows < v24.00 or UEStudio < v17.00, the HTML5 file is opened as an ASCII/ANSI file on using UltraEdit for Windows < v24.10 or UEStudio < v17.10 if there is no UTF-8 byte sequence within the first 64 KB.
Entering a character with a code value greater than 127 now results in a wrong encoding being used for this character compared to the character set declaration at the top of the HTML5 file.
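A small Python sketch shows what this mismatch looks like at the byte level. It assumes the ANSI code page is Windows-1252, which is only one of many possible ANSI code pages, and `decodes_as_utf8` is a hypothetical helper name.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8, as the charset declaration promises."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "\u00e9"  # e with acute accent, code value 233, which is greater than 127

ansi_bytes = text.encode("cp1252")  # single byte, as saved in an ANSI file (assumed Windows-1252)
utf8_bytes = text.encode("utf-8")   # two bytes, as required by charset="utf-8"

print(ansi_bytes, decodes_as_utf8(ansi_bytes))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))
```

The single ANSI byte is not a valid UTF-8 sequence, so a browser that trusts the declaration cannot decode the character correctly.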
Solution:
- Select Create new files as UTF-8 at Advanced - Configuration - Editor - New File Creation.
- Uncheck at Advanced - Configuration - File Handling - Save the options Write UTF-8 BOM header to all UTF-8 files when saved and Write UTF-8 BOM on new files created within this program.
- While UltraEdit is not running, open %APPDATA%\IDMComp\UltraEdit\uedit32.ini with Notepad and add to group [Settings] a line with Force UTF-8=1 and save the modified INI.
Now new files are encoded in UTF-8 by default, as required for your HTML5 files. And all files not detected as UTF-16 encoded files are now always interpreted as UTF-8 encoded files.
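For those who prefer to script the INI change instead of editing the file in Notepad, here is a minimal Python sketch. It works on the INI content as text only and deliberately does not read or write the real uedit32.ini, because the encoding of that file is not stated here. The function name `add_force_utf8` is hypothetical, and matching the group and key names case-insensitively is an assumption.

```python
def add_force_utf8(ini_text: str) -> str:
    """Return ini_text with the line 'Force UTF-8=1' in the group [Settings]."""
    out = []
    in_settings = False
    done = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new group starts; remember whether it is [Settings].
            in_settings = stripped.lower() == "[settings]"
            out.append(line)
            if in_settings and not done:
                out.append("Force UTF-8=1")
                done = True
            continue
        # Drop an already existing entry so the setting is not duplicated.
        if in_settings and stripped.lower().startswith("force utf-8="):
            continue
        out.append(line)
    if not done:
        # No [Settings] group found: append one at the end.
        out += ["[Settings]", "Force UTF-8=1"]
    return "\r\n".join(out) + "\r\n"
```

As with the manual edit, apply such a change only while UltraEdit is not running, and keep a backup of the INI file.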
If you need to open an ASCII/ANSI encoded file like an UltraEdit script file, you have to use the Open As option with ASCII selected in the File Open dialog to override the Force UTF-8=1 setting for such files.
I have sent an enhancement request to IDM support by email asking for support of HTML5 character set declarations as well. It would be best if you do the same so that the request count is already 2. The more users request an enhancement, the higher its priority for being implemented becomes.