User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

This forum is user-to-user based and not regularly monitored by IDM.
Please see technical support page on how to contact IDM.
I have an HTML5 template I use to start each new web page.
I always use <meta charset="utf-8"> in the HTML head.

I did this on a new page and uploaded it to the server. I validated it using the W3C HTML5 validator, but it gave me an error saying the page wasn't UTF-8 but windows-1252.

Where does the charset specification come from?
Does it come from the file that UE creates, or does the web server designate it?

I've never run into this problem before.

thanks
Well, the charset specification comes from you, and of course you have to make sure that the characters are really encoded according to the charset declaration at the top of the HTML5 file. The status bar at the bottom of the UltraEdit main window shows which encoding UltraEdit is currently using for a file. UTF-8 (new status bar in UE v19.00) or U8- (basic status bar in UE v19.00 and all previous versions of UE) indicates a UTF-8 encoded file. If only the line terminator type (DOS, UNIX, MAC) or an ANSI code page (new status bar in UE v19.00) is shown, the file is ANSI encoded.

The article Character encodings on the W3C website explains how the character set (encoding) should be declared in HTML, XHTML and XML files.

UltraEdit detects UTF-8 encoded files by:

  • a UTF-8 BOM at the beginning of the file (not recommended for HTML files);
  • one of the following four strings at the top of the file (within the first 1024 bytes):
    charset=UTF-8, charset=utf-8, encoding="UTF-8, encoding="utf-8
  • at least one byte sequence within the first 64 KB that looks like a UTF-8 character encoding sequence.
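To make the three rules above concrete, here is a rough Python approximation of them. It is only a sketch based on the description in this post, not UltraEdit's actual code: the four marker strings and the 1024 byte / 64 KB limits come from the post, while the function names are made up and the multi-byte check is simplified (it does not reject overlong forms or surrogates).

```python
import codecs

# The four declaration strings listed in the post (searched as raw bytes).
DECLARATION_MARKERS = (
    b"charset=UTF-8", b"charset=utf-8",
    b'encoding="UTF-8', b'encoding="utf-8',
)


def has_utf8_multibyte_sequence(data: bytes) -> bool:
    """True if data contains at least one UTF-8 lead byte followed by the
    expected number of continuation bytes (0x80-0xBF). Simplified: overlong
    forms and surrogate code points are not rejected."""
    for i, b in enumerate(data):
        if 0xC2 <= b <= 0xDF:
            needed = 1
        elif 0xE0 <= b <= 0xEF:
            needed = 2
        elif 0xF0 <= b <= 0xF4:
            needed = 3
        else:
            continue
        tail = data[i + 1:i + 1 + needed]
        if len(tail) == needed and all(0x80 <= c <= 0xBF for c in tail):
            return True
    return False


def looks_like_utf8(data: bytes) -> bool:
    """Approximation of the three detection rules described above."""
    # Rule 1: UTF-8 byte order mark at the very beginning.
    if data.startswith(codecs.BOM_UTF8):
        return True
    # Rule 2: one of the four declaration strings within the first 1024 bytes.
    head = data[:1024]
    if any(marker in head for marker in DECLARATION_MARKERS):
        return True
    # Rule 3: a UTF-8 multi-byte sequence within the first 64 KB.
    return has_utf8_multibyte_sequence(data[:64 * 1024])


print(looks_like_utf8(b'<meta charset="utf-8"><p>plain ASCII</p>'))  # False
print(looks_like_utf8('<meta charset="utf-8"><p>café</p>'.encode("utf-8")))  # True
```

With this approximation, a file whose only hint is `<meta charset="utf-8">` and which contains only ASCII bytes is not detected as UTF-8, because `charset="utf-8` (with the quote) is not one of the four strings.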
As explained in HTML 5.1 Nightly - Specifying the document's character encoding, the short charset declaration you use is also valid for HTML5. But because charset="utf-8 is not yet recognized by UltraEdit, the HTML5 file is opened as an ASCII/ANSI file if there is no UTF-8 byte sequence within the first 64 KB.

If you then enter a character with a code value greater than 127, it is saved in the ANSI code page instead of in UTF-8, which contradicts the character set declaration at the top of the HTML5 file.
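A small Python illustration of what goes wrong, using é (code value 233) as an example. It assumes the ANSI code page is Windows-1252 (the actual code page depends on the system): there é is stored as a single byte, while UTF-8 needs two bytes, so a reader that trusts the UTF-8 declaration cannot decode the single byte.

```python
text = "café"

# Bytes written when the file is treated as ANSI (Windows-1252 assumed).
ansi_bytes = text.encode("cp1252")
# Bytes that the declaration <meta charset="utf-8"> promises.
utf8_bytes = text.encode("utf-8")

print(ansi_bytes)  # b'caf\xe9'
print(utf8_bytes)  # b'caf\xc3\xa9'

# Decoding the ANSI bytes as UTF-8 fails: 0xE9 is a UTF-8 lead byte
# that is not followed by the expected continuation bytes.
try:
    ansi_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

# With errors="replace" the invalid byte becomes U+FFFD (replacement character).
print(ansi_bytes.decode("utf-8", errors="replace"))
```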

Solution:

  • Under Advanced - Configuration - Editor - New File Creation, select Create new files as UTF-8.
  • Under Advanced - Configuration - File Handling - Save, uncheck
    Write UTF-8 BOM header to all UTF-8 files when saved
    and
    Write UTF-8 BOM on new files created within this program
  • While UltraEdit is not running, open %appdata%\IDMComp\UltraEdit\uedit32.ini with Notepad, add the line Force UTF-8=1 to the section [Settings], and save the modified INI file.
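Editing the INI file in Notepad as described is the simplest way. If you prefer to script the last step, the following hypothetical Python helper shows the idea: it inserts Force UTF-8=1 into the [Settings] section of the INI text, or updates an existing Force UTF-8 line, without rewriting the rest of the file. The helper name, the CRLF line endings and the case-insensitive comparison of section and key names are assumptions. Reading and writing uedit32.ini is left out because the file's encoding is not covered here; if you do it, do it only while UltraEdit is not running and keep a backup.

```python
def add_force_utf8(ini_text: str) -> str:
    """Hypothetical helper: return ini_text with "Force UTF-8=1" in [Settings].

    Works line by line so comments, key order and key case elsewhere in the
    file stay as they are. Output uses CRLF line endings (assumed for a
    Windows INI file).
    """
    lines = ini_text.splitlines()
    in_settings = False
    settings_index = None
    for idx, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Entering a new section; remember where [Settings] starts.
            in_settings = stripped.lower() == "[settings]"
            if in_settings and settings_index is None:
                settings_index = idx
            continue
        key = stripped.split("=", 1)[0].strip().lower()
        if in_settings and key == "force utf-8":
            lines[idx] = "Force UTF-8=1"  # update an existing entry
            return "\r\n".join(lines) + "\r\n"
    if settings_index is None:
        lines += ["[Settings]", "Force UTF-8=1"]  # no [Settings] section yet
    else:
        lines.insert(settings_index + 1, "Force UTF-8=1")
    return "\r\n".join(lines) + "\r\n"


print(add_force_utf8("[Settings]\r\nSome Key=1\r\n"))
```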
New files are now encoded in UTF-8 by default, as required for your HTML5 files, and all files not detected as UTF-16 encoded are now always interpreted as UTF-8 encoded files.

If you need to open an ASCII/ANSI encoded file, such as an UltraEdit script file, you have to use the Open As option with ASCII selected in the File Open dialog to override the Force UTF-8=1 setting for such files.

I have sent an enhancement request by email to IDM support asking for HTML5 character set declarations to be supported as well. It would be best if you do the same so that the request count is already 2. The more users request an enhancement, the higher its priority for implementation.
A very complete answer, thanks.

I will submit the request.
thanks
UTF-8 charset/encoding detection was enhanced in UltraEdit for Windows v24.00 and UEStudio v17.00. The following declarations are now supported:

HTML4

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<META http-equiv='Content-Type' content='text/html; charset=UTF-8'>
<META http-equiv='Content-Type' content='text/html; charset=utf-8'>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>

And of course the same charset declarations in XHTML files, with / before the closing >, are supported, too.
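For anyone who wants to check their own files for these declarations, here is a Python regular expression that approximates the HTML4 forms above, including the XHTML variant with / before >. It is a sketch, not the matching code used by UltraEdit: it is case-insensitive throughout, so it also accepts spellings such as content-type or Utf-8 that are not in the list, and it expects the attributes in the order shown.

```python
import re

# Approximates the HTML4/XHTML declarations listed above.
# Groups 1 and 2 capture the quote character so opening and closing
# quotes must match; "\s*/?>" also accepts the XHTML form "... />".
HTML4_META = re.compile(
    r"""<meta\s+http-equiv=(["'])Content-Type\1\s+"""
    r"""content=(["'])text/html;\s*charset=utf-8\2\s*/?>""",
    re.IGNORECASE,
)

samples = [
    '<META http-equiv="Content-Type" content="text/html; charset=UTF-8">',
    "<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>",
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">',
]
for s in samples:
    print(bool(HTML4_META.search(s)), s)
```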

HTML5

<META charset="UTF-8">
<META charset="utf-8">
<meta charset="UTF-8">
<meta charset="utf-8">
<META charset='UTF-8'>
<META charset='utf-8'>
<meta charset='UTF-8'>
<meta charset='utf-8'>
<META charset=UTF-8>
<META charset=utf-8>
<meta charset=UTF-8>
<meta charset=utf-8>
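The twelve HTML5 forms above differ only in the case of the tag name and value and in the quoting of the value, so one Python pattern can approximate them. This is again only a sketch: the case-insensitive flag also accepts mixed spellings such as Utf-8, and the optional "/" before ">" (allowed on void elements in HTML5) is an extension of the list above.

```python
import re

# Value may be double-quoted, single-quoted or unquoted, as in the list above.
HTML5_META = re.compile(
    r"""<meta\s+charset=(?:"utf-8"|'utf-8'|utf-8)\s*/?>""",
    re.IGNORECASE,
)

# Generate the twelve declarations listed above.
forms = [
    f"<{tag} charset={value}>"
    for tag in ("META", "meta")
    for value in ('"UTF-8"', '"utf-8"', "'UTF-8'", "'utf-8'", "UTF-8", "utf-8")
]
print(all(HTML5_META.search(f) for f in forms))
print(bool(HTML5_META.search('<meta charset="windows-1252">')))
```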

XML

<?xml version='1.0' encoding="UTF-8"?>
<?xml version='1.0' encoding="utf-8"?>
<?xml version='1.0' encoding='UTF-8'?>
<?xml version='1.0' encoding='utf-8'?>

And of course the UTF-8 encoding declaration in XML 1.1 files is supported, too.
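Similarly, a Python sketch for the XML declarations, assuming Python 3.6 or later for the scoped (?i:...) flag. Only the encoding value is matched case-insensitively, because the xml keyword itself must be lowercase; versions 1.0 and 1.1 are accepted, whitespace around = is allowed as in the XML specification, and anything after the encoding (for example standalone) is ignored. This is not UltraEdit's own code.

```python
import re

# <?xml version=... encoding=...: quotes must match per attribute,
# only the value "utf-8" is case-insensitive, version 1.0 or 1.1.
XML_DECL = re.compile(
    r"""<\?xml\s+version\s*=\s*(["'])1\.[01]\1"""
    r"""\s+encoding\s*=\s*(["'])(?i:utf-8)\2"""
)

print(bool(XML_DECL.search("<?xml version='1.0' encoding=\"UTF-8\"?>")))
print(bool(XML_DECL.search('<?xml version="1.1" encoding=\'utf-8\' standalone="yes"?>')))
print(bool(XML_DECL.search('<?xml version="1.0" encoding="ISO-8859-1"?>')))
```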
Best regards from Austria