Short UTF-8 charset declaration in HTML5 header (solved)

    Apr 03, 2013#1

    I have an HTML5 template I use to start each new webpage.
    I always use <meta charset="utf-8"> in the HTML head.

    I did this on a new page, and uploaded it to the server. I validated it using the W3C HTML5 validator, but it gave me an error saying the page wasn't UTF-8, but instead was windows-1252.

    Where does the charset specification come from?
    Does it come from the file that UE creates, or does the web server designate it?

    I've never run into this problem before.

    thanks

      Apr 04, 2013#2

      Well, the charset declaration comes from you, and you have to make sure that the characters in the file are really encoded as declared at the top of the HTML5 file. The status bar at the bottom of the UltraEdit main window shows which encoding UltraEdit currently uses for a file. UTF-8 (new status bar in UE v19.00) or U8- (basic status bar in UE v19.00 and all previous versions of UE) indicates a UTF-8 encoded file. If only the line terminator type (DOS, UNIX, MAC) or an ANSI code page (new status bar in UE v19.00) is displayed, the file is ANSI encoded.

      Declaring character encodings in HTML on the W3C website explains how the character set, that is the encoding, should be declared in an HTML, XHTML or XML file.

      UltraEdit v19.00 detects UTF-8 encoded files by
      • UTF-8 BOM at beginning of a file (not recommended for HTML files)
      • One of the following four strings is found at top of the file (within the first 1024 bytes):
        charset=UTF-8, charset=utf-8, encoding="UTF-8, encoding="utf-8
      • Within the first 64 KB at least one byte sequence is found which looks like a UTF-8 character encoding sequence.
      The short character set declaration you use is also valid for HTML5, as can be read at HTML 5.3 - Specifying the document’s character encoding. But charset="utf-8 (with the double quote after the equal sign) is not recognized by UltraEdit for Windows < v24.00 or UEStudio < v17.00. Those versions therefore open the HTML5 file as an ASCII/ANSI file if there is no UTF-8 byte sequence within the first 64 KB.
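
The three checks listed above can be sketched in Python to show why the short declaration slips through. This is only an illustration of the documented behavior, not UltraEdit's actual implementation; the function names `has_utf8_sequence` and `looks_like_utf8` are made up for this example.

```python
import codecs

# The four declaration strings from the list above.
DECLARATIONS = (
    b'charset=UTF-8',
    b'charset=utf-8',
    b'encoding="UTF-8',
    b'encoding="utf-8',
)


def has_utf8_sequence(data: bytes) -> bool:
    """Return True if data holds at least one well-formed multi-byte UTF-8 sequence."""
    i = 0
    while i < len(data):
        lead = data[i]
        if 0xC2 <= lead <= 0xDF:
            tail_len = 1
        elif 0xE0 <= lead <= 0xEF:
            tail_len = 2
        elif 0xF0 <= lead <= 0xF4:
            tail_len = 3
        else:
            # ASCII byte or a byte that cannot start a sequence.
            i += 1
            continue
        tail = data[i + 1:i + 1 + tail_len]
        if len(tail) == tail_len and all(0x80 <= b <= 0xBF for b in tail):
            return True
        i += 1
    return False


def looks_like_utf8(data: bytes) -> bool:
    """Apply the three documented checks in order (assumed order, for illustration)."""
    if data.startswith(codecs.BOM_UTF8):           # 1. UTF-8 BOM
        return True
    head = data[:1024]
    if any(decl in head for decl in DECLARATIONS):  # 2. declaration in first 1024 bytes
        return True
    return has_utf8_sequence(data[:65536])          # 3. UTF-8 sequence in first 64 KB


print(looks_like_utf8(b'<meta charset="utf-8">'))
print(looks_like_utf8(b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'))
```

The first call returns False because the quote between `=` and `utf-8` prevents a match with any of the four strings, and a pure ASCII template contains neither a BOM nor a multi-byte sequence. The second call returns True.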

      Entering a character with a code value greater than 127 now results in this character being saved with an encoding that does not match the character set declaration at the top of the HTML5 file.
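
A small Python sketch of that mismatch, assuming the ANSI code page is windows-1252 as reported by the validator. The sample character is arbitrary and only chosen for illustration.

```python
# Hypothetical sample; any character with a code value above 127 behaves the same way.
text = "é"

ansi_bytes = text.encode("cp1252")  # one byte: 0xE9
utf8_bytes = text.encode("utf-8")   # two bytes: 0xC3 0xA9

# A validator that trusts <meta charset="utf-8"> tries to decode the bytes as UTF-8.
try:
    ansi_bytes.decode("utf-8")
    ansi_is_valid_utf8 = True
except UnicodeDecodeError:
    ansi_is_valid_utf8 = False

# The reverse mistake: UTF-8 bytes interpreted as windows-1252 text.
misread = utf8_bytes.decode("cp1252")

print(ansi_bytes, utf8_bytes, ansi_is_valid_utf8, misread)
```

The single windows-1252 byte is not a valid UTF-8 sequence, which is the kind of byte a validator complains about when the file was saved as ANSI.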

      Solution:
      • Select Create new files as UTF-8 at Advanced - Configuration - Editor - New File Creation.
      • Uncheck at Advanced - Configuration - File Handling - Save
        Write UTF-8 BOM header to all UTF-8 files when saved
        and
        Write UTF-8 BOM on new files created within this program
      • While UltraEdit is not running, open %APPDATA%\IDMComp\UltraEdit\uedit32.ini with Notepad and add to group [Settings] a line with Force UTF-8=1 and save the modified INI.
      Now new files are by default encoded in UTF-8 as required for your HTML5 files. And all files not detected as UTF-16 encoded files are interpreted now always as UTF-8 encoded files.
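
To confirm before uploading that a saved file really is UTF-8 without BOM, a check along these lines could be used. It is a minimal sketch independent of UltraEdit; the function names `check_utf8_no_bom` and `check_file` are hypothetical.

```python
import codecs
from typing import List


def check_utf8_no_bom(data: bytes) -> List[str]:
    """Return a list of problems found in the raw file content (empty list = OK)."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"invalid UTF-8 at byte offset {err.start}")
    return problems


def check_file(path: str) -> List[str]:
    """Read a file in binary mode and run the check on its content."""
    with open(path, "rb") as handle:
        return check_utf8_no_bom(handle.read())
```

Calling `check_file` with the path of the HTML file returns an empty list if the file can be uploaded as it is.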

      If you need to open an ASCII/ANSI encoded file like an UltraEdit script file, you have to use the Open As option with ASCII selected in the File Open dialog to override the Force UTF-8=1 setting for such files.

      I have sent an enhancement request to IDM support by email asking for HTML5 character set declarations to be supported as well. It would be best if you did the same, so that the request count is already 2. The more users request an enhancement, the higher its priority for implementation.

        Apr 04, 2013#3

        A very complete answer, thanks.

        I will submit the request.
        thanks

          Feb 12, 2017#4

          UTF-8 charset/encoding detection was enhanced in UltraEdit for Windows v24.00 and UEStudio v17.00. The following declarations are now supported:

          HTML4

          <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
          <META http-equiv="Content-Type" content="text/html; charset=utf-8">
          <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
          <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
          <META http-equiv='Content-Type' content='text/html; charset=UTF-8'>
          <META http-equiv='Content-Type' content='text/html; charset=utf-8'>
          <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
          <meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
          
          And of course the charset declarations in XHTML files with / before > are supported, too.

          HTML5

          <META charset="UTF-8">
          <META charset="utf-8">
          <meta charset="UTF-8">
          <meta charset="utf-8">
          <META charset='UTF-8'>
          <META charset='utf-8'>
          <meta charset='UTF-8'>
          <meta charset='utf-8'>
          <META charset=UTF-8>
          <META charset=utf-8>
          <meta charset=UTF-8>
          <meta charset=utf-8>
          
          XML

          <?xml version='1.0' encoding="UTF-8"?>
          <?xml version='1.0' encoding="utf-8"?>
          <?xml version='1.0' encoding='UTF-8'?>
          <?xml version='1.0' encoding='utf-8'?>
          
          And of course UTF-8 encoding declaration in XML files of version 1.1 is supported too.
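
For scripts that have to recognize the same declarations outside of UltraEdit, one regular expression can cover the HTML4, HTML5 and XML variants listed above. This is a sketch only: the pattern is looser than the exact list (it also accepts mixed case and spaces around the equal sign), and the name `declares_utf8` is hypothetical.

```python
import re

# Matches charset=... and encoding=... with double, single or no quotes.
UTF8_DECLARATION = re.compile(
    rb"""(?:charset|encoding)\s*=\s*["']?utf-8""",
    re.IGNORECASE,
)


def declares_utf8(data: bytes) -> bool:
    """Return True if a UTF-8 declaration is found within the first 1024 bytes."""
    return UTF8_DECLARATION.search(data[:1024]) is not None


print(declares_utf8(b'<meta charset="utf-8">'))
print(declares_utf8(b'<meta charset="windows-1252">'))
```
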
          Best regards from an UC/UE/UES for Windows user from Austria