Today I have had time to look into the UTF-8 problems you described.
First, here is what I think is the reason why you and other users believe UltraEdit has problems handling UTF-8 files without BOM.
A UTF-8 file without BOM is 100% binary identical to an ASCII file unless it contains at least 1 character with a code value greater than 0x7F (decimal 127), such as the German umlauts äöü, which must be stored in a UTF-8 encoded file with 2 to 4 bytes. So if a file without BOM does not contain any multi-byte character, it is interpreted as an ASCII file, and this is 100% correct.
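The byte identity described above can be checked with a small Python sketch. The sample strings are only illustrations and have nothing to do with UltraEdit itself:

```python
# Pure ASCII text produces exactly the same bytes in ASCII and in UTF-8.
text_ascii = "Hello, world"
ascii_bytes = text_ascii.encode("ascii")
utf8_bytes = text_ascii.encode("utf-8")
same = ascii_bytes == utf8_bytes

# Characters above 0x7F need more than one byte in UTF-8.
text_umlaut = "äöü"
umlaut_utf8 = text_umlaut.encode("utf-8")

print(same)                  # True
print(umlaut_utf8.hex(" "))  # c3 a4 c3 b6 c3 bc
print(len(umlaut_utf8))      # 6 bytes for 3 characters
```

So without a BOM, an encoding declaration or at least one multi-byte sequence, nothing in the bytes distinguishes the two formats.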
But UltraEdit handles the character encodings correctly. If the file contains either the string charset=utf-8 (HTML, PHP, ASP, ...) or encoding="utf-8" (XML) at the top of the file (within the first few KB when using UltraEdit for Windows < v24.10 or UEStudio < v17.10), UE handles the file as a UTF-8 file regardless of whether it contains a UTF-8 multi-byte character. So although, for example, an English webpage does not contain any character encoded with multiple bytes in UTF-8 and so could also be an ASCII file, UE nevertheless loads and handles it as a UTF-8 file if it contains one of the 2 encoding specification strings.
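A detection of this kind could look roughly like the following Python sketch. It is not UltraEdit's actual implementation: the function name declares_utf8, the 4096 byte scan limit and the case-insensitive comparison are my assumptions for illustration only.

```python
def declares_utf8(data: bytes, scan_limit: int = 4096) -> bool:
    """Return True if one of the two declaration strings occurs near the top.

    Hypothetical helper: scan_limit and case-insensitive matching are
    assumptions, not documented UltraEdit behavior.
    """
    head = data[:scan_limit].lower()
    # The position inside the file is not examined, so a match inside
    # a comment counts as well.
    return b"charset=utf-8" in head or b'encoding="utf-8"' in head


html = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
xml = b'<?xml version="1.0" encoding="UTF-8"?>'
print(declares_utf8(html))            # True
print(declares_utf8(xml))             # True
print(declares_utf8(b"plain ASCII"))  # False
```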
I tested saving a new file with Ctrl+S and with Save As with the format UTF-8 - NO BOM. I can't see a difference. I have done the following tests:
1) Open a new file and save it immediately with Ctrl+S with format UTF-8 - NO BOM. As everybody can see in the status bar of UltraEdit, UE still handles the file as an ASCII file and not as a UTF-8 file. This is correct because, according to the international encoding standard, the 0 byte file is still not a UTF-8 file. You may have a different opinion here because you think, "I have specified it as UTF-8, so UE should handle it as UTF-8 until saved and re-opened". But this is not correct according to the international encoding standard because you have not really specified it as a UTF-8 file.
2) Open a new file and save it immediately with Save As with format UTF-8 - NO BOM. Same result as in 1), the empty file is still an ASCII file.
3) Open a new file, enter a few ASCII characters all with a hexadecimal code lower than 0x80 and save the new file with format UTF-8 - NO BOM. According to the status bar UE still handles it as an ASCII file. According to the international encoding standard this is correct, even if you think it is a bug. It isn't. The cursor position is not changed after this first save. There is no difference between Ctrl+S and Save As because the first save of a new file always opens the Save As dialog.
4) Open a new file, enter a few ASCII characters and also at least 1 character with a hexadecimal code higher than 0x7F like Ä and save the new file with format UTF-8 - NO BOM. According to the status bar UE now handles it as a UTF-8 file (U8-DOS with my settings).
You can see what really happens in this situation if you temporarily look at the file content in hex mode before the save and again after the save.
Attention: Do not save the new file while you are in hex mode. Just enable the hex mode temporarily before save and after save.
The file is converted from 1 byte per character before save to a Unicode UTF-16 LE file with BOM and 2 bytes per character after save. The cursor position has changed to the top of the file after the first save because of the automatic conversion in the background.
5) Open a new file, enter a few ASCII characters and also the string charset=utf-8 and save the new file with format UTF-8 - NO BOM. According to the status bar UE now handles it as a UTF-8 file although the file does not contain any character which is really encoded as a multi-byte character. After the save the file is also converted and temporarily handled as UTF-16 LE with BOM. The cursor position has changed to the top of the file after the first save because of the automatic conversion in the background.
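What the hex mode shows in test 4 can be reproduced with a short Python sketch. It assumes Windows-1252 (cp1252) as the single byte code page before the save. The two character sample is only an illustration:

```python
import codecs

text = "AÄ"

# Before the save: 1 byte per character (cp1252 assumed as ANSI code page).
ansi = text.encode("cp1252")
# On disk after the save: UTF-8 without BOM.
utf8 = text.encode("utf-8")
# In the editor after the save: UTF-16 LE with BOM, 2 bytes per character.
utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(ansi.hex(" "))   # 41 c4
print(utf8.hex(" "))   # 41 c3 84
print(utf16.hex(" "))  # ff fe 41 00 c4 00
```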
Conclusion: UltraEdit handles new files as UTF-8 files 100% correctly according to the international encoding standard.
Update: Since UE v17.30.0.1011 any conversion executed from File - Conversions requiring a change in line termination or an ASCII/ANSI to Unicode or Unicode to ASCII/ANSI conversion is done immediately on the file and no longer on the next save. And since UE v19.00 the encoding of a file can be changed directly via the Encoding Type control in the status bar at the bottom of the UE main window for the active file, as long as the basic status bar is not used.
Johna and you have 2 problems caused by "wrong" UTF-8 handling.
It is not possible to open a file which contains neither a correct encoding specification nor at least 1 multi-byte character, and then insert by keyboard or paste from the clipboard characters which must be encoded in UTF-8. If these characters don't have an ANSI equivalent in the selected code page of the currently used font (a single byte with a code value lower than hexadecimal 0x100), you will not see those characters correctly.
The file is loaded as an ASCII file according to the international encoding standard. As long as you do not convert it manually to a UTF-8 file (in reality temporarily to a Unicode UTF-16 LE file), you cannot insert or paste characters which simply need 2 bytes.
And the second, very similar problem is that you also cannot insert multi-byte encoded characters into a new file as long as it is not a real UTF-8 file according to the international encoding standard, which is correctly indicated in the status bar of UltraEdit.
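Whether a character has a single byte equivalent depends on the code page. A minimal Python sketch, assuming Windows-1252 (cp1252) as the code page; the helper name has_single_byte_equivalent is hypothetical:

```python
def has_single_byte_equivalent(ch: str, code_page: str = "cp1252") -> bool:
    """Return True if ch can be stored as one byte in the given code page."""
    try:
        return len(ch.encode(code_page)) == 1
    except UnicodeEncodeError:
        # The code page has no slot for this character at all.
        return False


for ch in ("A", "ä", "Ω", "日"):
    print(ch, has_single_byte_equivalent(ch))
```

The first two characters can live in an ANSI file with this code page. The last two can only be kept if the file is handled as Unicode.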
The second problem can be easily avoided. UTF-8 is a byte optimized version of Unicode. So if you want to create new files in UTF-8 format most of the time, enable the option Configuration - Editor - New File Creation - Always create new files as UNICODE. Now a new file is by default a UTF-16 LE file, as every loaded UTF-8 file also is while editing. With the format UTF-8 - NO BOM in the Save As dialog the new file is then automatically saved as you want. The 5 tests above have been done with this option not checked to make it more difficult for UE than necessary.
It's correct that templates cannot contain characters which must be saved with 2 bytes because they have no single byte equivalent. The template file of UltraEdit is still a binary file where only single byte characters are possible. Changing the format of the template file just to support a few 2 byte characters would be hard work. You also have to take into consideration all the thousands of existing template files of UltraEdit users who are already satisfied with the current format. And backward compatibility would also be lost. I think you will understand now why IDM will not change the format of the template file because a few users think they need it.
And you don't really need it. Write your templates for a new PHP, CSS, HTML, ... file, but don't forget to also add the correct encoding specification to the template. The templates do not need to contain a special character, only the correct encoding specification.
Then you can use the templates on new files, and after the first save with the format UTF-8 - NO BOM the file is automatically converted by UltraEdit to UTF-8 (UTF-16 LE). But don't forget: first save the new UTF-8 file with no BOM but with the encoding specification before you insert manually or from the clipboard a character which must be encoded with 2 bytes. It is best to use 1 or more macros for that job. An example:
InsertMode
ColumnModeOff
HexOff
NewFile
Template 4
SaveAs ""
or without immediately saving the new file
InsertMode
ColumnModeOff
HexOff
NewFile
ASCIItoUnicode
Template 4
And the Format selected in the Save As dialog is UTF-8 - NO BOM.
ASCIItoUnicode is only needed if Always create new files as UNICODE is not checked.
Template 4, for example, contains your standard body for new PHP files with the charset=utf-8 encoding specification string in the HTML header. I should add that UltraEdit does not examine where either charset=utf-8 or encoding="utf-8" is found. If the string is, for example, inside a PHP comment, UltraEdit will also interpret it as a valid encoding specification. I don't know whether the PHP interpreters or the browsers accept the encoding specification only in the correct environment or also anywhere in the file like UltraEdit.
Update: With smart templates introduced with UE v18.00 it is also possible to create templates with Unicode characters as the templates are stored now in XML using a text encoding supporting all UTF-16 encoded characters.
To your last question: No, it is not possible to add macros to the menu or a toolbar. But I never missed it because there is the macro list view at View - Views/Lists - Macro List. You can activate it by a click on it in the menu, by a hotkey you have assigned to this command, or by a click on its symbol in the toolbar after you have added this command to the toolbar. It opens the macro list in a docked or floating window, as you specified it on last usage. Then you will see all the macros of the currently loaded macro file, and you can run the macro you currently need to create a new PHP or a new CSS or a new ??? file with a double click, or with the Return/Enter key if a macro in the macro list has the focus.
What I think IDM could do to help webpage writers who use UTF-8:
First, an ASCIItoUTF8 macro command could be very helpful (available since UE v17.30).
Second, a file loading configuration option like "Create and load ASCII files as UTF-8" would be helpful for some users like you (available undocumented since v11.10c, read below).
With such an option checked, a new and also an existing ASCII file is automatically loaded and handled as a UTF-8 file without BOM (internally in UE as UTF-16 LE) and so also saved as a UTF-8 file without BOM. A real ASCII file without any character with a code higher than 0x7F will still be an ASCII file after closing and not a UTF-8 file, if it still does not contain the UTF-8 encoding specification.
I have never requested the macro command and the configuration option because I personally don't need them. In particular, I would never check the configuration option because I rarely edit or create UTF-8 files, but work daily with ASCII files containing characters with a code greater than 0x7F: the German characters äüöÄÖÜß with OEM or ANSI code.
So if there are webpage writers who would need these 2 things, they all should write an appropriate feature request email to IDM support.
My suggestions for the configuration for UTF-8 webpage writers:
First read the
FAQ about UTF-8, UTF-16, UTF-32 & BOM and the
Character encodings to get the basic knowledge you need.
Second, in UltraEdit or UEStudio open Configuration - File Handling and set the following options:
Conversions
Uncheck the 2 EBCDIC options if you are not editing EBCDIC files, but check the option
On Paste convert line ending to destination type (UNIX/MAC/DOS).
DOS/UNIX/MAC Handling
Set the
Default file type for new files to whatever you prefer. If your host server is a Linux/Unix server, you should use
Unix to avoid problems while downloading or uploading via FTP. If your host server is a Windows server, use
DOS.
Set the Unix/Mac file detection/conversion to Automatically convert to DOS format to avoid problems with copy and paste with other Windows applications.
Uncheck
Only recognize DOS terminated lines (CR/LF) as new lines for editing.
Save
Uncheck
Write UTF-8 BOM header to ALL UTF-8 files when saved.
Whether Write UTF-8 BOM on new files created within this program (if above is not set) should be enabled depends on the type of Unicode files you are creating. If you create, for example, only XML and HTML type files (HTML, PHP, ASP, ...) in UTF-8, you should uncheck this option, because then the encoding should be defined inside the file with encoding="utf-8" (XML) or with content="text/html; charset=utf-8" (HTML). See the FAQ above for details about the BOM and when it should be used.
Enable
Save file as input format (UNIX/MAC/DOS). That's important because we convert every file automatically to DOS for editing, but we want to save it in the original format and not in DOS format. This option was moved from the Save to the DOS/UNIX/MAC Handling configuration dialog in v12.10 of UltraEdit!
You can set option
Trim trailing spaces on file save to whatever you prefer. Normally it is good to activate it because it can reduce the file size a little bit which is interesting for HTML files.
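To make the effect of the two BOM options above concrete, here is a small Python sketch of what a UTF-8 BOM adds to a saved file. The HTML snippet is only an illustration:

```python
import codecs

body = "<p>Grüße</p>"

without_bom = body.encode("utf-8")   # like UTF-8 - NO BOM
with_bom = body.encode("utf-8-sig")  # UTF-8 with the 3 byte BOM in front

print(codecs.BOM_UTF8.hex(" "))                   # ef bb bf
print(with_bom == codecs.BOM_UTF8 + without_bom)  # True
print(len(with_bom) - len(without_bom))           # 3
```

The content bytes are identical in both cases. The BOM is only a 3 byte signature in front of the file.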
Temporary Files
Use the second option
Open file without temp file but prompt for each file and set the
Threshold for example to 4096 (4 MB). You can set the threshold value to a higher value if your computer has enough performance and your hard disk is fast and you often edit large files.
Unicode/UTF-8 Detection
Enable
Auto detect UTF-8 files,
Detect Unicode(UTF-16) files without BOM and
Detect ASCII/ANSI files with Escaped Unicode. You can disable for example the UTF-16 detection if you are sure that you will never edit a UTF-16 file. Every enabled detection increases the file load time of normal ASCII files. But if you don't know what format your files have, it is better to let UE/UES automatically detect it.
The 3rd option
Disable automatic detection of HEX file format on reload is not important for handling Unicode files.
And as already explained above also enable the option
Always create new files as UNICODE at
Editor - New File Creation.
Last, if you download/upload the files via the FTP client of UE/UES, always use the binary transfer mode and not the text mode. If your files on your Apache (Unix/Linux) host server are already Unix files, then with the settings above UE/UES converts a file to DOS only temporarily for editing, after loading it from FTP and before opening it in the editor, and converts it back to Unix before saving. So there is no need to do it while transferring the file content. Local copies are then also Unix files and so are 100% identical to the files on the server. Using binary transfer mode is faster than the text/ASCII mode. Even if you don't use the FTP client of UE/UES and use a different FTP tool, you should always create and edit files with Unix line termination and use the binary transfer mode and the automatic conversion to DOS feature of UE/UES, unless your host server is a Windows server.
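The temporary conversion for editing changes nothing in what ends up on the server, as this small Python sketch of the round trip shows. It only imitates the line ending conversion on a byte string and is not how UE/UES implements it:

```python
# File as stored on the Unix server and transferred in binary mode.
unix_file = b"<?php\necho 'Hello';\n?>\n"

# Temporary conversion to DOS line endings for editing.
dos_for_editing = unix_file.replace(b"\n", b"\r\n")

# Conversion back to the input format on save.
saved = dos_for_editing.replace(b"\r\n", b"\n")

print(saved == unix_file)  # True
```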
Added on 2009-11-09: I have found an undocumented setting in uedit32.exe of v11.10c and later. By manually adding to uedit32.ini
[Settings]
Force UTF-8=1
you can force all non-Unicode files (not UTF-16 files) to be read/saved as UTF-8 encoded files. But new files are nevertheless created and saved either as Unicode (UTF-16 LE) or ASCII/ANSI files. So this special setting is only for already named files. However, creating a new file in ASCII/ANSI, saving it with a name, closing it and re-opening it results in a new file encoded in UTF-8. Be careful with that setting. Even real ANSI files are loaded with this setting as UTF-8 encoded files, causing all ANSI characters to be interpreted incorrectly.
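Why real ANSI files break when they are forced to UTF-8 can be seen in this Python sketch. It assumes Windows-1252 (cp1252) as the ANSI code page and says nothing about how UltraEdit displays such bytes:

```python
# German umlauts stored as single bytes in an ANSI file (cp1252 assumed).
ansi_bytes = "äöü".encode("cp1252")
print(ansi_bytes.hex(" "))  # e4 f6 fc

# Interpreted as UTF-8 these bytes are not a valid sequence.
try:
    ansi_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A tolerant decoder can only substitute replacement characters (U+FFFD).
decoded = ansi_bytes.decode("utf-8", errors="replace")

print(strict_ok)            # False
print("\ufffd" in decoded)  # True
```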
Added on 2010-03-28: With UltraEdit v16.00 instead of Create new files as Unicode there are now the choices
Create new files as ANSI
Create new files as UTF-8
Create new files as UTF-16
at Advanced - Configuration - Editor - New File Creation. Therefore users of UltraEdit 16.00 and later can set the default encoding for new files to UTF-8. With this change the option Format of the Save As dialog is no longer remembered and preset in UE v16.00 and later. Format of the Save As dialog is now always set to Default when the dialog is opened.