First, thank you for the information that you use the font Segoe UI Emoji to get the emoticons in Unicode encoded text files displayed at all.
At the moment I am still using only Windows XP and Windows 7, neither of which has this font installed by default.
An upgrade of my main office computer to Windows 10 is planned for next month. I have to upgrade it although I don't want to. Windows 10 will not bring any advantages for my daily work. I am pretty sure that my daily work will become less efficient after the upgrade on the same hardware, because Windows 10 is less efficient than Windows 7, which was already less efficient than Windows XP for my daily paid work.
You have done impressive work in finding and documenting all the issues caused by the fact that the currently latest version 26.20.0.68 of UltraEdit for Windows and the currently latest version 19.20.0.44 of UEStudio do not support correct editing of Unicode characters encoded with two 16-bit code units, called a surrogate pair. The currently latest versions of UltraEdit and UEStudio support correct loading, editing and saving of UTF-8 and UTF-16 encoded Unicode files containing code points from the supplementary planes, as long as those characters and symbols are not deleted, modified, inserted, searched or replaced by the user in such Unicode text files. I am really impressed by how much time you invested in this work. I would have first asked IDM support by email whether the current version of UltraEdit supports surrogate pairs at all. This is obviously not the case.
I know from inspecting the executable files of UltraEdit that IDM Computer Solutions, Inc. uses Visual C/C++ for UltraEdit for Windows and for UEStudio. The executables show that Visual Studio 2017 is used as the compiler for UE v26.20 and UES v19.20.
The Visual Studio 2017 documentation page "Multibyte and Wide Characters", referenced from the VS2017 documentation page "Multibyte Characters" of the VS2017 documentation chapter "Characters", describes that "wide characters are multilingual character codes that are always 16 bits wide". I prefer the Microsoft documentation page "Working with Strings" because it offers more useful information. Like the documentation page specific to Visual Studio 2017, this page contains the information: "Windows represents Unicode characters using UTF-16 encoding, in which each character is encoded as a 16-bit value."
Well, we both know that this statement is wrong nowadays. The characters from a supplementary plane are encoded with two 16-bit values using a surrogate pair. It looks like UltraEdit uses the string library of Visual Studio 2017 without any additional code to support Unicode characters encoded with a surrogate pair. I also found the discussion "Wide strings vs UTF-16 strings" very informative on this topic, although it is from March 2014 and therefore most likely no longer up to date.
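To make the surrogate pair encoding concrete, here is a minimal Python sketch of the arithmetic defined by UTF-16. The function name `to_surrogate_pair` and the example character U+1F600 are chosen only for illustration; nothing here is taken from UltraEdit or from the Visual Studio string library.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # upper 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # lower 10 bits go into the low surrogate
    return high, low


emoji = "\U0001F600"  # example character from a supplementary plane
high, low = to_surrogate_pair(ord(emoji))
print(f"U+{ord(emoji):X} -> {high:04X} {low:04X}")   # U+1F600 -> D83D DE00

utf16_units = len(emoji.encode("utf-16-le")) // 2    # 16-bit code units
utf8_bytes = len(emoji.encode("utf-8"))              # bytes in UTF-8
print(utf16_units, utf8_bytes)                       # 2 4
```

So a single character from a supplementary plane occupies two 16-bit values in UTF-16 and four bytes in UTF-8, which is why code that treats every 16-bit value as one character can split such a character in the middle.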
I don't know if Visual Studio 2019 introduces a string library using 32 bits per character. I suggest searching the World Wide Web for appropriate information if you are interested. But I doubt it, because I think it would cause an immense compatibility problem with all the Windows 7/8/8.1/10 libraries, including the Windows kernel libraries.
You wrote that Windows Notepad supports emoticons. But you have not written which version of Windows Notepad supports emoticons encoded with four bytes in a Unicode encoded text file. Microsoft had not made any enhancements to Windows Notepad for many years. But Microsoft implemented enhancements to Windows Notepad starting with Windows version 10.0.17666, as can be read in the Wikipedia article "Windows 10 version history". More enhancements were made by Microsoft to Windows Notepad in the Windows 10 versions 10.0.17713 and 10.0.18298. In other words, Windows Notepad of Windows 10 1809 and later Windows 10 versions is different from Windows Notepad of Windows 10 1803 and earlier versions of Windows 10. But even knowing the exact Windows 10 version is perhaps not enough, because of the information that Notepad updates are now available via Microsoft Store, listed as a highlight of Windows 10 version 10.0.18963. I don't have access to a Windows 10 machine at the moment. But if Windows Notepad updates are now available via Microsoft Store, users of Windows 10 with a version lower than Windows 10 1809 might be able to update their Windows Notepad, too. Text writers and developers creating text files with code should take into account that Windows users no longer all have the same version of Notepad installed. So such people have to make sure that a text file written by them, or created by an application written by them, really looks good when a user opens it with a Windows Notepad that is not the latest version released by Microsoft.
Note: Microsoft releases a completely newly compiled version of Windows 10 every six months. All files installed by default with Windows 10 into the directory %SystemRoot% and its subdirectories are replaced by Windows 10 on upgrading the Windows 10 version. For that reason it is very often not enough to just write that Windows 10 is used as the operating system when reporting an issue. It is quite often very important to know which version of Windows 10 the user is running. On Windows XP and all newer Windows versions it is possible to click on the Windows Start button and execute winver. That is a very small executable with the fully qualified file name %SystemRoot%\System32\winver.exe, which shows in a GUI window the Windows version with all the additional information usually needed when reporting an issue. The command ver can be executed in a Windows command prompt window to get the exact version string of Windows, which could be very important especially when using Windows 10 and reporting an issue.
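For anyone who wants to collect this information automatically for an issue report, the following is a minimal Python sketch that extracts the version numbers from a string in the style printed by ver. The function name `parse_ver_output` is hypothetical, the sample string is only an illustrative input and not captured output, and the pattern assumes the bracketed form `[Version a.b.c.d]`, which may differ on some localized Windows installations.

```python
import re


def parse_ver_output(text):
    """Extract (major, minor, build, revision) from a `ver` style string.

    The revision is None if the string has only three numeric parts.
    """
    match = re.search(r"\[Version (\d+)\.(\d+)\.(\d+)(?:\.(\d+))?\]", text)
    if match is None:
        raise ValueError("no version string found")
    major, minor, build, revision = match.groups()
    return (int(major), int(minor), int(build),
            int(revision) if revision else None)


# Illustrative input only; a real string would come from running `ver`.
sample = "Microsoft Windows [Version 10.0.17763.805]"
print(parse_ver_output(sample))  # (10, 0, 17763, 805)
```

The third number is the build number, which is the part that distinguishes the Windows 10 releases mentioned above.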
Furthermore, I know from looking at the files in the program files folder of UltraEdit for Windows and UEStudio that the currently latest versions of UE/UES use the ICU - International Components for Unicode C++ library version 64.2, which is also used by many other software companies for their applications. However, the library most likely (not verified by me) uses standard C++ wide character strings and their string functions, so it depends on the compiler used and its string library whether a wide character is handled in memory with 16 or with 32 bits. I don't use UltraEdit for Linux or UltraEdit for Mac, but if UEX and UEM are compiled with GCC, it could be that UEX and UEM fully support all Unicode characters including those in the supplementary planes, with correct behavior on inserting, deleting, modifying, searching and replacing those characters.
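The platform dependence of the wide character size can be checked without a compiler. This minimal Python sketch uses the standard `ctypes` module, whose `c_wchar` type corresponds to the C type `wchar_t` of the platform Python was built for. It says nothing about how UltraEdit, UEStudio or ICU store text internally; it only shows that the width of `wchar_t` is not the same everywhere.

```python
import ctypes

# Width of the platform's C wchar_t type in bits.
wchar_bits = ctypes.sizeof(ctypes.c_wchar) * 8
print(f"wchar_t is {wchar_bits} bits wide on this platform")

# Number of wchar_t elements needed for one supplementary-plane character:
# two with 16-bit wide characters (surrogate pair), one with 32-bit ones.
emoji = "\U0001F600"
if wchar_bits == 16:
    elements = len(emoji.encode("utf-16-le")) // 2
else:
    elements = len(emoji.encode("utf-32-le")) // 4
print(f"U+1F600 needs {elements} wchar_t element(s)")
```

On Windows this is expected to report 16 bits, while typical Linux builds report 32 bits.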
What could you do now?
You have invested a lot of time in finding and documenting many (definitely not all) issues caused by the missing full support of code points encoded in UTF-16 using a surrogate pair. So I suggest reporting those issues to IDM support by email.
What could IDM Computer Solutions, Inc. do now regarding the missing full support of code points encoded using a surrogate pair?
- IDM could do nothing.
- IDM could explicitly declare that Unicode characters encoded with four bytes are not fully supported for editing by UltraEdit for Windows and UEStudio, and change nothing in the code of UltraEdit for Windows and UEStudio.
- IDM could change to a different compiler using 32-bit wide characters. But that would definitely be extremely time-consuming (months or even years) and therefore very expensive work. It also has the disadvantage that the memory usage for editing Unicode encoded text files (or all text files) would double. It must not be forgotten that not only the file content must be kept in memory (partly, for very large files), but also the undo history, all views and lists showing strings from the active file or a set of files, etc. would have to store strings in memory with 32 bits per character. I doubt that UltraEdit and UEStudio could be kept compatible with all the Microsoft libraries currently used by UE/UES after changing to a compiler using 32-bit wide characters. It would most likely be necessary to move nearly the entire code to a completely different framework, like all the applications which are written primarily for Linux and are ported to Windows using a framework which is not written by Microsoft for Windows (like Eclipse based on Java). That would dramatically decrease the performance of UltraEdit and UEStudio on Windows.
- IDM could add lots of extra code to handle surrogate pairs in the UTF-16 text data stream. On every caret movement by the arrow keys and on commands like Goto, Select word, Select range, etc., the extra code would need to evaluate every single 16-bit value in the UTF-16 text data stream, in a range depending on the command used, to check whether two 16-bit values form a surrogate pair of a Unicode character outside the Basic Multilingual Plane. That would result in an incredible performance loss when working with text files in UltraEdit. And there would remain the problem of Unicode characters of a supplementary plane which must be displayed in any view other than the file window, or stored in any file other than the opened file, like the find/replace history stored in the INI file.
There could be even more possibilities, but those are the four I could imagine.
Only the first two possibilities are realistic in my opinion. The other two possibilities are extremely expensive to implement (time, and money if the developers are paid for the work) and very risky, as lots of bugs could not be avoided, which would definitely not make users happy. They are most likely also the opposite of what 99.999999% of all UE/UES users would like, because UltraEdit/UEStudio would no longer be a very efficient native Windows text editor or IDE, respectively.
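To illustrate what kind of check the fourth possibility means for a single caret movement, here is a minimal Python sketch working on a list of 16-bit code units. All function names are hypothetical, and this is in no way how UltraEdit handles the caret; it only shows the test for a surrogate pair that would have to run before every step to the left or right.

```python
import struct


def utf16_units(text):
    """Return the text as a list of 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def next_caret(units, pos):
    """Move the caret one character to the right without splitting a pair."""
    if pos >= len(units):
        return pos
    if (is_high_surrogate(units[pos]) and pos + 1 < len(units)
            and is_low_surrogate(units[pos + 1])):
        return pos + 2   # skip both halves of the surrogate pair
    return pos + 1


def prev_caret(units, pos):
    """Move the caret one character to the left without splitting a pair."""
    if pos <= 0:
        return 0
    if (pos >= 2 and is_low_surrogate(units[pos - 1])
            and is_high_surrogate(units[pos - 2])):
        return pos - 2
    return pos - 1


units = utf16_units("a\U0001F600b")   # 'a', surrogate pair, 'b'
pos = 0
stops = [pos]
while pos < len(units):
    pos = next_caret(units, pos)
    stops.append(pos)
print(stops)  # [0, 1, 3, 4]
```

The caret never stops at index 2, which would be between the two halves of the surrogate pair. The same kind of test would be needed for selecting, deleting, searching and replacing, and this sketch does not say anything about the cost of doing that in a real editor with very large files.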
For my daily work on text files, UltraEdit is the best text editor, and for my daily programming tasks, UEStudio is the best IDE, with the exception that Visual Studio with its integrated debugger is the better IDE for searching for an error in the code of a Windows GUI application coded by me. But in recent years I have mainly written code for embedded devices, where UEStudio is best for me as I can use the same customized development environment for various controllers and processors. Small Windows console applications compiled with Visual Studio are usually also debugged by me in Visual Studio, if that is necessary at all, which is not often the case.
UltraEdit is most likely not the best text editor for people who have to edit small Unicode encoded text files containing characters which are encoded with four bytes because they are not assigned to the Basic Multilingual Plane, and who really have to touch such characters in the Unicode encoded file. So my advice for such people is to use another application for this very special text editing task, one which supports it better than UltraEdit.