I have now downloaded and installed (for the first time) the German version of UltraEdit as well, and I could reproduce what you found out with both the German and the English version of UE. Selecting UTF-8 in the status bar for a Windows-1252 encoded file results in a conversion from Windows-1252 to UTF-8 encoding, of course without changing the encoding declaration in the first line of the XML file. The XML manager now fails to parse the file correctly. Switching back in the status bar from UTF-8 to code page 1252 and pressing F5 in the XML manager view results in a correctly parsed file.
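The effect described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample XML content and the helper `declared_encoding` are hypothetical (the real config.xml is not shown in this thread), and the sketch says nothing about how UltraEdit or its XML manager work internally. It only shows that converting the bytes to UTF-8 while the declaration still says windows-1252 makes a reader that trusts the declaration see wrong characters.

```python
import re

# Hypothetical sample content; the real config.xml is not shown in this thread.
xml_text = '<?xml version="1.0" encoding="windows-1252"?>\n<name>Käse</name>\n'


def declared_encoding(raw: bytes) -> str:
    """Return the encoding named in the XML declaration (hypothetical helper)."""
    match = re.match(rb'<\?xml[^>]*encoding="([^"]+)"', raw)
    return match.group(1).decode("ascii") if match else "utf-8"


# The file as stored in Windows-1252: "ä" is the single byte 0xE4.
original = xml_text.encode("cp1252")

# The same text after a conversion to UTF-8: "ä" is now the two bytes 0xC3 0xA4,
# but the declaration in the first line still says windows-1252.
converted = original.decode("cp1252").encode("utf-8")

# A reader that trusts the declaration decodes both files as Windows-1252.
good = original.decode(declared_encoding(original))
bad = converted.decode(declared_encoding(converted))

print(good.splitlines()[1])
print(bad.splitlines()[1])
```

The second printed line contains the two characters "Ã¤" in place of "ä", which is the typical result of reading UTF-8 bytes as Windows-1252.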
I usually don't use the enhanced (standard) status bar, but the basic status bar, which does not offer switching the encoding. Therefore I had not noticed before that switching the code page in the status bar from Windows-1252 to UTF-8 results in execution of the command ASCII to UTF-8. So the file is changed, as indicated on the file tab. That is very interesting, because the English UltraEdit help says on the help page Status Bar:
English UE help wrote:Encoding Type
The Encoding Type control allows users to change the encoding used to display the active file. This does not actually affect the underlying content of the file. No conversion is done. This merely changes the encoding used to display the file in the editor.
That is definitely not correct, as selecting UTF-8 for a single-byte encoded text file (Windows-1252 is not really an ANSI standard) results in execution of the conversion command ASCII to UTF-8.
I will report this to IDM by email.
Further, View - Set Code Page is identical to the code page selection in the enhanced status bar. But if a Unicode encoding is selected and the file therefore really is a Unicode encoded file, the code page setting is of no importance for displaying or editing the text. What is selected in View - Set Code Page does not matter if the file itself is encoded in UTF-8, UTF-16 little endian, UTF-16 big endian, or ASCII Escaped Unicode. For a Unicode file this setting only matters when File - Conversions - UTF-8 to ASCII or one of the other Unicode to ASCII conversion commands is executed, because in this case UltraEdit needs to know to which code page the Unicode file should be converted. This is not described at all in the help of UltraEdit.
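Why a target code page is needed for a Unicode to single-byte conversion can be shown with a small Python sketch. It uses only Python's standard codecs and makes no claim about how UltraEdit implements its conversion commands; the sample text is made up for illustration.

```python
# Made-up sample text containing the euro sign (U+20AC).
text = "Price: 5 €"

# The same Unicode text yields different bytes depending on the target code page.
as_1252 = text.encode("cp1252")          # Windows-1252: euro sign is byte 0x80
as_8859_15 = text.encode("iso8859_15")   # ISO/IEC 8859-15: euro sign is byte 0xA4

# ISO/IEC 8859-1 has no euro sign at all, so a strict conversion fails.
try:
    text.encode("iso8859_1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(as_1252.hex(), as_8859_15.hex(), latin1_ok)
```

Without knowing the target code page, a converter cannot decide which byte to write for such a character, or whether the character can be represented at all.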
I will send another email to IDM support with the suggestion to explain View - Set Code Page better, especially what this setting is for if the file is a Unicode file.
There is not really one code page for editing and another one for displaying the text. There is just one code page per file, and it only matters for text files that are not Unicode encoded (as long as no conversion is done). But as not every font supports every code page, the user has to select a suitable font or the right "script" (code page) when switching the code page for a single-byte encoded text file. Of course, an appropriate font must also be selected for Unicode encoded files. Most Unicode encoded text files in East Asian languages need a different font than the one usually selected in Western European countries for viewing/editing text, as their characters are simply not present in the Windows-1252 code page or in most fonts installed on Windows computers in Western European countries.
On Linux, the problem of code page versus font is nowadays hidden by always using UTF-8 (Unicode) for editing and by having a default font set that covers all the Unicode characters North American and Western European users usually need. The same is true for Windows in North American and Western European countries. A commonly set font like Courier New or Consolas contains all Unicode characters usually used in those countries, so most users there do not even know that a byte in a file is not the same as a displayed character and that there are many, many different encodings. What is an encoding? A text file is a text file, isn't it? No, it is not, as IDM explains in the power tip Working with Unicode in UltraEdit/UEStudio.
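That a byte in a file is not the same as a displayed character can be demonstrated with a short Python sketch. The three code pages are picked only as examples; the byte value 0xE4 is the Windows-1252 encoding of "ä".

```python
# One and the same byte, interpreted with three different Windows code pages.
raw = bytes([0xE4])
decoded = {cp: raw.decode(cp) for cp in ("cp1252", "cp1251", "cp1253")}
print(decoded)

# In UTF-8 the character "ä" is not one byte, but two.
utf8_bytes = "ä".encode("utf-8")
print(utf8_bytes.hex())
```

The byte is a Latin "ä" in Windows-1252, a Cyrillic "д" in Windows-1251 and a Greek "δ" in Windows-1253, so the file content alone does not tell which character is meant.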
What I do not understand is why UTF-8 was selected for config.xml, as this was not the case when I opened this file in English and German UltraEdit.
Did you manually select UTF-8 although the encoding declaration in the first line of config.xml is Windows-1252?
Note: UltraEdit remembers a manually selected code page for a file in uedit32.ini for the next time the file is opened. This information can be removed with the button Clear history at Advanced - Configuration - Toolbars / Menus - Miscellaneous, which clears this manual code page selection information as well as all other histories stored in uedit32.ini. I use this button at least once per month.
rblock wrote:BTW, I really hate those dialog for selecting the codepage because the entries are not sorted and it is not possible to search for substrings to find the right one.
Do you mean View - Set Code Page?
The list in this very small dialog is sorted alphabetically, but not numerically. It is more or less the same list as the code page conversion tables in the region and language settings of Windows, with the following exceptions:
- The code page currently set for the file is always at the top of the list. I don't know the reason. But this behavior is good in case the user selects something different in the list and then decides: No, I do not want to change the code page. What was selected before? Ah, yes, the code page at the top of the list.
- Code page 28603, which is ISO/IEC 8859-13. On my German Windows XP it is listed as the third from last entry. Interestingly, this code page is not listed at all in the Windows region and language settings, which is most likely the reason why it is not between code pages 28599 and 28605 in the list.
- The "code pages" 65000 (UTF-7) and 65001 (UTF-8), which are listed at the end as they are not really code pages, see Is codepage 65001 and utf-8 the same thing?
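The difference between alphabetical and numerical sorting mentioned above can be illustrated with a small Python sketch. It assumes that each list entry starts with the code page number as text, which is an assumption about the dialog made only for this illustration; the numbers 437 and 850 are added as further examples of real code pages.

```python
# Code page numbers as text, as they would appear at the start of list entries
# (assumption: the entries begin with the code page number).
entries = ["437", "850", "1252", "28599", "28605", "65000", "65001"]

alphabetical = sorted(entries)        # compares character by character
numerical = sorted(entries, key=int)  # compares the numeric values

print(alphabetical)
print(numerical)
```

With alphabetical sorting, 1252 comes before 437 because the character "1" sorts before "4", which makes a specific code page harder to find than in a numerically sorted list.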
I agree that the grouped listing used for encoding selection via the status bar is better. But I think this list is hard-coded in UltraEdit, while the list in the code page selection dialog is filled by calling a Windows kernel function.