I answer the second question first.
UltraEdit for Windows v24.00, UEStudio v17.00 and all later versions are fully Unicode-aware applications which support scripts encoded with:
- ANSI using system code page for GUI applications or
- UTF-8 without or with BOM or
- UTF-16 Little Endian without or with BOM.
This means also on execution of the script that Unicode characters
But it is also possible in previous versions of UltraEdit, which support only ANSI encoded scripts, to run finds, replaces, Find in Files and Replace in Files with characters not supported by the system code page. This is done by using a Perl regular expression and encoding those characters with their hexadecimal values. I wrote the script UnicodeStringToPerlRegExp.js for converting a Unicode character sequence, a UTF-8 byte stream or even an ANSI character stream into a Perl regular expression string. So a Perl regular expression Replace in Files can search for the bytes of UTF-8 encoded characters and replace them with the bytes of a UTF-8 encoded replace string. For example, UTF-8 encoded ä can be searched with \xC3\xA4 and replaced by UTF-8 encoded Ω using \xE2\x84\xA6 without using the Use encoding option.
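The byte-wise encoding described above can be sketched in a few lines of JavaScript. This is only a minimal illustration and not the code of UnicodeStringToPerlRegExp.js; the function name utf8ToPerlRegExp is hypothetical. It escapes every byte, including plain ASCII, so that regular expression metacharacters in the input need no special handling.

```javascript
// Hypothetical helper, NOT the author's UnicodeStringToPerlRegExp.js:
// converts a string into a Perl regular expression string which matches
// the UTF-8 bytes of that string, every byte written as \xHH.
function utf8ToPerlRegExp(text) {
    var result = "";
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        // Combine a UTF-16 surrogate pair into one code point.
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < text.length) {
            var low = text.charCodeAt(i + 1);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00);
                i++;
            }
        }
        // Encode the code point as 1 to 4 UTF-8 bytes.
        var bytes;
        if (code < 0x80) {
            bytes = [code];
        } else if (code < 0x800) {
            bytes = [0xC0 | (code >> 6), 0x80 | (code & 0x3F)];
        } else if (code < 0x10000) {
            bytes = [0xE0 | (code >> 12), 0x80 | ((code >> 6) & 0x3F),
                     0x80 | (code & 0x3F)];
        } else {
            bytes = [0xF0 | (code >> 18), 0x80 | ((code >> 12) & 0x3F),
                     0x80 | ((code >> 6) & 0x3F), 0x80 | (code & 0x3F)];
        }
        // Append each byte as an upper case two digit hexadecimal escape.
        for (var j = 0; j < bytes.length; j++) {
            var hex = bytes[j].toString(16).toUpperCase();
            result += "\\x" + (hex.length < 2 ? "0" : "") + hex;
        }
    }
    return result;
}
```

Called with "\u00E4" (ä) the function returns \xC3\xA4, and called with "\u2126" (Ω) it returns \xE2\x84\xA6, which are the two strings used in the example above.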
A script file can also be UTF-8 encoded with UE < v24.00 and UES < v17.00 if it has no BOM (byte order mark) and UTF-8 encoded characters exist only in comments or in find/replace strings. Please see the topic UltraEdit.clipboardContent not supporting Chinese characters? for more details on what is possible regarding Unicode characters in UltraEdit scripts with UE < v24.00 and UES < v17.00.
The first question was interesting as nobody has asked that before. I quickly created a script which on execution creates an ANSI, a UTF-8 and a UTF-16 encoded file, each with just a few very short lines containing two characters with a code value greater than decimal 127. Then I tested multiple simple, non regular expression Replace in Files, all executed manually with Use encoding set to Auto-detect, to see how those replaces work on the three differently encoded files.
I was astonished to see with UE v220.127.116.11 that characters were replaced by UTF-8 encoded characters in the very small ANSI encoded file, so that the file finally contained both ANSI and UTF-8 encoded characters. But this happens only with a very small ANSI encoded file of just 208 bytes. The same characters in the same ANSI encoded character block in a larger ANSI encoded file of 119 KB were replaced correctly and are encoded in ANSI after the replace. It looks like the Auto-detect encoding setting of Find/Replace in Files needs a certain number of bytes in a file to correctly detect that a file is ANSI and not UTF-8 encoded.
The UTF-8 encoding of the UTF-8 encoded file with just 211 bytes and no BOM was always detected correctly, and the file was always updated correctly by the Replace in Files executed manually by me.
And the UTF-16 LE encoded file with BOM was also always updated correctly by every Replace in Files.
I have to find out with more experiments how many bytes UltraEdit requires to detect ANSI encoding in small ANSI encoded files when running a Find/Replace in Files with Use encoding enabled and set to Auto-detect.
It was also interesting for me that the small ANSI encoded file with just 208 bytes was always opened as ANSI and never as UTF-8 encoded. So the Replace in Files encoding auto-detection works a bit differently than the encoding auto-detection on opening a file, which I did not expect.
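To check independently of UltraEdit whether the bytes of a small test file would also pass as UTF-8, a strict validity check can be sketched as follows. This is only a sketch under assumptions: it is not the detection method used by UltraEdit, which is not described in this topic, and the function name isValidUtf8 is hypothetical. The input is assumed to be an array of byte values from 0 to 255.

```javascript
// Hypothetical helper: returns true if the array of byte values is a
// well-formed UTF-8 byte stream. Pure ASCII input is also reported as
// valid because ASCII is a subset of UTF-8.
function isValidUtf8(bytes) {
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i];
        var extra; // number of continuation bytes expected
        var min;   // smallest code point allowed for this sequence length
        var code;  // code point decoded so far
        if (b < 0x80) {
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            extra = 1; code = b & 0x1F; min = 0x80;
        } else if (b >= 0xE0 && b <= 0xEF) {
            extra = 2; code = b & 0x0F; min = 0x800;
        } else if (b >= 0xF0 && b <= 0xF4) {
            extra = 3; code = b & 0x07; min = 0x10000;
        } else {
            return false; // byte cannot start a UTF-8 sequence
        }
        if (i + extra >= bytes.length) {
            return false; // sequence is cut off at the end of the data
        }
        for (var k = 1; k <= extra; k++) {
            var c = bytes[i + k];
            if ((c & 0xC0) !== 0x80) {
                return false; // not a continuation byte 10xxxxxx
            }
            code = (code << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values above U+10FFFF.
        if (code < min || code > 0x10FFFF ||
            (code >= 0xD800 && code <= 0xDFFF)) {
            return false;
        }
        i += extra + 1;
    }
    return true;
}
```

A single ANSI byte 0xE4 (ä in code page 1252) followed by a space is rejected by this check, while the UTF-8 sequence 0xC3 0xA4 is accepted.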
Next I recorded the Replace in Files executed manually with UE v18.104.22.168 into a macro and played the recorded macro after restoring the differently encoded files to their original contents. That worked as expected and produced the same file contents as the manually executed Replace in Files. I looked at the macro code and could see the value -2 for the option Auto-detect.
So I modified the initially created script and added the Replace in Files with exactly the same options as used manually before and recorded into the macro, including the two encoding options. But that was not a good idea because UltraEdit crashed on script execution when executing the first UltraEdit.frInFiles.replace with those parameters. I restarted UltraEdit and executed the script again, and UltraEdit crashed again. Of course I will report this crash by email to IDM support.
Conclusion: The encoding option Auto-detect with the option Use encoding enabled currently cannot be used in an UltraEdit script, only in an UltraEdit macro or manually.