Sometimes it is necessary to create an UltraEdit macro or script using finds/replaces for ANSI/Unicode strings.
ANSI strings consist of single-byte characters with a code value in the range 0 to 255. Which character is displayed for a value in the range 128 to 255 depends on the currently active code page. A Find/Replace in Files can therefore be a problem when the search/replace strings contain ANSI characters and not all files with single-byte characters use the same code page. But files of the same type (= same file extension) with only 1 byte per character usually use the same code page, and then finds/replaces with ANSI characters are no problem.
More problematic are finds/replaces with Unicode characters having a code value greater than 255. An UltraEdit script must be an ASCII/ANSI file prior to UltraEdit for Windows v24.00 and UEStudio v17.00. (A UTF-8 encoded Unicode script is parsed by the JavaScript interpreter in UE < v24.00 and UES < v17.00 like an ASCII/ANSI file, which means the UTF-8 coding sequences for characters with a value greater than 127 are not converted to the appropriate Unicode characters.) Also, the Edit Macro dialog of UltraEdit < v24.00 and UEStudio < v17.00 supports only ASCII/ANSI strings, and the binary macro storage format in these versions of UltraEdit/UEStudio also has problems with Unicode strings.
The solution for the rarely needed ANSI/Unicode finds/replaces in UE < v24.00 and UES < v17.00 is to use the Perl regular expression engine even for simple Unicode finds/replaces. The Perl regular expression engine supports \x[0-9a-f][0-9a-f] for specifying a character by its code value in the range 0 to 255 as a two-digit hexadecimal value, as well as \x{[0-9a-f][0-9a-f][0-9a-f][0-9a-f]} for specifying a Unicode character by its code value as a four-digit hexadecimal value. The uppercase letters A-F can also be used.
For example, the Perl regular expression [\x00-\x08\x0E-\x1F] finds a control character usually not present in a text file. The characters with code values 09 to 0D often exist in text files, as these are the horizontal tab, line-feed, vertical tab (rarely used), form-feed (page break) and carriage return.
Another example is [\x{2200}-\x{22ff}] which finds mathematical operators in a UTF-16 encoded Unicode file.
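The two expressions above could be used from an UltraEdit script roughly as follows. This is only a sketch: the helper name findFromTop and the variable names are made up for illustration, and it assumes the usual scripting properties of the findReplace object. Note that each backslash must be doubled inside a JavaScript string, and that the search strings are pure ASCII, so the script file itself can stay an ASCII/ANSI file.

```javascript
// Perl regular expression search strings written with ASCII characters only.
// Every backslash is doubled, otherwise the JavaScript string parser would
// interpret the escape itself instead of passing it to the regexp engine.
var controlCharClass = "[\\x00-\\x08\\x0E-\\x1F]";
var mathOperatorClass = "[\\x{2200}-\\x{22ff}]";

// Hypothetical helper (not part of UltraEdit): runs a case-sensitive
// regular expression find from top of the passed document.
function findFromTop(doc, perlRegExp) {
    doc.top();
    doc.findReplace.mode = 0;            // search in current file
    doc.findReplace.matchCase = true;
    doc.findReplace.matchWord = false;
    doc.findReplace.regExp = true;
    doc.findReplace.searchDown = true;
    return doc.findReplace.find(perlRegExp);
}

// Executed only inside UltraEdit/UEStudio with at least one file opened.
if (typeof UltraEdit !== "undefined" && UltraEdit.document.length > 0) {
    UltraEdit.perlReOn();                // select the Perl regexp engine
    findFromTop(UltraEdit.activeDocument, mathOperatorClass);
}
```

If the find is successful, the found character is selected in the active file.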
UltraEdit has the command Search - Character Properties (Alt+RETURN) to get the code value of the character at the current position of the caret in hexadecimal. But if a search/replace string contains multiple ANSI/Unicode characters, the manual conversion takes a lot of time.
Solution: The script UnicodeStringToPerlRegExp.js, available for download at Macros & Scripts, converts the selected string into a Perl regular expression string with characters in the range 128 to 255 encoded as \x[0-9a-f][0-9a-f] and characters with a code value greater than 255 encoded as \x{[0-9a-f][0-9a-f][0-9a-f][0-9a-f]}, and copies the result to the operating system clipboard. ASCII characters in the range 0 to 127 are not modified, which means Perl regular expression characters are copied to the clipboard without any modification.
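The core of such a conversion can be sketched in a few lines of plain JavaScript. This is not the code of UnicodeStringToPerlRegExp.js; the function name toPerlRegExpString is made up for illustration, and reading the selection and writing to the clipboard are left out.

```javascript
// Hypothetical helper illustrating the conversion rules described above.
// It works on UTF-16 code units, so a character outside the Basic
// Multilingual Plane results in two \x{....} escapes (surrogate pair).
function toPerlRegExpString(text) {
    var result = "";
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code < 128) {
            // ASCII characters are copied without any modification.
            result += text.charAt(i);
        } else if (code < 256) {
            // 128 to 255 always results in two hexadecimal digits.
            result += "\\x" + code.toString(16);
        } else {
            // Pad with leading zeros to four hexadecimal digits.
            var hex = code.toString(16);
            while (hex.length < 4) {
                hex = "0" + hex;
            }
            result += "\\x{" + hex + "}";
        }
    }
    return result;
}
```

For example, the string 5 € (with a euro sign, U+20AC) becomes 5 \x{20ac}, and the German umlaut ä (code value E4) becomes \xe4.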
Note: The script does not support converting a character to its UTF-8 coding sequence, which would be needed only when a script or macro runs a Find or Replace in Files on UTF-8 encoded files that are not opened.
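To illustrate what a UTF-8 coding sequence is, the following sketch computes the bytes for a character and writes them as \x escapes. The function names are made up, surrogate pairs are not handled, and whether such a byte-wise search string matches in a particular Find in Files situation is an assumption that has to be verified.

```javascript
// Returns the UTF-8 bytes for a code value in range 0 to 0xFFFF.
// Sketch only: characters encoded with surrogate pairs are not handled.
function utf8Bytes(code) {
    if (code < 0x80) {
        return [code];                               // 1 byte
    }
    if (code < 0x800) {
        return [0xC0 | (code >> 6),                  // 2 bytes
                0x80 | (code & 0x3F)];
    }
    return [0xE0 | (code >> 12),                     // 3 bytes
            0x80 | ((code >> 6) & 0x3F),
            0x80 | (code & 0x3F)];
}

// Hypothetical helper: keeps ASCII characters and encodes every other
// character as its UTF-8 byte sequence in \x[0-9a-f][0-9a-f] notation.
function toUtf8HexEscapes(text) {
    var result = "";
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code < 128) {
            result += text.charAt(i);
            continue;
        }
        var bytes = utf8Bytes(code);
        for (var j = 0; j < bytes.length; j++) {
            result += "\\x" + bytes[j].toString(16); // bytes are all >= 0x80
        }
    }
    return result;
}
```

For the euro sign (U+20AC) this produces \xe2\x82\xac and for ä (U+00E4) it produces \xc3\xa4.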
The line and block comments can be removed from the script file by running a Replace All (from top of file) searching with the Perl regular expression ^ *//.+[\r\n]+|^ */\*[\s\S]+?\*/[\r\n]+| +//.+$ and using an empty replace string. The first of the three alternatives in this OR expression matches entire lines containing only a line comment, the second matches block comments, and the third matches line comments to the right of code. Removing the comments makes the script more efficient when it is used often, because the JavaScript interpreter has fewer characters and lines to interpret.
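How the three alternatives work together can be tried outside of UltraEdit with a small JavaScript sketch. The same expression is written here as a JavaScript regular expression literal with the flags g (replace all) and m (^ and $ match at line boundaries); the function name stripComments and the sample text are made up for illustration.

```javascript
// Same expression as above in JavaScript syntax, with / escaped as \/.
var commentRegExp = /^ *\/\/.+[\r\n]+|^ *\/\*[\s\S]+?\*\/[\r\n]+| +\/\/.+$/gm;

// Hypothetical helper: removes line and block comments from script source.
function stripComments(source) {
    return source.replace(commentRegExp, "");
}

var sample =
    "// line with a line comment only\n" +
    "var a = 1;  // line comment right of code\n" +
    "/* block comment\n" +
    "   over two lines */\n" +
    "var b = 2;\n";

var stripped = stripComments(sample);
// stripped is "var a = 1;\nvar b = 2;\n"
```

As the third alternative requires at least one space left of //, a string like "http://www.example.com" inside the code is not matched by it.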