Function IsUnicode to detect Unicode file after opening it

    Oct 24, 2007#1

    Hello script writers!

    As discussed in the topic "Selection only returns first character", there is currently (see post date) a problem with Unicode files.

    The workaround posted by in0de using

    Code:

    UltraEdit.activeDocument.ASCIIToUnicode();
    UltraEdit.activeDocument.unicodeToASCII();

    does not work on Windows 98; I don't know why. In any case, it is not a clean solution.

    Here is a much better one: a function which detects whether UltraEdit or UEStudio has loaded a file as Unicode. Calling this function after opening a file makes it possible to convert to ASCII only when required, i.e. when the file really is a Unicode file.

    Code:

    if (IsUnicode()) UltraEdit.activeDocument.unicodeToASCII();

    If you want to report mistakes or have suggestions for further enhancements, please post a message here.

    The script file IsUnicode.js with the code can be viewed or downloaded from the Macros & Scripts page.

    The line and block comments can be removed from the entire script or from function IsUnicode by running a replace all (from top of file), searching with the Perl regular expression ^ *//.+[\r\n]+|^ */\*[\s\S]+?\*/[\r\n]+| +//.+$ and using an empty replace string. The first of the three alternatives in this OR expression matches entire lines containing only a line comment, the second matches block comments, and the third matches line comments to the right of code. Removing the comments makes the usage of this script, or of just the function in other scripts, more efficient because the JavaScript interpreter has to interpret fewer characters and lines.
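The effect of the three alternatives can be tried outside of UltraEdit with a small sketch in plain JavaScript. The function name stripComments and the sample text are made up for this illustration and are not part of IsUnicode.js. The flags g and m are an assumption about how to reproduce a replace all with line-based ^ and $ in a JavaScript regular expression.

```javascript
// Hypothetical helper for illustration only, not part of IsUnicode.js.
// Same expression as in the post, with "/" escaped for a JavaScript
// regex literal. g = replace all, m = ^ and $ match at line boundaries.
function stripComments(sSource) {
   var rComments = /^ *\/\/.+[\r\n]+|^ *\/\*[\s\S]+?\*\/[\r\n]+| +\/\/.+$/gm;
   return sSource.replace(rComments, "");
}

// Made-up sample with one case for each of the three alternatives.
var sSample = [
   "// Line comment on its own line",      // 1st alternative
   "/* Block comment",                     // 2nd alternative
   "   over two lines */",
   "var nValue = 1; // trailing comment",  // 3rd alternative
   "var sText = \"abc\";",
   ""
].join("\r\n");

var sStripped = stripComments(sSample);
// sStripped now contains only the two code lines, each ending with CR+LF.
```

Note that the third alternative only looks for spaces followed by //, so a string literal containing such a sequence would be truncated as well. It is advisable to check the result after the replace.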

    On 2018-07-09 many comments were improved, as well as a message which is either displayed in a message box or written to the output window.

      Feb 11, 2009#2

      The script file with the function IsUnicode was updated on 2009-02-10.

      I have slightly changed all variable names by adding a prefix letter for the type of the variable. Advantages of using a type prefix letter:
      • The type of a variable is always visible which is a great help.
      • No more problems with variable names that resemble common words used in comments or strings, especially when searching for or replacing such variables. For example, sDirectory as a variable name is much better than just Directory because the word Directory can also occur in comments and strings.
      • It is easy to search for all string, number or boolean variables if using a type prefix letter (always lowercase) and the variable name starts with an uppercase character. For example the case sensitive regular expression search string s[A-Z][A-Za-z]+ with option Match Whole Word Only finds all string variables in the script file.
      • Using type prefix letters makes the selection of an existing variable of type string, number or boolean easier in the auto-complete dialog.
      Additionally, the global variable used is now named g_nDebugMessage. The additional prefix g_ indicates that this is a global variable not defined inside the function. Variables with a non-standard type don't have a prefix, but have a very distinctive name like WorkingFile.

      The standard prefixes I use are:

      an ... array of numbers (doesn't exist in IsUnicode.js)
      as ... array of strings (doesn't exist in IsUnicode.js)
      b ... boolean
      n ... number
      s ... string
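The search for all string variables mentioned in the list of advantages can be sketched in plain JavaScript as follows. The sample lines and their variable names are invented for this illustration. The \b anchors are an assumption about how to emulate the option Match Whole Word Only in a JavaScript regular expression.

```javascript
// Made-up sample lines using the type prefixes listed above.
var asSampleLines = [
   "var sDirectory = sPath + sFileName;",
   "var nCount = 0; // count files in the Directory",
   "var bFound = false;",
   "var asNames = [];"
];
var sSampleCode = asSampleLines.join("\n");

// s[A-Z][A-Za-z]+ from the post, enclosed in \b to match whole words only.
// No i flag is used, so the search is case-sensitive.
var asStringVars = sSampleCode.match(/\bs[A-Z][A-Za-z]+\b/g);
// asStringVars is ["sDirectory", "sPath", "sFileName"]
```

The word Directory in the comment and the array variable asNames are not found, which is the benefit of the lowercase prefix followed by an uppercase character.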

      Furthermore, as a safety measure I have added a check whether the current file is opened/viewed in hex edit mode. The function is not designed for determining the file encoding in hex edit mode. In this case it now always returns false, and an appropriate warning is displayed if displaying debug messages is enabled.

      Finally, I have added code demonstrating the usage of the function by simply running it on all open files and telling the user whether each file is an ASCII/ANSI file or a Unicode file. The current cursor position in all files is saved before running IsUnicode and restored afterwards.

        Nov 14, 2018#3

        On 2018-11-14 some small improvements were made to the code of function IsUnicode and to the demonstration code, with no functional effect on execution.
        Best regards from an UC/UE/UES for Windows user from Austria