Where to find the property "UTF-8"

rplantiko · Jun 13, 2012 · #1 · 2012-06-13T09:03+00:00

Hi all,

When I open a file in UltraEdit (v14.00a+1), the status bar displays the line ending type and the current encoding as a short string, e.g. "DOS" or "U8-UNIX".

In the scripting object model of UltraEdit, where can I access this property?

I have found that scripts often fail because they implicitly assume a non-Unicode encoding, so I need this information to adapt them accordingly.

Thanks for pointing this out to me.

Regards,
Rüdiger

Mofi · Jun 13, 2012 · #2 · 2012-06-13T09:37+00:00

The UltraEdit help contains the page "Scripting commands", which lists all UltraEdit-specific functions and properties supported by your version of UltraEdit. If you open View - Views/Lists - Tag List and select the tag group UE/UES Script Commands, you also get a full list of commands and properties that you can insert into the script you are writing.

UltraEdit.activeDocument.lineTerminator is the property that holds the line terminator type of the active document.

UltraEdit.activeDocument.encoding is the property that holds the encoding of the active document. The value 65001 stands for UTF-8. Which value belongs to which encoding can be seen, for example, in the first drop-down list of Advanced - Set Code Page/Locale.

Neither property is supported by every version of UltraEdit with scripting support, so whether you can use them depends on your version.
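
Because the two properties may be missing in older versions, a script can test for them before relying on them. The following is only a minimal sketch: the helper name describeDocument and the shape of its return value are made up for illustration, and it assumes that a missing property simply reads as undefined.

```javascript
// Hypothetical helper (not part of the UltraEdit API): reports what can be
// learned about a document object without assuming the properties exist.
function describeDocument(doc) {
  var hasEncoding = typeof doc.encoding !== "undefined";
  var hasLineTerminator = typeof doc.lineTerminator !== "undefined";
  return {
    encodingKnown: hasEncoding,
    // 65001 is the value for UTF-8 mentioned above.
    isUtf8: hasEncoding && Number(doc.encoding) === 65001,
    lineTerminator: hasLineTerminator ? doc.lineTerminator : null
  };
}

// Only runs inside UltraEdit, where the global UltraEdit object exists.
if (typeof UltraEdit !== "undefined") {
  var info = describeDocument(UltraEdit.activeDocument);
  UltraEdit.outputWindow.write(
    "UTF-8: " + (info.encodingKnown ? info.isUtf8 : "unknown (property not available)")
  );
}
```

If encodingKnown is false, the script has to fall back to some other heuristic for that version.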

rplantiko · Jun 13, 2012 · #3 · 2012-06-13T10:16+00:00

Thank you, Mofi,

I wrote the following script to list all properties of the UltraEdit document object:

Code:

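var doc = UltraEdit.document[0];  // first open document

// Write every enumerable property name and its value to the output window.
for (var x in doc) {
  UltraEdit.outputWindow.write( x + ":" + doc[x] + "\n" );
}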

The properties lineTerminator and encoding do not appear in the list, which suggests that they were introduced in a version later than 14.00a+1. (I will propose an upgrade of our company license.)

Kind regards,
Rüdiger

Jun 13, 2012 · #4 · 2012-06-13T10:30+00:00

A workaround.

Often I want to process the selected text in a file regardless of the file's encoding, knowing only that it contains nothing but characters of the Latin-1 character set.

Text usually doesn't contain the NUL byte, and a Latin-1 character x is represented as the byte pair "x NUL" in UTF-16 LE (the format UltraEdit uses internally for Unicode). The following code therefore works on Latin-1 files as well as on Unicode files that contain only Latin-1 characters.

The array lines then contains the selected lines (or all lines of the document if nothing was selected), ready for regular expressions or any other string manipulation in JavaScript:

Code:

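var doc = UltraEdit.activeDocument;
// Nothing (or only whitespace) selected? Then work on the whole document.
if (!doc.selection.match(/\S/)) doc.selectAll();
var selection = doc.selection
                   .replace( /^\xEF\xBB\xBF/, "" )  // remove a leading UTF-8 BOM
                   .replace( /\x00/g, "" );         // poor man's conversion of the Latin-1 subset of Unicode to Latin-1 (not watertight, of course!)

// Use the first line break found as separator; match() returns an array, so take element 0.
// Matching \r\n|\r|\n avoids treating two consecutive line breaks (an empty line) as one separator.
var lineEnd   = ( selection.match( /\r\n|\r|\n/ ) || ["\r\n"] )[0];
var lines     = selection.split( lineEnd );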
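
To illustrate why stripping NUL bytes works for Latin-1 text and where it stops working, here is a small stand-alone sketch. It assumes that the selection arrives as a string with one character per byte of UTF-16 LE data; the function names toUtf16LeByteString and stripNul are made up for this illustration.

```javascript
// Hypothetical helper: encode a string as UTF-16 LE and return the bytes
// as a "byte string" (one character per byte, low byte first).
function toUtf16LeByteString(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    out += String.fromCharCode(code & 0xFF, code >> 8);
  }
  return out;
}

// The workaround from the script above: drop every NUL byte.
function stripNul(byteString) {
  return byteString.replace(/\x00/g, "");
}

// Latin-1 only ("Gruesse" with u-umlaut and sharp s): every high byte is NUL,
// so stripping the NULs gives back the original text.
var latin1 = "Gr\u00FC\u00DFe";
var recovered = stripNul(toUtf16LeByteString(latin1));

// Outside Latin-1 (euro sign, U+20AC): the bytes are AC 20, neither is NUL,
// so nothing is stripped and the result is "\xAC\x20" instead of the euro sign.
var broken = stripNul(toUtf16LeByteString("\u20AC"));
```

Here recovered equals latin1, while broken is a two-character string that no longer resembles the original. That is the sense in which the workaround is not watertight.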