Copy UTF-8 encoded text into clipboard

Basic User

    Jul 02, 2015 #1

    I am using UltraEdit as an external editor of Stata 14. Stata uses UTF-8 encoded text (without BOM) in command files (so called do-files).

    To send (sections of) command files edited by UltraEdit directly to Stata I am using two AutoIt3 programs (rundo and rundolines, defined as UltraEdit tools) written by Friedrich Huebler (see http://huebler.blogspot.ca/2008/04/stata.html) that allow external editors to be integrated with Stata. rundo invokes Stata to run the complete command file from disk, whereas rundolines copies selected parts of the command file into a temporary command file and invokes Stata to run this temporary file. If I save and edit the command file as a UTF-8 encoded file, calling rundo as an UltraEdit tool works fine.

    However, using rundolines together with UltraEdit does not work as it should: it seems that it is not possible to copy UTF-8 encoded text edited in UltraEdit to the clipboard. The idea of rundolines is to (1) copy the selected lines from the editor to the Windows clipboard, (2) retrieve the user-specific path of the temp directory, (3) create the temporary file "statacmd.tmp", (4) paste the commands from the clipboard into this file and place it in this directory, (5) activate Stata and (6) let Stata read and execute the commands in the temporary file.

    Unfortunately, even if a file with UTF-8 encoded text is edited in UltraEdit, copying the text from UltraEdit to the clipboard seems to change it into a different encoding, so that Stata does not display characters such as ä, ü, ö, or ß properly.
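
A minimal sketch of what is probably going wrong, assuming the temporary file ends up with Windows-1252 (ANSI) bytes instead of UTF-8 bytes (this is my assumption, I have not inspected the file byte by byte). It runs in Node.js, not in UltraEdit, and the helper name decodeUtf8 is made up for this illustration.

```javascript
// "ä" is U+00E4. Windows-1252 stores it as the single byte 0xE4,
// UTF-8 stores it as the two bytes 0xC3 0xA4.
var ansiBytes = [0xE4];
var utf8Bytes = [0xC3, 0xA4];

// Decode a list of byte values the way a UTF-8 reader would
// (hypothetical helper, uses the standard TextDecoder).
function decodeUtf8(bytes) {
    return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

var fromUtf8 = decodeUtf8(utf8Bytes); // the umlaut is read correctly
var fromAnsi = decodeUtf8(ansiBytes); // invalid UTF-8, becomes U+FFFD

console.log(fromUtf8 === "\u00e4");
console.log(fromAnsi === "\ufffd");
```

A program that expects UTF-8 cannot interpret the lone byte 0xE4, which would explain the broken umlauts.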

    To fix this problem I inserted the following AutoIt3 script into line 60 of rundolines.au3 (the file is available at the link given above):

    Code: Select all

    ; ---------- fix for UltraEdit 22.10: ---------
    ; Open temporary file in editor
    $tempfile3 = " " & $tempfile
    Send("^o")
    Sleep($clippause)
    Send($tempfile3 & "{Enter}")
    Sleep($clippause)
    ; Convert file to UTF-8 coding (assumes UltraEdit's keyboard assignment to convert ASCII to UTF-8 is <Alt-u>)
    Send("!u")
    Sleep($clippause)
    ; Save and close temporary file (assumes UltraEdit's keyboard assignment to close a file is <Alt-s>)
    Send("^s")
    Send("!s")
    ; ---------- end fix for UltraEdit. ------------

    Note that in my case $tempfile in the script contains the path and file name "C:\Users\Enzmann\AppData\Local\Temp\statacmd.tmp". The idea of the fix is to (1) open the temporary file created by rundolines in UltraEdit, (2) convert the file from ASCII to UTF-8 (editable) and (3) save and close the file.

    However, the fix does not always work properly because sometimes the first character of this string is stripped when it is sent to the file name field of UltraEdit's Open File dialog by the AutoIt3 script command

    Code: Select all

    Send($tempfile3 & "{Enter}")
    The problem seems to occur irregularly (sometimes the first, sometimes more, or sometimes no character is stripped from the path and file name string). I tried to avoid the problem by inserting a space character at the beginning of the string using

    Code: Select all

    $tempfile3 = " " & $tempfile
    which sometimes helps.

    Does anybody know why the first character is sometimes stripped and how to avoid it?

    The best solution would be if it were possible to copy UTF-8 encoded text to the clipboard. Friedrich Huebler reports that he is able to use rundolines successfully with NotePad++ (see http://www.statalist.org/forums/node/1300406). What is possible with NotePad++ should also be possible with UltraEdit.

    Grand Master

      Jul 02, 2015 #2

      In the memory of a Unicode-aware Windows application, text is typically held UTF-16 encoded, with 16 or 32 bits per character (one or two 16-bit code units; in C/C++ the type is wchar_t, wide character). Therefore copying text from any Unicode encoded file results in UTF-16 encoded text in the clipboard.
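
JavaScript strings are sequences of UTF-16 code units as well, so the point can be illustrated with a short sketch. The helper name codeUnits is made up for this illustration.

```javascript
// List the UTF-16 code units of a string as hex values.
function codeUnits(str) {
    var units = [];
    for (var i = 0; i < str.length; i++) {
        units.push(str.charCodeAt(i).toString(16));
    }
    return units;
}

var umlaut = codeUnits("\u00e4");       // ä: one 16-bit code unit
var emoji = codeUnits("\ud83d\ude00");  // U+1F600: two code units (surrogate pair)

console.log(umlaut.join(" "));
console.log(emoji.join(" "));
```

Characters up to U+FFFF take one code unit, everything above takes a surrogate pair, which is why a conversion function needs a separate branch for surrogates.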

      The following UltraEdit script converts the UTF-16 encoded text in the clipboard to UTF-8 and puts the result back into the clipboard.

      Code: Select all

      // The function toUTF8Array is taken from
      // http://jonisalonen.com/2012/from-utf-16-to-utf-8-in-javascript/
      // and includes the bugfix as posted by Greetz on same page.
      // Additionally the array utf8 is replaced by string utf8.
      
      function toUTF8Array(str) {
          var utf8 = "";
          for (var i=0; i < str.length; i++) {
              var charcode = str.charCodeAt(i);
              if (charcode < 0x80) utf8 += String.fromCharCode(charcode);
              else if (charcode < 0x800) {
                  utf8 += String.fromCharCode(0xc0 | (charcode >> 6),
                                              0x80 | (charcode & 0x3f));
              }
              else if (charcode < 0xd800 || charcode >= 0xe000) {
                  utf8 += String.fromCharCode(0xe0 |  (charcode >> 12),
                                              0x80 | ((charcode >> 6) & 0x3f),
                                              0x80 |  (charcode & 0x3f));
              }
              // surrogate pair
              else {
                  i++;
                  // UTF-16 encodes 0x10000-0x10FFFF by
                  // subtracting 0x10000 and splitting the
                  // 20 bits of 0x0-0xFFFFF into two halves
                  charcode = (((charcode & 0x3ff)<<10)
                             | (str.charCodeAt(i) & 0x3ff))
                             + 0x10000;
                  utf8 += String.fromCharCode(0xf0 |  (charcode >> 18),
                                              0x80 | ((charcode >> 12) & 0x3f),
                                              0x80 | ((charcode >>  6) & 0x3f),
                                              0x80 |  (charcode & 0x3f));
              }
          }
          return utf8;
      }
      
      UltraEdit.clipboardContent = toUTF8Array(UltraEdit.clipboardContent);
      
      You can add further commands, for example copying the selected text to the clipboard before the conversion to UTF-8 in memory, and calling the configured user tool afterwards. Then add the script to the script list and assign a hotkey to it for fast execution by key.
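
A rough sketch of such an extended script, not a tested solution. It assumes that toUTF8Array is defined earlier in the same script file and that the user tool is configured under the name "rundolines" (that name is only a guess, use the name from your tool configuration). The function name sendSelectionAsUtf8 is made up; the UltraEdit object and the converter are passed in as parameters only to keep the function easy to try outside of UltraEdit.

```javascript
// Copy the selection, convert the clipboard to UTF-8 and run a user tool.
// ue        - the UltraEdit scripting object
// convert   - conversion function, for example toUTF8Array
// toolName  - name of the configured user tool (assumption)
function sendSelectionAsUtf8(ue, convert, toolName) {
    // Nothing to do without an open file or without a selection.
    if (ue.document.length < 1) return false;
    if (!ue.activeDocument.isSel()) return false;

    ue.activeDocument.copy();                           // selection -> clipboard
    ue.clipboardContent = convert(ue.clipboardContent); // UTF-16 -> UTF-8
    ue.runTool(toolName);                               // call the user tool
    return true;
}

// Inside UltraEdit the call would look like this.
if (typeof UltraEdit !== "undefined") {
    sendSelectionAsUtf8(UltraEdit, toUTF8Array, "rundolines");
}
```

The function returns false when there is no selection, so the user tool is never run on stale clipboard content.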

      And of course you don't need all the AutoIt commands to send key strokes to UltraEdit. Run UltraEdit with the parameters

      Code: Select all

      /fni "name of temp file with path" /s,e="Name of UE script file with path"
      or perhaps even better

      Code: Select all

      start "ToUTF8" /wait /min "%ProgramFiles(x86)%\IDM Computer Solutions\UltraEdit\Uedit32.exe" /fni "name of temp file with path" /s,e="Name of UE script file with path"
      The last command uses the standard Windows command start to run UltraEdit minimized in a new instance with the file to be modified by the script, and waits until UltraEdit exits. The UE script can do everything you have coded in the AutoIt script.

      By the way: The script also produces correct output for the German umlauts and ß if the text file is encoded in Windows-1252 or ISO 8859-1 instead of UTF-16, because the content of the clipboard is converted by the same function in that case, too.
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        Jul 02, 2015 #3

        Thanks a lot!

        The solution to copy UTF-8 encoded text into the clipboard seems best, I'll give it a try and will report back how it works.

        Your suggestion to run UltraEdit with parameters, so that I don't need AutoIt commands to send key strokes to UltraEdit, is more difficult to implement because I don't know how to read the environment variable that holds the path to the user-specific directory for temporary files. In AutoIt this can be achieved by

        Code: Select all

        $tempfile = EnvGet("TEMP") & "\statacmd.tmp"
        (here the file name is appended to the path). Thus, to make use of your suggestion, which in principle is a good idea, I would need to get the path name without using AutoIt.

        Grand Master

          Jul 03, 2015 #4

          I have several ideas for working around the missing feature of reading environment variables from within an UltraEdit script.

          The first one is simply the instruction that a user of this script has to edit the path for the temporary file ONCE at the top of the script with UltraEdit before the script can be used.

          The second one is to use the path of the active file, as it can be assumed that the active file is stored in a directory where the user has write permissions.

          But if the user of the script has not defined a directory for the temporary file, or there is no active file, or the active file is a new file not yet saved, or is a file loaded via FTP, FTPS or SFTP, the script opens the Save As dialog so that the user can select a directory and enter a file name.

          Here is the scripting code for this method. It requires that the function GetFilePath is copied into the script, too.

          Code: Select all

          // Please define here path for temporary file.
          // Example: "C:\\Users\\YourUserName\\AppData\\Local\\Temp\\"
          var sTempFilePath = "";
          
          
          // Other script code.
          
          
          // The default is no file name for the temporary file resulting in
          // opening the Save As dialog on calling UltraEdit.saveAs().
          var sTempFileName = "";
          
          // Has the user of the script once defined the path for the temporary file?
          if (sTempFilePath.length)
          {
             // If the path does not end with a backslash, append a backslash.
             if (sTempFilePath[sTempFilePath.length-1] != '\\') sTempFilePath += '\\';
             // Then append the name of the temporary file.
             sTempFileName = sTempFilePath + "statacmd.tmp";
          }
          else if (UltraEdit.document.length > 0)   // Is any file opened?
          {
             // Has the active file a file name, i.e. is not a new file not yet saved.
             if (UltraEdit.activeDocument.path.length)
             {
                // Is the file not opened via FTP, FTPS or SFTP?
                if(!UltraEdit.activeDocument.isFTP())
                {
                   // Get path of the active file and save the temporary file there.
                   sTempFilePath = GetFilePath(UltraEdit.activeDocument.path);
                   sTempFileName = sTempFilePath + "statacmd.tmp";
                }
             }
          }
          UltraEdit.saveAs(sTempFileName);
          UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
          
          
          // Run the user tool.
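
The function GetFilePath is not shown in this post. As a rough stand-in only, and not the original function, a minimal version could look like the sketch below. It assumes that the function gets the full file name as a string and should return the path including the trailing backslash, or an empty string if the name contains no path.

```javascript
// Hypothetical stand-in for GetFilePath: returns everything up to and
// including the last backslash or slash of a full file name.
function GetFilePath(sFullFileName) {
    var nLastSeparator = Math.max(sFullFileName.lastIndexOf("\\"),
                                  sFullFileName.lastIndexOf("/"));
    // No separator found: the name contains no path at all.
    if (nLastSeparator < 0) return "";
    return sFullFileName.substring(0, nLastSeparator + 1);
}

var sExamplePath = GetFilePath("C:\\Data\\Project\\analysis.do");
var sNoPath = GetFilePath("analysis.do");
```

With the example name above the result is the directory of the do-file with a trailing backslash, so "statacmd.tmp" can be appended directly.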
          
          The user tool should be a batch file.

          The user tool command line should contain the batch file name and "%f" as parameter.

          The batch file should contain something like:

          Code: Select all

          @echo off
          rem Exit batch processing if batch file was called without a file name.
          if "%~1" == "" goto :EOF
          rem Command to start STATA with "%~1" as parameter and pause execution of batch until STATA finished.
          del "%~1"
          
          I have more ideas. For example, the script could copy the UTF-8 encoded lines to the Windows clipboard and then call the user tool, which is a batch file or, perhaps better, a Windows script executed with cscript.exe. The tool writes the clipboard content into a temporary file in %TEMP%, calls STATA and finally deletes the temporary file. A batch file would require the user of the script to install an additional tool that writes the clipboard content into a file, as there is no Windows command which can do that. A Windows script has access to the Windows clipboard and can therefore create the file directly without any additional tool.