SaveAs Macro with Line Terminator and Format Parameters


    16:25 - Feb 10#1

    Hello,

    I receive text files in various encodings. My older systems can only accept files with no code page or in Windows-1252. I have had good success using UE to manually open CSV files and save them as ANSI/ASCII with DOS terminators. Doing this, the few accented characters I sometimes encounter are rendered nicely (technically they're being converted to similar-looking characters within the more limited ASCII character set) and my other systems accept the files just fine.

    I'd like to create a macro to do this. However, when recording a macro I click Save As and then set the Line Terminator and Format, but only the SaveAs command and the file name are recorded.

    I noted the UnicodeToASCII macro command and tried it. If the selected file is in fact UTF-8, it works, but if the file is already ASCII, the command destroys the file contents. This isn't quite the same as manually selecting the Line Terminator and Format while saving within UE, because the manual approach doesn't destroy the file if it is already ASCII.

    Because I don't know the file format in advance, and see no command to detect it programmatically, I'm at a loss.  How can I either programmatically detect if conversion is needed, or, perform the Save As including the Line Terminator and Format parameters the same as when done manually (and therefore not destroy the file if it is already ASCII)?

    Thanks in Advance


      10:45 - Feb 11#2

      The macro command SaveAs takes only one parameter: the new file name, with or without a path, or an empty string to open the Save As dialog window. It is also very difficult to determine with an UltraEdit macro the line ending type (which depends on configuration) and the character encoding of the active file in order to make the necessary conversions before saving the converted file.

      For these reasons, the following UltraEdit script is the better choice.


      if (UltraEdit.document.length > 0)  // Is any file opened?
      {
         var sFileName = UltraEdit.activeDocument.path;
         // Is the active file not opened in hex edit mode?
         if (!UltraEdit.activeDocument.isHexModeOn())
         {
            var nFileChanged = 0;
            // Does the active file have line terminators other than DOS/Windows?
            if (UltraEdit.activeDocument.lineTerminator)
            {
               UltraEdit.activeDocument.unixMacToDos();
               UltraEdit.outputWindow.write(sFileName + " converted to DOS/Windows.");
               nFileChanged = 1;
            }
            else
            {
               UltraEdit.outputWindow.write(sFileName + " is a DOS/Windows text file.");
            }
            // Determine the character encoding of active file and convert
            // UTF-8, UTF-16 LE and UTF-16 BE encoded files to ASCII/ANSI.
            switch (UltraEdit.activeDocument.codePage)
            {
               case 1200:  // UTF-16 little endian encoded file.
               case 1201:  // UTF-16 big endian encoded file.
                           UltraEdit.activeDocument.unicodeToASCII();
                           UltraEdit.outputWindow.write(sFileName + " converted from Unicode to ASCII/ANSI.");
                           nFileChanged +=  2;
                           break;
               case 65001: // UTF-8 encoded file.
                           UltraEdit.activeDocument.UTF8ToASCII();
                           UltraEdit.outputWindow.write(sFileName + " converted from UTF-8 to ASCII/ANSI.");
                           nFileChanged +=  2;
                           break;
               default:    UltraEdit.outputWindow.write(sFileName + " has the code page " + UltraEdit.activeDocument.codePage + ".");
                           break;
            }
            if (nFileChanged > 1)   // Is the character encoding changed?
            {
               var sXmlEncoding = "CP-1252";
               UltraEdit.perlReOn();
               UltraEdit.activeDocument.top();
               UltraEdit.activeDocument.findReplace.mode = 0;
               UltraEdit.activeDocument.findReplace.matchCase = true;
               UltraEdit.activeDocument.findReplace.matchWord = false;
               UltraEdit.activeDocument.findReplace.regExp = true;
               UltraEdit.activeDocument.findReplace.searchDown = true;
               UltraEdit.activeDocument.findReplace.searchInColumn = false;
               UltraEdit.activeDocument.findReplace.preserveCase = false;
               UltraEdit.activeDocument.findReplace.replaceAll = false;
               UltraEdit.activeDocument.findReplace.replaceInAllOpen = false;
               // Search in active file for UTF-8 or UTF-16 or ISO-10646-UCS-2
               // XML encoding declaration and adapt the encoding attribute.
               // The tag ?xml and the attribute encoding must be in lowercase
               // while the value of the attribute encoding is case-insensitive.
               // That is the reason for the unusual search expression.
               UltraEdit.activeDocument.findReplace.replace("^([\\t ]*<\\?xml[^>]+?\\<encoding=[\"'])(?:[Uu][Tt][Ff]-(?:8|16)|[Ii][Ss][Oo]-10646-[Uu][Cc][Ss]-2)","\\1" + sXmlEncoding);
               if (UltraEdit.activeDocument.isFound())
               {
                  UltraEdit.outputWindow.write(sFileName + " changed XML encoding to " + sXmlEncoding + ".");
               }
               else
               {
                  var sHtmlEncoding = "Windows-1252";
                  UltraEdit.activeDocument.findReplace.matchCase = false;
                  // Search in active file for UTF-8 or UTF-16 or ISO-10646-UCS-2
                  // HTML/XHTML charset declaration and adapt the charset attribute.
                  UltraEdit.activeDocument.findReplace.replace("(<meta[^>]+?\\<charset=[\"']?)(?:UTF-(?:8|16)|ISO-10646-UCS-2)","\\1" + sHtmlEncoding);
                  if (UltraEdit.activeDocument.isFound())
                  {
                     UltraEdit.outputWindow.write(sFileName + " changed HTML/XHTML charset to " + sHtmlEncoding + ".");
                  }
               }
            }
            if (nFileChanged) // Is the active file modified by this script?
            {
               // Is the active file a new, unnamed file?
               if (UltraEdit.activeDocument.isName("") && UltraEdit.activeDocument.isExt(""))
               {
                  UltraEdit.saveAs("");
               }
               else  // The active file is a named file which is saved now.
               {
                  UltraEdit.save();
               }
            }
         }
         else  // The active file is most likely a binary file.
         {
            UltraEdit.outputWindow.write(sFileName + " is opened in hex edit mode.");
         }
         if (!UltraEdit.outputWindow.visible)   // Is the output window not visible?
         {
            UltraEdit.outputWindow.showWindow(true);
         }
      }
      
      This script works with UltraEdit for Windows v2023.2.0.33 for ANSI, OEM, UTF-8 and UTF-16 LE encoded text files with DOS/Windows, Unix or Mac line terminators. It does not work for UTF-16 BE encoded files because the codePage property incorrectly holds in this case the default code page number for ANSI encoded files, like 1252, instead of 1201, although UltraEdit detects the UTF-16 big endian encoding correctly and indicates it correctly in the status bar. I reported this issue by email to UltraEdit support.
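
      The encoding decision made by the switch statement can be looked at in isolation. The following is only a minimal sketch: the function names conversionForCodePage and convertEncoding are hypothetical helpers, not part of the UltraEdit API, and the code page numbers are the ones used in the script above. The guard at the end means the UltraEdit object is only touched when the code runs inside UltraEdit.

```javascript
// Return the name of the conversion command for a code page number,
// or null if the file is treated as already ASCII/ANSI encoded.
function conversionForCodePage(nCodePage)
{
   switch (nCodePage)
   {
      case 1200:  // UTF-16 little endian
      case 1201:  // UTF-16 big endian
         return "unicodeToASCII";
      case 65001: // UTF-8
         return "UTF8ToASCII";
      default:    // Any other code page such as 1252: nothing to convert.
         return null;
   }
}

// Hypothetical helper: convert the passed document object only if needed.
// Returns true if a conversion command was called.
function convertEncoding(oDocument)
{
   var sCommand = conversionForCodePage(oDocument.codePage);
   if (sCommand === "unicodeToASCII")
   {
      oDocument.unicodeToASCII();
      return true;
   }
   if (sCommand === "UTF8ToASCII")
   {
      oDocument.UTF8ToASCII();
      return true;
   }
   return false;  // File left untouched, which protects ASCII/ANSI files.
}

// Inside UltraEdit the helper would be applied to the active document.
if (typeof UltraEdit !== "undefined" && UltraEdit.document.length > 0)
{
   convertEncoding(UltraEdit.activeDocument);
}
```

      The point of the default branch is that a file which is already ASCII/ANSI is never passed to a conversion command, which is what avoids the destroyed file contents described in the first post.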

      The entire if (nFileChanged > 1) block is for XML and HTML/XHTML files. It also changes the XML encoding or the HTML charset declaration if one is found in the active file. The entire block can be removed if it is not needed, for example if the script is used only for other file types.
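
      What the two search expressions match can be tried outside of UltraEdit as well. The following is a rough sketch in plain JavaScript and not what the script itself runs: it uses the JavaScript RegExp engine instead of the Perl regular expression engine of UltraEdit, so the word start assertion \< is replaced by \b and the back-reference \1 by the first captured group. The function name adaptDeclaration is hypothetical.

```javascript
// XML: tag and attribute name are matched case-sensitively, the encoding
// value case-insensitively with character classes, as in the script.
var reXmlEncoding = /^([\t ]*<\?xml[^>]+?\bencoding=["'])(?:[Uu][Tt][Ff]-(?:8|16)|[Ii][Ss][Oo]-10646-[Uu][Cc][Ss]-2)/m;
// HTML/XHTML: the entire expression is matched case-insensitively.
var reHtmlCharset = /(<meta[^>]+?\bcharset=["']?)(?:UTF-(?:8|16)|ISO-10646-UCS-2)/i;

// Replace the first XML encoding declaration, or if there is none,
// the first HTML/XHTML charset declaration. Other text is returned as is.
function adaptDeclaration(sText, sXmlEncoding, sHtmlEncoding)
{
   if (reXmlEncoding.test(sText))
   {
      return sText.replace(reXmlEncoding, function (sMatch, sPrefix)
      {
         return sPrefix + sXmlEncoding;
      });
   }
   return sText.replace(reHtmlCharset, function (sMatch, sPrefix)
   {
      return sPrefix + sHtmlEncoding;
   });
}

var sXml = '<?xml version="1.0" encoding="UTF-8"?>\n<root/>';
var sNewXml = adaptDeclaration(sXml, "CP-1252", "Windows-1252");
// sNewXml starts with: <?xml version="1.0" encoding="CP-1252"?>
```

      As in the script, only the first declaration is replaced, and a file without any such declaration is left unchanged.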

      The string values assigned to the variables sXmlEncoding and sHtmlEncoding must be adapted if the default ANSI encoding is not Windows-1252.

      The script can be enhanced to process all opened files, or all files in a directory or even a directory tree, with a loop, either over the opened files or over a list of files produced with GetListOfFiles (directory/directory tree).
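
      A loop over all opened files could look roughly like the sketch below. It is an untested outline under some assumptions: the function name convertAllOpenDocuments is hypothetical, the UltraEdit object is passed in as a parameter only to keep the function self-contained, the XML/HTML declaration handling and the output window messages are left out, and it is assumed that making a document active does not change the order of the document array.

```javascript
// Convert every opened document which is not in hex edit mode to
// DOS/Windows line terminators and ASCII/ANSI, and save the changed ones.
// Returns the number of documents changed and saved.
function convertAllOpenDocuments(oUltraEdit)
{
   var nConverted = 0;
   for (var nIndex = 0; nIndex < oUltraEdit.document.length; nIndex++)
   {
      var oDocument = oUltraEdit.document[nIndex];
      if (oDocument.isHexModeOn()) continue;  // Most likely a binary file.

      var bChanged = false;
      if (oDocument.lineTerminator)  // Not DOS/Windows line terminators.
      {
         oDocument.unixMacToDos();
         bChanged = true;
      }
      switch (oDocument.codePage)
      {
         case 1200:  // UTF-16 little endian
         case 1201:  // UTF-16 big endian
            oDocument.unicodeToASCII();
            bChanged = true;
            break;
         case 65001: // UTF-8
            oDocument.UTF8ToASCII();
            bChanged = true;
            break;
      }
      if (bChanged)
      {
         // The save commands work on the active document.
         oDocument.setActive();
         if (oDocument.isName("") && oDocument.isExt(""))
         {
            oUltraEdit.saveAs("");  // New, unnamed file: open Save As dialog.
         }
         else
         {
            oUltraEdit.save();
         }
         nConverted++;
      }
   }
   return nConverted;
}

// Inside UltraEdit the function would be called with the UltraEdit object.
if (typeof UltraEdit !== "undefined")
{
   convertAllOpenDocuments(UltraEdit);
}
```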
      Best regards from an UC/UE/UES for Windows user from Austria


        21:54 - Feb 17#3

        Thanks for the in-depth reply. The script is altering the file from U8-Unix to U8-DOS, as per the bottom bar. However, as I don't want Unicode, I'm looking for it to say DOS in that location on the bottom bar as it would when manually selecting the options with Save As (I think). Is there something more I have to do or do you think it's likely just my various older versions of UE? (It indicates that it ran successfully.)


          11:07 - Feb 18#4

          Which versions of UltraEdit for Windows are used by you?

          I can look at how older versions of UltraEdit process the script only if I know their versions. The scripting command UTF8ToASCII was introduced with UltraEdit for Windows v17.30 and UEStudio v11.20. The script definitely does not work as posted with older versions of UltraEdit for Windows. HTML5 character set declarations and XML encoding declarations in single quotes are supported since UltraEdit for Windows v24.00.0.42 and UEStudio v17.00.0.15. UTF-8 encoded files are no longer converted to UTF-16 LE on opening since UltraEdit for Windows v25.00 and UEStudio v18.00, which also makes an important difference for the script.

          The script is designed to convert not only the line termination type from Unix/Mac to DOS/Windows but also the character encoding from UTF-8 and UTF-16 LE (and UTF-16 BE) to ASCII/ANSI. That works with UltraEdit for Windows v2023.2.0.33, as can be seen from the files in the directories Input and Output in the attached ZIP file, which contains the files of my test set.
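
          To see what an older version reports to the script, the values of the properties used by the script could be written to the output window before and after the conversion. This is only a small diagnostic sketch. The function name describeState is hypothetical, and the numbers are written as returned without any interpretation.

```javascript
// Build a one-line description of the document state as seen by a script.
function describeState(oDocument)
{
   return oDocument.path + ": code page " + oDocument.codePage +
          ", line terminator " + oDocument.lineTerminator;
}

// Inside UltraEdit write the state of the active document to the output window.
if (typeof UltraEdit !== "undefined" && UltraEdit.document.length > 0)
{
   UltraEdit.outputWindow.write(describeState(UltraEdit.activeDocument));
}
```

          If the code page written for a UTF-8 encoded file is not 65001, the switch statement of the script falls through to the default branch and no character encoding conversion is done.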

          The issue I reported with the wrong code page number for UTF-16 big endian encoded files was confirmed as a bug in UltraEdit by UltraEdit support.
          conversion_test_set.zip (52.43 KiB)
          Set of ANSI, UTF-8 and UTF-16 encoded files for testing the conversion script and the UE/UES encoding detection capabilities