Convert all files in a folder from UTF-8 to UTF-16


    Jun 28, 2010#1

    Hi all,
    I'm after some advice from you clever people out there :)

    I want to convert approximately 100 XML documents from UTF-8 to UTF-16 so that an application can load them.

    I have updated a file manually using the steps below but I'm unsure if this is correct.

    I have updated the encoding in the text of the file as below

    From
    encoding="UTF-8"

    to
    encoding="UTF-16"

    Once this was done, I used the "Save As" function, selected UTF-16 as the file format, and saved the document.

    Is this all I need to do?
    How do I know it is UTF-16 and the conversion is successful?
    How do I process this over a folder?

    Happy to upgrade the version of UltraEdit if required.

    Thank you in advance

    Kind Regards

    Neil


    UltraEdit 12.20b+1


      Jun 28, 2010#2

      neill80 wrote: Is this all I need to do?
      Yes, this is one method. The other is to change the encoding information in the file, then use File - Conversions - UTF-8 to Unicode and save the file with the default format. But the method you described is the faster one here.
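      neill80 wrote: How do I know it is UTF-16 and the conversion is successful?
      The file size roughly doubles for mostly ASCII content, because UTF-16 uses 2 bytes for most characters (4 bytes for characters outside the Basic Multilingual Plane), while UTF-8 uses 1 byte for ASCII characters and 2 to 4 bytes only for non-ASCII characters. You can also look into the file with a hex viewer: a UTF-16 LE file saved with a BOM starts with the bytes FF FE. However, you can trust UE to do the conversion right.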
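If you prefer a scripted check over a hex viewer, the leading bytes of a file can be inspected for a byte order mark. The following is only a minimal sketch in Python, not something UltraEdit provides. The function name `detect_bom` is made up for illustration, and a file saved without a BOM simply yields `None`, so this check says nothing about BOM-less files.

```python
import codecs
from pathlib import Path

def detect_bom(path):
    """Return a label for the byte order mark at the start of the file, or None.

    Hypothetical helper for illustration; it only looks at the first bytes.
    """
    head = Path(path).read_bytes()[:4]
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    # The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE),
    # so UTF-32 has to be tested before UTF-16.
    if head.startswith(codecs.BOM_UTF32_LE) or head.startswith(codecs.BOM_UTF32_BE):
        return "UTF-32"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 LE"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 BE"
    return None
```

For an XML file the first character after the BOM is `<`, so the UTF-32 test cannot accidentally match a UTF-16 file here.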
      neill80 wrote: How do I process this over a folder?
      That is a problem because there are no macro commands (up to UE v17.20) to convert from ANSI/UTF-16 to UTF-8 and vice versa. (The commands ASCIIToUTF8 and UTF8ToASCII were introduced with UE v17.30.) But there is a workaround: you can simply copy the entire content of a UTF-8 file into a new Unicode (UTF-16) file and save the new file with the same name as the UTF-8 file (= overwrite the UTF-8 file). This works because UTF-8 files are always loaded with a temporary conversion to UTF-16 for editing.

      The following macro should work with your version of UltraEdit. You have to adapt the directory "C:\Temp\" and perhaps also the file type "*.*". If you are not using the English version of UltraEdit, you also have to adapt the final results line string "Search complete, found ". The macro property Continue if a Find with Replace not found should be checked for this macro.

      Please note: I have tested this macro neither with the current latest version nor with your version of UltraEdit. So please run it on a copy of the UTF-8 files that should be converted to UTF-16.

      FindInFiles "C:\Temp\" "*.*" ""
      Loop
      Find MatchCase Up "Search complete, found "
      IfFound
      ExitLoop
      Else
      NextWindow
      EndIf
      EndLoop
      DeleteLine
      SelectToBottom
      IfSel
      Delete
      EndIf
      Top
      UnicodeToASCII
      Loop
      IfEof
      ExitLoop
      EndIf
      StartSelect
      Key END
      Clipboard 8
      Copy
      EndSelect
      Key HOME
      Key DOWN ARROW
      Open "^c"
      Clipboard 9
      SelectAll
      Copy
      CloseFile NoSave
      NewFile
      ASCIIToUnicode
      Paste
      Top
      UnixReOff
      Find MatchCase "encoding="UTF-8""
      Replace "encoding="UTF-16""
      Clipboard 8
      SaveAs "^c"
      CloseFile NoSave
      EndLoop
      CloseFile NoSave
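For anyone who can run a script outside UltraEdit, the same folder-wide conversion can also be sketched in Python. This is an alternative approach, not a translation of the macro, and it rests on assumptions: the names `convert_text` and `convert_folder` and the default pattern `*.xml` are made up for illustration, every matching file is assumed to be valid UTF-8, and the files are overwritten in place, so it should be run on a copy of the folder.

```python
import re
from pathlib import Path

# Matches encoding="UTF-8" (or with single quotes) inside the XML declaration.
_DECL = re.compile(r'(<\?xml[^>]*?encoding\s*=\s*)(["\'])utf-8\2', re.IGNORECASE)

def convert_text(text):
    """Rewrite the encoding named in the XML declaration from UTF-8 to UTF-16."""
    return _DECL.sub(r'\1\2UTF-16\2', text, count=1)

def convert_folder(folder, pattern="*.xml"):
    """Re-encode every matching file in folder from UTF-8 to UTF-16, in place.

    Returns the names of the converted files.
    """
    converted = []
    for path in sorted(Path(folder).glob(pattern)):
        # "utf-8-sig" drops a UTF-8 BOM if one is present;
        # newline="" leaves the existing line endings untouched.
        with open(path, encoding="utf-8-sig", newline="") as f:
            text = f.read()
        # The "utf-16" codec writes a BOM in the machine's native byte order.
        with open(path, "w", encoding="utf-16", newline="") as f:
            f.write(convert_text(text))
        converted.append(path.name)
    return converted
```

A call such as `convert_folder(r"C:\Temp")` would then process the directory used in the macro. Files that do not declare `encoding="UTF-8"` are still re-encoded, only their declaration is left unchanged.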

      You may also want to test first whether your application likes a UTF-16 BOM at the top of the file or not. XML files encoded with UTF-8 normally have no UTF-8 BOM (Byte Order Mark - not displayed in the editor) at the top of the file. UTF-16 files normally have a BOM, and therefore UltraEdit by default always saves new UTF-16 files with a BOM. The BOM can be easily removed with a simple Replace In Files if your application does not like the UTF-16 BOM.
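To make the BOM question concrete, here is a small Python sketch showing what the two variants look like at the byte level. It is only an illustration under assumptions: the helper names `encode_utf16` and `strip_utf16_bom` are invented here, and little-endian byte order is assumed because that is the usual choice on Windows.

```python
import codecs

def encode_utf16(text, with_bom=True):
    """Encode text as little-endian UTF-16, optionally prefixed with a BOM."""
    data = text.encode("utf-16-le")  # this codec never adds a BOM itself
    return codecs.BOM_UTF16_LE + data if with_bom else data

def strip_utf16_bom(data):
    """Remove a leading UTF-16 BOM (either byte order) from raw bytes, if present."""
    for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            return data[len(bom):]
    return data
```

With a BOM the output starts with the two bytes FF FE; without it the first bytes are already the encoded `<` of the XML declaration. Which of the two the target application accepts has to be found out by testing.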


        Jun 28, 2010#3

        Hi Mofi,

        Thank you very much for your post, it's much appreciated. I will try the macro and let you know.

        Kind Regards

        Neil