File change while opening an XML file with UltraEdit


Basic User

    Feb 23, 2006#1

    Hello,

    if I open an XML file with UltraEdit (version 11.10), the file size immediately doubles. Comparing the file opened with UltraEdit against a backup version using the compare tool of Total Commander, I see in the changed file a special character, displayed as a rectangle, between each pair of "usual" characters. So far I couldn't identify which character (which ASCII number) this is. If I afterwards try to open the XML file modified by UltraEdit with another tool, e.g. MS Word 2003, I get a message like "File cannot be opened, because it caused a problem".
    1. What really happens when an XML file is opened with UltraEdit, and why?
    2. What kind of special character is inserted between the other characters?
    3. How can this behavior be switched off?
    4. How can the XML file, already modified by UltraEdit, be reset to its previous content, i.e. how can the special characters be removed again?
    Thanks for all good hints.

    Thomas

    Master

      Feb 24, 2006#2

      Hi Thomas,

      UE converts all UTF-8 (8-bit Unicode) files into UTF-16 (16-bit Unicode) while editing them and converts them back to UTF-8 when saving. You might want to check the topic "inflated file size with utf-8 encoding" for an expert's (Mofi's) take on this.

      The problem with re-opening the file might have to do with BOM settings, I guess.
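
As a rough illustration of the 8-bit to 16-bit conversion (a Python sketch, not UltraEdit code; the sample XML content is made up), the following shows why an ASCII-only file doubles in size when stored as UTF-16, and that every second byte is then 0x00. That would be consistent with the rectangles seen between the "usual" characters in a byte-wise comparison.

```python
# Hypothetical sample content; any ASCII-only XML text behaves the same way.
xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n<note>Hello</note>\n'

utf8_bytes = xml_text.encode("utf-8")       # one byte per ASCII character
utf16_bytes = xml_text.encode("utf-16-le")  # two bytes per character, no BOM

# For ASCII characters the second byte of each UTF-16 LE pair is 0x00.
nul_count = utf16_bytes.count(0)

print("UTF-8 size: ", len(utf8_bytes))
print("UTF-16 size:", len(utf16_bytes))
print("NUL bytes:  ", nul_count)
print("First bytes:", utf16_bytes[:10])

# Converting back restores the original bytes exactly.
restored = utf16_bytes.decode("utf-16-le").encode("utf-8")
```

A tool that expects an 8-bit file and gets the 16-bit form without any marker (BOM) has no reliable way to know how to read it, which may explain the error in the other program.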

      HTH,
      Tim

      Basic User

        Feb 27, 2006#3

        Yes, the link described my problem (and its explanation by Mofi). Okay, with File/Convert/UTF-8 to ASCII I could reconvert the file to its original content.

        But:
        Is there a general configuration setting in UltraEdit that prevents this kind of file inflation up front for XML files with the attribute encoding="UTF-8"? If so, how?

        Thomas

        Grand Master

          Feb 27, 2006#4

          Thomas, be careful with the conversion to ASCII if you don't also change the encoding info of the XML file. If the XML file explicitly declares that it is encoded in UTF-8, you must save it in UTF-8 or you will get bad characters.
          For example, the German umlauts are encoded with 2 bytes in UTF-8, but with a single byte in ASCII/ANSI mode. And a single-byte German umlaut is only displayed correctly if the user who opens the XML file has the code page for Western European languages active.
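
The umlaut example can be checked with a few lines of Python (just a sketch for illustration; cp1252 is assumed here as the Western European code page and cp1251 as an example of a different one):

```python
umlaut = "ä"

utf8 = umlaut.encode("utf-8")    # two bytes: C3 A4
ansi = umlaut.encode("cp1252")   # one byte:  E4 (Western European code page)

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(utf8.hex(), ansi.hex())
# The single ANSI byte is not valid UTF-8, so a parser that trusts
# encoding="UTF-8" in the XML declaration will reject or garble it.
print(decodes_as_utf8(ansi))
# With another code page active, the same byte is a different character.
print(ansi.decode("cp1251"))
```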


          For all UltraEdit/UEStudio users out there in the world who edit files in a Unicode format without really knowing what Unicode is, please do the following:

          First, read the FAQ about UTF-8, UTF-16, UTF-32 & BOM to get the basic knowledge you need.

          Second, in UltraEdit or UEStudio, open Configuration - File Handling and set the following options:


          Conversions

          Uncheck the 2 EBCDIC options if you are not editing EBCDIC files, but check the option On Paste convert line ending to destination type (UNIX/MAC/DOS).


          DOS/UNIX/MAC Handling

          Set the Default file type for new files to whatever you prefer.

          Set the Unix/Mac file detection/conversion to Automatically convert to DOS format to avoid problems with copy and paste with other Windows applications.

          Uncheck Only recognize DOS terminated lines (CR/LF) as new lines for editing.


          Save

          Uncheck Write UTF-8 BOM header to ALL UTF-8 files when saved.

          Whether Write UTF-8 BOM on new files created within this program (if above is not set) should be enabled depends on the type of Unicode files you are creating. If you create, for example, only XML and HTML type files (HTML, PHP, ASP, ...) in UTF-8, you should uncheck this option, because then the encoding should be defined inside the file with encoding="UTF-8" (XML) or with content="text/html; charset=utf-8" (HTML). See the FAQ above for details about the BOM and when it should be used.
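
To make the BOM less abstract: it is simply the 3 bytes EF BB BF at the very start of a UTF-8 file. The following Python sketch (for illustration only, with a made-up sample XML) shows the difference between saving with and without it, using Python's "utf-8-sig" codec to write the BOM:

```python
import codecs

# Hypothetical sample file content.
text = '<?xml version="1.0" encoding="UTF-8"?>\n<root/>\n'

without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")   # same bytes, prefixed with the BOM

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

print(codecs.BOM_UTF8.hex())                 # the 3 BOM bytes
print(has_utf8_bom(without_bom), has_utf8_bom(with_bom))
print(len(with_bom) - len(without_bom))      # size difference in bytes

# A reader that decodes as plain UTF-8 keeps the BOM as an invisible
# character U+FEFF in front of the XML declaration.
starts_with_feff = with_bom.decode("utf-8").startswith("\ufeff")
```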

          Enable Save file as input format (UNIX/MAC/DOS). That's important because we convert every file automatically to DOS for editing, but we want to save it in the original format and not in DOS format. This option was moved from the Save to the DOS/UNIX/MAC Handling configuration dialog in v12.10 of UltraEdit!
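
The idea behind these two settings (convert to DOS for editing, save in the input format) can be sketched in Python as follows. This is a simplified illustration and not how UltraEdit implements it; the function names are made up and files with mixed line endings are not handled.

```python
ENDINGS = {"DOS": b"\r\n", "UNIX": b"\n", "MAC": b"\r"}

def detect_line_ending(data: bytes, default: str = "DOS") -> str:
    """Guess the line ending type from the first kind of terminator found."""
    if b"\r\n" in data:
        return "DOS"
    if b"\n" in data:
        return "UNIX"
    if b"\r" in data:
        return "MAC"
    return default

def to_dos(data: bytes) -> bytes:
    """Normalize all line endings to CR/LF for editing."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")

def from_dos(data: bytes, kind: str) -> bytes:
    """Convert CR/LF line endings back to the original type on save."""
    return data.replace(b"\r\n", ENDINGS[kind])

original = b"<a>\n<b/>\n</a>\n"          # a UNIX file
kind = detect_line_ending(original)      # remembered when the file is opened
editing = to_dos(original)               # what is edited in memory
saved = from_dos(editing, kind)          # what is written back to disk
print(kind, editing, saved == original)
```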

          You can set the option Trim trailing spaces on file save to whatever you prefer. Normally it is good to activate it because it can reduce the file size a little, which is useful for HTML files.

          Temporary Files

          Use the second option Open file without temp file but prompt for each file and set the Threshold, for example, to 4096 (4 MB). You can set the threshold to a higher value if your computer is fast enough, your hard disk is fast and you often edit large files.


          Unicode/UTF-8 Detection

          Enable Auto detect UTF-8 files, Detect Unicode(UTF-16) files without BOM and Detect ASCII/ANSI files with Escaped Unicode. You can disable for example the UTF-16 detection if you are sure that you will never edit a UTF-16 file. Every enabled detection increases the file load time of normal ASCII files. But if you don't know what format your files have, it is better to let UE/UES automatically detect it.
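
To give an idea of what such a detection has to do (and why it costs load time), here is a rough Python sketch. It is only a heuristic written for illustration and is not the algorithm used by UE/UES; the function name, the 4096 byte sample size and the 30% NUL threshold are assumptions, and escaped Unicode is not covered.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Very rough guess of a file's encoding from its raw bytes."""
    # 1. A BOM at the start of the file is the cheapest and safest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16 with BOM"
    # 2. UTF-16 without BOM: mostly Latin text contains many NUL bytes.
    sample = data[:4096]
    if len(sample) >= 2 and sample.count(0) / len(sample) > 0.3:
        return "UTF-16 without BOM (guess)"
    # 3. UTF-8 without BOM: the whole file must be scanned for validity.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ANSI"
    if all(byte < 0x80 for byte in data):
        return "ASCII"
    return "UTF-8 without BOM"

print(guess_encoding("<a>ä</a>".encode("utf-8")))
print(guess_encoding("<a>ä</a>".encode("cp1252")))
print(guess_encoding("<a/>".encode("utf-16-le")))
```

Step 3 is the expensive part: a plain ASCII file gives no early hint, so it has to be read to the end before the check can finish.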

          The 3rd option Disable automatic detection of HEX file format on reload is not important for handling Unicode files.
          Best regards from an UC/UE/UES for Windows user from Austria

          Master

            Feb 28, 2006#5

            Thanks Mofi, that's a big help!

            Cheers,
            Tim