Byte Order Marker (BOM) query?

    Jul 27, 2004#1

    We've been having problems with UltraEdit saving out a BOM for UTF-8/Unicode files repeatedly even when the file is already marked.

    Any ideas how to avoid this? It's causing some white space to be displayed in IE on the webpages I am developing.

    matt

      Sep 17, 2004#2

      You can find the answer in the UltraEdit help, in the section INI File Selection and Advanced Settings:

      Write UTF-8 BOM = 1
      This setting causes the editor to write out the Byte Order Mark (BOM) header in a file when it is saved. If this is not set, it will not write out the BOM unless the file contained it when it was loaded into the editor. If so, the BOM will be written to the file irrespective of the setting. The BOM is an industry standard indicating the contents of the file for various UNICODE formats.

      Write UTF-8 BOM NF = 1
      This setting causes the editor to write out the Byte Order Mark (BOM) header in a file when it is saved if the file is a new file created within UltraEdit. If the Write UTF-8 BOM setting above is set, then the BOM will always be written and this is ignored. Otherwise, the BOM will only be written out for new files if this is set. The BOM is an industry standard indicating the contents of the file for various UNICODE formats.
      Best regards from an UC/UE/UES for Windows user from Austria
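
The interaction of the two settings quoted above can be sketched in a few lines of Python. This is only a model of the behavior described in the help text, not UltraEdit's actual code; the function name and its parameters are hypothetical.

```python
def should_write_bom(write_bom: bool, write_bom_nf: bool,
                     had_bom_on_load: bool, is_new_file: bool) -> bool:
    """Return True if, per the quoted help text, a BOM is written on save.

    write_bom       -- models "Write UTF-8 BOM = 1"
    write_bom_nf    -- models "Write UTF-8 BOM NF = 1"
    had_bom_on_load -- the file already contained a BOM when it was opened
    is_new_file     -- the file was created inside the editor
    """
    if write_bom:
        # The main setting always wins; the NF setting is ignored.
        return True
    if had_bom_on_load:
        # An existing BOM is preserved irrespective of the settings.
        return True
    # Otherwise only new files get a BOM, and only if NF is set.
    return is_new_file and write_bom_nf
```

With both settings off, a file that was loaded with a BOM still gets one on save, which is why turning off the settings alone does not strip an existing BOM.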

        Sep 17, 2004#3

        Thanks.

        I was using an older version of UE where these options were not in the prefs dialog.

        matt

          Sep 17, 2004#4

          The setting Write UTF-8 BOM has been available since v10.10a and can be changed in the configuration dialog since some release of v10.20; I don't know exactly which one. It is definitely available in the configuration dialog in v10.20c.

            Feb 07, 2007#5

            Is it possible to use UltraEdit to remove a BOM that is already present in a file? If I toggle to Hex View and cut the 2 bytes (FF FE), it also eats the first character of my text -- not in the hex editor, but in normal mode. Subsequently opening the file in Word or EditPlus 2 confirms that the character is gone. I am using v10.20 and have deactivated both "write BOM header" fields in Advanced > Config > General, as Mofi suggested above.
            Why is this important? The problem is that we are entering the generated files (tables of contents) into a PostgreSQL database, and the first line identifies a field in a table -- a book ID. The subsequent lines are themselves data (each is a line of a table of contents, which allows us to create a new table -- sections -- linked to the first table). Obviously the presence of the BOM is a showstopper, because the link can't be made with those extra two bytes. (I hope this is relatively clear. I am not a computer scientist; this is what I understand of the process.)
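
To illustrate why a leading BOM breaks this kind of lookup, here is a minimal Python sketch. The file contents and the book ID `12345` are made up for the example; the real import process is not described in this thread.

```python
import codecs

# Hypothetical file: a UTF-8 BOM, then the book ID line, then a data line.
raw = codecs.BOM_UTF8 + b"12345\nChapter 1\n"

# Decoding as plain UTF-8 keeps the BOM as an invisible U+FEFF character.
naive_id = raw.decode("utf-8").splitlines()[0]

# The "utf-8-sig" codec drops a leading BOM if one is present.
clean_id = raw.decode("utf-8-sig").splitlines()[0]

print(naive_id == "12345")  # False: the ID is really "\ufeff12345"
print(clean_id == "12345")  # True
```

The two strings look identical on screen, which is why the mismatch is easy to miss.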

            My problem is simply to find a way, preferably using UltraEdit rather than EMACS, to get rid of the BOM. Mofi, your reply above seems to indicate that this is not possible?

            Thanks for any suggestions you might have.

              Feb 07, 2007#6

              It is very easy to remove the BOM from files. You only have to know which BOM you need to remove: UTF-16 LE, UTF-16 BE or UTF-8. I guess it will be UTF-8; see the second box in the status bar at the bottom of the UltraEdit window. U8-DOS indicates a UTF-8 file with DOS line terminations.
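
If you want to check which BOM a file carries without opening it in an editor, a small Python sketch like the following would do. The function name `detect_bom` is hypothetical; the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# BOM signatures for the three encodings discussed in this thread.
# Note: UTF-32 is not handled here; a UTF-32 LE BOM (FF FE 00 00)
# would be reported as UTF-16 LE by this simple check.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF
]

def detect_bom(data: bytes):
    """Return the name of the BOM at the start of data, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

For a file on disk, pass the first few bytes, for example `detect_bom(open(path, "rb").read(4))`.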

              When you open a Unicode file, UltraEdit always converts it to UTF-16 LE for editing, independent of the original encoding. So after switching to hex mode you will always see the UTF-16 LE BOM (FF FE). You cannot delete it, because UltraEdit needs it for further editing.
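
The effect of that conversion can be illustrated in Python. This is only a sketch of the behavior described above, not UltraEdit's internal implementation; the function name and the sample text `ID` are made up.

```python
import codecs

def as_utf16_le_buffer(file_bytes: bytes) -> bytes:
    """Re-encode UTF-8 file bytes as UTF-16 LE with a leading BOM."""
    # "utf-8-sig" drops a leading UTF-8 BOM if one is present.
    text = file_bytes.decode("utf-8-sig")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

on_disk = codecs.BOM_UTF8 + "ID".encode("utf-8")
in_editor = as_utf16_le_buffer(on_disk)

print(on_disk.hex(" "))    # ef bb bf 49 44
print(in_editor.hex(" "))  # ff fe 49 00 44 00
```

The FF FE seen in hex mode therefore belongs to the editing buffer, not to the UTF-8 file on disk, whose BOM is the 3 bytes EF BB BF.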

              To delete the BOM from UTF-8 files, you have to remove it without opening the files, or open the files with the configuration setting Auto detect UTF-8 files unchecked, so that the UTF-8 files are loaded as ASCII/ANSI files.

              I suggest doing it without opening the files by using Replace In Files. Search for ï»¿ and replace it with nothing. These 3 strange characters are how the UTF-8 BOM is displayed when it is interpreted as ANSI. ÿþ is the ANSI equivalent of the UTF-16 LE BOM and þÿ that of the UTF-16 BE BOM.
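
Outside of UltraEdit, the same byte-level removal can be sketched in Python. The helper names below are hypothetical, and the sketch assumes the files are small enough to read into memory. It also strips a BOM that was written more than once, as in the first post of this thread.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove every leading UTF-8 BOM (EF BB BF), even repeated ones."""
    while data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_in_file(path: Path) -> bool:
    """Rewrite the file without a leading BOM; return True if it changed."""
    original = path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        path.write_bytes(cleaned)
        return True
    return False

# The BOM bytes interpreted as Windows-1252 ("ANSI") text give the
# search strings mentioned above.
utf8_bom_as_ansi = codecs.BOM_UTF8.decode("cp1252")         # 'ï»¿'
utf16_le_bom_as_ansi = codecs.BOM_UTF16_LE.decode("cp1252")  # 'ÿþ'
utf16_be_bom_as_ansi = codecs.BOM_UTF16_BE.decode("cp1252")  # 'þÿ'
```

To process a whole folder, you could call `strip_bom_in_file(p)` for each `p` in `Path("some_folder").glob("*.txt")`; the folder name and file pattern are placeholders.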

                Feb 07, 2007#7

                Mofi -- Thank you very much for your message; you have really helped us out a lot. I was able to remove the BOM from the entire set of UltraEdit-generated files. I appreciate you explaining the bit about UTF-16 as well!

                Greetings from Lyon!