Which code page is for Gmail Google Takeout?

Advanced User

    Jul 15, 2023#1

    I did the Google Takeout.
    It generated an archive of all messages in a single file with an .mbox extension.
    This file must be opened with a Thunderbird type email client.

    If I open this file with a text editor, I find the accented letters in an unknown code page.
    As an example, I give the correspondences below:

    =C3=B3 corresponds to 'ó'
    =C3=AA corresponds to 'ê'
    After a quick search, it says that it's UTF-8.
    Of course, it's not.
    UTF-8 is:

    ó for 'ó'
    ê for 'ê'

    1. What code page is this?
    2. How can I configure the text editor, in this case UltraEdit version, so that the codes are displayed as normal accented letters?
    I am not talking about replacing characters, but about changing how they are displayed without editing the file.

    Grand Master

      Jul 16, 2023#2

      Please read the Wikipedia article about email. Originally, emails could be sent only with 7-bit ASCII characters, although many programs involved in the transfer of an email are now also 8-bit clean. There was originally no support for letters other than A-Z and a-z, for other characters with a code point value greater than 127 decimal in the Unicode table, or for binary data such as attached files in the body of an email.

      It is of course possible nowadays to send an email with text containing non-ASCII characters as well as attached files with binary data. Email programs do this by encoding the data with one of various encodings, as can be seen when viewing a mail box file with a text editor like UltraEdit. It is highly recommended to use the UltraEdit file open option Encoding with ASCII (ansi code page auto detection) selected. This avoids the file being wrongly opened with a different text encoding just because a string in the mail box file indicates, for example, a UTF-8 encoding (of an embedded HTML file).

      A mail box file with multiple emails inside usually contains Content-Transfer-Encoding: several times. Email programs use this header to tell another email program how the following part of an email is encoded when it contains characters or bytes that are not 7-bit ASCII characters (or 8-bit clean).
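
      As a rough illustration of how a program reads these headers, here is a minimal Python sketch using the standard `email` module. The sample message and the helper name `describe_parts` are made up for this example; a real message from an mbox file would be parsed the same way.

```python
from email import message_from_string

# Hypothetical sample message, not taken from a real mailbox.
RAW_MESSAGE = (
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=C3=B3 =C3=AA\n"
)


def describe_parts(raw):
    """Return (content type, charset, transfer encoding) for each body part."""
    message = message_from_string(raw)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # multipart containers have no body of their own
        parts.append((
            part.get_content_type(),
            part.get_content_charset(),
            # A part without this header is treated as 7bit.
            part.get("Content-Transfer-Encoding", "7bit"),
        ))
    return parts


print(describe_parts(RAW_MESSAGE))
```

      Each tuple names the charset and the transfer encoding that are needed to turn the stored bytes of that part back into readable text.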

      Encodings such as base64 and quoted-printable can be used.

      Mail box files are only for email programs. The mbox storage file format is not designed for viewing or even editing such files in text editors.

      UltraEdit supports encoding text with base64 and decoding base64 encoded text back to text. With a trick, UltraEdit can also decode a base64 encoded binary file attachment, as often stored inside a mail box file, back to the binary file contents and save the binary file.

      There is no built-in support for encoding text in quoted-printable form or for decoding quoted-printable data back to plain text. That can be done very easily with an UltraEdit script, but the script must first know how the characters of the original text are encoded, as stored in the header of the email. The script must know whether the original text is encoded in "ANSI" (one byte per character) with a code page like Windows-1252, in UTF-8, in UTF-16 Little Endian, or in another multi-byte character encoding.
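
      To show why the charset matters, here is a small sketch in Python rather than as an UltraEdit script. It uses the standard `quopri` module, and the function name `decode_quoted_printable` is only illustrative. The same quoted-printable bytes give the intended letters when interpreted as UTF-8 and different characters when interpreted as Windows-1252.

```python
import quopri


def decode_quoted_printable(data: bytes, charset: str) -> str:
    """Undo the quoted-printable transfer encoding, then decode the bytes as text."""
    raw_bytes = quopri.decodestring(data)  # b"=C3=B3" becomes b"\xc3\xb3"
    return raw_bytes.decode(charset)       # the charset comes from the Content-Type header


sample = b"=C3=B3 =C3=AA"
print(decode_quoted_printable(sample, "utf-8"))   # two bytes per letter: ó ê
print(decode_quoted_printable(sample, "cp1252"))  # one byte per character: Ã³ Ãª
```

      So =C3=B3 is the two UTF-8 bytes C3 B3 of 'ó' written in quoted-printable form, not a separate code page.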
      Best regards from an UC/UE/UES for Windows user from Austria

      Advanced User

        Jul 16, 2023#3

        Thank you, Mofi.

        Now I know about quoted-printable form and standards for Emails.

        At the header, guidelines are:

        X-Gmail-Labels: Sent
        MIME-Version: 1.0
        Content-Type: text/plain; charset="UTF-8"
        Content-Transfer-Encoding: quoted-printable

        So, I have two questions:
        1. What are the UltraEdit tricks for decoding a base64 encoded binary file attachment?
        2. Given a UTF-8 charset and a Content-Transfer-Encoding of quoted-printable, what would a script look like that converts the text and restores the extended characters?
        About question 1: I often use this site to convert base64 content.
        Can UltraEdit do the same using a trick?

        Grand Master

          Jul 16, 2023#4

          The steps for decoding base64 encoded data of a binary file with UltraEdit are:
          1. Open Advanced - Settings or Configuration - Editor - Hex mode and check the configuration setting Allow editing of text files with hex 00's without converting them to spaces.
          2. Create a new file and make sure the encoding is ANSI (Windows-1252 - Latin I) and not UTF-8 or UTF-16.
          3. Copy and paste the base64 encoded block of the binary file from the mbox file into the new file.
          4. Press Ctrl+A to select the pasted base64 encoded data.
          5. Run the command Base64 decode. In ribbon mode, click on tab Advanced in the second group Active file on the second command Base64 and choose the second item Base64 decode in the popup menu. In the contemporary menu, open Advanced, then the submenu Base64, and choose the second item Base64 decode. In the traditional menu, open Edit and choose Decode base64.
          6. Press Ctrl+H to switch to hex edit mode.
          7. Press F12 to open the Save as dialog window and save the new file with the binary data with the correct file name with option Encoding set to Default.
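
          For comparison, the same decoding can be sketched outside UltraEdit in a few lines of Python with the standard `base64` module. The function names and the sample block are hypothetical; the block stands in for the base64 lines copied from the mbox file.

```python
import base64
from pathlib import Path


def decode_base64_block(block: str) -> bytes:
    """Decode a base64 block copied from an mbox file, ignoring its line breaks."""
    return base64.b64decode("".join(block.split()))


def save_attachment(block: str, target: str) -> int:
    """Write the decoded bytes to a file unchanged and return the byte count."""
    data = decode_base64_block(block)
    Path(target).write_bytes(data)  # binary write, no text encoding is applied
    return len(data)


# Hypothetical two-line block that decodes to the bytes 00 01 02 03.
sample_block = "AAEC\nAw==\n"
print(decode_base64_block(sample_block))
```

          As in the UltraEdit steps, the important point is that the decoded bytes, including any 00 bytes, are saved as they are without a text conversion.
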

          Why do you not simply import the mbox file into an email program?

          The email program does not have to be Thunderbird. It can be any email program that supports importing an mbox file, which sometimes also has the file extension .mbs (mail box storage).
          Best regards from an UC/UE/UES for Windows user from Austria

          Advanced User

            Jul 16, 2023#5

            Thank you very much.

            Yes, I use Thunderbird to view such files.
            But I also like to handle all kinds of files with UltraEdit, and I like to know tricks such as the ones you described.