Bug in Base64 Encoding/Decoding for Unicode text

Basic User

    Apr 27, 2010#1

    There's a bug with the built-in Base64 encoding/decoding routines in UltraEdit 16.0.1038. I assume the bug also exists in previous versions, but I haven't tested it.

    The encoding/decoding routines work fine when performed on ASCII plain text. But if you try encoding some Unicode text, and then try to decode it, the bug becomes apparent.

    Grand Master

      Apr 27, 2010#2

      Can you explain the problem in more detail, with a step-by-step list of how to reproduce it? I can't reproduce it using UE v16.00.0.1038.

      For testing I took one of my ANSI HTML files (6468 bytes) encoded and decoded it - no problem. Next I converted the file to Unicode (UTF-16 LE) and saved it. I again encoded and decoded it - no problem. I added some characters which must be really encoded in Unicode with 2 bytes (German umlauts) and saved the Unicode file. I encoded and decoded it - no problem. Now I encoded it once again, copied the encoded string to a new ANSI file and decoded it - also correct result.

      Of course, encoding a Unicode text with characters not available in the active code page and decoding this text in an ANSI file results in wrong characters. But that is not a problem of the Base64 encoding/decoding routines. That is the general problem of Unicode to ANSI conversion with characters not available in the ANSI code page. The encoded data stream does not contain any information about the type of the input data stream. Email programs solve that problem by adding information about the original data (file information such as the name of the file, the content type, etc.) as plain text above the encoded data stream, so that the encoded data can be decoded correctly.
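
A minimal sketch of that point in Python (not UltraEdit's own code): the same Base64 string decodes to the original text only when the decoder is told which encoding produced the bytes. The sample word and the choice of Windows-1252 as the "ANSI" code page are assumptions made for illustration.

```python
import base64

text = "Grüße"  # illustrative sample with German umlauts

# Base64 works on bytes, so the text is first serialized as UTF-16 LE.
payload = base64.b64encode(text.encode("utf-16-le"))

# The Base64 stream carries no hint about the original encoding.
raw = base64.b64decode(payload)

as_utf16 = raw.decode("utf-16-le")  # correct guess
as_ansi = raw.decode("cp1252")      # wrong guess: bytes read as Windows-1252

print(as_utf16 == text)  # the right encoding restores the text
print(as_ansi == text)   # the wrong encoding does not
```

This is why MIME messages state the charset in a header next to the encoded body.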
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        Apr 27, 2010#3

        Mofi wrote: Can you explain the problem in more detail, with a step-by-step list of how to reproduce it? I can't reproduce it using UE v16.00.0.1038.
        1) Create a new file in UltraEdit. Then go to the File Menu ---> Conversions ---> ASCII to Unicode
        2) Type the word "hello" into the editor
        3) Highlight the word with your keyboard or mouse
        4) Edit Menu ---> Encode Base64
        5) Now highlight the encoded text
        6) Edit Menu ---> Decode Base64

        The decoded text will be exactly the same as what you started with. So as you can see, normal ASCII characters are encoded correctly.


        Now let's try encoding something with non-ASCII characters.

        1) Erase all the text in your editor window
        2) Copy and paste the word "привет" into your editor window
        3) Highlight the word with your keyboard or mouse
        4) Edit Menu ---> Encode Base64
        5) Now highlight the encoded text
        6) Edit Menu ---> Decode Base64

        As you can see, the decoded text is not what you started with, and therefore we can see that the problem is with the handling of Base64 encoding of non-ASCII text.

        Grand Master

          Apr 27, 2010#4

          Okay, with those details I could reproduce the problem using UE v16.00.0.1029. Please report this issue by email to IDM support.
          Best regards from an UC/UE/UES for Windows user from Austria

          Basic User

            Apr 29, 2010#5

            Done. And I received a response from IDM to let me know that they're able to reproduce the problem, and they're working on fixing it.

              Dec 11, 2010#6

              Seriously?
              After almost 8 months and several new version updates, this simple bug still hasn't been fixed.

              Grand Master

                Mar 13, 2011#7

                I looked into this issue with UE v17.00, and the word привет still does not come back after encoding and decoding it with Base64.

                The UTF-16 LE encoding of привет is, in hexadecimal, 3F 04 40 04 38 04 32 04 35 04 42 04. Omitting the high bytes with value 04, the remaining bytes read as the ASCII string ?@825B. Using Encode Base64 on the UTF-16 string привет results in P0A4MjVC. Using Encode Base64 on the ASCII string ?@825B also results in P0A4MjVC. In other words, only the low byte of each UTF-16 character is being encoded.
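
The observation above can be checked outside UltraEdit. The following is a minimal Python sketch using only the standard `base64` module; the variable names are illustrative. It shows that encoding just the low bytes gives the string UltraEdit produces, while encoding all twelve UTF-16 LE bytes gives a different, reversible result.

```python
import base64

word = "привет"
utf16 = word.encode("utf-16-le")  # 12 bytes: low byte, then high byte 0x04

# Keep only the low byte of each 16-bit code unit.
low_bytes = utf16[0::2]
print(low_bytes)  # b'?@825B'

truncated = base64.b64encode(low_bytes).decode("ascii")
correct = base64.b64encode(utf16).decode("ascii")

print(truncated)  # P0A4MjVC, the value reported for UltraEdit
print(correct)    # PwRABDgEMgQ1BEIE, Base64 of the full byte stream

# Only the full byte stream round-trips back to the original word.
print(base64.b64decode(correct).decode("utf-16-le") == word)
```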

                So I wanted to know more about Base64 encoding and read (not entirely) the Wikipedia article about Base64. As I understand this article, Base64 encoding is only for ASCII strings (single-byte strings), and UTF-7, which uses a variant called modified Base64, is needed for encoding UTF-16 characters.

                Now the question is, which Base64 encoding is implemented in UltraEdit at all, the standard Base64 or the modified Base64? It looks like standard Base64.

                Of course, standard Base64 encoding can be used for binary files. Therefore the UTF-16 characters could be read as a byte array, as in hex edit mode, and so encoding UTF-16 characters with standard Base64 is also possible. But why was UTF-7 encoding introduced if standard Base64 can also be used by reading the UTF-16 strings as a binary data stream?
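
One way to see the difference, sketched in Python with its built-in `utf-7` codec: UTF-7 is a text encoding for 7-bit transports that leaves plain ASCII readable and switches to a modified Base64 only for the characters that need it, whereas standard Base64 turns the whole byte stream into an opaque block. This is offered as an illustration of the two schemes, not as a statement about what UltraEdit implements.

```python
import base64

word = "привет"

# UTF-7: non-ASCII runs are wrapped in "+...-" using modified Base64.
utf7 = word.encode("utf-7")
print(utf7)

# Plain ASCII letters pass through UTF-7 unchanged and stay readable.
print("hello".encode("utf-7"))

# Standard Base64 of the UTF-16 bytes encodes everything, ASCII included.
print(base64.b64encode("hello".encode("utf-16-le")))

# Both schemes are reversible when applied to the full data.
print(utf7.decode("utf-7") == word)
```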

                I don't want to read all the RFCs to get an answer on that question because for my work with UltraEdit that is totally unimportant. But perhaps my post is an explanation why Base64 encoding is not working for UTF-16 characters in UltraEdit.

                Basic User

                  May 21, 2011#8

                  Mofi wrote: I don't want to read all the RFCs to get an answer on that question because for my work with UltraEdit that is totally unimportant. But perhaps my post is an explanation why Base64 encoding is not working for UTF-16 characters in UltraEdit.
                  You're over-complicating this issue. It has nothing to do with "standard" or "modified" Base64 encoding, and reading every RFC in the world will not help.
                  The bug in UltraEdit is the result of an improper pointer operation in the string handling routine. The bug is present not only for UTF-16 strings, but also when you specifically tell UltraEdit to produce UTF-8 (on the File menu).

                  I use Base64 encoding in some of my applications which process string values, and the original unmodified Base64 routines from 20 years ago work perfectly on both UTF-8 and UTF-16 strings. The key thing to making it work properly is to account for the correct character size/length (and therefore the correct pointer operation) when referencing them. When I started encoding Unicode text in my programs a couple of years ago, I had the exact same problem which is currently in UltraEdit. I fixed it without reading any RFCs. The only thing I did was some good old-fashioned debugging and some trial-and-error.
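
A minimal sketch of that idea in Python, assuming nothing about UltraEdit's internals: the text is converted to bytes in an explicit encoding first, and Base64 is applied to the whole byte sequence, so the byte length rather than the character count determines how much is encoded. The helper names `encode_text` and `decode_text` are hypothetical.

```python
import base64


def encode_text(text: str, encoding: str) -> str:
    """Serialize text to bytes in the given encoding, then Base64 the bytes."""
    data = text.encode(encoding)
    return base64.b64encode(data).decode("ascii")


def decode_text(b64: str, encoding: str) -> str:
    """Reverse encode_text; the caller must supply the same encoding."""
    return base64.b64decode(b64).decode(encoding)


word = "привет"
for enc in ("utf-8", "utf-16-le"):
    n_chars = len(word)              # 6 characters
    n_bytes = len(word.encode(enc))  # 12 bytes in both encodings
    restored = decode_text(encode_text(word, enc), enc)
    print(enc, n_chars, n_bytes, restored == word)
```

Treating the character count as the byte count, or narrowing each character to one byte, would lose half of the data for this word.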

                  I had hoped it would be fixed in UltraEdit by now, but I guess the programmers are too busy doing much more important things like replacing icons and implementing a new serial number registration system. :roll:

                  2

                    Nov 22, 2013#9

                    I'd like to include PNG files in an HTML file according to the data URI scheme. To do this, I need to convert binary PNG files to Base64 strings. This is how I try to do this (using UltraEdit 20 on WinXP SP3 32-bit):
                    1. Open the file reddot.png with UE.
                    2. Switch from hex mode to text mode.
                    3. Mark the text, use encode to base64.

                    It should yield:
                    iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAGUlEQVQY02NgAIL/QATDDOgCcAniBbGZCQCVRCnXEyWCYwAAAABJRU5ErkJggg==

                    While it is:
                    iVBORw0KGgogICANSUhEUiAgIAUgICAFCAYgICCNbyblICAgHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBCAO9TXL0Y4OHyAgICBJRU5ErkJggg==

                    Can anyone help me with the process of correctly encoding to base64? Or is the bug still unfixed?

                    Thanks in advance
                    Mike
                    reddot.png (85 bytes)
                    Example file for the above base64 code

                    Grand Master

                      Nov 22, 2013#10

                      The Encode Base64 and Decode Base64 features in UltraEdit are designed for text data, not for binary data. So you cannot do this encoding with UltraEdit. When a binary file is switched from hex edit mode to text mode, all bytes with value 0 are replaced by a space character, and bytes with decimal value 10 (line feed) and 13 (carriage return) may also be modified so that the data displays correctly as text.
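
The effect of the NUL replacement can be reproduced with a short Python sketch. It uses the first 16 bytes that every PNG file starts with (signature, chunk length 13, chunk type IHDR) and assumes, as described above, that each byte with value 0 becomes a space (0x20).

```python
import base64

# 8-byte PNG signature + 4-byte chunk length (13) + chunk type "IHDR".
png_head = b"\x89PNG\r\n\x1a\n" + b"\x00\x00\x00\x0d" + b"IHDR"

# Assumption: text mode turns every NUL byte into a space.
as_text_mode = png_head.replace(b"\x00", b" ")

expected = base64.b64encode(png_head).decode("ascii")
observed = base64.b64encode(as_text_mode).decode("ascii")

print(expected)  # iVBORw0KGgoAAAANSUhEUg==
print(observed)  # iVBORw0KGgogICANSUhEUg==
```

The "AAAA" runs in the expected string and the "ICAg" runs in the string UltraEdit produced are Base64 for three NUL bytes and three spaces respectively.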

                      You could write an UltraEdit script which takes the binary data of a file opened in hex edit mode and writes to a new file or to the clipboard the base64 encoded string. But it is easier to do this with other tools designed for creating text files with base64 encoded strings from binary input files. A lot of such tools exist, use a WWW search engine to find one of them.
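
As one example of such a tool, here is a minimal Python sketch that reads a file as raw bytes and builds a data URI from it. The function name `png_to_data_uri` is hypothetical, and the MIME type is fixed to image/png for this use case.

```python
import base64
from pathlib import Path


def png_to_data_uri(path: str) -> str:
    """Read a PNG file as raw bytes and return it as a Base64 data URI."""
    raw = Path(path).read_bytes()  # binary read: no byte is altered
    b64 = base64.b64encode(raw).decode("ascii")
    return "data:image/png;base64," + b64


# Example use (file name taken from the post above):
#   uri = png_to_data_uri("reddot.png")
#   print(f'<img src="{uri}" alt="Red dot">')
```

Because the file is read in binary mode, NUL bytes and line-ending bytes reach the encoder unchanged.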
                      Best regards from an UC/UE/UES for Windows user from Austria

                      2

                        Nov 22, 2013#11

                        Aw, shucks! Thanks for clarifying. I had assumed that inlining binary files into text-like files such as HTML was the major application of Base64 conversion, so UltraEdit could surely do this and I was simply missing a crucial step.

                        Thanks again
                        Mike