Notes on a Base64-Related Bug

Keywords: Programming, Java, encoding, codec, ASCII

I originally planned to write this post in two parts. The first part records a recent bug related to Base64. The second part explains in detail how Base64 encoding works. Halfway through, I realized it was getting far too long for something that isn't complicated, and a long post is hard to read and digest (honestly, I also want some time off today). So the detailed explanation of Base64 will come in the next article. Stay tuned.

0x01 The Symptom

Service A exposes an interface to service B. The interface contract says that parameters are Base64-encoded before they are passed.

However, when A Base64-decodes the parameters passed by B, decoding fails with:

Illegal base64 character a

0x02 Cause Analysis

After some searching, we found that many developers have fallen into this trap. In short, there are several implementations of Base64 encoders and decoders, and some of them are not compatible with each other.

For example, the failure described above can be reproduced exactly with the following code:

package org.mazhuang.base64test;

import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.util.Base64Utils;
import sun.misc.BASE64Encoder;

@SpringBootApplication
public class Base64testApplication implements CommandLineRunner {
    @Override
    public void run(String... args) throws Exception {
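        byte[] content = "It takes a strong man to save himself, and a great man to save another.".getBytes();
        // Encode with the JDK-internal sun.misc encoder (JDK 8 only; removed in JDK 9)
        String encoded = new BASE64Encoder().encode(content);
        // Decode with Spring's Base64Utils, which delegates to java.util.Base64 (see the stack trace below)
        byte[] decoded = Base64Utils.decodeFromString(encoded);
        System.out.println(new String(decoded));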
    }

    public static void main(String[] args) {
        SpringApplication.run(Base64testApplication.class, args);
    }

}

Running the code above throws the following exception:

Caused by: java.lang.IllegalArgumentException: Illegal base64 character a
	at java.util.Base64$Decoder.decode0(Base64.java:714) ~[na:1.8.0_202-release]
	at java.util.Base64$Decoder.decode(Base64.java:526) ~[na:1.8.0_202-release]

Note: if the test string is short, such as "Hello, World", it decodes without error, because its encoded form fits on a single line.
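You can also reproduce the failure without Spring Boot, or on JDK 9 and later where sun.misc.BASE64Encoder no longer exists. The sketch below is a JDK-only version. It assumes that `Base64.getMimeEncoder(76, new byte[] {'\n'})` is a close enough stand-in for sun.misc.BASE64Encoder on a Unix-like system, because both produce 76-character lines joined by a line feed. The two are not identical at every input length. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Repro {
    // Assumption: approximates sun.misc.BASE64Encoder on a Unix-like system
    // (76-character lines joined by '\n'); not identical at every input length
    static final Base64.Encoder WRAPPING_ENCODER =
            Base64.getMimeEncoder(76, new byte[] {'\n'});

    // Encodes with the wrapping encoder, then decodes with the strict basic decoder
    static String roundTrip(String text) {
        byte[] content = text.getBytes(StandardCharsets.UTF_8);
        String encoded = WRAPPING_ENCODER.encodeToString(content);
        byte[] decoded = Base64.getDecoder().decode(encoded); // throws if encoded contains '\n'
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Short input: the encoded text fits on one line, so decoding succeeds
        System.out.println(roundTrip("Hello, World"));
        // Long input: the encoded text is wrapped, and the basic decoder rejects the line feed
        try {
            roundTrip("It takes a strong man to save himself, and a great man to save another.");
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```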

In other words, encoding with sun.misc.BASE64Encoder and decoding with org.springframework.util.Base64Utils causes problems. To see the difference, we can encode the same string with each of them and print the results. Test code:

byte[] content = "It takes a strong man to save himself, and a great man to save another.".getBytes();

System.out.println(new BASE64Encoder().encode(content));
System.out.println("--- Gorgeous divider ---");
System.out.println(Base64Utils.encodeToString(content));

Output:

SXQgdGFrZXMgYSBzdHJvbmcgbWFuIHRvIHNhdmUgaGltc2VsZiwgYW5kIGEgZ3JlYXQgbWFuIHRv
IHNhdmUgYW5vdGhlci4=
--- Gorgeous divider ---
SXQgdGFrZXMgYSBzdHJvbmcgbWFuIHRvIHNhdmUgaGltc2VsZiwgYW5kIGEgZ3JlYXQgbWFuIHRvIHNhdmUgYW5vdGhlci4=

As you can see, the output of sun.misc.BASE64Encoder is split across two lines. The ASCII code of the line feed character is 0x0a, and java.util.Base64 reports the illegal character's code in hexadecimal. The "a" in the error message is therefore exactly that line feed. Let's take a closer look at where this difference comes from.
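The "a" in the message is the line feed's code written in hexadecimal. To make that link concrete, here is a small hypothetical helper. It finds the first character outside the Base64 alphabet and formats it the same way the JDK message does. The class and method names are invented for illustration.

```java
final class IllegalCharFinder {
    // Characters the basic (RFC 4648) decoder accepts, apart from '=' padding
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical helper: index of the first character outside the alphabet, or -1
    static int firstIllegalIndex(String encoded) {
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '=' && ALPHABET.indexOf(c) < 0) {
                return i;
            }
        }
        return -1;
    }

    // Reports the offending character's code in hexadecimal, like the JDK message
    static String describe(String encoded) {
        int i = firstIllegalIndex(encoded);
        if (i < 0) {
            return "no illegal character";
        }
        return "Illegal base64 character " + Integer.toHexString(encoded.charAt(i)) + " at index " + i;
    }

    public static void main(String[] args) {
        System.out.println(describe("SXQgdGFr\nIHNh"));
        System.out.println(describe("SXQgdGFrIHNh"));
    }
}
```

With a "\r\n" separator, the same helper would report "d" (0x0d, carriage return) instead of "a".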

0x03 Digging Deeper

In IntelliJ IDEA, hold down Ctrl (Command on macOS) and click a method name to jump to its implementation.

3.1 sun.misc.BASE64Encoder.encode

This approach mainly involves two classes in the sun.misc package, BASE64Encoder and CharacterEncoder. The latter is the superclass of the former.

The actual encoding logic is in CharacterEncoder.java. An annotated version is shown below:


public void encode(InputStream inStream, OutputStream outStream)
    throws IOException {
    int     j;
    int     numBytes;
    // bytesPerLine() is overridden in BASE64Encoder and returns 57
    byte    tmpbuffer[] = new byte[bytesPerLine()];

    // Wraps outStream in a PrintStream; writes nothing
    encodeBufferPrefix(outStream);

    while (true) {
        // Read up to 57 bytes
        numBytes = readFully(inStream, tmpbuffer);
        if (numBytes == 0) {
            break;
        }
        // No-op in BASE64Encoder (the line prefix is empty)
        encodeLinePrefix(outStream, numBytes);
        // Process 3 bytes at a time, encoding each group as 4 characters;
        // a short final group is padded with zero bits and '='
        for (j = 0; j < numBytes; j += bytesPerAtom()) {
            // ...
        }
        if (numBytes < bytesPerLine()) {
            break;
        } else {
            // Write the line suffix (a line break) after every full line
            encodeLineSuffix(outStream);
        }
    }
    // No-op in BASE64Encoder (the buffer suffix is empty)
    encodeBufferSuffix(outStream);
}
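The loop above can be re-created in a few lines, which makes the line-oriented structure easy to experiment with. This is a hypothetical sketch, not the JDK code. It keeps the 57-byte line size and the "suffix after every full line" rule. It delegates the per-line encoding to java.util.Base64 instead of encodeAtom, and it assumes the line suffix is '\n'.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class LineOrientedEncoder {
    private static final int BYTES_PER_LINE = 57; // value returned by BASE64Encoder.bytesPerLine()

    // Hypothetical re-creation of the CharacterEncoder.encode loop
    static String encode(byte[] input) {
        StringBuilder out = new StringBuilder();
        int offset = 0;
        while (true) {
            // Like readFully: take up to 57 bytes
            int numBytes = Math.min(BYTES_PER_LINE, input.length - offset);
            if (numBytes == 0) {
                break;
            }
            byte[] line = Arrays.copyOfRange(input, offset, offset + numBytes);
            // 57 is a multiple of 3, so only a short last line can need '=' padding
            out.append(Base64.getEncoder().encodeToString(line));
            offset += numBytes;
            if (numBytes < BYTES_PER_LINE) {
                break;
            }
            // Line suffix after every full line (assumed to be '\n')
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] content = "It takes a strong man to save himself, and a great man to save another."
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(encode(content));
    }
}
```

For the 71-byte test sentence, this should print the same two lines as the sun.misc output shown in section 0x02. Following the loop's logic, an input whose length is an exact multiple of 57 also ends with a trailing line break.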

The Javadoc comment of the CharacterEncoder class describes the format of the encoded output:

[Buffer Prefix]
[Line Prefix][encoded data atoms][Line Suffix]
[Buffer Suffix]

In the BASE64Encoder subclass, Buffer Prefix, Buffer Suffix and Line Prefix are all empty. Line Suffix is a line break: the inherited encodeLineSuffix calls println() on the PrintStream, which writes \n on Linux and macOS.

This is where the line breaks come from. The encoder reads the input 57 bytes at a time and encodes each chunk as one line of 76 characters (57 / 3 × 4 = 76).
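These numbers also explain the "short strings work" observation in section 0x02. Every started 3-byte group becomes 4 characters. Following the loop above, a line break is written after every full 57-byte line, including the last one. The hypothetical helpers below derive the counts from that loop logic; they were not measured against the JDK class.

```java
final class WrapMath {
    static final int BYTES_PER_LINE = 57;

    // Encoded characters, excluding line breaks: 4 per started 3-byte group ('=' padded)
    static int encodedChars(int n) {
        return 4 * ((n + 2) / 3);
    }

    // One line suffix per full 57-byte line, including a final full line
    static int lineBreaks(int n) {
        return n / BYTES_PER_LINE;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 56, 57, 71, 114}) {
            System.out.println(n + " bytes -> " + encodedChars(n) + " chars, "
                    + lineBreaks(n) + " line break(s)");
        }
    }
}
```

"Hello, World" is 12 bytes, so it gets no line break. Under this model, any input of 57 bytes or more contains at least one break, and a strict decoder will reject it.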

3.2 org.springframework.util.Base64Utils.encodeToString

This approach mainly involves org.springframework.util.Base64Utils and java.util.Base64. The former is essentially a thin wrapper around the latter.

Base64Utils.encodeToString uses Base64.Encoder.RFC4648, the encoder returned by Base64.getEncoder():

// isURL = false, newline = null, linemax = -1, doPadding = true
static final Encoder RFC4648 = new Encoder(false, null, -1, true);

Note the values for newline and linemax.
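For comparison, java.util.Base64 offers other preconfigured encoders besides RFC4648. The sketch below contrasts getEncoder(), which never wraps, with getMimeEncoder(), which according to its Javadoc produces lines of at most 76 characters separated by "\r\n". The class and method names are illustrative.

```java
import java.util.Base64;

final class EncoderVariants {
    // The encoder Base64Utils.encodeToString relies on (linemax = -1, never wraps)
    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // MIME encoder: lines of at most 76 characters separated by "\r\n"
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 bytes -> 136 Base64 characters
        System.out.println("basic contains a line break: " + basic(data).contains("\n"));
        System.out.println("mime contains \\r\\n: " + mime(data).contains("\r\n"));
    }
}
```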

Now look at Base64.Encoder.encode0, which does the actual encoding:

private int encode0(byte[] src, int off, int end, byte[] dst) {
    // ...
    while (sp < sl) {
        // ...

        // With linemax = -1 this condition is never true, so no line break is added
        if (dlen == linemax && sp < end) {
            for (byte b : newline){
                dst[dp++] = b;
            }
        }
    }
    // ...
    return dp;
}

So this implementation never inserts line breaks.
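To see how linemax and newline interact, here is a hypothetical, simplified encoder that mirrors the structure of encode0. It is not the JDK source. It encodes 3-byte groups into 4 characters and inserts newline only when a line reaches linemax characters and more input remains. The class name and parameters are chosen for illustration, and linemax is assumed to be a multiple of 4 (the JDK's getMimeEncoder rounds it down to one).

```java
final class SimpleBase64Encoder {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // linemax <= 0 means "never wrap", like the RFC4648 encoder (linemax = -1)
    static String encode(byte[] src, int linemax, String newline) {
        StringBuilder dst = new StringBuilder();
        int lineLen = 0;
        int sp = 0;
        while (sp + 3 <= src.length) {
            int bits = (src[sp] & 0xff) << 16 | (src[sp + 1] & 0xff) << 8 | (src[sp + 2] & 0xff);
            sp += 3;
            dst.append(ALPHABET[(bits >>> 18) & 0x3f])
               .append(ALPHABET[(bits >>> 12) & 0x3f])
               .append(ALPHABET[(bits >>> 6) & 0x3f])
               .append(ALPHABET[bits & 0x3f]);
            lineLen += 4;
            // Same idea as "dlen == linemax && sp < end": wrap only if more input follows
            if (linemax > 0 && lineLen == linemax && sp < src.length) {
                dst.append(newline);
                lineLen = 0;
            }
        }
        int rem = src.length - sp;
        if (rem == 1) {
            int b0 = src[sp] & 0xff;
            dst.append(ALPHABET[b0 >> 2]).append(ALPHABET[(b0 << 4) & 0x3f]).append("==");
        } else if (rem == 2) {
            int b0 = src[sp] & 0xff;
            int b1 = src[sp + 1] & 0xff;
            dst.append(ALPHABET[b0 >> 2])
               .append(ALPHABET[(b0 << 4 | b1 >> 4) & 0x3f])
               .append(ALPHABET[(b1 << 2) & 0x3f])
               .append('=');
        }
        return dst.toString();
    }

    public static void main(String[] args) {
        byte[] data = "It takes a strong man to save himself".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(encode(data, -1, "\n")); // one line, like getEncoder()
        System.out.println(encode(data, 16, "\n")); // wrapped every 16 characters
    }
}
```

Passing linemax = -1 reproduces the no-wrap behavior of the RFC4648 encoder. Passing linemax = 76 with "\n" gives output of the same shape as the sun.misc encoder for most input lengths.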

0x04 Summary

The analysis above makes the cause clear: the two encoders are implemented differently. During development, make sure the encoder and decoder match, which in practice means taking both from the same library or Java package.
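In practice there are two ways out. Both are sketched below using only java.util.Base64; the class and method names are illustrative. The first is to have both sides use the same basic encoder and decoder pair. The second applies if you cannot change the side that produces wrapped output: decode with Base64.getMimeDecoder(). Its Javadoc says it ignores line separators and any other characters outside the Base64 alphabet. That makes it more lenient, so it will also silently skip characters that are genuinely malformed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CompatibleDecoding {
    // Option 1: a matching encoder/decoder pair from the same package; no line breaks involved
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // Option 2: lenient decoding of input that may be wrapped (e.g. by sun.misc.BASE64Encoder)
    static String decodeLenient(String encoded) {
        return new String(Base64.getMimeDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "It takes a strong man to save himself, and a great man to save another.";
        System.out.println(decode(encode(text)));
        // The wrapped sun.misc output from section 0x02
        String wrapped = "SXQgdGFrZXMgYSBzdHJvbmcgbWFuIHRvIHNhdmUgaGltc2VsZiwgYW5kIGEgZ3JlYXQgbWFuIHRv\n"
                + "IHNhdmUgYW5vdGhlci4=";
        System.out.println(decodeLenient(wrapped));
    }
}
```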

As for why different implementations exist, their origins and feuds, and the detailed principles of Base64, please stay tuned for the next article! :-P

If you are interested in my articles, feel free to follow my WeChat official account, "programmers".

Posted by littlegreenman on Sun, 01 Mar 2020 04:10:48 -0800