Protostuff Serialization of String

Keywords: Java Jedis Redis

Cause

To make working with Redis easier, we wrote a set of helper classes. One of them is the following helper for hmset, which serializes keys and values with Protostuff.

    /**
     * hmset: serializes the key and every entry of the hash, then stores them in Redis.
     */
    public <K extends Serializable, V extends Serializable> void hmset(String key, Map<K, V> hash, int expireSeconds) {
        ShardedJedis jedis = null;
        try {
            jedis = pool.getResource();
            byte[] keybytes = ProtostuffUtil.serialize(key);
            Map<byte[], byte[]> bytesMaps = Maps.newHashMap();
            hash.forEach((k, v) -> {
                bytesMaps.put(ProtostuffUtil.serialize(k), ProtostuffUtil.serialize(v));
            });
            jedis.hmset(keybytes, bytesMaps);
            if (expireSeconds > 0) {
                jedis.expire(keybytes, expireSeconds);
            }
        } finally {
            closeJedis(jedis);
        }
    }

Roughly, this code serializes every key and value in the hash parameter and stores each pair in Redis. However, we found that hget could not retrieve a value even though hgetall showed it. After investigation, it turned out to be a pitfall in how Protostuff serializes String.

String's hashCode and Protostuff serialization

Let's first look at String's equals method. It compares the lengths and the contents of the two strings, but it does not compare their hash values. equals is what we normally call to decide whether two strings are equal.
Now look at the hashCode method. String overrides hashCode and keeps a hash field that caches the value once it has been calculated. Here's the problem. Consider the following code:

        String abc = "abc"; // At this point, the hash field value is 0
        byte[] serialize = ProtostuffUtil.serialize(abc);

        String bbb = new String(abc.getBytes()); // hash is still 0; abc and bbb are two different objects
        byte[] serialize2 = ProtostuffUtil.serialize(bbb);

        Arrays.equals(serialize, serialize2); // true

The second snippet is the same, except that hashCode() is called on the copy before it is serialized:

        String abc = "abc"; // At this point, the hash field value is 0
        byte[] serialize = ProtostuffUtil.serialize(abc);

        String bbb = new String(abc.getBytes()); // hash is still 0; abc and bbb are two different objects
        bbb.hashCode(); // populates bbb's hash field

        abc.equals(bbb); // true

        byte[] serialize2 = ProtostuffUtil.serialize(bbb);

        Arrays.equals(serialize, serialize2); // false

The reason is that when protostuff serializes a String, the hash field is serialized along with the value. Looking directly at the source code:
The method com.dyuproject.protostuff.runtime.MappedSchema#writeTo traverses all fields of an object.

public final void writeTo(Output output, T message) throws IOException
    {
        for(Field<T> f : fields)
            f.writeTo(output, message);
    }


fields is assigned in the constructor, and the values passed to it are originally collected here:
com.dyuproject.protostuff.runtime.RuntimeSchema#createFrom(java.lang.Class, java.util.Set

  final Map<String,java.lang.reflect.Field> fieldMap = findInstanceFields(typeClass);
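
To make the effect of "write every instance field" concrete, here is a minimal sketch using only JDK classes. It is not protostuff's code: LazyHashText is a hypothetical stand-in for String (equal by content, with a lazily cached hash), and AllFieldsWriter is a hypothetical serializer that walks the declared fields by reflection.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;

/** Hypothetical stand-in for String: equal by content, with a lazily cached hash. */
final class LazyHashText {
    private final String value;
    private int hash; // stays 0 until hashCode() is first called

    LazyHashText(String value) { this.value = value; }

    @Override public boolean equals(Object o) {
        return o instanceof LazyHashText && ((LazyHashText) o).value.equals(value);
    }

    @Override public int hashCode() {
        if (hash == 0) hash = value.hashCode(); // caching mutates the object
        return hash;
    }
}

/** Hypothetical serializer that writes every non-static field, cache included. */
final class AllFieldsWriter {
    static byte[] serialize(Object message) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Field f : message.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                Object fieldValue = f.get(message);
                if (fieldValue instanceof Integer) out.writeInt((Integer) fieldValue);
                else out.writeUTF(String.valueOf(fieldValue));
            }
        } catch (IOException | IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        LazyHashText a = new LazyHashText("abc");
        LazyHashText b = new LazyHashText("abc");
        byte[] before = serialize(a);
        b.hashCode(); // populates b's cached hash only
        byte[] after = serialize(b);
        System.out.println(a.equals(b));                  // true
        System.out.println(Arrays.equals(before, after)); // false
    }
}
```

The two objects are equal, yet their bytes differ, because the cache field is part of what gets written.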

So now we know why protostuff's serialization result changes after a string has calculated its hashCode. When does a string's hash field get modified? When the string is put into a HashMap. HashMap calls key.hashCode(), and String's hashCode method saves the calculated result in the hash field, which changes the field's value.

  static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
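
A quick way to confirm that HashMap.put invokes hashCode() on the key is to count the calls. The sketch below uses a hypothetical CountingKey class for this; nothing in it comes from the original helper code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical key type that counts how often hashCode() is invoked. */
final class CountingKey {
    int hashCodeCalls = 0;
    private final String name;

    CountingKey(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
    }

    @Override public int hashCode() {
        hashCodeCalls++;
        return name.hashCode();
    }
}

final class PutCallsHashCode {
    /** Returns how many times hashCode() ran while putting a fresh key into a HashMap. */
    static int callsDuringPut() {
        CountingKey key = new CountingKey("abc");
        Map<CountingKey, String> map = new HashMap<>();
        map.put(key, "value");
        return key.hashCodeCalls;
    }

    public static void main(String[] args) {
        System.out.println(callsDuringPut() > 0); // true: put() hashed the key
    }
}
```

For a String key, that same call is what fills in the cached hash field.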

The helper at the beginning of the article receives its entries in a HashMap, and protostuff serializes an object by traversing its fields directly. So if a key is a String, its hash field has already been populated by the time it is serialized.

            hash.forEach((k, v) -> {
                bytesMaps.put(ProtostuffUtil.serialize(k), ProtostuffUtil.serialize(v));
            });

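If hget is later called with a string whose hash field is still 0, then even though the content is the same, the byte array produced by ProtostuffUtil differs from the one stored at set time. Redis matches hash fields by their bytes, so the lookup finds nothing, which is the problem we started with.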
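
One possible way around this, sketched here as an assumption rather than as what the original project did, is to derive the bytes for String keys and hash fields from the content only, for example with UTF-8 encoding. StringKeyCodec is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: key bytes depend only on the string's content. */
final class StringKeyCodec {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String abc = "abc";
        String bbb = new String(abc.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        bbb.hashCode(); // populates bbb's cached hash only
        System.out.println(Arrays.equals(encode(abc), encode(bbb))); // true
    }
}
```

With such an encoding, the bytes used at hmset time and at hget time match whenever the strings are equal, regardless of whether hashCode() has been called on either of them.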

Java serialization of String

Now that we know the cause and the consequences of this pitfall, let's look at how Java's built-in serialization behaves.

The third snippet:

        String abc = "abc";
        byte[] serialize = SerializationUtils.serialize(abc);

        String bbb = new String(abc.getBytes());
        bbb.hashCode();

        boolean equals1 = abc.equals(bbb);

        byte[] serialize2 = SerializationUtils.serialize(bbb);

        boolean equals = Arrays.equals(serialize, serialize2); // true
        System.out.println(equals);

As you can see, even after hashCode() has populated the hash field, the two objects serialize to the same bytes. The reason is that Java's built-in serialization handles strings specially.
In java.io.ObjectOutputStream#writeObject0, if the object is a String, java.io.ObjectOutputStream#writeString is called to process it.

            if (obj instanceof String) {
                writeString((String) obj, unshared);
            } else if (cl.isArray()) {
                writeArray(obj, desc, unshared);
            } else if (obj instanceof Enum) {
                writeEnum((Enum<?>) obj, desc, unshared);
            } else if (obj instanceof Serializable) {
                writeOrdinaryObject(obj, desc, unshared);
            } else {
                if (extendedDebugInfo) {
                    throw new NotSerializableException(
                        cl.getName() + "\n" + debugInfoStack.toString());
                } else {
                    throw new NotSerializableException(cl.getName());
                }
            }

writeString writes only the string's characters and never touches the cached hash field. So Java's built-in serialization is insensitive to String's hashCode value.
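
The third snippet relies on a SerializationUtils helper whose library the article does not name. The same check can be sketched with JDK classes only; JdkStringSerialization is a hypothetical class name used for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class JdkStringSerialization {
    /** Serializes an object with Java's built-in ObjectOutputStream. */
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String abc = "abc";
        byte[] first = serialize(abc);

        String bbb = new String(abc.getBytes());
        bbb.hashCode(); // populates bbb's cached hash

        byte[] second = serialize(bbb);
        System.out.println(Arrays.equals(first, second)); // true
    }
}
```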

Summary

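  • Be careful with any serialization approach: objects that are equal do not necessarily serialize to the same bytes.
  • Reading the source code matters, because the implementation details decide how the serializer actually behaves.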

Posted by Rongisnom on Sun, 30 Jun 2019 18:48:00 -0700