An in-depth look at FST: characteristics and principles of a tool for fast serialization and memory compression

Keywords: Java

Concept and definition of FST

FST stands for Fast Serialization Tool. It is an alternative implementation of Java serialization that greatly improves on the two serious shortcomings of Java serialization mentioned above. Its characteristics are as follows:

  • Serialization is about 10 times faster than the serialization provided by the JDK, and the serialized size is reduced by a factor of 3 to 4 or more
  • Supports off-heap Maps and the persistence of off-heap Maps
  • Supports serialization to JSON
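
To put the size claim in context, it helps to measure what the JDK itself produces. The following is a minimal sketch, using only the standard library, that serializes a tiny bean with ObjectOutputStream and reports the byte count. The class names JdkBaseline and Bean are hypothetical and exist only for this illustration; the sketch makes no claim about FST's own numbers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Measures how many bytes JDK serialization needs for a very small bean.
class JdkBaseline {

    // Hypothetical bean with one Integer and one String field.
    static class Bean implements Serializable {
        private static final long serialVersionUID = 1L;
        Integer v0int;
        String v0str;

        Bean(Integer v0int, String v0str) {
            this.v0int = v0int;
            this.v0str = v0str;
        }
    }

    // Serializes the object with the JDK's ObjectOutputStream.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Bean(1, "v0str"));
        System.out.println("JDK serialized size: " + bytes.length + " bytes");
    }
}
```

The JDK stream carries a stream header plus full class descriptors (class names, field names and types, serialVersionUID) for the bean and for java.lang.Integer, which is why even two small fields take far more space than their raw values.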

Use of FST serialization

There are two ways to use FST: a shortcut API, and a stream-based API that works through ObjectOutput and ObjectInput.

Directly use the serialization and deserialization interfaces provided by FSTConfiguration

public static void serialSample() {
  FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();
  User object = new User();
  object.setName("huaijin");
  object.setAge(30);
  System.out.println("serialization, " + object);
  byte[] bytes = conf.asByteArray(object);
  User newObject = (User) conf.asObject(bytes);
  System.out.println("deSerialization, " + newObject);
}

FSTConfiguration also provides an interface for registering classes. If a class is not registered, its class name is written to the stream by default. The shortcut API is easy to use and efficient: it returns a byte[] directly, without going through a ByteArrayOutputStream.

Using ObjectOutput and ObjectInput gives finer control over how the serialized data is written:

static FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();

static void writeObject(OutputStream outputStream, User user) throws IOException {
    FSTObjectOutput out = conf.getObjectOutput(outputStream);
    out.writeObject(user);
    out.close();
}
static User readObject(InputStream inputStream) throws Exception {
    FSTObjectInput input = conf.getObjectInput(inputStream);
    User fstObject = (User) input.readObject(User.class);
    input.close();
    return fstObject;
}

Application of FST in Dubbo

  • Dubbo wraps FstObjectInput and FstObjectOutput again to solve the problem of null pointers during serialization and deserialization.
  • The FstFactory factory class generates FstObjectInput and FstObjectOutput using the factory pattern. The singleton pattern is also used so that FSTConfiguration is a singleton across the whole application, and all classes to be serialized are registered with FSTConfiguration during initialization.
  • A unified serialization interface, FstSerialization, is exposed externally, providing the serialize and deserialize capabilities.

FST serialization / deserialization

FST serialized storage format

Basically, all serialized objects stored in byte form have similar storage structures, whether they are class files, .so files or dex files. There is no innovative format in this regard. At most, some compression optimization is applied to the field content, as in the UTF-8 encoding we use most often.

FST's serialized storage is no more innovative than general byte-format storage schemes. Take the FST-serialized byte file below as an example:

00000000:  0001 0f63 6f6d 2e66 7374 2e46 5354 4265
00000010:  616e f701 fc05 7630 7374 7200 

Format:

Header|Class name length|Class name String|Field 1 type(1Byte) | [length] | content|Field 2 type(1Byte) | [length] | content|...
  • 0000: byte array type: 00 identifies OBJECT
  • 0001: class name encoding: 00 indicates UTF, 01 indicates ASCII
  • 0002: Length of class name (1Byte) = 15
  • 0003~0011: Class name string (15Byte)
  • 0012: Integer type ID 0xf7
  • 0013: value of Integer = 1
  • 0014: String type ID 0xfc
  • 0015: length of String = 5
  • 0016~001a: String value "v0str"
  • 001b~001c: END
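
The layout above can be checked by walking the bytes by hand. The following is a minimal sketch that uses only the JDK, not FST's own reader; the class FstLayoutDemo and its parse method are hypothetical names for this illustration. It decodes the bytes of the dump up to offset 001a according to the field list above and leaves the END marker uninterpreted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Walks the byte dump from the article according to the described layout.
class FstLayoutDemo {

    // Bytes 0000..001a of the dump, copied from the hex listing.
    static final byte[] DUMP = {
        0x00, 0x01, 0x0f,
        0x63, 0x6f, 0x6d, 0x2e, 0x66, 0x73, 0x74, 0x2e,
        0x46, 0x53, 0x54, 0x42, 0x65, 0x61, 0x6e,
        (byte) 0xf7, 0x01,
        (byte) 0xfc, 0x05, 0x76, 0x30, 0x73, 0x74, 0x72
    };

    static String parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int objectTag = buf.get();               // 0000: 00 identifies OBJECT
        int nameEncoding = buf.get();            // 0001: 01 indicates ASCII
        if (objectTag != 0x00 || nameEncoding != 0x01) {
            throw new IllegalArgumentException("unexpected header");
        }
        int nameLength = buf.get() & 0xff;       // 0002: class name length
        byte[] name = new byte[nameLength];
        buf.get(name);                           // 0003~0011: class name
        String className = new String(name, StandardCharsets.US_ASCII);

        int intTag = buf.get() & 0xff;           // 0012: Integer type ID
        if (intTag != 0xf7) {
            throw new IllegalArgumentException("expected Integer tag 0xf7");
        }
        int intValue = buf.get();                // 0013: value stored in one byte

        int strTag = buf.get() & 0xff;           // 0014: String type ID
        if (strTag != 0xfc) {
            throw new IllegalArgumentException("expected String tag 0xfc");
        }
        int strLength = buf.get() & 0xff;        // 0015: String length
        byte[] str = new byte[strLength];
        buf.get(str);                            // 0016~001a: String value
        String strValue = new String(str, StandardCharsets.US_ASCII);

        return className + " {int=" + intValue + ", str=" + strValue + "}";
    }

    public static void main(String[] args) {
        System.out.println(parse(DUMP)); // prints: com.fst.FSTBean {int=1, str=v0str}
    }
}
```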

As shown above, the Integer value occupies only one byte after serialization (the value is 1), following a one-byte type tag, rather than the 4 bytes an int occupies in memory. The data is therefore compressed according to certain rules. For the specific code, see how the different types are read in FSTObjectInput#instantiateSpecialTag; FSTObjectOutput defines the tag constants corresponding to the different types:

public class FSTObjectOutput implements ObjectOutput {
    private static final FSTLogger LOGGER = FSTLogger.getLogger(FSTObjectOutput.class);
    public static Object NULL_PLACEHOLDER = new Object() {
        public String toString() { return "NULL_PLACEHOLDER"; }
    };
    public static final byte SPECIAL_COMPATIBILITY_OBJECT_TAG = -19; // see issue 52    
    public static final byte ONE_OF = -18;    
    public static final byte BIG_BOOLEAN_FALSE = -17;    
    public static final byte BIG_BOOLEAN_TRUE = -16;    
    public static final byte BIG_LONG = -10;   
    public static final byte BIG_INT = -9;    
    public static final byte DIRECT_ARRAY_OBJECT = -8;    
    public static final byte HANDLE = -7;    
    public static final byte ENUM = -6;    
    public static final byte ARRAY = -5;    
    public static final byte STRING = -4;    
    public static final byte TYPED = -3; // var class == object written class    
    public static final byte DIRECT_OBJECT = -2;    
    public static final byte NULL = -1;    
    public static final byte OBJECT = 0;    
    protected FSTEncoder codec;    
    ...
}
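
Two details can be illustrated with plain JDK code. First, the tag bytes 0xf7 and 0xfc seen in the dump are simply the signed constants BIG_INT (-9) and STRING (-4). Second, storing a small integer in one byte is a common compact encoding. The sketch below shows one generic way to do it; the class CompactIntDemo, the marker values and the thresholds are assumptions made for this illustration and are not claimed to be FST's exact encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Generic compact integer encoding: small values take 1 byte instead of 4.
class CompactIntDemo {
    // Assumed marker bytes announcing a wider value.
    static final int SHORT_MARKER = -128;
    static final int INT_MARKER = -127;

    static byte[] encode(int value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        try {
            if (value > -127 && value <= 127) {
                out.writeByte(value);            // fits into a single byte
            } else if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
                out.writeByte(SHORT_MARKER);     // marker + 2 bytes
                out.writeShort(value);
            } else {
                out.writeByte(INT_MARKER);       // marker + 4 bytes
                out.writeInt(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static int decode(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            byte head = in.readByte();
            if (head == SHORT_MARKER) {
                return in.readShort();
            }
            if (head == INT_MARKER) {
                return in.readInt();
            }
            return head;                         // the byte itself is the value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println((byte) 0xf7);         // prints -9, the BIG_INT tag
        System.out.println((byte) 0xfc);         // prints -4, the STRING tag
        for (int v : new int[] {1, 300, 100000}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

With this scheme the value 1 takes one byte, 300 takes three and 100000 takes five, so the saving depends on small values being the common case.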

FST serialization and deserialization principle

Serializing objects to bytes is equivalent to persistent storage. If the definition of the Bean changes before deserialization, the deserializer must provide a compatible solution. We know that serialVersionUID plays an important role in version control for JDK serialization and deserialization. FST's solution to this problem is to sort the fields using the @Version annotation.

During deserialization, FST first reflects all members of the object's Class and sorts them. This sorting plays a key role in compatibility and is the principle behind @Version. A defFieldComparator is defined in FSTClazzInfo to sort all fields of the Bean:

public final class FSTClazzInfo {
    public static final Comparator<FSTFieldInfo> defFieldComparator = new Comparator<FSTFieldInfo>() {
        @Override
        public int compare(FSTFieldInfo o1, FSTFieldInfo o2) {
            int res = 0;

            if ( o1.getVersion() != o2.getVersion() ) {
                return o1.getVersion() < o2.getVersion() ? -1 : 1;
            }
            // order: version, boolean, primitives, conditionals, object references
            if (o1.getType() == boolean.class && o2.getType() != boolean.class) {
                return -1;
            }
            if (o1.getType() != boolean.class && o2.getType() == boolean.class) {
                return 1;
            }
            if (o1.isConditional() && !o2.isConditional()) {
                res = 1;
            } else if (!o1.isConditional() && o2.isConditional()) {
                res = -1;
            } else if (o1.isPrimitive() && !o2.isPrimitive()) {
                res = -1;
            } else if (!o1.isPrimitive() && o2.isPrimitive())
                res = 1;
//              if (res == 0) // 64 bit / 32 bit issues
//                  res = (int) (o1.getMemOffset() - o2.getMemOffset());
            if (res == 0)
                res = o1.getType().getSimpleName().compareTo(o2.getType().getSimpleName());
            if (res == 0)
                res = o1.getName().compareTo(o2.getName());
            if (res == 0) {
                return o1.getField().getDeclaringClass().getName().compareTo(o2.getField().getDeclaringClass().getName());
            }
            return res;
        }
    };
    ...
}

From the code, we can see that the comparison first considers the Version of the Field and then the Field type, so generally speaking, the larger the Version, the later the Field is sorted. As for why the fields are sorted, take a look at the FSTObjectInput#instantiateAndReadNoSer method:

public class FSTObjectInput implements ObjectInput {
    protected Object instantiateAndReadNoSer(Class c, FSTClazzInfo clzSerInfo, FSTClazzInfo.FSTFieldInfo referencee, int readPos) throws Exception {
        Object newObj;
        newObj = clzSerInfo.newInstance(getCodec().isMapBased());
        ...
        } else {
            FSTClazzInfo.FSTFieldInfo[] fieldInfo = clzSerInfo.getFieldInfo();
            readObjectFields(referencee, clzSerInfo, fieldInfo, newObj, 0, 0);
        }
        return newObj;
    }

    protected void readObjectFields(FSTClazzInfo.FSTFieldInfo referencee, FSTClazzInfo serializationInfo, FSTClazzInfo.FSTFieldInfo[] fieldInfo, Object newObj, int startIndex, int version) throws Exception {
        if ( getCodec().isMapBased() ) {
            readFieldsMapBased(referencee, serializationInfo, newObj);
            if ( version >= 0 && newObj instanceof Unknown == false)
                getCodec().readObjectEnd();
            return;
        }
        if ( version < 0 )
            version = 0;
        int booleanMask = 0;
        int boolcount = 8;
        final int length = fieldInfo.length;
        int conditional = 0;
        for (int i = startIndex; i < length; i++) {  // Note: the loop walks the sorted fields
            try {
                FSTClazzInfo.FSTFieldInfo subInfo = fieldInfo[i];
                if (subInfo.getVersion() > version ) {  // This field belongs to a later version
                    int nextVersion = getCodec().readVersionTag();  // The next version tag in the object stream
                    if ( nextVersion == 0 ) // old object read
                    {
                        oldVersionRead(newObj);
                        return;
                    }
                    if ( nextVersion != subInfo.getVersion() ) {  // The version of a Field must match the version tag in the stream
                        throw new RuntimeException("read version tag "+nextVersion+" fieldInfo has "+subInfo.getVersion());
                    }
                    readObjectFields(referencee,serializationInfo,fieldInfo,newObj,i,nextVersion);  // Recurse to read the fields of the next version
                    return;
                }
                if (subInfo.isPrimitive()) {
                    ...
                } else {
                    if ( subInfo.isConditional() ) {
                        ...
                    }
                    // object: read the value and store it in the field described by FSTFieldInfo
                    Object subObject = readObjectWithHeader(subInfo);
                    subInfo.setObjectValue(newObj, subObject);
                }
                ...

The logic of this code reveals the basic principle behind FST's compatibility between serialization and deserialization. Note that the loop iterates over the sorted Fields, and each FSTFieldInfo records its position, type and other details in the object stream:

Serialization:

  • Sort all fields of the Bean by version (excluding static and transient members). A Field without the @Version annotation has the default version 0. Fields with the same version are ordered by boolean, primitives, conditionals, and object references
  • Write the Bean fields to the output stream one by one in the sorted order
  • The value of @Version can only increase, never decrease. If it stays equal, the default sorting rule may cause the order of the Fields in the stream to differ from the order of the FSTFieldInfo[] array in memory, resulting in injection errors
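
The sorting step can be tried out without FST. The following is a minimal sketch with a stand-in FieldInfo class (hypothetical, not FST's FSTFieldInfo) and a comparator that follows the same precedence as defFieldComparator: version first, then booleans, then conditional fields last, then primitives before object references, then type name and field name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Stand-in demo of the version-first field ordering.
class FieldOrderDemo {

    // Hypothetical, simplified replacement for FSTFieldInfo.
    static final class FieldInfo {
        final String name;
        final Class<?> type;
        final int version;
        final boolean conditional;

        FieldInfo(String name, Class<?> type, int version, boolean conditional) {
            this.name = name;
            this.type = type;
            this.version = version;
            this.conditional = conditional;
        }
    }

    static final Comparator<FieldInfo> ORDER = new Comparator<FieldInfo>() {
        @Override
        public int compare(FieldInfo a, FieldInfo b) {
            if (a.version != b.version) {
                return Integer.compare(a.version, b.version); // lower version first
            }
            boolean aBool = a.type == boolean.class;
            boolean bBool = b.type == boolean.class;
            if (aBool != bBool) {
                return aBool ? -1 : 1;                        // booleans first
            }
            if (a.conditional != b.conditional) {
                return a.conditional ? 1 : -1;                // conditionals last
            }
            boolean aPrim = a.type.isPrimitive();
            boolean bPrim = b.type.isPrimitive();
            if (aPrim != bPrim) {
                return aPrim ? -1 : 1;                        // primitives before references
            }
            int res = a.type.getSimpleName().compareTo(b.type.getSimpleName());
            return res != 0 ? res : a.name.compareTo(b.name);
        }
    };

    static List<String> sortedNames(FieldInfo... fields) {
        List<FieldInfo> list = new ArrayList<>(Arrays.asList(fields));
        list.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (FieldInfo f : list) {
            names.add(f.name);
        }
        return names;
    }

    // Fields declared in arbitrary order, as reflection might return them.
    static List<String> demo() {
        return sortedNames(
            new FieldInfo("y", String.class, 0, false),
            new FieldInfo("added", String.class, 1, false),
            new FieldInfo("x", int.class, 0, false),
            new FieldInfo("flag", boolean.class, 0, false),
            new FieldInfo("addedv2", String.class, 2, false));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: [flag, x, y, added, addedv2]
    }
}
```

Because the version is compared first, every field added in a later version sorts after all fields of earlier versions, regardless of its type or name.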

Deserialization:

  • Deserialization parses the data according to the format of the object stream, and the Field order saved in the object stream must be consistent with the FSTFieldInfo order in memory
  • Fields of the same version exist in the object stream but are missing in the in-memory Bean: exceptions may be thrown (there will be backward compatibility problems)
  • The object stream contains a higher-version Field that is not available in the in-memory Bean: works normally (the old version is compatible with the new version)
  • A Field of the same version is missing in the object stream but exists in the in-memory Bean: an exception is thrown
  • The version of the same Field differs between the object stream and the in-memory Bean: an exception is thrown
  • The in-memory Bean adds a Field whose version is no higher than the current maximum version: an exception is thrown

Therefore, the following usage rule can be derived from the code logic above: annotate every newly added field with @Version and set its value to the current maximum version plus one. Deleting fields is not allowed.
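
The version-tag mechanism can be imitated with a toy stream format. The following is a simplified sketch, not FST's actual wire format: all fields are strings, each group of fields with a version greater than 0 is preceded by a one-byte version tag, and a tag of 0 ends the object. The class VersionedReadDemo and this format are assumptions made for the illustration. The read loop follows the same idea as readObjectFields, but iterates instead of recursing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy model of reading version-tagged fields in sorted order.
class VersionedReadDemo {

    // versions[i] is the version of the i-th field; the array is assumed
    // to be sorted ascending, as the field comparator would guarantee.
    static byte[] write(int[] versions, String[] values) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        try {
            int current = 0;
            for (int i = 0; i < versions.length; i++) {
                if (versions[i] > current) {
                    current = versions[i];
                    out.writeByte(current);  // version tag opens a new group
                }
                out.writeUTF(values[i]);
            }
            out.writeByte(0);                // end marker: no further versions
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads the stream into the fields the *reading* class declares.
    static String[] read(byte[] data, int[] versions) {
        String[] result = new String[versions.length];
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            int current = 0;
            for (int i = 0; i < versions.length; i++) {
                if (versions[i] > current) {
                    int next = in.readByte();
                    if (next == 0) {
                        return result;       // old stream: newer fields keep defaults
                    }
                    if (next != versions[i]) {
                        throw new IllegalStateException(
                            "read version tag " + next + " fieldInfo has " + versions[i]);
                    }
                    current = next;
                }
                result[i] = in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stream written by the old class (two version-0 fields) ...
        byte[] oldStream = write(new int[] {0, 0}, new String[] {"x", "y"});
        // ... read by the new class, which added one @Version(1) field.
        String[] fields = read(oldStream, new int[] {0, 0, 1});
        System.out.println(fields[0] + ", " + fields[1] + ", " + fields[2]); // prints: x, y, null
    }
}
```

In this toy, reading the same old stream with an extra unannotated field (versions {0, 0, 0}) fails with an exception, because the reader tries to parse the end marker as field data. This is the reason each added field needs a new, higher version.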

In addition, take a look at the Javadoc of the @Version annotation, which clearly states that it is used for backward compatibility:

package org.nustaq.serialization.annotations;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD})

/**
* support for adding fields without breaking compatibility to old streams. 
* For each release of your app increment the version value. No Version annotation means version=0.
* Note that each added field needs to be annotated.
*
* e.g.
*
* class MyClass implements Serializable {
*
*     // fields on initial release 1.0 
*     int x;
*     String y;
*
*     // fields added with release 1.5
*     @Version(1) String added;
*     @Version(1) String alsoAdded;
*
*     // fields added with release 2.0
*     @Version(2) String addedv2;
*     @Version(2) String alsoAddedv2;
*
* }
*
* If an old class is read, new fields will be set to default values. You can register a VersionConflictListener
* at FSTObjectInput in order to fill in defaults for new fields.
*
* Notes/Limits:
* - Removing fields will break backward compatibility. You can only Add new fields.
* - Can slow down serialization over time (if many versions)
* - does not work for Externalizable or Classes which make use of JDK-special features such as readObject/writeObject
*   (AKA does not work if fst has to fall back to 'compatible mode' for an object).
* - in case you use custom serializers, your custom serializer has to handle versioning
*
*/
public @interface Version {
    byte value();
}

Prepare the test Bean

public class FSTBean implements Serializable {
    /** serialVersionUID */
    private static final long serialVersionUID = -2708653783151699375L;
    private Integer v0int;
    private String v0str;
    // getters, setters and toString() omitted
}

Prepare serialization and deserialization methods

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

// FstSerializer is the serializer wrapper used in this test; its definition is not shown here.
public class FSTSerial {

    // Serializes a sample FSTBean and writes the bytes to the given file.
    private static void serialize(FstSerializer fst, String fileName) {
        try {
            FSTBean fstBean = new FSTBean();
            fstBean.setV0int(1);
            fstBean.setV0str("v0str");
            byte[] v1 = fst.serialize(fstBean);

            FileOutputStream fos = new FileOutputStream(new File(fileName));
            fos.write(v1, 0, v1.length);
            fos.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Reads the bytes back from the given file and deserializes them into an FSTBean.
    private static void deserialize(FstSerializer fst, String fileName) {
        try {
            FileInputStream fis = new FileInputStream(new File(fileName));
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int length = 0;
            while ((length = fis.read(buf)) > 0) {
                baos.write(buf, 0, length);
            }
            fis.close();
            buf = baos.toByteArray();
            FSTBean deserial = fst.deserialize(buf, FSTBean.class);
            System.out.println(deserial);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        FstSerializer fst = new FstSerializer();
        serialize(fst, "byte.bin");
        deserialize(fst, "byte.bin");
    }
}

Posted by JParishy on Fri, 05 Nov 2021 20:57:18 -0700