Dex file structure
File header
typedef struct {
    u1 magic[MAGIC_LENGTH];       /* includes version number, e.g. "dex\n035\0" */
    u4 checksum;                  /* Adler-32 checksum of the rest of the file (everything after this field) */
    u1 signature[kSHA1DigestLen]; /* SHA-1 hash of the rest of the file (everything after this field) */
    u4 fileSize;                  /* length of entire file */
    u4 headerSize;                /* offset to start of next section (0x70) */
    u4 endianTag;                 /* 0x12345678 for a little-endian file */
    u4 linkSize;                  /* link section: size and offset */
    u4 linkOff;
    u4 mapOff;                    /* offset of the map list */
    u4 stringIdsSize;             /* string table: entry count and offset */
    u4 stringIdsOff;
    u4 typeIdsSize;               /* type table: entry count and offset */
    u4 typeIdsOff;
    u4 protoIdsSize;              /* prototype table: entry count and offset */
    u4 protoIdsOff;
    u4 fieldIdsSize;              /* field table: entry count and offset */
    u4 fieldIdsOff;
    u4 methodIdsSize;             /* method (function) table: entry count and offset */
    u4 methodIdsOff;
    u4 classDefsSize;             /* class definition table: entry count and offset */
    u4 classDefsOff;
    u4 dataSize;                  /* data section: size in bytes and offset */
    u4 dataOff;
} DexHeader;
Because DexHeader is a fixed-length structure (0x70 bytes), the simplest way to parse it is to read those bytes directly into the struct.
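As a minimal sketch of reading the header directly, the C code below copies the first 0x70 bytes into a DexHeader and runs a few sanity checks, including the Adler-32 checksum over everything after the checksum field. It assumes a little-endian host and that the whole file is already in memory; `parse_dex_header` and its error codes are hypothetical names, and the SHA-1 signature is not verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t  u1;
typedef uint32_t u4;

#define MAGIC_LENGTH    8
#define kSHA1DigestLen  20
#define ENDIAN_CONSTANT 0x12345678u

typedef struct {
    u1 magic[MAGIC_LENGTH];
    u4 checksum;
    u1 signature[kSHA1DigestLen];
    u4 fileSize, headerSize, endianTag, linkSize, linkOff, mapOff;
    u4 stringIdsSize, stringIdsOff, typeIdsSize, typeIdsOff;
    u4 protoIdsSize, protoIdsOff, fieldIdsSize, fieldIdsOff;
    u4 methodIdsSize, methodIdsOff, classDefsSize, classDefsOff;
    u4 dataSize, dataOff;
} DexHeader;

/* No padding: the in-memory layout matches the 0x70-byte on-disk header. */
static_assert(sizeof(DexHeader) == 0x70, "unexpected DexHeader size");

/* Adler-32 over a byte range (modulus 65521). */
static u4 adler32(const u1 *p, size_t n) {
    u4 a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Copy the header out of buf and run basic sanity checks.
   Returns 0 on success, a negative code on failure.
   Assumes a little-endian host, so the u4 fields need no byte swapping. */
static int parse_dex_header(const u1 *buf, size_t len, DexHeader *out) {
    if (len < sizeof(DexHeader)) return -1;
    memcpy(out, buf, sizeof(DexHeader));
    if (memcmp(out->magic, "dex\n", 4) != 0 || out->magic[7] != '\0') return -2;
    if (out->endianTag != ENDIAN_CONSTANT) return -3;
    if (out->headerSize != sizeof(DexHeader) || out->fileSize != len) return -4;
    /* the checksum covers everything after magic and the checksum field itself */
    if (adler32(buf + 12, len - 12) != out->checksum) return -5;
    return 0;
}
```

Copying with memcpy rather than casting the buffer pointer avoids alignment problems when the file is mapped at an arbitrary address.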
LEB128 encoding
Each LEB128 value consists of 1 to 5 bytes that together represent one 32-bit value. The highest bit of each byte is a flag: it is 0 in the last byte and 1 in every other byte.
The remaining 7 bits of each byte are payload; the first byte holds the lowest 7 bits of the value, the second byte the next 7 bits, and so on. For signed LEB128, the sign is given by the highest payload bit of the last byte, which is sign-extended to 32 bits.
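For example, take the value written as 0x7f80, i.e. the byte 0x80 followed by the byte 0x7f in the file:
01111111 10000000
Parsed as unsigned LEB128, it gives 0x3f80: the first byte read, 0x80, has its flag bit set and carries the payload 0000000; the last byte, 0x7f, has its flag bit clear and carries 1111111. Concatenating the payloads (last byte highest) gives 1111111 0000000 = 0x3f80.
Parsed as signed LEB128, it gives -128: the highest payload bit of the last byte is 1, so the 14-bit value 0x3f80 is negative and must be read as a two's complement number, 0x3f80 - 0x4000 = -128.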
The specific parsing algorithm is in the example code
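A minimal C sketch of both decoders, written from the rules above rather than taken from the example code; `read_uleb128` and `read_sleb128` are hypothetical names. Each advances the read pointer past the encoded bytes and stops after at most 5 bytes. It assumes well-formed input and that converting to int32_t wraps in two's complement, as it does on mainstream compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value at *pp and advance *pp past it. */
static uint32_t read_uleb128(const uint8_t **pp) {
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;  /* append 7 payload bits */
        shift += 7;
    } while ((byte & 0x80) && shift < 35);            /* flag bit set: more bytes */
    *pp = p;
    return result;
}

/* Decode a signed LEB128 value at *pp and advance *pp past it. */
static int32_t read_sleb128(const uint8_t **pp) {
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    /* sign-extend from the highest payload bit of the last byte */
    if (shift < 32 && (byte & 0x40))
        result |= ~0u << shift;
    *pp = p;
    return (int32_t)result;
}
```

Fed the bytes 0x80 0x7f, the unsigned decoder returns 0x3f80 and the signed one returns -128, matching the worked example.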
String table
The string table covers every string used by the dex file, including the strings referenced from code.
The string table itself stores DexStringId entries (offsets); the string contents are stored in the data section.
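typedef struct {
    u4 stringDataOff;      /* file offset of the string_data_item */
} DexStringId;

struct string_data_item {
    u4_uleb128 utf16Size;  //string length in UTF-16 code units (1 to 5 bytes, not a fixed u2)
    u1 data[];             //MUTF-8 encoded string content, terminated by a 0 byte
};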
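A string_data_item begins with the string length encoded as uleb128, which takes 1 to 5 bytes rather than a fixed two. Decoding it gives the length in UTF-16 code units; the MUTF-8 bytes that follow are terminated by a 0 byte, so for non-ASCII strings the byte count can differ from the decoded length.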
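To make the string_ids to string_data_item chain concrete, here is a small C sketch that looks a string up by index. It assumes the whole file is in memory, takes stringIdsOff and stringIdsSize from the header, and does no bounds checking beyond the index; `dex_string_by_idx` and `read_u4` are hypothetical helpers. The returned pointer is the MUTF-8 data, which is also a valid C string because of the trailing 0 byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal uleb128 reader (same rules as in the LEB128 section). */
static uint32_t read_uleb128(const uint8_t **pp) {
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    *pp = p;
    return result;
}

/* Read a little-endian u4 regardless of host byte order. */
static uint32_t read_u4(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the MUTF-8 bytes of string #idx and store its UTF-16 length. */
static const char *dex_string_by_idx(const uint8_t *dex, uint32_t stringIdsOff,
                                     uint32_t stringIdsSize, uint32_t idx,
                                     uint32_t *utf16_len) {
    if (idx >= stringIdsSize) return NULL;
    uint32_t dataOff = read_u4(dex + stringIdsOff + 4u * idx);  /* DexStringId */
    const uint8_t *p = dex + dataOff;                           /* string_data_item */
    uint32_t len = read_uleb128(&p);                            /* utf16Size */
    if (utf16_len) *utf16_len = len;
    return (const char *)p;                                     /* data[] */
}
```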
Type table
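typedef struct {
    u4 descriptorIdx;   /* index into stringIds for the type descriptor string */
} DexTypeId;
The referenced string is a type descriptor, such as I for int or Ljava/lang/String; for java.lang.String.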
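Resolving a type index therefore takes two hops: typeIds gives a string index, and stringIds gives the descriptor. A minimal C sketch, assuming an in-memory file and no bounds checking (`dex_type_descriptor` is a hypothetical name); since only the characters are needed, it skips the uleb128 length by stepping past bytes whose flag bit is set.

```c
#include <stdint.h>

/* Read a little-endian u4 regardless of host byte order. */
static uint32_t read_u4(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* typeIds[typeIdx] -> stringIds[descriptorIdx] -> descriptor characters. */
static const char *dex_type_descriptor(const uint8_t *dex, uint32_t typeIdsOff,
                                       uint32_t stringIdsOff, uint32_t typeIdx) {
    uint32_t descriptorIdx = read_u4(dex + typeIdsOff + 4u * typeIdx);    /* DexTypeId */
    uint32_t dataOff = read_u4(dex + stringIdsOff + 4u * descriptorIdx);  /* DexStringId */
    const uint8_t *p = dex + dataOff;
    while (*p & 0x80) p++;  /* skip uleb128 length bytes that have the flag bit set */
    p++;                    /* skip the last length byte */
    return (const char *)p;
}
```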
Field table
typedef struct {
    u2 classIdx;   /* index into typeIds list for defining class */
    u2 typeIdx;    /* index into typeIds for field type */
    u4 nameIdx;    /* index into stringIds for field name */
} DexFieldId;
A field ID describes a member (instance) or static variable of a class by its defining class, its type, and its name.
Prototype table
typedef struct {
    u4 shortyIdx;      /* index into stringIds for shorty descriptor */
    u4 returnTypeIdx;  /* index into typeIds list for return type */
    u4 parametersOff;  /* file offset to type_list for parameter types */
} DexProtoId;
A proto ID describes a method prototype: its shorty descriptor, its return type, and its parameter type list.
Because a method may take several parameters, parametersOff points to a type list structure:
typedef struct {
    u2 typeIdx;           /* index into typeIds */
} DexTypeItem;

typedef struct {
    u4 size;              /* # of entries in list */
    DexTypeItem list[1];  /* entries */
} DexTypeList;
If parametersOff is 0, the method takes no parameters.
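The sketch below reads the parameter list of one prototype. It assumes the on-disk DexProtoId layout above (12 bytes, parametersOff at byte 8), an in-memory little-endian file, and no bounds checking; `dex_proto_params` is a hypothetical helper that returns 0 for the parametersOff == 0 case.

```c
#include <stdint.h>

/* Little-endian readers, independent of host byte order. */
static uint32_t read_u4(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t read_u2(const uint8_t *p) {
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Store up to max parameter type indices of protoIds[protoIdx] in out
   and return the total parameter count. */
static uint32_t dex_proto_params(const uint8_t *dex, uint32_t protoIdsOff,
                                 uint32_t protoIdx, uint16_t *out, uint32_t max) {
    const uint8_t *proto = dex + protoIdsOff + 12u * protoIdx;  /* sizeof(DexProtoId) == 12 */
    uint32_t parametersOff = read_u4(proto + 8);                /* after shortyIdx, returnTypeIdx */
    if (parametersOff == 0) return 0;                           /* no parameters */
    const uint8_t *list = dex + parametersOff;                  /* DexTypeList */
    uint32_t size = read_u4(list);                              /* DexTypeList.size */
    for (uint32_t i = 0; i < size && i < max; i++)
        out[i] = read_u2(list + 4 + 2 * i);                     /* list[i].typeIdx */
    return size;
}
```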
Function table
typedef struct {
    u2 classIdx;   /* index into typeIds list for defining class */
    u2 protoIdx;   /* index into protoIds for method prototype */
    u4 nameIdx;    /* index into stringIds for method name */
} DexMethodId;
A method ID identifies a method by its defining class, its prototype, and its name.
Class data
typedef struct {
    u4 classIdx;        /* index into typeIds for this class */
    u4 accessFlags;
    u4 superclassIdx;   /* index into typeIds for superclass */
    u4 interfacesOff;   /* file offset to DexTypeList */
    u4 sourceFileIdx;   /* index into stringIds for source file name */
    u4 annotationsOff;  /* file offset to annotations_directory_item */
    u4 classDataOff;    /* file offset to class_data_item */
    u4 staticValuesOff; /* file offset to DexEncodedArray */
} DexClassDef;
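superclassIdx is kDexNoIndex (0xffffffff) only when the class has no superclass, which is the case for java/lang/Object itself. Every other class stores the type index of its superclass here, even when that superclass is java/lang/Object.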
interfacesOff, annotationsOff, classDataOff and staticValuesOff may each be 0, meaning the class has no data of that kind. For example, a marker class that defines no fields or methods may have a classDataOff of 0.
sourceFileIdx may be an invalid index (kDexNoIndex) when the source file name is not recorded:
#define kDexNoIndex 0xffffffff /* not a valid index value */
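A small C sketch of these rules: it reports which optional parts a DexClassDef carries, treating kDexNoIndex as absent for the two index fields and 0 as absent for the four offset fields. The `class_def_parts` function and the HAS_* flags are hypothetical names for illustration.

```c
#include <stdint.h>

#define kDexNoIndex 0xffffffff /* not a valid index value */

typedef uint32_t u4;

typedef struct {
    u4 classIdx, accessFlags, superclassIdx, interfacesOff;
    u4 sourceFileIdx, annotationsOff, classDataOff, staticValuesOff;
} DexClassDef;

enum {
    HAS_SUPERCLASS    = 1 << 0,
    HAS_INTERFACES    = 1 << 1,
    HAS_SOURCE_FILE   = 1 << 2,
    HAS_ANNOTATIONS   = 1 << 3,
    HAS_CLASS_DATA    = 1 << 4,
    HAS_STATIC_VALUES = 1 << 5,
};

/* Return a bitmask of the optional parts present in a class definition. */
static unsigned class_def_parts(const DexClassDef *c) {
    unsigned m = 0;
    if (c->superclassIdx != kDexNoIndex) m |= HAS_SUPERCLASS;  /* absent only for java/lang/Object */
    if (c->interfacesOff != 0)           m |= HAS_INTERFACES;
    if (c->sourceFileIdx != kDexNoIndex) m |= HAS_SOURCE_FILE;
    if (c->annotationsOff != 0)          m |= HAS_ANNOTATIONS;
    if (c->classDataOff != 0)            m |= HAS_CLASS_DATA;
    if (c->staticValuesOff != 0)         m |= HAS_STATIC_VALUES;
    return m;
}
```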
classDataOff is the file offset of the class_data structure, which lists the class's fields and methods:
struct class_data {
    u4_uleb128 staticFieldsSize;
    u4_uleb128 instanceFieldsSize;
    u4_uleb128 directMethodsSize;
    u4_uleb128 virtualMethodsSize;
    DexField  staticFields[staticFieldsSize];
    DexField  instanceFields[instanceFieldsSize];
    DexMethod directMethods[directMethodsSize];
    DexMethod virtualMethods[virtualMethodsSize];
};

//encoded field
typedef struct {
    //origin type is uleb128
    u4 fieldIdx;     /* index into the field table, stored as the difference from the previous entry's index (the first entry stores the index itself) */
    u4 accessFlags;
} DexField;

//encoded method
typedef struct {
    //origin type is uleb128
    u4 methodIdx;    /* index into the method table, difference-encoded like fieldIdx */
    u4 accessFlags;
    u4 codeOff;      /* file offset of the DexCode; 0 for abstract or native methods */
} DexMethod;

typedef struct {
    u2 registersSize;  //number of registers used by this code
    u2 insSize;        //number of words of incoming arguments
    u2 outsSize;       //number of words of outgoing argument space needed for method calls
    u2 triesSize;      //number of try_items
    u4 debugInfoOff;   /* file offset to debug info stream */
    u4 insnsSize;      /* size of the instruction array, in 16-bit code units */
    u2 insns[1];       //bytecode
    //The following appear only when triesSize > 0:
    /* followed by optional u2 padding, present when insnsSize is odd, so that the try items are 4-byte aligned */
    /* followed by try_item[triesSize] */
    /* followed by uleb128 handlersSize */
    /* followed by catch_handler_item[handlersSize] */
    //the try items and catch handlers play the role of the exception table in a class file
} DexCode;
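Because every count and entry in class_data is uleb128-encoded, and fieldIdx/methodIdx are stored as differences, the structure has to be decoded sequentially. A minimal C sketch, assuming well-formed input and caller-supplied arrays large enough for each count; `read_class_data_header`, `read_fields` and `read_methods` are hypothetical names. The running index restarts at 0 for each list.

```c
#include <stdint.h>

static uint32_t read_uleb128(const uint8_t **pp) {
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    *pp = p;
    return result;
}

typedef struct { uint32_t fieldIdx, accessFlags; } DexField;
typedef struct { uint32_t methodIdx, accessFlags, codeOff; } DexMethod;
typedef struct {
    uint32_t staticFieldsSize, instanceFieldsSize, directMethodsSize, virtualMethodsSize;
} ClassDataHeader;

/* Read the four uleb128 counts at the start of class_data. */
static ClassDataHeader read_class_data_header(const uint8_t **pp) {
    ClassDataHeader h;
    h.staticFieldsSize = read_uleb128(pp);
    h.instanceFieldsSize = read_uleb128(pp);
    h.directMethodsSize = read_uleb128(pp);
    h.virtualMethodsSize = read_uleb128(pp);
    return h;
}

/* Decode n encoded fields; returns the position just past them. */
static const uint8_t *read_fields(const uint8_t *p, uint32_t n, DexField *out) {
    uint32_t idx = 0;
    for (uint32_t i = 0; i < n; i++) {
        idx += read_uleb128(&p);                /* field index difference */
        out[i].fieldIdx = idx;
        out[i].accessFlags = read_uleb128(&p);
    }
    return p;
}

/* Decode n encoded methods; returns the position just past them. */
static const uint8_t *read_methods(const uint8_t *p, uint32_t n, DexMethod *out) {
    uint32_t idx = 0;
    for (uint32_t i = 0; i < n; i++) {
        idx += read_uleb128(&p);                /* method index difference */
        out[i].methodIdx = idx;
        out[i].accessFlags = read_uleb128(&p);
        out[i].codeOff = read_uleb128(&p);      /* 0 for abstract/native */
    }
    return p;
}
```

The lists must be read in order (static fields, instance fields, direct methods, virtual methods), since each one starts where the previous one ends.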
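Translating dex bytecode is similar to translating class bytecode, and it is best done by following the specification: each instruction is made of 16-bit code units, the low byte of the first unit is the opcode, and the opcode's format determines how the remaining bits encode the operands.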
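As a taste of decoding by the specification, the sketch below handles three one-unit instructions. The opcode's format (10x, 11x, 11n) says how the high byte of the code unit is split into operands. Only return-void (0x0e), return vAA (0x0f) and const/4 vA, #+B (0x12) are covered; `decode_insn` is a hypothetical name, and every other opcode is reported as unhandled.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one 16-bit code unit (already read as a little-endian u2 from insns)
   into text; returns the snprintf result. */
static int decode_insn(uint16_t unit, char *buf, size_t n) {
    unsigned op = unit & 0xffu;               /* low byte: opcode */
    unsigned hi = unit >> 8;                  /* high byte: operands */
    switch (op) {
    case 0x0e:                                /* format 10x: 00|op */
        return snprintf(buf, n, "return-void");
    case 0x0f:                                /* format 11x: AA|op */
        return snprintf(buf, n, "return v%u", hi);
    case 0x12: {                              /* format 11n: B|A|op */
        unsigned a = hi & 0xfu;               /* destination register */
        int b = (int)(hi >> 4);               /* 4-bit literal ... */
        if (b & 0x8) b -= 16;                 /* ... sign-extended */
        return snprintf(buf, n, "const/4 v%u, #%d", a, b);
    }
    default:
        return snprintf(buf, n, "unhandled op 0x%02x", op);
    }
}
```

A full decoder would use the opcode table to look up each instruction's format and width, since many instructions span two or more code units.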
Overview
What are the advantages of the Android VM using dex bytecode instead of class bytecode?
- A dex file combines multiple class files and merges their separate constant pools into one set of shared tables, which avoids duplicated constants and makes it easier to share them in memory at runtime
- Loading one dex file loads many interdependent classes at once, reducing file I/O
- ARM CPUs have many general-purpose registers. The VM's execution model is register-based rather than stack-based, which maps better onto these registers and can speed up method calls and execution