Dex file parsing

Keywords: C, encoding, Java, Android

Dex file structure

File header

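#include <stdint.h>

typedef uint8_t  u1;
typedef uint16_t u2;
typedef uint32_t u4;

#define MAGIC_LENGTH   8    /* "dex\n" + 3-digit version + '\0', e.g. "dex\n035\0" */
#define kSHA1DigestLen 20

typedef struct {
    u1  magic[MAGIC_LENGTH];       /* includes version number */
    u4  checksum;                  /* adler32 of the rest of the file (everything after this field) */
    u1  signature[kSHA1DigestLen]; /* SHA-1 of the rest of the file (everything after this field) */
    u4  fileSize;           /* length of entire file */
    u4  headerSize;         /* offset to start of next section (0x70) */
    u4  endianTag;          /* 0x12345678 for little-endian files */
    u4  linkSize;
    u4  linkOff;
    u4  mapOff;
    u4  stringIdsSize;      /* string table: entry count and file offset */
    u4  stringIdsOff;
    u4  typeIdsSize;        /* type table: entry count and file offset */
    u4  typeIdsOff;
    u4  protoIdsSize;       /* prototype table: entry count and file offset */
    u4  protoIdsOff;
    u4  fieldIdsSize;       /* field table: entry count and file offset */
    u4  fieldIdsOff;
    u4  methodIdsSize;      /* method table: entry count and file offset */
    u4  methodIdsOff;
    u4  classDefsSize;      /* class definition table: entry count and file offset */
    u4  classDefsOff;
    u4  dataSize;           /* data section: size in bytes and file offset */
    u4  dataOff;
}DexHeader;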

Because DexHeader is a fixed-length structure (0x70 bytes, with no padding between members), the easiest way to parse it is to copy the start of the file directly into the struct.
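A minimal sketch of that direct copy, plus a few sanity checks on magic, headerSize and endianTag. It assumes the whole file is already in memory and that the host is little-endian (so no byte swapping is needed); the function name dex_read_header is hypothetical and not taken from DexParserDemo.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t  u1;
typedef uint32_t u4;

typedef struct {
    u1 magic[8];
    u4 checksum;
    u1 signature[20];
    u4 fileSize, headerSize, endianTag;
    u4 linkSize, linkOff, mapOff;
    u4 stringIdsSize, stringIdsOff;
    u4 typeIdsSize, typeIdsOff;
    u4 protoIdsSize, protoIdsOff;
    u4 fieldIdsSize, fieldIdsOff;
    u4 methodIdsSize, methodIdsOff;
    u4 classDefsSize, classDefsOff;
    u4 dataSize, dataOff;
} DexHeader;

#define DEX_HEADER_SIZE  0x70u
#define DEX_ENDIAN_CONST 0x12345678u

/* Copy the fixed-size header out of a file image and do basic checks.
 * Returns 0 on success, -1 if the buffer does not look like a dex file.
 * Assumes a little-endian host, so the raw bytes are memcpy'd as-is. */
static int dex_read_header(const u1 *buf, size_t len, DexHeader *out)
{
    if (len < DEX_HEADER_SIZE || sizeof(DexHeader) != DEX_HEADER_SIZE)
        return -1;
    memcpy(out, buf, DEX_HEADER_SIZE);
    if (memcmp(out->magic, "dex\n", 4) != 0 || out->magic[7] != '\0')
        return -1;                        /* wrong magic */
    if (out->endianTag != DEX_ENDIAN_CONST || out->headerSize != DEX_HEADER_SIZE)
        return -1;                        /* byte-swapped or unexpected layout */
    return 0;
}
```

Typical use is to read the file with fread into a buffer, call dex_read_header, and then use the *Off/*Size pairs to locate each table in the same buffer.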

Leb128 encoding

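Each LEB128 value consists of 1 to 5 bytes that together represent one 32-bit value. In every byte except the last, the highest bit (the continuation flag) is 1; in the last byte it is 0.

The remaining 7 bits of each byte are payload. The payloads are concatenated starting from the low end: the 7 bits of the first byte are the lowest-order bits, the 7 bits of the second byte come next, and so on. For signed LEB128, the sign is given by the highest payload bit of the last byte, which is sign-extended.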

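For example, take the two bytes 0x80 0x7f, i.e. the little-endian 16-bit value 0x7f80:

01111111 10000000

Parsed as unsigned LEB128 it gives 0x3f80: the first byte (0x80) has flag bit 1 and payload 0000000, the last byte (0x7f) has flag bit 0 and payload 1111111, so the value is 1111111 0000000 in binary.
Parsed as signed LEB128 it gives -128: the highest payload bit of the last byte is 1, so the 14-bit value 0x3f80 is sign-extended as two's complement, i.e. 0x3f80 - 0x4000 = -128.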

The complete decoding algorithm is in the example code (DexParserDemo).
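As a rough illustration of the rules above (not the DexParserDemo code itself), here is a sketch of unsigned and signed LEB128 decoders. The names read_uleb128 and read_sleb128 are hypothetical; each takes a pointer to a cursor and advances it past the encoded value.

```c
#include <stdint.h>

/* Decode an unsigned LEB128 value at *pp and advance *pp past it.
 * DEX limits these values to 32 bits, so at most 5 bytes are read. */
static uint32_t read_uleb128(const uint8_t **pp)
{
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;

    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;  /* append 7 payload bits */
        shift += 7;
    } while ((byte & 0x80) && shift < 35);           /* flag bit 1 = more bytes */

    *pp = p;
    return result;
}

/* Decode a signed LEB128 value: same loop, then sign-extend from the
 * highest payload bit (0x40) of the last byte. */
static int32_t read_sleb128(const uint8_t **pp)
{
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;

    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);

    if (shift < 32 && (byte & 0x40))
        result |= ~(uint32_t)0 << shift;             /* fill the high bits with 1s */

    *pp = p;
    return (int32_t)result;
}
```

With the bytes 0x80 0x7f from the example, read_uleb128 returns 0x3f80 and read_sleb128 returns -128, and both leave the cursor two bytes further on.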

String table

The string table holds every string used in the dex file and its code: class and member names, type descriptors, string literals, and so on.

The string table itself stores only DexStringId entries; the actual string bytes live in the data section.

typedef struct {
    u4 stringDataOff;      /* file offset of the string_data_item */
}DexStringId;

struct string_data_item {
    u4_uleb128 utf16Size; //String length in UTF-16 code units, uleb128-encoded (1 to 5 bytes)
    u1 data[];            //String content in MUTF-8, terminated by a 0 byte
};

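A string_data_item starts with a uleb128 value (1 to 5 bytes, not a fixed two bytes) giving the string length in UTF-16 code units. The MUTF-8 bytes that follow end with a 0 byte, so after skipping the length prefix they can be treated as a C string.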
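A sketch of the lookup chain stringIds[idx] -> stringDataOff -> uleb128 length -> bytes. It assumes the file is mapped at base and the host is little-endian; rd_u4 and dex_string_by_idx are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Read a u4 stored little-endian at base + off (assumes a little-endian host). */
static uint32_t rd_u4(const uint8_t *base, uint32_t off)
{
    uint32_t v;
    memcpy(&v, base + off, sizeof v);   /* memcpy avoids unaligned access */
    return v;
}

/* Decode a uleb128 value and advance *pp past it (at most 5 bytes). */
static uint32_t read_uleb128(const uint8_t **pp)
{
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    *pp = p;
    return result;
}

/* Look up string #idx; stringIdsOff is DexHeader.stringIdsOff.
 * Returns the NUL-terminated MUTF-8 bytes and stores the UTF-16 length. */
static const char *dex_string_by_idx(const uint8_t *base, uint32_t stringIdsOff,
                                     uint32_t idx, uint32_t *utf16Len)
{
    uint32_t dataOff = rd_u4(base, stringIdsOff + idx * 4u); /* DexStringId.stringDataOff */
    const uint8_t *p = base + dataOff;
    *utf16Len = read_uleb128(&p);      /* skip the uleb128 length prefix */
    return (const char *)p;
}
```

For non-ASCII text the MUTF-8 bytes differ from standard UTF-8 (for example, U+0000 is encoded as two bytes), so printing them directly is only exact for ASCII strings.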

Type table

typedef struct {
    u4  descriptorIdx;      /* index into stringIds for the type descriptor, e.g. "Ljava/lang/String;" */
}DexTypeId;
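A type is resolved in two hops: typeIds[typeIdx].descriptorIdx, then the string table. The sketch below assumes a hypothetical pre-decoded string table (strings[i] is string #i, built for example with a lookup like the one above) and a little-endian host; dex_type_descriptor is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Resolve type #typeIdx to its descriptor string.
 * typeIds points at the DexTypeId array (base + DexHeader.typeIdsOff);
 * strings is a hypothetical pre-decoded table: strings[i] is string #i. */
static const char *dex_type_descriptor(const uint8_t *typeIds, uint32_t typeIdsSize,
                                       const char *const *strings, uint32_t typeIdx)
{
    uint32_t descriptorIdx;
    if (typeIdx >= typeIdsSize)
        return NULL;                      /* out of range, e.g. kDexNoIndex */
    memcpy(&descriptorIdx, typeIds + typeIdx * 4u, 4); /* DexTypeId.descriptorIdx */
    return strings[descriptorIdx];
}
```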

Field table

typedef struct {
    u2  classIdx;           /* index into typeIds list for defining class */
    u2  typeIdx;            /* index into typeIds for field type */
    u4  nameIdx;            /* index into stringIds for field name */
}DexFieldId;

A field ID identifies an instance or static member variable of a class by its defining class, its type, and its name.
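To show how the three indices fit together, this sketch formats a DexFieldId the way smali writes field references (Lcls;->name:type). typeNames and strings are hypothetical pre-decoded tables (descriptor of type #i, string #i), and dex_format_field is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint16_t classIdx;   /* defining class (typeIds) */
    uint16_t typeIdx;    /* field type (typeIds) */
    uint32_t nameIdx;    /* field name (stringIds) */
} DexFieldId;

/* Write "Lcls;->name:type" into out; returns snprintf's result. */
static int dex_format_field(char *out, size_t cap, const DexFieldId *f,
                            const char *const *typeNames, const char *const *strings)
{
    return snprintf(out, cap, "%s->%s:%s",
                    typeNames[f->classIdx], strings[f->nameIdx], typeNames[f->typeIdx]);
}
```

For a field `int count` in class Foo this produces `LFoo;->count:I`.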

Prototype table

typedef struct {
    u4  shortyIdx;          /* index into stringIds for shorty descriptor */
    u4  returnTypeIdx;      /* index into typeIds list for return type */
    u4  parametersOff;      /* file offset to type_list for parameter types */
}DexProtoId;

A proto (method prototype) describes a method's return type and parameter types. shortyIdx points to the short-form descriptor, e.g. "VIL" for void f(int, Object).

Because a method can have several parameters, parametersOff points to a DexTypeList:

typedef struct {
    u2  typeIdx;            /* index into typeIds */
}DexTypeItem;


typedef struct {
    u4  size;               /* #of entries in list */
    DexTypeItem list[1];    /* entries */
}DexTypeList;

If parametersOff is 0, the method takes no parameters
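A sketch of reading a prototype's parameter list, including the parametersOff == 0 case. It assumes the file is mapped at base and the host is little-endian; dex_proto_params is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Read the parameter type indices of a prototype.
 * parametersOff is DexProtoId.parametersOff. Writes up to maxOut indices
 * into out and returns the full parameter count (0 if parametersOff is 0). */
static uint32_t dex_proto_params(const uint8_t *base, uint32_t parametersOff,
                                 uint16_t *out, uint32_t maxOut)
{
    uint32_t size, i;
    if (parametersOff == 0)
        return 0;                                        /* no DexTypeList at all */
    memcpy(&size, base + parametersOff, 4);              /* DexTypeList.size */
    for (i = 0; i < size && i < maxOut; i++)
        memcpy(&out[i], base + parametersOff + 4 + i * 2, 2); /* DexTypeItem.typeIdx */
    return size;
}
```

Returning the full count even when out is too small lets the caller detect truncation and retry with a larger buffer.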

Function table

typedef struct {
    u2  classIdx;           /* index into typeIds list for defining class */
    u2  protoIdx;           /* index into protoIds for method prototype */
    u4  nameIdx;            /* index into stringIds for method name */
}DexMethodId;

A method ID identifies a method by its defining class, its prototype, and its name.
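Putting class, proto and name together, this sketch formats a DexMethodId in smali style (Lcls;->name(params)ret). ProtoInfo is a hypothetical already-decoded prototype, and typeNames/strings are hypothetical pre-decoded tables; dex_format_method is likewise a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint16_t classIdx;   /* defining class (typeIds) */
    uint16_t protoIdx;   /* prototype (protoIds) */
    uint32_t nameIdx;    /* method name (stringIds) */
} DexMethodId;

/* Hypothetical decoded prototype: return type plus parameter type indices. */
typedef struct {
    uint32_t returnTypeIdx;
    uint32_t paramCount;
    const uint16_t *params;
} ProtoInfo;

/* Write "Lcls;->name(params)ret" into out, stopping early if it is full. */
static void dex_format_method(char *out, size_t cap, const DexMethodId *m,
                              const ProtoInfo *protos, const char *const *typeNames,
                              const char *const *strings)
{
    const ProtoInfo *pr = &protos[m->protoIdx];
    size_t n;
    uint32_t i;

    if (cap == 0)
        return;
    n = (size_t)snprintf(out, cap, "%s->%s(", typeNames[m->classIdx], strings[m->nameIdx]);
    for (i = 0; i < pr->paramCount && n < cap; i++)
        n += (size_t)snprintf(out + n, cap - n, "%s", typeNames[pr->params[i]]);
    if (n < cap)
        snprintf(out + n, cap - n, ")%s", typeNames[pr->returnTypeIdx]);
}
```

For `int add(int, int)` in class Calc this yields `LCalc;->add(II)I`.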

Class data

typedef struct{
    u4  classIdx;           /* index into typeIds for this class */
    u4  accessFlags;
    u4  superclassIdx;      /* index into typeIds for superclass */
    u4  interfacesOff;      /* file offset to DexTypeList */
    u4  sourceFileIdx;      /* index into stringIds for source file name */
    u4  annotationsOff;     /* file offset to annotations_directory_item */
    u4  classDataOff;       /* file offset to class_data_item */
    u4  staticValuesOff;    /* file offset to DexEncodedArray */
}DexClassDef;

If superclassIdx is kDexNoIndex (NO_INDEX), the class has no superclass; only a root class such as java/lang/Object has this. Index 0 is an ordinary type index, not a special value.

interfacesOff, annotationsOff, classDataOff and staticValuesOff may each be 0, meaning the class has no data of that kind. For example, a marker class that defines no fields or methods may have a classDataOff of 0.

sourceFileIdx may also be kDexNoIndex when the source file name is unknown:

#define kDexNoIndex 0xffffffff          /* not a valid index value */
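A small sketch of applying these NO_INDEX rules when describing a DexClassDef. typeNames and strings are hypothetical pre-decoded tables, and dex_describe_class is a hypothetical name; the output format is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define kDexNoIndex 0xffffffffu          /* not a valid index value */

/* Describe one class definition, treating kDexNoIndex as "absent". */
static int dex_describe_class(char *out, size_t cap,
                              uint32_t classIdx, uint32_t superclassIdx, uint32_t sourceFileIdx,
                              const char *const *typeNames, const char *const *strings)
{
    const char *super = superclassIdx == kDexNoIndex ? "(none)" : typeNames[superclassIdx];
    const char *src = sourceFileIdx == kDexNoIndex ? "(unknown)" : strings[sourceFileIdx];
    return snprintf(out, cap, "%s extends %s, source %s", typeNames[classIdx], super, src);
}
```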

classDataOff is the file offset of the class data (class_data_item), laid out as follows (pseudo-C: every member is variable-length uleb128 data):

struct class_data{
    u4_uleb128 staticFieldsSize;
    u4_uleb128 instanceFieldsSize;
    u4_uleb128 directMethodsSize;
    u4_uleb128 virtualMethodsSize;
    
    DexField staticFields[staticFieldsSize];
    DexField instanceFields[instanceFieldsSize];
    DexMethod directMethods[directMethodsSize];
    DexMethod virtualMethods[virtualMethodsSize];
};

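//encoded_field, shown decoded (in the file both members are uleb128)
typedef struct {
    u4 fieldIdx;    /* index into fieldIds; stored as field_idx_diff, the difference from the previous entry's index (the first entry of each list is absolute) */
    u4 accessFlags;
}DexField;


//encoded_method, shown decoded (in the file all members are uleb128)
typedef struct{
    u4 methodIdx;    /* index into methodIds; stored as method_idx_diff, a difference from the previous entry like fieldIdx */
    u4 accessFlags;
    u4 codeOff;      /* file offset of DexCode; 0 for abstract or native methods */
}DexMethod;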
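A sketch of decoding class data, where the main pitfall is that field and method indices are stored as deltas that restart at each of the four lists. Names such as read_fields and read_methods are hypothetical, and the caller is assumed to have read the four uleb128 counts first.

```c
#include <stdint.h>

/* Decode a uleb128 value and advance *pp past it (at most 5 bytes). */
static uint32_t read_uleb128(const uint8_t **pp)
{
    const uint8_t *p = *pp;
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    *pp = p;
    return result;
}

typedef struct { uint32_t fieldIdx, accessFlags; } DexField;
typedef struct { uint32_t methodIdx, accessFlags, codeOff; } DexMethod;

/* Decode one encoded_field list; idx restarts at 0 for every list and
 * each stored diff is added to the previous index. */
static void read_fields(const uint8_t **pp, uint32_t count, DexField *out)
{
    uint32_t idx = 0, i;
    for (i = 0; i < count; i++) {
        idx += read_uleb128(pp);         /* field_idx_diff */
        out[i].fieldIdx = idx;
        out[i].accessFlags = read_uleb128(pp);
    }
}

/* Same for one encoded_method list (direct or virtual). */
static void read_methods(const uint8_t **pp, uint32_t count, DexMethod *out)
{
    uint32_t idx = 0, i;
    for (i = 0; i < count; i++) {
        idx += read_uleb128(pp);         /* method_idx_diff */
        out[i].methodIdx = idx;
        out[i].accessFlags = read_uleb128(pp);
        out[i].codeOff = read_uleb128(pp);
    }
}
```

The lists are read in order: static fields, instance fields, direct methods, virtual methods, each with its own call so the running index is reset.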

typedef struct {
    u2  registersSize;  /* number of registers used by this code */
    u2  insSize;        /* number of words of incoming arguments */
    u2  outsSize;       /* number of words of outgoing argument space needed for calls */
    u2  triesSize;      /* number of try_items; 0 if there is no try/catch */
    u4  debugInfoOff;   /* file offset to debug info stream */
    u4  insnsSize;      /* size of insns, in 16-bit code units */
    u2  insns[1];       /* bytecode */
    /* The following is present only when triesSize > 0: */
    /* followed by optional u2 padding, present only if insnsSize is odd, so the try_items are 4-byte aligned */
    /* followed by try_item[triesSize]: the try/catch table, the counterpart of the exception table in a class file */
    /* followed by uleb128 handlersSize */
    /* followed by catch_handler_item[handlersSize] */
}DexCode;
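A sketch of locating the try_items after insns, showing where the optional padding comes in. It assumes a little-endian host and that code points at a DexCode in the mapped file; dex_code_tries and dex_read_try are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* try_item as laid out in the file (8 bytes). */
typedef struct {
    uint32_t startAddr;   /* first covered code unit */
    uint16_t insnCount;   /* number of covered 16-bit code units */
    uint16_t handlerOff;  /* byte offset into the encoded_catch_handler_list */
} DexTry;

/* Return a pointer to the first try_item, or NULL when triesSize is 0.
 * The fixed part of DexCode is 16 bytes; insns follow, then 2 bytes of
 * padding if insnsSize is odd. */
static const uint8_t *dex_code_tries(const uint8_t *code, uint16_t *triesSize)
{
    uint16_t tries;
    uint32_t insnsSize, off;

    memcpy(&tries, code + 6, sizeof tries);          /* DexCode.triesSize */
    memcpy(&insnsSize, code + 12, sizeof insnsSize); /* DexCode.insnsSize */
    *triesSize = tries;
    if (tries == 0)
        return NULL;
    off = 16 + insnsSize * 2;
    if (insnsSize & 1)
        off += 2;                        /* padding keeps try_item 4-byte aligned */
    return code + off;
}

/* Copy try_item #i out of the table returned by dex_code_tries. */
static DexTry dex_read_try(const uint8_t *tries, uint16_t i)
{
    DexTry t;
    const uint8_t *p = tries + (size_t)i * 8;
    memcpy(&t.startAddr, p, 4);
    memcpy(&t.insnCount, p + 4, 2);
    memcpy(&t.handlerOff, p + 6, 2);
    return t;
}
```

The encoded catch handler list starts right after the last try_item, at tries + triesSize * 8.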

Disassembling dex bytecode is similar to disassembling class-file bytecode: walk insns and decode each instruction using the opcode and instruction-format tables from the Dalvik bytecode specification.

Overview

What are the advantages of the Android VM using dex bytecode instead of Java class-file bytecode?

  1. A dex file combines multiple class files and merges their constant pools into one shared pool, which removes duplicated constants and makes it easier to share constant memory at runtime
  2. Loading one dex file loads many interdependent classes at once, reducing file I/O
  3. ARM CPUs have many general-purpose registers. The VM is register-based rather than stack-based, which maps well onto them and can speed up argument passing and execution

Code in this article

DexParserDemo

Reference document

Bytecode for the Dalvik VM

Posted by russellbcv on Sat, 02 Nov 2019 11:37:07 -0700