Write a Java virtual machine in Java

Keywords: Java Go

Project link. Stars are welcome.

Preface

This JVM was written by following the book "Write Your Own Java Virtual Machine".
The book uses Go to write a JVM with no JIT, no PGO, and not even a GC, so the result has little practical use. I then rewrote that JVM in Java, and a Java virtual machine written in Java is arguably even less useful.
Leaving the ecosystem aside (rewriting a self-contained local project does not depend on it), Go and Java both favor simple designs, so the rewrite is not troublesome:

  1. The goroutines (stackful coroutines) used in the original project can simply be replaced by threads; there are only a few of them, so no pooling is needed
  2. A channel can be replaced by a buffer plus a Semaphore
  3. The type introspection that Go gets from the type keyword (for example in a type switch) can be achieved in Java with runtime reflection such as getClass().getName()
  4. Java generics also bring some convenience. Memory and I/O are clearly not a concern here
  5. Class library: the Go flag package used for command-line parsing can be replaced by JCommander
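
Item 2 of the list above can be sketched roughly as follows. This is only an illustration, not the project's code: the class name SimpleChannel and its send/receive methods are hypothetical, and it assumes a buffered channel with capacity of at least 1 (an unbuffered Go channel would need a different hand-off).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

/** Hypothetical bounded channel: a buffer guarded by two counting semaphores. */
final class SimpleChannel<T> {
    private final Queue<T> buffer = new ArrayDeque<>();
    private final Semaphore freeSlots;                 // permits = remaining capacity
    private final Semaphore items = new Semaphore(0);  // permits = buffered elements

    SimpleChannel(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        freeSlots = new Semaphore(capacity);
    }

    /** Blocks while the buffer is full, like sending on a full buffered Go channel. */
    void send(T value) throws InterruptedException {
        freeSlots.acquire();
        synchronized (buffer) {
            buffer.add(value);
        }
        items.release();
    }

    /** Blocks while the buffer is empty, like receiving from an empty Go channel. */
    T receive() throws InterruptedException {
        items.acquire();
        T value;
        synchronized (buffer) {
            value = buffer.remove();
        }
        freeSlots.release();
        return value;
    }
}
```

In real code java.util.concurrent.ArrayBlockingQueue already provides this behavior; the sketch only shows why a buffer plus semaphores is enough.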

In short, this JVM involves no real technical difficulty; it is simply a larger demo for learning how a JVM works.

Class file search

The main file is in the classpath directory. JCommander is used to parse the command-line arguments.
JCommander documentation: https://jcommander.org/
Classes are searched for on the class path. The Java class path has three parts: the bootstrap class path, the extension class path and the user class path.
The user class path is specified by the user with a command-line option.
The order of execution is: initialize the class path, then find the class supplied by the user.
The Entry interface represents one class path item. It has four implementations: DirEntry, ZipEntry, CompositeEntry and WildcardEntry. DirEntry represents a class path in directory form, ZipEntry represents a class path in zip or jar form, CompositeEntry represents several paths joined by the path separator, and WildcardEntry represents a path ending in *, which stands for all jar files in that directory.
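
The Entry design described above might look roughly like this. It is a sketch under assumptions, not the project's code: the method name readClass, the factory create, and the convention of returning null for "not found" are all hypothetical, and WildcardEntry is left out for brevity. The class file name is assumed to be a relative path such as java/lang/Object.class.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipFile;

/** One class path item; readClass returns null when the class is not found here. */
interface Entry {
    byte[] readClass(String classFileName) throws IOException;

    /** Picks an implementation from the shape of the path string. */
    static Entry create(String path) {
        if (path.contains(File.pathSeparator)) {
            return new CompositeEntry(path);
        }
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".jar") || lower.endsWith(".zip")) {
            return new ZipEntry(path);
        }
        return new DirEntry(path);
    }
}

/** Class path item that is a directory. */
final class DirEntry implements Entry {
    private final Path dir;

    DirEntry(String path) {
        dir = Path.of(path).toAbsolutePath();
    }

    @Override
    public byte[] readClass(String classFileName) throws IOException {
        Path file = dir.resolve(classFileName);
        return Files.isRegularFile(file) ? Files.readAllBytes(file) : null;
    }
}

/** Class path item that is a zip or jar archive (not java.util.zip.ZipEntry). */
final class ZipEntry implements Entry {
    private final String archive;

    ZipEntry(String path) {
        archive = path;
    }

    @Override
    public byte[] readClass(String classFileName) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            java.util.zip.ZipEntry item = zip.getEntry(classFileName);
            if (item == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(item)) {
                return in.readAllBytes();
            }
        }
    }
}

/** Several items joined by the platform path separator; the first hit wins. */
final class CompositeEntry implements Entry {
    private final List<Entry> entries = new ArrayList<>();

    CompositeEntry(String pathList) {
        for (String p : pathList.split(File.pathSeparator)) {
            if (!p.isEmpty()) {
                entries.add(Entry.create(p));
            }
        }
    }

    @Override
    public byte[] readClass(String classFileName) throws IOException {
        for (Entry e : entries) {
            byte[] data = e.readClass(classFileName);
            if (data != null) {
                return data;
            }
        }
        return null;
    }
}
```

Reopening the archive on every lookup keeps the sketch short; a real implementation would probably cache the open ZipFile.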

Class file parsing

The basic unit of data in a class file is the byte, and multi-byte values are stored in big-endian order.
The key piece is the ClassReader class, which helps with reading the bytes.

import java.nio.ByteBuffer;

/**
 * @author treblez
 * @Description A class that assists in reading data
 */
public class ClassReader {
    private final ByteBuffer buf;
    ClassReader(byte[] data){
        buf = ByteBuffer.allocate(data.length+5);
        buf.put(data);
        // rewind() resets the position to 0 (and discards the mark) so that reads start at the first byte
        buf.rewind();
    }
    
    public byte readUint8() {
        return buf.get();
    }
    
    public char readUint16() {
        byte[] tmp = new byte[2];
        buf.get(tmp,0,2);
        return (char) (((tmp[0] & 0xFF) << 8) | (tmp[1] & 0xFF));
    }
    
    public int readUint32()  {
        byte[] tmp = new byte[4];
        buf.get(tmp,0,4);
        // Mind operator precedence: << binds tighter than &, so each mask is parenthesized
        return  ((tmp[3]&0xff) |((tmp[2]&0xff) << 8) | ((tmp[1]&0xff)  << 16) | ((tmp[0]&0xff) << 24));
    }
    
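    public long readUint64() {
        byte[] tmp = new byte[8];
        buf.get(tmp,0,8);
        // Parenthesize every mask and widen to long before shifting; without this,
        // << binds tighter than & and the low four bytes are combined incorrectly
        return  (((long)(tmp[0] & 0xFF) << 56) | ((long)(tmp[1] & 0xFF) << 48) | ((long)(tmp[2] & 0xFF) << 40)
                | ((long)(tmp[3] & 0xFF) << 32) |
                ((long)(tmp[4] & 0xFF) << 24) | ((long)(tmp[5] & 0xFF) << 16) | ((long)(tmp[6] & 0xFF) << 8) | (tmp[7] & 0xFF));
    }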
    /**
     * Reads a table of uint16 values; the table size is given by the uint16 that precedes it
     */
    public char[] readUint16s() {
        var n = readUint16();
        char[] s = new char[n];
        for(int i=0;i<n;i++){
            s[i] = readUint16();
        }
        return s;
    }

    public byte[] readBytes(int n) {
        byte[] ret = new byte[n];
        buf.get(ret, 0, n);
        return ret;
    }
}
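
As a cross-check on the manual shifting, ByteBuffer uses big-endian byte order by default, so its built-in getters return the same values. The sketch below is not part of the project; the class BigEndianDemo and the helpers readU2 and readU4 are hypothetical names. It widens each value so that the unsigned range fits.

```java
import java.nio.ByteBuffer;

/** Hypothetical cross-check: ByteBuffer's default byte order is big-endian. */
final class BigEndianDemo {
    /** Reads an unsigned 16-bit value as an int in the range 0..65535. */
    static int readU2(ByteBuffer buf) {
        return Short.toUnsignedInt(buf.getShort());
    }

    /** Reads an unsigned 32-bit value as a long in the range 0..2^32-1. */
    static long readU4(ByteBuffer buf) {
        return Integer.toUnsignedLong(buf.getInt());
    }

    public static void main(String[] args) {
        // First 8 bytes of a class file: magic, minor version, major version.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        ByteBuffer buf = ByteBuffer.wrap(header);
        System.out.println(Long.toHexString(readU4(buf))); // cafebabe
        System.out.println(readU2(buf));                   // 0
        System.out.println(readU2(buf));                   // 52
    }
}
```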

The class file is read in the following order:

    void read(ClassReader reader) throws Exception {
        // Verify magic number
        readAndCheckMagic(reader);
        // Verify version number
        readAndCheckVersion(reader);
        // Read constant pool
        constantPool = new ConstantPool().readConstantPool(reader);
        // Class access flags (a bitmask)
        accessFlags = reader.readUint16();
        /*
         * Class and superclass indexes. thisClass must be a valid constant pool index;
         * superClass is 0 only in Object.class and must be a valid index in every other file
         */
        thisClass = reader.readUint16();
        superClass = reader.readUint16();
        // The interface index table gives the names of all interfaces implemented by this class
        interfaces = reader.readUint16s();
        // Field table
        fields = MemberInfo.readMembers(reader, constantPool);
        // Method table
        methods = MemberInfo.readMembers(reader, constantPool);
        // Attribute table
        attributes = AttributeInfo.readAttributes(reader, constantPool);
    }
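
The two header checks called at the top of read are not shown in the post. A minimal sketch of what they could do follows. The class name HeaderCheck, the use of ByteBuffer instead of ClassReader, the exception types, and the accepted version range (major versions 45 through 52, where 52 is Java 8) are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the two header checks; not the project's actual code. */
final class HeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    /** Consumes the first four bytes and rejects anything that is not a class file. */
    static void readAndCheckMagic(ByteBuffer buf) {
        int magic = buf.getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException("java.lang.ClassFormatError: magic!");
        }
    }

    /** Assumption: accept major versions 45 through 52 only. Returns the major version. */
    static int readAndCheckVersion(ByteBuffer buf) {
        int minor = Short.toUnsignedInt(buf.getShort()); // minor_version is stored first
        int major = Short.toUnsignedInt(buf.getShort());
        if (major < 45 || major > 52) {
            throw new UnsupportedOperationException(
                    "java.lang.UnsupportedClassVersionError: " + major + "." + minor);
        }
        return major;
    }
}
```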

The magic number must be 0xCAFEBABE. The class, superclass and interface table are stored as constant pool indexes.
Fields, methods and classes all have access flags, implemented as bitmasks. In a field or method, the access flags are followed by constant pool indexes that give its name and descriptor, and finally by its attribute table.
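
Testing an access flag is a single bitwise AND. The sketch below uses a few class-level flag values from the class file format; the class name AccessFlags and the helper isSet are hypothetical.

```java
/** A few class-level access flags as defined by the class file format. */
final class AccessFlags {
    static final int ACC_PUBLIC    = 0x0001;
    static final int ACC_FINAL     = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT  = 0x0400;

    /** Returns true when the given flag bit is set in the mask. */
    static boolean isSet(int accessFlags, int flag) {
        return (accessFlags & flag) != 0;
    }

    public static void main(String[] args) {
        int flags = 0x0601; // public | interface | abstract
        System.out.println(isSet(flags, ACC_INTERFACE)); // true
        System.out.println(isSet(flags, ACC_FINAL));     // false
    }
}
```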
The constant pool holds a large amount of constant information, including numeric and string constants, class and interface names, field and method names, and so on. The type of each constant is identified by an 8-bit unsigned tag:

    int CONSTANT_CLASS = 7;
    int CONSTANT_FIELDREF = 9;
    int CONSTANT_METHODREF = 10;
    int CONSTANT_INTERFACE_METHODREF = 11;
    int CONSTANT_STRING = 8;
    int CONSTANT_INTEGER = 3;
    int CONSTANT_FLOAT = 4;
    int CONSTANT_LONG = 5;
    int CONSTANT_DOUBLE = 6;
    int CONSTANT_NAME_AND_TYPE = 12;
    int CONSTANT_UTF8 = 1;
    int CONSTANT_METHOD_HANDLE = 15;
    int CONSTANT_METHOD_TYPE = 16;
    int CONSTANT_INVOKE_DYNAMIC = 18;

    /**
     * Read constant information
     *
     * @param reader
     */
    void readInfo(ClassReader reader) throws IOException;

    /**
     * Reads the tag, creates the matching constant object, and then calls readInfo to read its contents.
     * @param reader
     * @param cp
     * @return
     * @throws Exception
     */
    static ConstantInfo readConstantInfo(ClassReader reader, ConstantPool cp) throws Exception {
        var tag = reader.readUint8();
        ConstantInfo ret = switch (tag) {
            case CONSTANT_INTEGER -> new ConstantIntegerInfo();
            case CONSTANT_FLOAT -> new ConstantFloatInfo();
            case CONSTANT_LONG -> new ConstantLongInfo();
            case CONSTANT_DOUBLE -> new ConstantDoubleInfo();
            case CONSTANT_UTF8 -> new ConstantUtf8Info();
            case CONSTANT_STRING -> new ConstantStringInfo(cp);
            case CONSTANT_CLASS -> new ConstantClassInfo(cp);
            case CONSTANT_FIELDREF -> new ConstantFieldRefInfo(cp);
            case CONSTANT_METHODREF -> new ConstantMethodRefInfo(cp);
            case CONSTANT_INTERFACE_METHODREF -> new ConstantInterfaceMethodRefInfo(cp);
            case CONSTANT_NAME_AND_TYPE -> new ConstantNameAndTypeInfo();
            // The following three support the invokedynamic instruction added in Java SE 7:
            // the method referenced by the call site specifier is resolved dynamically at runtime and then executed
            case CONSTANT_METHOD_TYPE -> new ConstantMethodTypeInfo();
            case CONSTANT_METHOD_HANDLE -> new ConstantMethodHandleInfo();
            case CONSTANT_INVOKE_DYNAMIC -> new ConstantInvokeDynamicInfo();
            default -> throw new Exception("java.lang.ClassFormatError: constant pool tag!");
        };
        ret.readInfo(reader);
        return ret;
    }
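
The loop that drives readConstantInfo is not shown in the post. Two details of the class file format matter there: valid constant pool indexes run from 1 to count-1, and a long or double constant occupies two slots. The sketch below illustrates both by recording only the tag of each slot and skipping the entry body. The class ConstantPoolWalk and the method readTags are hypothetical, and it reads from a ByteBuffer rather than the ClassReader above.

```java
import java.nio.ByteBuffer;

/** Hypothetical walker that records only the tag of each constant pool slot. */
final class ConstantPoolWalk {
    /** Returns tags by pool index; index 0 and the slot after a long/double stay 0. */
    static int[] readTags(ByteBuffer buf) {
        int count = Short.toUnsignedInt(buf.getShort());   // constant_pool_count
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {                  // valid indexes are 1..count-1
            int tag = Byte.toUnsignedInt(buf.get());
            tags[i] = tag;
            int bodyLength = switch (tag) {
                case 1 -> Short.toUnsignedInt(buf.getShort()); // Utf8: u2 length, then bytes
                case 3, 4, 9, 10, 11, 12, 18 -> 4;             // int, float, refs, name-and-type, invokedynamic
                case 5, 6 -> 8;                                // long, double
                case 7, 8, 16 -> 2;                            // class, string, method type
                case 15 -> 3;                                  // method handle: u1 kind + u2 index
                default -> throw new IllegalArgumentException(
                        "java.lang.ClassFormatError: constant pool tag!");
            };
            buf.position(buf.position() + bodyLength);     // skip the entry body
            if (tag == 5 || tag == 6) {
                i++;                                       // long and double take two slots
            }
        }
        return tags;
    }
}
```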

The bytecode of a method is stored in the attribute table. Deprecated (use not recommended) and Synthetic (not present in the source code) are marker attributes. SourceFile gives the source file name, ConstantValue gives the value of a constant expression, the Code attribute stores the bytecode and other method information, Exceptions lists the exceptions a method may throw, and LineNumberTable and LocalVariableTable store the line number and local variable information of the method.
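
Every attribute shares the same outer layout in the class file format: a 2-byte constant pool index for its name, a 4-byte length, and then that many bytes of content, which is what lets a reader skip attributes it does not understand. A minimal sketch of that outer loop follows. The names AttributeWalk and RawAttribute are hypothetical, and the constant pool is simplified to a plain String array holding the Utf8 names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the outer layout shared by all attributes. */
final class AttributeWalk {
    /** An attribute name plus its still undecoded content bytes. */
    static final class RawAttribute {
        final String name;
        final byte[] info;

        RawAttribute(String name, byte[] info) {
            this.name = name;
            this.info = info;
        }
    }

    /** utf8Constants stands in for the constant pool: index -> Utf8 string. */
    static List<RawAttribute> readAttributes(ByteBuffer buf, String[] utf8Constants) {
        int count = Short.toUnsignedInt(buf.getShort());       // attributes_count
        List<RawAttribute> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int nameIndex = Short.toUnsignedInt(buf.getShort()); // attribute_name_index
            int length = buf.getInt();                           // attribute_length (assumed < 2 GiB)
            byte[] info = new byte[length];
            buf.get(info);                                       // content, decoded later by name
            result.add(new RawAttribute(utf8Constants[nameIndex], info));
        }
        return result;
    }
}
```

A marker attribute such as Deprecated has length 0, so its info array is empty.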

Posted by 303tech on Tue, 16 Nov 2021 08:38:24 -0800