Understanding the Java class loading mechanism

Keywords: Java Back-end

Most of this article is adapted from the references below, so please go easy on me.

Reference link:
Deep understanding of Java classloader
Java class loading mechanism
Deep understanding of Java class loading

The Class class

These are unorganized notes, kept for the record.

When a Java program runs, the system keeps what is known as run-time type information for every object. This information records the class each object belongs to, and the virtual machine uses it to select the correct method to execute. The class that holds this information is Class. A Class object encapsulates the run-time state of a class or interface: when a class is loaded, a Class object is created for it, and that object holds the type information of the class. Working with a class through its Class object is the principle behind reflection.
The information a Class object encapsulates generally includes the class name, implemented interfaces, parent class, member methods, member variables and annotations, all of which we can inspect and use.
Each instance of Class represents a class or interface in the running program.
Class has no public constructor, so an instance cannot be created with new. Class objects are created by the JVM.
In the JVM, each class has exactly one Class object, and a class is uniquely identified by its fully qualified name together with its class loader. When a class is needed, the class loader first checks whether the class has already been loaded, using the findLoadedClass method. If it has not, the loader hands the request to its parent loader, which in turn passes it further up. When a loader whose parent is null is reached, the request is handed to the startup (bootstrap) class loader. If the startup class loader cannot load the class, the request comes back down the chain in reverse order until some loader succeeds. This is Java's parent delegation mechanism, which we will look at from the code level below. A class loader loads classes through its loadClass method. At the bottom of loadClass, a method called findClass is called to read the class file into memory, and the defineClass method then converts the bytecode into a Class object.
We can obtain a Class object in the three ways shown below:

package com.armandhe.javabase;

public class ClassLoaderTest1 {
    public static void main(String[] args)  throws Exception{
        ClassLoaderTest1 classLoaderTest1 = new ClassLoaderTest1();
        Class<? extends ClassLoaderTest1> aClass = classLoaderTest1.getClass();
        Class<ClassLoaderTest1> classLoaderTest1Class = ClassLoaderTest1.class;
        Class<?> aClass1 = Class.forName("com.armandhe.javabase.ClassLoaderTest1");
        System.out.println(aClass1 == classLoaderTest1Class);
        System.out.println(aClass == aClass1);
    }
}


We can see that the Class objects obtained in these three ways are identical, which confirms that a class has only one Class object. Their class loaders are also the same:

Later, we will look at the effect of using different class loaders.
Now let's look at another case:

Here we load two classes with different names. Their Class instances are of course not equal, even though their class loaders are the same. According to the theory above, as soon as either the class loader or the fully qualified name differs, the Class instances of two classes are not equal.
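The same comparison can be sketched roughly as follows. The nested classes First and Second are made-up names for illustration, not the classes from the original run.

```java
// Hypothetical demo: two differently named classes defined by the same loader.
class TwoClassesDemo {
    static class First {}
    static class Second {}

    // The two Class instances are distinct objects.
    static boolean sameClassObject() {
        Class<?> a = First.class;
        Class<?> b = Second.class;
        return a == b;
    }

    // Both classes were defined by the same class loader.
    static boolean sameLoader() {
        return First.class.getClassLoader() == Second.class.getClassLoader();
    }

    public static void main(String[] args) {
        System.out.println("Same Class object: " + sameClassObject());
        System.out.println("Same class loader: " + sameLoader());
    }
}
```

The Class objects are assigned to Class<?> variables first because the compiler rejects a direct == between Class<First> and Class<Second>.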
The relationship between class loader, fully qualified name and Class instance can be described with the following figure:

Java reflection mechanism

As mentioned above, when loading a class, the JVM first creates a Class instance for it, which stores all the fields, methods and other information related to the class. Objects of the class can be created through this instance. In Java, we can also get hold of the Class instance ourselves and create objects from it at run time, instead of only writing new in the source code. Java implements the reflection mechanism on top of this feature.
Reflection means that, at run time, we can find out all the properties and methods of any class, call any method or access any property of any object, and change its attributes.
I won't go into the details of Java reflection. The main methods used are the ones for obtaining fields, methods and constructors.
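As a minimal sketch of those methods, the following obtains a constructor, a method and a private field through the Class object. The Person class and its members are invented for this example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class ReflectionSketch {
    // Hypothetical class used only as a reflection target.
    static class Person {
        private String name;

        public Person(String name) {
            this.name = name;
        }

        public String greet() {
            return "Hello, " + name;
        }
    }

    static String demo() {
        try {
            Class<?> clazz = Person.class;

            // Obtain the constructor and create an object without writing "new Person".
            Constructor<?> ctor = clazz.getConstructor(String.class);
            Object person = ctor.newInstance("Alice");

            // Obtain a method and call it.
            Method greet = clazz.getMethod("greet");
            String before = (String) greet.invoke(person);

            // Obtain a private field and change its value.
            Field name = clazz.getDeclaredField("name");
            name.setAccessible(true);
            name.set(person, "Bob");
            String after = (String) greet.invoke(person);

            return before + " / " + after;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```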

Java class loading

Java class loading generally goes through five stages: loading -> verification -> preparation -> resolution (also translated as parsing) -> initialization. The complete life cycle of a class also includes using and unloading.

  • Loading: the first stage of the class loading process. The bytecode file of the class is located by its fully qualified name, and a Class object is created from that bytecode.

  • Verification: the purpose is to ensure that the information in the byte stream of the class file meets the requirements of the current virtual machine and will not endanger its security. It mainly consists of four checks: file format verification, metadata verification, bytecode verification and symbolic reference verification.

  • Preparation: allocates memory for class variables (that is, fields modified by static) and sets their initial value, which is the zero value of the type (for example, with static int i = 5; only i is set to 0 here, and the value 5 is assigned during initialization). Fields declared static final with a compile-time constant value are the exception: the constant is recorded in the class file at compile time, so the field receives its real value directly at this stage. Note that no memory is allocated for instance variables here. Class variables are allocated in the method area, while instance variables are allocated on the Java heap along with the object.

  • Resolution (parsing): mainly the process of replacing the symbolic references in the constant pool with direct references. A symbolic reference is a set of symbols describing the target, which can be any literal, while a direct reference is a pointer straight to the target, a relative offset, or a handle that locates the target indirectly. Resolution covers classes or interfaces, fields, class methods and interface methods (this involves how bytecode refers to variables; for more details, see the book on understanding the Java virtual machine in depth).

  • Initialization: the last stage of class loading. If the class has a superclass that has not been initialized yet, the superclass is initialized first. Then the static initializers and the assignments to static member variables are executed (for example, a static variable that so far only holds its default value is assigned its real value at this stage).

The three stages after loading (verification, preparation and resolution) are collectively referred to as the linking phase.
The verification phase is described as follows:

  • File format verification: for example, whether the file starts with the magic number 0xCAFEBABE, whether the major and minor version numbers are within the range the current virtual machine can handle, whether the constants in the constant pool are valid, and so on.
    This stage ensures that the input byte stream can be parsed correctly and stored in the method area, and that its format meets the requirements for describing a Java type.
  • Metadata verification: whether the class has a parent class, whether the inheritance chain of the parent class is correct, whether a non-abstract class implements all the methods required by its parent class or interfaces, and whether fields and methods conflict with the parent class.
    This second stage ensures that there is no metadata that violates the Java language specification.
  • Bytecode verification: data flow and control flow analysis is used to determine that the program semantics are legal and logical. For example, it ensures that a jump instruction never jumps to a bytecode instruction outside the method body.
  • Symbolic reference verification: takes place during the resolution phase and ensures that symbolic references can be converted into direct references.
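To make the file format check a little more concrete, here is a rough sketch that reads the first four bytes of a class file and compares them with the magic number. The class MagicNumberCheck and its method are assumptions made for this note; the JVM's real verifier does far more than this.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class MagicNumberCheck {
    // Reads the first four bytes of a class file found through the system class loader.
    static int readMagic(String resourcePath) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resourcePath);
            }
            // DataInputStream reads big-endian values, matching the class file format.
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int magic = readMagic("java/lang/Object.class");
        System.out.println("Magic number: 0x" + Integer.toHexString(magic).toUpperCase());
        System.out.println("Looks like a class file: " + (magic == 0xCAFEBABE));
    }
}
```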

The initialization stage is in fact the process of executing the <clinit>() method. The compiler generates <clinit>() by collecting, in source order, all the assignments to class variables and the statements in the static code blocks of the class. Executing <clinit>() therefore assigns values to the class variables and runs the static code blocks. <clinit>() does not need to call the <clinit>() of the parent class explicitly. That is, we never call it ourselves during initialization, because the JVM guarantees that the parent's <clinit>() has finished before the child's <clinit>() runs. Interfaces are an exception: initializing an interface does not require running the <clinit>() of its parent interfaces, which are initialized only when a static variable defined in them is used.
When a class has no static code blocks and no assignments to class variables, <clinit>() may not exist.
The JVM ensures that the <clinit>() method of a class is executed only once, even with multiple threads. While one thread is executing <clinit>(), other threads block and wait until it has finished.
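The following sketch tries to make two of these points visible: the parent's static initialization runs before the child's, and a class variable holds its default value from the preparation stage until its assignment in <clinit>() runs. All class and field names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderSketch {
    // Records the order in which the static initializers run.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static int value = 5;

        static {
            LOG.add("Parent.<clinit>, value=" + value);
        }
    }

    static class Child extends Parent {
        // peek() runs before counter is assigned, so it sees the default 0
        // that was set in the preparation stage.
        static int early = peek();
        static int counter = 7;

        static {
            LOG.add("Child.<clinit>, early=" + early + ", counter=" + counter);
        }

        static int peek() {
            return counter;
        }
    }

    static List<String> run() {
        new Child(); // "new" is an active reference, so Child gets initialized
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Calling run() a second time adds nothing to the log, since each class is initialized only once.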

Class loading timing

For the initialization phase, the virtual machine specification lists exactly five situations in which a class must be initialized immediately (loading, verification and preparation naturally have to start before this):

  • When a new, getstatic, putstatic or invokestatic bytecode instruction is encountered and the class has not been initialized, its initialization is triggered first. The corresponding scenarios are: instantiating an object with new, reading or setting a static field of a class (except static fields that are modified by final and whose value was put into the constant pool at compile time), and calling a static method of a class.
  • When a reflection call is made on a class that has not been initialized, its initialization is triggered first.
  • When a class is initialized and its parent class has not been initialized yet, the initialization of the parent class is triggered first. (When an interface is initialized, its parent interfaces are not required to have completed initialization.)
  • When the virtual machine starts, the user specifies a main class to execute (the class containing the main() method), and the virtual machine initializes this main class first.
  • When using the dynamic language support of JDK 1.7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic or REF_invokeStatic, and the class corresponding to that method handle has not been initialized, its initialization is triggered first.
    The behavior in the five scenarios above is called an active reference to a class. All other ways of referencing a class do not trigger initialization and are called passive references, for example:
  • Referencing a static field of a parent class through a subclass does not initialize the subclass.
  • Referencing a class through an array definition does not trigger the initialization of that class. MyClass[] cs = new MyClass[10];
  • Constants are stored in the constant pool of the calling class at compile time. In essence the calling class holds no direct reference to the class that defines the constant, so the initialization of the defining class is not triggered.
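The three passive references above can be tried out with a small sketch like this one. The classes Base, Derived, Constants and Element are hypothetical, and each static block only records that the class was initialized.

```java
import java.util.ArrayList;
import java.util.List;

class PassiveRefSketch {
    // Every class that gets initialized adds a line here.
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        static int shared = 1;

        static {
            LOG.add("Base initialized");
        }
    }

    static class Derived extends Base {
        static {
            LOG.add("Derived initialized");
        }
    }

    static class Constants {
        static final String GREETING = "hello"; // compile-time constant

        static {
            LOG.add("Constants initialized");
        }
    }

    static class Element {
        static {
            LOG.add("Element initialized");
        }
    }

    static List<String> run() {
        int viaSubclass = Derived.shared;    // initializes Base only, not Derived
        String constant = Constants.GREETING; // inlined by the compiler, Constants untouched
        Element[] array = new Element[10];   // creating an array does not initialize Element
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```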

Class loaders

Class loading means that a loader uses the fully qualified name of a class to read the binary byte stream of that class into the JVM and then converts it into a Class instance for the target class. The JVM provides three built-in class loaders: the startup (bootstrap) class loader, the extension class loader and the application class loader.

Startup class loader

The startup class loader (also called the bootstrap class loader) loads the classes the JVM itself needs. It is implemented in C++ and has no parent loader. It is responsible for loading the core class library under %JAVA_HOME%/lib, or the jar packages under the path specified by the -Xbootclasspath parameter, into memory. The startup class loader only loads classes in packages starting with java, javax, sun and so on. Suppose I write my own java.lang.String class and put it somewhere outside those directories. Because of the parent delegation mechanism, the request to load java.lang.String reaches the startup class loader, which only looks in the directories above, so my copy is never loaded. This ensures that the core class library of Java cannot be polluted or tampered with, which is the charm of the parent delegation mechanism.

Extension class loader

The extension class loader is implemented by ExtClassLoader, a static inner class of sun.misc.Launcher. It is responsible for loading the class libraries in the %JAVA_HOME%/lib/ext directory, or in the path specified by -Djava.ext.dirs.

System class loader

Also called the application class loader, it is implemented by sun.misc.Launcher$AppClassLoader. It is responsible for loading the class libraries on the class path given by java -classpath or -Djava.class.path, that is, the classpath we use all the time. Developers can use the system class loader directly. It is normally the default class loader of a program and can be obtained through the ClassLoader#getSystemClassLoader() method.
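A quick way to look at these three loaders is to walk up from the system class loader with getParent(). This is only a sketch, and the class name LoaderHierarchySketch is made up. On JDK 8 it should print the AppClassLoader and ExtClassLoader named above, while newer JDKs use different internal class names.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderHierarchySketch {
    // Collects the class names of the loaders from the system class loader upward.
    static List<String> chain() {
        List<String> names = new ArrayList<>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // The startup class loader is not a Java object, so it shows up as null.
        names.add("null (startup class loader)");
        return names;
    }

    // Core classes are loaded by the startup class loader, reported as null.
    static ClassLoader loaderOfString() {
        return String.class.getClassLoader();
    }

    public static void main(String[] args) {
        chain().forEach(System.out::println);
        System.out.println("Loader of java.lang.String: " + loaderOfString());
    }
}
```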

Parent delegation mechanism

The parent delegation model requires that every class loader except the top-level startup class loader has its own parent class loader. Note that this parent-child relationship is not class inheritance. It is implemented by composition, so that a loader can reuse the code of its parent loader. The relationship between class loaders is as follows:

The parent delegation model was introduced in JDK 1.2. It works like this: when a class loader receives a class loading request, it does not try to load the class itself first, but delegates the request to its parent class loader. If the parent class loader has a parent of its own, it delegates further upward, and so on recursively, so the request eventually reaches the top-level startup class loader. If the parent class loader can complete the loading task, it returns successfully. If it cannot, the child loader tries to load the class itself.
The first advantage of parent delegation is that Java classes acquire a hierarchy with priorities along with their class loaders. This hierarchy avoids loading a class twice: once the parent has loaded a class, the child ClassLoader does not need to load it again. The second advantage is security, since the types defined in the Java core API cannot be replaced at will. Suppose a class named java.lang.Integer arrives over the network and the request is passed up to the startup class loader. The startup class loader finds a class with this name in the core Java API and sees that it has already been loaded, so the java.lang.Integer from the network is not loaded and the existing Integer.class is returned directly. This prevents the core API library from being tampered with.
You may wonder what happens if we define a class named java.lang.SingleInterge on the classpath (the name is made up). No such class exists in java.lang. The request is passed to the startup class loader through parent delegation, and since the class is not on the path of any parent loader, it comes back down to the child loaders, so in the end the system class loader would load it. However, this is not allowed: java.lang is a core API package, and defining classes in it is restricted. Forcing the load produces the following exception:

java.lang.SecurityException: Prohibited package name: java.lang
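The exception can be provoked with a sketch like the one below. NaiveLoader is a made-up loader, and the byte array is deliberately empty on the assumption that the package name is checked before the bytes are parsed, which is how the defineClass code in the JDK sources I looked at behaves.

```java
class ProhibitedPackageSketch {
    // Hypothetical loader that tries to define a class inside java.lang.
    static class NaiveLoader extends ClassLoader {
        Class<?> defineInJavaLang() {
            byte[] notARealClass = new byte[0]; // assumption: never parsed
            return defineClass("java.lang.SingleInterge", notARealClass, 0, 0);
        }
    }

    // Returns the exception message, or "defined" if the JVM accepted the class.
    static String tryDefine() {
        try {
            new NaiveLoader().defineInJavaLang();
            return "defined";
        } catch (SecurityException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDefine());
    }
}
```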

Let's understand the parent delegation mechanism through the following figure:

As the figure shows, the class at the top of the hierarchy is ClassLoader, which is an abstract class. All the other class loaders inherit from ClassLoader (except the startup class loader). Here we mainly look at several important methods of ClassLoader.

Parent delegation process

Let's look at the code of AppClassLoader#loadClass:

public Class<?> loadClass(String var1, boolean var2) throws ClassNotFoundException {
            int var3 = var1.lastIndexOf(46);
            if (var3 != -1) {
                SecurityManager var4 = System.getSecurityManager();
                if (var4 != null) {
                    var4.checkPackageAccess(var1.substring(0, var3));
                }
            }

            if (this.ucp.knownToNotExist(var1)) {
                Class var5 = this.findLoadedClass(var1);
                if (var5 != null) {
                    if (var2) {
                        this.resolveClass(var5);
                    }

                    return var5;
                } else {
                    throw new ClassNotFoundException(var1);
                }
            } else {
                return super.loadClass(var1, var2);
            }
        }

The first part obtains the security manager, if one is installed, and checks package access. We won't go into that here. The main logic is the if statement that follows.
If the loader's URL class path already knows that the class cannot be found (knownToNotExist), the loader only uses findLoadedClass to check whether the class has already been loaded, returns it if so, and throws ClassNotFoundException otherwise. In every other case it calls super.loadClass to load the class. Let's follow the super.loadClass method:

protected Class<?> loadClass(String name, boolean resolve)
        throws ClassNotFoundException
    {
        synchronized (getClassLoadingLock(name)) {
            // First, check if the class has already been loaded
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                long t0 = System.nanoTime();
                try {
                    if (parent != null) {
                        c = parent.loadClass(name, false);
                    } else {
                        c = findBootstrapClassOrNull(name);
                    }
                } catch (ClassNotFoundException e) {
                    // ClassNotFoundException thrown if class not found
                    // from the non-null parent class loader
                }

                if (c == null) {
                    // If still not found, then invoke findClass in order
                    // to find the class.
                    long t1 = System.nanoTime();
                    c = findClass(name);

                    // this is the defining class loader; record the stats
                    sun.misc.PerfCounter.getParentDelegationTime().addTime(t1 - t0);
                    sun.misc.PerfCounter.getFindClassTime().addElapsedTimeFrom(t1);
                    sun.misc.PerfCounter.getFindClasses().increment();
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

First a lock is obtained for the class name, so that the same class is not loaded by several threads at once. Then findLoadedClass is called to look in the cache. If the class is not found there, the code checks whether the parent loader is null. If it is not null, the parent's loadClass method is called to load the class. If it is null, the startup class loader is asked to load the class. If the class is still not found, that is c == null, the findClass method is called.
Since the parent of the extension class loader is null, it always asks the startup class loader to load the class first.
So why is it null? Look at the following code:

public ExtClassLoader(File[] var1) throws IOException {
            super(getExtURLs(var1), (ClassLoader)null, Launcher.factory);
            SharedSecrets.getJavaNetAccess().getURLClassPath(this).initLookupCache(this);
        }

The constructor of the parent class is called here. Note that the second parameter here is null. Let's follow:

public URLClassLoader(URL[] urls, ClassLoader parent,
                          URLStreamHandlerFactory factory) {
        super(parent);
        // this is to make the stack depth consistent with 1.1
        SecurityManager security = System.getSecurityManager();
        if (security != null) {
            security.checkCreateClassLoader();
        }
        acc = AccessController.getContext();
        ucp = new URLClassPath(urls, factory, acc);
    }

Here, we see that we continue to call the constructor of the parent class, and continue to follow:

protected SecureClassLoader(ClassLoader parent) {
        super(parent);
        // this is to make the stack depth consistent with 1.1
        SecurityManager security = System.getSecurityManager();
        if (security != null) {
            security.checkCreateClassLoader();
        }
        initialized = true;
    }

Continue with:

protected ClassLoader(ClassLoader parent) {
        this(checkCreateClassLoader(), parent);
    }

Continue:

private ClassLoader(Void unused, ClassLoader parent) {
        this.parent = parent;
        if (ParallelLoaders.isRegistered(this.getClass())) {
            parallelLockMap = new ConcurrentHashMap<>();
            package2certs = new ConcurrentHashMap<>();
            assertionLock = new Object();
        } else {
            // no finer-grained lock; lock on the classloader instance
            parallelLockMap = null;
            package2certs = new Hashtable<>();
            assertionLock = this;
        }
    }

Here we can see that this.parent is assigned the parameter parent, whose value is null, so the parent loader of the extension class loader is null.

When we follow the findClass method, we find:

The default implementation simply throws ClassNotFoundException, so to load a class from an arbitrary path we need to override the findClass method.
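The order described above can be observed with a small sketch. RecordingLoader is a hypothetical loader that only records when findClass is reached and otherwise keeps the default behavior, and com.example.NoSuchClass is a name assumed not to exist anywhere.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationSketch {
    // Hypothetical loader that notes every call to findClass.
    static class RecordingLoader extends ClassLoader {
        final List<String> findClassCalls = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls.add(name);
            return super.findClass(name); // the default just throws ClassNotFoundException
        }
    }

    static List<String> run() {
        RecordingLoader loader = new RecordingLoader(ClassLoader.getSystemClassLoader());
        List<String> log = new ArrayList<>();
        try {
            // Found by a parent, so findClass of our loader is never reached.
            Class<?> s = loader.loadClass("java.lang.String");
            log.add("String loaded, same as String.class: " + (s == String.class));
            // No parent can find this one, so findClass is the last resort.
            loader.loadClass("com.example.NoSuchClass");
        } catch (ClassNotFoundException e) {
            log.add("not found: " + e.getMessage());
        }
        log.add("findClass calls: " + loader.findClassCalls);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```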
The job of findClass is to read the class file from the file system into memory and then produce a Class object with the defineClass method, whose logic is implemented in the ClassLoader class. Here is one implementation of findClass:

@Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        System.out.println("Looking for class: " + name);
        // Read the bytes of the class file
        byte[] classData = getClassData(name);
        if (classData == null) {
            throw new ClassNotFoundException(name);
        }
        // Turn the bytes into a Class object with defineClass
        return defineClass(name, classData, 0, classData.length);
    }

The getClassData method loads bytecode data from the file system:

private byte[] getClassData(String className) {
        System.out.println("Getting byte data");
        String path = classNameToPath(className);
        // try-with-resources closes the streams even if reading fails
        try (InputStream inputStream = new FileInputStream(new File(path));
             ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
            byte[] bytes = new byte[4096];
            int len;
            while ((len = inputStream.read(bytes)) != -1) {
                byteArrayOutputStream.write(bytes, 0, len);
            }
            return byteArrayOutputStream.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // null tells findClass that the class file could not be read
        return null;
    }

classNameToPath converts the fully qualified class name into the path of the corresponding class file:

private String classNameToPath(String classname){
        return rootDir + File.separatorChar + classname.replace('.', File.separatorChar) + ".class";
    }

That completes a simple implementation of findClass. The complete code is as follows:

package com.armandhe.javabase;

import java.io.*;



class FileClassLoader extends ClassLoader{
    private String rootDir;

    public FileClassLoader(String rootDir) {
        this.rootDir = rootDir;
    }


    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        System.out.println("Looking for class: " + name);
        // Read the bytes of the class file
        byte[] classData = getClassData(name);
        if (classData == null) {
            throw new ClassNotFoundException(name);
        }
        // Turn the bytes into a Class object with defineClass
        return defineClass(name, classData, 0, classData.length);
    }

    private String classNameToPath(String classname){
        return rootDir + File.separatorChar + classname.replace('.', File.separatorChar) + ".class";
    }

    private byte[] getClassData(String className) {
        System.out.println("Getting byte data");
        String path = classNameToPath(className);
        // try-with-resources closes the streams even if reading fails
        try (InputStream inputStream = new FileInputStream(new File(path));
             ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
            byte[] bytes = new byte[4096];
            int len;
            while ((len = inputStream.read(bytes)) != -1) {
                byteArrayOutputStream.write(bytes, 0, len);
            }
            return byteArrayOutputStream.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // null tells findClass that the class file could not be read
        return null;
    }
}

public class ClassLoaderTest {
    public static void main(String[] args){
        String rootDir = "E:\\desktop\\java\\javatest\\target\\classes";
        FileClassLoader fileClassLoader = new FileClassLoader(rootDir);
//        fileClassLoader.getClassData("test");
        System.out.println("Parent loader of custom class loader:"+fileClassLoader.getParent());
        System.out.println("System class loader:"+ClassLoader.getSystemClassLoader());
        System.out.println("Parent loader of system class loader:"+ClassLoader.getSystemClassLoader().getParent());
        System.out.println("Parent loader of extension class loader:"+ClassLoader.getSystemClassLoader().getParent().getParent());
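        try {
            // findClass is called directly to skip parent delegation; this compiles only
            // because ClassLoaderTest is in the same package as FileClassLoader
            Class<?> aClass = fileClassLoader.findClass("com.armandhe.javabase.ClassLoaderDemo");
            System.out.println(aClass.getDeclaredConstructor().newInstance());
            // invoke(null) means ClassLoaderDemo.test() is expected to be static
            aClass.getMethod("test").invoke(null);
        } catch (ClassNotFoundException e) {
            System.out.println("Class not found!");
        } catch (ReflectiveOperationException e) {
            // The class was found, but creating it or calling test() failed
            e.printStackTrace();
        }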
    }
}

Operation results:

As mentioned above, the loadClass method has a second parameter, var2 (named resolve in the ClassLoader source). If it is true, the resolveClass method is called, which links the class once its Class object has been created. As we said earlier, the linking phase verifies the bytecode, allocates memory for class variables and sets their initial values, and converts the symbolic references in the bytecode file into direct references.
There is also the thread context class loader, which I can't cover properly here. Go and read the original articles.
JDBC uses the thread context class loader. Generally speaking, the thread context class loader is the system class loader, but some middleware, such as Tomcat, sets it to a custom class loader.
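As a small sketch of what middleware does, the context class loader of a thread can be replaced with setContextClassLoader and read back with getContextClassLoader. MarkerLoader is a made-up loader that adds nothing of its own; real containers install much more elaborate loaders.

```java
class ContextLoaderSketch {
    // Hypothetical custom loader; it only serves as a recognizable object.
    static class MarkerLoader extends ClassLoader {
        MarkerLoader(ClassLoader parent) {
            super(parent);
        }
    }

    // Starts a worker thread with a custom context class loader and
    // reports whether code running in that thread sees it.
    static boolean workerSeesCustomLoader() {
        ClassLoader custom = new MarkerLoader(ClassLoader.getSystemClassLoader());
        ClassLoader[] seen = new ClassLoader[1];

        Thread worker = new Thread(
                () -> seen[0] = Thread.currentThread().getContextClassLoader());
        worker.setContextClassLoader(custom);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] == custom;
    }

    public static void main(String[] args) {
        System.out.println("Context loader of main thread: "
                + Thread.currentThread().getContextClassLoader());
        System.out.println("Worker saw custom loader: " + workerSeesCustomLoader());
    }
}
```

Library code that needs classes from the application, as JDBC does for drivers, can ask the current thread for this loader instead of using the loader that defined the library itself.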

Posted by GaryC on Wed, 03 Nov 2021 10:56:50 -0700