iOS underlying principle 28: LLVM compilation process and Clang plug-in development

Keywords: C iOS

This article explains the LLVM compilation process and how to develop a Clang plug-in



LLVM is a framework for building compilers, written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time. It remains open to developers and compatible with existing scripts

Traditional compiler design

A traditional compiler is a pipeline: Source Code → Frontend → Optimizer → Backend (Code Generator) → Machine Code, as shown in the figure below

[Figure: Traditional compiler design]

Compiler architecture of iOS

For Objective-C (OC), C, and C++, the compiler front end is Clang; Swift uses the Swift compiler's own front end; in both cases the back end is LLVM

Module description

  • Frontend: the compiler front end parses the source code (the compilation stage). It performs lexical analysis, syntax analysis, and semantic analysis, checks the source code for errors, and builds an Abstract Syntax Tree (AST). The LLVM front end also generates the intermediate representation (IR). LLVM itself can then be seen as the optimizer plus back end: it receives IR, optimizes it, and passes IR to the back end, which translates it into the target instruction set

  • Optimizer: performs various optimizations that improve the code's running time, such as eliminating redundant computations

  • Backend (Code Generator): maps the code to the target instruction set, generates machine code, and performs machine-specific optimizations

Design of LLVM

The most important aspect of LLVM's design is its use of a common code representation (IR) to represent code inside the compiler. Thanks to this, LLVM can have an independently written front end for any programming language and an independently written back end for any hardware architecture, as shown below

Put simply, LLVM separates the front end from the back end: changes on one side do not affect the other
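To make the decoupling concrete, here is a minimal C++ sketch that uses a hypothetical toy IR in place of real LLVM IR: every front end lowers its own syntax into the same ToyIR struct, and every back end only reads ToyIR. The class names (CFrontend, LispFrontend, X86Backend, ARM64Backend) and the output strings are made up for illustration; they are not LLVM classes.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR shared by every stage (stands in for LLVM IR).
struct ToyIR {
    std::string op;    // operation, e.g. "add"
    std::string lhs;   // first operand
    std::string rhs;   // second operand
};

// A front end knows one source language and produces ToyIR.
struct Frontend {
    virtual ~Frontend() = default;
    virtual ToyIR lower(const std::string& source) const = 0;
};

// A back end knows one target and consumes ToyIR.
struct Backend {
    virtual ~Backend() = default;
    virtual std::string emit(const ToyIR& ir) const = 0;
};

// Infix syntax "a + b" (plays the role of Clang); assumes well-formed input.
struct CFrontend : Frontend {
    ToyIR lower(const std::string& source) const override {
        size_t pos = source.find(" + ");
        return {"add", source.substr(0, pos), source.substr(pos + 3)};
    }
};

// Prefix syntax "(+ a b)" (plays the role of another language's front end).
struct LispFrontend : Frontend {
    ToyIR lower(const std::string& source) const override {
        std::istringstream in(source.substr(1, source.size() - 2));  // strip "(" and ")"
        std::string op, lhs, rhs;
        in >> op >> lhs >> rhs;
        return {op == "+" ? "add" : op, lhs, rhs};
    }
};

struct X86Backend : Backend {
    std::string emit(const ToyIR& ir) const override {
        return "x86_64: " + ir.op + " " + ir.lhs + ", " + ir.rhs;
    }
};

struct ARM64Backend : Backend {
    std::string emit(const ToyIR& ir) const override {
        return "arm64: " + ir.op + " " + ir.lhs + ", " + ir.rhs;
    }
};

// Every (front end, back end) pair works because the two only meet at ToyIR.
std::vector<std::string> compileAll(
        const std::vector<std::pair<const Frontend*, std::string>>& inputs,
        const std::vector<const Backend*>& backends) {
    std::vector<std::string> out;
    for (const auto& [frontend, source] : inputs)
        for (const Backend* backend : backends)
            out.push_back(backend->emit(frontend->lower(source)));
    return out;
}
```

With N front ends and M back ends you write N + M components yet get N × M language/target combinations; adding a new back end does not require touching any front end.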

About Clang

Clang is a subproject of the LLVM project: a lightweight compiler built on the LLVM architecture. It was originally designed to replace GCC and provide faster compilation. It handles C, C++, and Objective-C and serves as the compiler front end within the overall LLVM architecture. For developers, studying Clang brings many benefits

LLVM compilation process

  • Create a new file, main.m, and write the following code

int test(int a,int b){
    return a + b + 3;
}

int main(int argc, const char * argv[]) {
    int a = test(1, 2);
    return 0;
}

  • Print the compilation phases of the source code with the following command

clang -ccc-print-phases main.m
//************Compilation process************
//0 - input: locate the source file
+- 0: input, "main.m", objective-c

//1 - preprocessing: macro replacement and header-file import
+- 1: preprocessor, {0}, objective-c-cpp-output

//2 - compilation: lexical analysis, syntax analysis, syntax checking, and finally IR generation
+- 2: compiler, {1}, ir

//3 - back end: LLVM optimizes the code pass by pass; each pass does one job, and the final result is assembly code
+- 3: backend, {2}, assembler

//4 - assembly: turn the assembly code into an object file
+- 4: assembler, {3}, object

//5 - linking: link the required dynamic and static libraries to produce an executable (image = executable image)
+- 5: linker, {4}, image

//6 - binding: produce the executable for each target architecture
6: bind-arch, "x86_64", {5}, image

[Figure: LLVM compilation process]

The phases above are explained one by one below. Phase 0 is just the input, that is, locating the source file, so it needs no further explanation

1, Preprocessing phase

This stage mainly handles macro replacement and header-file import. Run either of the following commands to see the imported header files and the replaced macros

//Directly view the replacement results on the terminal
clang -E main.m

//Generate the corresponding file and view the replaced source code
clang -E main.m > main2.m

Note that:

  • typedef, which gives a data type an alias, is not replaced during preprocessing

  • #define macros are replaced during preprocessing, so they are often used for code obfuscation to improve app security. The idea is to define macros that replace the names of the app's core classes and core methods with names that look like system names; the substitution happens during preprocessing, so the compiled code only contains the disguised names
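The difference is easy to observe with the preprocessor's stringification operator. In the minimal C++ sketch below, the class name CJLPayManager, its disguise UIKitCacheHelper, and the typedef CJLCount are all hypothetical. STR() reports the name that remains after preprocessing: the #define name has already been swapped out, while the typedef name is untouched because typedef is handled later by the compiler proper.

```cpp
#include <cassert>
#include <string>

// Two-step stringification: STR(x) first macro-expands x, then turns it into a string literal.
#define STR_(x) #x
#define STR(x) STR_(x)

// Hypothetical obfuscation macro: the readable class name is replaced during preprocessing.
#define CJLPayManager UIKitCacheHelper

class CJLPayManager {            // after preprocessing this reads "class UIKitCacheHelper"
public:
    int pay(int amount) { return amount; }
};

// typedef creates a type alias; the preprocessor does not touch it.
typedef int CJLCount;

// The name each identifier has once preprocessing is finished.
std::string macroName() { return STR(CJLPayManager); }   // "UIKitCacheHelper"
std::string typedefName() { return STR(CJLCount); }      // "CJLCount"
```

Running clang -E on such a file should show the same thing: CJLPayManager no longer appears in the output, while CJLCount does.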

2, Compilation phase

The compilation phase performs lexical and syntax analysis, checks the code, and then generates the intermediate code (IR)

1. Lexical analysis

After preprocessing comes lexical analysis, which cuts the code into tokens such as parentheses, braces, equals signs, and strings

  • You can view it through the following command

clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
  • If a header file cannot be found, specify the SDK path

clang -isysroot <your SDK path> -fmodules -fsyntax-only -Xclang -dump-tokens main.m

 clang -isysroot /Applications/ -fmodules -fsyntax-only -Xclang -dump-tokens main.m
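As a rough picture of what this step does, here is a minimal hand-written lexer sketch in C++. It only knows the handful of token kinds needed for the test function above; the kind names are modelled on the ones -dump-tokens prints (identifier, l_paren, plus, numeric_constant, semi, and so on), but Clang's real lexer is far more complete.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// A token: its kind (named like Clang's token kinds) and its spelling.
struct Token {
    std::string kind;
    std::string text;
};

// Minimal lexer sketch: identifiers/keywords, integer literals and a few punctuators.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }             // whitespace only separates tokens
        if (std::isalpha(c) || c == '_') {                  // identifier or keyword
            size_t start = i;
            while (i < src.size() && (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            std::string word = src.substr(start, i - start);
            bool keyword = (word == "int" || word == "return");
            tokens.push_back({keyword ? word : "identifier", word});
            continue;
        }
        if (std::isdigit(c)) {                              // integer literal
            size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({"numeric_constant", src.substr(start, i - start)});
            continue;
        }
        std::string kind;                                   // single-character punctuators
        switch (c) {
            case '(': kind = "l_paren"; break;
            case ')': kind = "r_paren"; break;
            case '{': kind = "l_brace"; break;
            case '}': kind = "r_brace"; break;
            case ',': kind = "comma"; break;
            case '+': kind = "plus"; break;
            case '=': kind = "equal"; break;
            case ';': kind = "semi"; break;
            default:  kind = "unknown"; break;
        }
        tokens.push_back({kind, std::string(1, src[i])});
        ++i;
    }
    return tokens;
}
```

For example, lex("return a + b + 3;") yields seven tokens: return, identifier a, plus, identifier b, plus, numeric_constant 3, semi.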

2. Syntax analysis

Once lexical analysis is complete, syntax analysis verifies that the syntax is correct. Building on the token stream, it combines words into phrases such as programs, statements, and expressions, and assembles all the nodes into an Abstract Syntax Tree (AST). In this way the parser determines whether the program is structurally correct

  • You can view the results of parsing through the following command

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
  • If an imported header file cannot be found, specify the SDK path

clang -isysroot <your SDK path> -fmodules -fsyntax-only -Xclang -ast-dump main.m

 clang -isysroot /Applications/ -fmodules -fsyntax-only -Xclang -ast-dump main.m

The following is the result of syntax analysis

The main node types in the output are:

  • FunctionDecl: a function declaration

  • ParmVarDecl: a function parameter

  • CallExpr: a function call

  • BinaryOperator: a binary operator, such as +
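To see why a + b + 3 turns into two nested BinaryOperator nodes, here is a minimal C++ sketch of a parser that handles sums only. It builds the tree left-associatively, ((a + b) + 3), which is the shape Clang's AST has for this expression. The node kinds reuse Clang's names, but the dump is a simplified one-line form; the real -ast-dump output also shows types, source locations, and ImplicitCastExpr nodes.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <sstream>
#include <string>

// Simplified AST node: a leaf (variable or integer) or a binary '+' operator.
struct Expr {
    std::string kind;                 // "DeclRefExpr", "IntegerLiteral" or "BinaryOperator"
    std::string value;                // variable name, literal, or operator spelling
    std::unique_ptr<Expr> lhs, rhs;   // set only for BinaryOperator
};

// Build a leaf node from one operand token.
std::unique_ptr<Expr> makeLeaf(const std::string& tok) {
    auto e = std::make_unique<Expr>();
    e->kind = std::isdigit(static_cast<unsigned char>(tok[0])) ? "IntegerLiteral" : "DeclRefExpr";
    e->value = tok;
    return e;
}

// Parse "x + y + z" (tokens separated by spaces) left-associatively, as C does.
std::unique_ptr<Expr> parseSum(const std::string& src) {
    std::istringstream in(src);
    std::string tok, op;
    in >> tok;
    std::unique_ptr<Expr> tree = makeLeaf(tok);
    while (in >> op >> tok) {             // expects "+ operand" pairs
        auto node = std::make_unique<Expr>();
        node->kind = "BinaryOperator";
        node->value = op;
        node->lhs = std::move(tree);      // everything parsed so far becomes the left child
        node->rhs = makeLeaf(tok);
        tree = std::move(node);
    }
    return tree;
}

// Print the tree in a compact, one-line form.
std::string dump(const Expr& e) {
    if (e.kind != "BinaryOperator") return e.kind + " " + e.value;
    return e.kind + " '" + e.value + "'(" + dump(*e.lhs) + ", " + dump(*e.rhs) + ")";
}
```

dump(*parseSum("a + b + 3")) returns BinaryOperator '+'(BinaryOperator '+'(DeclRefExpr a, DeclRefExpr b), IntegerLiteral 3): the outer operator's left child is the inner a + b.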

3. Generate intermediate code IR

Once these steps are complete, code generation (CodeGen) traverses the syntax tree from top to bottom and translates it step by step into LLVM IR

  • You can generate a .ll text file with the following command to view the IR. In this step, Objective-C code is bridged to the runtime, for example property synthesis and ARC processing

clang -S -fobjc-arc -emit-llvm main.m

//The following is the basic syntax of IR
@        global identifier
%        local identifier
alloca   allocate (stack) memory
align    memory alignment
i32      32-bit integer, 4 bytes
store    write to memory
load     read from memory
call     call a function
ret      return
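The top-down translation can be sketched in a few lines of C++. Under simplifying assumptions (only + on i32 values, parameters named %a and %b, a hand-built tree), the sketch below starts at the root, recursively emits code for both operands first, and names every result with a fresh numbered local identifier. The emitted text only imitates the IR syntax listed above; clang at -O0 additionally spills the parameters to the stack with alloca/store/load.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>

// Minimal expression tree: a leaf holds an operand, an inner node means '+'.
struct Node {
    std::string value;                  // "a", "b", "3" ... (empty for '+')
    std::shared_ptr<Node> lhs, rhs;     // set only for '+' nodes
};

std::shared_ptr<Node> makeLeaf(const std::string& v) {
    return std::make_shared<Node>(Node{v, nullptr, nullptr});
}

std::shared_ptr<Node> makeAdd(std::shared_ptr<Node> l, std::shared_ptr<Node> r) {
    return std::make_shared<Node>(Node{"", std::move(l), std::move(r)});
}

// Visits the tree starting at the root; operands are emitted before the node itself.
// Returns the operand that names this node's value and appends instructions to `out`.
std::string emit(const Node& n, std::string& out, int& nextId) {
    if (!n.lhs) {                                        // leaf: constant or parameter
        bool isNumber = !n.value.empty() && std::isdigit(static_cast<unsigned char>(n.value[0]));
        return isNumber ? n.value : "%" + n.value;       // '%' marks a local identifier
    }
    std::string l = emit(*n.lhs, out, nextId);
    std::string r = emit(*n.rhs, out, nextId);
    std::string name = "%" + std::to_string(nextId++);   // fresh numbered local
    out += name + " = add nsw i32 " + l + ", " + r + "\n";
    return name;
}

// Generates an IR-like function body that returns the value of `root`.
std::string codegen(const Node& root) {
    std::string out;
    int nextId = 1;
    std::string result = emit(root, out, nextId);
    out += "ret i32 " + result + "\n";
    return out;
}
```

For the tree of a + b + 3 this produces %1 = add nsw i32 %a, %b, then %2 = add nsw i32 %1, 3, then ret i32 %2.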

The following is the generated intermediate code (.ll file)

The parameters of the test function are explained as follows

[Figure: Compilation phase-4]

  • IR can also be optimized. In Xcode this is usually configured under Target > Build Settings > Optimization Level. LLVM's optimization levels are -O0, -O1, -O2, -O3, and -Os (the first character is a capital letter O). The following command generates optimized IR

clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

This is the optimized intermediate code

  • Starting with Xcode 7, bitcode can be enabled, and Apple further optimizes the code to produce .bc intermediate code. The following command generates .bc code from the optimized IR

clang -emit-llvm -c main.ll -o main.bc

3, Back end

In the back end, LLVM optimizes the code pass by pass: each pass does one job, and the final result is assembly code
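A minimal sketch of the pass-by-pass idea, assuming a toy IR in which every instruction is dst = lhs + rhs: a tiny "pass manager" runs a list of passes in order, here a constant-folding pass and a dead-code-elimination pass. The IR, both passes, and the function names are simplified stand-ins written for illustration, not LLVM's PassManager API. The input in the usage below mimics test(1, 2) inlined into a function that returns its result, plus one unused addition.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy IR instruction: dst = lhs + rhs (operands are names like "%x" or integer literals).
struct Inst {
    std::string dst;
    std::string lhs;
    std::string rhs;
};

// Toy function: a straight-line body plus the operand it returns.
struct Function {
    std::vector<Inst> body;
    std::string ret;
};

// A pass transforms a function in place and does exactly one job.
using Pass = std::function<void(Function&)>;

bool isConst(const std::string& s) {
    return !s.empty() && std::isdigit(static_cast<unsigned char>(s[0]));
}

// Pass 1: fold additions of two constants and substitute the result into later uses.
void constantFold(Function& f) {
    std::map<std::string, std::string> known;   // value name -> folded constant
    auto resolve = [&known](const std::string& s) {
        auto it = known.find(s);
        return it == known.end() ? s : it->second;
    };
    std::vector<Inst> kept;
    for (Inst inst : f.body) {
        inst.lhs = resolve(inst.lhs);
        inst.rhs = resolve(inst.rhs);
        if (isConst(inst.lhs) && isConst(inst.rhs))
            known[inst.dst] = std::to_string(std::stol(inst.lhs) + std::stol(inst.rhs));
        else
            kept.push_back(inst);
    }
    f.body = kept;
    f.ret = resolve(f.ret);
}

// Pass 2: drop instructions whose result is never used (the toy IR has no side effects).
void deadCodeElim(Function& f) {
    std::set<std::string> used{f.ret};
    std::vector<Inst> kept;
    for (auto it = f.body.rbegin(); it != f.body.rend(); ++it) {   // walk backwards
        if (!used.count(it->dst)) continue;
        used.insert(it->lhs);
        used.insert(it->rhs);
        kept.insert(kept.begin(), *it);
    }
    f.body = kept;
}

// The toy "pass manager": runs each pass in order over the same function.
void runPasses(Function& f, const std::vector<Pass>& passes) {
    for (const Pass& pass : passes) pass(f);
}
```

Running both passes on {%1 = %x + %y, %2 = 1 + 2, %3 = %2 + 3; return %3} leaves an empty body that simply returns 6. Running only constantFold keeps the unused %1, which shows that each pass does exactly one job.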

Generate assembly code

  • Generate assembly code from the final .bc or .ll file

clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
  • Assembly generation can also be optimized

clang -Os -S -fobjc-arc main.m -o main.s

The generated main.s file contains assembly code

[Figure: Generate assembly code]

4, Generate the object file

To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file

clang -fmodules -c main.s -o main.o

You can view the symbols in main.o through the nm command

$ xcrun nm -nm main.o

The symbols in main.o are shown below (file format: object file)

[Figure: Generate object file]

  • _printf is marked as undefined and external

  • undefined means that the symbol _printf cannot be found in the current file

  • external means that the symbol is accessible from outside the file
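To make the undefined/external flags concrete, here is a minimal C++ sketch that classifies lines of nm -m style output. The sample lines in the usage below are illustrative, written in the format nm -m usually prints, not captured from this project. The parser simply looks for the (undefined) marker and for the word external or non-external followed by the symbol name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct Symbol {
    std::string name;
    bool undefined = false;   // "(undefined)": not defined in this file
    bool external = false;    // "external": visible to other files
};

// Parses one line of `nm -m`-style output (format assumed from typical output).
Symbol parseNmLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    Symbol sym;
    for (size_t i = 0; i < words.size(); ++i) {
        if (words[i] == "(undefined)") sym.undefined = true;
        if (words[i] == "external" || words[i] == "non-external") {
            sym.external = (words[i] == "external");
            if (i + 1 < words.size()) sym.name = words[i + 1];   // the symbol follows the flag
            break;
        }
    }
    return sym;
}
```

A line such as "(undefined) external _printf" parses as undefined and external, while "0000000000000000 (__TEXT,__text) external _test" is external but defined in the file.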

5, Link

Linking combines the object file with the required dynamic and static libraries to produce an executable, where

  • Static libraries are merged into the executable

  • Dynamic libraries remain independent files

The linker links the compiled .o file with .dylib and .a files to produce a Mach-O file

clang main.o -o main

View the symbols after linking

$ xcrun nm -nm main

Symbols that are still undefined will be bound dynamically at run time

  • Use the file command (file main) to check the format of main; in this case it is a Mach-O executable
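What file reports can be approximated by reading the Mach-O header yourself. The C++ sketch below assumes a thin (single-architecture) 64-bit Mach-O file on a little-endian machine; the magic number, CPU types, and file types are the values defined in <mach-o/loader.h> and <mach/machine.h>. Universal (fat) binaries, which bundle several architectures and start with a different magic number, are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Reads a little-endian 32-bit value at `offset` (x86_64 and arm64 are little-endian).
uint32_t readLE32(const std::vector<uint8_t>& bytes, size_t offset) {
    return uint32_t(bytes[offset]) | uint32_t(bytes[offset + 1]) << 8 |
           uint32_t(bytes[offset + 2]) << 16 | uint32_t(bytes[offset + 3]) << 24;
}

// Rough equivalent of what `file` reports for the first 16 bytes of a thin 64-bit Mach-O.
std::string describeMachO(const std::vector<uint8_t>& bytes) {
    if (bytes.size() < 16 || readLE32(bytes, 0) != 0xfeedfacf)    // MH_MAGIC_64
        return "not a thin 64-bit Mach-O file";

    std::string arch;
    switch (readLE32(bytes, 4)) {                // cputype
        case 0x01000007: arch = "x86_64"; break; // CPU_TYPE_X86_64
        case 0x0100000c: arch = "arm64"; break;  // CPU_TYPE_ARM64
        default:         arch = "unknown-arch"; break;
    }

    std::string type;
    switch (readLE32(bytes, 12)) {               // filetype
        case 0x1: type = "object"; break;        // MH_OBJECT, e.g. main.o
        case 0x2: type = "executable"; break;    // MH_EXECUTE, e.g. main
        case 0x6: type = "dynamically linked shared library"; break;  // MH_DYLIB
        default:  type = "other"; break;
    }
    return "Mach-O 64-bit " + type + " " + arch;
}
```

Fed the first 16 bytes of the linked main on an Intel Mac, this should return Mach-O 64-bit executable x86_64; main.o should report object instead of executable.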

6, Bind

Binding produces the corresponding Mach-O executable for each target architecture

Clang plug-in development

1. Preparatory work

Because of network restrictions in China, you may need to download the LLVM source code through a mirror. Here is the mirror link

  • Download the LLVM project

git clone
  • Download compiler-rt, libcxx, and libcxxabi into LLVM's projects directory

cd ../projects

git clone

git clone 

git clone
  • Install the extra tools (clang-tools-extra) under Clang's tools directory

cd ../tools/clang/tools

git clone

2. LLVM compilation

Since the latest LLVM can only be built with CMake, you need to install CMake

Installing cmake

  • Check whether CMake is already installed with Homebrew; if it is, skip the following steps

brew list
  • Install cmake through brew

brew install cmake

Compile LLVM

There are two compilation methods:

  • Compiling LLVM with xcode

  • Compiling LLVM through ninja

Compiling LLVM with xcode

  • Use CMake to generate an Xcode project

mkdir build_xcode

cd build_xcode

cmake -G Xcode ../llvm
  • Build Clang with Xcode

    • Select Create Schemes automatically

      [Figure: Compile LLVM-1]

    • Build with Cmd + B using the ALL_BUILD scheme; this takes an estimated 1+ hours

Note: with ALL_BUILD, the build reports "The i386 architecture is deprecated. You should update your arch build setting to remove the i386 architecture", and no good solution has been found yet (to be added later)

Alternative: choose to create schemes manually, then build only Clang and ClangTooling

Compiling LLVM through ninja

  • To build with Ninja, you also need to install Ninja with the following command

brew install ninja
  • Create a new build_ninja directory in the root of the LLVM source tree; build.ninja will eventually be generated in it

  • Create a new llvm_release directory in the root of the LLVM source tree; the final compiled files will end up in it

cd llvm_build

//Note: there must be no space after -DCMAKE_INSTALL_PREFIX=; replace the path with your own installation path
cmake -G Ninja ../llvm -DCMAKE_INSTALL_PREFIX=/Users/xxx/xxx/LLVM/llvm_release
  • Run the build and installation in one step


ninja install

3. Create plug-in

  • Create a new plug-in directory, CJLPlugin, under /llvm/tools/clang/tools

  • In the CMakeLists.txt file in /llvm/tools/clang/tools, add add_clang_subdirectory(CJLPlugin), where CJLPlugin is the plug-in name created in the previous step

  • Create two new files in the CJLPlugin directory, CJLPlugin.cpp and CMakeLists.txt, and add the following code to CMakeLists.txt

//1. Create in CJLPlugin directory through terminal
touch CJLPlugin.cpp

touch CMakeLists.txt

//2. Add the following code to CMakeLists.txt
add_llvm_library( CJLPlugin MODULE BUILDTREE_ONLY
    CJLPlugin.cpp
)

[Figure: Create plug-in-3]

  • Next, regenerate the Xcode project with CMake by running the following command in the build_xcode directory

cmake -G Xcode ../llvm
  • Finally, the custom CJLPlugin target appears under Loadable modules in LLVM's Xcode project, and you can write the plug-in code there

    [Figure: Create plug-in-4]

Write plug-in code

In the CJLPlugin.cpp file in the CJLPlugin directory, add the following code

// create by CJL
// 2020/11/15

#include <iostream>
#include "clang/AST/AST.h"
#include "clang/AST/DeclObjC.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Frontend/FrontendPluginRegistry.h"

using namespace clang;
using namespace std;
using namespace llvm;
using namespace clang::ast_matchers;
//Namespace with the same name as the plug-in
namespace CJLPlugin {

//Step 3: callback after scanning
//4. Custom callback class, inherited from MatchFinder::MatchCallback
class CJLMatchCallback: public MatchFinder::MatchCallback {
private:
    //How CI arrives here: CreateASTConsumer parameter in CJLASTAction -> CJLASTConsumer constructor -> this private member, set by the CJLMatchCallback constructor
    CompilerInstance &CI;

    //Determine whether the file is a user source file
    bool isUserSourceCode(const string filename) {
        //An empty file name is not user code
        if (filename.empty()) return false;
        //Source that is not inside /Applications/ (Xcode and its SDKs) is treated as the user's
        if (filename.find("/Applications/") == 0) return false;
        return true;
    }

    //Judge whether the property should be declared with copy
    bool isShouldUseCopy(const string typeStr) {
        //Check whether the type is NSString | NSArray | NSDictionary
        if (typeStr.find("NSString") != string::npos ||
            typeStr.find("NSArray") != string::npos ||
            typeStr.find("NSDictionary") != string::npos/*...*/)
            return true;
        return false;
    }

public:
    CJLMatchCallback(CompilerInstance &CI) :CI(CI) {}

    //Override the run method
    void run(const MatchFinder::MatchResult &Result) override {
        //Get the node from Result by its tag (the tag must match the one bound in the CJLASTConsumer constructor)
        const ObjCPropertyDecl *propertyDecl = Result.Nodes.getNodeAs<ObjCPropertyDecl>("objcPropertyDecl");
        //Check that the node exists and comes from a user file
        if (propertyDecl && isUserSourceCode(CI.getSourceManager().getFilename(propertyDecl->getSourceRange().getBegin()).str()) ) {
            //Get the node's property attributes
            //(newer LLVM versions use ObjCPropertyAttribute::Kind and ObjCPropertyAttribute::kind_copy instead)
            ObjCPropertyDecl::PropertyAttributeKind attrKind = propertyDecl->getPropertyAttributes();
            //Get the node's type as a string
            string typeStr = propertyDecl->getType().getAsString();
//            cout << "---------got it: " << typeStr << "---------" << endl;
            //copy should be used, but it is not
            if (propertyDecl->getTypeSourceInfo() && isShouldUseCopy(typeStr) && !(attrKind & ObjCPropertyDecl::OBJC_PR_copy)) {
                //Get the diagnostic engine through CI
                DiagnosticsEngine &diag = CI.getDiagnostics();
                //Report a warning through the diagnostic engine:
                //  location: getBeginLoc(), the start of the node
                //  message:  getCustomDiagID(level, prompt)
                diag.Report(propertyDecl->getBeginLoc(), diag.getCustomDiagID(DiagnosticsEngine::Warning, "%0 - This place is recommended copy!!"))<< typeStr;
            }
        }
    }
};

//Step 2: scanning configuration
//3. Custom CJLASTConsumer, inherited from ASTConsumer, used to listen for AST node information (acts as a filter)
class CJLASTConsumer: public ASTConsumer {
private:
    //Matcher that searches the AST for nodes
    MatchFinder matcher;
    //Callback object
    CJLMatchCallback callback;

public:
    //Set up the MatchFinder in the constructor
    CJLASTConsumer(CompilerInstance &CI) : callback(CI) {
        //Add a matcher that binds every objcPropertyDecl node to the tag "objcPropertyDecl"
        //On a match, the run method of CJLMatchCallback is called (that is the real callback)
        matcher.addMatcher(objcPropertyDecl().bind("objcPropertyDecl"), &callback);
    }

    //Implement two callback methods: HandleTopLevelDecl and HandleTranslationUnit
    //Called once after each top-level declaration is parsed (top-level nodes are global variables, function declarations, etc.)
    bool HandleTopLevelDecl(DeclGroupRef D) override {
//        cout << "parsing..." << endl;
        return true;
    }

    //Called once the whole file has been parsed
    void HandleTranslationUnit(ASTContext &context) override {
//        cout << "file parsing completed!" << endl;
        //Hand the context of the parsed file (i.e. the AST) to the matcher
        matcher.matchAST(context);
    }
};

//2. Inherit PluginASTAction to implement a custom Action, i.e. custom AST behavior
class CJLASTAction: public PluginASTAction {
public:
    //Override ParseArgs and CreateASTConsumer
    /*
     Parse the given plug-in command line arguments.
     - param CI: compiler instance, used for reporting diagnostics.
     - return: true if parsing succeeds; otherwise the plug-in is destroyed and no action is taken. The plug-in is responsible for reporting errors through the CompilerInstance's Diagnostic object.
     */
    bool ParseArgs(const CompilerInstance &ci, const std::vector<std::string> &args) override {
        return true;
    }

    //Returns an ASTConsumer object; ASTConsumer is the abstract base class
    unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI, StringRef iFile) override {
        /*
         Return the custom CJLASTConsumer, a subclass of ASTConsumer.
         CI is used to:
         - determine whether a file is the user's
         - emit warnings
         */
        return unique_ptr<CJLASTConsumer> (new CJLASTConsumer(CI));
    }
};

} // namespace CJLPlugin

//Step 1: register the plug-in and the custom AST Action class
//1. Register the plug-in
static FrontendPluginRegistry::Add<CJLPlugin::CJLASTAction> CJL("CJLPlugin", "This is CJLPlugin");

The plug-in works in three main steps

  • Step 1: register the plug-in and customize the AST Action class

    • Inherit from PluginASTAction to create a custom ASTAction. Two methods must be overridden, ParseArgs and CreateASTConsumer. The key one is CreateASTConsumer, whose parameter CI is the compiler instance; CI is mainly used for two things:

      • determining whether a file belongs to the user

      • emitting warnings

    • Register the plug-in through FrontendPluginRegistry, which associates the plug-in name with the custom ASTAction class

  • Step 2: scanning configuration

    • Inherit from ASTConsumer to implement the custom subclass CJLASTConsumer, which has two members: the MatchFinder object matcher and the custom CJLMatchCallback object callback

    • Implement the constructor, mainly to set up the MatchFinder and pass CI to the callback object

    • Implement two callback methods:

      • HandleTopLevelDecl: called once after each top-level declaration is parsed

      • HandleTranslationUnit: called after the whole file is parsed; it hands the file's context (i.e. the AST) to the matcher

  • Step 3: callback after scanning

    • Inherit from MatchFinder::MatchCallback to define the custom callback class CJLMatchCallback

    • Define a private CompilerInstance member to receive the CI passed in by the ASTConsumer class

    • Override the run method:

      • 1. Use Result to get the matching node by its tag; the tag must match the one bound in the CJLASTConsumer constructor

      • 2. Check that the node exists and comes from a user file, using the private method isUserSourceCode

      • 3. Get the node's property attributes

      • 4. Get the node's type and convert it to a string

      • 5. Check whether copy should be used but is not

      • 6. Get the diagnostic engine through CI

      • 7. Report the warning through the diagnostic engine
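Because the plug-in can only be exercised after a full LLVM build, it can help to check its filtering rules on their own. The sketch below copies isUserSourceCode and isShouldUseCopy out of CJLMatchCallback as plain C++ functions (no Clang headers needed) and adds a hypothetical copyWarning helper that mirrors the decision made in run() and the %0 substitution in the diagnostic message.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Same rule as the plug-in: files under /Applications/ (Xcode, SDKs) are not user code.
bool isUserSourceCode(const std::string& filename) {
    if (filename.empty()) return false;
    return filename.find("/Applications/") != 0;
}

// Same rule as the plug-in: these types should normally be declared with `copy`.
bool isShouldUseCopy(const std::string& typeStr) {
    return typeStr.find("NSString") != std::string::npos ||
           typeStr.find("NSArray") != std::string::npos ||
           typeStr.find("NSDictionary") != std::string::npos;
}

// Hypothetical helper mirroring run(): returns the warning text, or nothing if no warning.
std::optional<std::string> copyWarning(const std::string& filename,
                                       const std::string& typeStr, bool hasCopy) {
    if (!isUserSourceCode(filename) || !isShouldUseCopy(typeStr) || hasCopy)
        return std::nullopt;
    return typeStr + " - This place is recommended copy!!";   // "%0" replaced by typeStr
}
```

Note that the substring test is coarse: a type such as NSStringEncoding also contains NSString and would be flagged, while NSMutableString would not.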

Then test the plug-in in the terminal

//Command format
<path to your self-built clang> -isysroot /Applications/ -Xclang -load -Xclang <plug-in (.dylib) path> -Xclang -add-plugin -Xclang <plug-in name> -c <source file path>

/Users/XXX/Desktop/build_xcode/Debug/bin/clang -isysroot /Applications/ -Xclang -load -Xclang /Users/XXXX/Desktop/build_xcode/Debug/lib/CJLPlugin.dylib -Xclang -add-plugin -Xclang CJLPlugin -c "/Users/XXXX/Desktop/XXX/XXXX/test demo/testClang/testClang/ViewController.m"

[Figure: Test plug-in]

4. Integrating the plug-in into Xcode

Loading plug-ins

  • Open the test project and, under Target > Build Settings > Other C Flags, add the following

-Xclang -load -Xclang <plug-in (.dylib) path> -Xclang -add-plugin -Xclang CJLPlugin

[Figure: Loading plug-ins]

Set compiler

  • A Clang plug-in must be loaded by the matching Clang version; if the versions differ, compilation fails

  • In Build Settings, add two user-defined settings, CC and CXX

    • CC: the absolute path of your self-built clang

    • CXX: the absolute path of your self-built clang++

[Figure: Set compiler-2]

  • Next, search for index in Build Settings and change Enable Index-While-Building Functionality from Default to NO

    Finally, rebuild the test project, and the following result appears

Posted by direction on Mon, 20 Sep 2021 03:04:54 -0700