LLVM

1 Note from LLVM

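When casting between LLVM classes, don't use C++ static_cast or dynamic_cast (LLVM is normally built without RTTI, so dynamic_cast is not available). Instead, use the following templates:

  • isa<>: checks whether a value is an instance of the given class and returns a bool.
  • cast<>: a checked cast from a base class to a derived class. It triggers an assertion failure if the instance is not of that class.
  • dyn_cast<>: a checking cast. It returns nullptr on failure.

The same functions are also available in the clang namespace.

The template argument should be the class type, NOT a pointer type: write cast<Instruction>(V) rather than cast<Instruction *>(V).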
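A minimal sketch of how these templates work, assuming a hypothetical two-class hierarchy. LLVM-style RTTI stores a kind tag in the base class, and each derived class provides a static classof() that isa<> consults. Shape, Square, Circle, and the simplified isa/cast/dyn_cast below are illustrative re-implementations, not LLVM's real ones (those live in llvm/Support/Casting.h). Note that the template argument is the class type; the pointer comes from the argument.

```cpp
#include <cassert>

// Hypothetical hierarchy using LLVM-style RTTI: a kind tag plus classof().
class Shape {
public:
  enum ShapeKind { SK_Square, SK_Circle };
  explicit Shape(ShapeKind K) : Kind(K) {}
  ShapeKind getKind() const { return Kind; }

private:
  const ShapeKind Kind;
};

class Square : public Shape {
public:
  Square() : Shape(SK_Square) {}
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }
};

class Circle : public Shape {
public:
  Circle() : Shape(SK_Circle) {}
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
};

// Simplified stand-ins for llvm::isa, llvm::cast and llvm::dyn_cast.
// The template argument To is a class, never a pointer type.
template <typename To, typename From> bool isa(const From *V) {
  return To::classof(V);
}

template <typename To, typename From> To *cast(From *V) {
  assert(isa<To>(V) && "cast<Ty>() argument of incompatible type!");
  return static_cast<To *>(V);
}

template <typename To, typename From> To *dyn_cast(From *V) {
  return isa<To>(V) ? static_cast<To *>(V) : nullptr;
}
```

Because the kind tag is a plain enum, this works even when compiling with -fno-rtti.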

2 Build and install LLVM system

svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
cd clang/tools
svn co http://llvm.org/svn/llvm-project/clang-tools-extra/trunk extra
cd ../../../projects
svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt
cd ..
mkdir build
cd build

LLVM has since moved from SVN to a single Git repository (https://github.com/llvm/llvm-project), so the SVN URLs above may no longer work.

Using make

cmake -G "Unix Makefiles" ..
make
make install

Or using Ninja

cmake -G Ninja ..
ninja
ninja install

Note that the build is slow and can consume a lot of memory, especially with Ninja, which runs many jobs (including links) in parallel. The resulting executables are also huge (in my case the clang executable was 5 GB). This is because the default build type is Debug, which embeds debug information in the executables. So use a release build:

cmake -G Ninja .. -DLLVM_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release

Useful CMake variables:

  • -DLLVM_ENABLE_RTTI=ON: build LLVM with RTTI enabled (it is off by default)

3 Try the LLVM toolchain

  • clang --help
  • clang file.c -fsyntax-only: check for errors without generating code
  • clang file.c -S -emit-llvm -o -: print the unoptimized LLVM IR
  • clang file.c -S -emit-llvm -o - -O3: print the optimized LLVM IR
  • clang file.c -S -O3 -o -: print the native assembly code

The following example is from http://llvm.org/docs/GettingStarted.html

Save the following test program as hello.c:

#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}

clang can be used just like gcc:

# one way to run
clang hello.c -o hello
./hello

Compile into LLVM bitcode:

clang -O3 -emit-llvm hello.c -c -o hello.bc

Bitcode can be inspected by disassembling it back into human-readable IR:

# look at the disassembled LLVM IR
llvm-dis < hello.bc | less

Bitcode can be run directly with lli:

lli hello.bc

Alternatively, you can compile the LLVM bitcode into a native assembly file with llc, then assemble, link, and run it:

llc hello.bc -o hello.s
gcc hello.s -o hello.native
./hello.native

4 Use the framework

4.1 Project setup

The project directory should look like this:

pass-project/CMakeLists.txt
pass-project/mypass/CMakeLists.txt
pass-project/mypass/MyPass.cpp

The top-level CMakeLists.txt configures the environment, including finding the LLVM package:

cmake_minimum_required(VERSION 3.4.3)
project(pass-project)

find_package(LLVM REQUIRED CONFIG)

add_definitions(${LLVM_DEFINITIONS})
include_directories(${LLVM_INCLUDE_DIRS})

# patch: use C++11 (newer LLVM releases require C++14 or C++17)
set(CMAKE_CXX_STANDARD 11)
# patch: I didn't compile LLVM with RTTI,
# so I need to disable RTTI when compiling the pass,
# or I will get an error when running opt -load with my pass
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fno-rtti")

add_subdirectory(mypass)

The sub-directory CMakeLists.txt lists the pass source files:

add_library(HebiPass MODULE MyPass.cpp)

The pass source file (MyPass.cpp) should look like this:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
  struct Hello : public FunctionPass {
    static char ID;
    Hello() : FunctionPass(ID) {}
    bool runOnFunction(Function &F) override {
      errs() << "Hello: ";
      errs().write_escaped(F.getName()) << "\n";
      return false;
    }
  };
}
char Hello::ID = 0;
static RegisterPass<Hello> X("hello", "Hello World Pass", false, false);

Compile it into a shared library, then run it with opt:

  1. first load the library with -load /path/to/so/file;
  2. then -hello runs this pass. The option name is given in the source file by the RegisterPass object.

cmake .
make # outputs mypass/libHebiPass.so
opt -load ./mypass/libHebiPass.so -hello < hello.bc > /dev/null

Since LLVM 13, opt uses the new pass manager by default, so running a legacy pass like this one may require adding -enable-new-pm=0.

4.2 Passes

4.2.1 Various passes

Each of these functions should return true if it modified the code, and false otherwise.

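// simplified declarations from llvm/Pass.h
class ModulePass : public Pass {
public:
  virtual bool runOnModule(Module &M) = 0;
};
class FunctionPass : public Pass {
public:
  virtual bool runOnFunction(Function &F) = 0;
};
// BasicBlockPass has been removed in newer LLVM releases
class BasicBlockPass : public Pass {
public:
  virtual bool runOnBasicBlock(BasicBlock &BB) = 0;
};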
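To make the return-value convention concrete, here is a toy model that does not use LLVM at all. The hypothetical ToyFunction is just a list of instruction names, and the hypothetical runPasses() mimics how a pass manager might combine results by OR-ing each pass's return value. One pass only inspects the code and returns false; the other deletes "nop" entries and returns true only if it actually removed something.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Function: just a list of instruction names.
struct ToyFunction {
  std::vector<std::string> Insts;
};

// Mirrors the FunctionPass contract: return true iff F was modified.
struct ToyFunctionPass {
  virtual ~ToyFunctionPass() = default;
  virtual bool runOnFunction(ToyFunction &F) = 0;
};

// Analysis-style pass: only counts instructions, never modifies F.
struct CountPass : ToyFunctionPass {
  std::size_t Count = 0;
  bool runOnFunction(ToyFunction &F) override {
    Count += F.Insts.size();
    return false;
  }
};

// Transform pass: removes "nop" instructions and reports whether F changed.
struct RemoveNopPass : ToyFunctionPass {
  bool runOnFunction(ToyFunction &F) override {
    std::size_t OldSize = F.Insts.size();
    F.Insts.erase(std::remove(F.Insts.begin(), F.Insts.end(), std::string("nop")),
                  F.Insts.end());
    return F.Insts.size() != OldSize;
  }
};

// Runs every pass on F (no short-circuit); true if any pass modified F.
bool runPasses(std::vector<std::unique_ptr<ToyFunctionPass>> &Passes,
               ToyFunction &F) {
  bool Changed = false;
  for (auto &P : Passes)
    Changed |= P->runOnFunction(F);
  return Changed;
}
```

Running the same pipeline a second time reports no change, because there is nothing left to remove.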

4.2.2 Register a pass

RegisterPass takes four parameters:

  1. the command-line option that invokes the pass (-hello)
  2. the help message
  3. true if the pass only looks at the CFG without modifying it
  4. true if the pass is an analysis pass (e.g. the dominator tree pass)

  static RegisterPass<Hello> X("hello", "Hello World Pass",
                               false /* Only looks at CFG */,
                               false /* Analysis Pass */);
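RegisterPass relies on static initialization: defining the global object X runs its constructor before main(), and that constructor records the pass under its command-line name. The sketch below is a hypothetical, much simplified registry showing only that mechanism; ToyPass, toyRegistry() and ToyRegisterPass are illustrative names, not LLVM's PassRegistry.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for llvm::Pass.
struct ToyPass {
  virtual ~ToyPass() = default;
  virtual std::string description() const = 0;
};

using ToyFactory = std::function<std::unique_ptr<ToyPass>()>;

// Maps a command-line option (e.g. "hello") to a factory for that pass.
// A function-local static is created on first use, so it already exists
// when any global registrar below touches it.
std::map<std::string, ToyFactory> &toyRegistry() {
  static std::map<std::string, ToyFactory> Registry;
  return Registry;
}

// Constructing a global ToyRegisterPass<P> registers P as a side effect.
template <typename P> struct ToyRegisterPass {
  explicit ToyRegisterPass(const std::string &Arg) {
    toyRegistry()[Arg] = [] { return std::unique_ptr<ToyPass>(new P()); };
  }
};

struct Hello : ToyPass {
  std::string description() const override { return "Hello World Pass"; }
};

// Runs before main(), like `static RegisterPass<Hello> X(...)`.
static ToyRegisterPass<Hello> X("hello");
```

A lookup by option name is then enough to create the pass, which is roughly what happens when opt sees -hello after loading the library.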

4.2.3 Pass Interaction

MyPass::getAnalysisUsage declares which passes this pass requires. It also tells the pass manager which analysis results are preserved (i.e. not invalidated) by this pass.

void MyPass::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.setPreservesAll();
  // AU.setPreservesCFG();
  AU.addRequired<LoopInfoWrapperPass>();
}

Inside the pass, use getAnalysis<>() to get the required pass itself. In this example, getLoopInfo is a method of LoopInfoWrapperPass.

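#include "llvm/Analysis/LoopInfo.h"

bool MyPass::runOnFunction(Function &F) {
  // getAnalysis must be called from within the pass class
  LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
  // ...
  return false; // this pass does not modify F
}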
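Conceptually, addRequired<> builds a dependency graph, and the pass manager makes sure every required analysis has run before the pass that asked for it (LoopInfoWrapperPass, for instance, itself requires the dominator tree). A hypothetical sketch of just that ordering step, as a depth-first post-order over pass names. RequireMap and schedule() are illustrative; LLVM's legacy PassManager also caches results and invalidates those not in the preserved set, which this omits.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Requirements graph: pass name -> passes it lists via addRequired<>().
using RequireMap = std::map<std::string, std::vector<std::string>>;

// Depth-first visit: schedule all requirements of Name, then Name itself.
static void visit(const RequireMap &Reqs, const std::string &Name,
                  std::set<std::string> &Done, std::vector<std::string> &Order) {
  if (!Done.insert(Name).second)
    return; // already scheduled
  auto It = Reqs.find(Name);
  if (It != Reqs.end())
    for (const std::string &Dep : It->second)
      visit(Reqs, Dep, Done, Order);
  Order.push_back(Name);
}

// Returns an order in which every pass runs after the passes it requires.
// Assumes the requirements graph has no cycles.
std::vector<std::string> schedule(const RequireMap &Reqs,
                                  const std::string &Target) {
  std::set<std::string> Done;
  std::vector<std::string> Order;
  visit(Reqs, Target, Done, Order);
  return Order;
}
```

With mypass requiring loops and loops requiring domtree, the order is domtree, loops, mypass.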

4.3 LLVM casting templates

isa: returns true if the value is an instance of the given class.

if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
    return true;

cast: a checked cast. If the cast is not valid, it triggers an assertion failure.

cast<Instruction>(V)->getParent()

dyn_cast: a checking cast. If the cast is not valid, it returns a null pointer.

if (AllocaInst *AI = dyn_cast<AllocaInst>(Val)) {
  // use AI ...
}

4.4 Values

4.4.1 Function

Iterating over the basic blocks of a function:

// func is a pointer to a Function instance
for (Function::iterator it = func->begin(), end = func->end(); it != end; ++it) {
  BasicBlock *bb = &*it;
}
// equivalently, with a range-based for loop:
for (BasicBlock &bb : *func) { /* ... */ }

Iterating over all instructions of a function directly:

#include "llvm/IR/InstIterator.h"
// f is a pointer to a Function instance
for (inst_iterator it = inst_begin(f), end = inst_end(f); it != end; ++it) {
  Instruction *inst = &*it;
}

4.4.2 BasicBlock

Iterating over the instructions of a basic block:

// blk is a pointer to a BasicBlock instance
for (BasicBlock::iterator it = blk->begin(), end = blk->end(); it != end; ++it) {
  Instruction *inst = &*it;
}

4.5 User

Get users of a value:

Function *F = ...;
for (User *U : F->users()) {
  if (Instruction *Inst = dyn_cast<Instruction>(U)) {
    errs() << "F is used in instruction:\n";
    errs() << *Inst << "\n";
  }
}

Get the values used by an instruction (its operands):

Instruction *pi = ...;
for (Use &U : pi->operands()) {
  Value *v = U.get();
}
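users() and operands() are two views of the same def-use edges: when a value becomes an operand of a user, the value's use list also gains that user, and a user is itself a value that others can use. A hypothetical toy model of this bookkeeping; ToyValue, ToyUser and addOperand() are illustrative, not LLVM classes (LLVM actually links the two ends through Use objects).

```cpp
#include <string>
#include <utility>
#include <vector>

struct ToyUser;

// Anything that can be used as an operand, like llvm::Value.
struct ToyValue {
  std::string Name;
  std::vector<ToyUser *> Users; // who uses this value (cf. Value::users())
  explicit ToyValue(std::string N) : Name(std::move(N)) {}
  virtual ~ToyValue() = default;
};

// A value that uses other values, like llvm::User (e.g. an Instruction).
struct ToyUser : ToyValue {
  std::vector<ToyValue *> Operands; // what this uses (cf. User::operands())
  explicit ToyUser(std::string N) : ToyValue(std::move(N)) {}

  // Record the edge on both ends so the two views stay consistent.
  void addOperand(ToyValue *V) {
    Operands.push_back(V);
    V->Users.push_back(this);
  }
};
```

For example, if add uses a and b, and mul uses add and a, then a has two users (add, mul) and add has one user (mul).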

4.6 CFG

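The CFG consists of basic blocks; its edges are the branch targets of each block's terminator instruction.

#include "llvm/IR/CFG.h" // llvm/Support/CFG.h in older releases
BasicBlock *BB = ...;

for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
  BasicBlock *Pred = *PI;
}
// or, with range-based for loops:
for (BasicBlock *Pred : predecessors(BB)) { /* ... */ }
for (BasicBlock *Succ : successors(BB)) { /* ... */ }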
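Predecessors are not stored separately: a block's predecessors are the blocks whose terminators branch to it (LLVM finds them by walking the block's uses). A hypothetical sketch that derives predecessor lists by inverting successor lists, with blocks identified by index; Graph and computePredecessors() are illustrative names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CFG: Succs[B] lists the blocks that block B's terminator targets.
using Graph = std::vector<std::vector<std::size_t>>;

// Inverts the successor edges: Preds[B] lists every block that branches to B.
Graph computePredecessors(const Graph &Succs) {
  Graph Preds(Succs.size());
  for (std::size_t B = 0; B < Succs.size(); ++B)
    for (std::size_t S : Succs[B])
      Preds[S].push_back(B);
  return Preds;
}
```

For a diamond (block 0 branches to 1 and 2, both of which branch to 3), the entry block has no predecessors and block 3 has predecessors 1 and 2.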

5 Reference