1 GNU C Preprocessor

The preprocessor performs the following tasks separately:

  1. Three transformations
    • all comments are replaced with single spaces
    • backslash-newline combo is removed
    • predefined macros are replaced
  2. include header files
  3. macro expansion
  4. conditional compilation
  5. line control

It seems that these steps are not necessarily sequential; each of them can be invoked on its own.

1.1 Invoking

  • -C: do not discard comments
  • -nostdinc: do not search the standard system directories for header files; use only the -I directories.
  • -D, -U, -undef: predefine or un-predefine macros. Note that these affect predefined macros only; they do not affect the macros defined in the code.
  • -dM: instead of the normal preprocessed program, output all the #define directives (both predefined and user-defined)
  • -dD: output the preprocessed program plus the user-defined #define directives
  • -dI: output the preprocessed program plus the #include directives
  • -M, -MM, -MD, -MMD: options that deal with source code dependencies

1.1.1 TODO The line control might be very useful for Helium

1.1.2 TODO nostdinc, what is the standard?

1.2 Macro

The simple form is a single identifier (an object-like macro). The input is scanned sequentially. Each time a substitution happens, the result is put back at the front of the remaining text and rescanned, which allows nested expansion. A macro body is not expanded when it is defined, so if a nested macro is later redefined to another value, the new value is used from that point on.
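
The rescanning behaviour can be checked with a small sketch. The macro names INNER and OUTER and both functions are made up for this illustration; only standard preprocessor behaviour is assumed.

```cpp
#include <cassert>

// INNER and OUTER are hypothetical names used only for this illustration.
#define INNER 1
#define OUTER INNER   // the body is stored unexpanded, as the token INNER

// Here OUTER -> INNER -> 1.
int value_before_redefinition() { return OUTER; }

#undef INNER
#define INNER 2       // redefine the nested macro

// OUTER still expands to INNER, which is rescanned and now yields 2.
int value_after_redefinition() { return OUTER; }
```

Calling the two functions from any main() should show that OUTER follows the current definition of INNER at each point of use.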

For macros with arguments, the substitution splits the arguments at commas. An argument can be any expression that does not contain a top-level comma. A single argument can still contain a comma if the comma is inside parentheses: each argument must have balanced parentheses, and a comma inside parentheses does not end the argument. However, brackets and braces are not checked for balance, so they do not protect a comma. As with simple macros, the output is put back at the front of the remaining text and expanded accordingly.
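
A minimal sketch of the argument-splitting rule. FIRST, SECOND and pick_larger are hypothetical names chosen for this example.

```cpp
#include <cassert>

// FIRST, SECOND and pick_larger are hypothetical names for this illustration.
#define FIRST(a, b) (a)
#define SECOND(a, b) (b)

inline int pick_larger(int x, int y) { return x > y ? x : y; }

// The comma inside pick_larger(1, 2) sits inside parentheses, so it does not
// end the argument: both macros still receive exactly two arguments.
int first_argument() { return FIRST(pick_larger(1, 2), 99); }
int second_argument() { return SECOND(pick_larger(1, 2), 99); }

// By contrast, FIRST({1, 2}, 99) would be split into three arguments,
// because braces do not protect the comma.
```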

When defining a macro with arguments, the open parenthesis of the parameter list must follow the macro name immediately, without any space. Otherwise the macro is treated as a simple macro whose body starts with the parenthesis.
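
The effect of the space can be seen in a short sketch. SQUARE and SPACED are hypothetical macro names for this example.

```cpp
#include <cassert>

// SQUARE and SPACED are hypothetical names for this illustration.
#define SQUARE(x) ((x) * (x))   // function-like: "(" directly follows the name
#define SPACED (x) + 1          // object-like: the body is the text "(x) + 1"

int square_of_three() { return SQUARE(3); }   // expands to ((3) * (3))

int spaced_use() {
  int x = 10;
  return SPACED;   // expands to (x) + 1, picking up the local variable x
}
```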

2 Source Code Reading

2.1 How the invocation of clang works through the APIs

Cannot find

2.2 TODO How the recursive descent parser is implemented

2.2.1 TODO AST hierarchy

2.2.2 TODO How it is different from the yacc parsers

2.2.3 TODO Is it feasible to rewrite a Recursive Descent Parser in lisp in a few days? Why it is better?

2.3 TODO How the error recovery works?

2.4 TODO Preprocessor?

2.5 TODO Manage the symbol table

2.6 TODO type system

3 Note from LLVM

When casting, don't use C++ static_cast or dynamic_cast. Instead, use the following:

  • isa<>: checks the type and returns a bool
  • cast<>: a checked cast from a base class to a derived class; it triggers an assertion failure if the instance is not of the target class
  • dyn_cast<>: a checking cast; it returns nullptr on failure

These functions are also available in the clang namespace.

The template argument should be the class itself, NOT a pointer to it.
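
To make the three operations concrete, here is a self-contained toy model of this style of casting. It is only a sketch: the toy namespace, Shape, Circle, Square and classify are all hypothetical, and the real templates (in llvm/Support/Casting.h) are more general. The only idea taken from LLVM is that each class provides a static classof() that the casts consult.

```cpp
#include <cassert>

namespace toy {  // hypothetical stand-in, not the LLVM implementation

struct Shape {
  enum Kind { K_Circle, K_Square };
  explicit Shape(Kind k) : kind(k) {}
  Kind getKind() const { return kind; }
private:
  Kind kind;
};

struct Circle : Shape {
  Circle() : Shape(K_Circle) {}
  static bool classof(const Shape *s) { return s->getKind() == K_Circle; }
};

struct Square : Shape {
  Square() : Shape(K_Square) {}
  static bool classof(const Shape *s) { return s->getKind() == K_Square; }
};

// isa<To>(p): pure check, returns bool.
template <typename To, typename From>
bool isa(const From *p) { return To::classof(p); }

// cast<To>(p): checked downcast; asserts when p is not a To.
template <typename To, typename From>
To *cast(From *p) {
  assert(isa<To>(p) && "cast<To>() argument of incompatible type");
  return static_cast<To *>(p);
}

// dyn_cast<To>(p): checking downcast; returns nullptr on mismatch.
template <typename To, typename From>
To *dyn_cast(From *p) { return isa<To>(p) ? static_cast<To *>(p) : nullptr; }

}  // namespace toy

// Returns 1 for a Circle and 2 for a Square.
// Note the template argument is toy::Circle, not toy::Circle *.
int classify(toy::Shape *s) {
  if (toy::dyn_cast<toy::Circle>(s)) return 1;
  return toy::isa<toy::Square>(s) ? 2 : 0;
}
```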

4 clang::tooling

4.1 clang::tooling::ClangTool

ClangTool::ClangTool(const CompilationDatabase &Compilations, 
                     ArrayRef< std::string > SourcePaths);
int ClangTool::run(ToolAction *Action);
// directly build ASTs
int ClangTool::buildASTs(std::vector<std::unique_ptr<ASTUnit>> &ASTs);

4.2 clang::tooling::CompilationDatabase

  • clang::tooling::CompilationDatabase
  • clang::tooling::JSONCompilationDatabase
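
static std::unique_ptr<CompilationDatabase>
CompilationDatabase::loadFromDirectory(StringRef BuildDirectory, std::string &ErrorMessage);
class JSONCompilationDatabase : public CompilationDatabase {};
// some releases take an additional JSONCommandLineSyntax argument
static std::unique_ptr<JSONCompilationDatabase>
JSONCompilationDatabase::loadFromFile(StringRef FilePath, std::string &ErrorMessage);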

4.3 clang::tooling::runToolOnCode

  • runToolOnCode
  • runToolOnCodeWithArgs
  • buildASTFromCode
  • buildASTFromCodeWithArgs
using namespace clang::tooling;
bool runToolOnCode(clang::FrontendAction *ToolAction,
                   const Twine &Code,
                   const Twine &FileName="");
bool runToolOnCodeWithArgs(clang::FrontendAction *ToolAction,
                           const Twine &Code,
                           const std::vector<std::string> &Args,
                           const Twine &FileName="");
std::unique_ptr<ASTUnit> buildASTFromCode(const Twine &Code,
                                          const Twine &FileName="");
std::unique_ptr<ASTUnit> buildASTFromCodeWithArgs(const Twine &Code,
                                                  const std::vector<std::string> &Args,
                                                  const Twine &FileName="");

4.4 TODO clang::tooling::Range

4.5 TODO clang::tooling::Replacement

  • clang::tooling::Replacement
  • clang::tooling::Replacements

5 Compiler

5.1 CompilerInstance

// high level invocation
bool ExecuteAction(FrontendAction &act);
bool hasInvocation();
CompilerInvocation& getInvocation();
void setInvocation(std::shared_ptr<CompilerInvocation> value);

// options
DiagnosticOptions& getDiagnosticOpts();
FrontendOptions& getFrontendOpts();
HeaderSearchOptions& getHeaderSearchOpts();
LangOptions& getLangOpts();
PreprocessorOptions& getPreprocessorOpts();

// diagnostics
bool hasDiagnostics();
DiagnosticsEngine& getDiagnostics();
void setDiagnostics(DiagnosticsEngine *value);

// managers
bool hasFileManager();
FileManager& getFileManager();
void setFileManager(FileManager *value);
bool hasSourceManager();
SourceManager& getSourceManager();
void setSourceManager(SourceManager *value);

bool hasPreprocessor();
Preprocessor& getPreprocessor();
void setPreprocessor(std::shared_ptr<Preprocessor> value);

bool hasASTContext();
ASTContext& getASTContext();

bool hasASTConsumer();
ASTConsumer& getASTConsumer();

// construction
void createDiagnostics();
void createFileManager();
void createSourceManager(FileManager &FileMgr);
void createPreprocessor(TranslationUnitKind TUKind);

5.2 Preprocessor

DiagnosticsEngine &getDiagnostics();
FileManager& getFileManager();
SourceManager& getSourceManager();
IdentifierTable& getIdentifierTable();

// macros
bool isMacroDefined(StringRef id);
bool isMacroDefined(const IdentifierInfo *ii);
MacroDefinition getMacroDefinition(const IdentifierInfo *ii);
MacroInfo *getMacroInfo(const IdentifierInfo *ii);
macro_iterator macro_begin(bool IncludeExternalMacros=true);
macro_iterator macro_end(bool IncludeExternalMacros=true);

typedef MacroMap::const_iterator macro_iterator;
typedef llvm::DenseMap<const IdentifierInfo *, MacroState> MacroMap;

5.2.1 MacroDefinition

MacroInfo *getMacroInfo();

5.2.2 MacroInfo

// this does not include the "#define"
SourceLocation getDefinitionLoc();
SourceLocation getDefinitionEndLoc();

bool isFunctionLike();
bool isObjectLike();
bool isC99Varargs();
bool isGNUVarargs();
bool isVariadic();
bool isBuiltinMacro();

isBuiltinMacro() does not recognize most of the "builtin" (predefined) macros. So in order to detect user-defined macros, get the SourceManager and check isWrittenInMainFile (not isInMainFile) on the macro's source location.

6 General

6.1 IdentifierInfo

StringRef getName();
bool hasMacroDefinition();

6.2 clang::ASTUnit

ASTContext &ASTUnit::getASTContext();

6.3 clang::ASTContext

SourceManager &getSourceManager();
const LangOptions &getLangOpts();
TranslationUnitDecl *getTranslationUnitDecl();
DiagnosticsEngine &getDiagnostics();
FullSourceLoc getFullLoc(SourceLocation loc);

6.4 clang::SourceManager

FileID getMainFileID();
FileEntry *getFileEntryForID(FileID FID);
SourceLocation getSpellingLoc(SourceLocation loc);
std::pair<FileID, unsigned> getDecomposedLoc(SourceLocation loc);
std::pair<FileID, unsigned> getDecomposedSpellingLoc(SourceLocation loc);
std::pair<FileID, unsigned> getDecomposedIncludedLoc(FileID FID);
bool isInMainFile(SourceLocation loc);
// PresumedLoc
bool isInFileID(SourceLocation loc, FileID FID);
// SpellingLoc
bool isWrittenInMainFile(SourceLocation loc);
  • clang::FileEntry
StringRef getName();

6.5 Location

  • clang::SourceRange
SourceLocation getBegin();
SourceLocation getEnd();
bool operator==(const SourceRange &X);
bool operator!=(const SourceRange &X);
  • clang::SourceLocation: no interesting member functions. Use SourceManager to decode it. Typically, though, we do not use SourceManager directly; instead, use ASTContext to decode it into a FullSourceLoc.
  • clang::FullSourceLoc : public clang::SourceLocation
bool hasManager();
SourceManager& getManager();
unsigned getSpellingLineNumber();
unsigned getSpellingColumnNumber();
unsigned getLineNumber();
unsigned getColumnNumber();
FileEntry *getFileEntry();

7 clang::Type

The raw type is whatever appeared in the source code. If a type is a typedef of another type (possibly a pointer), the raw type does not record the pointer information.

7.1 canonical type

Every instance of type has a canonical type pointer.

  • If the type is a simple primitive type, the pointer points to itself
  • If any part of the type has typedef, the pointer will point to a type instance that is equivalent to it but without typedefs. You can check whether two types are the same by comparing this pointer.

You should not use isa/cast/dyn_cast on types (e.g. isa<PointerType>(expr->getType())), because the type as written is not necessarily canonical. Use the helper functions instead: expr->getType()->isPointerType().
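
The difference between checking the type as written and checking its canonical type can be sketched with a toy model. Everything here (MiniType, its fields, sameType, the three example objects) is hypothetical and far simpler than clang::Type; it only illustrates the canonical-pointer idea described above.

```cpp
#include <cassert>

// Toy model only: MiniType is hypothetical and much simpler than clang::Type.
struct MiniType {
  enum Kind { Builtin, Pointer, Typedef };
  Kind kind;
  const MiniType *underlying;  // pointee for Pointer, aliased type for Typedef
  const MiniType *canonical;   // the equivalent type with typedefs stripped

  // Inspects only the type as written in the source.
  bool isPointerAsWritten() const { return kind == Pointer; }
  // Looks through typedefs first, like the helper functions in the notes.
  bool isPointerType() const { return canonical->kind == Pointer; }
};

// int: a simple type is its own canonical type.
static const MiniType IntTy = {MiniType::Builtin, nullptr, &IntTy};
// int *
static const MiniType IntPtrTy = {MiniType::Pointer, &IntTy, &IntPtrTy};
// typedef int *IntPtr;  (its canonical type is int *)
static const MiniType IntPtrAlias = {MiniType::Typedef, &IntPtrTy, &IntPtrTy};

// Two types are the same when their canonical pointers are equal.
inline bool sameType(const MiniType &a, const MiniType &b) {
  return a.canonical == b.canonical;
}
```

In this model the typedef is not a pointer as written, yet isPointerType() reports it as one because the check goes through the canonical pointer.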

7.2 QualType

The type and its qualifiers (const, volatile, restrict) are separate. That is the QualType. It is designed to be small and passed by value. It is essentially a pair of (Type*, bits), where the bits store the qualifiers.

This makes it possible to keep only one Type for each kind: int, const int, and volatile const int all share the same Type and differ only in the qualifier bits.
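
A toy version of the (Type*, bits) pair, as a sketch of the design only. ToyType, ToyQualifier and ToyQualType are hypothetical names; this is not how clang lays out QualType internally.

```cpp
#include <cassert>

// Toy model only: these names are hypothetical, not clang's definitions.
struct ToyType { const char *name; };

enum ToyQualifier : unsigned {
  Const = 1u << 0,
  Restrict = 1u << 1,
  Volatile = 1u << 2
};

// A small value type: one pointer to the shared ToyType plus qualifier bits.
struct ToyQualType {
  const ToyType *type;
  unsigned quals;

  bool isConstQualified() const { return (quals & Const) != 0; }
  bool isVolatileQualified() const { return (quals & Volatile) != 0; }
  // Each returns a new value; the shared ToyType object is never modified.
  ToyQualType withConst() const { return ToyQualType{type, quals | Const}; }
  ToyQualType withVolatile() const { return ToyQualType{type, quals | Volatile}; }
};
```

Because qualifying a type only copies the small pair, "int" and "volatile const int" in this model point at the same ToyType object.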

const Type* getTypePtr() const;
const Type& operator*() const;
const Type* operator->() const;

SplitQualType split() const;
struct SplitQualType {
  const Type *Ty;
  Qualifiers Quals;
};

bool isCanonical();
QualType getCanonicalType() const;
bool isNull();

bool isConstQualified();
bool isVolatileQualified();
bool isRestrictQualified();
bool hasLocalQualifiers();
bool hasQualifiers();

Qualifiers getQualifiers();

QualType withConst();
QualType withVolatile();
QualType withRestrict();

void dump();
std::string getAsString();

static std::string getAsString(SplitQualType split);
static std::string getAsString(const Type *ty, Qualifiers qs);

8 clang::Decl

SourceLocation getLocStart();
SourceLocation getLocEnd();
SourceLocation getLocation();
Kind getKind();
DeclContext *getDeclContext();

8.1 clang::DeclContext

All classes inherited from it:

  • clang::BlockDecl
  • clang::FunctionDecl
  • clang::TagDecl
    • clang::EnumDecl
    • clang::RecordDecl
  • clang::TranslationUnitDecl
decl_range decls();
decl_iterator decls_begin();
decl_iterator decls_end();

This should provide all information to get the children nodes.

class clang::DeclContext::decl_iterator {
  typedef Decl* value_type;
  typedef const value_type& reference;
  reference operator*() const;
  value_type operator->() const;
  decl_iterator& operator++();
  decl_iterator operator++(int);
  friend bool operator==(decl_iterator x, decl_iterator y);
  friend bool operator!=(decl_iterator x, decl_iterator y);
};

typedef llvm::iterator_range<decl_iterator> clang::DeclContext::decl_range;

// OK, now the reference of llvm::iterator_range
class llvm::iterator_range<IteratorT> {
  IteratorT begin() const;
  IteratorT end() const;
};

8.2 clang::TranslationUnitDecl

It also inherits from DeclContext.

8.3 clang::BlockDecl

Like an unnamed FunctionDecl. Also inherits from DeclContext.

ArrayRef<ParmVarDecl*> parameters();
param_iterator param_begin();
param_iterator param_end();

8.4 clang::NamedDecl

IdentifierInfo *getIdentifier();
StringRef getName();
std::string getNameAsString();

8.4.1 clang::LabelDecl

LabelStmt *getStmt();
SourceRange getSourceRange();

8.4.2 clang::TypeDecl

No interesting methods.

  • clang::TypeDecl
    • clang::TypedefNameDecl
      • clang::TypedefDecl: No interesting methods
    • clang::TagDecl
      • clang::EnumDecl
      • clang::RecordDecl

clang::TagDecl

  • struct
  • union
  • class
  • enum

typedef TagTypeKind TagKind;
enum TagTypeKind { /* ... */ };
SourceRange getBraceRange();
SourceLocation getInnerLocStart();
SourceLocation getOuterLocStart();
SourceRange getSourceRange();
bool isThisDeclarationADefinition();
TagDecl *getDefinition();
StringRef getKindName();
TagKind getTagKind();
bool isStruct();
bool isInterface();
bool isClass();
bool isUnion();
bool isEnum();

clang::EnumDecl

enumerator_range enumerators();
enumerator_iterator enumerator_begin();
enumerator_iterator enumerator_end();

clang::RecordDecl

  • struct
  • union
  • class

field_range fields();
field_iterator field_begin();
field_iterator field_end();
bool field_empty();

8.4.3 clang::ValueDecl

Declaration of either

  • a variable
  • a function
  • an enum constant

QualType getType();

  • clang::ValueDecl
    • clang::EnumConstantDecl
    • clang::DeclaratorDecl
      • clang::FunctionDecl
      • clang::FieldDecl
      • clang::VarDecl

clang::EnumConstantDecl

An instance of this object exists for each enum constant that is defined.

Expr* getInitExpr();
const llvm::APSInt &getInitVal();
SourceRange getSourceRange();

clang::DeclaratorDecl

TypeSourceInfo *getTypeSourceInfo();
SourceLocation getInnerLocStart();
SourceLocation getOuterLocStart();
SourceRange getSourceRange();
SourceLocation getLocStart();
NestedNameSpecifier *getQualifier();
SourceLocation getTypeSpecStartLoc();

clang::FunctionDecl

  • Also inherits from clang::DeclContext

SourceRange getSourceRange();
SourceRange getReturnTypeSourceRange();
DeclarationNameInfo getNameInfo();

FunctionDecl *getDefinition();
Stmt *getBody();
// even if it is only a declaration, the body is still available
bool isThisDeclarationADefinition();
bool isMain();
ArrayRef<ParmVarDecl*> parameters();
bool param_empty();
param_iterator param_begin();
param_iterator param_end();
size_t param_size();
ParmVarDecl *getParamDecl(unsigned i);
QualType getReturnType();


clang::DeclarationNameInfo

DeclarationName getName();
SourceLocation getBeginLoc();
SourceLocation getEndLoc();
SourceRange getSourceRange();
SourceLocation getLocStart();
SourceLocation getLocEnd();

clang::FieldDecl

unsigned getFieldIndex();
bool isBitField();
bool hasInClassInitializer();
Expr *getInClassInitializer();
RecordDecl* getParent();
SourceRange getSourceRange();

clang::VarDecl

Represent a variable declaration or definition.

SourceRange getSourceRange();
StorageClass getStorageClass();
bool isStaticLocal();
bool hasExternalStorage();
bool hasGlobalStorage();
bool isLocalVarDecl();
bool isLocalVarDeclOrParm();
bool isFunctionOrMethodVarDecl();
DefinitionKind isThisDeclarationADefinition();
VarDecl *getDefinition();
bool isFileVarDecl();
const Expr *getAnyInitializer();
bool hasInit();
Expr *getInit();
  • clang::ParmVarDecl : clang::VarDecl
SourceRange getSourceRange();
unsigned getFunctionScopeIndex();
bool hasDefaultArg();
Expr *getDefaultArg();
SourceRange getDefaultArgRange();

9 clang::Stmt

SourceRange getSourceRange();
SourceLocation getLocStart();
SourceLocation getLocEnd();
void dump();
void dumpColor();
void dumpPretty(ASTContext &Context);
void viewAST(); // via graphviz
child_range children();
child_iterator child_begin();
child_iterator child_end();

All subclasses have

SourceLocation getLocStart();
SourceLocation getLocEnd();
child_range children();

9.1 Single

9.1.1 clang::BreakStmt

SourceLocation getBreakLoc();

9.1.2 clang::ReturnStmt

SourceLocation getReturnLoc();
Expr *getRetValue();

9.1.3 clang::ContinueStmt

SourceLocation getContinueLoc();

9.2 Conditional

9.2.1 clang::IfStmt

Stmt *getInit();
Expr *getCond();
Stmt *getThen();
Stmt *getElse();

SourceLocation getIfLoc();
SourceLocation getElseLoc();

9.2.2 clang::SwitchCase

Has two subclasses:

  • clang::CaseStmt
  • clang::DefaultStmt

SwitchCase *getNextSwitchCase();
SourceLocation getKeywordLoc();
SourceLocation getColonLoc();
Stmt *getSubStmt(); // the statement that follows the case/default label

clang::CaseStmt

SourceLocation getCaseLoc();
SourceLocation getEllipsisLoc(); // location of "..." in a GNU case range (case 1 ... 5:)
SourceLocation getColonLoc();

Expr *getLHS();
Expr *getRHS();
Stmt *getSubStmt();

clang::DefaultStmt

Stmt *getSubStmt();
SourceLocation getDefaultLoc();
SourceLocation getColonLoc();

9.2.3 clang::SwitchStmt

VarDecl *getConditionVariable();
DeclStmt *getConditionVariableDeclStmt();
Stmt *getInit();
Expr *getCond();
Stmt *getBody();
SwitchCase *getSwitchCaseList();

SourceLocation getSwitchLoc();

9.2.4 clang::LabelStmt

LabelDecl *getDecl();
const char *getName();
Stmt *getSubStmt();

9.2.5 clang::GotoStmt

LabelDecl *getLabel();
SourceLocation getGotoLoc();
SourceLocation getLabelLoc();

9.3 Loop

9.3.1 clang::DoStmt

Expr *getCond();
Stmt *getBody();
SourceLocation getDoLoc();
SourceLocation getWhileLoc();
// why no LParen??
SourceLocation getRParenLoc();

9.3.2 clang::ForStmt

VarDecl *getConditionVariable();
const DeclStmt *getConditionVariableDeclStmt();

Stmt *getInit();
Expr *getCond();
Expr *getInc();

Stmt *getBody();

SourceLocation getForLoc();
SourceLocation getRParenLoc();
SourceLocation getLParenLoc();

9.3.3 clang::WhileStmt

VarDecl *getConditionVariable();
const DeclStmt *getConditionVariableDeclStmt();

Expr *getCond();
Stmt *getBody();
SourceLocation getWhileLoc();

9.4 Other

9.4.1 clang::CompoundStmt

bool body_empty();
unsigned size();
body_range body();
body_iterator body_begin();
body_iterator body_end();
Stmt *body_front();
Stmt *body_back();
reverse_body_iterator body_rbegin();
reverse_body_iterator body_rend();

SourceLocation getLBracLoc();
SourceLocation getRBracLoc();

9.4.2 clang::DeclStmt

This is an adapter class for mixing declarations with statements and expressions.

bool isSingleDecl();
Decl *getSingleDecl();
decl_range decls();
decl_iterator decl_begin();
decl_iterator decl_end();
reverse_decl_iterator decl_rbegin();
reverse_decl_iterator decl_rend();

typedef DeclGroupRef::iterator clang::DeclStmt::decl_iterator;
typedef Decl** clang::DeclGroupRef::iterator;

9.4.3 TODO clang::Expr

This is a big topic, covered in a separate section.

It is a subclass of Stmt, which allows an expression to be used transparently anywhere a Stmt is required.

10 clang::Expr

SourceLocation getExprLoc();
bool isLValue();
bool isXValue();
bool isGLValue();

ExprValueKind getValueKind();
bool isIntegerConstantExpr(const ASTContext &ctx);

10.1 General Tips

Given an Expr, how can we get the variables inside it and look up each variable's:

  • type
  • where defined

Some examples

  • b>0
    • BinaryOperator
      • ImplicitCastExpr
        • DeclRefExpr ParmVar (Var) b
      • IntegerLiteral
  • a=b+c
    • BinaryOperator =
      • DeclRefExpr Var a
      • BinaryOperator +
        • ImplicitCastExpr L2R
          • DeclRefExpr Var b
        • ImplicitCastExpr L2R
          • DeclRefExpr Var c
  • a+=b*c
    • CompoundAssignOperator +=
      • DeclRefExpr Var a
      • BinaryOperator *
        • ImplicitCastExpr L2R
          • DeclRefExpr Var b
        • ImplicitCastExpr L2R
          • DeclRefExpr Var c
  • a++
    • UnaryOperator ++
      • DeclRefExpr Var a
  • foo(a,b)
    • CallExpr
      • ImplicitCastExpr FunctionToPointerDecay
        • DeclRefExpr Function 'foo'
      • ImplicitCastExpr
        • DeclRefExpr a
      • ImplicitCastExpr
        • DeclRefExpr b
  • a=foo() + bar()
    • BinaryOperator =
      • DeclRefExpr a
      • BinaryOperator +
        • CallExpr
          • ImplicitCastExpr
            • DeclRefExpr Function foo
        • CallExpr
          • ImplicitCastExpr
            • DeclRefExpr Function bar
  • a=b*(b+c)
    • BinaryOperator =
      • DeclRefExpr a
      • BinaryOperator *
        • ImplicitCastExpr
          • DeclRefExpr b
        • ParenExpr
          • BinaryOperator +
            • ImplicitCastExpr
              • DeclRefExpr b
            • ImplicitCastExpr
              • DeclRefExpr c
  • a.mem
    • ImplicitCastExpr L2R
      • MemberExpr .mem
        • DeclRefExpr Var a "struct A"
  • p->mem
    • ImplicitCastExpr L2R
      • MemberExpr ->mem
        • ImplicitCastExpr L2R
          • DeclRefExpr Var p "struct A *"

10.2 clang::CallExpr

Expr *getCallee();
Decl *getCalleeDecl();
FunctionDecl *getDirectCallee();
unsigned getNumArgs();
Expr **getArgs();
Expr *getArg(unsigned Arg);

arg_range arguments();
arg_iterator arg_begin();
arg_iterator arg_end();

unsigned getNumCommas();
unsigned getBuiltinCallee();

QualType getCallReturnType(const ASTContext &Ctx);
SourceLocation getRParenLoc();

10.3 clang::BinaryOperator

SourceLocation getExprLoc();
SourceLocation getOperatorLoc();
Opcode getOpcode();
Expr *getLHS();
Expr *getRHS();

StringRef getOpcodeStr();

bool isAdditiveOp();
bool isShiftOp();
bool isBitwiseOp();
bool isRelationalOp();
bool isEqualityOp();
bool isComparisonOp();
bool isLogicalOp();
bool isAssignmentOp();
bool isCompoundAssignmentOp();
bool isShiftAssignOp();

10.3.1 clang::CompoundAssignOperator

Like +=, -=, etc. Don't have interesting methods though.

10.4 clang::CastExpr

It has two child classes:

  • clang::ExplicitCastExpr
  • clang::ImplicitCastExpr

ImplicitCastExpr appears very often because it represents many kinds of casts. For example:

  • calling a function needs the cast FunctionToPointerDecay
  • using a value on the right-hand side needs the cast LValueToRValue

The methods of the child classes are not interesting at all, so it is convenient to use these methods of CastExpr:

CastKind getCastKind();
const char *getCastKindName();
Expr *getSubExpr();
Expr *getSubExprAsWritten();

10.5 clang::ParenExpr

This is a parenthesized expression, e.g. (1). It does not include the parentheses around the condition of an if statement, etc.

Expr *getSubExpr();

SourceLocation getLParen();
SourceLocation getRParen();

SourceLocation getLocStart();
SourceLocation getLocEnd();

10.6 clang::MemberExpr

This is the member access operator (. and ->). It is for struct and union members.

Expr *getBase();
// get the member declaration to which this expression refers
ValueDecl *getMemberDecl();

DeclarationNameInfo getMemberNameInfo();
SourceLocation getOperatorLoc();
bool isArrow();
SourceLocation getMemberLoc();
SourceLocation getLocStart();
SourceLocation getLocEnd();
SourceLocation getExprLoc();

10.7 clang::UnaryOperator

This covers the unary operators except sizeof and alignof, but including postinc/postdec and various extensions.

Opcode getOpcode();
Expr *getSubExpr();
SourceLocation getOperatorLoc();
bool isPrefix();
bool isPostfix();
bool isIncrementOp();
bool isDecrementOp();
bool isIncrementDecrementOp();
bool isArithmeticOp();

SourceLocation getLocStart();
SourceLocation getLocEnd();
SourceLocation getExprLoc();

static bool isPostfix(Opcode Op);
static StringRef getOpcodeStr(Opcode Op);

10.8 clang::DeclRefExpr

A reference to a declared variable, function, enum, etc.

ValueDecl *getDecl();
DeclarationNameInfo getNameInfo();
SourceLocation getLocation();
SourceLocation getLocStart();
SourceLocation getLocEnd();

10.9 clang::ConditionalOperator

  • clang::Expr
    • clang::AbstractConditionalOperator
      • clang::ConditionalOperator
      • clang::BinaryConditionalOperator (not interesting)

This is the ?: ternary operator.

Expr *getCond();
Expr *getTrueExpr();
Expr *getFalseExpr();

// getLHS() is the true expression and getRHS() is the false expression
Expr *getLHS();
Expr *getRHS();

SourceLocation getLocStart();
SourceLocation getLocEnd();

10.10 Other Not Interesting Ones

10.10.1 clang::LambdaExpr

10.10.2 clang::IntegerLiteral

10.10.3 clang::ImplicitValueInitExpr

10.10.4 clang::InitListExpr (C++)

10.10.5 clang::ParenListExpr

10.10.6 clang::StmtExpr

This is the GNU Statement Expression extension: ({int X=4;X;}). Not very useful for me.

10.10.7 clang::StringLiteral

10.10.8 clang::TypoExpr

11 Topics

11.1 Clang AST to source code

11.1.1 clang::Rewriter

#include "clang/Rewrite/Core/Rewriter.h"

SourceManager &getSourceMgr();
void setSourceMgr(SourceManager &SM, const LangOptions &LO);

int getRangeSize(SourceRange range);
std::string getRewrittenText(SourceRange range);

bool InsertText(SourceLocation loc, StringRef str, bool InsertAfter=true, bool indentNewLines=false);
bool InsertTextAfter(SourceLocation loc, StringRef str);
bool InsertTextAfterToken(SourceLocation loc, StringRef str);
bool InsertTextBefore(SourceLocation loc, StringRef str);

bool RemoveText(SourceLocation start, unsigned length);
bool RemoveText(SourceRange range);

bool ReplaceText(SourceLocation start, unsigned OrigLength, StringRef NewStr);
bool ReplaceText(SourceRange range, StringRef NewStr);
bool ReplaceText(SourceRange range, SourceRange replacementRange);

bool IncreaseIndentation(SourceRange range, SourceLocation parentIndent);

RewriteBuffer &getEditBuffer(FileID FID);
const RewriteBuffer *getRewriteBufferFor(FileID FID) const;

buffer_iterator buffer_begin();
buffer_iterator buffer_end();
bool overwriteChangedFiles();

Usage example:

Rewriter rewriter;
// lang_opts: the LangOptions, e.g. from ASTContext::getLangOpts()
rewriter.setSourceMgr(source_manager, lang_opts);

11.2 Create AST

11.2.1 Using Compilation Database

Using a compilation database makes sure clang uses the right flags. These are usually the include paths, but also flags like -std=c99.

To get the compilation database file (compile_commands.json):

  • for a cmake project, running cmake with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON will do the job
  • for a non-cmake project, use Bear. It runs the ordinary build and intercepts the exec calls issued by the build tools. The command to run is bear make instead of make

Thus, it is possible to get the compilation database for any project as long as:

  • cmake is able to finish successfully (no missing dependencies)
  • make can finish

As an example, to use the database, invoke clang tooling by:

std::string error;
std::unique_ptr<CompilationDatabase> db =
    CompilationDatabase::loadFromDirectory("/path/to/build", error);
// or use the child class (some releases take an extra syntax argument)
std::unique_ptr<JSONCompilationDatabase> json_db =
    JSONCompilationDatabase::loadFromFile("/path/to/compile_commands.json", error);
// directly use
std::vector<std::string> sources = {"a.c", "b.c"};
ClangTool tool(*db, sources);
// or use the command line arguments
// usage: exe -p /path/to/build a.c b.c
static cl::OptionCategory MyToolCategory("my-tool options");
CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
ClangTool tool(OptionsParser.getCompilations(),
               OptionsParser.getSourcePathList());

11.2.2 From Code String

Of course, obtaining the database introduces overhead. We may only care about the header paths:

  • running runToolOnCode will use -fsyntax-only
  • system header files: I don't think libTooling default will use them, so be sure to use
  • local headers: get all the folders, and add -Ixxx flags

runToolOnCode can do this. It accepts a FrontendAction, which typically runs a RecursiveASTVisitor. Besides runToolOnCode, there is also the buildASTFromCode family (see 4.3).

11.3 LibTooling

11.3.1 Project Setup

Main File

First of all, get the CMakeLists.txt setup:

The first line:

cmake_minimum_required(VERSION 3.0)

Setting directory to lib and bin


Other setup


Thread library:

find_package (Threads)

LLVM library configuration:

message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")

Clang library setup

find_package(Clang REQUIRED CONFIG)

Troubleshooting setup

# Otherwise error: undefined reference to typeinfo for xxx

link library

link_libraries(clang clangTooling clangFrontend clangFrontendTool)
link_libraries(libclang gtest)

Add sub-directories

add_subdirectory (src)
add_subdirectory (test)

Sub-directory files

src/CMakeLists.txt to add libraries, executables

add_library (Sqr sqr.cpp sqr.h)
add_executable (demo main.cpp)
target_link_libraries (demo Sqr)

add_executable(ast ast.cpp)
add_executable(token token.cpp)
add_executable(rewriter rewriter.cpp)


The only requirement is to call enable_testing() before add_test(). The commands can be in the src-level list if there are no test source files.

add_test(NAME toktest COMMAND hetok ../test/a.c)
add_test(NAME MyTest COMMAND Test)

11.3.2 Header files

Some representative header files:

#include "clang/AST/ASTConsumer.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendAction.h"
#include "clang/Tooling/Tooling.h"
#include "clang/Frontend/FrontendActions.h"
#include "llvm/Support/CommandLine.h"
#include "clang/Tooling/CommonOptionsParser.h"

11.3.3 Entry Point

The entry point is creating the tooling::ClangTool object. Just pass argc/argv into it. Putting the command line option -- at the end when invoking the tool means it will not try to find a compilation database.

int main(int argc, const char **argv) {
  CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
  ClangTool Tool(OptionsParser.getCompilations(), OptionsParser.getSourcePathList());
  return Tool.run(newFrontendActionFactory<MyAction>().get());
}

The Tool runs some "action". This is our main logic. The action derives from ASTFrontendAction and overrides the CreateASTConsumer method.

class MyAction : public clang::ASTFrontendAction {
public:
  virtual std::unique_ptr<clang::ASTConsumer>
  CreateASTConsumer(clang::CompilerInstance &Compiler, llvm::StringRef InFile) {
    return std::unique_ptr<clang::ASTConsumer>
      (new MyConsumer(&Compiler.getASTContext()));
  }
};

The Consumer derives from ASTConsumer and overrides HandleTranslationUnit. This function is called when the whole translation unit has been parsed. The entry point of the AST is the topmost decl, obtained by Context.getTranslationUnitDecl().

The visitor's TraverseXXX(x) automatically calls WalkUpFromXXX(x) and then recursively visits the child nodes of x. Returning false from TraverseXXX or WalkUpFromXXX terminates the traversal. By default this is a pre-order traversal; overriding shouldTraversePostOrder() to return true changes it to post-order.

class MyConsumer : public clang::ASTConsumer {
public:
  explicit MyConsumer(ASTContext *Context)
    : Visitor(Context) {}
  virtual void HandleTranslationUnit(clang::ASTContext &Context) {
    Visitor.TraverseDecl(Context.getTranslationUnitDecl());
  }
private:
  MyVisitor Visitor;
};

The visitor itself implements what to do with each AST node. Override the VisitXXX method for each type of AST node of interest.

class TokenVisitor
  : public RecursiveASTVisitor<TokenVisitor> {
public:
  explicit TokenVisitor(ASTContext *Context)
    : Context(Context) {}
  // return true to continue the traversal
  bool VisitCXXRecordDecl(CXXRecordDecl *Declaration) { return true; }
  bool VisitFunctionDecl(FunctionDecl *func_decl) { return true; }
private:
  ASTContext *Context;
};
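
The pre-order traversal with early termination can be illustrated without Clang by a toy visitor that uses the same curiously recurring template pattern. This is only a sketch under that assumption: Node, ToyVisitor and NameCollector are hypothetical, and the real RecursiveASTVisitor dispatches over many node types rather than a single one.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model only: Node and ToyVisitor are hypothetical, not clang classes.
struct Node {
  std::string name;
  std::vector<Node> children;
};

template <typename Derived>
class ToyVisitor {
public:
  // Pre-order: visit the node first, then its children.
  // Returning false from VisitNode stops the whole traversal.
  bool TraverseNode(const Node &n) {
    if (!derived().VisitNode(n)) return false;
    for (const Node &child : n.children)
      if (!derived().TraverseNode(child)) return false;
    return true;
  }
  // Default hook; a subclass shadows it (static dispatch, no virtual call).
  bool VisitNode(const Node &) { return true; }

private:
  Derived &derived() { return *static_cast<Derived *>(this); }
};

// Records names in visit order and stops after seeing the name stop_at.
class NameCollector : public ToyVisitor<NameCollector> {
public:
  explicit NameCollector(std::string stop_at) : stop_at_(std::move(stop_at)) {}
  bool VisitNode(const Node &n) {
    seen.push_back(n.name);
    return n.name != stop_at_;
  }
  std::vector<std::string> seen;

private:
  std::string stop_at_;
};
```

A collector constructed with a name that never occurs walks the whole tree, while one constructed with an existing name stops as soon as that node has been visited.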

11.3.4 Location

  • Decl::getLocStart -> SourceLocation loc
  • context->getFullLoc(loc) -> FullSourceLoc full
  • full.getSpellingLineNumber()

11.3.5 APIs


// clang::Decl
SourceLocation getLocStart();
SourceLocation getLocEnd();
virtual SourceRange getSourceRange();

// clang::ASTContext
FullSourceLoc getFullLoc(SourceLocation Loc) const;
SourceManager& getSourceManager();

// clang::FullSourceLoc
unsigned getSpellingLineNumber(bool *Invalid=nullptr) const;
unsigned getSpellingColumnNumber(bool *Invalid=nullptr) const;
FileID getFileID() const;

// clang::SourceManager
FileManager& getFileManager() const;
FileID getMainFileID() const; // the file being processed
const FileEntry *getFileEntryForID(FileID FID) const;

11.4 Use As Command

  • dump ast
  • filter to only dump part of the AST
  • list ast nodes

clang -Xclang -ast-dump -fsyntax-only a.c
clang -emit-ast a.c
clang-check -ast-list lib/parser.cpp | grep AddValue
clang-check -ast-dump -ast-dump-filter=StdStringA --

12 Reference

Author: Hebi Li

Created: 2017-11-27 Mon 03:18