UPDATE: 2010-11-9 Someone else has obsoleted this page. See that page for a more up to date tutorial! :)

OK, so unfortunately, this page was written on September 28, 2008. I'm not sure which version of clang this corresponds to, but it is at this point (November 20, 2009) a year out of date. Many (all?) of the examples no longer work. The purpose of this page is to remedy that. All of these examples should work with clang 2.6. You can download the 2.6 source code for LLVM/Clang here. Let's begin!

First, make sure you have the doxygen tree for clang; it is insanely useful. Do not rely on the online docs alone: they will most likely be out of date for your version (which should be 2.6).

Tutorial 1

Not much has actually changed for the first part. We're basically trying to create a Preprocessor. Here is how I did that:

llvm::raw_stdout_ostream ost;
TextDiagnosticPrinter tdp(ost);
Diagnostic diag(&tdp);
LangOptions lang;
SourceManager sm;
FileManager fm;
HeaderSearch headers(fm);
TargetInfo *ti = TargetInfo::CreateTargetInfo(LLVM_HOSTTRIPLE);
Preprocessor pp(diag, lang, *ti, sm, headers);

I did not move the Preprocessor into a header of its own. I also found that the -fno-rtti* is no longer necessary. You'll need to do your own leg-work for the required headers (not hard, you do have the docs, don't you?)

* Actually, sometimes this option is still required (though apparently not if you're using a Mac). Thanks to Anton Lokhmotov for the catch!

Also, I found his compile line slightly awkward (mine is not much better, though) so here is the one I use:

g++ clang-test.cpp -g -fno-rtti `llvm-config --cxxflags --ldflags --libs` \
-lclangBasic -lclangLex -lclangDriver -lclangFrontend -lclangParse \
-lclangAST -lclangSema -lclangAnalysis

The llvm-config command will give you all of the LLVM libs with the --libs flag, so we only have to add the clang libraries. This code does not do anything, just like the other tutorial.


Tutorial 2: Processing a file

Add this to your file:

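const FileEntry *file = fm.getFile("foo.c");
sm.createMainFileID(file, SourceLocation());
pp.EnterMainSourceFile();

Token Tok;

do {
        // Pull the next token from the preprocessor and dump it.
        pp.Lex(Tok);
        if (diag.hasErrorOccurred())
                break;
        pp.DumpToken(Tok);
        std::cerr << std::endl;
} while (Tok.isNot(tok::eof));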
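
If you want to see the shape of that do/while-until-eof loop without building against clang, here is a minimal stand-alone sketch. ToyToken, ToyLexer, lexAll and lexExample are hypothetical names made up for this illustration; they only mimic the pull-one-token-at-a-time pattern and are nothing like clang's real Token and Preprocessor classes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; clang's Token and Preprocessor are far richer.
struct ToyToken {
    enum Kind { word, eof };
    Kind kind;
    std::string text;
    bool isNot(Kind k) const { return kind != k; }
};

class ToyLexer {
public:
    explicit ToyLexer(const std::string &src) : src_(src), pos_(0) {}

    // Fill `tok` with the next whitespace-separated word, or with an
    // eof token once the input is exhausted.
    void Lex(ToyToken &tok) {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
        if (pos_ == src_.size()) {
            tok.kind = ToyToken::eof;
            tok.text.clear();
            return;
        }
        std::size_t start = pos_;
        while (pos_ < src_.size() &&
               !std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
        tok.kind = ToyToken::word;
        tok.text = src_.substr(start, pos_ - start);
    }

private:
    std::string src_;
    std::size_t pos_;
};

// Lex first, test for eof afterwards, so the loop body also sees the
// final eof token (recorded here as "<eof>").
std::vector<std::string> lexAll(const std::string &src) {
    ToyLexer lexer(src);
    std::vector<std::string> out;
    ToyToken tok;
    do {
        lexer.Lex(tok);
        if (tok.isNot(ToyToken::eof))
            out.push_back(tok.text);
        else
            out.push_back(std::string("<eof>"));
    } while (tok.isNot(ToyToken::eof));
    return out;
}

// Usage: three words plus the trailing eof marker.
void lexExample() {
    std::vector<std::string> toks = lexAll("int x ;");
    assert(toks.size() == 4);
    assert(toks.back() == "<eof>");
}
```

The point of the do/while form is that the body runs once more for the eof token itself, which is why a dump of a real file ends with an eof entry.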

This of course assumes you have a foo.c file in the same directory. Did I mention that #include files do not work at this point? Yeah, I probably should have. And that brings us to Tutorial 3!


Tutorial 3: Include Files

C code without #include files is like gin without tonic. You can do it, but why? I don't recall ever writing a useful C file that did not have at least one. So let's correct our preprocessor so it can handle them.

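InitHeaderSearch init(headers);
// Register the default system include paths and hand them to the
// HeaderSearch object (method names as I remember them from the 2.6
// headers; check your doxygen tree).
init.AddDefaultSystemIncludePaths(lang);
init.Realize();
TargetInfo *ti = TargetInfo::CreateTargetInfo(LLVM_HOSTTRIPLE);
Preprocessor pp(diag, lang, *ti, sm, headers);

PreprocessorInitOptions ppio;
InitializePreprocessor(pp, ppio);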

Now it should be able to pull in all of the #include files and spit out all of the tokens. It will probably be pretty long if you include any files.


Tutorial 4: Parsing the file

Tokens are good, but now we need to parse the file and here's how we do that:

IdentifierTable tab(lang);
MinimalAction action(pp);
Parser p(pp, action);
p.ParseTranslationUnit();
tab.PrintStats();

The program now prints out statistics about parsing. If you want to do more than just print out the average identifier length, you'll have to keep reading...


Tutorial 5: Doing something interesting

Now for the good stuff. We can subclass MinimalAction and have it do our bidding. ActOnDeclarator gets called when declarators are discovered. This code will do fairly well and demonstrates how to weed out declarators with undesirable properties:

class MyAction : public MinimalAction {
        const Preprocessor& pp;
public:
        MyAction(Preprocessor& prep)
                : MinimalAction(prep), pp(prep) {}

        virtual Action::DeclPtrTy
        ActOnDeclarator(Scope *S, Declarator &D) {
                // Print names of global variables. Differentiating between
                // global variables and global functions is Hard in C, so this
                // is only an approximation.
                const DeclSpec& DS = D.getDeclSpec();
                SourceLocation loc = D.getIdentifierLoc();
                if (
                        // Only global declarations...
                        D.getContext() == Declarator::FileContext
                        // ...that aren't typedefs or `extern` declarations...
                        && DS.getStorageClassSpec() != DeclSpec::SCS_extern
                        && DS.getStorageClassSpec() != DeclSpec::SCS_typedef
                        // ...and no functions...
                        && !D.isFunctionDeclarator()
                        // ...and in a user header
                        && !pp.getSourceManager().isInSystemHeader(loc)
                        ) {
                        IdentifierInfo *II = D.getIdentifier();
                        std::cerr << "Found global user declarator "
                                  << II->getName() << std::endl;
                }
                return MinimalAction::ActOnDeclarator(S, D);
        }
};


Tutorial 6: Semantic Analysis

If you read the parent page you'll see that there are problems with the previous tutorial. We need to do more than simple parsing. We need to subclass ASTConsumer to help us:

class MyASTConsumer : public ASTConsumer {
public:
        virtual void HandleTopLevelDecl(DeclGroupRef D) {
                static int count = 0;
                DeclGroupRef::iterator it;
                for(it = D.begin();
                    it != D.end();
                    it++) {
                        // Skip anything that is not a variable declaration.
                        VarDecl *VD = dyn_cast<VarDecl>(*it);
                        if(!VD) continue;
                        if(VD->isFileVarDecl() &&
                           VD->getStorageClass() != VarDecl::Extern) {
                                std::cerr << "Read top-level variable decl: '"
                                          << VD->getDeclName().getAsString() << "'\n";
                        }
                }
        }
};
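
The dyn_cast<VarDecl> call above is LLVM's hand-rolled replacement for dynamic_cast, which is also why this code can be built with -fno-rtti. Here is a rough stand-alone sketch of the idea, assuming a kind tag plus a static classof() check. The mini Decl hierarchy, mini_dyn_cast, countVars and dynCastExample are hypothetical names for this illustration only, not the real clang or LLVM declarations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature declaration hierarchy; NOT clang's classes.
class Decl {
public:
    enum Kind { VarKind, FunctionKind };
    explicit Decl(Kind k) : kind_(k) {}
    Kind getKind() const { return kind_; }

private:
    Kind kind_;
};

class VarDecl : public Decl {
public:
    VarDecl() : Decl(VarKind) {}
    // Each class reports whether a given Decl is really an instance of it.
    static bool classof(const Decl *d) { return d->getKind() == VarKind; }
};

class FunctionDecl : public Decl {
public:
    FunctionDecl() : Decl(FunctionKind) {}
    static bool classof(const Decl *d) { return d->getKind() == FunctionKind; }
};

// Simplified stand-in for dyn_cast: check the kind tag, then downcast,
// returning a null pointer on mismatch. No C++ RTTI is involved.
template <typename To>
To *mini_dyn_cast(Decl *d) {
    return To::classof(d) ? static_cast<To *>(d) : 0;
}

// Count the variable declarations in an array, skipping everything else,
// in the same "cast, then continue if null" style as the consumer above.
int countVars(Decl **decls, std::size_t n) {
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        VarDecl *vd = mini_dyn_cast<VarDecl>(decls[i]);
        if (!vd) continue;
        ++count;
    }
    return count;
}

// Usage: two variables and one function in the group.
void dynCastExample() {
    VarDecl v1, v2;
    FunctionDecl f;
    Decl *decls[] = { &v1, &f, &v2 };
    assert(countVars(decls, 3) == 2);
    assert(mini_dyn_cast<VarDecl>(&f) == 0);
}
```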

We also need to make some modifications to our main function. We need to call the ParseAST function, which just so happens to call EnterMainSourceFile so we no longer need to call that ourselves. So add the following to the bottom of our main:

IdentifierTable tab(lang);
SelectorTable sel;
Builtin::Context builtins(*ti);
MyASTConsumer c;
ASTContext ctx(lang, sm, *ti, tab, sel, builtins);
ParseAST(pp, &c, ctx, false, true);


Tutorial 8: Working with the AST

The other page sort of trails off at this point. Hopefully, this page will do a little better. This was actually the part I was after, so I took it upon myself to do a little more legwork.

At any rate, he refers to the "strangely recurring pattern". He's actually referring to the curiously recurring template pattern (CRTP). The purpose of this pattern here is basically to avoid the cost of runtime polymorphism. As you may or may not know, when a class declares a function as virtual, a virtual function table is built. When said function is called, this table is used to look up the proper function at run time. That lookup is an extra indirection on every call, and it usually keeps the compiler from inlining the call. Normally this would not be problematic, but when performance is a huge concern, many people avoid virtual functions in hot code. Clang takes a different approach and uses this pattern, which allows for something called static polymorphism: the base class is a template parameterized on the derived class, so calls into the derived class are resolved at compile time. Templated classes do not sacrifice performance because the compiler effectively builds new code for each template instantiation.

If you look at the documentation, find the StmtVisitor.h file. Inside this file you'll see a very large Visit() function. To be continued...
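
To make the pattern a bit more concrete, here is a minimal stand-alone sketch of a CRTP visitor. It is only an illustration of the technique: the mini Stmt hierarchy, MiniStmtVisitor, IfFinder, describe and visitorExample are hypothetical names, and the real StmtVisitor.h is much larger and differs in detail.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature AST; these are NOT clang's classes.
struct Stmt {
    enum Kind { IfKind, ReturnKind };
    explicit Stmt(Kind k) : kind(k) {}
    Kind kind;
};
struct IfStmt : Stmt { IfStmt() : Stmt(IfKind) {} };
struct ReturnStmt : Stmt { ReturnStmt() : Stmt(ReturnKind) {} };

// CRTP base: Visit() switches on the node kind and forwards to the
// derived class through a static_cast, so no virtual call is involved.
template <typename Derived>
class MiniStmtVisitor {
public:
    std::string Visit(Stmt *s) {
        Derived *self = static_cast<Derived *>(this);
        switch (s->kind) {
        case Stmt::IfKind:
            return self->VisitIfStmt(static_cast<IfStmt *>(s));
        case Stmt::ReturnKind:
            return self->VisitReturnStmt(static_cast<ReturnStmt *>(s));
        }
        return self->VisitStmt(s);
    }

    // Defaults: fall back to the generic handler unless Derived
    // provides its own version of the more specific function.
    std::string VisitIfStmt(IfStmt *s) {
        return static_cast<Derived *>(this)->VisitStmt(s);
    }
    std::string VisitReturnStmt(ReturnStmt *s) {
        return static_cast<Derived *>(this)->VisitStmt(s);
    }
    std::string VisitStmt(Stmt *) { return "stmt"; }
};

// A visitor that only cares about `if` statements.
class IfFinder : public MiniStmtVisitor<IfFinder> {
public:
    std::string VisitIfStmt(IfStmt *) { return "if"; }
};

std::string describe(Stmt *s) {
    IfFinder finder;
    return finder.Visit(s);
}

// Usage: the IfStmt hits the override, the ReturnStmt hits the default.
void visitorExample() {
    IfStmt i;
    ReturnStmt r;
    assert(describe(&i) == "if");
    assert(describe(&r) == "stmt");
}
```

Note that nothing here is declared virtual: IfFinder "overrides" VisitIfStmt simply by hiding the base version, and the static_cast in Visit() is what picks it up.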

And that's as far as I went. Hope it was helpful. Shoot me an email if you find any errors on this page. Thanks!

