UPDATE (2010-11-09): This page has been superseded by a newer tutorial written by someone else. See that page for a more up-to-date tutorial! :)
OK, so unfortunately, the tutorial this page is based on was written on September 28, 2008. I'm not sure which version of clang it corresponds to, but at this point (November 20, 2009) it is a year out of date. Many (all?) of its examples no longer work. The purpose of this page is to remedy that. All of the examples here should work with clang 2.6, so download the 2.6 source code for LLVM/Clang before you start. Let's begin!
First, make sure you have the doxygen tree for clang; it is insanely useful. Do not rely on the online docs alone: they will most likely not match your version (which should be 2.6).
Tutorial 1
Not much has actually changed for the first part. We're basically trying to create a Preprocessor. Here is how I did that:
    llvm::raw_stdout_ostream ost;
    TextDiagnosticPrinter tdp(ost);
    Diagnostic diag(&tdp);
    LangOptions lang;
    SourceManager sm;
    FileManager fm;
    HeaderSearch headers(fm);
    TargetInfo *ti = TargetInfo::CreateTargetInfo(LLVM_HOSTTRIPLE);
    Preprocessor pp(diag, lang, *ti, sm, headers);
I did not move the Preprocessor into a header of its own. I also found that
the -fno-rtti option* is no longer necessary. You'll need to do your
own legwork to find the required headers (not hard, you do have the docs,
don't you?).
* Actually, sometimes this option is still required (though apparently not if you're using a Mac). Thanks to Anton Lokhmotov for the catch!
Also, I found the original author's compile line slightly awkward (mine is not much better, though), so here is the one I use:
    g++ clang-test.cpp -g -fno-rtti `llvm-config --cxxflags --ldflags --libs` \
        -lclangBasic -lclangLex -lclangDriver -lclangFrontend -lclangParse \
        -lclangAST -lclangSema -lclangAnalysis
The llvm-config command will give you all of the LLVM libs
with the --libs flag, so we only have to add the clang
libraries. Like the first example in the other tutorial, this code does not do anything yet.
Tutorial 2: Processing a file
Add this to your file:
    const FileEntry *file = fm.getFile("foo.c");
    sm.createMainFileID(file, SourceLocation());
    pp.EnterMainSourceFile();
    Token Tok;
    do {
      pp.Lex(Tok);
      if (diag.hasErrorOccurred())
        break;
      pp.DumpToken(Tok);
      std::cerr << std::endl;
    } while (Tok.isNot(tok::eof));
This of course assumes you have a foo.c file in the same
directory. Did I mention that #include directives do not work at this point?
Yeah, I probably should have. And that brings us to Tutorial 3!
Tutorial 3: Include Files
C code without #include files is like gin without tonic.
You can do it, but why? I don't recall ever writing a useful C file that
did not have at least one. So let's correct our preprocessor so it can
handle them.
    ...
    InitHeaderSearch init(headers);
    init.AddDefaultSystemIncludePaths(lang);
    init.Realize();
    TargetInfo *ti = TargetInfo::CreateTargetInfo(LLVM_HOSTTRIPLE);
    Preprocessor pp(diag, lang, *ti, sm, headers);
    PreprocessorInitOptions ppio;
    InitializePreprocessor(pp, ppio);
Now it should be able to pull in all of the #include files
and spit out all of the tokens. The output will probably be pretty long if
foo.c includes any headers.
Tutorial 4: Parsing the file
Tokens are good, but now we need to parse the file and here's how we do that:
    IdentifierTable tab(lang);
    MinimalAction action(pp);
    Parser p(pp, action);
    p.ParseTranslationUnit();
    tab.PrintStats();
This prints out statistics about the identifier table. If you want to do more than just print out the average identifier length, you'll have to keep reading...
Tutorial 5: Doing something interesting
Now for the good stuff. We can subclass MinimalAction and
have it do our bidding. ActOnDeclarator gets called when
declarators are discovered. This code will do fairly well and demonstrates
how to weed out declarators with undesirable properties:
    class MyAction : public MinimalAction {
      const Preprocessor& pp;
    public:
      MyAction(Preprocessor& prep)
        : MinimalAction(prep), pp(prep) {}

      virtual Action::DeclPtrTy
      ActOnDeclarator(Scope *S, Declarator &D) {
        // Print names of global variables. Differentiating between
        // global variables and global functions is Hard in C, so this
        // is only an approximation.
        const DeclSpec& DS = D.getDeclSpec();
        SourceLocation loc = D.getIdentifierLoc();
        if (
            // Only global declarations...
            D.getContext() == Declarator::FileContext
            // ...that aren't typedefs or `extern` declarations...
            && DS.getStorageClassSpec() != DeclSpec::SCS_extern
            && DS.getStorageClassSpec() != DeclSpec::SCS_typedef
            // ...and aren't functions...
            && !D.isFunctionDeclarator()
            // ...and aren't located in a system header
            && !pp.getSourceManager().isInSystemHeader(loc)
        ) {
          IdentifierInfo *II = D.getIdentifier();
          std::cerr << "Found global user declarator " << II->getName()
                    << std::endl;
        }
        // Let MinimalAction do its usual bookkeeping.
        return MinimalAction::ActOnDeclarator(S, D);
      }
    };
Tutorial 6: Semantic Analysis
If you read the parent page, you'll see that there are problems with the
previous tutorial. We need to do more than simple parsing. We need to
subclass ASTConsumer to help us:
    class MyASTConsumer : public ASTConsumer {
    public:
      virtual void HandleTopLevelDecl(DeclGroupRef D) {
        static int count = 0;
        DeclGroupRef::iterator it;
        for (it = D.begin(); it != D.end(); it++) {
          VarDecl *VD = dyn_cast<VarDecl>(*it);
          if (!VD) continue;
          if (VD->isFileVarDecl() &&
              VD->getStorageClass() != VarDecl::Extern) {
            std::cerr << "Read top-level variable decl: '"
                      << VD->getDeclName().getAsString() << "'\n";
          }
        }
      }
    };
We also need to make some modifications to our main function.
We need to call the ParseAST function, which just so happens to
call EnterMainSourceFile so we no longer need to call that
ourselves. So add the following to the bottom of our main:
    IdentifierTable tab(lang);
    SelectorTable sel;
    Builtin::Context builtins(*ti);
    MyASTConsumer c;
    ASTContext ctx(lang, sm, *ti, tab, sel, builtins);
    ParseAST(pp, &c, ctx, false, true);
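A side note on the dyn_cast<VarDecl> call in MyASTConsumer: LLVM and clang are built without C++ RTTI (hence the -fno-rtti flag), so they use their own kind-tag scheme instead of dynamic_cast. Here is a minimal sketch of that idea. Everything in the toy namespace is a simplified stand-in I made up for illustration, not the real LLVM implementation, though the real classes use a static classof() method in much the same way.

```cpp
#include <string>
#include <utility>

namespace toy {  // hypothetical stand-ins, NOT clang's classes

// Base class carries a kind tag instead of relying on C++ RTTI.
struct Decl {
    enum Kind { Var, Function };
    Decl(Kind k, std::string n) : kind(k), name(std::move(n)) {}
    Kind kind;
    std::string name;
};

struct VarDecl : Decl {
    explicit VarDecl(std::string n) : Decl(Var, std::move(n)) {}
    // Answers "is this Decl really a VarDecl?" by checking the tag.
    static bool classof(const Decl *d) { return d->kind == Var; }
};

struct FunctionDecl : Decl {
    explicit FunctionDecl(std::string n) : Decl(Function, std::move(n)) {}
    static bool classof(const Decl *d) { return d->kind == Function; }
};

// Returns the pointer cast to To* if the tag matches, otherwise null.
// This works with -fno-rtti because no typeid/dynamic_cast is involved.
template <typename To, typename From>
To *dyn_cast(From *from) {
    return To::classof(from) ? static_cast<To *>(from) : nullptr;
}

}  // namespace toy
```

With this in place, toy::dyn_cast<toy::VarDecl>(d) hands back a usable VarDecl pointer for variables and a null pointer for everything else, which is exactly the check-then-continue shape used in HandleTopLevelDecl.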
Tutorial 8: Working with the AST
The other page sort of trails off at this point. Hopefully, this page will
do a little better. This was actually the part I was after, so I took it upon
myself to do a little more legwork. At any rate, he refers to the "strangely
recurring pattern". He's actually referring to the curiously recurring
template pattern (CRTP). The purpose of this pattern is basically to speed up
polymorphism. As you may or may not know, when you declare a function as
virtual, a virtual function table is built. When said function is
called, this table is used to look up the proper function at run time. That
lookup costs an indirect call and usually prevents inlining. Normally this
would not be problematic, but when performance is a huge concern, many people
simply avoid virtual functions. Clang takes a different approach and uses this
pattern, which allows for something called static polymorphism: the base class
is a template parameterized on the derived class, so calls into the derived
class are resolved at compile time. Templated classes do not sacrifice
performance because the compiler effectively builds new code for each
template instantiation. To see this in clang, find the StmtVisitor.h file in
the documentation. Inside this file you'll see a very large
Visit() function. To be continued...
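To make the pattern a bit more concrete, here is a small self-contained sketch of a CRTP visitor. The names (ToyStmtVisitor, Stmt, StmtKind, ReturnFinder) are made up for this example and the real StmtVisitor handles far more statement kinds, so treat this as an illustration of the dispatch idea under those assumptions, not as clang's actual code.

```cpp
#include <string>

// Hypothetical miniature statement type; NOT clang's Stmt.
enum class StmtKind { If, Return };

struct Stmt {
    StmtKind kind;
};

// CRTP base: the derived class passes itself as the template argument,
// so the casts below pick the handler at compile time, with no vtable.
template <typename Derived>
class ToyStmtVisitor {
public:
    // Central dispatch: switch on the statement kind and forward to the
    // matching handler on the derived class.
    std::string Visit(const Stmt &s) {
        Derived &self = static_cast<Derived &>(*this);
        switch (s.kind) {
        case StmtKind::If:     return self.VisitIfStmt(s);
        case StmtKind::Return: return self.VisitReturnStmt(s);
        }
        return self.VisitStmt(s);
    }

    // Default handlers fall back to the generic VisitStmt. A derived
    // class overrides only the ones it cares about (by hiding them).
    std::string VisitIfStmt(const Stmt &s) {
        return static_cast<Derived &>(*this).VisitStmt(s);
    }
    std::string VisitReturnStmt(const Stmt &s) {
        return static_cast<Derived &>(*this).VisitStmt(s);
    }
    std::string VisitStmt(const Stmt &) { return "stmt"; }
};

// Only return statements get special treatment here.
class ReturnFinder : public ToyStmtVisitor<ReturnFinder> {
public:
    std::string VisitReturnStmt(const Stmt &) { return "return"; }
};
```

Calling Visit() on a ReturnFinder with a Return statement reaches ReturnFinder::VisitReturnStmt, while an If statement falls through to the default VisitStmt. None of these functions is virtual, so the compiler is free to inline the whole chain.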
And that's as far as I went. Hope it was helpful. Shoot me an email if
you find any errors on this page. Thanks!
-Justin