share
Stack OverflowLearning to write a compiler
[+699] [40] Anton
[2008-08-04 22:46:36]
[ compiler-construction language-agnostic ]
[ https://stackoverflow.com/questions/1669/learning-to-write-a-compiler ]

Preferred languages: C/C++, Java, and Ruby.

I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby, so I prefer resources that involve one of those three, but any good resource is acceptable.

ANTLR all the way. All the resources proposed below looks like an overkill to me. ANTLR is always a compiler designer best friend. A - A_Var
If your main focus is to learn how compiling ideas work in general - you can check and SICP short for Structured Interpretation of Computer program based in Scheme ( List) but teaches the general principles . mitpress.mit.edu/sicp . I was recommended this book by a veteran who works for a company and does these works compilation and interpretation for a living ! - Nishant
A shameless plug: my answer on a similar question. - 9000
I wrote an article on creating a compiler on my blog: orangejuiceliberationfront.com/how-to-write-a-compiler It focuses on the very basics and getting started, really. There's a bunch more compiler/codegen/parser/language design-related articles on there. - uliwitness
[+1034] [2008-08-04 22:52:00] Michael Stum [ACCEPTED]

Big List of Resources:

Legend:

  • ¶ Link to a PDF file
  • $ Link to a printed book
[1] http://www.cs.indiana.edu/~dyb/pubs/nano-jfp.pdf
[2] https://rads.stackoverflow.com/amzn/click/1558603204
[3] http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf
[4] http://javadude.com/articles/antlr3xtut
[5] http://www.diku.dk/~torbenm/Basics/
[6] http://www.onlamp.com/pub/a/onlamp/2004/04/15/parrot_compiler_construction.html
[7] http://www.cs.man.ac.uk/~pjj/farrell/compmain.html
[8] https://rads.stackoverflow.com/amzn/click/0201403536
[9] https://rads.stackoverflow.com/amzn/click/0442275366
[10] https://rads.stackoverflow.com/amzn/click/0805321667
[11] http://craftinginterpreters.com/
[12] http://www.holub.com/software/compiler.design.in.c.html
[13] https://rads.stackoverflow.com/amzn/click/0321486811
[14] http://en.wikipedia.org/wiki/Compilers:_Principles%2C_Techniques%2C_and_Tools
[15] https://rads.stackoverflow.com/amzn/click/012088478X
[16] http://www.cs.indiana.edu/eopl/
[17] http://flipcode.com/archives/articles.shtml
[18] https://rads.stackoverflow.com/amzn/click/1931841578
[19] http://www.codeproject.com/KB/recipes/B32Machine1/VMCS.pdf
[20] http://research.microsoft.com/~simonpj/papers/pj-lester-book/
[21] http://www1.digitalgrammars.com/ipl-book/
[22] http://www.codeproject.com/KB/recipes/programminglanguagetoools.aspx
[23] http://en.wikipedia.org/wiki/Interpreter_pattern
[24] https://rads.stackoverflow.com/amzn/click/0201633612
[25] http://pragprog.com/titles/tpdsl/language-implementation-patterns
[26] http://compilers.iecc.com/crenshaw/
[27] http://www.stack.nl/~marcov/compiler.pdf
[28] http://books.google.com/books?id=Id9cYsIdjIwC&lpg=PP1&ots=IxFkFWJ-8V&dq=%22linkers%20and%20loaders%22&pg=PA215#v=onepage&q=%22linkers%20and%20loaders%22&f=false
[29] https://rads.stackoverflow.com/amzn/click/0521562473
[30] http://llvm.org/docs/tutorial/
[31] https://rads.stackoverflow.com/amzn/click/0521607647
[32] https://rads.stackoverflow.com/amzn/click/052182060X
[33] https://rads.stackoverflow.com/amzn/click/0521607655
[34] https://rads.stackoverflow.com/amzn/click/013630740X
[35] http://www.dickgrune.com/Books/PTAPG_1st_Edition/
[36] http://www-old.oberon.ethz.ch/WirthPubl/ProjectOberon.pdf
[37] https://rads.stackoverflow.com/amzn/click/0137302673
[38] http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/
[39] http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AITR-474.pdf
[40] http://web.archive.org/web/20141221110345/http://cm.bell-labs.com/who/ken/trust.html
[41] http://msdn.microsoft.com/en-us/magazine/cc136756.aspx
[42] http://mitpress.mit.edu/sicp/
[43] http://www.cis.upenn.edu/~bcpierce/tapl/
[44] http://prog21.dadgum.com/30.html
[45] http://www.hokstad.com/writing-a-compiler-in-ruby-bottom-up-step-1.html

(19) I've read Let's Build a Compiler [compilers.iecc.com/crenshaw/] series, it is really nice writeup and is a good starting point. - TheVillageIdiot
(4) I think one worth mentioning is Coursera's compilers course. It has nice videos and walks through creating a java like language / simple compiler. Coursera Compilers Link - QuantumKarl
(1) I wanted to keep this answer posted to being as original as possible so I decided to post this reference here: tutorialspoint.com/compiler_design/index.htm What I liked about this site is that it doesn't get involved with actually writing any code to create a compiler, but it does break down the compiler into its parts: phases and stages. It does describe the logic and algorithmic design approach without any specific language paradigm as it expresses the notations of an arbitrary language and alphabet. It is a quick read, but gives you the concepts of what is needed for each part. - Francis Cugler
1
[+69] [2009-07-20 23:01:20] mrduclaw

This is a pretty vague question, I think; just because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however; a top-half and a bottom-one. The top-half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform specific code generation.

Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.

Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and go to generating some "code" (that intermediate representation I mentioned). So it will take your source program and convert it to another representation and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the the bottom half will take your intermediate representation and generate a PE executable.

Some books on this topic that I found particularly helpful was Compilers Principles and Techniques [1] (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers Context-Free Grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And uninterestingly enough, the book called " lex and yacc [2]" picked up where the Dragon Book left off for this part.

[1] http://rads.stackoverflow.com/amzn/click/0321486811
[2] http://rads.stackoverflow.com/amzn/click/1565920007

2
[+55] [2008-08-10 07:54:32] Dominic Cooney

I think Modern Compiler Implementation in ML [1] is the best introductory compiler writing text. There's a Java version [2] and a C version [3] too, either of which might be more accessible given your languages background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and single static assignment form) into relatively little space (~500 pages).

I prefer Modern Compiler Implementation to the Dragon book because Modern Compiler implementation surveys less of the field--instead it has really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.

I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. [4] It is available online [5] as a PDF. I find Wirth's programming aesthetic simply beautiful, however some people find his style too minimal (for example Wirth favors recursive descent parsers, but most CS courses focus on parser generator tools; Wirth's language designs are fairly conservative.) Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not or not, I highly recommend reading this book.

[1] http://rads.stackoverflow.com/amzn/click/0521607647
[2] http://rads.stackoverflow.com/amzn/click/052182060X
[3] http://rads.stackoverflow.com/amzn/click/0521607655
[4] http://rads.stackoverflow.com/amzn/click/0201403536
[5] http://www-old.oberon.ethz.ch/WirthPubl/CBEAll.pdf

Compiler Construction PDF ethoberon.ethz.ch/WirthPubl/CBEAll.pdf - matepal297
I strongly recommend against the C version of "Modern Compiler Implementation", it's crippled by low-level details due to C. It completely clutters the book. Java 1st is not too good as its OO design is poor, Java 2nd ed is no longer about the Tiger language. So I strongly recommend the ML one: it is not necessary to be fluent in ML to understand it. ML is definitely well suited for the job. - akim
3
[+46] [2008-08-04 23:08:18] user316

I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.

If you want a book that is lighter on theory, Game Scripting Mastery [1] might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.

It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.

[1] http://rads.stackoverflow.com/amzn/click/1931841578

(1) Game Scripting Mastery is a great learning resource because when you're done you will have a playable, scriptable 2D adventure game. This makes every exercise focused on a specific purpose, and keeps the reader motivated. - Dour High Arch
(1) Dragon is a bit overly focussed on grammar based parsing. If you are not trying to parse something sheer impossible like C++ or so using parser generators, but can use e.g. a handcrafted LL grammar you might want to look out for something that treats a higher percentage compiler fields other than grammar transformation and proving - Marco van de Voort
4
[+28] [2008-08-04 22:56:30] saniul

"Let's Build a Compiler" [1] is awesome, but it's a bit outdated. (I'm not saying it makes it even a little bit less valid.)

Or check out SLANG [2]. This is similar to "Let's Build a Compiler" but is a much better resource especially for beginners. This comes with a pdf tutorial which takes a 7 step approach at teaching you a compiler. Adding the quora link as it have the links to all the various ports of SLANG, in C++, Java and JS, also interpreters in python and java, originally written using C# and the .NET platform.

[1] http://compilers.iecc.com/crenshaw/
[2] https://www.quora.com/As-a-self-taught-programmer-how-can-I-learn-about-compilers/answer/Akhil-Kooliyatt?srid=TAGM

(4) I agree that this series is a bit outdated, although it is still useful. However, my biggest gripe with it is the fact that it tries to output straight to assembly language rather than building any type of parse tree, which means (contrary to what is stated in the first article) that it isn't very useful for writing an interpreter. - a_m0d
5
[+24] [2008-08-04 23:13:49] Peter Burns

If you're looking to use powerful, higher level tools rather than building everything yourself, going through the projects and readings for this course [1] is a pretty good option. It's a languages course by the author of the Java parser engine ANTLR. You can get the book for the course as a PDF from the Pragmatic Programmers [2].

The course goes over the standard compiler compiler stuff that you'd see elsewhere: parsing, types and type checking, polymorphism, symbol tables, and code generation. Pretty much the only thing that isn't covered is optimizations. The final project is a program that compiles a subset of C [3]. Because you use tools like ANTLR and LLVM, it's feasible to write the entire compiler in a single day (I have an existence proof of this, though I do mean ~24 hours). It's heavy on practical engineering using modern tools, a bit lighter on theory.

LLVM, by the way, is simply fantastic. Many situations where you might normally compile down to assembly, you'd be much better off compiling to LLVM's Intermediate Representation [4] instead. It's higher level, cross platform, and LLVM is quite good at generating optimized assembly from it.

[1] http://www.antlr.org/wiki/display/CS652/CS652+Home
[2] http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
[3] http://www.antlr.org/wiki/display/CS652/C+subset+compiler
[4] http://llvm.org/docs/LangRef.html

The first link is dead. - Lynn
6
[+20] [2010-08-28 23:14:41] jochenleidner

If you have little time, I recommend Niklaus Wirth's "Compiler Construction" (Addison-Wesley. 1996) [1], a tiny little booklet that you can read in a day, but it explains the basics (including how to implement lexers, recursive descent parsers, and your own stack-based virtual machines). After that, if you want a deep dive, there's no way around the Dragon book as other commenters suggest.

[1] http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

Father of Pascal - Orhan Cinar
If you have not much time, don' write a compiler. - Ingo
7
[+17] [2009-07-20 22:47:04] Zachary Murray

You might want to look into Lex/Yacc (or Flex/Bison, whatever you want to call them). Flex is a lexical analyzer, which will parse and identify the semantic components ("tokens") of your language, and Bison will be used to define what happens when each token is parsed. This could be, but is definitely not limited to, printing out C code, for a compiler that would compile to C, or dynamically running the instructions.

This FAQ [1] should help you, and this tutorial [2] looks quite useful.

[1] http://tldp.org/HOWTO/Lex-YACC-HOWTO-5.html
[2] http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/

8
[+16] [2009-07-21 10:37:00] user141335

Generally speaking, there's no five minutes tutorial for compilers, because it's a complicated topic and writing a compiler can take months. You will have to do your own search.

Python and Ruby are usually interpreted. Perhaps you want to start with an interpreter as well. It's generally easier.

The first step is to write a formal language description, the grammar of your programming language. Then you have to transform the source code that you want to compile or interpret according to the grammar into an abstract syntax tree, an internal form of the source code that the computer understands and can operate on. This step is usually called parsing and the software that parses the source code is called a parser. Often the parser is generated by a parser generator which transform a formal grammar into source oder machine code. For a good, non-mathematical explanation of parsing I recommend Parsing Techniques - A Practical Guide. Wikipedia has a comparison of parser generators from which you can choose that one that is suitable for you. Depending on the parser generator you chose, you will find tutorials on the Internet and for really popular parser generators (like GNU bison) there are also books.

Writing a parser for your language can be really hard, but this depends on your grammar. So I suggest to keep your grammar simple (unlike C++); a good example for this is LISP.

In the second step the abstract syntax tree is transformed from a tree structure into a linear intermediate representation. As a good example for this Lua's bytecode is often cited. But the intermediate representation really depends on your language.

If you are building an interpreter, you will simply have to interpret the intermediate representation. You could also just-in-time-compile it. I recommend LLVM and libjit for just-in-time-compilation. To make the language usable you will also have to include some input and output functions and perhaps a small standard library.

If you are going to compile the language, it will be more complicated. You will have to write backends for different computer architectures and generate machine code from the intermediate representation in those backends. I recommend LLVM for this task.

There are a few books on this topic, but I can recommend none of them for general use. Most of them are too academic or too practical. There's no "Teach yourself compiler writing in 21 days" and thus, you will have to buy several books to get a good understanding of this entire topic. If you search the Internet, you will come across some some online books and lecture notes. Maybe there's a university library nearby you where you can borrow books on compilers.

I also recommend a good background knowledge in theoretical computer science and graph theory, if you are going to make your project serious. A degree in computer science will also be helpful.


++ You're right that it's good to know all those things, and it can be a big job, but I also learned from some experts how not to make things a big deal. It's good to know things, and it's even better to know when not to use them, which is most of the time. - Mike Dunlavey
9
[+14] [2010-05-17 23:38:29] Taylor Leese

Take a look at the book below. The author is the creator of ANTLR [1].

Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages [2].

alt text

[1] http://www.antlr.org/
[2] http://pragprog.com/titles/tpdsl/language-implementation-patterns

10
[+11] [2008-08-18 20:18:32] Ben Combee

One book not yet suggested but very important is "Linkers and Loaders" [1] by John Levine. If you're not using an external assembler, you'll need a way to output a object file that can be linked into your final program. Even if you're using an external assembler, you'll probably need to understand relocations and how the whole program loading process works to make a working tool. This book collects a lot of the random lore around this process for various systems, including Win32 and Linux.

[1] http://books.google.com/books?id=h34d_jr2iikC&dq=%22linkers+and+loaders%22&pg=PP1&ots=IxFkFWJ-8V&sig=GSlclmkezTRL6YYguGJmZsnkM3c&hl=en&sa=X&oi=book_result&resnum=1&ct=result

11
[+10] [2008-08-05 16:16:01] Chris Bunch

The Dragon Book [1] is definitely the "building compilers" book, but if your language isn't quite as complicated as the current generation of languages, you may want to look at the Interpreter pattern from Design Patterns [2].

The example in the book designs a regular expression-like language and is well thought through, but as they say in the book, it's good for thinking through the process but is effective really only on small languages. However, it is much faster to write an Interpreter for a small language with this pattern than having to learn about all the different types of parsers, yacc and lex, et cetera...

[1] http://rads.stackoverflow.com/amzn/click/0321486811
[2] http://rads.stackoverflow.com/amzn/click/0201633612

12
[+10] [2008-08-20 10:01:17] wvdschel

If you're willing to use LLVM, check this out: http://llvm.org/docs/tutorial/. It teaches you how to write a compiler from scratch using LLVM's framework, and doesn't assume you have any knowledge about the subject.

The tutorial suggest you write your own parser and lexer etc, but I advise you to look into bison and flex once you get the idea. They make life so much easier.


But the documentation for setting it up of Visual Studio is badly written, plus no examples - SpicyWeenie
13
[+10] [2010-08-09 18:33:42] Lothar

I found the Dragon book much too hard to read with too much focus on language theory that is not really required to write a compiler in practice.

I would add the Oberon [1] book which contains the full source of an amazingly fast and simple Oberon compiler Project Oberon [2].

Alt text

[1] http://en.wikipedia.org/wiki/Oberon_%28programming_language%29
[2] http://www.amazon.de/Project-Oberon-Design-Operating-Compiler/dp/0201544288/ref=sr_1_3?ie=UTF8&s=books-intl-de&qid=1281378762&sr=8-3

14
[+9] [2008-08-20 11:28:58] bootload

"... Let's Build a Compiler ..."

I'd second http://compilers.iecc.com/crenshaw/ by @sasb [1]. Forget buying more books for the moment.

Why? Tools & language.

The language required is Pascal and if I remember correctly is based on Turbo-Pascal. It just so happens if you go to http://www.freepascal.org/ and download the Pascal compiler all the examples work straight from the page ~ http://www.freepascal.org/download.var The beaut thing about Free Pascal is you can use it almost whatever processor or OS you can care for.

Once you have mastered the lessons then try the more advanced " Dragon Book [2]" ~ http://en.wikipedia.org/wiki/Dragon_book

[1] https://stackoverflow.com/questions/1669/learning-to-write-a-compiler#1678
[2] http://en.wikipedia.org/wiki/Dragon_book

15
[+9] [2008-12-30 23:01:20] dbones

I am looking into the same concept, and found this promising article by Joel Pobar,

Create a Language Compiler for the .NET Framework - not sure where this has gone [1]

Create a Language Compiler for the .NET Framework - pdf copy of the original doc [2]

he discusses a high level concept of a compiler and proceeds to invent his own langauge for the .Net framework. Although its aimed at the .Net Framework, many of the concepts should be able to be reproduced. The Article covers:

  1. Langauge definition
  2. Scanner
  3. Parser (the bit im mainly interested in)
  4. Targeting the .Net Framework The
  5. Code Generator

there are other topics, but you get the just.

Its aimed to people starting out, written in C# (not quite Java)

HTH

bones

[1] http://msdn.microsoft.com/en-us/magazine/cc136756.aspx
[2] http://msdn.microsoft.com/en-us/magazine/cc136756.aspx

What does "not quite Java" mean? - Hejazzman
haha, sorry, i meant its written for .Net, which in principal is similar to java. Both are JIT in style. :) - dbones
16
[+9] [2009-08-06 22:37:05] Pandafox

I remember asking this question about seven years ago when I was rather new to programming. I was very careful when I asked and surprisingly I didn't get as much criticism as you are getting here. They did however point me in the direction of the " Dragon Book [1]" which is in my opinion, a really great book that explains everything you need to know to write a compiler (you will of course have to master a language or two. The more languages you know, the merrier.).

And yes, many people say reading that book is crazy and you won't learn anything from it, but I disagree completely with that.

Many people also say that writing compilers is stupid and pointless. Well, there are a number of reasons why compiler development are useful: - Because it's fun. - It's educational, when learning how to write compilers you will learn a lot about computer science and other techniques that are useful when writing other applications. - If nobody wrote compilers the existing languages wouldn't get any better.

I didn't write my own compiler right away, but after asking I knew where to start. And now, after learning many different languages and reading the Dragon Book, writing isn't that much of a problem. (I'm also studying computer engineering atm, but most of what I know about programming is self taught.)

In conclusion: - The Dragon Book is a great "tutorial". But spend some time mastering a language or two before attempting to write a compiler. Don't expect to be a compiler guru within the next decade or so though.

The book is also good if you want to learn how to write parsers/interpreters.

[1] http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools

17
[+8] [2008-08-20 09:56:34] Peter Stuifzand

An easy way to create a compiler is to use bison and flex (or similar), build a tree (AST) and generate code in C. With generating C code being the most important step. By generating C code, your language will automatically work on all platforms that have a C compiler.

Generating C code is as easy as generating HTML (just use print, or equivalent), which in turn is much easier than writing a C parser or HTML parser.


18
[+8] [2008-10-05 16:00:52] Kragen Javier Sitaker

You should check out Darius Bacon's " ichbins [1]", which is a compiler for a small Lisp dialect, targeting C, in just over 6 pages of code. The advantage it has over most toy compilers is that the language is complete enough that the compiler is written in it. (The tarball also includes an interpreter to bootstrap the thing.)

There's more stuff about what I found useful in learning to write a compiler on my Ur-Scheme [2] web page.

[1] https://github.com/darius/ichbins
[2] http://www.canonical.org/~kragen/sw/urscheme/

19
[+8] [2010-06-26 20:17:04] joe snyder

From the comp.compilers FAQ [1]:

"Programming a Personal Computer" by Per Brinch Hansen Prentice-Hall 1982 ISBN 0-13-730283-5

This unfortunately-titled book explains the design and creation of a single-user programming environment for micros, using a Pascal-like language called Edison. The author presents all source code and explanations for the step-by-step implementation of an Edison compiler and simple supporting operating system, all written in Edison itself (except for a small supporting kernel written in a symbolic assembler for PDP 11/23; the complete source can also be ordered for the IBM PC).

The most interesting things about this book are: 1) its ability to demonstrate how to create a complete, self-contained, self-maintaining, useful compiler and operating system, and 2) the interesting discussion of language design and specification problems and trade-offs in Chapter 2.

"Brinch Hansen on Pascal Compilers" by Per Brinch Hansen Prentice-Hall 1985 ISBN 0-13-083098-4

Another light-on-theory heavy-on-pragmatics here's-how-to-code-it book. The author presents the design, implementation, and complete source code for a compiler and p-code interpreter for Pascal- (Pascal "minus"), a Pascal subset with boolean and integer types (but no characters, reals, subranged or enumerated types), constant and variable definitions and array and record types (but no packed, variant, set, pointer, nameless, renamed, or file types), expressions, assignment statements, nested procedure definitions with value and variable parameters, if statements, while statements, and begin-end blocks (but no function definitions, procedural parameters, goto statements and labels, case statements, repeat statements, for statements, and with statements).

The compiler and interpreter are written in Pascal* (Pascal "star"), a Pascal subset extended with some Edison-style features for creating software development systems. A Pascal* compiler for the IBM PC is sold by the author, but it's easy to port the book's Pascal- compiler to any convenient Pascal platform.

This book makes the design and implementation of a compiler look easy. I particularly like the way the author is concerned with quality, reliability, and testing. The compiler and interpreter can easily be used as the basis for a more involved language or compiler project, especially if you're pressed to quickly get something up and running.

[1] http://www.faqs.org/faqs/compilers/faq/

20
[+7] [2008-08-12 11:25:34] yeruham

Python comes bundled with a python compiler written in Python. You can see the source code, and it includes all phases, from parsing, abstract syntax tree, emitting code, etc. Hack it.


21
[+7] [2008-09-16 16:21:23] mfx

The LCC compiler ( wikipedia [1]) ( project homepage [2]) of Fraser and Hanson is described in their book "A Retargetable C Compiler: Design and Implementation". It is quite readable and explains the whole compiler, down to code generation.

[1] http://en.wikipedia.org/wiki/Local_C_compiler
[2] http://www.cs.princeton.edu/software/lcc/

This seems like an extremely good resource thanks. - gideon
22
[+7] [2009-04-25 17:23:55] eKek0

Sorry, it is in Spanish, but this is the bibliography of a course called "Compiladores e Intérpretes" (Compilers and Interpreters) in Argentina.

The course was from formal language theory to compiler construction, and these are the topics you need to build, at least, a simple compiler:

  • Compilers Design in C.
    Allen I. Holub

    Prentice-Hall. 1990.

  • Compiladores. Teoría y Construcción.
    Sanchís Llorca, F.J. , Galán Pascual, C. Editorial Paraninfo. 1988.

  • Compiler Construction.
    Niklaus Wirth

    Addison-Wesley. 1996.

  • Lenguajes, Gramáticas y Autómatas. Un enfoque práctico.
    Pedro Isasi Viñuela, Paloma Martínez Fernández, Daniel Borrajo Millán. Addison-Wesley Iberoamericana (España). 1997.

  • The art of compiler design. Theory and practice.
    Thomas Pittman, James Peters.

    Prentice-Hall. 1992.

  • Object-Oriented Compiler Construction.
    Jim Holmes.
    Prentice Hall, Englewood Cliffs, N.J. 1995

  • Compiladores. Conceptos Fundamentales.
    B. Teufel, S. Schmidt, T. Teufel.

    Addison-Wesley Iberoamericana. 1995.

  • Introduction to Automata Theory, Languages, and Computation.

    John E. Hopcroft. Jeffref D. Ullman.
    Addison-Wesley. 1979.

  • Introduction to formal languages.
    György E. Révész.

    Mc Graw Hill. 1983.

  • Parsing Techniques. A Practical Guide.
    Dick Grune, Ceriel Jacobs.
    Impreso por los autores. 1995
    http://www.cs.vu.nl/~dick/PTAPG.html

  • Yacc: Yet Another Compiler-Compiler.
    Stephen C. Johnson
    Computing Science Technical Report Nº 32, 1975. Bell Laboratories. Murray Hill, New
    Jersey.

  • Lex: A Lexical Analyzer Generator.
    M. E. Lesk, E. Schmidt. Computing Science Technical Report Nº 39, 1975. Bell Laboratories. Murray Hill, New Jersey.

  • lex & yacc.
    John R. Levine, Tony Mason, Doug Brown.
    O’Reilly & Associates. 1995.

  • Elements of the theory of computation.
    Harry R. Lewis, Christos H. Papadimitriou. Segunda Edición. Prentice Hall. 1998.

  • Un Algoritmo Eficiente para la Construcción del Grafo de Dependencia de Control.
    Salvador V. Cavadini.
    Trabajo Final de Grado para obtener el Título de Ingeniero en Computación.
    Facultad de Matemática Aplicada. U.C.S.E. 2001.


23
[+6] [2009-07-20 22:44:40] Sam Harwell
  1. This is a vast subject. Do not underestimate this point. And do not underestimate my point to not underestimate it.
  2. I hear the Dragon Book [1] is a (the?) place to start, along with searching. :) Get better at searching, eventually it will be your life.
  3. Building your own programming language is absolutely a good exercise! But know that it will never be used for any practical purpose in the end. Exceptions to this are few and very far between.
[1] http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools

(4) If you haven't read the Dragon book. Please don't recommend it. In fact, have you ever implemented a compiler? - anon
Yeah, as the name implies, the Dragon Book is a monster. Very in-depth, but a very good resource nonetheless. I wouldn't recommend it for beginners, though... - Zachary Murray
(2) @Neil: You haven't google'd me, have you? lol. blog.280z28.org But no, I haven't read that book. - Sam Harwell
I'm reading it (the dragon book) presently, and also Lex/Yacc at the same time, I'm finding the book quite good. Personally. - Simeon Pilgrim
If you like it all well and good. My problem is with people that blindly recommend it whenever the word "compiler" is mentioned. Particularly if they haven't actually read it! - anon
Neil, what do you find bad about the book? I haven't read it yet, I just keep hearing it's a good book, until now. - GManNickG
(1) To be fair, I prefaced it with "I hear...". :) #1 and #3 are the points I feel are extremely important to know going in but aren't mentioned as often. - Sam Harwell
@GMan I personally find it unreadable & I disagree with its approach. Aho (main author) has always believed in the table-driven, automated approach to compilers. I disagree, particularly when teaching people how to write them. Having said that, as I hated whatever edition I've got (can't be arsed looking, but early I think) , the later editions may be worthwhile, I suppose. - anon
It's still worth reading the Dragon Book even if you disagree with its approach. Compiler design is a very sticky subject and it's important to understand all the strange issues one has to contend with. - Ether
24
[+6] [2009-08-28 00:01:07] Ira Baxter

Not a book, but a technical paper and an enormously fun learning experience if you want to know more about compilers (and metacompilers)... This website walks you through building a completely self-contained compiler system that can compile itself and other languages:

Tutorial: Metacompilers Part 1 [1]

This is all based on an amazing little 10-page technical paper:

Val Schorre META II: A Syntax-Oriented Compiler Writing Language

from honest-to-god 1964. I learned how to build compilers from this back in 1970. There's a mind-blowing moment when you finally grok how the compiler can regenerate itself....

I know the website author from my college days, but I have nothing to do with the website.

[1] http://www.bayfronttechnologies.com/mc_tutorial.html

As others say, is BIG argument, I think sushi a task is a final work for bachelor, it requires to know a LOT of concepts of math, computer science and so on. - ingconti
If you don't know these topics, you shouldn't really be trying to build a serious compiler. However, if you have 2-3 years undergraduate computer science education (programming, data structures, assembly language), the MetaII paper will work for you. - Ira Baxter
25
[+5] [2008-09-18 23:31:14] tovare

There's a lot of good answers here, so i thought I'd just add one more to the list:

I got a book called Project Oberon more than a decade ago, which has some very well written text on the compiler. The book really stands out in the sense that the source and explanations is very hands on and readable. The complete text (the 2005 edition) has been made available in pdf, so you can download right now. The compiler is discussed in chapter 12:

http://www-old.oberon.ethz.ch/WirthPubl/ProjectOberon.pdf

Niklaus Wirth, Jürg Gutknecht

(The treatment is not as extensive as his book on compilers)

I've read several books on compilers, and i can second the dragon book, time spent on this book is very worthwhile.


26
[+5] [2008-10-01 09:30:26] Mark Reid

If you are interested in writing a compiler for a functional language (rather than a procedural one) Simon Peyton-Jones and David Lester's " Implementing functional languages: a tutorial [1]" is an excellent guide.

The conceptual basics of how functional evaluation works is guided by examples in a simple but powerful functional language called "Core". Additionally, each part of the Core language compiler is explained with code examples in Miranda (a pure functional language very similar to Haskell).

Several different types of compilers are described but even if you only follow the so-called template compiler for Core you will have an excellent understanding of what makes functional programming tick.

[1] http://research.microsoft.com/~simonpj/papers/pj-lester-book/

27
[+4] [2008-08-22 15:57:16] dmckee

I liked the Crenshaw tutorial [1] too, because it makes it absolutely clear that a compiler is just another program that reads some input and writes some out put.

Read it.

Work it if you want, but then look at another reference on how bigger and more complete compilers are really written.

And read On Trusting Trust [2], to get a clue about the unobvious things that can be done in this domain.

[1] http://compilers.iecc.com/crenshaw/
[2] http://cm.bell-labs.com/who/ken/trust.html

28
[+4] [2011-07-14 15:42:33] timaschew

You can use BCEL [1] by the Apache Software Foundation. With this tool you can generate assembler-like code, but it's Java with the BCEL API. You can learn how you can generate intermediate language code (in this case byte code).

Simple example

  1. Create a Java class with this function:

    public String maxAsString(int a, int b) {
        if (a > b) {
            return Integer.valueOf(a).toString();
        } else if (a < b) {
            return Integer.valueOf(b).toString();
        } else {
            return "equals";
        }
    }
    

Now run BCELifier with this class

BCELifier bcelifier = new BCELifier("MyClass", System.out);
bcelifier.start();

You can see the result on the console for the whole class (how to build byte code MyClass.java). The code for the function is this:

private void createMethod_1() {
  InstructionList il = new InstructionList();
  MethodGen method = new MethodGen(ACC_PUBLIC, Type.STRING, new Type[] { Type.INT, Type.INT }, new String[] { "arg0", "arg1" }, "maxAsString", "MyClass", il, _cp);

  il.append(InstructionFactory.createLoad(Type.INT, 1)); // Load first parameter to address 1
  il.append(InstructionFactory.createLoad(Type.INT, 2)); // Load second parameter to adress 2
    BranchInstruction if_icmple_2 = InstructionFactory.createBranchInstruction(Constants.IF_ICMPLE, null); // Do if condition (compare a > b)
  il.append(if_icmple_2);
  il.append(InstructionFactory.createLoad(Type.INT, 1)); // Load value from address 1 into the stack
  il.append(_factory.createInvoke("java.lang.Integer", "valueOf", new ObjectType("java.lang.Integer"), new Type[] { Type.INT }, Constants.INVOKESTATIC));
  il.append(_factory.createInvoke("java.lang.Integer", "toString", Type.STRING, Type.NO_ARGS, Constants.INVOKEVIRTUAL));
  il.append(InstructionFactory.createReturn(Type.OBJECT));
  InstructionHandle ih_13 = il.append(InstructionFactory.createLoad(Type.INT, 1));
  il.append(InstructionFactory.createLoad(Type.INT, 2));
    BranchInstruction if_icmpge_15 = InstructionFactory.createBranchInstruction(Constants.IF_ICMPGE, null); // Do if condition (compare a < b)
  il.append(if_icmpge_15);
  il.append(InstructionFactory.createLoad(Type.INT, 2));
  il.append(_factory.createInvoke("java.lang.Integer", "valueOf", new ObjectType("java.lang.Integer"), new Type[] { Type.INT }, Constants.INVOKESTATIC));
  il.append(_factory.createInvoke("java.lang.Integer", "toString", Type.STRING, Type.NO_ARGS, Constants.INVOKEVIRTUAL));
  il.append(InstructionFactory.createReturn(Type.OBJECT));
  InstructionHandle ih_26 = il.append(new PUSH(_cp, "equals")); // Return "equals" string
  il.append(InstructionFactory.createReturn(Type.OBJECT));
  if_icmple_2.setTarget(ih_13);
  if_icmpge_15.setTarget(ih_26);
  method.setMaxStack();
  method.setMaxLocals();
  _cg.addMethod(method.getMethod());
  il.dispose();
}
[1] http://jakarta.apache.org/bcel/

29
[+4] [2014-03-12 16:05:11] magneto12321

Not included in the list so far is this book:

Basics of Compiler Design (Torben Mogensen) [1] (from the dept. of Computer Science, University of Copenhagen)

I'm also interested in learning about compilers and plan to enter that industry in the next couple of years. This book is the ideal theory book to begin learning compilers as far as I can see. It's FREE to copy and reproduce, cleanly and carefully written and gives it to you in plain English without any code but still presents the mechanics by way of instructions and diagrams etc. Worth a look imo.

[1] http://www.diku.dk/~torbenm/Basics/

Added it to the list thanks :) - Anton
30
[+3] [2008-10-01 20:24:42] user9529

The Dragon Book is too complicated. So ignore it as a starting point. It is good and makes you think a lot once you already have a starting point, but for starters, perhaps you should simply try to write an math/logical expression evaluator using RD, LL or LR parsing techniques with everything (lexing/parsing) written by hand in perhaps C/Java. This is interesting in itself and gives you an idea of the problems involved in a compiler. Then you can jump in to your own DSL using some scripting language (since processing text is usually easier in these) and like someone said, generate code in either the scripting language itself or C. You should probably use flex/bison/antlr etc to do the lexing/parsing if you are going to do it in c/java.


I wouldn't say "too complicated", I would say "badly written". - anon
31
[+3] [2009-01-13 22:31:48] Steve A

I have written an online tutorial on compiler design, titled "Let's build a scripting Engine-Compiler, as well as a native code compiler called Bxbasm. The Online doc's are at: http://geocities.com/blunt_axe_basic/tutor/Bxb-Tutor.doc

The docs, support files and compiler, in zip form, are at: http://geocities.com/blunt_axe_basic

Also: http://tech.groups.yahoo.com/group/QDepartment

Steve A.


32
[+3] [2009-07-21 00:20:14] greyfade

I'm surprised it hasn't been mentioned, but Donald Knuth's The Art of Computer Programming was originally penned as a sort of tutorial on compiler writing.

Of course, Dr. Knuth's propensity for going in-depth on topics has led to the compiler-writing tutorial being expanded to an estimated 9 volumes, only three of which have actually been published. It's a rather complete exposition on programming topics, and covers everything you would ever need to know about writing a compiler, in minute detail.


33
[+3] [2010-06-26 23:43:23] Jay

Missing from the list: Garbage Collection: Algorithms for Automatic Dynamic Memory Management, by Jones and Lins.

(Assuming you're writing the compiler and runtime system, and that you're implementing a garbage collected language.


34
[+2] [2008-08-20 11:16:20] Romjin

As an starting point, it will be good to create a recursive descent parser (RDP) (let's say you want to create your own flavour of BASIC and build a BASIC interpreter) to understand how to write a compiler. I found the best information in Herbert Schild's C Power Users, chapter 7. This chapter refers to another book of H. Schildt "C The complete Reference" where he explains how to create a calculator (a simple expression parser). I found both books on eBay very cheap. You can check the code for the book if you go to www.osborne.com or check in www.HerbSchildt.com [1] I found the same code but for C# in his latest book

[1] http://www.HerbSchildt.com

35
[+2] [2011-04-01 19:39:19] Bubbles

The quickest approach is through two books:

1990 version of An Introduction to Compiling Techniques, a First Course using ANSI C, LeX, and YaCC [1] by JP Bennett - a perfect balance of example code, parsing theory and design- it contains a complete compiler written in C, lex and yacc for a simple grammar

Dragon Book [2] (older version) - mostly a detailed reference for the features not covered in the former book

[1] http://rads.stackoverflow.com/amzn/click/007709221X
[2] http://rads.stackoverflow.com/amzn/click/0201000229

36
[+1] [2009-02-05 02:55:28] Anru

If you are like me, who has no formal computer science education, and is interested in building/want to know how a compiler works:

I am recommend "Programming Language Processors in Java: Compilers and Interpreters", an amazing book for a self-taught computer programmer.

From my point of view, understanding those basic language theory, automate machine, and set theory is not a big problem. The problem is how to turn those things into code. The above book tells you how to write a parser, analysis context, and generate code. If you can not understand this book, then I have to say, give up building a compiler. The book is best programming book I have ever read.

There is an other book, also good, Compiler Design in C. There is a lot of code, and it tells you everything about how to build a compiler and lexer tools.

Building a compiler is a fun programming practice and can teach you heaps of programming skills.

Do not buy the Dragon book [1]. It was a waste of money and time and is not for a practitioner.

[1] http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools#Second_edition

37
[+1] [2009-07-21 00:42:47] Mike Dunlavey

Whenever I want to try out a new language idea, I just write a simple parser, and have it generate some language that's easy to get good compilers for, like C.

How do you think C++ was done?


38
[+1] [2014-04-02 12:39:16] uliwitness

If you're not just looking for books, but also interested in web sites that have articles on the topic, I've blogged about various aspects of creating a programming language. Most of the posts can be found in my blog's "Language Design" category [1].

In particular, I cover generating Intel machine code manually, automatically generating machine- or bytecode, creating a bytecode interpreter, writing an object-oriented runtime, creating a simple loader, and writing a simple mark/sweep garbage collector. All of this in a very practical and pragmatic way instead of boring you with lots of theory.

Would appreciate feedback on these.

[1] http://web.archive.org/web/20161024215644/http://orangejuiceliberationfront.com/category/language-design/

39
[0] [2009-07-20 23:01:42] soulmerge
  • Start by making sure you can answer most of the questions tagged C++ here on Stack Overflow.
  • After that, you should make sure you understand how other compilers work and understand [parts of] their source code.
  • You'll notice you need assembler and will start learning assembler until you can answer many questions with that tag.
  • If you've come this far, you'll find that several years have passed and realize how big such a project is and possibly smile at your own question from back then (if this page still exists at that time) ...

(5) Not to be rude but, it sounds like you probably haven't written a simple compiler. - mrduclaw
(4) Yes, I am currently working on my second simple language, in fact. I know assembler basics, I have used lex, yacc & bison, I know C++, I know a bit about the compilation of C++ and I messed with the inner workings of the PHP interpreter. I would say I have at least an understanding of how complex compilers/interpreters are. - soulmerge
saying it that way clearly shows you don't :') - Theophile Dano
40