Preferred languages: C/C++, Java, and Ruby.
I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby, so I prefer resources that involve one of those three, but any good resource is acceptable.
Legend:
Let's Build a Compiler
[compilers.iecc.com/crenshaw/] series, it is really nice writeup and is a good starting point. - TheVillageIdiot
This is a pretty vague question, I think; just because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however; a top-half and a bottom-one. The top-half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform specific code generation.
Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.
Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and go to generating some "code" (that intermediate representation I mentioned). So it will take your source program and convert it to another representation and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the the bottom half will take your intermediate representation and generate a PE executable.
Some books on this topic that I found particularly helpful was Compilers Principles and Techniques [1] (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers Context-Free Grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And uninterestingly enough, the book called " lex and yacc [2]" picked up where the Dragon Book left off for this part.
[1] https://rads.stackoverflow.com/amzn/click/com/0321486811I think Modern Compiler Implementation in ML [1] is the best introductory compiler writing text. There's a Java version [2] and a C version [3] too, either of which might be more accessible given your languages background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and single static assignment form) into relatively little space (~500 pages).
I prefer Modern Compiler Implementation to the Dragon book because Modern Compiler implementation surveys less of the field--instead it has really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.
I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. [4] It is available online [5] as a PDF. I find Wirth's programming aesthetic simply beautiful, however some people find his style too minimal (for example Wirth favors recursive descent parsers, but most CS courses focus on parser generator tools; Wirth's language designs are fairly conservative.) Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not or not, I highly recommend reading this book.
[1] https://rads.stackoverflow.com/amzn/click/com/0521607647I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.
If you want a book that is lighter on theory, Game Scripting Mastery [1] might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.
It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.
[1] https://rads.stackoverflow.com/amzn/click/com/1931841578"Let's Build a Compiler" [1] is awesome, but it's a bit outdated. (I'm not saying it makes it even a little bit less valid.)
Or check out SLANG [2]. This is similar to "Let's Build a Compiler" but is a much better resource especially for beginners. This comes with a pdf tutorial which takes a 7 step approach at teaching you a compiler. Adding the quora link as it have the links to all the various ports of SLANG, in C++, Java and JS, also interpreters in python and java, originally written using C# and the .NET platform.
[1] http://compilers.iecc.com/crenshaw/If you're looking to use powerful, higher level tools rather than building everything yourself, going through the projects and readings for this course [1] is a pretty good option. It's a languages course by the author of the Java parser engine ANTLR. You can get the book for the course as a PDF from the Pragmatic Programmers [2].
The course goes over the standard compiler compiler stuff that you'd see elsewhere: parsing, types and type checking, polymorphism, symbol tables, and code generation. Pretty much the only thing that isn't covered is optimizations. The final project is a program that compiles a subset of C [3]. Because you use tools like ANTLR and LLVM, it's feasible to write the entire compiler in a single day (I have an existence proof of this, though I do mean ~24 hours). It's heavy on practical engineering using modern tools, a bit lighter on theory.
LLVM, by the way, is simply fantastic. Many situations where you might normally compile down to assembly, you'd be much better off compiling to LLVM's Intermediate Representation [4] instead. It's higher level, cross platform, and LLVM is quite good at generating optimized assembly from it.
[1] http://www.antlr.org/wiki/display/CS652/CS652+HomeIf you have little time, I recommend Niklaus Wirth's "Compiler Construction" (Addison-Wesley. 1996) [1], a tiny little booklet that you can read in a day, but it explains the basics (including how to implement lexers, recursive descent parsers, and your own stack-based virtual machines). After that, if you want a deep dive, there's no way around the Dragon book as other commenters suggest.
[1] http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdfYou might want to look into Lex/Yacc (or Flex/Bison, whatever you want to call them). Flex is a lexical analyzer, which will parse and identify the semantic components ("tokens") of your language, and Bison will be used to define what happens when each token is parsed. This could be, but is definitely not limited to, printing out C code, for a compiler that would compile to C, or dynamically running the instructions.
This FAQ [1] should help you, and this tutorial [2] looks quite useful.
[1] http://tldp.org/HOWTO/Lex-YACC-HOWTO-5.htmlGenerally speaking, there's no five minutes tutorial for compilers, because it's a complicated topic and writing a compiler can take months. You will have to do your own search.
Python and Ruby are usually interpreted. Perhaps you want to start with an interpreter as well. It's generally easier.
The first step is to write a formal language description, the grammar of your programming language. Then you have to transform the source code that you want to compile or interpret according to the grammar into an abstract syntax tree, an internal form of the source code that the computer understands and can operate on. This step is usually called parsing and the software that parses the source code is called a parser. Often the parser is generated by a parser generator which transform a formal grammar into source oder machine code. For a good, non-mathematical explanation of parsing I recommend Parsing Techniques - A Practical Guide. Wikipedia has a comparison of parser generators from which you can choose that one that is suitable for you. Depending on the parser generator you chose, you will find tutorials on the Internet and for really popular parser generators (like GNU bison) there are also books.
Writing a parser for your language can be really hard, but this depends on your grammar. So I suggest to keep your grammar simple (unlike C++); a good example for this is LISP.
In the second step the abstract syntax tree is transformed from a tree structure into a linear intermediate representation. As a good example for this Lua's bytecode is often cited. But the intermediate representation really depends on your language.
If you are building an interpreter, you will simply have to interpret the intermediate representation. You could also just-in-time-compile it. I recommend LLVM and libjit for just-in-time-compilation. To make the language usable you will also have to include some input and output functions and perhaps a small standard library.
If you are going to compile the language, it will be more complicated. You will have to write backends for different computer architectures and generate machine code from the intermediate representation in those backends. I recommend LLVM for this task.
There are a few books on this topic, but I can recommend none of them for general use. Most of them are too academic or too practical. There's no "Teach yourself compiler writing in 21 days" and thus, you will have to buy several books to get a good understanding of this entire topic. If you search the Internet, you will come across some some online books and lecture notes. Maybe there's a university library nearby you where you can borrow books on compilers.
I also recommend a good background knowledge in theoretical computer science and graph theory, if you are going to make your project serious. A degree in computer science will also be helpful.
Take a look at the book below. The author is the creator of ANTLR [1].
Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages [2].
[1] http://www.antlr.org/One book not yet suggested but very important is "Linkers and Loaders" [1] by John Levine. If you're not using an external assembler, you'll need a way to output a object file that can be linked into your final program. Even if you're using an external assembler, you'll probably need to understand relocations and how the whole program loading process works to make a working tool. This book collects a lot of the random lore around this process for various systems, including Win32 and Linux.
[1] http://books.google.com/books?id=h34d_jr2iikC&dq=%22linkers+and+loaders%22&pg=PP1&ots=IxFkFWJ-8V&sig=GSlclmkezTRL6YYguGJmZsnkM3c&hl=en&sa=X&oi=book_result&resnum=1&ct=resultIf you're willing to use LLVM, check this out: http://llvm.org/docs/tutorial/. It teaches you how to write a compiler from scratch using LLVM's framework, and doesn't assume you have any knowledge about the subject.
The tutorial suggest you write your own parser and lexer etc, but I advise you to look into bison and flex once you get the idea. They make life so much easier.
The Dragon Book [1] is definitely the "building compilers" book, but if your language isn't quite as complicated as the current generation of languages, you may want to look at the Interpreter pattern from Design Patterns [2].
The example in the book designs a regular expression-like language and is well thought through, but as they say in the book, it's good for thinking through the process but is effective really only on small languages. However, it is much faster to write an Interpreter for a small language with this pattern than having to learn about all the different types of parsers, yacc and lex, et cetera...
[1] https://rads.stackoverflow.com/amzn/click/com/0321486811The LCC compiler ( wikipedia [1]) ( project homepage [2]) ( github.com/drh/lcc [3]) of Fraser and Hanson is described in their book "A Retargetable C Compiler: Design and Implementation". It is quite readable and explains the whole compiler, down to code generation.
[1] http://en.wikipedia.org/wiki/Local_C_compilerI found the Dragon book much too hard to read with too much focus on language theory that is not really required to write a compiler in practice.
I would add the Oberon [1] book which contains the full source of an amazingly fast and simple Oberon compiler Project Oberon [2].
[1] http://en.wikipedia.org/wiki/Oberon_%28programming_language%29I am looking into the same concept, and found this promising article by Joel Pobar,
Create a Language Compiler for the .NET Framework - not sure where this has gone [1]
Create a Language Compiler for the .NET Framework - pdf copy of the original doc [2]
he discusses a high level concept of a compiler and proceeds to invent his own langauge for the .Net framework. Although its aimed at the .Net Framework, many of the concepts should be able to be reproduced. The Article covers:
there are other topics, but you get the just.
Its aimed to people starting out, written in C# (not quite Java)
HTH
bones
[1] http://msdn.microsoft.com/en-us/magazine/cc136756.aspxI remember asking this question about seven years ago when I was rather new to programming.
I was very careful when I asked and surprisingly I didn't get as much criticism as you are getting here. They did however point me in the direction of the " Dragon Book [1]" which is in my opinion, a really great book that explains everything you need to know to write a compiler (you will of course have to master a language or two. The more languages you know, the merrier.).
And yes, many people say reading that book is crazy and you won't learn anything from it, but I disagree completely with that.
Many people also say that writing compilers is stupid and pointless. Well, there are a number of reasons why compiler development are useful:
I didn't write my own compiler right away, but after asking I knew where to start. And now, after learning many different languages and reading the Dragon Book, writing isn't that much of a problem. (I'm also studying computer engineering atm, but most of what I know about programming is self taught.)
In conclusion, The Dragon Book is a great "tutorial". But spend some time mastering a language or two before attempting to write a compiler. Don't expect to be a compiler guru within the next decade or so though.
The book is also good if you want to learn how to write parsers/interpreters.
[1] http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools"... Let's Build a Compiler ..."
I'd second http://compilers.iecc.com/crenshaw/ by @sasb [1]. Forget buying more books for the moment.
Why? Tools & language.
The language required is Pascal and if I remember correctly is based on Turbo-Pascal. It just so happens if you go to http://www.freepascal.org/ and download the Pascal compiler all the examples work straight from the page ~ http://www.freepascal.org/download.var The beaut thing about Free Pascal is you can use it almost whatever processor or OS you can care for.
Once you have mastered the lessons then try the more advanced " Dragon Book [2]" ~ http://en.wikipedia.org/wiki/Dragon_book
[1] https://stackoverflow.com/questions/1669/learning-to-write-a-compiler#1678An easy way to create a compiler is to use bison and flex (or similar), build a tree (AST) and generate code in C. With generating C code being the most important step. By generating C code, your language will automatically work on all platforms that have a C compiler.
Generating C code is as easy as generating HTML (just use print, or equivalent), which in turn is much easier than writing a C parser or HTML parser.
You should check out Darius Bacon's " ichbins [1]", which is a compiler for a small Lisp dialect, targeting C, in just over 6 pages of code. The advantage it has over most toy compilers is that the language is complete enough that the compiler is written in it. (The tarball also includes an interpreter to bootstrap the thing.)
There's more stuff about what I found useful in learning to write a compiler on my Ur-Scheme [2] web page.
[1] https://github.com/darius/ichbinsFrom the comp.compilers FAQ [1]:
"Programming a Personal Computer" by Per Brinch Hansen Prentice-Hall 1982 ISBN 0-13-730283-5
This unfortunately-titled book explains the design and creation of a single-user programming environment for micros, using a Pascal-like language called Edison. The author presents all source code and explanations for the step-by-step implementation of an Edison compiler and simple supporting operating system, all written in Edison itself (except for a small supporting kernel written in a symbolic assembler for PDP 11/23; the complete source can also be ordered for the IBM PC).
The most interesting things about this book are: 1) its ability to demonstrate how to create a complete, self-contained, self-maintaining, useful compiler and operating system, and 2) the interesting discussion of language design and specification problems and trade-offs in Chapter 2.
"Brinch Hansen on Pascal Compilers" by Per Brinch Hansen Prentice-Hall 1985 ISBN 0-13-083098-4
Another light-on-theory heavy-on-pragmatics here's-how-to-code-it book. The author presents the design, implementation, and complete source code for a compiler and p-code interpreter for Pascal- (Pascal "minus"), a Pascal subset with boolean and integer types (but no characters, reals, subranged or enumerated types), constant and variable definitions and array and record types (but no packed, variant, set, pointer, nameless, renamed, or file types), expressions, assignment statements, nested procedure definitions with value and variable parameters, if statements, while statements, and begin-end blocks (but no function definitions, procedural parameters, goto statements and labels, case statements, repeat statements, for statements, and with statements).
The compiler and interpreter are written in Pascal* (Pascal "star"), a Pascal subset extended with some Edison-style features for creating software development systems. A Pascal* compiler for the IBM PC is sold by the author, but it's easy to port the book's Pascal- compiler to any convenient Pascal platform.
This book makes the design and implementation of a compiler look easy. I particularly like the way the author is concerned with quality, reliability, and testing. The compiler and interpreter can easily be used as the basis for a more involved language or compiler project, especially if you're pressed to quickly get something up and running.
[1] http://www.faqs.org/faqs/compilers/faq/Python comes bundled with a python compiler written in Python. You can see the source code, and it includes all phases, from parsing, abstract syntax tree, emitting code, etc. Hack it.
Sorry, it is in Spanish, but this is the bibliography of a course called "Compiladores e Intérpretes" (Compilers and Interpreters) in Argentina.
The course was from formal language theory to compiler construction, and these are the topics you need to build, at least, a simple compiler:
Compilers Design in C.
Allen I. Holub
Prentice-Hall. 1990.Compiladores. Teoría y Construcción.
Sanchís Llorca, F.J. , Galán Pascual, C. Editorial Paraninfo. 1988.Compiler Construction.
Niklaus Wirth
Addison-Wesley. 1996.Lenguajes, Gramáticas y Autómatas. Un enfoque práctico.
Pedro Isasi Viñuela, Paloma Martínez Fernández, Daniel Borrajo Millán. Addison-Wesley Iberoamericana (España). 1997.The art of compiler design. Theory and practice.
Thomas Pittman, James Peters.
Prentice-Hall. 1992.Object-Oriented Compiler Construction.
Jim Holmes.
Prentice Hall, Englewood Cliffs, N.J. 1995Compiladores. Conceptos Fundamentales.
B. Teufel, S. Schmidt, T. Teufel.
Addison-Wesley Iberoamericana. 1995.Introduction to Automata Theory, Languages, and Computation.
John E. Hopcroft. Jeffref D. Ullman.
Addison-Wesley. 1979.Introduction to formal languages.
György E. Révész.
Mc Graw Hill. 1983.Parsing Techniques. A Practical Guide.
Dick Grune, Ceriel Jacobs.
Impreso por los autores. 1995
http://www.cs.vu.nl/~dick/PTAPG.htmlYacc: Yet Another Compiler-Compiler.
Stephen C. Johnson
Computing Science Technical Report Nº 32, 1975. Bell Laboratories. Murray Hill, New
Jersey.Lex: A Lexical Analyzer Generator.
M. E. Lesk, E. Schmidt. Computing Science Technical Report Nº 39, 1975. Bell Laboratories. Murray Hill, New Jersey.lex & yacc.
John R. Levine, Tony Mason, Doug Brown.
O’Reilly & Associates. 1995.Elements of the theory of computation.
Harry R. Lewis, Christos H. Papadimitriou. Segunda Edición. Prentice Hall. 1998.Un Algoritmo Eficiente para la Construcción del Grafo de Dependencia de Control.
Salvador V. Cavadini.
Trabajo Final de Grado para obtener el Título de Ingeniero en Computación.
Facultad de Matemática Aplicada. U.C.S.E. 2001.
There's a lot of good answers here, so i thought I'd just add one more to the list:
I got a book called Project Oberon more than a decade ago, which has some very well written text on the compiler. The book really stands out in the sense that the source and explanations is very hands on and readable. The complete text (the 2005 edition) has been made available in pdf, so you can download right now. The compiler is discussed in chapter 12:
http://www.ethoberon.ethz.ch/WirthPubl/ProjectOberon.pdf
Niklaus Wirth, Jürg Gutknecht
(The treatment is not as extensive as his book on compilers)
I've read several books on compilers, and i can second the dragon book, time spent on this book is very worthwhile.
Edit:
As of July 2024, the original link no longer seems to work. However, the 1st and 2nd editions (2005 and 2013, resp.)) can still be obtained through
https://people.inf.ethz.ch/wirth/ProjectOberon1992.pdf https://people.inf.ethz.ch/wirth/ProjectOberon/index.html
or the Prof. Wirth's university page:
https://people.inf.ethz.ch/wirth/
(Note that Prof. Wirth has deceased in January 2024, so the address might change again or be removed altogether).
Not a book, but a technical paper and an enormously fun learning experience if you want to know more about compilers (and metacompilers)... This website walks you through building a completely self-contained compiler system that can compile itself and other languages:
Tutorial: Metacompilers Part 1 [1]
This is all based on an amazing little 10-page technical paper:
Val Schorre META II: A Syntax-Oriented Compiler Writing Language
from honest-to-god 1964. I learned how to build compilers from this back in 1970. There's a mind-blowing moment when you finally grok how the compiler can regenerate itself....
I know the website author from my college days, but I have nothing to do with the website.
[1] http://www.bayfronttechnologies.com/mc_tutorial.htmlI liked the Crenshaw tutorial [1] too, because it makes it absolutely clear that a compiler is just another program that reads some input and writes some out put.
Read it.
Work it if you want, but then look at another reference on how bigger and more complete compilers are really written.
And read On Trusting Trust [2], to get a clue about the unobvious things that can be done in this domain.
[1] http://compilers.iecc.com/crenshaw/If you are interested in writing a compiler for a functional language (rather than a procedural one) Simon Peyton-Jones and David Lester's " Implementing functional languages: a tutorial [1]" is an excellent guide.
The conceptual basics of how functional evaluation works is guided by examples in a simple but powerful functional language called "Core". Additionally, each part of the Core language compiler is explained with code examples in Miranda (a pure functional language very similar to Haskell).
Several different types of compilers are described but even if you only follow the so-called template compiler for Core you will have an excellent understanding of what makes functional programming tick.
[1] http://research.microsoft.com/~simonpj/papers/pj-lester-book/You can use BCEL [1] by the Apache Software Foundation. With this tool you can generate assembler-like code, but it's Java with the BCEL API. You can learn how you can generate intermediate language code (in this case byte code).
Create a Java class with this function:
public String maxAsString(int a, int b) {
if (a > b) {
return Integer.valueOf(a).toString();
} else if (a < b) {
return Integer.valueOf(b).toString();
} else {
return "equals";
}
}
Now run BCELifier with this class
BCELifier bcelifier = new BCELifier("MyClass", System.out);
bcelifier.start();
You can see the result on the console for the whole class (how to build byte code MyClass.java). The code for the function is this:
private void createMethod_1() {
InstructionList il = new InstructionList();
MethodGen method = new MethodGen(ACC_PUBLIC, Type.STRING, new Type[] { Type.INT, Type.INT }, new String[] { "arg0", "arg1" }, "maxAsString", "MyClass", il, _cp);
il.append(InstructionFactory.createLoad(Type.INT, 1)); // Load first parameter to address 1
il.append(InstructionFactory.createLoad(Type.INT, 2)); // Load second parameter to adress 2
BranchInstruction if_icmple_2 = InstructionFactory.createBranchInstruction(Constants.IF_ICMPLE, null); // Do if condition (compare a > b)
il.append(if_icmple_2);
il.append(InstructionFactory.createLoad(Type.INT, 1)); // Load value from address 1 into the stack
il.append(_factory.createInvoke("java.lang.Integer", "valueOf", new ObjectType("java.lang.Integer"), new Type[] { Type.INT }, Constants.INVOKESTATIC));
il.append(_factory.createInvoke("java.lang.Integer", "toString", Type.STRING, Type.NO_ARGS, Constants.INVOKEVIRTUAL));
il.append(InstructionFactory.createReturn(Type.OBJECT));
InstructionHandle ih_13 = il.append(InstructionFactory.createLoad(Type.INT, 1));
il.append(InstructionFactory.createLoad(Type.INT, 2));
BranchInstruction if_icmpge_15 = InstructionFactory.createBranchInstruction(Constants.IF_ICMPGE, null); // Do if condition (compare a < b)
il.append(if_icmpge_15);
il.append(InstructionFactory.createLoad(Type.INT, 2));
il.append(_factory.createInvoke("java.lang.Integer", "valueOf", new ObjectType("java.lang.Integer"), new Type[] { Type.INT }, Constants.INVOKESTATIC));
il.append(_factory.createInvoke("java.lang.Integer", "toString", Type.STRING, Type.NO_ARGS, Constants.INVOKEVIRTUAL));
il.append(InstructionFactory.createReturn(Type.OBJECT));
InstructionHandle ih_26 = il.append(new PUSH(_cp, "equals")); // Return "equals" string
il.append(InstructionFactory.createReturn(Type.OBJECT));
if_icmple_2.setTarget(ih_13);
if_icmpge_15.setTarget(ih_26);
method.setMaxStack();
method.setMaxLocals();
_cg.addMethod(method.getMethod());
il.dispose();
}
[1] http://jakarta.apache.org/bcel/I'm surprised it hasn't been mentioned, but Donald Knuth's The Art of Computer Programming was originally penned as a sort of tutorial on compiler writing.
Of course, Dr. Knuth's propensity for going in-depth on topics has led to the compiler-writing tutorial being expanded to an estimated 9 volumes, only three of which have actually been published. It's a rather complete exposition on programming topics, and covers everything you would ever need to know about writing a compiler, in minute detail.
Not included in the list so far is this book:
Basics of Compiler Design (Torben Mogensen) [1] (from the dept. of Computer Science, University of Copenhagen)
I'm also interested in learning about compilers and plan to enter that industry in the next couple of years. This book is the ideal theory book to begin learning compilers as far as I can see. It's FREE to copy and reproduce, cleanly and carefully written and gives it to you in plain English without any code but still presents the mechanics by way of instructions and diagrams etc. Worth a look imo.
[1] http://www.diku.dk/~torbenm/Basics/As an starting point, it will be good to create a recursive descent parser (RDP) (let's say you want to create your own flavour of BASIC and build a BASIC interpreter) to understand how to write a compiler. I found the best information in Herbert Schild's C Power Users, chapter 7. This chapter refers to another book of H. Schildt "C The complete Reference" where he explains how to create a calculator (a simple expression parser). I found both books on eBay very cheap. You can check the code for the book if you go to www.osborne.com or check in www.HerbSchildt.com [1] I found the same code but for C# in his latest book
[1] http://www.HerbSchildt.comThe Dragon Book is too complicated. So ignore it as a starting point. It is good and makes you think a lot once you already have a starting point, but for starters, perhaps you should simply try to write an math/logical expression evaluator using RD, LL or LR parsing techniques with everything (lexing/parsing) written by hand in perhaps C/Java. This is interesting in itself and gives you an idea of the problems involved in a compiler. Then you can jump in to your own DSL using some scripting language (since processing text is usually easier in these) and like someone said, generate code in either the scripting language itself or C. You should probably use flex/bison/antlr etc to do the lexing/parsing if you are going to do it in c/java.
Missing from the list: Garbage Collection: Algorithms for Automatic Dynamic Memory Management, by Jones and Lins.
(Assuming you're writing the compiler and runtime system, and that you're implementing a garbage collected language.
The quickest approach is through two books:
1990 version of An Introduction to Compiling Techniques, a First Course using ANSI C, LeX, and YaCC [1] by JP Bennett - a perfect balance of example code, parsing theory and design- it contains a complete compiler written in C, lex and yacc for a simple grammar
Dragon Book [2] (older version) - mostly a detailed reference for the features not covered in the former book
[1] https://rads.stackoverflow.com/amzn/click/com/007709221XIf you're not just looking for books, but also interested in web sites that have articles on the topic, I've blogged about various aspects of creating a programming language. Most of the posts can be found in my blog's "Language Design" category [1].
In particular, I cover generating Intel machine code manually, automatically generating machine- or bytecode, creating a bytecode interpreter, writing an object-oriented runtime, creating a simple loader, and writing a simple mark/sweep garbage collector. All of this in a very practical and pragmatic way instead of boring you with lots of theory.
Would appreciate feedback on these.
[1] http://web.archive.org/web/20161024215644/http://orangejuiceliberationfront.com/category/language-design/If you are like me, who has no formal computer science education, and is interested in building/want to know how a compiler works:
I am recommend "Programming Language Processors in Java: Compilers and Interpreters", an amazing book for a self-taught computer programmer.
From my point of view, understanding those basic language theory, automate machine, and set theory is not a big problem. The problem is how to turn those things into code. The above book tells you how to write a parser, analysis context, and generate code. If you can not understand this book, then I have to say, give up building a compiler. The book is best programming book I have ever read.
There is an other book, also good, Compiler Design in C. There is a lot of code, and it tells you everything about how to build a compiler and lexer tools.
Building a compiler is a fun programming practice and can teach you heaps of programming skills.
Do not buy the Dragon book [1]. It was a waste of money and time and is not for a practitioner.
[1] http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools#Second_editionWhenever I want to try out a new language idea, I just write a simple parser, and have it generate some language that's easy to get good compilers for, like C.
How do you think C++ was done?