Question

I've just finished reading "Coders at Work" ^[1], a brilliant book by Peter Seibel with 15 interviews with some of the most interesting computer programmers alive today.
Well, many of the interviewees have (co-)invented or implemented a new programming language.
Some examples:

Joe Armstrong: Inventor of Erlang
L. Peter Deutsch: implementer of Smalltalk-80
Brendan Eich: Inventor of JavaScript
Dan Ingalls: Smalltalk implementor and designer
Simon Peyton Jones: Co-inventor of Haskell
Guy Steele: Co-inventor of Scheme

Without a doubt their minds have something special and unreachable, and I'm not crazy enough to think I will ever be able to create a new language; I'm just interested in this topic.

So, imagine a funny/grotesque scenario where your crazy boss one day comes to your desk and says, "I want a new programming language with my name on it. Take the time you need and do it."

Which is the right approach to studying this fascinating/intimidating/magical topic?

What kind of knowledge do you need to model, design and implement a brand new programming language?

[1] http://rads.stackoverflow.com/amzn/click/1430219483

Answer 1

I too think that writing programming languages shouldn't be something you take on as an attempt to create the next Erlang or JavaScript. Yet, I find it to be a great exercise for the mind and once you start thinking about languages a lot, you find that:

You start realizing what's wrong with existing languages.
You discover what's great with existing languages.
It becomes even easier to learn new ones.

For my own part I've implemented a subset of JavaScript with a few improvements and also another language with a single datatype (the bit) and a single operator (nand) for proving low-level ideas, as well as a couple of DSLs for templating and more specific code generation.

So, on to what I think you should read about:

Lexer: Transforms a stream of characters to a stream of tokens ("class", "int", "{" and "++" are typical tokens)
- Requires knowledge in: Regular expressions.
- Implementations: There are lexer generators in most languages; Lex for C, Alex for Haskell, ANTLR or JLex for Java, GPLEX for C#, etc.
Parser: Transforms a stream of tokens into an abstract syntax tree (AST).
- Requires knowledge in: ASTs, finite state machines, context-free grammars, Backus-Naur Form.
- Implementations: Just like with lexers, there are parser generators in most languages: Yacc for C, Happy for Haskell, ANTLR or CUP for Java, GPPG for C#, etc.
Generator: Transforms an AST describing your language to an AST of the target language.
- Requires knowledge in: The target language and/or platform (assembler, C, JVM, CLI or another of your favourites).
- Implementations: Compared to the other steps, this is where you'll have to do a lot yourself. Get started by looking at what other people did. CoffeeScript (http://jashkenas.github.com/coffee-script) would be my suggestion; it's both cool and quite simple.
Optimization: Transforms the AST you produced into a more optimized one.
- Requires knowledge in: Once again, a fair amount of understanding of the target platform. There are lots of algorithms to wrap your head around, depending on how deep you want to go.
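
To make the four stages above concrete, here is a minimal sketch in Python for a toy expression language with numbers, names, `+`, `*` and parentheses. Everything in it is an assumption made for illustration: the function names (`lex`, `parse`, `optimize`, `generate`, `compile_expr`), the tuple-shaped AST, and the made-up stack-machine target (`PUSH`, `LOAD`, `ADD`, `MUL`). As a simplification, the generator emits a flat instruction list rather than a target-language AST, and the optimizer (plain constant folding) runs before generation.

```python
import re

def lex(source):
    """Lexer: characters -> tokens, as (kind, value) pairs."""
    tokens = []
    for text in re.findall(r"\d+|[A-Za-z_]\w*|\S", source):
        if text.isdigit():
            tokens.append(("NUM", int(text)))
        elif text.isidentifier():
            tokens.append(("NAME", text))
        elif text in "+*()":
            tokens.append((text, text))
        else:
            raise SyntaxError(f"unexpected character {text!r}")
    return tokens

def parse(tokens):
    """Parser: tokens -> AST made of nested tuples.

    Grammar (hypothetical):
        expr := term ('+' term)*
        term := atom ('*' atom)*
        atom := NUM | NAME | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("add", node, term())
        return node

    def term():
        node = atom()
        while peek() == "*":
            advance()
            node = ("mul", node, atom())
        return node

    def atom():
        if peek() == "NUM":
            return ("num", advance()[1])
        if peek() == "NAME":
            return ("var", advance()[1])
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {peek()}")

    tree = expr()
    if peek() != "EOF":
        raise SyntaxError("trailing input")
    return tree

def optimize(node):
    """Optimizer: fold operations whose operands are both constants."""
    if node[0] in ("add", "mul"):
        left, right = optimize(node[1]), optimize(node[2])
        if left[0] == "num" and right[0] == "num":
            folded = left[1] + right[1] if node[0] == "add" else left[1] * right[1]
            return ("num", folded)
        return (node[0], left, right)
    return node

def generate(node):
    """Generator: AST -> instructions for an imaginary stack machine."""
    if node[0] == "num":
        return [f"PUSH {node[1]}"]
    if node[0] == "var":
        return [f"LOAD {node[1]}"]
    return generate(node[1]) + generate(node[2]) + [node[0].upper()]

def compile_expr(source):
    return generate(optimize(parse(lex(source))))
```

With these definitions, `compile_expr("x * (2 + 3)")` returns `["LOAD x", "PUSH 5", "MUL"]`: the constant subexpression is folded before any code is emitted.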

There's a nice tool called BNFC (Backus-Naur Form Converter) developed at my university that can give you a kickstart into the lexer and parser parts. If you'd like to get something up and working quickly I'd recommend it very much. It's a little more limited than using lexer/parser generators directly, but very productive. You'll find it here: http://www.cse.chalmers.se/research/group/Language-technology/BNFC

All of this aside, you should of course learn as many radically different programming languages as you possibly can. All of C, Erlang, Haskell, Prolog, Lisp and JavaScript have contributed greatly to widening my own ideas of programming languages. Unless you know them all, pick one and start hacking ;)

And oh, my main advice would probably be to focus on writing lots of imaginary code in your own not-yet-existing language before hacking away too much with an implementation. This works like a spec; it forces you to think about what the language should do, why and how it will feel to use it.

By the way, I too finished reading Coders at Work a couple of days ago. It was a really great read, I'd recommend it to anyone!

Answer 2

A programming language is about a set of abstractions that you use to express a meaningful program. The question is: what abstractions should the language provide? I don't think it is about compiler, lexer and parser.

The beauty of a language comes from "less is more": a relatively small set of abstractions should be combinable to build great libraries, frameworks and ultimately end-user programs. (Individual abstractions sometimes don't fit with each other, or may even conflict. You will need to decide carefully what goes in and what does not.)

As C.A.R. Hoare indicates in his famous paper "Hints on Programming Language Design" there are two views on language design:

Part of language design consists of innovation. This activity leads to new language features in isolation.
The most difficult part of language design lies in integration: selecting a limited set of language features and polishing them until the result is a consistent simple framework that has no more rough edges.

1. What will make your language special?

You need to have a vision about your programming language and what it should do. What are the strengths and weaknesses of your language? In which area do you want it to shine (there should be at least one that is the main driving force of your initiative)?

Here is a short list of driving forces to consider:

Simplicity - A language with few abstractions is easy to grasp, but may be limited, e.g. don't expect to do pattern matching in Smalltalk. On the other hand, too many abstractions kill it as well.
Modularity - How do you deal with modular development, name clashes, isolation of components, etc.?
Composability - How does the programming language favor or impede composition? E.g. pure functions can naturally be composed, while objects can't. Transactions (with transactional memory) can be composed easily, while locks can't.
Safety - How safe are the programming language's abstractions? Can you break encapsulation in some way, e.g. if you have meta-programming facilities? Can you provide safety guarantees, e.g. with a type system?
Expressiveness - How easy is it to use the abstractions to express a solution to some problem? Does expressiveness conflict with readability?
etc.

2. Abstractions in your language

When you have a vision about your language, you can start shaping the abstractions that will support it. For instance, Scala's vision was "Let's blend functional and OO", so they designed abstractions such as case classes. Newspeak's vision was "Let's make modules a first-class abstraction", so they pushed the concept of nested classes to the extreme.

A lot of abstractions have been proposed for programming language design. To design a new language, you should know many of them and decide how your programming language will compare against others. (Read the ECOOP or OOPSLA papers of the last decade and you will get an overview :) Here are a few:

Object
Class
Function
Trait
Type system
Scoping/modularity abstractions
Extension mechanism (e.g. open class, extension method)
Security mechanism (e.g. class sealing, final)
State manipulation abstraction (mutation, freezing, immutability, transactions)
Pattern matching abstractions
Exception handling abstractions
Representation independence abstractions (e.g. properties/slot)
Meta-programming abstractions
and a lot more to come ...

3. What you need to design a programming language

To create a programming language, you probably need (1) a vision, (2) a set of tools to implement and experiment with your design, e.g. parser generators, and (3) formal background in certain areas such as type systems.

But more important than everything else, I guess you need hard work and passion :)

EDIT: I've added C.A.R. Hoare quote at the beginning.

Answer 3

For starters, you would need to know exactly what it should be able to do differently from (or better than) all the languages available today, and why.

Some of our best tools available today arguably grew out of frustration with the tools at the time they were conceived / invented.

Answer 4

What kind of knowledge do you need to model, design and implement a brand new programming language?

If you just want to try out a new language idea, you need some knowledge of

Abstract syntax
Concrete syntax and parsing
Formal semantics, typically operational semantics or denotational semantics
Definitional interpreters

And depending on your language idea you may also need to know something about

Static type systems

These topics are covered at an appropriate level of detail in Friedman and Wand's book Essentials of Programming Languages ^[1]. It's a good book. For somebody designing and building a new language, it's much better than the Dragon book, because it's about languages, not compilers.

It will also help you enormously to be familiar with a wide spectrum of existing languages, so that instead of having to reinvent everything, you can steal the parts that have been done before, thereby making it easier to focus your own energy on whatever makes your language new and special. Remember: talent imitates; genius steals. You want to steal from the very best designs, and that means you have to know something about them.

If you write your definitional interpreter in a language like Haskell or ML, you can build something interesting very quickly. I especially recommend Haskell because you can take advantage of "parsing combinators" to deal with concrete syntax—very civilized.
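
For readers who have not met parsing combinators, here is a rough sketch of the idea. It is written in Python rather than Haskell, and it is not the API of any real combinator library: the names `char`, `many`, `many1`, `seq` and `fmap` are made up for this example. A parser here is just a function from `(text, pos)` to either `(value, new_pos)` or `None`, and bigger parsers are built by combining smaller ones.

```python
def char(predicate):
    """Parser that accepts one character satisfying predicate."""
    def parser(text, pos):
        if pos < len(text) and predicate(text[pos]):
            return text[pos], pos + 1
        return None
    return parser

def many(p):
    """Zero or more repetitions of p; always succeeds."""
    def parser(text, pos):
        values = []
        while (result := p(text, pos)) is not None:
            value, pos = result
            values.append(value)
        return values, pos
    return parser

def many1(p):
    """One or more repetitions of p."""
    repeated = many(p)
    def parser(text, pos):
        values, new_pos = repeated(text, pos)
        return (values, new_pos) if values else None
    return parser

def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parser(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parser

def fmap(p, function):
    """Apply function to the value produced by p."""
    def parser(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return function(value), new_pos
    return parser

# Concrete syntax for sums such as "1+20+300", built from the pieces above.
number = fmap(many1(char(str.isdigit)), lambda digits: int("".join(digits)))
plus = char(lambda c: c == "+")
total = fmap(
    seq(number, many(seq(plus, number))),
    lambda parts: parts[0] + sum(n for _, n in parts[1]),
)
```

Here `total("1+20+300", 0)` returns `(321, 8)`. The grammar reads almost like BNF, yet it is ordinary code in the host language, which is the appeal of the approach.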

If, on evaluating your idea, you decide you want your implementation to run at native-code speeds, you have a whole host of other stuff to learn. But Icon, Lua, Perl, Python, Ruby, and UCSD Pascal all enjoyed considerable success in their day without necessarily having native-code compilers.

[1] http://eopl3.com/

Answer 5

Knowing more than one existing programming language would be a good start. Even if some of them aren't programming languages per se, the different ways that they do things would be helpful for deciding what you do/do not want your language to do.

Answer 6

You need to know that your language is almost certainly doomed to obscurity and failure. I would guess that 99% of programming languages are never used by anyone except their author. If you can live with this, developing one is fun. Speaking as an author of several (doomed, obscure) myself.

Answer 7

I think you should learn more mainstream programming languages before you make your own. You should try to understand code snippets written in programming languages that you have not learned. (If you have learned C++, you should be able to understand Java code without learning Java.)

Knowledge of programming language design is very important. You must know the point of making (and using) a programming language. (Left as an exercise to the reader. Hint: why don't we program in assembly? Why are we not Real Programmers?)

(note: the key topics mentioned are in bold, you should Google them for tutorials)

After you have gathered ideas, learn how to parse. Regular languages and formal language theory are musts. Also learn about lexer generators such as lex, and learn how to tokenize without one. A tokenizer splits code into labeled chunks, going from

function factorial(n) {
  if (n == 0) { return 1; }
  else { return factorial(n - 1) * n }
}

to

[FUNCTION function] [IDENT factorial] [LEFT_PAREN (] [IDENT n] [RIGHT_PAREN )] [LEFT_BRACE {]
[IF if] [LEFT_PAREN (] [IDENT n] [EQ ==] [INT 0] ... and so on
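
A hand-written tokenizer along these lines can be sketched in Python with the standard `re` module, using one named group per token kind. The token names that do not appear in the listing above (`ELSE`, `RETURN`, `MINUS`, `STAR`, `SEMICOLON`, `RIGHT_BRACE`, `SKIP`, `MISMATCH`) are my own assumptions, and the sketch only covers what the factorial example needs.

```python
import re

# Order matters: keywords come before IDENT so "if" is not read as a name.
TOKEN_SPEC = [
    ("FUNCTION", r"function\b"),
    ("IF", r"if\b"),
    ("ELSE", r"else\b"),
    ("RETURN", r"return\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("INT", r"\d+"),
    ("EQ", r"=="),
    ("LEFT_PAREN", r"\("),
    ("RIGHT_PAREN", r"\)"),
    ("LEFT_BRACE", r"\{"),
    ("RIGHT_BRACE", r"\}"),
    ("MINUS", r"-"),
    ("STAR", r"\*"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),      # whitespace, dropped from the output
    ("MISMATCH", r"."),    # anything else is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(code):
    """Split code into (kind, text) pairs."""
    tokens = []
    for match in MASTER.finditer(code):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens
```

Calling `tokenize("function factorial(n) {")` yields the first six tokens of the listing: `FUNCTION`, `IDENT`, `LEFT_PAREN`, `IDENT`, `RIGHT_PAREN`, `LEFT_BRACE`.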

After that, learn about context-free grammars and parser generators such as yacc and JavaCC. A parser checks whether the tokens are placed properly according to a set of rules (the "grammar") and acts on them.

For example, a while statement is defined as "a while keyword, a left paren, an expression, a right paren, a block." You must transform it into a context-free grammar.

WhileStmt := WHILE LEFT_PAREN Expression RIGHT_PAREN Block

(Expression and Block are defined separately.) A parser generator then transforms these rules into source code that processes the tokens.
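
To give a feel for what such generated code does, here is a hand-written recursive-descent sketch in Python for the WhileStmt rule, with one method per grammar rule. The `Parser` class and its method names are hypothetical, and since Expression and Block are not spelled out above, the two placeholder rules in the comments are assumptions made only so the example runs.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs from a tokenizer
        self.pos = 0

    def peek(self):
        """Kind of the next token, or "EOF" at the end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        """Consume the next token, which must be of the given kind."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    # WhileStmt := WHILE LEFT_PAREN Expression RIGHT_PAREN Block
    def while_stmt(self):
        self.expect("WHILE")
        self.expect("LEFT_PAREN")
        condition = self.expression()
        self.expect("RIGHT_PAREN")
        body = self.block()
        return ("while", condition, body)

    # Expression := IDENT | INT            (placeholder rule, an assumption)
    def expression(self):
        if self.peek() in ("IDENT", "INT"):
            kind, text = self.expect(self.peek())
            return (kind.lower(), text)
        raise SyntaxError(f"expected an expression, found {self.peek()}")

    # Block := LEFT_BRACE WhileStmt* RIGHT_BRACE   (placeholder rule, an assumption)
    def block(self):
        self.expect("LEFT_BRACE")
        statements = []
        while self.peek() == "WHILE":
            statements.append(self.while_stmt())
        self.expect("RIGHT_BRACE")
        return statements
```

Each right-hand side of the grammar becomes a sequence of `expect` calls and calls to other rule methods, so the code mirrors the grammar line by line.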

By this time, a good exercise for you is to write a calculator program.

Beyond that, you should learn about abstract syntax tree (AST) generation and interpretation of ASTs. In Java, the tree generation tool is called JJTree. Make a formula calculator ^[1] with your knowledge.

After you have mastered making interpreters, learn how to make compilers, and then the fun part, bootstrapping: learn how a Java compiler was written in Java.

I made a LOGO ripoff as an example: http://github.com/SHiNKiROU/DesignScript

Also check my own calculator: http://github.com/SHiNKiROU/ExprParser

And a simple reverse polish notation calculator (Turing-complete) that I made without any effort: http://github.com/SHiNKiROU/Qwerty-RPN

I don't think you need a computer science degree, since I'm in grade 9 and I am still able to create a programming language. Google and self-study.

Sorry if my English is too weird; as I said, I'm in grade 9 and I'm not a native English speaker.

Here are some links to some useful resources and examples:

http://www.helsinki.fi/esslli/courses/readers/K10.pdf - PDF, formal language theory and natural language processing
http://jscc.jmksf.com/ - a parser generator in JavaScript, a custom programming language example is included
http://www.engr.mun.ca/~theo/JavaCC-Tutorial/javacc-tutorial.pdf - PDF, introduction to JavaCC and making a calculator with JavaCC

[1] http://en.wikipedia.org/wiki/Formula_calculator

Answer 8

The site I think you absolutely must go to is LtU - Lambda the Ultimate ^[1]

It will help you confront several other paradigms and read what other language inventors have written.

Go ahead !

[1] http://lambda-the-ultimate.org/

Answer 9

Well... I wonder why no one has mentioned the Dragon Book ^[1]

[1] http://en.wikipedia.org/wiki/Dragon_Book_%28computer_science%29

Answer 10

One skill that might help is being a Scandinavian ;)

Answer 11

Writing a programming language, by itself, is not all that out of reach ^[1]. Creating a good one is much harder. I think the best thing to learn first would be a good understanding of the history of programming languages — what's been tried, what's worked, what's failed. Armed with that, you need to know how your language is meant to be used so you know what design suits that best.

[1] http://createyourproglang.com/

Answer 12

To actually invent a new programming language, the most important knowledge is about the field where that language will provide more natural and/or better expression of problems and solutions.

Most of the people you listed, and many others credited with authoring a programming language that became popular, had very limited knowledge about actual parsers, let alone compiler writing techniques - as evidenced by a lot of awkward syntax which happens when you know what you want but not how to do it well - so you do it the way you know at that moment.

What they did know, however, was "their" field - what was itching them, the things they needed to express in order to feel that it's more natural or easier to use. They mostly started just by ripping off whatever they found that was close enough to their ideas and easy enough to get into and tweak. Everything else came later.

You can invent something that is pretty much a new programming language without even writing any parser or compiler - take jQuery as an example. The reason why you see so many functional-something languages is that they have virtually no parsing needs that are not already provided. You could literally write your own sub-language in Haskell without ever knowing how a real parser works.
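
As an illustration of a sub-language with no parser of its own, here is a small embedded DSL sketched in Python rather than Haskell. The host language's parser does all the syntactic work; operator overloading merely records what was written as a tree. The class names (`Expr`, `Var`, `Const`, `BinOp`) and the `lift` helper are invented for this example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def lift(value):
    """Wrap plain Python numbers so they can live inside an expression tree."""
    return value if isinstance(value, Expr) else Const(value)

class Expr:
    # Overloaded operators build tree nodes instead of computing a result.
    def __add__(self, other):
        return BinOp("+", self, lift(other))

    def __radd__(self, other):
        return BinOp("+", lift(other), self)

    def __mul__(self, other):
        return BinOp("*", self, lift(other))

    def __rmul__(self, other):
        return BinOp("*", lift(other), self)

class Var(Expr):
    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]

    def __str__(self):
        return self.name

class Const(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, env):
        return self.value

    def __str__(self):
        return str(self.value)

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self, env):
        return OPS[self.op](self.left.eval(env), self.right.eval(env))

    def __str__(self):
        return f"({self.left} {self.op} {self.right})"
```

With `x = Var("x")` and `formula = 2 * x + 1`, `str(formula)` gives `((2 * x) + 1)` and `formula.eval({"x": 5})` gives `11`. The formula is a data structure you can inspect, print, or evaluate later, even though no parsing code was written.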

Bjarne Stroustrup has been quoted as saying that he wishes he had used a recursive descent parser for C++ - which is the lamest parsing technique in the universe. Why? Because it would have made his life easier and allowed him to spend most of his time on what he really wanted to do - make a new language :-)

Answer 13

Good question... There is this book I have, and every time I try reading it my head spins like crazy! One day I'll crack it, I promise.

It's all about languages and how to build one.

Programming Language Pragmatics - Michael L. Scott ^[1]

Btw, I am reading "Coders at Work"; really nice book.

Gath.

[1] http://rads.stackoverflow.com/amzn/click/0126339511

Answer 14

A lot of the sense of magic (of the incomprehensible variety) will change to a sense of magic (of the wonderful and elegant variety) if you work through, say Chapter 4 of the Wizard Book (Structure and Interpretation of Computer Programs - freely available from MIT online, btw.). There, you implement a metacircular evaluator, as well as two other variations on the theme. Meaning, in essence, you'll have built three languages-- not from scratch, but you'll see what a language is.
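
To hint at how small the core of such an evaluator can be, here is a rough sketch in Python. It is not metacircular (the host is Python, not the language being defined) and it is not the book's code; it only borrows the eval/apply shape. Programs are written directly as nested Python lists to sidestep parsing, and the special forms supported (`define`, `if`, `lambda`) are a minimal subset chosen for the example.

```python
import operator

def evaluate(expr, env):
    """Evaluate a Lisp-like expression given as nested Python lists."""
    if isinstance(expr, str):          # a name: look it up
        return env[expr]
    if not isinstance(expr, list):     # a literal such as a number
        return expr
    head = expr[0]
    if head == "define":               # ["define", name, value-expression]
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "if":                   # ["if", test, then, otherwise]
        _, test, then, otherwise = expr
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head == "lambda":               # ["lambda", [params...], body]
        _, params, body = expr
        # The new environment is built at call time, so names defined
        # after the lambda was created (including itself) are visible.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    # Anything else is an application: evaluate operator and operands, then apply.
    function = evaluate(head, env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return function(*args)

# A global environment with a few primitives borrowed from the host language.
global_env = {"*": operator.mul, "-": operator.sub, "=": operator.eq}

evaluate(
    ["define", "fact",
     ["lambda", ["n"],
      ["if", ["=", "n", 0],
       1,
       ["*", "n", ["fact", ["-", "n", 1]]]]]],
    global_env,
)
```

After the definition above, `evaluate(["fact", 5], global_env)` returns `120`. About twenty lines are enough for variables, conditionals, first-class functions and recursion, which is a good part of what makes the chapter feel like a magic trick explained.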

And then you can move on to the last chapter of the book, and then to EoPL, and the Dragon Book and LtU, and reading the specifications for your favorite languages, and contributing to those languages (because they're obviously open source, right?) and before you know it, you'll actually have ideas about what you'd want to do in your language, why it'll do something new and useful, or better than the others, or whatever it is that makes it something you want to build.

Answer 15

You need to know the theory of formal languages and grammars in the first place, to know what a context-free grammar is (most programming languages have a context-free grammar). Then you need to know something about compilers. It's good to know tools like Lex and Yacc and things like Backus-Naur Form.

EDIT: I think an MSc degree in Computer Science is a good starting point ;)

Answer 16

Domain-specific knowledge could help you design a language to solve problems that arise in that domain: a big collection of use cases, problems and solutions, so that you have a good picture of the problem space. That kind of knowledge is needed; how to actually implement the language is of secondary priority IMHO.

Answer 17

Designing a good language is as much an art as a science. I am interested in the subject myself and I am taking every course on languages at my university. However, what I have learned in school is simply a set of tools. Knowing about type systems and operational semantics would not have helped Yukihiro Matsumoto ^[1] design Ruby ^[2] and I suspect that he did not have much formal training in language design when he began.

I think the best way to learn about language design is to learn as many different languages and paradigms as possible and to learn them well.

Disclaimer: I have written a couple of compilers, but I am still a novice when it comes to actually designing languages. And probably when it comes to writing compilers too ;-)

[1] http://en.wikipedia.org/wiki/Yukihiro_Matsumoto
[2] http://en.wikipedia.org/wiki/Ruby_%28programming_language%29

Answer 18

A firm understanding of denotational semantics ^[1].

[1] http://en.wikipedia.org/wiki/Denotational_semantics

Answer 19

I've had occasion to create perhaps as many as eight or ten little languages in my own professional career (in addition to various others for my own purposes/enjoyment). It's sometimes the best way to solve some domain-specific problem. Reference, for example. ^[1]

It's not particularly miraculous or difficult. In general, you'll do it because there's no existing language that exactly fits the bill; without that motivation, you can't really expect to be able to design some awesome language, essentially in a vacuum.

So the next time you need one, write it. Nurture it, and let it grow. (Both the language, and your beard.)

[1] http://rads.stackoverflow.com/amzn/click/0471597538

Answer 20

I think that one possibly good way to start would be to try to implement a small DSL using Xtext ^[1] or something similar. Start with something concerning a very small domain. Then, later on, work on a language concerning a larger domain. After a while, you should have no problem implementing very complex languages, and not just DSLs, but GPLs as well.

[1] http://www.eclipse.org/Xtext/

Answer 21

A good approach is to try and write a "toy" interpreter or compiler for your favorite language. That way you can learn about how to implement a language without getting bogged down in language design, and you have the existing grammar to start from and plenty of test cases. While it's a huge project to design an "industrial strength" compiler or interpreter, it's pretty easy to write something that can compile or run a limited set of test programs.

Then, you just ask yourself, "how can I make this better"?

Answer 22

In 2006, the organizers of the ICFP ^[1] came up with an awesome task that involved you figuring out how to decode and parse a language they created - or rather, an alien programming language found in ancient scrolls ;)

Besides the million hours of fun trying to figure out everything in those alien scrolls (it's like a Russian doll, that thing), the part where you create a compiler for a language created for fun is a really interesting way of getting into this thing of creating languages.

The site has the source code of everything, but I recommend trying to figure it out by yourself, it's much more fun. The mailing lists are still up as well, with interesting information about the language and the contest task.

[1] http://www.boundvariable.org/task.shtml

Answer 23

The first step is the hardest - designing your language - the constructs, object orientation, type system etc. It is better to start with very simple constructs and add more as you understand the subject better.

Then write a program that does something simple in your hypothetical (as of now) language. The rest of the steps are towards writing code that will read this program and execute.

Building the implementation requires you to first build a lexical analyzer. The tried and tested route is to use tools like lex/yacc. However, there are excellent parser libraries in high-level languages like Ruby, Python and Java. Ruby, for example, has the awesome Treetop ^[1] library, which uses PEG ^[2] to describe languages. For a beginner, it would be much better to use one of these languages, since they help you focus on implementing your language rather than debugging weird memory allocation bugs in your C code.

Once you've an AST ^[3] built for your language, try building an interpreter for it. If you're using a high level language, you could loop through the AST and call routines in your host language for getting the desired functionality.
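
As a sketch of what "loop through the AST and call routines in your host language" can look like, here is a small tree-walking interpreter in Python for a made-up statement language. The node shapes (`assign`, `while`, `print`, `binop`, `num`, `var`) are assumptions for the example, and the AST is built by hand here; in a real implementation it would come out of your parser.

```python
import operator

# Host-language routines that implement the guest language's operators.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
}

def eval_expr(node, env):
    """Compute the value of an expression node."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "binop":
        _, op, left, right = node
        return BINARY_OPS[op](eval_expr(left, env), eval_expr(right, env))
    raise ValueError(f"unknown expression node {kind!r}")

def exec_stmt(node, env, output):
    """Execute one statement node, updating env and appending to output."""
    kind = node[0]
    if kind == "assign":
        _, name, value = node
        env[name] = eval_expr(value, env)
    elif kind == "while":
        _, condition, body = node
        while eval_expr(condition, env):
            for stmt in body:
                exec_stmt(stmt, env, output)
    elif kind == "print":
        output.append(eval_expr(node[1], env))
    else:
        raise ValueError(f"unknown statement node {kind!r}")

def run(program):
    """Run a list of statements; return the printed values and final variables."""
    env, output = {}, []
    for stmt in program:
        exec_stmt(stmt, env, output)
    return output, env

# i = 0; total = 0; while (i < 4) { total = total + i; i = i + 1 }; print total
program = [
    ("assign", "i", ("num", 0)),
    ("assign", "total", ("num", 0)),
    ("while", ("binop", "<", ("var", "i"), ("num", 4)), [
        ("assign", "total", ("binop", "+", ("var", "total"), ("var", "i"))),
        ("assign", "i", ("binop", "+", ("var", "i"), ("num", 1))),
    ]),
    ("print", ("var", "total")),
]
```

Running `run(program)` produces the output list `[6]`. Every guest-language operation is delegated to a Python routine, which is why this style is quick to write and slow to run.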

This approach will yield you a slow interpreter - but you'll have a new language with an implementation. And as you get better in the craft, you can change gear and add code generation ^[4] and make it a compiled language.

[1] http://treetop.rubyforge.org/
[2] http://en.wikipedia.org/wiki/Parsing_expression_grammar
[3] http://en.wikipedia.org/wiki/Abstract_syntax_tree
[4] http://en.wikipedia.org/wiki/Code_generation_%28compiler%29

Answer 24

Ping Rich McConnel or Brian Russel if you can find them. Authors of Clipper, they turned a compiler into a full-blown programming language with Clipper 5.0 back in the day. Brian was fond of saying, you could write Clipper in Clipper.