Stack Overflow: What parser generator do you recommend
[+40] [11] user51568
[2009-01-09 17:11:00]
[ c++ c parsing parser-generator ]

I'm currently shopping for a FOSS parser generator for a project of mine. It has to support either C or C++.

I've looked at bison/flex and at boost::spirit.

I went from writing my own to spirit to bison to spirit to bison to spirit, each time hit by some feature I found unpleasant.

The thing I hate most about bison/flex is that they actually generate C/C++ source for you. There are a number of disadvantages to this, e.g. debugging. I like spirit from this point of view, but I find it very very heavy on syntax.

I am curious about what you are using, what you would recommend, and general thoughts about the state of the art in parser generators. I am also curious to hear about approaches being used in other languages for parsing problems.

Both Bison and Flex will generate C++ output with the appropriate flags. - Loki Astari
(1) I'm aware and I don't care if the code is C or C++. - user51568
[+26] [2009-01-09 17:23:29] Kevin Loney

Antlr [1] isn't bad and it has a built-in debugger. The package also comes with an API [2] for C (among other available languages).


(8) "Not bad" is underestimation, with AntlrWorks the word is "awesome" :) - utku_karatas
+1, Antlr's pretty good :) - orip
[+23] [2009-01-10 03:08:16] Norman Ramsey [ACCEPTED]

Please don't use bison/flex or yacc/lex. They parse very efficiently but are really hard on the programmer. Use a more modern parser generator with a better user interface. ANTLR [1] is a good suggestion, and you might also consider


I'm not familiar with the theory behind packrat, and the implementations for C and C++ seem limited. I respect Elkhound very much (esp. for Elsa), but it seems too heavyweight for what I need. +1 - user51568
The main thing is not to limit yourself to LALR grammars. It just makes your grammar less readable and causes you unnecessary suffering. Maybe you'll like ANTLR? - Norman Ramsey
The elkhound parser seems to have moved to if somebody is looking for it now. - dajobe
[+21] [2009-01-10 12:04:58] epatel

I'd recommend looking a little at the Lemon parser generator used in SQLite

Lemon [1]

Lemon is an LALR(1) parser generator for C or C++. It does the same job as "bison" and "yacc". But lemon is not another bison or yacc clone. It uses a different grammar syntax which is designed to reduce the number of coding errors. Lemon also uses a more sophisticated parsing engine that is faster than yacc and bison and which is both reentrant and thread-safe. Furthermore, Lemon implements features that can be used to eliminate resource leaks, making it suitable for use in long-running programs such as graphical user interfaces or embedded controllers.


(2) It seems reasonable, but not impressive. In particular, the documentation could use some work. - user51568
(4) It doesn't look flashy, but coupled with a decent lexer (I suggest ragel), it is the best parser generator I have used. And the documentation is enough because lemon is really simple to use. - artificialidiot
Thank you very much for this suggestion - very nice piece of software! - simfoo
(1) How is this answer interestingly different than "use YACC or Bison?" Both are LALR parser generators. Both push the entire job of post-parsing (building ASTs, interpreting ASTs, ...) entirely on the user, almost certainly as plain procedural attachments to parsing actions, which is what I suspect the OP is complaining about. - Ira Baxter
(1) Saying "use lemon" really (for me) has the underlying meaning of saying "don't use yacc/bison/flex". They're really old and they're not designed to be a part of a program, they are a foundation for a program, which is usually what you want when you make a command line compiler. I agree with the recommendation of lemon. It's not perfect but it's definitely a lot cleaner than any other parser generator I've ever seen. It's 2 c files! One is the lemon executable and the other is the generated parser template. - doug65536
[+19] [2009-01-10 11:57:04] MattyT

I've been very happy using spirit [1]. Yes, the syntax can take some getting used to but it's flexible and powerful.

If your code is in C++ it's the most elegant solution IMHO since a) it integrates beautifully with your code (particularly with the design of actions) and b) you don't need to run a code generator as a separate build step.

I'd suggest looking into it some more before dismissing it.

Antlr [2] is great if you're using other languages, but when I'm using C++ Antlr feels clunky and awkward compared to using spirit. I've drunk the kool-aid; spirit FTW! ;)
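
For anyone who hasn't seen the embedded approach, here is a rough sketch of what point (b) means in practice. This is deliberately not Spirit's API: `Parser`, `ch`, `digits`, `many` and `parse_int_list` are hypothetical names in a hand-rolled toy that uses only the C++17 standard library. The point it tries to show is that the grammar is an ordinary C++ expression, so there is no generated source and no extra build step.

```cpp
#include <cctype>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical mini-library, NOT Boost.Spirit: a parser is a function that
// consumes a prefix of the input and returns the remaining input on success.
using Parser = std::function<std::optional<std::string_view>(std::string_view)>;

// Match a single literal character.
Parser ch(char c) {
    return [c](std::string_view in) -> std::optional<std::string_view> {
        if (!in.empty() && in.front() == c) return in.substr(1);
        return std::nullopt;
    };
}

// Match one or more decimal digits.
Parser digits() {
    return [](std::string_view in) -> std::optional<std::string_view> {
        std::size_t n = 0;
        while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n]))) ++n;
        if (n == 0) return std::nullopt;
        return in.substr(n);
    };
}

// Sequence: a followed by b.
Parser operator>>(Parser a, Parser b) {
    return [a, b](std::string_view in) -> std::optional<std::string_view> {
        auto rest = a(in);
        if (!rest) return std::nullopt;
        return b(*rest);
    };
}

// Zero or more repetitions of p (always succeeds).
Parser many(Parser p) {
    return [p](std::string_view in) -> std::optional<std::string_view> {
        while (auto rest = p(in)) {
            if (rest->size() == in.size()) break;  // p made no progress; stop
            in = *rest;
        }
        return in;
    };
}

// True if the grammar  list := digits (',' digits)*  matches all of text.
bool parse_int_list(std::string_view text) {
    Parser list = digits() >> many(ch(',') >> digits());
    auto rest = list(text);
    return rest && rest->empty();
}
```

With this, `parse_int_list("1,22,333")` accepts and `parse_int_list("1,,2")` rejects. A real library such as spirit does far more (attributes, semantic actions, compile-time composition instead of `std::function`), which is also where the heavy syntax and long compile times mentioned in this thread come from.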


(2) I have looked into it; currently my project is written using spirit. I'm not happy, particularly with the AST stuff. - user51568
Could you kindly elaborate?… - Norman Ramsey
I'd like some further information too... - MattyT
spirit is funny stuff, but a relatively simple LISP-like parser I wrote takes tens of seconds to compile on a 2.7 GHz core. - phresnel
[+14] [2009-01-09 19:40:00] Loki Astari

I use FLEX and Bison.
Both have the ability to generate C++ code (via command line flags or directives in the file).

I hear Antlr is good but have never used it personally.

[+6] [2010-04-02 16:08:16] Ira Baxter

A state-of-the-art parser generator is the DMS Software Reengineering Toolkit [1]. (I'm the architect).

It isn't FOSS, but you asked specifically about the state of the art.

It isn't so much just a parser generator, as a complete ecosystem for building tools that process formal documents (programs, specifications, hardware designs, anything that has a "formal syntax/semantics").

DMS provides

  • lexers with full Unicode capability and ability to read a huge variety of input encoding formats (ascii, UTF-8/16, EBCDIC, ...)
  • full context-free parsing (infinite lookahead and built-in error recovery)
  • automatically builds abstract syntax trees, determining which productions are lists. The syntax trees capture comments in the text.
  • provides direct support for building tree-structured analyzers called "attribute grammar evaluators"
  • provides symbol table construction support that has been proven to be capable of handling nasty languages such as C++
  • provides pretty printers to regenerate valid source text from the trees, including regenerating valid comments
  • source-to-source rewrite rules to allow you to define program transformations using the syntax of the language of interest
  • provides control flow, data flow, call graph, and global points-to analysis machinery
  • has tested front ends for C, C++, Java, and COBOL, all of which build symbol tables and construct the various flow analyses above
  • has front ends for a variety of other languages, including C# (4.0), PHP, Ada, ...

One of the tests of fire for a "state of the art" parser generator is its ability to parse C++. DMS parses C++, does all the symbol table construction, etc. and has been used to carry out massive transformations automatically on C++ code.

Other "parser generators" tend to provide at best parsing ability and leave you to build your own trees and all of the rest of the above stuff if you have the heart and the years to do it.

ANTLR is a bit better in that it does provide support for tree building and some syntax-directed pattern matching. ANTLR sort of passes the C++ trial by fire: there is a C++ front end for ANTLR. To the best of my knowledge, it is incomplete and doesn't have symbol table support, and I don't know of any uses of it for production tasks.

ELSA succeeds at C++ (and symbol tables) by virtue of being focused on parsing C++. The foundation machinery (Elkhound) behind ELSA uses the same GLR parsing algorithm as DMS. But I don't believe that Elkhound is widely used for anything but to support ELSA.

At the risk of being immodest, I would suggest that DMS defines the state of the art. (I'll agree that ANTLR is pretty good for what it does).

You can get more detailed comparisons of DMS to many other systems here [2].


[+5] [2009-01-09 17:29:46] ChrisW

I am curious about what you are using, what you would recommend, and general thoughts about the state of the art in parser generators.

I'm using the GOLD parser at ... because:

  • I'm not experienced with or formally educated in parsing, and I found it easy to learn and use
  • It says that it supports several languages (including C, C++, and C#).

GOLD is still based on the tired old LALR model, for which it is a pain to write the grammar. GOLD's main claim on its web page is that it is easy to support lots of programming languages. That's a good thing, but for C and C++, better alternatives are available. - Norman Ramsey
I don't like it. Seems to be windows-only and the documentation seems unclear. - user51568
I found I was able to make a CSS grammar for it: which (parsing CSS) is what I wanted a parser for. That was my first and only experience with parsing: so presumably CSS is easy to parse, or the tool is good, and/or I was lucky. - ChrisW
So, that's my experience. Other tools might be better, perhaps even more powerful in some ways, but this one suited me. I knew when I chose it that Antlr was the more famous project. - ChrisW
[+1] [2009-01-09 20:37:53] systemsfault

There is plenty of good documentation on Antlr and it has a very nice Eclipse plugin, so I recommend it. Unfortunately, I have no experience with the other options.

[+1] [2009-01-10 12:57:25] Daemin

If you understand the theory of lexing and parsing you can use Flex and Bison to generate the state machine tables for you and implement the lexer and parser yourself (or re-implement the templates that come with Bison and Flex) to get rid of the things you don't like about them.

I've done this at one time, and it's nice in so far as you can have your own lexer and parser written to your specifications, in your application's style, with your own coding standards and debugging features, but you use the well coded algorithms inside Flex and Bison to generate the state transition tables for you. And I'd wager to say that creating the tables is probably the more complicated problem.

So in summary: Use flex and bison to generate your state transition tables, which are then used by your own lexer and parser.
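
To make the split between tables and driver concrete, here is a minimal sketch of the lexer half. The table below is written by hand and stands in for a generated one; its layout (`kNext`, `kAccept`) and all other names are assumptions for illustration, not Flex's real (compressed) table format. The driver loop is the part you would own.

```cpp
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Character classes: the columns of the transition table.
enum CharClass { DIGIT, LETTER, SPACE, OTHER, NUM_CLASSES };

// DFA states: the rows of the transition table.
enum State { START, IN_NUMBER, IN_IDENT, NUM_STATES, REJECT = -1 };

// Hand-written stand-in for a generated table (hypothetical layout, not
// Flex's real format): kNext[state][class] is the next state or REJECT.
constexpr int kNext[NUM_STATES][NUM_CLASSES] = {
    /* START     */ { IN_NUMBER, IN_IDENT, REJECT, REJECT },
    /* IN_NUMBER */ { IN_NUMBER, REJECT,   REJECT, REJECT },
    /* IN_IDENT  */ { IN_IDENT,  IN_IDENT, REJECT, REJECT },
};

// Token kind accepted in each state; empty string means "not accepting".
constexpr const char* kAccept[NUM_STATES] = { "", "NUMBER", "IDENT" };

struct Token { std::string kind; std::string text; };

CharClass classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isdigit(u)) return DIGIT;
    if (std::isalpha(u) || c == '_') return LETTER;
    if (std::isspace(u)) return SPACE;
    return OTHER;
}

// The driver: this is the part you write, style and instrument yourself.
std::vector<Token> tokenize(std::string_view in) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (classify(in[pos]) == SPACE) { ++pos; continue; }
        int state = START;
        std::size_t end = pos;
        // Follow transitions until the table rejects. In this tiny DFA every
        // state other than START is accepting, so this yields the longest match.
        while (end < in.size()) {
            int next = kNext[state][classify(in[end])];
            if (next == REJECT) break;
            state = next;
            ++end;
        }
        if (end == pos) {  // no rule matched: emit the character as an error token
            out.push_back({"ERROR", std::string(1, in[pos])});
            ++pos;
            continue;
        }
        out.push_back({kAccept[state], std::string(in.substr(pos, end - pos))});
        pos = end;
    }
    return out;
}
```

A general DFA also has non-accepting intermediate states, so a real driver would remember the last accepting state and back up to it. The parser half works the same way in principle: an LR driver is a loop over a stack that consults action and goto tables, and only the tables need to come from the generator.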

I don't understand why I would want to do that. I'm not working on something particularly complicated; if I need to write some part on my own, I might as well go all the way and write it all. - user51568
Well you said that you don't like the C code that gets generated. Therefore if you use Flex and Bison to generate the tables for you, not the code, you can write your own manipulations (or code what they have done in the template) and replace it with something that fits into your framework better. - Daemin
[0] [2009-01-09 17:49:03] rmeador

Flex has a way to configure it to generate C++ (and perhaps Bison does as well, though I'm unsure of that). I recall trying to use this in the final project for my compilers class and finding it nearly undocumented, so I fell back to using C. That was a year and a half ago, so maybe it's gotten better since then. There's definitely a section in the man page on it though. I'm not sure that's helpful, but at least it's something you can try :)

It's not a how-to, but the man pages for both tools are very thorough (though they can be heavy going). - Loki Astari
I don't really care if it's C or C++. I want something with nice syntax. - user51568
[0] [2009-01-29 16:24:45] Vinay

Visual ++ Parser

Isn't Visual Parser Java only? - Kevin Loney
It is C++. I am using it in C++. - Vinay
Can you include a link to it (C++ version)? - user51568