Stack Overflow
What is the best functional language for scientific programming
[+48] [16] leon
[2009-08-28 19:29:30]
[ programming-languages functional-programming scientific-computing language-comparisons ]
[ http://stackoverflow.com/questions/1348896/what-is-the-best-functional-language-for-scientific-programming ] [DELETED]

I come from a C/C++ and Python background, and I am looking to learn a functional language that (hopefully) offers:

  1. Serious matrix computation
  2. Expressiveness
  3. Real-world modelling
  4. Database integration
  5. Concurrency/parallelism
  6. Battery (library) included
  7. Strong integration with well-tuned libraries in other languages
  8. A positive future

I have looked into

  1. Clojure
  2. Scala
  3. Haskell
  4. Scheme
  5. Ocaml
  6. Erlang

Clojure seems nice for concurrency, but it is on JVM. Are there other pros and cons for Clojure or the other languages?

Thanks

EDIT: In the comments below, the OP objects to solutions like Mathematica and F# because they are not "free" and "open source".

(7) +1 for Haskell because SO has more "haskell" tagged questions than your other options :) - yairchu
Erlang matches your needs, especially parallelism. However, matrix computation libraries and such are not a strong point. - psyeugenic
(4) Why are you saying "... but it is on JVM" as if that is a disadvantage? Why do you think that is a disadvantage? - Jesper
(4) should have been tagged with [flame-war]? - yairchu
@Jesper: The JVM is, after all, slower than compiled languages at floating-point computation. - leon
(1) @Leon, according to shootout.alioth.debian.org the JVM probably isn't the biggest killer. Be careful evaluating the C/C++ ones, the implementations appear to be significantly more optimised than the other languages. - kibibu
Do you mean "Batteries included", not "Battery included"? - Andrew Grimm
@leon: In the comments below you object to better solutions because they are "not open source" and "not free". If you want free open source solutions then you should say so in the question. - Jon Harrop
[+43] [2009-08-28 21:37:16] Daniel C. Sobral [ACCEPTED]

I'd advise going with Haskell. Haskell's GHC compiler is on par with C compilers, and will leave the bytecode-compiled languages in the dust, JIT or no JIT. This is because Haskell's design ensures that a wide range of optimization techniques are possible.

Now, if you are going to stay on the JVM, I'd recommend either Clojure or Scala. They both can use Java's extensive library support. In fact, they both fit all your requisites (imho, since at least one of them is subjective :).

Scala would be easier to pick up, because you can start with the style you are already proficient in and introduce functional features as time goes on. Besides, not being solely functional makes it possible to selectively jump back into imperative style if the performance of a functional style for a particular algorithm isn't making the grade. For Scala, also look at the external libraries Scala Query (type-safe database manipulation), Scalala (linear algebra, MATLAB-like) and Scalax (serious functional stuff, Haskell-like).
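
To illustrate the point about mixing styles, here is a minimal sketch of the same dot product written imperatively and functionally. It is in Python only because the asker already knows it; it is not Scala code, and the function names are made up for illustration.

```python
from functools import reduce
from operator import add, mul


def dot_imperative(xs, ys):
    """Dot product with an explicit loop and a mutable accumulator."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_functional(xs, ys):
    """Same computation as a map over pairs followed by a fold; no mutation."""
    return reduce(add, map(mul, xs, ys), 0.0)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [4.0, 5.0, 6.0]
    print(dot_imperative(a, b), dot_functional(a, b))
```

Both functions fold the products left to right from the same starting value, so they return the same result; which one is faster depends on the language implementation, which is exactly why being able to switch styles is useful.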

Clojure would have the advantage of accelerating functional learning by virtue of being painful to do anything else. I don't know it enough to recommend any Clojure-specific libraries.

I recommend you take a look at those Scala libraries, and any Clojure libraries people recommend, and answers to questions involving code for both languages to gauge the expressiveness and style. As for Haskell... I recommend diving right in -- trying the waters first might scare you. :-)


(6) @Paul Nathan: Completely false. I use Haskell for serious work with ease. Purity is extremely helpful, and doesn't really get in the way. - jrockway
(11) @Paul: WTF is an "'Ooo, it can do X' language"? You dislike languages that are capable of doing things? - Chuck
@Daniel: How is the performance of Scala in floating point computation? Is it better or worse than Clojure? - leon
(1) @leon: Floating-point computation by itself is the same in all JVM languages, as they use the same primitives; the concern should be how you manipulate large datasets, and their algorithms, as things like boxing and data copying will slow down the algorithm. I cannot compare Scala to Clojure, but I can assure you that Scala takes this matter seriously, and there are ongoing efforts to make it as fast as possible without giving up the expressiveness (if you give it up, you can be as fast as any Java program could ever be). - Daniel C. Sobral
(2) @Paul: It's all right if you dislike a language, but I'm guessing you dismissed Haskell too quickly. I've personally used it in both trivial (fcgi-based web sites) and serious (DICOM image processing, medically-oriented scripting languages, etc.) work. The language is quite capable. - Shaun
(1) You might also take a look at Data Parallel Haskell (haskell.org/haskellwiki/GHC/Data_Parallel_Haskell). This is an extension (still experimental and unstable) for highly optimised number crunching. - Paul Johnson
(6) GHC is not on par with C/C++. Stop referencing the same flawed tests. The only more or less mainstream FP language that can contest with C is OCaml. - Pavel Minaev
(6) @Pavel: sure, show non-flawed tests to the contrary, and I'll remove that statement. - Daniel C. Sobral
(1) @Daniel/Shaun: Haskell appeared (and has not changed in the years since I investigated) to be a language with many features placed in it because of theoretical shininess, without a focus on getting the job done. I give you the monad concept for IO, and suggest you compare it against Perl IO, from a language that understands that IO needs to be built in from the ground up instead of the "monad" arcaneity. - Paul Nathan
(3) @Paul: There is no such thing as monad arcaneity. Monads are just one possible way of handling IO without having to alter everything a language has to offer. Other languages just take the easy way out and alter the language to include them. I assure you that once you get used to the IO Monad in Haskell it becomes second nature and you don't have any problems with it! That said, Haskell's strong point, imho, is the backend, where there is no state or any other side effect (that is, it's ideal for what the asker requires). - Andrew Calleja
(7) As Pavel explained, this post is pure fanboi fantasy and you can easily find overwhelming evidence to the contrary. In reality, Haskell is nowhere near as fast as JVM- or CLR-based languages let alone C. If you're interested in learning the truth, check out the Burrows Wheeler Transform benchmark which is still orders of magnitude slower in Haskell than other languages including OCaml, F# and C. Even generic parallel quicksort is an unsolved problem in Haskell but trivial to implement in other languages. - Jon Harrop
(8) @Jon Harrop: Coming from an OCaml fanboy -- yeah, I can google -- that's a joke. FWIW, I'm not a fan of Haskell at all. But neither am I blinded by prejudice. - Daniel C. Sobral
(1) People, given some claims that have been going on here, I'll point you all to this thread, and let you make your own mind: groups.google.com/group/comp.lang.haskell/browse_thread/thread/… - Daniel C. Sobral
(1) I just got this on twitter as well: community.haskell.org/~ndm/downloads/… :-) - Daniel C. Sobral
I go with Haskell because it has a promising future in data parallelism. Is there something similar in OCaml? - leon
(3) I'll let OCaml people answer that question, but I'd like to point out that F# is very close to OCaml, to the point of being source-code compatible, and it has some really good -- outstanding, even -- parallelism support. - Daniel C. Sobral
I would like to try F# when Windows becomes the next open source project... - leon
For whatever it is worth, F# compiler is/was open source. - Daniel C. Sobral
@Daniel: Does F# perform as well on Mono as on Windows .NET? - leon
@leon: I can't tell, sorry. I was pointing that out just for the sake of completeness. - Daniel C. Sobral
(6) Methinks Paul has very little clue what he's talking about, and we all know Jon Harrop is biased from all the nonsense he's posted in the past. Monads are about as hard as learning to sit down on the toilet and drop a log. People make them sound harder than they are. I've worked with Haskell in some very large projects, and I've never had any problem. People are just too afraid to learn something new. - Rayne
(1) @leon: Mono is a lot slower than .NET. Did you mean Haskell's nested data parallelism? If so, I don't think it will be very useful compared to Cilk-style task parallelism. The nearest thing in the OCaml world is my own HLVM project. @Rayne: Please tell us about your "very large" Haskell projects. And, if you think I'm wrong, please provide a competitively-performant generic quicksort in Haskell. - Jon Harrop
FWIW, I have since studied some of the state-of-the-art research work done on parallel Haskell and found it to be a complete joke. I posted a review here: flyingfrogblog.blogspot.com/2010/06/… - Jon Harrop
Andrew Calleja: "I assure you that once you get used to the IO Monad in Haskell it becomes second nature and you don't have any problems with it!" You may be surprised to know that parallelizing quicksort (which is trivial in most languages) is an unsolved problem in Haskell without evading the type system. A state-of-the-art type-unsafe solution is here: flyingfrogblog.blogspot.com/2010/08/… - Jon Harrop
@Daniel: The work on supercompilation by Neil Mitchell at the University of York that you cited never went anywhere. Like Lisp's "sufficiently smart compiler" in the 1980s, it is an evolutionary dead end that resulted from clutching at straws following a bad initial design decision. - Jon Harrop
1
[+21] [2009-08-28 19:46:14] fortran

I like OCaml a lot because it has some very nice killer features:

  • Very fast compiled code
  • Strong typing that is not a pain in the arse through type inference
  • Multi Paradigm (OO, Imperative, Functional)

The worst thing about it is that it's not getting a parallel garbage collector any time soon, so as the number of cores increases, its great performance will lag behind other language implementations that are easier to parallelise automatically. It's still possible to do parallel programming explicitly with MPI (but I think that kills the fun).

If you can afford to spend some money, maybe this book could help you: OCaml for Scientists [1].

Anyway, I'd stick with Python/SciPy, it's got great performance and a flexibility hard to find in any other language. Being already proficient in both Python and C, I found this book lovely to go deeper in these topics: Python Scripting for Computational Science [2]

[1] http://www.ffconsultancy.com/products/ocaml_for_scientists/
[2] http://books.google.co.uk/books?id=YEoiYr4H2A0C&lpg=PR2&dq=python%20for%20scientists&pg=PR17#v=onepage&q=python%20for%20scientists&f=false

(2) The OC4MC project provided OCaml with a parallel GC a few months ago but my initial tests indicated that it was not performant enough to be useful, although it may be now. My HLVM project is probably the best bet for high-performance parallelism from OCaml and it is nearing a first useful version... - Jon Harrop
very nice to know about those options :) - fortran
Agree on that :) - Ang
2
[+17] [2009-08-28 20:28:37] alanlcode

I can only speak about Clojure since I have only limited experience with the other languages.

The advantages of Clojure:

  1. Full JVM integration. You can use any library written in Java, so you will have no dearth of well-tuned, mature libraries, such as those from Sun or Apache Commons. Covers your concerns [6] and [7].
  2. Concurrency support. Software transactional memory and Java threads. Covers [5].
  3. Incanter [1]: a statistical computing package modeled after R, written in Clojure. Under the hood it uses the high-performance, multithreaded scientific computing package Parallel Colt [2]. Covers [1] and [7].
  4. It's a Lisp, with all the metaprogramming and expressiveness that it entails. Covers [2].

Disadvantages:

  1. Relatively new to the scene. The community is small, albeit very active and helpful. It may or may not survive the test of time.
  2. It's a Lisp. If you don't like Lisp, then you won't like Clojure, although obviously I'm personally biased and would prefer you gave it a try.
  3. Will probably never match Java in pure speed, pound for pound. It generally comes close, within an order of magnitude.
  4. It is not a pure functional language. Read: side effects and mutable data structures. This may or may not be a disadvantage to you, depending on what you're looking for out of the language.
[1] http://incanter.org/
[2] http://sites.google.com/site/piotrwendykier/software/parallelcolt

3
[+13] [2009-09-01 17:23:28] jetxee

Scientific computation is two-fold.

Quick prototyping

On one hand you may need to write a lot of prototype code, and need to write it fast. Often, this code is used just once. So there is a need for simple and expressive languages with solid library support. In my opinion, Python is the best suited for this purpose, and I hope it will finally dethrone MATLAB in this area. I don't know of any functional language that can compete with Python right now.

Performance computing

On the other hand, you may need to solve computationally intensive problems, and performance is important. So, you need an optimizing compiler and parallel computations (both multi-core and multi-machine). And you need to make it work on clusters (i.e. on Linux) and support standard parallel APIs (MPI and OpenMP).

From your list, probably only Scheme is not suitable for performance computing. The others may or may not be OK; I don't know. Anyway, the result will usually be 2 or 3 times slower than pure hand-optimized C/C++/Fortran/Java.

I know that Haskell used to lag behind in this area, but with Data Parallel Haskell [1] this may change. Its status is technology preview right now (and stable in GHC 6.12?).

There is also a field of symbolic computing, which I am fairly remote from. I expect that some of the functional languages may really shine in this area if there are suitable libraries.

Shootout

I think you can also consult shootout.alioth.debian.org to see the performance limits in similar number-crunching tasks on a multi-core CPU. Sure, pure C rules them all, but most of the compiled functional languages are good enough:

  1. Double-precision N-body simulation [2]

  2. Eigenvalue using the power method [3]
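
For readers unfamiliar with the second benchmark, the power method itself is short. Below is a minimal pure-Python sketch that estimates the dominant eigenvalue of a small symmetric matrix. It is not the shootout's program; the function names, the fixed iteration count, and the 2x2 example matrix are illustrative assumptions.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a dense matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_method(matrix, iterations=100):
    """Estimate the dominant eigenvalue of a symmetric matrix by power iteration."""
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]  # renormalise to avoid overflow
    # Rayleigh quotient v.(A v) for the unit vector v.
    return sum(v * w for v, w in zip(vector, mat_vec(matrix, vector)))


if __name__ == "__main__":
    # The eigenvalues of [[2, 1], [1, 2]] are 3 and 1.
    print(power_method([[2.0, 1.0], [1.0, 2.0]]))
```

The inner loops here are exactly the kind of tight floating-point code where the choice of language and compiler shows up in the benchmark numbers.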

Libraries

Scientific computing depends on the existence of libraries in your domain (unless you are ready to write your own). Just for the reference:

Numeric and scientific libs for Python [4]

Haskell math libraries [5]

OCaml math libraries [6]

[1] http://www.haskell.org/haskellwiki/GHC/Data%5FParallel%5FHaskell
[2] http://shootout.alioth.debian.org/u32q/benchmark.php?test=nbody&lang=all
[3] http://shootout.alioth.debian.org/u32q/benchmark.php?test=spectralnorm&lang=all
[4] http://wiki.python.org/moin/NumericAndScientific
[5] http://hackage.haskell.org/packages/archive/pkg-list.html#cat%3Amath
[6] http://caml.inria.fr/pub/old%5Fcaml%5Fsite/humps/caml%5FMathematics.html

(2) Some of our numerical routines in F# are several times faster than Intel's Fortran code on Intel's own hardware. Vanilla C and C++ don't handle multicores at all well. - Jon Harrop
I agree that C and C++ are not the easiest languages for writing parallel code, and sequential code cannot compete on SMP machines. Pure functional programming is surely more multi-core friendly, but I am not so sure whether impure F# is really capable of automatic parallelism. (Manually parallelized code may be efficient in any language.) I expect F# to be very competitive though, with a performance pattern more or less like OCaml's (less so on Linux). I don't know if the F# math libraries are good. What counts against F# for scientific computing is its subpar performance on Linux. - jetxee
Absolutely, nothing is capable of automatic parallelization to any useful degree. F# just makes it easy to express efficient parallel algorithms but you still need to know exactly what you're doing to take full advantage of it. - Jon Harrop
4
[+11] [2009-08-28 20:40:48] Hynek -Pichi- Vychodil

After Erlang and OCaml you should also look at J [1]. It is the ultimate scientific language.

[1] http://www.jsoftware.com/

J was going to be my answer as well. I've never used it, but I like what it has to offer. It is a language that will teach you things you can use just by reading the docs. - John F. Miller
5
[+8] [2009-09-01 18:02:37] Warren Young

Options you didn't list, but should look into:

Because of the GUI front end and strong mathematical typography support, many don't realize that Mathematica [1] is in fact a programming language. You don't even have to use the GUI: you can run the evaluation engine in the background, just as they do for Wolfram Alpha [3]. As you'd expect of a mathematical programming language, it can be used in a purely functional manner, though it also has procedural and OO features. It's not side-effect-free, as you can have variables that vary, but you can program without using those parts of the language, if you want. It has everything you asked for. The only question is whether you can cope with the license cost.

R [2] is also a functional language, in the same impure way as, say, JavaScript: functions are first-class objects, but there are also variables-that-vary. It's more what the average programmer would think of as a "language" than Mathematica, but pretty much across the board not as powerful except in its primary domain, statistics. It's a general-purpose programming language, not tied to statistics or even math and science, but because so few people use it outside these domains, getting good info on how to use it that way can be difficult.
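
The distinction between first-class functions and variables-that-vary can be sketched in Python, which is impure in the same way. This is an illustration in Python, not R or Mathematica code, and all names are made up.

```python
def compose(f, g):
    """Return a new function x -> f(g(x)); functions are ordinary values."""
    return lambda x: f(g(x))


def make_counter():
    """Return a closure that mutates captured state: a 'variable that varies'."""
    count = 0

    def increment():
        nonlocal count
        count += 1  # side effect: the same call gives a different result each time
        return count

    return increment


if __name__ == "__main__":
    add_one_then_double = compose(lambda x: x * 2, lambda x: x + 1)
    print(add_one_then_double(3))  # pure: always the same result for 3

    counter = make_counter()
    print(counter(), counter())  # impure: successive calls differ
```

Programming "without using those parts of the language" means sticking to functions like `compose` and avoiding ones like `make_counter`.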

[1] http://wolfram.com/products/mathematica/
[2] http://r-project.org
[3] http://www.wolframalpha.com/

(2) Coming from a statistics and pure maths background, I have in fact tried both. I do not like Mathematica because it is 1) slow, 2) irregular in syntax (I started playing with Mathematica even before I started to learn C and Java), 3) not open source, 4) not free, and 5) not widely accepted in the business world. R is nicer and is growing over time. However, R is 1) ill documented (I tried to find an intro book on the R programming language, but everything I found was an applied statistics book, so I had to dig through the language specification, which is not ideal); - leon
(1) 2) built on irregular object models that have almost no readable documentation (the only doc I found on the web was the original implementation specification; I would like something I can read in a few hours to get it right); 3) its libraries are written by statisticians rather than experienced programmers, so the APIs are not designed for a smooth workflow; 4) there is no good IDE that works well with R. I tried Eclipse and Emacs ESS, but they are far from production level. Correct me if I am wrong. - leon
(1) 5) Lastly, it took me forever to do string manipulation, and I still failed to get it right. I spent a day digging through docs, whereas it took me 2 hours to implement a data frame and linear regression in Python. I wish someone would write a better interpreter and organise the docs; then I would dive in more. - leon
@leon: While you should always be looking for better ways to do things, don't get trapped looking for the perfect tool. No such thing. Regarding books specifically about R the programming language, I like S Programming by Venables and Ripley best. Not a thing in it about statistics. (They have a different book for that.) Despite the title, it also covers the differences in R. It's 9 years old, but the core language hasn't changed much since then. Chambers' Software for Data Analysis is similar, only a year old, and covers R specifically, but I don't think it's actually better. - Warren Young
(1) The original intent was to look for an FP language for scientific programming. The in-scalability of R and Mathematica is enough to eliminate them from the list. I will look into S Programming though. Thanks - leon
@leon: What do you mean by the "inscalability" of Mathematica? - Jon Harrop
6
[+7] [2009-08-28 19:33:11] Matthew Vines

There probably isn't a definitive answer here, but you certainly aren't the first to ask.

http://lambda-the-ultimate.org/node/2720

http://www.programming4scientists.com/2008/08/a-big-list-of-programming-languages/

I have grown fond of F# which is based on OCaml, but I would think most any functional language would get you to the goal.


(2) F# is on .NET, which is something I would like to avoid for life... and I am on Linux for the most part. - leon
By the way, is OCaml good for concurrency/parallelism? - leon
(2) @leon: From what I've heard, F# works fine on Mono as well. - Tom Lokhorst
(2) ocaml is reputed to have trouble with concurrency. Writing a concurrent garbage collector is more work than the ocaml core team wants to bother with. However, F# is based on it and doesn't have the same problems. - mwt
(1) @mwt: Wrong way around. OCaml has trouble with parallelism but concurrency works great and some of OCaml's largest commercial success stories (e.g. Wink) are concurrent. Writing a concurrent GC that is performant on OCaml-like code seems to be impossible but it is actually really easy to write a (mostly) concurrent GC, e.g. VCGC. - Jon Harrop
7
[+6] [2009-08-30 22:51:19] yonkeltron

I think that Scala can win big in this area. At work I use Scala for data analysis and it works out quite well being a hybrid of functional and object-oriented programming languages. You get all of the Java goodness for free:

  • Java standard library
  • Huge ecosystem of the Java world (including many numerical, statistical and simulation packages)
  • JVM (JIT compilation, garbage collection, etc)

Plus, you get the good stuff which Scala offers.

  • Strong and expressive type system (actually type-safe, unlike Java)
  • Pattern matching
  • Actors for concurrency (like Erlang)
  • Flexible syntax for quick-and-easy mini languages (and good parsers for DSLs)
  • Native XML type

In my opinion, Scala does a great job of bridging the gap between many different paradigms while still being able to fit in with existing infrastructure.


All the Java-related advantages are also true for Clojure and for JVM-based implementations of Common Lisp and Scheme. - Jay
+1. Fascinating. - Jon Harrop
8
[+5] [2009-08-28 19:36:10] John Millikin

Clojure and Scala will have the best library support, as they can call into libraries written in any JVM-supported language.

Haskell and OCaml are both mature, well-tested languages. They can interface with existing code written in C, FORTRAN, etc through their FFIs.

Erlang is mature, but while it's often used in fault-tolerant distributed systems, I've never heard it praised on merits of performance. Maybe I just missed the memo? Would probably match any other language if the computations are parallelizable.

Scheme's a great language, but it doesn't have many libraries, and it's not very fast in my experience.


(1) That really depends on your implementation. Chicken and Gambit Scheme are pretty darn speedy (they compile down to C), and have a decent set of libraries. PLT Scheme's library base is massive. - Jonathan Arkell
Erlang's parallel performance is awful. - Jon Harrop
@Jonathan: Seconded -- but I would add something: Chicken has lots of extensions and libraries, and supports more SRFIs than Gambit. - Jay
9
[+5] [2009-08-28 21:45:33] Paul Nathan

F# has the most potential, as a first-class citizen in .NET/Windows world.

Nearly all functional languages exist mostly as academic curiosities. If that suits your needs, fine. I lean towards the "if it's popular, it's not bad" school of thought.


The question was about scientific programming. You didn't explain how .NET is better for this task (performance, scalability, special domain-specific libraries, availability on scientific hardware (see top500)). => -1. - jetxee
Why, no. No, I didn't. I suggested F# as the functional language with the brightest future. If it really does have the brightest future, the technical questions that you mention will be resolved over time. - Paul Nathan
Visual Basic had a bright past, but it never became important for scientific computing. I don't see any guarantee that even if F# is successful as a default windows/.net functional language, it becomes useful for scientific computing. And I believe that today F# is not ready for the task. - jetxee
(1) @jetxee: F# is already widely used for scientific computing. - Jon Harrop
10
[+2] [2009-08-28 19:35:21] Justin Niessner

Could always go one up from Clojure and learn Common Lisp (or Scheme for that matter).

...if not Lisp, then I'd probably settle for Haskell (and then Erlang).


Hardly "one up" given the absence of a decent garbage collector in all existing Common Lisp implementations, both free and commercial. - Jon Harrop
11
[+2] [2009-11-07 11:29:09] Jay

OK, I know I'm late... But it seems that Scheme can actually be a good option. There was recently some discussion in the chicken-users mailing list [1], and it seems that Chicken Scheme [2] can be very fast for numeric computing using the "Crunch" extension. And Scheme is nice because it's a very expressive language, has support for interactive development and is conceptually very simple.

And Chicken supports lots of SRFIs! There are also other extensions and libraries available (see the Eggs index [3]).

[1] http://mail.nongnu.org/mailman/listinfo/chicken-users
[2] http://call-with-current-continuation.org
[3] http://chicken.wiki.br/chicken-projects/egg-index-4.html

12
[+2] [2009-12-21 06:57:25] Yin Zhu

a late comment:)

In our data mining research group, several students are using Python on Hadoop to do parallel computing. The result is promising. Since you have a background in Python, this method is faster than learning a new language.
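
The programming model behind Hadoop is itself functional in flavour: a map step emits key/value pairs and a reduce step folds the values for each key. Here is a single-process Python sketch of that idea. It does not use Hadoop or any Hadoop API; the function names and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Fold each key's list of values into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["to be or not to be", "To think"]))
```

Because the map and reduce functions have no shared mutable state, a real framework is free to run them on many machines at once; this sketch only shows the shape of the computation.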

If you really want to learn a FP language, I think clojure/incanter [1](on JVM) and F#(on .Net) are the best.

Also note one thing: FP languages do not support numerical computing by themselves; they use C/Fortran library bindings, and a stable binding usually takes years! (See NumPy/SciPy.)

I am using F# as I am familiar with Microsoft's platforms, also because it is strongly supported by a company experienced in platforms and tools.

[1] http://incanter.org/

13
[+2] [2010-07-10 11:08:46] Jan Rychter

I use Clojure for similar types of computation. We do a lot of matrix work, as well as processing of other data structures.

I don't think your question has a single answer. There is no "best" functional language. It all depends on what you do and which libraries you intend to use. For us, the JVM was a huge plus, as it is a mature VM with a good JIT, a number of garbage collectors, and parallelism. All are important to us.

We use Incanter for most of our mathematical work and Clojuratica/Mathematica for arcane stuff and prototyping.


14
[+1] [2009-08-31 03:46:47] Gene T

Implicit is that you want library functionality at least equal to SciPy / NumPy / matplotlib, along with the huge number of C / Linux tools available: R, GSL, Sage, Octave. Also tools to integrate relational DBs, key-value and document stores, Hadoop, etc. Probably only Java and .NET libs are going to give that kind of batteries included.

Erlang is the only FP language I've learned in anger. It's "mature" for its traditional server/middleware core competency, but there's a recognition that it could do a lot more. For example, web app frameworks need a decent regex engine to do URL recognition and generation, and there was that Tim Bray todo about Erlang and Apache logfiles (WideFinder), so the Erlang core team is working on it (Robert Virding's libraries).

So today it's not a language known for matrix and statistical math, map-reduce and SIMD data analytics, but given its push into new types of apps, it could surprise you, and the VM's ability to spawn, manage, and (gracefully) terminate tens of thousands of processes and more is unrivaled.


15
[0] [2009-08-28 19:34:07] Shaun

Any of these would do, I think (though I don't have a scientific background, so YMMV). Of these my favorites are, in order:

  1. Haskell
  2. Erlang
  3. OCaml / F# (pulled a fast one on this one)
  4. Scala


16