Following the interest in this question [1], I thought it would be interesting to make answers a bit more objective and quantitative by proposing a contest.
The idea is simple: I have generated a binary file containing 50 million gaussian-distributed doubles (average: 0, stdev 1). The goal is to make a program that will sort these in memory as fast as possible. A very simple reference implementation in python takes 1m4s to complete. How low can we go?
The rules are as follows: answer with a program that opens the file "gaussian.dat" and sorts the numbers in memory (no need to output them), and instructions for building and running the program. The program must be able to work on my Arch Linux machine (meaning you can use any programming language or library that is easily installable on this system).
The program must be reasonably readable, so that I can make sure it is safe to launch (no assembler-only solution, please!).
I will run the answers on my machine (quad core, 4 Gigabytes of RAM). The fastest solution will get the accepted answer and a 100 points bounty :)
The program used to generate the numbers:
#!/usr/bin/env python
import random
from array import array
from sys import argv
count=int(argv[1])
a=array('d',(random.gauss(0,1) for x in xrange(count)))
f=open("gaussian.dat","wb")
a.tofile(f)
The simple reference implementation:
#!/usr/bin/env python
from array import array
from sys import argv
count=int(argv[1])
a=array('d')
a.fromfile(open("gaussian.dat"),count)
print "sorting..."
b=sorted(a)
EDIT: only 4 GB of RAM, sorry
EDIT#2: Note that the point of the contest is to see if we can use prior information about the data. it is not supposed to be a pissing match between different programming language implementations!
generator.py
and didpython generator.py 1000000
. However, as I saved the second snippet assort.py
and entered in the consolepython sort.py 1000000
, I got a EOFError. I know nearly nothing in Python, so if you could help... Edit : Thegaussian.dat
file is indeed at the right place, but the serialization made it look like garbage in my editor. Normal behaviour with python'stofile()
? - Fezvez