share
Stack OverflowWhat are your favorite programming-related academic papers?
[+79] [41] namin
[2008-12-10 23:47:35]
[ computer-science education ]
[ http://stackoverflow.com/questions/358033/what-are-your-favorite-programming-related-academic-papers ] [DELETED]

What are your favorite programming-related / cs academic published papers?

It could be a functional pearl, a programming-language paper like those often cited on lambda-the-ultimate.org [1], etc. Really, anything vaguely related to programming. Please also explain why you like the paper.

codinghorror.com/blog - BlueRaja - Danny Pflughoeft
@sdcwc: Hi sdcwc! Thank you I haven't seen that question. it is a very similar question. however the variation of asking for only one paper and the listing bring a new air to the topic, what do you think? +1 Thank you again, SD - SDReyes
[+27] [2008-12-11 05:15:15] Norman Ramsey [ACCEPTED]

For this audience I think I have to go with John Hughes's seminal paper Why Functional Programming Matters [1] because it is densely packed with great ideas and because the topic of the paper is programming. I like the paper because any programmer can pick it up and read it and most will come away excited by the ideas. Also, there are lots of examples in the paper, so you can immediately pick up some kind of Haskell implementation and start trying the ideas for yourself right away.

For the academics in the crowd, I'll also give a shout-out to Tony Hoare's great paper Proof of Correctness of Data Representations [2]. It's intensely mathematical, but it is the canonical reference for a critically important and poorly understood technique: abstract data types. If we all understood about abstraction functions and representation invariants, the world would be a far better place. Heck, I'd just settle for getting people to document their representation invariants and leaving me to figure out the abstraction functions for myself. (Some of Tony's best papers have been collected into a book called Essays in Computing Science [3]. Tony is one of my heroes and this collection has some fabulous chapters including Tony's wonderful Turing lecture on 'The Emperor's New Clothes; a talk he gave to a lay audience on what programming is; a proof of a simple program, finding the kth largest element in an array; and many other nice things. Even if you're not mathy you can pick up this book and skip the mathy chapters and still enjoy it a lot.)

[1] http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf
[2] http://tinyurl.com/6fazcz
[3] http://portal.acm.org/citation.cfm?id=SERIES11430.63445

The John Hughes paper link is dead. Here's a new one: Why Functional Programming Matters (PDF) - Ergwun
1
[+17] [2008-12-11 03:22:00] namin

Here are the papers that come to mind:

[1] http://www.cs.brown.edu/~sk/Publications/Papers/Published/sk-automata-macros/
[2] http://research.microsoft.com/~simonpj/papers/stm/#beautiful
[3] http://www.cs.tufts.edu/~nr/pubs/pmonad-abstract.html
[4] http://github.com/namin/spots/tree/master/probabilisticModeling
[5] http://scheme2006.cs.uchicago.edu/12-byrd.pdf
[6] http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10663
[7] http://www.cs.nott.ac.uk/~gmh/bib.html#fold
[8] http://www.cs.nott.ac.uk/~gmh/bib.html#pearl
[9] http://research.microsoft.com/~emeijer/Papers/fpca91.pdf

2
[+14] [2010-05-19 19:18:20] Pratik Deoghare
(1) That's not an academic paper. - Paul Nathan
(2) The question asks for published papers. - Norman Ramsey
3
[+13] [2008-12-10 23:56:58] ShreevatsaR

I have in mind a somewhat forgotten paper: Structured Programming with go to Statements [1], Donald Knuth, ACM Computing Surveys, Vol 6, No. 4, Dec. 1974.

It's actually a rebuttal to the "GOTO considered harmful" creed (saying that "GOTO is sometimes okay, really"), a view that is now out of fashion (to say the least!), but it also says some important things about the practice of programming. Most famously, it is the source [2] of the quote "Premature optimization is the root of all evil". :-)
[It's also instructive to read the quote in its original context, to see what Knuth did not mean.]

Quite related is Knuth's Computer Programming as an Art, Knuth’s Turing Award lecture (1974), printed in Communications of the ACM, Volume 17, Issue 12, Dec. 1974, which is available e.g. here [3].

[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf
[2] http://shreevatsa.wordpress.com/2008/05/16/premature-optimization-is-the-root-of-all-evil/
[3] http://www.paulgraham.com/knuth.html

I just found out that the excellent Mark Jason Dominus (mjd, blog.plover.com) also considers this paper (Structured programming with go to Statements) "my single all-time favorite computer science paper": blog.plover.com/prog/Hoare-logic.html#fn3 - ShreevatsaR
4
[+10] [2008-12-11 00:04:03] Jules

Oh there are so many fantastic papers. Purely Functional Data Structures [1] is great. I'm still reading it, but so far it has been an excellent read. The implementations of the data structures are short and beautiful.

[1] http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

Chris turned his thesis into a book. It's a great book. - Norman Ramsey
5
[+9] [2009-12-13 17:24:48] Graphics Noob

I can't believe nobody has included the (possibly) most referenced paper on this site:
Go To Statement Considered Harmful [1] - Edsger W. Dijkstra

Some of my favorites:
What Every Computer Scientist Should Know About Floating-Point Arithmetic [2] - David Goldberg
A Personal Computer For Children of All Ages [3] - Alan Kay (written in 1972, also a Wired interview with Kay [4])
Skip Lists: A Probabilistic Alternative to Balanced Trees [5] - William Pugh

[1] http://www.u.arizona.edu/~rubinson/copyright%5Fviolations/Go%5FTo%5FConsidered%5FHarmful.html
[2] http://docs.sun.com/source/806-3568/ncg%5Fgoldberg.html
[3] http://www.mprove.de/diplom/gui/Kay72a.pdf
[4] http://www.wired.com/gadgetlab/2008/11/museum-celebrat/
[5] ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf

(1) +1 for the skip lists paper. I was so pleased when I ran across that one a few years ago. - Jason S
6
[+8] [2008-12-11 00:43:15] Rob Wells

G'day,

While not purely academic Edsger Dijkstra's paper " The Humble Progammer [1]", his Turing Lecture from 1972, is an excellent paper to read. My favourite quote is

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.

HTH

cheers,

Rob

[1] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD03xx/EWD340.html

(1) +1 Dijkstra's archive @ utexas is worth combing through slowly - just somebody
7
[+8] [2008-12-11 01:49:56] Darron

Big Ball of Mud [1]

Abstract:

While much attention has been focused on high-level software architectural patterns, what is, in effect, the de-facto standard software architecture is seldom discussed. This paper examines this most frequently deployed of software architectures: the BIG BALL OF MUD. A BIG BALL OF MUD is a casually, even haphazardly, structured system. Its organization, if one can call it that, is dictated more by expediency than design. Yet, its enduring popularity cannot merely be indicative of a general disregard for architecture.

These patterns explore the forces that encourage the emergence of a BIG BALL OF MUD, and the undeniable effectiveness of this approach to software architecture. What are the people who build them doing right? If more high-minded architectural approaches are to compete, we must understand what the forces that lead to a BIG BALL OF MUD are, and examine alternative ways to resolve them.

A number of additional patterns emerge out of the BIG BALL OF MUD. We discuss them in turn. Two principal questions underlie these patterns: Why are so many existing systems architecturally undistinguished, and what can we do to improve them?

Why do I like it? It's an irreverent but useful look at the software development world as it actually exists.

[1] http://www.laputan.org/mud/

8
[+7] [2008-12-11 04:01:52] tgamblin

Gotta throw in Leslie Lamport's Time, Clocks, and the Ordering of Events in a Distributed System [1]. Not so much for the clock sync part, but for the distributed event ordering. First paper I know of to talk about what causality really means in a distributed system. None of today's big webapps would really work without that :-).

[1] http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf

9
[+7] [2008-12-11 07:58:16] JayRan

Don't mean to get a little too classic on you, but I've gotta say Turing's On Computable Numbers, with an Application to the Entscheidungsproblem [1] has got to be, at least I would consider, one of the most significant (even if only historically) papers in CS. And Charles Petzold's The Annotated Turing [2] does an excellent job of illustrating the significance, as well as making the mathematics understandable, even to a mildly-mathematical programmer such as myself. In fact, I'm pretty sure Jeff mentioned it in one of the Stackoverflow podcasts.

[1] http://www.thocp.net/biographies/papers/turing_oncomputablenumbers_1936.pdf
[2] http://www.theannotatedturing.com/

10
[+7] [2008-12-11 09:32:48] Sean

I like Google's MapReduce [1] paper. If you've ever wondered how they process all that data then this is the place to go. It touches on functional programming and distributed processing, which are two of the big topics these days.

[1] http://labs.google.com/papers/mapreduce.html

+1, easily digestible paper - just somebody
11
[+7] [2010-05-19 19:10:09] Matt Ball

Ken Thompsons's Reflections on Trusting Trust [1] gets me every time I read it.

[1] http://cm.bell-labs.com/who/ken/trust.html

12
[+7] [2010-05-19 19:33:28] Tom Bushell

Fred Brook's No Silver Bullet — Essence and Accidents of Software Engineering [1] is probably my personal favorite.

Edit: Changed the link to point to the actual paper

[1] http://www.virtualschool.edu/mon/SoftwareEngineering/BrooksNoSilverBullet.html

When ever I am being interviewed for a new job, one of the questions that I ask my prospective manager is which texts have most influenced his or her attitudes towards managing software projects. If Mythical Man-Month and No Silver Bullet aren't mentioned, I am very likely to pass on the position. - Adam Crossland
13
[+5] [2010-05-19 19:01:46] Adam Crossland

C.A.R. Hoare's 1980 Turing Award Lecture: The Emperor's Old Clothes [1]

[1] http://www.cs.yale.edu/homes/arvind/cs422/doc/hoare.pdf

(1) I had never seen this before, just read it... absolutely brilliant. It's good to see someone so smart/famous stating pretty much every principle of programming I've learned (the hard way) over the years. - rmeador
@rmeador: I was awed when I first read this paper, and whenever I revisit it, I am impressed by Tony Hoare and his at-times almost painful honesty. Anyone who contemplates managing a software project needs to read it, and sleep on it, and read it again. - Adam Crossland
I found it very interesting that - after a major project failure - he adopted what we would today call an Agile approach - way back in 1965! - Tom Bushell
14
[+4] [2010-05-19 19:09:54] jwismar

The one that I find I refer people to most often when trying to encourage sound OO fundamentals is Robert C. Martin's "The Open-Closed Principle" [1].

[1] http://www.objectmentor.com/resources/articles/ocp.pdf

15
[+4] [2010-05-19 19:40:20] Josh Stodola

Things You Should Never Do, Part I [1] by Joel Spolsky

[1] http://www.joelonsoftware.com/articles/fog0000000069.html

16
[+3] [2008-12-11 00:32:01] luqui

By far: Functional Reactive Animation [1] by Conal Elliott and Paul Hudak. It was the paper that got me interested in functional reactive programming, but more importantly it is a fantastic exposition of the design methodology called Semantic Design [2], which has had a profound impact on the way I think about software.

Pickler Combinators [3] is another one of my favorites, for its minimal elegance in solving a practical problem.

[1] http://www.haskell.org/yale/papers/icfp97/index.html
[2] http://lukepalmer.wordpress.com/2008/07/18/semantic-design/
[3] http://research.microsoft.com/~akenn/fun/picklercombinators.pdf

To be clear, functional reactive animation is since outdated with much research in the FRP community, but it is still my favorite paper on the subject. - luqui
The FRP paper was voted the most influential paper from ICFP 1997, I believe. - Norman Ramsey
17
[+3] [2008-12-11 04:02:34] Sven

of course, the answer would be any computer generated scientific paper [1]!

but joking aside, I'm (like many other probably) really intrigued by the whole P != NP [2] problem.

The first time I heard about (this was in my first bachelor year at VUB) it had a huge influence on me. I still can remember how the problem got my attention for several weeks... I kept reading and looking up information about, even foolishly tried to come up with some clever algorithms on my own. Good times it were back then!

here is an example [3] of such a paper, but there are many, many others just like it [4]

[1] http://pdos.csail.mit.edu/scigen/
[2] http://en.wikipedia.org/wiki/P_%3D_NP_problem
[3] http://publications.ias.edu/files/2007/11/w06.pdf
[4] http://www.google.be/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aunofficial&hs=SqH&q=np+paper+complexity&btnG=Search&meta=

18
[+3] [2008-12-11 09:25:26] qyb2zm302

Trace Trees [PDF] [1] by Andreas Gal, Michael Bebenita, Mason Chang, Michael Franz

Used by Mozilla Tracemonkey javascript interpreter, and contributed to the "javascript interpreter wars."

see also:

[1] http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-07-12.pdf
[2] http://ejohn.org/blog/tracemonkey/
[3] http://www.reddit.com/r/programming/comments/6xkuw/tracemonkey_justintime_tracing_for_javascript/
[4] http://weblogs.mozillazine.org/roadmap/archives/2008/08/tracemonkey_javascript_lightsp.html
[5] http://www.reddit.com/r/programming/comments/6xkg4/tracemonkey_javascript_lightspeed_firefoxs/
[6] http://serv2.ist.psu.edu:8080/viewdoc/summary?doi=10.1.1.85.2412

19
[+3] [2010-05-19 19:02:07] Paul

A case against the GO TO statement, by E.W. Dijkstra

http://userweb.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD215.html

Any of Dijkstra's papers really, they influenced so much of modern computer science.


Whats funny is that at the machine level, GOTO is essential; however, I agree that in higher level languages it shouldn't be used. - Nate
(1) @Nate IMHO never using goto is a bad idea. Breaking out of nested loops without goto leads to unreadable code. Understanding why goto shouldn't be used in the majority of cases is the best idea. - Yacoby
@Yacoby: it would be interesting to take a survey to see how many programmers would claim to never use GOTO and who also think nothing of using BREAK to get out of loops. - Adam Crossland
What I meant is that in high level languages there are better ways than using a GOTO If you find your self using GOTO alot, you may want to restructure your code. I'm not against using break; or continue; but lots of VB6 code is chuck full of GOTOs when simply changing the structure of the code would have made it more clear. - Nate
(1) I think the argument is really structured-goto vs unstructured-goto. break is an example of the former, the goto keyword is in most languages the latter. - rmeador
20
[+3] [2010-05-19 19:10:37] Gabe Moothart

MapReduce: Simplified Data Processing on Large Clusters [1]

Google's paper on a technique for massively distributed computation. This kind of thing will become a lot more important as we enter the multicore era.

[1] http://labs.google.com/papers/mapreduce.html

+1. Studied it in database course as personnal project. Great article :-). - p4bl0
21
[+3] [2010-05-19 19:33:25] Covar

My link was introduced to me from Coding Horror. Having been a TA of an introductory Computer Science course I found it fascinating.

The Camel Has Two Humps [1]

[1] http://www.eis.mdx.ac.uk/research/PhDArea/saeed/paper1.pdf

22
[+3] [2010-05-19 20:32:02] OscarRyz

How To Become A Hacker [1] by Eric S.Raymond

[1] http://catb.org/~esr/faqs/hacker-howto.html

23
[+3] [2010-05-19 20:42:10] NealB

A rational design process: How and why to fake it [1] by DL Parnas (IEEE Transactions on Software Engineering. Volume 12 , Issue 2, February 1986). Old but still relevant today.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.8358&rep=rep1&type=pdf

(1) We had an engineering manager print out copies of this and hand it out. Very good read. - Tony Arkles
24
[+2] [2008-12-11 04:28:23] mipadi

I really enjoyed A History of Erlang [1] by Joe Armstrong, Erlang's creator. It's a really interesting inside look at the creation of a new (and exciting) programming language.

[1] http://www.cs.chalmers.se/Cs/Grundutb/Kurser/ppxt/HT2007/general/languages/armstrong-erlang_history.pdf

25
[+2] [2009-12-04 00:42:00] RMatthews

It would be hard to not include "No Silver Bullet" by Fred Brooks.


26
[+2] [2010-05-19 22:39:05] mVChr

Six ways to write more comprehensible code [1]
aka How to keep your code from destroying you
by Jeff Vogel

Summary: As a developer, time is your most valuable resource. These six tips on how to write maintainable code are guaranteed to save you time and frustration: one minute spent writing comments can save you an hour of anguish.

Tips:

  1. Comment like a smart person.
  2. Define constants a lot. No, a LOT.
  3. Don't use variable names that will mock you.
  4. Do error checking. You make errors. Yes, you.
  5. "Premature optimization is the root of all evil." - Donald Knuth
  6. Don't be too clever by half.

Mostly for intermediate or those moving into professional environments, but a good read regardless.

[1] http://www.ibm.com/developerworks/linux/library/l-clear-code/

27
[+2] [2010-05-20 04:35:28] Jay

Structured Programming with go to Statements [1] by Donald Knuth.

God save the goto.

[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

28
[+2] [2010-05-21 00:17:15] Artefacto

The Real Reason Why Software Engineers Need Math [1] [pdf]

Explains why mathematics is important in the education of a software engineer.

[1] http://www.thelavinagency.com/articles_covers/Devlin/devlinarticle1.pdf

29
[+1] [2008-12-11 01:46:33] thursdaysgeek

It might not seem programming related, but it really is something that programmers should be aware of: http://www.apa.org/journals/features/psp7761121.pdf ("Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments")

Oftentimes, we only see our own knowledge, and the corresponding ignorance of someone else in our own field. We mock the "PhD Professor" who can't remember how to get their email. What we fail to see is our own ignorance in their and other fields. Once we approach other people, our customers, knowing that they have knowledge that we don't, and we have knowledge that they don't, and that our goal is to help them do their jobs better, we can communicate without arrogance and condescension.


Interesting read. - Paul Nathan
A very good paper to put (anonymously) in someones inbox... Pure evil. - Per Mildner
(1) Sure, but they probably won't understand why it was in their box! - thursdaysgeek
30
[+1] [2008-12-11 03:58:06] Scott Wisniewski

The page rank citation ranking system [1]

It describes how Google works.

Google uses it for web documents, but it's useful for pretty much any directed graph structure. For non directed graph's it's subject to manipulation, but if you "trust" the nodes in the graph, then it will work for non directed graphs too.

One interesting application of it to programing would be to index libraries in source code as a ranking system in a code / api documentation search.

It would be really cool, for example, to build an index of something like source forge (or code plex), using "page rank" between types and methods as a relevance metric.

The easiest language to do it for would be Java (because Java namespaces use reverse domain names by convention).

I wrote a prototype of something like this when I worked at Microsoft. It indexed VB source code, api documentation, and compiled .net assemblies. It also did stemming of identifier names, recogonzing camel casing, pascal casing, and underscores, and split things up into multiple groups.

It got pretty good results.

I was pretty new though (I had only been there a couple weeks), I worked on a different team (the VB team), so I wasn't very successful in convincing the MSDN help folks to adopt it. They ended up just using Windows Live Search.

In any case....

Another use for it would be for Stack Overflow.

You could, for example, treat stack overflow as a large graph, with edges from people to documents, and from documents to people. If you then weighted the edges based on reputation, you could compute page rank over the graph, and then use the page rank to sort search results.

This would produce different results than what Google does (because the edges in normal page rank are not weighted), but I think for the purposes of searching Stack Overflow questions it would yield better results than Google.

I'm willing to bet that most links into stack overflow from the "outside" are to the main page, not to individual questions.

This means from Google's point of view, most stack overflow questions and answers are pretty much equivalent.

But, if you calculated page rank, using "reputation" to weight edges, you would get relevance results that reflected the values of the Stack Overflow community.

In any case... they paper is really good. You should read it.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.9606&rep=rep1&type=pdf

Most google searches don't go to the main page. I'm consistantly finding SO questions popping up in search results - Elizabeth Buckwalter
I didn't say most Google Searches go to the main page. I was saying that the Google Results may not be as good as they could be, because Google doesn't take reputation into account. They results may very well be "good enough". I was just expressing a way that I thought they could be "better". - Scott Wisniewski
31
[+1] [2010-05-19 19:01:11] Francisco Garcia

The IEEE Software's 25th-Anniversary is a great selection:

http://www.computer.org/portal/web/computingnow/software/toppicks

But one of my all time favorites is The Humble Programmer [1] by Dijkstra

[1] http://userweb.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD340.html

32
[+1] [2010-05-19 19:03:37] Gabe Moothart

Why Functional Programming Matters [1], by John Hughes. It's a dense read but really goes into the details of what advantages FP has.

[1] http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html

There's also a great article by Reginald Braithwaite about why that paper is still relevant after all those years. It's called: Why Why Functional Programming Matters Matters. - Jörg W Mittag
33
[+1] [2010-05-19 19:25:55] Elizabeth Buckwalter

It's a slide show, which I don't usually go for, but I found it to be a great reference for database design: Join-fu: The Art of SQL Tuning for MySQL [1]

[1] http://www.slideshare.net/ZendCon/joinfu-the-art-of-sql-tuning-for-mysql-presentation

Great resource, not a paper. but certainly a nice presentation : ). Thank you Elizabeth ("even bears practice Joinfu" hahhaah)+ 1 - SDReyes
34
[+1] [2010-05-19 20:58:26] Jens Granlund

Big Ball of Mud [1]
I changed my mind, Big Ball of Mud is good, but this is probably my favorite:
Portrait of a N00b [2]

[1] http://www.laputan.org/mud/
[2] http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.html

35
[+1] [2010-05-21 01:45:59] Tim Schaeffer

Knuth: On the Translation of Languages from Left to Right

Dijkstra: Goto considered harmful

Towards a Mathematical Science of Computation http://www-formal.stanford.edu/jmc/towards.html

Recursive Functions of Symbolic Expressions and their Computation by Machine (Part I) http://www-formal.stanford.edu/jmc/recursive.html


36
[+1] [2010-05-21 13:12:01] Peter Boothe

My two "forgotten gems" are:

Engineering a sort() function - Bentley & McIlroy - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdf [1]

Why programming is a good medium for expressing poorly understood and sloppily-formulated ideas - Minsky - http://danreetz.com/ongoing_oversight_and_observation/minsky_1967.pdf

Especially the second paper.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdf

37
[+1] [2010-05-21 17:21:06] Anon

Claude Shannon - "A Mathematical Theory of Communication"


Remember some stuff about Shannon from school.. but I never read the paper; maybe I should. - Nils
38
[+1] [2010-05-21 22:55:21] Hugh S. Myers

Selected Papers in Digital Typography by D. Knuth. Actually I remember a very early issue of DDJ with an article by DK that related why he was unhappy with the state of the art of textbook typography. I believe that TeX followed not long after.


39
[+1] [2011-12-11 02:39:57] mcint

A book on Warehouse Scale Computing and an over view of considerations to account for in designing, building, running and maintaining datacenters with specific examples of choices Google has made for their datacenters.

http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006 (97 page book, published as free resource)


40
[0] [2010-05-21 11:29:26] knoopx

I particularly enjoy with whatever related to computer vision. Face detection, OCR, perspective projection... There are a lot of papers explaining strategies and algorithms and I practically find them all damn interesting.


41