What are your favorite programming-related / cs academic published papers?
It could be a functional pearl, a programming-language paper like those often cited on lambda-the-ultimate.org [1], etc. Really, anything vaguely related to programming. Please also explain why you like the paper.
For this audience I think I have to go with John Hughes's seminal paper Why Functional Programming Matters [1] because it is densely packed with great ideas and because the topic of the paper is programming. I like the paper because any programmer can pick it up and read it and most will come away excited by the ideas. Also, there are lots of examples in the paper, so you can immediately pick up some kind of Haskell implementation and start trying the ideas for yourself right away.
For the academics in the crowd, I'll also give a shout-out to Tony Hoare's great paper Proof of Correctness of Data Representations [2]. It's intensely mathematical, but it is the canonical reference for a critically important and poorly understood technique: abstract data types. If we all understood about abstraction functions and representation invariants, the world would be a far better place. Heck, I'd just settle for getting people to document their representation invariants and leaving me to figure out the abstraction functions for myself. (Some of Tony's best papers have been collected into a book called Essays in Computing Science [3]. Tony is one of my heroes and this collection has some fabulous chapters including Tony's wonderful Turing lecture on 'The Emperor's New Clothes; a talk he gave to a lay audience on what programming is; a proof of a simple program, finding the kth largest element in an array; and many other nice things. Even if you're not mathy you can pick up this book and skip the mathy chapters and still enjoy it a lot.)
[1] http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdfHere are the papers that come to mind:
fold
, you'll be after reading this paper.I have in mind a somewhat forgotten paper: Structured Programming with go to Statements [1], Donald Knuth, ACM Computing Surveys, Vol 6, No. 4, Dec. 1974.
It's actually a rebuttal to the "GOTO considered harmful" creed (saying that "GOTO is sometimes okay, really"), a view that is now out of fashion (to say the least!), but it also says some important things about the practice of programming. Most famously, it is the
source
[2] of the quote "Premature optimization is the root of all evil". :-)
[It's also instructive to read the quote in its original context, to see what Knuth did not mean.]
Quite related is Knuth's Computer Programming as an Art, Knuth’s Turing Award lecture (1974), printed in Communications of the ACM, Volume 17, Issue 12, Dec. 1974, which is available e.g. here [3].
[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdfOh there are so many fantastic papers. Purely Functional Data Structures [1] is great. I'm still reading it, but so far it has been an excellent read. The implementations of the data structures are short and beautiful.
[1] http://www.cs.cmu.edu/~rwh/theses/okasaki.pdfI can't believe nobody has included the (possibly) most referenced paper on this site:
Go To Statement Considered Harmful
[1] - Edsger W. Dijkstra
Some of my favorites:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
[2] - David Goldberg
A Personal Computer For Children of All Ages
[3] - Alan Kay (written in 1972, also a
Wired interview with Kay
[4])
Skip Lists: A Probabilistic Alternative to Balanced Trees
[5] - William Pugh
G'day,
While not purely academic Edsger Dijkstra's paper " The Humble Progammer [1]", his Turing Lecture from 1972, is an excellent paper to read. My favourite quote is
The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.
HTH
cheers,
Rob
[1] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD03xx/EWD340.htmlBig Ball of Mud [1]
Abstract:
While much attention has been focused on high-level software architectural patterns, what is, in effect, the de-facto standard software architecture is seldom discussed. This paper examines this most frequently deployed of software architectures: the BIG BALL OF MUD. A BIG BALL OF MUD is a casually, even haphazardly, structured system. Its organization, if one can call it that, is dictated more by expediency than design. Yet, its enduring popularity cannot merely be indicative of a general disregard for architecture.
These patterns explore the forces that encourage the emergence of a BIG BALL OF MUD, and the undeniable effectiveness of this approach to software architecture. What are the people who build them doing right? If more high-minded architectural approaches are to compete, we must understand what the forces that lead to a BIG BALL OF MUD are, and examine alternative ways to resolve them.
A number of additional patterns emerge out of the BIG BALL OF MUD. We discuss them in turn. Two principal questions underlie these patterns: Why are so many existing systems architecturally undistinguished, and what can we do to improve them?
Why do I like it? It's an irreverent but useful look at the software development world as it actually exists.
[1] http://www.laputan.org/mud/Gotta throw in Leslie Lamport's Time, Clocks, and the Ordering of Events in a Distributed System [1]. Not so much for the clock sync part, but for the distributed event ordering. First paper I know of to talk about what causality really means in a distributed system. None of today's big webapps would really work without that :-).
[1] http://research.microsoft.com/users/lamport/pubs/time-clocks.pdfDon't mean to get a little too classic on you, but I've gotta say Turing's On Computable Numbers, with an Application to the Entscheidungsproblem [1] has got to be, at least I would consider, one of the most significant (even if only historically) papers in CS. And Charles Petzold's The Annotated Turing [2] does an excellent job of illustrating the significance, as well as making the mathematics understandable, even to a mildly-mathematical programmer such as myself. In fact, I'm pretty sure Jeff mentioned it in one of the Stackoverflow podcasts.
[1] http://www.thocp.net/biographies/papers/turing_oncomputablenumbers_1936.pdfI like Google's MapReduce [1] paper. If you've ever wondered how they process all that data then this is the place to go. It touches on functional programming and distributed processing, which are two of the big topics these days.
[1] http://labs.google.com/papers/mapreduce.htmlKen Thompsons's Reflections on Trusting Trust [1] gets me every time I read it.
[1] http://cm.bell-labs.com/who/ken/trust.htmlFred Brook's No Silver Bullet — Essence and Accidents of Software Engineering [1] is probably my personal favorite.
Edit: Changed the link to point to the actual paper
[1] http://www.virtualschool.edu/mon/SoftwareEngineering/BrooksNoSilverBullet.htmlC.A.R. Hoare's 1980 Turing Award Lecture: The Emperor's Old Clothes [1]
[1] http://www.cs.yale.edu/homes/arvind/cs422/doc/hoare.pdfThe one that I find I refer people to most often when trying to encourage sound OO fundamentals is Robert C. Martin's "The Open-Closed Principle" [1].
[1] http://www.objectmentor.com/resources/articles/ocp.pdfThings You Should Never Do, Part I [1] by Joel Spolsky
[1] http://www.joelonsoftware.com/articles/fog0000000069.htmlBy far: Functional Reactive Animation [1] by Conal Elliott and Paul Hudak. It was the paper that got me interested in functional reactive programming, but more importantly it is a fantastic exposition of the design methodology called Semantic Design [2], which has had a profound impact on the way I think about software.
Pickler Combinators [3] is another one of my favorites, for its minimal elegance in solving a practical problem.
[1] http://www.haskell.org/yale/papers/icfp97/index.htmlof course, the answer would be any computer generated scientific paper [1]!
but joking aside, I'm (like many other probably) really intrigued by the whole P != NP [2] problem.
The first time I heard about (this was in my first bachelor year at VUB) it had a huge influence on me. I still can remember how the problem got my attention for several weeks... I kept reading and looking up information about, even foolishly tried to come up with some clever algorithms on my own. Good times it were back then!
here is an example [3] of such a paper, but there are many, many others just like it [4]
[1] http://pdos.csail.mit.edu/scigen/Trace Trees [PDF] [1] by Andreas Gal, Michael Bebenita, Mason Chang, Michael Franz
Used by Mozilla Tracemonkey javascript interpreter, and contributed to the "javascript interpreter wars."
see also:
August 22nd, 2008: TraceMonkey Just-in-Time Tracing for JavaScript (6-22x faster than Firefox 3) [2] ( via reddit [3])
August 23, 2008: TraceMonkey: JavaScript Lightspeed. Firefox's Tracing JIT JavaScript engine has landed [4] (via reddit [5])
The CiteSeer entry [6] for the paper
A case against the GO TO statement, by E.W. Dijkstra
http://userweb.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD215.html
Any of Dijkstra's papers really, they influenced so much of modern computer science.
GOTO
If you find your self using GOTO
alot, you may want to restructure your code. I'm not against using break;
or continue;
but lots of VB6 code is chuck full of GOTO
s when simply changing the structure of the code would have made it more clear. - Nate
break
is an example of the former, the goto
keyword is in most languages the latter. - rmeador
MapReduce: Simplified Data Processing on Large Clusters [1]
Google's paper on a technique for massively distributed computation. This kind of thing will become a lot more important as we enter the multicore era.
[1] http://labs.google.com/papers/mapreduce.htmlMy link was introduced to me from Coding Horror. Having been a TA of an introductory Computer Science course I found it fascinating.
[1] http://www.eis.mdx.ac.uk/research/PhDArea/saeed/paper1.pdfA rational design process: How and why to fake it [1] by DL Parnas (IEEE Transactions on Software Engineering. Volume 12 , Issue 2, February 1986). Old but still relevant today.
[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.8358&rep=rep1&type=pdfI really enjoyed A History of Erlang [1] by Joe Armstrong, Erlang's creator. It's a really interesting inside look at the creation of a new (and exciting) programming language.
[1] http://www.cs.chalmers.se/Cs/Grundutb/Kurser/ppxt/HT2007/general/languages/armstrong-erlang_history.pdfIt would be hard to not include "No Silver Bullet" by Fred Brooks.
Six ways to write more comprehensible code
[1]
aka How to keep your code from destroying you
by Jeff Vogel
Summary: As a developer, time is your most valuable resource. These six tips on how to write maintainable code are guaranteed to save you time and frustration: one minute spent writing comments can save you an hour of anguish.
Tips:
Mostly for intermediate or those moving into professional environments, but a good read regardless.
[1] http://www.ibm.com/developerworks/linux/library/l-clear-code/Structured Programming with go to Statements [1] by Donald Knuth.
God save the goto.
[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdfThe Real Reason Why Software Engineers Need Math [1] [pdf]
Explains why mathematics is important in the education of a software engineer.
[1] http://www.thelavinagency.com/articles_covers/Devlin/devlinarticle1.pdfIt might not seem programming related, but it really is something that programmers should be aware of: http://www.apa.org/journals/features/psp7761121.pdf ("Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments")
Oftentimes, we only see our own knowledge, and the corresponding ignorance of someone else in our own field. We mock the "PhD Professor" who can't remember how to get their email. What we fail to see is our own ignorance in their and other fields. Once we approach other people, our customers, knowing that they have knowledge that we don't, and we have knowledge that they don't, and that our goal is to help them do their jobs better, we can communicate without arrogance and condescension.
The page rank citation ranking system [1]
It describes how Google works.
Google uses it for web documents, but it's useful for pretty much any directed graph structure. For non directed graph's it's subject to manipulation, but if you "trust" the nodes in the graph, then it will work for non directed graphs too.
One interesting application of it to programing would be to index libraries in source code as a ranking system in a code / api documentation search.
It would be really cool, for example, to build an index of something like source forge (or code plex), using "page rank" between types and methods as a relevance metric.
The easiest language to do it for would be Java (because Java namespaces use reverse domain names by convention).
I wrote a prototype of something like this when I worked at Microsoft. It indexed VB source code, api documentation, and compiled .net assemblies. It also did stemming of identifier names, recogonzing camel casing, pascal casing, and underscores, and split things up into multiple groups.
It got pretty good results.
I was pretty new though (I had only been there a couple weeks), I worked on a different team (the VB team), so I wasn't very successful in convincing the MSDN help folks to adopt it. They ended up just using Windows Live Search.
In any case....
Another use for it would be for Stack Overflow.
You could, for example, treat stack overflow as a large graph, with edges from people to documents, and from documents to people. If you then weighted the edges based on reputation, you could compute page rank over the graph, and then use the page rank to sort search results.
This would produce different results than what Google does (because the edges in normal page rank are not weighted), but I think for the purposes of searching Stack Overflow questions it would yield better results than Google.
I'm willing to bet that most links into stack overflow from the "outside" are to the main page, not to individual questions.
This means from Google's point of view, most stack overflow questions and answers are pretty much equivalent.
But, if you calculated page rank, using "reputation" to weight edges, you would get relevance results that reflected the values of the Stack Overflow community.
In any case... they paper is really good. You should read it.
[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.9606&rep=rep1&type=pdfThe IEEE Software's 25th-Anniversary is a great selection:
http://www.computer.org/portal/web/computingnow/software/toppicks
But one of my all time favorites is The Humble Programmer [1] by Dijkstra
[1] http://userweb.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD340.htmlWhy Functional Programming Matters [1], by John Hughes. It's a dense read but really goes into the details of what advantages FP has.
[1] http://www.cs.chalmers.se/~rjmh/Papers/whyfp.htmlIt's a slide show, which I don't usually go for, but I found it to be a great reference for database design: Join-fu: The Art of SQL Tuning for MySQL [1]
[1] http://www.slideshare.net/ZendCon/joinfu-the-art-of-sql-tuning-for-mysql-presentation
Big Ball of Mud
[1]
I changed my mind, Big Ball of Mud is good, but this is probably my favorite:
Portrait of a N00b
[2]
Knuth: On the Translation of Languages from Left to Right
Dijkstra: Goto considered harmful
Towards a Mathematical Science of Computation http://www-formal.stanford.edu/jmc/towards.html
Recursive Functions of Symbolic Expressions and their Computation by Machine (Part I) http://www-formal.stanford.edu/jmc/recursive.html
My two "forgotten gems" are:
Engineering a sort() function - Bentley & McIlroy - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdf [1]
Why programming is a good medium for expressing poorly understood and sloppily-formulated ideas - Minsky - http://danreetz.com/ongoing_oversight_and_observation/minsky_1967.pdf
Especially the second paper.
[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdfClaude Shannon - "A Mathematical Theory of Communication"
Selected Papers in Digital Typography by D. Knuth. Actually I remember a very early issue of DDJ with an article by DK that related why he was unhappy with the state of the art of textbook typography. I believe that TeX followed not long after.
A book on Warehouse Scale Computing and an over view of considerations to account for in designing, building, running and maintaining datacenters with specific examples of choices Google has made for their datacenters.
http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006 (97 page book, published as free resource)
I particularly enjoy with whatever related to computer vision. Face detection, OCR, perspective projection... There are a lot of papers explaining strategies and algorithms and I practically find them all damn interesting.