Stack Overflow
Overwhelmed by Machine Learning - is there an ML101 book?
[+187] [13] StackUnderflow
[2009-02-28 21:34:52]
[ machine-learning data-mining ]
[ http://stackoverflow.com/questions/598726/overwhelmed-by-machine-learning-is-there-an-ml101-book ] [DELETED]

It seems like there are so many subfields linked to Machine Learning. Is there a book or a blog that gives an overview of those different fields and what each of them does, maybe how to get started, and what background knowledge is required?

(30) +1 good question. I would be interested in this as well - Erik Ahlswede
(12) It's laughable how many good, useful questions are closed on SO. This question has 155 upvotes and 234 stars at the time of this writing, and the accepted answer has 153 upvotes. - weberc2
If you're not into math and are into programming, I suggest you look at this: karpathy.github.io/neuralnets - Karl Morrison
[+174] [2009-02-28 22:08:10] Jeff Moser [ACCEPTED]

Here's the best description I've ever heard of Machine Learning:

Machine learning is actually a software method. It's a way to generate software. So, it uses statistics but it's fundamentally... it's almost like a compiler. You use data to produce programs. - John Platt, Distinguished Scientist at Microsoft Research [1] in his Future of AI series talk (2:17:53) [2]
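
A minimal sketch of the "data in, program out" idea in that quote, assuming a plain least-squares line fit stands in for the learning step. The function names and the temperature data are made up for illustration and come from none of the linked talks:

```python
# Hypothetical illustration: "compile" example data into a predict function.
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(x):
        # The "generated program": nobody wrote this formula by hand.
        return slope * x + intercept

    return predict


# Made-up training data: Celsius readings and their Fahrenheit equivalents.
celsius = [0.0, 10.0, 20.0, 30.0, 40.0]
fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]

to_fahrenheit = fit_line(celsius, fahrenheit)
print(to_fahrenheit(100.0))  # close to 212
```

The conversion rule is never written down; it is recovered from the examples, which is the sense in which data "produces" the program.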

Some even argue that "everything that algorithms was to computer science 15 years ago, machine learning is today [3]."

For more details, I'd recommend starting out with a fun intro to what's possible such as Peter Norvig's Theorizing from Data [4] talk, a peek at what DeepMind is doing [5], or more recently the Future of AI series of talks [6] (that I quoted from above).

Next, get your hands dirty with Jeremy Howard's "Getting In Shape For The Sport of Data Science [7]." It's a great pragmatic overview of actually working with data.

Once you've played around a bit, watch Ben Hamner's "Machine Learning Gremlins [8]" for a nice pragmatic disclaimer of what can easily go wrong when doing machine learning.

I wrote a blog post "Computing Your Skill [9]" after spending months trying to understand TrueSkill [10], the ML system that does matchmaking and ranking on Xbox Live. The post goes into some foundational statistics needed for further study in machine learning.

Perhaps the best way to learn is to just try it. One approach is to try a Kaggle [11] competition that sounds interesting to you. Even though I don't do great on the leaderboards there, I always learn things when I try a competition.

After you've done the above, I'd recommend something more formal like Andrew Ng's online class [12]. It's at the college level, but approachable. If you've done all the above steps, you'll be more motivated not to give up when you hit some harder things.

As you continue, you'll learn about things such as R [13] and its many packages [14], SciPy [15], Cross Validation [16], Bayesian thinking [17], Deep [18] Learning [19], and much [20] much more [21].
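
Of the topics above, cross validation is the easiest to show in a few lines. What follows is only a sketch of k-fold cross validation in plain Python, assuming a squared-error score and a toy "predict the mean" model. The helper names `k_fold_indices`, `cross_validate`, and `fit_mean` are hypothetical and do not come from R, SciPy, or any linked resource:

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training part, then score on the unseen fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores


def fit_mean(train_xs, train_ys):
    """Toy baseline model: always predict the mean of the training targets."""
    mean_y = sum(train_ys) / len(train_ys)
    return lambda x: mean_y


# Made-up data: 20 points on a straight line.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, fit_mean, k=5)
print(len(scores), sum(scores) / len(scores))
```

Every point is held out exactly once, so the averaged score estimates how the model does on data it was not fitted to.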

DISCLAIMER: I work at Kaggle and several of the above links mention Kaggle, but I believe they're a fantastic place to start.

[1] http://research.microsoft.com/en-us/people/jplatt/
[2] http://new.livestream.com/gigaom/FutureofAI
[3] http://nlpers.blogspot.com/2014/10/machine-learning-is-new-algorithms.html?m=1
[4] http://www.facebook.com/video/video.php?v=644326502463
[5] https://www.youtube.com/watch?v=EfGD2qveGdQ
[6] http://new.livestream.com/gigaom/FutureofAI
[7] http://blog.kaggle.com/2011/03/23/getting-in-shape-for-the-sport-of-data-sciencetalk-by-jeremy-howard/
[8] https://www.youtube.com/watch?v=tleeC-KlsKA&feature=youtu.be
[9] http://www.moserware.com/2010/03/computing-your-skill.html
[10] http://research.microsoft.com/en-us/projects/trueskill/
[11] https://www.kaggle.com/
[12] http://www.ml-class.org/
[13] https://www.coursera.org/course/compdata
[14] http://cran.us.r-project.org/
[15] http://www.scipy.org/
[16] http://en.wikipedia.org/wiki/Cross-validation_(statistics)
[17] http://www.greenteapress.com/thinkbayes/
[18] http://neuralnetworksanddeeplearning.com/
[19] http://karpathy.github.io/neuralnets/
[20] https://www.youtube.com/playlist?list=PLZSO_6-bSqHQCIYxE3ycGLXHMjK3XV7Iz
[21] http://hunch.net/?p=2714

1
[+42] [2009-03-01 03:56:24] Imran

videolectures.net has a large collection of Machine Learning videos [1]. One very good technical introductory lecture on the site is Machine Learning, Probability and Graphical Models [2] by Sam Roweis.

A good overview of the field is Tom Mitchell's seminar The Discipline and Future of Machine Learning [3]. Here is a direct link to the video [mov] [4]. And the Syllabus [5] page has a good list of recommended texts:

Bishop, Christopher. Neural Networks for Pattern Recognition [6].
Duda, Richard, Peter Hart, and David Stork. Pattern Classification [7].
Hastie, T., R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning [8].
MacKay, David. Information Theory, Inference, and Learning Algorithms [9].
Mitchell, Tom. Machine Learning [10].

[1] http://videolectures.net/Top/Computer_Science/Machine_Learning/
[2] http://videolectures.net/mlss06tw_roweis_mlpgm/
[3] http://calendar.cs.cmu.edu/mlSeminar/3403.html
[4] http://www.ml.cmu.edu/seminars/Mitchell_lecture.3.07.mov
[5] http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-867Fall-2006/Syllabus/index.htm
[6] http://rads.stackoverflow.com/amzn/click/0198538642
[7] http://rads.stackoverflow.com/amzn/click/0471056693
[8] http://rads.stackoverflow.com/amzn/click/0387952845
[9] http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
[10] http://rads.stackoverflow.com/amzn/click/0070428077

(1) +1 for The Elements of Statistical Learning. That book is a great resource. Just looking at the pictures gives you an overview of how these techniques work, and then you can dive into the math when you feel up to it. - Zach
2
[+15] [2010-03-19 00:06:51] dmcer

Ethem Alpaydin's Introduction to Machine Learning [1] is a pretty accessible overview of the field.

If you're feeling overwhelmed by the other options you might want to start with it first.

[1] http://rads.stackoverflow.com/amzn/click/0262012111

This one is a really good book; I'd definitely recommend it. - kolistivra
3
[+12] [2009-02-28 21:43:36] Mr Fooz

Two of the best textbooks out there are:

Pattern Classification [1] by Duda, Hart, and Stork
Pattern Recognition and Machine Learning [2] by Christopher Bishop

Another good resource is MIT's Open CourseWare site for their Machine Learning class [3].

[1] http://rads.stackoverflow.com/amzn/click/0471056693
[2] http://rads.stackoverflow.com/amzn/click/0387310738
[3] http://dspace.mit.edu/handle/1721.1/46320

4
[+7] [2012-02-15 18:54:12] Tirrell Payton

I found "Programming Collective Intelligence [1]" to be the book that really helped me: it has practical examples and an "Algorithm Bestiary" at the end.

[1] http://rads.stackoverflow.com/amzn/click/0596529325

5
[+7] [2012-07-06 18:38:40] Volatil3

Dr. Yaser Abu-Mostafa's intro course is also detailed, and he explains the material quite well.

http://work.caltech.edu/telecourse.html


6
[+6] [2009-04-01 20:44:46] theycallmemorty

Artificial Intelligence: A Modern Approach [1] is the most common textbook for introductory AI courses.

Witten and Frank's book on Data Mining [2] is a little easier to digest if that topic is what appeals to you.

[1] http://rads.stackoverflow.com/amzn/click/0137903952
[2] http://rads.stackoverflow.com/amzn/click/0120884070

7
[+6] [2012-09-28 09:56:59] Matias Rasmussen

I really like the Machine Learning course on Coursera [1]. I find the short lectures very easy to digest.

[1] https://class.coursera.org/ml/lecture/preview/index

8
[+4] [2009-02-28 22:03:19] Pete

You are right to feel that there are lots of sub-fields to ML.

Machine learning in general is basically just the idea of algorithms that improve over time. If you're simply curious, some random topics that come to mind include:

Classification, Association analysis, Clustering, Decision Trees, Genetic Algorithms, Concept Learning
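
To make the first topic concrete, here is a rough sketch of classification using a 1-nearest-neighbor rule, picked only because it fits in a few lines. The points, labels, and function name are invented for illustration and are not from the book mentioned below:

```python
import math


def nearest_neighbor_classify(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best_label = None
    best_distance = math.inf
    for point, label in zip(train_points, train_labels):
        distance = math.dist(point, query)  # Euclidean distance, Python 3.8+
        if distance < best_distance:
            best_distance = distance
            best_label = label
    return best_label


# Made-up labeled examples: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
labels = ["small", "small", "large", "large"]

print(nearest_neighbor_classify(points, labels, (2.0, 1.0)))  # small
print(nearest_neighbor_classify(points, labels, (7.0, 9.0)))  # large
```

Clustering tackles the related problem in which the labels are not given and the groups have to be discovered from the points alone.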

As far as books go:

I'm currently using Introduction to Data Mining [1] for a course. It covers quite a few of the topics I've listed above and usually has examples of algorithms/uses in each section.

You don't need too much background knowledge to understand a lot of the topics. Most algorithms have some math underlying them which is used to improve the results, and you obviously need to be comfortable with general programming/data structures.

[1] http://rads.stackoverflow.com/amzn/click/0321321367

9
[+4] [2011-12-05 14:18:43] Genjuro

I'd recommend you take a look at ml-class.org.


10
[+3] [2011-09-12 17:29:38] vikram360

I've been using 'Machine Learning: An Algorithmic Perspective' by Stephen Marsland, and I think the approach is awesome. The author has put the Python code up on his site, so you can download it and take a peek at how things work.

http://www-ist.massey.ac.nz/smarsland/MLbook.html


11
[+3] [2012-02-12 14:51:03] lmsasu

Try A First Encounter with Machine Learning [1]. It's a freely available introduction at the undergraduate level.

[1] http://www.ics.uci.edu/~welling/teaching/273ASpring10/IntroMLBook.pdf

12
[+2] [2009-07-29 19:56:43] unj2

The Machine Learning subreddit [1] has interesting links for all levels.

[1] http://www.reddit.com/r/MachineLearning/

13