Machine Learning
A Probabilistic Perspective
by Kevin P. Murphy
ISBN: 9780262305242
Copyright 2012
Today’s Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.
The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudocode for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package, PMTK (probabilistic modeling toolkit), that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
An astonishing machine learning book: intuitive, full of examples, fun to read but still comprehensive, strong and deep! A great starting point for any university student, and a must-have for anybody in the field.
Jan Peters, Darmstadt University of Technology; Max Planck Institute for Intelligent Systems
Kevin Murphy excels at unraveling the complexities of machine learning methods while motivating the reader with a stream of illustrated examples and real-world case studies. The accompanying software package includes source code for many of the figures, making it both easy and very tempting to dive in and explore these methods for yourself. A must-buy for anyone interested in machine learning or curious about how to extract useful knowledge from big data.
John Winn, Microsoft Research, Cambridge
This is a wonderful book that starts with basic topics in statistical modeling, culminating in the most advanced topics. It provides both the theoretical foundations of probabilistic machine learning as well as practical tools, in the form of Matlab code. The book should be on the shelf of any student interested in the topic, and any practitioner working in the field.
Yoram Singer, Google Inc.
This book will be an essential reference for practitioners of modern machine learning. It covers the basic concepts needed to understand the field as a whole, and the powerful modern methods that build on those concepts. In Machine Learning, the language of probability and statistics reveals important connections between seemingly disparate algorithms and strategies. Thus, its readers will become articulate in a holistic view of the state of the art and poised to build the next generation of machine learning algorithms.
David Blei, Princeton University
Awards
Winner, 2013 DeGroot Prize awarded by the International Society for Bayesian Analysis
Contents (pg. vii)  
Preface (pg. xxvii)  
Introduction (pg. 1)  
1.1 Machine learning: what and why? (pg. 1)  
1.2 Supervised learning (pg. 3)  
1.3 Unsupervised learning (pg. 9)  
1.4 Some basic concepts in machine learning (pg. 16)  
Exercises (pg. 25)  
Probability (pg. 27)  
2.1 Introduction (pg. 27)  
2.2 A brief review of probability theory (pg. 27)  
2.3 Some common discrete distributions (pg. 34)  
2.4 Some common continuous distributions (pg. 38)  
2.5 Joint probability distributions (pg. 44)  
2.6 Transformations of random variables (pg. 49)  
2.7 Monte Carlo approximation (pg. 53)  
2.8 Information theory (pg. 56)  
Exercises (pg. 61)  
Generative models for discrete data (pg. 67)  
3.1 Introduction (pg. 67)  
3.2 Bayesian concept learning (pg. 67)  
3.3 The beta-binomial model (pg. 74)  
3.4 The Dirichlet-multinomial model (pg. 80)  
3.5 Naive Bayes classifiers (pg. 84)  
Exercises (pg. 91)  
Gaussian models (pg. 99)  
4.1 Introduction (pg. 99)  
4.2 Gaussian discriminant analysis (pg. 103)  
4.3 Inference in jointly Gaussian distributions (pg. 112)  
4.4 Linear Gaussian systems (pg. 121)  
4.5 Digression: The Wishart distribution (pg. 128)  
4.6 Inferring the parameters of an MVN (pg. 129)  
Exercises (pg. 142)  
Bayesian statistics (pg. 151)  
5.1 Introduction (pg. 151)  
5.2 Summarizing posterior distributions (pg. 151)  
5.3 Bayesian model selection (pg. 157)  
5.4 Priors (pg. 167)  
5.5 Hierarchical Bayes (pg. 173)  
5.6 Empirical Bayes (pg. 174)  
5.7 Bayesian decision theory (pg. 178)  
Exercises (pg. 188)  
Frequentist statistics (pg. 193)  
6.1 Introduction (pg. 193)  
6.2 Sampling distribution of an estimator (pg. 193)  
6.3 Frequentist decision theory (pg. 197)  
6.4 Desirable properties of estimators (pg. 202)  
6.5 Empirical risk minimization (pg. 207)  
6.6 Pathologies of frequentist statistics (pg. 214)  
Exercises (pg. 218)  
Linear regression (pg. 219)  
7.1 Introduction (pg. 219)  
7.2 Model specification (pg. 219)  
7.3 Maximum likelihood estimation (least squares) (pg. 219)  
7.4 Robust linear regression (pg. 225)  
7.5 Ridge regression (pg. 227)  
7.6 Bayesian linear regression (pg. 233)  
Exercises (pg. 241)  
Logistic regression (pg. 247)  
8.1 Introduction (pg. 247)  
8.2 Model specification (pg. 247)  
8.3 Model fitting (pg. 248)  
8.4 Bayesian logistic regression (pg. 257)  
8.5 Online learning and stochastic optimization (pg. 264)  
8.6 Generative vs discriminative classifiers (pg. 270)  
Exercises (pg. 279)  
Generalized linear models and the exponential family (pg. 283)  
9.1 Introduction (pg. 283)  
9.2 The exponential family (pg. 283)  
9.3 Generalized linear models (GLMs) (pg. 292)  
9.4 Probit regression (pg. 295)  
9.5 Multitask learning (pg. 298)  
9.6 Generalized linear mixed models (pg. 300)  
9.7 Learning to rank (pg. 302)  
Exercises (pg. 307)  
Directed graphical models (Bayes nets) (pg. 309)  
10.1 Introduction (pg. 309)  
10.2 Examples (pg. 313)  
10.3 Inference (pg. 321)  
10.4 Learning (pg. 322)  
10.5 Conditional independence properties of DGMs (pg. 326)  
10.6 Influence (decision) diagrams (pg. 330)  
Exercises (pg. 334)  
Mixture models and the EM algorithm (pg. 339)  
11.1 Latent variable models (pg. 339)  
11.2 Mixture models (pg. 339)  
11.3 Parameter estimation for mixture models (pg. 347)  
11.4 The EM algorithm (pg. 350)  
11.5 Model selection for latent variable models (pg. 372)  
11.6 Fitting models with missing data (pg. 374)  
Exercises (pg. 376)  
Latent linear models (pg. 383)  
12.1 Factor analysis (pg. 383)  
12.2 Principal components analysis (PCA) (pg. 389)  
12.3 Choosing the number of latent dimensions (pg. 400)  
12.4 PCA for categorical data (pg. 404)  
12.5 PCA for paired and multiview data (pg. 406)  
12.6 Independent Component Analysis (ICA) (pg. 409)  
Exercises (pg. 418)  
Sparse linear models (pg. 423)  
13.1 Introduction (pg. 423)  
13.2 Bayesian variable selection (pg. 424)  
13.3 ℓ1 regularization: basics (pg. 431)  
13.4 ℓ1 regularization: algorithms (pg. 443)  
13.5 ℓ1 regularization: extensions (pg. 451)  
13.6 Nonconvex regularizers (pg. 459)  
13.7 Automatic relevance determination (ARD)/sparse Bayesian learning (SBL) (pg. 465)  
13.8 Sparse coding (pg. 470)  
Exercises (pg. 476)  
Kernels (pg. 481)  
14.1 Introduction (pg. 481)  
14.2 Kernel functions (pg. 481)  
14.3 Using kernels inside GLMs (pg. 488)  
14.4 The kernel trick (pg. 490)  
14.5 Support vector machines (SVMs) (pg. 498)  
14.6 Comparison of discriminative kernel methods (pg. 507)  
14.7 Kernels for building generative models (pg. 509)  
Exercises (pg. 514)  
Gaussian processes (pg. 517)  
15.1 Introduction (pg. 517)  
15.2 GPs for regression (pg. 518)  
15.3 GPs meet GLMs (pg. 527)  
15.4 Connection with other methods (pg. 534)  
15.5 GP latent variable model (pg. 542)  
15.6 Approximation methods for large datasets (pg. 544)  
Exercises (pg. 544)  
Adaptive basis function models (pg. 545)  
16.1 Introduction (pg. 545)  
16.2 Classification and regression trees (CART) (pg. 546)  
16.3 Generalized additive models (pg. 554)  
16.4 Boosting (pg. 556)  
16.5 Feedforward neural networks (multilayer perceptrons) (pg. 565)  
16.6 Ensemble learning (pg. 582)  
16.7 Experimental comparison (pg. 584)  
16.8 Interpreting black-box models (pg. 587)  
Exercises (pg. 589)  
Markov and hidden Markov models (pg. 591)  
17.1 Introduction (pg. 591)  
17.2 Markov models (pg. 591)  
17.3 Hidden Markov models (pg. 606)  
17.4 Inference in HMMs (pg. 608)  
17.5 Learning for HMMs (pg. 619)  
17.6 Generalizations of HMMs (pg. 624)  
Exercises (pg. 632)  
State space models (pg. 633)  
18.1 Introduction (pg. 633)  
18.2 Applications of SSMs (pg. 634)  
18.3 Inference in LGSSM (pg. 642)  
18.4 Learning for LGSSM (pg. 648)  
18.5 Approximate online inference for nonlinear, non-Gaussian SSMs (pg. 649)  
18.6 Hybrid discrete/continuous SSMs (pg. 657)  
Exercises (pg. 662)  
Undirected graphical models (Markov random fields) (pg. 663)  
19.1 Introduction (pg. 663)  
19.2 Conditional independence properties of UGMs (pg. 663)  
19.3 Parameterization of MRFs (pg. 667)  
19.4 Examples of MRFs (pg. 670)  
19.5 Learning (pg. 678)  
19.6 Conditional random fields (CRFs) (pg. 686)  
19.7 Structural SVMs (pg. 696)  
Exercises (pg. 705)  
Exact inference for graphical models (pg. 709)  
20.1 Introduction (pg. 709)  
20.2 Belief propagation for trees (pg. 709)  
20.3 The variable elimination algorithm (pg. 716)  
20.4 The junction tree algorithm (pg. 722)  
20.5 Computational intractability of exact inference in the worst case (pg. 728)  
Exercises (pg. 730)  
Variational inference (pg. 733)  
21.1 Introduction (pg. 733)  
21.2 Variational inference (pg. 733)  
21.3 The mean field method (pg. 737)  
21.4 Structured mean field (pg. 741)  
21.5 Variational Bayes (pg. 744)  
21.6 Variational Bayes EM (pg. 751)  
21.7 Variational message passing and VIBES (pg. 758)  
21.8 Local variational bounds (pg. 758)  
Exercises (pg. 766)  
More variational inference (pg. 769)  
22.1 Introduction (pg. 769)  
22.2 Loopy belief propagation: algorithmic issues (pg. 769)  
22.3 Loopy belief propagation: theoretical issues (pg. 778)  
22.4 Extensions of belief propagation (pg. 785)  
22.5 Expectation propagation (pg. 789)  
22.6 MAP state estimation (pg. 801)  
Exercises (pg. 814)  
Monte Carlo inference (pg. 817)  
23.1 Introduction (pg. 817)  
23.2 Sampling from standard distributions (pg. 817)  
23.3 Rejection sampling (pg. 819)  
23.4 Importance sampling (pg. 822)  
23.5 Particle filtering (pg. 825)  
23.6 Rao-Blackwellised particle filtering (RBPF) (pg. 833)  
Exercises (pg. 837)  
Markov chain Monte Carlo (MCMC) inference (pg. 839)  
24.1 Introduction (pg. 839)  
24.2 Gibbs sampling (pg. 840)  
24.3 Metropolis Hastings algorithm (pg. 850)  
24.4 Speed and accuracy of MCMC (pg. 858)  
24.5 Auxiliary variable MCMC (pg. 865)  
24.6 Annealing methods (pg. 870)  
24.7 Approximating the marginal likelihood (pg. 874)  
Exercises (pg. 875)  
Clustering (pg. 877)  
25.1 Introduction (pg. 877)  
25.2 Dirichlet process mixture models (pg. 881)  
25.3 Affinity propagation (pg. 889)  
25.4 Spectral clustering (pg. 892)  
25.5 Hierarchical clustering (pg. 895)  
25.6 Clustering datapoints and features (pg. 903)  
Graphical model structure learning (pg. 909)  
26.1 Introduction (pg. 909)  
26.2 Structure learning for knowledge discovery (pg. 910)  
26.3 Learning tree structures (pg. 912)  
26.4 Learning DAG structures (pg. 916)  
26.5 Learning DAG structure with latent variables (pg. 924)  
26.6 Learning causal DAGs (pg. 933)  
26.7 Learning undirected Gaussian graphical models (pg. 940)  
26.8 Learning undirected discrete graphical models (pg. 944)  
Exercises (pg. 945)  
Latent variable models for discrete data (pg. 949)  
27.1 Introduction (pg. 949)  
27.2 Distributed state LVMs for discrete data (pg. 950)  
27.3 Latent Dirichlet allocation (LDA) (pg. 954)  
27.4 Extensions of LDA (pg. 965)  
27.5 LVMs for graph-structured data (pg. 974)  
27.6 LVMs for relational data (pg. 979)  
27.7 Restricted Boltzmann machines (RBMs) (pg. 987)  
Exercises (pg. 997)  
Deep learning (pg. 999)  
28.1 Introduction (pg. 999)  
28.2 Deep generative models (pg. 999)  
28.3 Deep neural networks (pg. 1003)  
28.4 Applications of deep networks (pg. 1005)  
28.5 Discussion (pg. 1010)  
Notation (pg. 1013)  
Bibliography (pg. 1019)  
Index to code (pg. 1051)  
Index to keywords (pg. 1054) 
Kevin P. Murphy
Kevin P. Murphy is a Research Scientist at Google. Previously, he was Associate Professor of Computer Science and Statistics at the University of British Columbia.