## Probabilistic Machine Learning

### An Introduction

by Kevin P. Murphy

ISBN: 9780262369312 | Copyright 2021

A detailed and up-to-date introduction to machine learning, presented through the unifying lens of probabilistic modeling and Bayesian decision theory.

This book offers a detailed and up-to-date introduction to machine learning (including deep learning) through the unifying lens of probabilistic modeling and Bayesian decision theory. The book covers mathematical background (including linear algebra and optimization), basic supervised learning (including linear and logistic regression and deep neural networks), as well as more advanced topics (including transfer learning and unsupervised learning). End-of-chapter exercises allow students to apply what they have learned, and an appendix covers notation.

Probabilistic Machine Learning grew out of the author's 2012 book, Machine Learning: A Probabilistic Perspective. More than just a simple update, this is a completely new book that reflects the dramatic developments in the field since 2012, most notably deep learning. In addition, the new book is accompanied by online Python code, using libraries such as scikit-learn, JAX, PyTorch, and TensorFlow, which can be used to reproduce nearly all the figures. This code can be run inside a web browser using cloud-based notebooks, and provides a practical complement to the theoretical topics discussed in the book. This introductory text will be followed by a sequel that covers more advanced topics, taking the same probabilistic approach.

### Table of Contents

- Brief Contents
- Contents
- Preface
- 1 Introduction
  - 1.1 What is machine learning?
  - 1.2 Supervised learning
  - 1.3 Unsupervised learning
  - 1.4 Reinforcement learning
  - 1.5 Data
  - 1.6 Discussion
- Part I Foundations
  - 2 Probability: Univariate Models
    - 2.1 Introduction
    - 2.2 Random variables
    - 2.3 Bayes’ rule
    - 2.4 Bernoulli and binomial distributions
    - 2.5 Categorical and multinomial distributions
    - 2.6 Univariate Gaussian (normal) distribution
    - 2.7 Some other common univariate distributions *
    - 2.8 Transformations of random variables *
    - 2.9 Exercises
  - 3 Probability: Multivariate Models
    - 3.1 Joint distributions for multiple random variables
    - 3.2 The multivariate Gaussian (normal) distribution
    - 3.3 Linear Gaussian systems *
    - 3.4 The exponential family *
    - 3.5 Mixture models
    - 3.6 Probabilistic graphical models *
    - 3.7 Exercises
  - 4 Statistics
    - 4.1 Introduction
    - 4.2 Maximum likelihood estimation (MLE)
    - 4.3 Empirical risk minimization (ERM)
    - 4.4 Other estimation methods *
    - 4.5 Regularization
    - 4.6 Bayesian statistics *
    - 4.7 Frequentist statistics *
    - 4.8 Exercises
  - 5 Decision Theory
    - 5.1 Bayesian decision theory
    - 5.2 Bayesian hypothesis testing
    - 5.3 Frequentist decision theory
    - 5.4 Empirical risk minimization
    - 5.5 Frequentist hypothesis testing *
    - 5.6 Exercises
  - 6 Information Theory
    - 6.1 Entropy
    - 6.2 Relative entropy (KL divergence) *
    - 6.3 Mutual information *
    - 6.4 Exercises
  - 7 Linear Algebra
    - 7.1 Introduction
    - 7.2 Matrix multiplication
    - 7.3 Matrix inversion
    - 7.4 Eigenvalue decomposition (EVD)
    - 7.5 Singular value decomposition (SVD)
    - 7.6 Other matrix decompositions *
    - 7.7 Solving systems of linear equations *
    - 7.8 Matrix calculus
    - 7.9 Exercises
  - 8 Optimization
    - 8.1 Introduction
    - 8.2 First-order methods
    - 8.3 Second-order methods
    - 8.4 Stochastic gradient descent
    - 8.5 Constrained optimization
    - 8.6 Proximal gradient method *
    - 8.7 Bound optimization *
    - 8.8 Blackbox and derivative free optimization
    - 8.9 Exercises
- Part II Linear Models
  - 9 Linear Discriminant Analysis
    - 9.1 Introduction
    - 9.2 Gaussian discriminant analysis
    - 9.3 Naive Bayes classifiers
    - 9.4 Generative vs discriminative classifiers
    - 9.5 Exercises
  - 10 Logistic Regression
    - 10.1 Introduction
    - 10.2 Binary logistic regression
    - 10.3 Multinomial logistic regression
    - 10.4 Robust logistic regression *
    - 10.5 Bayesian logistic regression *
    - 10.6 Exercises
  - 11 Linear Regression
    - 11.1 Introduction
    - 11.2 Least squares linear regression
    - 11.3 Ridge regression
    - 11.4 Lasso regression
    - 11.5 Regression splines *
    - 11.6 Robust linear regression *
    - 11.7 Bayesian linear regression *
    - 11.8 Exercises
  - 12 Generalized Linear Models *
    - 12.1 Introduction
    - 12.2 Examples
    - 12.3 GLMs with non-canonical link functions
    - 12.4 Maximum likelihood estimation
    - 12.5 Worked example: predicting insurance claims
- Part III Deep Neural Networks
  - 13 Neural Networks for Structured Data
    - 13.1 Introduction
    - 13.2 Multilayer perceptrons (MLPs)
    - 13.3 Backpropagation
    - 13.4 Training neural networks
    - 13.5 Regularization
    - 13.6 Other kinds of feedforward networks *
    - 13.7 Exercises
  - 14 Neural Networks for Images
    - 14.1 Introduction
    - 14.2 Common layers
    - 14.3 Common architectures for image classification
    - 14.4 Other forms of convolution *
    - 14.5 Solving other discriminative vision tasks with CNNs *
    - 14.6 Generating images by inverting CNNs *
  - 15 Neural Networks for Sequences
    - 15.1 Introduction
    - 15.2 Recurrent neural networks (RNNs)
    - 15.3 1d CNNs
    - 15.4 Attention
    - 15.5 Transformers
    - 15.6 Efficient transformers *
    - 15.7 Language models and unsupervised representation learning
- Part IV Nonparametric Models
  - 16 Exemplar-based Methods
    - 16.1 K nearest neighbor (KNN) classification
    - 16.2 Learning distance metrics
    - 16.3 Kernel density estimation (KDE)
  - 17 Kernel Methods *
    - 17.1 Mercer kernels
    - 17.2 Gaussian processes
    - 17.3 Support vector machines (SVMs)
    - 17.4 Sparse vector machines
    - 17.5 Exercises
  - 18 Trees, Forests, Bagging, and Boosting
    - 18.1 Classification and regression trees (CART)
    - 18.2 Ensemble learning
    - 18.3 Bagging
    - 18.4 Random forests
    - 18.5 Boosting
    - 18.6 Interpreting tree ensembles
- Part V Beyond Supervised Learning
  - 19 Learning with Fewer Labeled Examples
    - 19.1 Data augmentation
    - 19.2 Transfer learning
    - 19.3 Semi-supervised learning
    - 19.4 Active learning
    - 19.5 Meta-learning
    - 19.6 Few-shot learning
    - 19.7 Weakly supervised learning
    - 19.8 Exercises
  - 20 Dimensionality Reduction
    - 20.1 Principal components analysis (PCA)
    - 20.2 Factor analysis *
    - 20.3 Autoencoders
    - 20.4 Manifold learning *
    - 20.5 Word embeddings
    - 20.6 Exercises
  - 21 Clustering
    - 21.1 Introduction
    - 21.2 Hierarchical agglomerative clustering
    - 21.3 K means clustering
    - 21.4 Clustering using mixture models
    - 21.5 Spectral clustering *
    - 21.6 Biclustering *
  - 22 Recommender Systems
    - 22.1 Explicit feedback
    - 22.2 Implicit feedback
    - 22.3 Leveraging side information
  - 23 Graph Embeddings *
    - 23.1 Introduction
    - 23.2 Graph Embedding as an Encoder/Decoder Problem
    - 23.3 Shallow graph embeddings
    - 23.4 Graph Neural Networks
    - 23.5 Deep graph embeddings
    - 23.6 Applications
- A Notation
  - A.1 Introduction
  - A.2 Common mathematical symbols
  - A.3 Functions
  - A.4 Linear algebra
  - A.5 Optimization
  - A.6 Probability
  - A.7 Information theory
  - A.8 Statistics and machine learning
  - A.9 Abbreviations
- Index
- Bibliography

#### Kevin P. Murphy

Kevin P. Murphy is a Research Scientist at Google in Mountain View, California, where he works on AI, machine learning, computer vision, and natural language understanding.
