Probabilistic Machine Learning: Advanced Topics

by Kevin P. Murphy

ISBN: 978-0-262-04843-9 | Copyright 2023

An advanced book for researchers and graduate students working in machine learning and statistics who want to learn about deep learning, Bayesian inference, generative models, and decision making under uncertainty.

An advanced counterpart to Probabilistic Machine Learning: An Introduction, this high-level textbook provides researchers and graduate students with detailed coverage of cutting-edge topics in machine learning, including deep generative modeling, graphical models, Bayesian inference, reinforcement learning, and causality. This volume puts deep learning into a larger statistical context and unifies approaches based on deep learning with ones based on probabilistic modeling and inference. With contributions from top scientists and domain experts from places such as Google, DeepMind, Amazon, Purdue University, NYU, and the University of Washington, this rigorous book is essential to understanding the vital issues in machine learning.


• Covers generation of high-dimensional outputs, such as images, text, and graphs
• Discusses methods for discovering insights about data, based on latent variable models
• Considers training and testing under different distributions
• Explores how to use probabilistic models and inference for causal inference and decision making
• Features online Python code accompaniment

Brief Contents (pg. vii)
Contents (pg. ix)
Preface (pg. xxix)
1 Introduction (pg. 1)
I Fundamentals (pg. 3)
2 Probability (pg. 5)
2.1 Introduction (pg. 5)
2.2 Some common probability distributions (pg. 8)
2.3 Gaussian joint distributions (pg. 22)
2.4 The exponential family (pg. 33)
2.5 Transformations of random variables (pg. 44)
2.6 Markov chains (pg. 46)
2.7 Divergence measures between probability distributions (pg. 55)
3 Statistics (pg. 63)
3.1 Introduction (pg. 63)
3.2 Bayesian statistics (pg. 63)
3.3 Frequentist statistics (pg. 72)
3.4 Conjugate priors (pg. 83)
3.5 Noninformative priors (pg. 102)
3.6 Hierarchical priors (pg. 107)
3.7 Empirical Bayes (pg. 114)
3.8 Model selection (pg. 118)
3.9 Model checking (pg. 127)
3.10 Hypothesis testing (pg. 131)
3.11 Missing data (pg. 141)
4 Graphical models (pg. 143)
4.1 Introduction (pg. 143)
4.2 Directed graphical models (Bayes nets) (pg. 143)
4.3 Undirected graphical models (Markov random fields) (pg. 164)
4.4 Conditional random fields (CRFs) (pg. 185)
4.5 Comparing directed and undirected PGMs (pg. 193)
4.6 PGM extensions (pg. 201)
4.7 Structural causal models (pg. 211)
5 Information theory (pg. 217)
5.1 KL divergence (pg. 217)
5.2 Entropy (pg. 232)
5.3 Mutual information (pg. 236)
5.4 Data compression (source coding) (pg. 245)
5.5 Error-correcting codes (channel coding) (pg. 249)
5.6 The information bottleneck (pg. 250)
6 Optimization (pg. 255)
6.1 Introduction (pg. 255)
6.2 Automatic differentiation (pg. 255)
6.3 Stochastic optimization (pg. 265)
6.4 Natural gradient descent (pg. 273)
6.5 Bound optimization (MM) algorithms (pg. 281)
6.6 Bayesian optimization (pg. 291)
6.7 Derivative-free optimization (pg. 298)
6.8 Optimal transport (pg. 307)
6.9 Submodular optimization (pg. 316)
II Inference (pg. 337)
7 Inference algorithms: an overview (pg. 339)
7.1 Introduction (pg. 339)
7.2 Common inference patterns (pg. 340)
7.3 Exact inference algorithms (pg. 342)
7.4 Approximate inference algorithms (pg. 342)
7.5 Evaluating approximate inference algorithms (pg. 350)
8 Gaussian filtering and smoothing (pg. 353)
8.1 Introduction (pg. 353)
8.2 Inference for linear-Gaussian SSMs (pg. 357)
8.3 Inference based on local linearization (pg. 369)
8.4 Inference based on the unscented transform (pg. 373)
8.5 Other variants of the Kalman filter (pg. 376)
8.6 Assumed density filtering (pg. 383)
8.7 Other inference methods for SSMs (pg. 390)
9 Message passing algorithms (pg. 395)
9.1 Introduction (pg. 395)
9.2 Belief propagation on chains (pg. 395)
9.3 Belief propagation on trees (pg. 406)
9.4 Loopy belief propagation (pg. 411)
9.5 The variable elimination (VE) algorithm (pg. 422)
9.6 The junction tree algorithm (JTA) (pg. 428)
9.7 Inference as optimization (pg. 429)
10 Variational inference (pg. 433)
10.1 Introduction (pg. 433)
10.2 Gradient-based VI (pg. 439)
10.3 Coordinate ascent VI (pg. 449)
10.4 More accurate variational posteriors (pg. 465)
10.5 Tighter bounds (pg. 467)
10.6 Wake-sleep algorithm (pg. 469)
10.7 Expectation propagation (EP) (pg. 472)
11 Monte Carlo methods (pg. 477)
11.1 Introduction (pg. 477)
11.2 Monte Carlo integration (pg. 477)
11.3 Generating random samples from simple distributions (pg. 480)
11.4 Rejection sampling (pg. 481)
11.5 Importance sampling (pg. 484)
11.6 Controlling Monte Carlo variance (pg. 488)
12 Markov chain Monte Carlo (pg. 493)
12.1 Introduction (pg. 493)
12.2 Metropolis-Hastings algorithm (pg. 494)
12.3 Gibbs sampling (pg. 499)
12.4 Auxiliary variable MCMC (pg. 507)
12.5 Hamiltonian Monte Carlo (HMC) (pg. 510)
12.6 MCMC convergence (pg. 518)
12.7 Stochastic gradient MCMC (pg. 526)
12.8 Reversible jump (transdimensional) MCMC (pg. 530)
12.9 Annealing methods (pg. 533)
13 Sequential Monte Carlo (pg. 537)
13.1 Introduction (pg. 537)
13.2 Particle filtering (pg. 539)
13.3 Proposal distributions (pg. 547)
13.4 Rao-Blackwellized particle filtering (RBPF) (pg. 551)
13.5 Extensions of the particle filter (pg. 557)
13.6 SMC samplers (pg. 557)
III Prediction (pg. 567)
14 Predictive models: an overview (pg. 569)
14.1 Introduction (pg. 569)
14.2 Evaluating predictive models (pg. 572)
14.3 Conformal prediction (pg. 579)
15 Generalized linear models (pg. 583)
15.1 Introduction (pg. 583)
15.2 Linear regression (pg. 588)
15.3 Logistic regression (pg. 602)
15.4 Probit regression (pg. 613)
15.5 Multilevel (hierarchical) GLMs (pg. 617)
16 Deep neural networks (pg. 623)
16.1 Introduction (pg. 623)
16.2 Building blocks of differentiable circuits (pg. 623)
16.3 Canonical examples of neural networks (pg. 632)
17 Bayesian neural networks (pg. 639)
17.1 Introduction (pg. 639)
17.2 Priors for BNNs (pg. 639)
17.3 Posteriors for BNNs (pg. 643)
17.4 Generalization in Bayesian deep learning (pg. 657)
17.5 Online inference (pg. 663)
17.6 Hierarchical Bayesian neural networks (pg. 669)
18 Gaussian processes (pg. 673)
18.1 Introduction (pg. 673)
18.2 Mercer kernels (pg. 675)
18.3 GPs with Gaussian likelihoods (pg. 685)
18.4 GPs with non-Gaussian likelihoods (pg. 693)
18.5 Scaling GP inference to large datasets (pg. 697)
18.6 Learning the kernel (pg. 709)
18.7 GPs and DNNs (pg. 720)
18.8 Gaussian processes for time series forecasting (pg. 724)
19 Beyond the iid assumption (pg. 727)
19.1 Introduction (pg. 727)
19.2 Distribution shift (pg. 727)
19.3 Detecting distribution shifts (pg. 732)
19.4 Robustness to distribution shifts (pg. 737)
19.5 Adapting to distribution shifts (pg. 738)
19.6 Learning from multiple distributions (pg. 743)
19.7 Continual learning (pg. 750)
19.8 Adversarial examples (pg. 756)
IV Generation (pg. 763)
20 Generative models: an overview (pg. 765)
20.1 Introduction (pg. 765)
20.2 Types of generative model (pg. 765)
20.3 Goals of generative modeling (pg. 767)
20.4 Evaluating generative models (pg. 774)
21 Variational autoencoders (pg. 781)
21.1 Introduction (pg. 781)
21.2 VAE basics (pg. 781)
21.3 VAE generalizations (pg. 786)
21.4 Avoiding posterior collapse (pg. 796)
21.5 VAEs with hierarchical structure (pg. 799)
21.6 Vector quantization VAE (pg. 805)
22 Autoregressive models (pg. 811)
22.1 Introduction (pg. 811)
22.2 Neural autoregressive density estimators (NADE) (pg. 812)
22.3 Causal CNNs (pg. 812)
22.4 Transformers (pg. 814)
23 Normalizing flows (pg. 819)
23.1 Introduction (pg. 819)
23.2 Constructing flows (pg. 822)
23.3 Applications (pg. 836)
24 Energy-based models (pg. 839)
24.1 Introduction (pg. 839)
24.2 Maximum likelihood training (pg. 841)
24.3 Score matching (SM) (pg. 846)
24.4 Noise contrastive estimation (pg. 850)
24.5 Other methods (pg. 852)
25 Diffusion models (pg. 857)
25.1 Introduction (pg. 857)
25.2 Denoising diffusion probabilistic models (DDPMs) (pg. 857)
25.3 Score-based generative models (SGMs) (pg. 864)
25.4 Continuous time models using differential equations (pg. 867)
25.5 Speeding up diffusion models (pg. 871)
25.6 Conditional generation (pg. 875)
25.7 Diffusion for discrete state spaces (pg. 877)
26 Generative adversarial networks (pg. 883)
26.1 Introduction (pg. 883)
26.2 Learning by comparison (pg. 884)
26.3 Generative adversarial networks (pg. 894)
26.4 Conditional GANs (pg. 902)
26.5 Inference with GANs (pg. 903)
26.6 Neural architectures in GANs (pg. 904)
26.7 Applications (pg. 908)
V Discovery (pg. 915)
27 Discovery methods: an overview (pg. 917)
27.1 Introduction (pg. 917)
27.2 Overview of Part V (pg. 918)
28 Latent factor models (pg. 919)
28.1 Introduction (pg. 919)
28.2 Mixture models (pg. 919)
28.3 Factor analysis (pg. 929)
28.4 LFMs with non-Gaussian priors (pg. 949)
28.5 Topic models (pg. 953)
28.6 Independent components analysis (ICA) (pg. 962)
29 State-space models (pg. 969)
29.1 Introduction (pg. 969)
29.2 Hidden Markov models (HMMs) (pg. 970)
29.3 HMMs: applications (pg. 974)
29.4 HMMs: parameter learning (pg. 980)
29.5 HMMs: generalizations (pg. 987)
29.6 Linear dynamical systems (LDSs) (pg. 996)
29.7 LDS: applications (pg. 997)
29.8 LDS: parameter learning (pg. 1001)
29.9 Switching linear dynamical systems (SLDSs) (pg. 1005)
29.10 Nonlinear SSMs (pg. 1009)
29.11 Non-Gaussian SSMs (pg. 1010)
29.12 Structural time series models (pg. 1012)
29.13 Deep SSMs (pg. 1025)
30 Graph learning (pg. 1031)
30.1 Introduction (pg. 1031)
30.2 Latent variable models for graphs (pg. 1031)
30.3 Graphical model structure learning (pg. 1031)
31 Nonparametric Bayesian models (pg. 1035)
31.1 Introduction (pg. 1035)
32 Representation learning (pg. 1037)
32.1 Introduction (pg. 1037)
32.2 Evaluating and comparing learned representations (pg. 1037)
32.3 Approaches for learning representations (pg. 1044)
32.4 Theory of representation learning (pg. 1057)
33 Interpretability (pg. 1061)
33.1 Introduction (pg. 1061)
33.2 Methods for interpretable machine learning (pg. 1066)
33.3 Properties: the abstraction between context and method (pg. 1074)
33.4 Evaluation of interpretable machine learning models (pg. 1077)
33.5 Discussion: how to think about interpretable machine learning (pg. 1086)
VI Action (pg. 1091)
34 Decision making under uncertainty (pg. 1093)
34.1 Statistical decision theory (pg. 1093)
34.2 Decision (influence) diagrams (pg. 1099)
34.3 A/B testing (pg. 1103)
34.4 Contextual bandits (pg. 1107)
34.5 Markov decision problems (pg. 1116)
34.6 Planning in an MDP (pg. 1120)
34.7 Active learning (pg. 1124)
35 Reinforcement learning (pg. 1133)
35.1 Introduction (pg. 1133)
35.2 Value-based RL (pg. 1138)
35.3 Policy-based RL (pg. 1144)
35.4 Model-based RL (pg. 1151)
35.5 Off-policy learning (pg. 1158)
35.6 Control as inference (pg. 1165)
36 Causality (pg. 1171)
36.1 Introduction (pg. 1171)
36.2 Causal formalism (pg. 1173)
36.3 Randomized control trials (pg. 1180)
36.4 Confounder adjustment (pg. 1181)
36.5 Instrumental variable strategies (pg. 1195)
36.6 Difference in differences (pg. 1202)
36.7 Credibility checks (pg. 1206)
36.8 The do-calculus (pg. 1215)
36.9 Further reading (pg. 1218)
Index (pg. 1218)
Bibliography (pg. 1221)

Kevin P. Murphy

Kevin P. Murphy is a Research Scientist at Google in Mountain View, California, where he works on artificial intelligence, machine learning, and Bayesian modeling.
