Probabilistic Machine Learning: Advanced Topics
by Kevin P. Murphy
ISBN: 9780262375993 · Copyright 2023
An advanced book for researchers and graduate students working in machine learning and statistics who want to learn about deep learning, Bayesian inference, generative models, and decision making under uncertainty.
An advanced counterpart to Probabilistic Machine Learning: An Introduction, this high-level textbook provides researchers and graduate students with detailed coverage of cutting-edge topics in machine learning, including deep generative modeling, graphical models, Bayesian inference, reinforcement learning, and causality. This volume puts deep learning into a larger statistical context and unifies approaches based on deep learning with ones based on probabilistic modeling and inference. With contributions from top scientists and domain experts from places such as Google, DeepMind, Amazon, Purdue University, NYU, and the University of Washington, this rigorous book is essential to understanding the vital issues in machine learning.
• Covers generation of high-dimensional outputs, such as images, text, and graphs
• Discusses methods for discovering insights about data, based on latent variable models
• Considers training and testing under different distributions
• Explores how to use probabilistic models and inference for causal inference and decision making
• Features online Python code accompaniment

Brief Contents (pg. vii)  
Contents (pg. ix)  
Preface (pg. xxix)  
1 Introduction (pg. 1)  
I Fundamentals (pg. 3)  
2 Probability (pg. 5)  
2.1 Introduction (pg. 5)  
2.2 Some common probability distributions (pg. 8)  
2.3 Gaussian joint distributions (pg. 22)  
2.4 The exponential family (pg. 33)  
2.5 Transformations of random variables (pg. 44)  
2.6 Markov chains (pg. 46)  
2.7 Divergence measures between probability distributions (pg. 55)  
3 Statistics (pg. 63)  
3.1 Introduction (pg. 63)  
3.2 Bayesian statistics (pg. 63)  
3.3 Frequentist statistics (pg. 72)  
3.4 Conjugate priors (pg. 83)  
3.5 Noninformative priors (pg. 102)  
3.6 Hierarchical priors (pg. 107)  
3.7 Empirical Bayes (pg. 114)  
3.8 Model selection (pg. 118)  
3.9 Model checking (pg. 127)  
3.10 Hypothesis testing (pg. 131)  
3.11 Missing data (pg. 141)  
4 Graphical models (pg. 143)  
4.1 Introduction (pg. 143)  
4.2 Directed graphical models (Bayes nets) (pg. 143)  
4.3 Undirected graphical models (Markov random fields) (pg. 164)  
4.4 Conditional random fields (CRFs) (pg. 185)  
4.5 Comparing directed and undirected PGMs (pg. 193)  
4.6 PGM extensions (pg. 201)  
4.7 Structural causal models (pg. 211)  
5 Information theory (pg. 217)  
5.1 KL divergence (pg. 217)  
5.2 Entropy (pg. 232)  
5.3 Mutual information (pg. 236)  
5.4 Data compression (source coding) (pg. 245)  
5.5 Error-correcting codes (channel coding) (pg. 249)  
5.6 The information bottleneck (pg. 250)  
6 Optimization (pg. 255)  
6.1 Introduction (pg. 255)  
6.2 Automatic differentiation (pg. 255)  
6.3 Stochastic optimization (pg. 265)  
6.4 Natural gradient descent (pg. 273)  
6.5 Bound optimization (MM) algorithms (pg. 281)  
6.6 Bayesian optimization (pg. 291)  
6.7 Derivative-free optimization (pg. 298)  
6.8 Optimal transport (pg. 307)  
6.9 Submodular optimization (pg. 316)  
II Inference (pg. 337)  
7 Inference algorithms: an overview (pg. 339)  
7.1 Introduction (pg. 339)  
7.2 Common inference patterns (pg. 340)  
7.3 Exact inference algorithms (pg. 342)  
7.4 Approximate inference algorithms (pg. 342)  
7.5 Evaluating approximate inference algorithms (pg. 350)  
8 Gaussian filtering and smoothing (pg. 353)  
8.1 Introduction (pg. 353)  
8.2 Inference for linear-Gaussian SSMs (pg. 357)  
8.3 Inference based on local linearization (pg. 369)  
8.4 Inference based on the unscented transform (pg. 373)  
8.5 Other variants of the Kalman filter (pg. 376)  
8.6 Assumed density filtering (pg. 383)  
8.7 Other inference methods for SSMs (pg. 390)  
9 Message passing algorithms (pg. 395)  
9.1 Introduction (pg. 395)  
9.2 Belief propagation on chains (pg. 395)  
9.3 Belief propagation on trees (pg. 406)  
9.4 Loopy belief propagation (pg. 411)  
9.5 The variable elimination (VE) algorithm (pg. 422)  
9.6 The junction tree algorithm (JTA) (pg. 428)  
9.7 Inference as optimization (pg. 429)  
10 Variational inference (pg. 433)  
10.1 Introduction (pg. 433)  
10.2 Gradient-based VI (pg. 439)  
10.3 Coordinate ascent VI (pg. 449)  
10.4 More accurate variational posteriors (pg. 465)  
10.5 Tighter bounds (pg. 467)  
10.6 Wake-sleep algorithm (pg. 469)  
10.7 Expectation propagation (EP) (pg. 472)  
11 Monte Carlo methods (pg. 477)  
11.1 Introduction (pg. 477)  
11.2 Monte Carlo integration (pg. 477)  
11.3 Generating random samples from simple distributions (pg. 480)  
11.4 Rejection sampling (pg. 481)  
11.5 Importance sampling (pg. 484)  
11.6 Controlling Monte Carlo variance (pg. 488)  
12 Markov chain Monte Carlo (pg. 493)  
12.1 Introduction (pg. 493)  
12.2 Metropolis-Hastings algorithm (pg. 494)  
12.3 Gibbs sampling (pg. 499)  
12.4 Auxiliary variable MCMC (pg. 507)  
12.5 Hamiltonian Monte Carlo (HMC) (pg. 510)  
12.6 MCMC convergence (pg. 518)  
12.7 Stochastic gradient MCMC (pg. 526)  
12.8 Reversible jump (trans-dimensional) MCMC (pg. 530)  
12.9 Annealing methods (pg. 533)  
13 Sequential Monte Carlo (pg. 537)  
13.1 Introduction (pg. 537)  
13.2 Particle filtering (pg. 539)  
13.3 Proposal distributions (pg. 547)  
13.4 Rao-Blackwellized particle filtering (RBPF) (pg. 551)  
13.5 Extensions of the particle filter (pg. 557)  
13.6 SMC samplers (pg. 557)  
III Prediction (pg. 567)  
14 Predictive models: an overview (pg. 569)  
14.1 Introduction (pg. 569)  
14.2 Evaluating predictive models (pg. 572)  
14.3 Conformal prediction (pg. 579)  
15 Generalized linear models (pg. 583)  
15.1 Introduction (pg. 583)  
15.2 Linear regression (pg. 588)  
15.3 Logistic regression (pg. 602)  
15.4 Probit regression (pg. 613)  
15.5 Multilevel (hierarchical) GLMs (pg. 617)  
16 Deep neural networks (pg. 623)  
16.1 Introduction (pg. 623)  
16.2 Building blocks of differentiable circuits (pg. 623)  
16.3 Canonical examples of neural networks (pg. 632)  
17 Bayesian neural networks (pg. 639)  
17.1 Introduction (pg. 639)  
17.2 Priors for BNNs (pg. 639)  
17.3 Posteriors for BNNs (pg. 643)  
17.4 Generalization in Bayesian deep learning (pg. 657)  
17.5 Online inference (pg. 663)  
17.6 Hierarchical Bayesian neural networks (pg. 669)  
18 Gaussian processes (pg. 673)  
18.1 Introduction (pg. 673)  
18.2 Mercer kernels (pg. 675)  
18.3 GPs with Gaussian likelihoods (pg. 685)  
18.4 GPs with non-Gaussian likelihoods (pg. 693)  
18.5 Scaling GP inference to large datasets (pg. 697)  
18.6 Learning the kernel (pg. 709)  
18.7 GPs and DNNs (pg. 720)  
18.8 Gaussian processes for time series forecasting (pg. 724)  
19 Beyond the iid assumption (pg. 727)  
19.1 Introduction (pg. 727)  
19.2 Distribution shift (pg. 727)  
19.3 Detecting distribution shifts (pg. 732)  
19.4 Robustness to distribution shifts (pg. 737)  
19.5 Adapting to distribution shifts (pg. 738)  
19.6 Learning from multiple distributions (pg. 743)  
19.7 Continual learning (pg. 750)  
19.8 Adversarial examples (pg. 756)  
IV Generation (pg. 763)  
20 Generative models: an overview (pg. 765)  
20.1 Introduction (pg. 765)  
20.2 Types of generative model (pg. 765)  
20.3 Goals of generative modeling (pg. 767)  
20.4 Evaluating generative models (pg. 774)  
21 Variational autoencoders (pg. 781)  
21.1 Introduction (pg. 781)  
21.2 VAE basics (pg. 781)  
21.3 VAE generalizations (pg. 786)  
21.4 Avoiding posterior collapse (pg. 796)  
21.5 VAEs with hierarchical structure (pg. 799)  
21.6 Vector quantization VAE (pg. 805)  
22 Autoregressive models (pg. 811)  
22.1 Introduction (pg. 811)  
22.2 Neural autoregressive density estimators (NADE) (pg. 812)  
22.3 Causal CNNs (pg. 812)  
22.4 Transformers (pg. 814)  
23 Normalizing flows (pg. 819)  
23.1 Introduction (pg. 819)  
23.2 Constructing flows (pg. 822)  
23.3 Applications (pg. 836)  
24 Energy-based models (pg. 839)  
24.1 Introduction (pg. 839)  
24.2 Maximum likelihood training (pg. 841)  
24.3 Score matching (SM) (pg. 846)  
24.4 Noise contrastive estimation (pg. 850)  
24.5 Other methods (pg. 852)  
25 Diffusion models (pg. 857)  
25.1 Introduction (pg. 857)  
25.2 Denoising diffusion probabilistic models (DDPMs) (pg. 857)  
25.3 Score-based generative models (SGMs) (pg. 864)  
25.4 Continuous time models using differential equations (pg. 867)  
25.5 Speeding up diffusion models (pg. 871)  
25.6 Conditional generation (pg. 875)  
25.7 Diffusion for discrete state spaces (pg. 877)  
26 Generative adversarial networks (pg. 883)  
26.1 Introduction (pg. 883)  
26.2 Learning by comparison (pg. 884)  
26.3 Generative adversarial networks (pg. 894)  
26.4 Conditional GANs (pg. 902)  
26.5 Inference with GANs (pg. 903)  
26.6 Neural architectures in GANs (pg. 904)  
26.7 Applications (pg. 908)  
V Discovery (pg. 915)  
27 Discovery methods: an overview (pg. 917)  
27.1 Introduction (pg. 917)  
27.2 Overview of Part V: Discovery (pg. 918)  
28 Latent factor models (pg. 919)  
28.1 Introduction (pg. 919)  
28.2 Mixture models (pg. 919)  
28.3 Factor analysis (pg. 929)  
28.4 LFMs with non-Gaussian priors (pg. 949)  
28.5 Topic models (pg. 953)  
28.6 Independent components analysis (ICA) (pg. 962)  
29 State-space models (pg. 969)  
29.1 Introduction (pg. 969)  
29.2 Hidden Markov models (HMMs) (pg. 970)  
29.3 HMMs: applications (pg. 974)  
29.4 HMMs: parameter learning (pg. 980)  
29.5 HMMs: generalizations (pg. 987)  
29.6 Linear dynamical systems (LDSs) (pg. 996)  
29.7 LDS: applications (pg. 997)  
29.8 LDS: parameter learning (pg. 1001)  
29.9 Switching linear dynamical systems (SLDSs) (pg. 1005)  
29.10 Nonlinear SSMs (pg. 1009)  
29.11 Non-Gaussian SSMs (pg. 1010)  
29.12 Structural time series models (pg. 1012)  
29.13 Deep SSMs (pg. 1025)  
30 Graph learning (pg. 1031)  
30.1 Introduction (pg. 1031)  
30.2 Latent variable models for graphs (pg. 1031)  
30.3 Graphical model structure learning (pg. 1031)  
31 Nonparametric Bayesian models (pg. 1035)  
31.1 Introduction (pg. 1035)  
32 Representation learning (pg. 1037)  
32.1 Introduction (pg. 1037)  
32.2 Evaluating and comparing learned representations (pg. 1037)  
32.3 Approaches for learning representations (pg. 1044)  
32.4 Theory of representation learning (pg. 1057)  
33 Interpretability (pg. 1061)  
33.1 Introduction (pg. 1061)  
33.2 Methods for interpretable machine learning (pg. 1066)  
33.3 Properties: the abstraction between context and method (pg. 1074)  
33.4 Evaluation of interpretable machine learning models (pg. 1077)  
33.5 Discussion: how to think about interpretable machine learning (pg. 1086)  
VI Action (pg. 1091)  
34 Decision making under uncertainty (pg. 1093)  
34.1 Statistical decision theory (pg. 1093)  
34.2 Decision (influence) diagrams (pg. 1099)  
34.3 A/B testing (pg. 1103)  
34.4 Contextual bandits (pg. 1107)  
34.5 Markov decision problems (pg. 1116)  
34.6 Planning in an MDP (pg. 1120)  
34.7 Active learning (pg. 1124)  
35 Reinforcement learning (pg. 1133)  
35.1 Introduction (pg. 1133)  
35.2 Value-based RL (pg. 1138)  
35.3 Policy-based RL (pg. 1144)  
35.4 Model-based RL (pg. 1151)  
35.5 Off-policy learning (pg. 1158)  
35.6 Control as inference (pg. 1165)  
36 Causality (pg. 1171)  
36.1 Introduction (pg. 1171)  
36.2 Causal formalism (pg. 1173)  
36.3 Randomized control trials (pg. 1180)  
36.4 Confounder adjustment (pg. 1181)  
36.5 Instrumental variable strategies (pg. 1195)  
36.6 Difference in differences (pg. 1202)  
36.7 Credibility checks (pg. 1206)  
36.8 The do-calculus (pg. 1215)  
36.9 Further reading (pg. 1218)  
Index (pg. 1218)  
Bibliography (pg. 1221) 