Understanding Deep Learning
by Simon J. D. Prince
ISBN: 9780262377096, Copyright 2023
An authoritative, accessible, and up-to-date treatment of deep learning that strikes a pragmatic middle ground between theory and practice.
Deep learning is a fast-moving field with sweeping relevance in today's increasingly digital world. Understanding Deep Learning provides an authoritative, accessible, and up-to-date treatment of the subject, covering all the key topics along with recent advances and cutting-edge concepts. Many deep learning texts are crowded with technical details that obscure fundamentals, but Simon Prince ruthlessly curates only the most important ideas to provide a high density of critical information in an intuitive and digestible form. From machine learning basics to advanced models, each concept is presented in lay terms and then detailed precisely in mathematical form and illustrated visually. The result is a lucid, self-contained textbook suitable for anyone with a basic background in applied mathematics.
• Up-to-date treatment of deep learning covers cutting-edge topics not found in existing texts, such as transformers and diffusion models
• Short, focused chapters progress in complexity, easing students into difficult concepts
• Pragmatic approach straddling theory and practice gives readers the level of detail required to implement naive versions of models
• Streamlined presentation separates critical ideas from background context and extraneous detail
• Minimal mathematical prerequisites, extensive illustrations, and practice problems make challenging material widely accessible
Programming exercises are offered in accompanying Python notebooks.

Cover (pg. C1)  
Contents (pg. vii)  
Preface (pg. xiii)  
Acknowledgments (pg. xv)  
1 Introduction (pg. 1)  
1.1 Supervised learning (pg. 1)  
1.2 Unsupervised learning (pg. 7)  
1.3 Reinforcement learning (pg. 11)  
1.4 Ethics (pg. 12)  
1.5 Structure of book (pg. 15)  
1.6 Other books (pg. 15)  
1.7 How to read this book (pg. 16)  
2 Supervised learning (pg. 17)  
2.1 Supervised learning overview (pg. 17)  
2.2 Linear regression example (pg. 18)  
2.3 Summary (pg. 22)  
Notes (pg. 23)  
Problems (pg. 24)  
3 Shallow neural networks (pg. 25)  
3.1 Neural network example (pg. 25)  
3.2 Universal approximation theorem (pg. 29)  
3.3 Multivariate inputs and outputs (pg. 30)  
3.4 Shallow neural networks: general case (pg. 33)  
3.5 Terminology (pg. 35)  
3.6 Summary (pg. 36)  
Notes (pg. 36)  
Problems (pg. 39)  
4 Deep neural networks (pg. 41)  
4.1 Composing neural networks (pg. 41)  
4.2 From composing networks to deep networks (pg. 43)  
4.3 Deep neural networks (pg. 45)  
4.4 Matrix notation (pg. 48)  
4.5 Shallow vs. deep neural networks (pg. 49)  
4.6 Summary (pg. 52)  
Notes (pg. 52)  
Problems (pg. 53)  
5 Loss functions (pg. 56)  
5.1 Maximum likelihood (pg. 56)  
5.2 Recipe for constructing loss functions (pg. 60)  
5.3 Example 1: univariate regression (pg. 61)  
5.4 Example 2: binary classification (pg. 64)  
5.5 Example 3: multiclass classification (pg. 67)  
5.6 Multiple outputs (pg. 69)  
5.7 Cross-entropy loss (pg. 71)  
5.8 Summary (pg. 72)  
Notes (pg. 73)  
Problems (pg. 74)  
6 Fitting models (pg. 77)  
6.1 Gradient descent (pg. 77)  
6.2 Stochastic gradient descent (pg. 83)  
6.3 Momentum (pg. 86)  
6.4 Adam (pg. 88)  
6.5 Training algorithm hyperparameters (pg. 91)  
6.6 Summary (pg. 91)  
Notes (pg. 91)  
Problems (pg. 94)  
7 Gradients and initialization (pg. 96)  
7.1 Problem definitions (pg. 96)  
7.2 Computing derivatives (pg. 97)  
7.3 Toy example (pg. 100)  
7.4 Backpropagation algorithm (pg. 103)  
7.5 Parameter initialization (pg. 107)  
7.6 Example training code (pg. 111)  
7.7 Summary (pg. 111)  
Notes (pg. 113)  
Problems (pg. 114)  
8 Measuring performance (pg. 118)  
8.1 Training a simple model (pg. 118)  
8.2 Sources of error (pg. 120)  
8.3 Reducing error (pg. 124)  
8.4 Double descent (pg. 127)  
8.5 Choosing hyperparameters (pg. 132)  
8.6 Summary (pg. 133)  
Notes (pg. 133)  
Problems (pg. 136)  
9 Regularization (pg. 138)  
9.1 Explicit regularization (pg. 138)  
9.2 Implicit regularization (pg. 141)  
9.3 Heuristics to improve performance (pg. 144)  
9.4 Summary (pg. 154)  
Notes (pg. 155)  
Problems (pg. 160)  
10 Convolutional networks (pg. 161)  
10.1 Invariance and equivariance (pg. 161)  
10.2 Convolutional networks for 1D inputs (pg. 163)  
10.3 Convolutional networks for 2D inputs (pg. 170)  
10.4 Downsampling and upsampling (pg. 171)  
10.5 Applications (pg. 174)  
10.6 Summary (pg. 179)  
Notes (pg. 180)  
Problems (pg. 184)  
11 Residual networks (pg. 186)  
11.1 Sequential processing (pg. 186)  
11.2 Residual connections and residual blocks (pg. 189)  
11.3 Exploding gradients in residual networks (pg. 192)  
11.4 Batch normalization (pg. 192)  
11.5 Common residual architectures (pg. 195)  
11.6 Why do nets with residual connections perform so well? (pg. 199)  
11.7 Summary (pg. 199)  
Notes (pg. 201)  
Problems (pg. 205)  
12 Transformers (pg. 207)  
12.1 Processing text data (pg. 207)  
12.2 Dot-product self-attention (pg. 208)  
12.3 Extensions to dot-product self-attention (pg. 213)  
12.4 Transformers (pg. 215)  
12.5 Transformers for natural language processing (pg. 216)  
12.6 Encoder model example: BERT (pg. 219)  
12.7 Decoder model example: GPT-3 (pg. 222)  
12.8 Encoder-decoder model example: machine translation (pg. 226)  
12.9 Transformers for long sequences (pg. 227)  
12.10 Transformers for images (pg. 228)  
12.11 Summary (pg. 232)  
Notes (pg. 232)  
Problems (pg. 239)  
13 Graph neural networks (pg. 240)  
13.1 What is a graph? (pg. 240)  
13.2 Graph representation (pg. 243)  
13.3 Graph neural networks, tasks, and loss functions (pg. 245)  
13.4 Graph convolutional networks (pg. 248)  
13.5 Example: graph classification (pg. 251)  
13.6 Inductive vs. transductive models (pg. 252)  
13.7 Example: node classification (pg. 253)  
13.8 Layers for graph convolutional networks (pg. 256)  
13.9 Edge graphs (pg. 260)  
13.10 Summary (pg. 261)  
Notes (pg. 261)  
Problems (pg. 266)  
14 Unsupervised learning (pg. 268)  
14.1 Taxonomy of unsupervised learning models (pg. 268)  
14.2 What makes a good generative model? (pg. 269)  
14.3 Quantifying performance (pg. 271)  
14.4 Summary (pg. 273)  
Notes (pg. 273)  
15 Generative Adversarial Networks (pg. 275)  
15.1 Discrimination as a signal (pg. 275)  
15.2 Improving stability (pg. 280)  
15.3 Progressive growing, minibatch discrimination, and truncation (pg. 286)  
15.4 Conditional generation (pg. 288)  
15.5 Image translation (pg. 290)  
15.6 StyleGAN (pg. 295)  
15.7 Summary (pg. 297)  
Notes (pg. 298)  
Problems (pg. 302)  
16 Normalizing flows (pg. 303)  
16.1 1D example (pg. 303)  
16.2 General case (pg. 306)  
16.3 Invertible network layers (pg. 308)  
16.4 Multiscale flows (pg. 316)  
16.5 Applications (pg. 317)  
16.6 Summary (pg. 320)  
Notes (pg. 321)  
Problems (pg. 324)  
17 Variational autoencoders (pg. 326)  
17.1 Latent variable models (pg. 326)  
17.2 Nonlinear latent variable model (pg. 327)  
17.3 Training (pg. 330)  
17.4 ELBO properties (pg. 333)  
17.5 Variational approximation (pg. 335)  
17.6 The variational autoencoder (pg. 335)  
17.7 The reparameterization trick (pg. 338)  
17.8 Applications (pg. 339)  
17.9 Summary (pg. 342)  
Notes (pg. 343)  
Problems (pg. 346)  
18 Diffusion models (pg. 348)  
18.1 Overview (pg. 348)  
18.2 Encoder (forward process) (pg. 349)  
18.3 Decoder model (reverse process) (pg. 355)  
18.4 Training (pg. 356)  
18.5 Reparameterization of loss function (pg. 360)  
18.6 Implementation (pg. 362)  
18.7 Summary (pg. 367)  
Notes (pg. 367)  
Problems (pg. 371)  
19 Reinforcement learning (pg. 373)  
19.1 Markov decision processes, returns, and policies (pg. 373)  
19.2 Expected return (pg. 377)  
19.3 Tabular reinforcement learning (pg. 381)  
19.4 Fitted Q-learning (pg. 385)  
19.5 Policy gradient methods (pg. 388)  
19.6 Actor-critic methods (pg. 393)  
19.7 Offline reinforcement learning (pg. 394)  
19.8 Summary (pg. 395)  
Notes (pg. 396)  
Problems (pg. 399)  
20 Why does deep learning work? (pg. 401)  
20.1 The case against deep learning (pg. 401)  
20.2 Factors that influence fitting performance (pg. 402)  
20.3 Properties of loss functions (pg. 406)  
20.4 Factors that determine generalization (pg. 410)  
20.5 Do we need so many parameters? (pg. 414)  
20.6 Do networks have to be deep? (pg. 417)  
20.7 Summary (pg. 418)  
Problems (pg. 419)  
21 Deep learning and ethics (pg. 420)  
21.1 Value alignment (pg. 420)  
21.2 Intentional misuse (pg. 426)  
21.3 Other social, ethical, and professional issues (pg. 428)  
21.4 Case study (pg. 430)  
21.5 The value-free ideal of science (pg. 431)  
21.6 Responsible AI research as a collective action problem (pg. 432)  
21.7 Ways forward (pg. 433)  
21.8 Summary (pg. 434)  
Problems (pg. 435)  
Appendix A: Notation (pg. 436)  
Appendix B: Mathematics (pg. 439)  
B.1 Functions (pg. 439)  
B.2 Binomial coefficients (pg. 441)  
B.3 Vectors, matrices, and tensors (pg. 442)  
B.4 Special types of matrix (pg. 445)  
B.5 Matrix calculus (pg. 447)  
Appendix C: Probability (pg. 448)  
C.1 Random variables and probability distributions (pg. 448)  
C.2 Expectation (pg. 452)  
C.3 Normal probability distribution (pg. 456)  
C.4 Sampling (pg. 459)  
C.5 Distances between probability distributions (pg. 459)  
Bibliography (pg. 462)  
Index (pg. 513) 