Understanding Deep Learning

by Simon J. D. Prince

ISBN: 9780262048644 | Copyright 2023

An authoritative, accessible, and up-to-date treatment of deep learning that strikes a pragmatic middle ground between theory and practice.

Deep learning is a fast-moving field with sweeping relevance in today's increasingly digital world. Understanding Deep Learning provides an authoritative, accessible, and up-to-date treatment of the subject, covering all the key topics along with recent advances and cutting-edge concepts. Many deep learning texts are crowded with technical details that obscure fundamentals, but Simon Prince ruthlessly curates only the most important ideas to provide a high density of critical information in an intuitive and digestible form. From machine learning basics to advanced models, each concept is presented in lay terms and then detailed precisely in mathematical form and illustrated visually. The result is a lucid, self-contained textbook suitable for anyone with a basic background in applied mathematics.

•Up-to-date treatment of deep learning covers cutting-edge topics not found in existing texts, such as transformers and diffusion models
•Short, focused chapters progress in complexity, easing students into difficult concepts
•Pragmatic approach straddling theory and practice gives readers the level of detail required to implement naive versions of models
•Streamlined presentation separates critical ideas from background context and extraneous detail
•Minimal mathematical prerequisites, extensive illustrations, and practice problems make challenging material widely accessible
•Programming exercises offered in accompanying Python notebooks

Cover (pg. C1)
Contents (pg. vii)
Preface (pg. xiii)
Acknowledgments (pg. xv)
1 Introduction (pg. 1)
1.1 Supervised learning (pg. 1)
1.2 Unsupervised learning (pg. 7)
1.3 Reinforcement learning (pg. 11)
1.4 Ethics (pg. 12)
1.5 Structure of book (pg. 15)
1.6 Other books (pg. 15)
1.7 How to read this book (pg. 16)
2 Supervised learning (pg. 17)
2.1 Supervised learning overview (pg. 17)
2.2 Linear regression example (pg. 18)
2.3 Summary (pg. 22)
Notes (pg. 23)
Problems (pg. 24)
3 Shallow neural networks (pg. 25)
3.1 Neural network example (pg. 25)
3.2 Universal approximation theorem (pg. 29)
3.3 Multivariate inputs and outputs (pg. 30)
3.4 Shallow neural networks: general case (pg. 33)
3.5 Terminology (pg. 35)
3.6 Summary (pg. 36)
Notes (pg. 36)
Problems (pg. 39)
4 Deep neural networks (pg. 41)
4.1 Composing neural networks (pg. 41)
4.2 From composing networks to deep networks (pg. 43)
4.3 Deep neural networks (pg. 45)
4.4 Matrix notation (pg. 48)
4.5 Shallow vs. deep neural networks (pg. 49)
4.6 Summary (pg. 52)
Notes (pg. 52)
Problems (pg. 53)
5 Loss functions (pg. 56)
5.1 Maximum likelihood (pg. 56)
5.2 Recipe for constructing loss functions (pg. 60)
5.3 Example 1: univariate regression (pg. 61)
5.4 Example 2: binary classification (pg. 64)
5.5 Example 3: multiclass classification (pg. 67)
5.6 Multiple outputs (pg. 69)
5.7 Cross-entropy loss (pg. 71)
5.8 Summary (pg. 72)
Notes (pg. 73)
Problems (pg. 74)
6 Fitting models (pg. 77)
6.1 Gradient descent (pg. 77)
6.2 Stochastic gradient descent (pg. 83)
6.3 Momentum (pg. 86)
6.4 Adam (pg. 88)
6.5 Training algorithm hyperparameters (pg. 91)
6.6 Summary (pg. 91)
Notes (pg. 91)
Problems (pg. 94)
7 Gradients and initialization (pg. 96)
7.1 Problem definitions (pg. 96)
7.2 Computing derivatives (pg. 97)
7.3 Toy example (pg. 100)
7.4 Backpropagation algorithm (pg. 103)
7.5 Parameter initialization (pg. 107)
7.6 Example training code (pg. 111)
7.7 Summary (pg. 111)
Notes (pg. 113)
Problems (pg. 114)
8 Measuring performance (pg. 118)
8.1 Training a simple model (pg. 118)
8.2 Sources of error (pg. 120)
8.3 Reducing error (pg. 124)
8.4 Double descent (pg. 127)
8.5 Choosing hyperparameters (pg. 132)
8.6 Summary (pg. 133)
Notes (pg. 133)
Problems (pg. 136)
9 Regularization (pg. 138)
9.1 Explicit regularization (pg. 138)
9.2 Implicit regularization (pg. 141)
9.3 Heuristics to improve performance (pg. 144)
9.4 Summary (pg. 154)
Notes (pg. 155)
Problems (pg. 160)
10 Convolutional networks (pg. 161)
10.1 Invariance and equivariance (pg. 161)
10.2 Convolutional networks for 1D inputs (pg. 163)
10.3 Convolutional networks for 2D inputs (pg. 170)
10.4 Downsampling and upsampling (pg. 171)
10.5 Applications (pg. 174)
10.6 Summary (pg. 179)
Notes (pg. 180)
Problems (pg. 184)
11 Residual networks (pg. 186)
11.1 Sequential processing (pg. 186)
11.2 Residual connections and residual blocks (pg. 189)
11.3 Exploding gradients in residual networks (pg. 192)
11.4 Batch normalization (pg. 192)
11.5 Common residual architectures (pg. 195)
11.6 Why do nets with residual connections perform so well? (pg. 199)
11.7 Summary (pg. 199)
Notes (pg. 201)
Problems (pg. 205)
12 Transformers (pg. 207)
12.1 Processing text data (pg. 207)
12.2 Dot-product self-attention (pg. 208)
12.3 Extensions to dot-product self-attention (pg. 213)
12.4 Transformers (pg. 215)
12.5 Transformers for natural language processing (pg. 216)
12.6 Encoder model example: BERT (pg. 219)
12.7 Decoder model example: GPT3 (pg. 222)
12.8 Encoder-decoder model example: machine translation (pg. 226)
12.9 Transformers for long sequences (pg. 227)
12.10 Transformers for images (pg. 228)
12.11 Summary (pg. 232)
Notes (pg. 232)
Problems (pg. 239)
13 Graph neural networks (pg. 240)
13.1 What is a graph? (pg. 240)
13.2 Graph representation (pg. 243)
13.3 Graph neural networks, tasks, and loss functions (pg. 245)
13.4 Graph convolutional networks (pg. 248)
13.5 Example: graph classification (pg. 251)
13.6 Inductive vs. transductive models (pg. 252)
13.7 Example: node classification (pg. 253)
13.8 Layers for graph convolutional networks (pg. 256)
13.9 Edge graphs (pg. 260)
13.10 Summary (pg. 261)
Notes (pg. 261)
Problems (pg. 266)
14 Unsupervised learning (pg. 268)
14.1 Taxonomy of unsupervised learning models (pg. 268)
14.2 What makes a good generative model? (pg. 269)
14.3 Quantifying performance (pg. 271)
14.4 Summary (pg. 273)
Notes (pg. 273)
15 Generative Adversarial Networks (pg. 275)
15.1 Discrimination as a signal (pg. 275)
15.2 Improving stability (pg. 280)
15.3 Progressive growing, minibatch discrimination, and truncation (pg. 286)
15.4 Conditional generation (pg. 288)
15.5 Image translation (pg. 290)
15.6 StyleGAN (pg. 295)
15.7 Summary (pg. 297)
Notes (pg. 298)
Problems (pg. 302)
16 Normalizing flows (pg. 303)
16.1 1D example (pg. 303)
16.2 General case (pg. 306)
16.3 Invertible network layers (pg. 308)
16.4 Multi-scale flows (pg. 316)
16.5 Applications (pg. 317)
16.6 Summary (pg. 320)
Notes (pg. 321)
Problems (pg. 324)
17 Variational autoencoders (pg. 326)
17.1 Latent variable models (pg. 326)
17.2 Nonlinear latent variable model (pg. 327)
17.3 Training (pg. 330)
17.4 ELBO properties (pg. 333)
17.5 Variational approximation (pg. 335)
17.6 The variational autoencoder (pg. 335)
17.7 The reparameterization trick (pg. 338)
17.8 Applications (pg. 339)
17.9 Summary (pg. 342)
Notes (pg. 343)
Problems (pg. 346)
18 Diffusion models (pg. 348)
18.1 Overview (pg. 348)
18.2 Encoder (forward process) (pg. 349)
18.3 Decoder model (reverse process) (pg. 355)
18.4 Training (pg. 356)
18.5 Reparameterization of loss function (pg. 360)
18.6 Implementation (pg. 362)
18.7 Summary (pg. 367)
Notes (pg. 367)
Problems (pg. 371)
19 Reinforcement learning (pg. 373)
19.1 Markov decision processes, returns, and policies (pg. 373)
19.2 Expected return (pg. 377)
19.3 Tabular reinforcement learning (pg. 381)
19.4 Fitted Q-learning (pg. 385)
19.5 Policy gradient methods (pg. 388)
19.6 Actor-critic methods (pg. 393)
19.7 Offline reinforcement learning (pg. 394)
19.8 Summary (pg. 395)
Notes (pg. 396)
Problems (pg. 399)
20 Why does deep learning work? (pg. 401)
20.1 The case against deep learning (pg. 401)
20.2 Factors that influence fitting performance (pg. 402)
20.3 Properties of loss functions (pg. 406)
20.4 Factors that determine generalization (pg. 410)
20.5 Do we need so many parameters? (pg. 414)
20.6 Do networks have to be deep? (pg. 417)
20.7 Summary (pg. 418)
Problems (pg. 419)
21 Deep learning and ethics (pg. 420)
21.1 Value alignment (pg. 420)
21.2 Intentional misuse (pg. 426)
21.3 Other social, ethical, and professional issues (pg. 428)
21.4 Case study (pg. 430)
21.5 The value-free ideal of science (pg. 431)
21.6 Responsible AI research as a collective action problem (pg. 432)
21.7 Ways forward (pg. 433)
21.8 Summary (pg. 434)
Problems (pg. 435)
Appendix A: Notation (pg. 436)
Appendix B: Mathematics (pg. 439)
B.1 Functions (pg. 439)
B.2 Binomial coefficients (pg. 441)
B.3 Vectors, matrices, and tensors (pg. 442)
B.4 Special types of matrix (pg. 445)
B.5 Matrix calculus (pg. 447)
Appendix C: Probability (pg. 448)
C.1 Random variables and probability distributions (pg. 448)
C.2 Expectation (pg. 452)
C.3 Normal probability distribution (pg. 456)
C.4 Sampling (pg. 459)
C.5 Distances between probability distributions (pg. 459)
Bibliography (pg. 462)
Index (pg. 513)

Simon J. D. Prince

Simon J. D. Prince is Honorary Professor of Computer Science at the University of Bath and author of Computer Vision: Models, Learning and Inference. A research scientist specializing in artificial intelligence and deep learning, he has led teams of research scientists in academia and industry at Anthropics Technologies Ltd, Borealis AI, and elsewhere.