Learning Theory from First Principles
by Francis Bach
ISBN: 9780262381376 | Copyright 2024
A comprehensive and cutting-edge introduction to the foundations and modern applications of learning theory.
Research in the field of machine learning has exploded, resulting in complex mathematical arguments that are hard for newcomers to grasp. In this accessible textbook, Francis Bach presents the foundations and latest advances of learning theory for graduate students, as well as for researchers who want to acquire a basic mathematical understanding of the most widely used machine learning architectures. Taking the position that learning theory does not exist outside of algorithms that can be run in practice, this book focuses on the theoretical analysis of learning algorithms as it relates to their practical performance. Bach provides the simplest formulations that can be derived from first principles, constructing mathematically rigorous results and proofs without overwhelming students.
· Provides a balanced and unified treatment of the most prevalent machine learning methods
· Emphasizes practical application and features only commonly used algorithmic frameworks
· Covers modern topics not found in existing texts, such as overparameterized models and structured prediction
· Integrates coverage of statistical theory, optimization theory, and approximation theory
· Focuses on adaptivity, allowing distinctions between various learning techniques
· Links theoretical guarantees to practical behavior through hands-on experiments, illustrative examples, and accompanying code
Cover (pg. Cover)
Contents (pg. v)
Preface (pg. xiii)
I. Preliminaries (pg. 1)
1. Mathematical Preliminaries (pg. 3)
1.1. Linear Algebra and Differentiable Calculus (pg. 3)
1.2. Concentration Inequalities (pg. 7)
2. Introduction to Supervised Learning (pg. 21)
2.1. From Training Data to Predictions (pg. 22)
2.2. Decision Theory (pg. 25)
2.3. Learning from Data (pg. 30)
2.4. Statistical Learning Theory (pg. 36)
2.5. "No Free Lunch" Theorems (♦) (pg. 38)
2.6. Quest for Adaptivity (pg. 39)
2.7. Beyond Supervised Learning (pg. 40)
2.8. Summary–Book Outline (pg. 41)
3. Linear Least-Squares Regression (pg. 45)
3.1. Introduction (pg. 45)
3.2. Least-Squares Framework (pg. 46)
3.3. Ordinary Least-Squares Estimator (pg. 47)
3.4. Statistical Analysis of Ordinary Least-Squares (pg. 49)
3.5. Fixed Design Setting (pg. 50)
3.6. Ridge Least-Squares Regression (pg. 56)
3.7. Lower Bound (♦) (pg. 60)
3.8. Random Design Analysis (pg. 63)
3.9. Principal Component Analysis (♦) (pg. 66)
3.10. Conclusion (pg. 68)
II. Generalization Bounds for Learning Algorithms (pg. 69)
4. Empirical Risk Minimization (pg. 71)
4.1. Convexification of the Risk (pg. 72)
4.2. Risk Minimization Decomposition (pg. 84)
4.3. Approximation Error (pg. 84)
4.4. Estimation Error (pg. 85)
4.5. Rademacher Complexity (pg. 91)
4.6. Model Selection (♦) (pg. 103)
4.7. Relation with Asymptotic Statistics (♦) (pg. 105)
4.8. Summary (pg. 107)
5. Optimization for Machine Learning (pg. 109)
5.1. Optimization in Machine Learning (pg. 109)
5.2. Gradient Descent (pg. 111)
5.3. Gradient Methods on Nonsmooth Problems (pg. 130)
5.4. Stochastic Gradient Descent (pg. 134)
5.5. Conclusion (pg. 151)
6. Local Averaging Methods (pg. 155)
6.1. Introduction (pg. 155)
6.2. Local Averaging Methods (pg. 157)
6.3. Generic Simplest Consistency Analysis (pg. 163)
6.4. Universal Consistency (♦) (pg. 174)
6.5. Adaptivity (♦♦) (pg. 177)
6.6. Conclusion (pg. 178)
7. Kernel Methods (pg. 179)
7.1. Introduction (pg. 180)
7.2. Representer Theorem (pg. 181)
7.3. Kernels (pg. 183)
7.4. Algorithms (pg. 196)
7.5. Generalization Guarantees–Lipschitz-continuous Losses (pg. 202)
7.6. Theoretical Analysis of Ridge Regression (♦) (pg. 208)
7.7. Experiments (pg. 218)
7.8. Conclusion (pg. 220)
8. Sparse Methods (pg. 221)
8.1. Introduction (pg. 221)
8.2. Variable Selection by the ℓ0-Penalty (pg. 226)
8.3. Variable Selection by ℓ1-Regularization (pg. 231)
8.4. Experiments (pg. 243)
8.5. Extensions (pg. 243)
8.6. Conclusion (pg. 245)
9. Neural Networks (pg. 247)
9.1. Introduction (pg. 247)
9.2. Single Hidden-Layer Neural Network (pg. 249)
9.3. Approximation Properties (pg. 256)
9.4. Generalization Performance for Neural Networks (pg. 269)
9.5. Relationship with Kernel Methods (♦) (pg. 271)
9.6. Experiments (pg. 277)
9.7. Extensions (pg. 278)
9.8. Conclusion (pg. 279)
III. Special Topics (pg. 281)
10. Ensemble Learning (pg. 283)
10.1. Averaging/Bagging (pg. 284)
10.2. Random Projections and Averaging (pg. 288)
10.3. Boosting (pg. 298)
10.4. Conclusion (pg. 311)
11. From Online Learning to Bandits (pg. 313)
11.1. First-Order Online Convex Optimization (pg. 315)
11.2. Zeroth-Order Convex Optimization (pg. 323)
11.3. Multiarmed Bandits (pg. 331)
11.4. Conclusion (pg. 341)
12. Overparameterized Models (pg. 343)
12.1. Implicit Bias of Gradient Descent (pg. 344)
12.2. Double Descent (pg. 355)
12.3. Global Convergence of Gradient Descent (pg. 365)
12.4. Lazy Regime and Neural Tangent Kernels (♦) (pg. 375)
12.5. Conclusion (pg. 377)
13. Structured Prediction (pg. 379)
13.1. Multicategory Classification (pg. 380)
13.2. General Setup and Examples (pg. 387)
13.3. Surrogate Methods (pg. 391)
13.4. Smooth/Quadratic Surrogates (pg. 393)
13.5. Max-Margin Formulations (pg. 398)
13.6. Generalization Bounds (♦) (pg. 402)
13.7. Experiments (pg. 404)
13.8. Conclusion (pg. 407)
14. Probabilistic Methods (pg. 409)
14.1. From Empirical Risks to Log-Likelihoods (pg. 409)
14.2. Discriminative versus Generative Models (pg. 417)
14.3. Bayesian Inference (pg. 420)
14.4. PAC-Bayesian Analysis (pg. 423)
14.5. Conclusion (pg. 426)
15. Lower Bounds (pg. 427)
15.1. Statistical Lower Bounds (pg. 428)
15.2. Optimization Lower Bounds (pg. 441)
15.3. Lower Bounds for Stochastic Gradient Descent (♦) (pg. 447)
15.4. Conclusion (pg. 449)
Conclusion (pg. 451)
References (pg. 453)
Index (pg. 471)