Learning Theory from First Principles
by Francis Bach
ISBN: 9780262381376 | Copyright 2024
A comprehensive and cutting-edge introduction to the foundations and modern applications of learning theory.
Research in the field of machine learning has exploded, resulting in complex mathematical arguments that are hard for newcomers to grasp. In this accessible textbook, Francis Bach presents the foundations and latest advances of learning theory for graduate students, as well as for researchers who want to acquire a basic mathematical understanding of the most widely used machine learning architectures. Taking the position that learning theory does not exist outside of algorithms that can be run in practice, this book focuses on the theoretical analysis of learning algorithms as it relates to their practical performance. Bach provides the simplest formulations that can be derived from first principles, constructing mathematically rigorous results and proofs without overwhelming students.
· Provides a balanced and unified treatment of the most prevalent machine learning methods
· Emphasizes practical application and features only commonly used algorithmic frameworks
· Covers modern topics not found in existing texts, such as overparameterized models and structured prediction
· Integrates coverage of statistical theory, optimization theory, and approximation theory
· Focuses on adaptivity, allowing distinctions between various learning techniques
· Links theoretical guarantees to practical behavior through hands-on experiments, illustrative examples, and accompanying code
Cover (pg. Cover)
Contents (pg. v)
Preface (pg. xiii)
I. Preliminaries (pg. 1)
1. Mathematical Preliminaries (pg. 3)
1.1. Linear Algebra and Differentiable Calculus (pg. 3)
1.2. Concentration Inequalities (pg. 7)
2. Introduction to Supervised Learning (pg. 21)
2.1. From Training Data to Predictions (pg. 22)
2.2. Decision Theory (pg. 25)
2.3. Learning from Data (pg. 30)
2.4. Statistical Learning Theory (pg. 36)
2.5. "No Free Lunch" Theorems (♦) (pg. 38)
2.6. Quest for Adaptivity (pg. 39)
2.7. Beyond Supervised Learning (pg. 40)
2.8. Summary–Book Outline (pg. 41)
3. Linear Least-Squares Regression (pg. 45)
3.1. Introduction (pg. 45)
3.2. Least-Squares Framework (pg. 46)
3.3. Ordinary Least-Squares Estimator (pg. 47)
3.4. Statistical Analysis of Ordinary Least-Squares (pg. 49)
3.5. Fixed Design Setting (pg. 50)
3.6. Ridge Least-Squares Regression (pg. 56)
3.7. Lower Bound (♦) (pg. 60)
3.8. Random Design Analysis (pg. 63)
3.9. Principal Component Analysis (♦) (pg. 66)
3.10. Conclusion (pg. 68)
II. Generalization Bounds for Learning Algorithms (pg. 69)
4. Empirical Risk Minimization (pg. 71)
4.1. Convexification of the Risk (pg. 72)
4.2. Risk Minimization Decomposition (pg. 84)
4.3. Approximation Error (pg. 84)
4.4. Estimation Error (pg. 85)
4.5. Rademacher Complexity (pg. 91)
4.6. Model Selection (♦) (pg. 103)
4.7. Relation with Asymptotic Statistics (♦) (pg. 105)
4.8. Summary (pg. 107)
5. Optimization for Machine Learning (pg. 109)
5.1. Optimization in Machine Learning (pg. 109)
5.2. Gradient Descent (pg. 111)
5.3. Gradient Methods on Nonsmooth Problems (pg. 130)
5.4. Stochastic Gradient Descent (pg. 134)
5.5. Conclusion (pg. 151)
6. Local Averaging Methods (pg. 155)
6.1. Introduction (pg. 155)
6.2. Local Averaging Methods (pg. 157)
6.3. Generic Simplest Consistency Analysis (pg. 163)
6.4. Universal Consistency (♦) (pg. 174)
6.5. Adaptivity (♦♦) (pg. 177)
6.6. Conclusion (pg. 178)
7. Kernel Methods (pg. 179)
7.1. Introduction (pg. 180)
7.2. Representer Theorem (pg. 181)
7.3. Kernels (pg. 183)
7.4. Algorithms (pg. 196)
7.5. Generalization Guarantees–Lipschitz-continuous Losses (pg. 202)
7.6. Theoretical Analysis of Ridge Regression (♦) (pg. 208)
7.7. Experiments (pg. 218)
7.8. Conclusion (pg. 220)
8. Sparse Methods (pg. 221)
8.1. Introduction (pg. 221)
8.2. Variable Selection by the ℓ0-Penalty (pg. 226)
8.3. Variable Selection by ℓ1-Regularization (pg. 231)
8.4. Experiments (pg. 243)
8.5. Extensions (pg. 243)
8.6. Conclusion (pg. 245)
9. Neural Networks (pg. 247)
9.1. Introduction (pg. 247)
9.2. Single Hidden-Layer Neural Network (pg. 249)
9.3. Approximation Properties (pg. 256)
9.4. Generalization Performance for Neural Networks (pg. 269)
9.5. Relationship with Kernel Methods (♦) (pg. 271)
9.6. Experiments (pg. 277)
9.7. Extensions (pg. 278)
9.8. Conclusion (pg. 279)
III. Special Topics (pg. 281)
10. Ensemble Learning (pg. 283)
10.1. Averaging/Bagging (pg. 284)
10.2. Random Projections and Averaging (pg. 288)
10.3. Boosting (pg. 298)
10.4. Conclusion (pg. 311)
11. From Online Learning to Bandits (pg. 313)
11.1. First-Order Online Convex Optimization (pg. 315)
11.2. Zeroth-Order Convex Optimization (pg. 323)
11.3. Multiarmed Bandits (pg. 331)
11.4. Conclusion (pg. 341)
12. Overparameterized Models (pg. 343)
12.1. Implicit Bias of Gradient Descent (pg. 344)
12.2. Double Descent (pg. 355)
12.3. Global Convergence of Gradient Descent (pg. 365)
12.4. Lazy Regime and Neural Tangent Kernels (♦) (pg. 375)
12.5. Conclusion (pg. 377)
13. Structured Prediction (pg. 379)
13.1. Multicategory Classification (pg. 380)
13.2. General Setup and Examples (pg. 387)
13.3. Surrogate Methods (pg. 391)
13.4. Smooth/Quadratic Surrogates (pg. 393)
13.5. Max-Margin Formulations (pg. 398)
13.6. Generalization Bounds (♦) (pg. 402)
13.7. Experiments (pg. 404)
13.8. Conclusion (pg. 407)
14. Probabilistic Methods (pg. 409)
14.1. From Empirical Risks to Log-Likelihoods (pg. 409)
14.2. Discriminative versus Generative Models (pg. 417)
14.3. Bayesian Inference (pg. 420)
14.4. PAC-Bayesian Analysis (pg. 423)
14.5. Conclusion (pg. 426)
15. Lower Bounds (pg. 427)
15.1. Statistical Lower Bounds (pg. 428)
15.2. Optimization Lower Bounds (pg. 441)
15.3. Lower Bounds for Stochastic Gradient Descent (♦) (pg. 447)
15.4. Conclusion (pg. 449)
Conclusion (pg. 451)
References (pg. 453)
Index (pg. 471)