Fundamentals of Probability and Statistics for Machine Learning

by Ethem Alpaydın

ISBN: 9780262049818 | Copyright 2025

Most curricula have students take an undergraduate course on probability and statistics before turning to machine learning. In this innovative textbook, Ethem Alpaydın takes an alternative tack by integrating these subjects into a first course on learning from data. Alpaydın accessibly connects machine learning to its roots in probability and statistics, starting with the basics of random experiments and probabilities and eventually moving to complex topics such as artificial neural networks. With a practical emphasis and learn-by-doing approach, this unique text offers comprehensive coverage of the elements fundamental to an empirical understanding of machine learning in a data science context.

  • Consolidates foundational knowledge and key techniques needed for modern data science
  • Covers mathematical fundamentals of probability and statistics and ML basics
  • Emphasizes hands-on learning (see the sketch after this list)
  • Suits undergraduates as well as self-learners with basic programming experience
  • Includes slides, solutions, and code
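
To give a flavor of that hands-on emphasis, here is a minimal, hypothetical sketch, not taken from the book's own code, written in Python with NumPy. It simulates a random experiment of the kind treated in chapter 2 and compares the empirical frequency of an event with its theoretical probability.

    # Hypothetical sketch, not from the book: estimate the probability of an
    # event by simulating the random experiment many times (cf. chapter 2,
    # "Random Experiments and Probabilities").
    import numpy as np

    rng = np.random.default_rng(seed=42)

    # Experiment: roll two fair dice; event of interest: their sum is 7.
    # Theoretical probability: 6 favorable outcomes out of 36, i.e., 1/6.
    n_trials = 100_000
    rolls = rng.integers(1, 7, size=(n_trials, 2))   # uniform on {1, ..., 6}
    empirical = np.mean(rolls.sum(axis=1) == 7)

    print(f"empirical   P(sum = 7) = {empirical:.4f}")
    print(f"theoretical P(sum = 7) = {1 / 6:.4f}")

With enough trials the empirical estimate converges toward the theoretical value of 1/6, the law-of-large-numbers intuition that underlies learning from data.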
Contents (pg. v)
Preface (pg. xi)
1. Introduction (pg. 1)
1.1 What Is Learning from Data? (pg. 1)
1.2 Types of Learning (pg. 8)
1.3 Relationship to Statistics, Data Science, and Artificial Intelligence (pg. 13)
1.4 Social, Ethical, and Legal Aspects (pg. 14)
1.5 Notes (pg. 15)
1.6 Exercises (pg. 16)
2. Random Experiments and Probabilities (pg. 19)
2.1 Random Events (pg. 19)
2.2 What Is a Probability? (pg. 20)
2.3 Equally Likely Events (pg. 22)
2.4 Principles of Counting (pg. 23)
2.5 Some Events May Be More Equal Than Others (pg. 27)
2.6 The Additive Rule (pg. 28)
2.7 Random Variables and Probability Distributions (pg. 29)
2.8 Joint Probability Distribution of Two Random Variables (pg. 37)
2.9 Conditional Probabilities (pg. 39)
2.10 Bayes’ Rule (pg. 45)
2.11 Graphical Models (pg. 47)
2.12 Notes (pg. 62)
2.13 Exercises (pg. 62)
3. Probability Distributions (pg. 67)
3.1 Expected Value (pg. 67)
3.2 Variance (pg. 72)
3.3 Covariance, Correlation, and Independence (pg. 75)
3.4 Russian Inequalities (pg. 79)
3.5 Programming Probability Distributions (pg. 81)
3.6 Discrete Probability Distributions (pg. 82)
3.7 Continuous Probability Distributions (pg. 93)
3.8 Mixtures of Distributions (pg. 110)
3.9 Generalized Distributions (pg. 114)
3.10 Distributions of Two Random Variables (pg. 116)
3.11 Exercises (pg. 125)
4. Sampling and Estimation (pg. 131)
4.1 Population vs Sample (pg. 131)
4.2 Sample Statistics (pg. 133)
4.3 Maximum Likelihood Estimation (pg. 135)
4.4 Bias and Variance (pg. 139)
4.5 Knowledge Extraction (pg. 144)
4.6 Prediction (pg. 145)
4.7 Sampling Distributions (pg. 149)
4.8 Interval Estimation (pg. 155)
4.9 Nonparametric Estimation (pg. 166)
4.10 Monte Carlo Methods (pg. 169)
4.11 Bootstrapping (pg. 172)
4.12 Notes (pg. 176)
4.13 Exercises (pg. 177)
5. Hypothesis Testing (pg. 179)
5.1 Basic Definitions (pg. 179)
5.2 Tests on the Mean of a Population (pg. 180)
5.3 Tests on the Proportion of a Bernoulli Population (pg. 195)
5.4 Tests on the Variance of a Normal Population (pg. 199)
5.5 Comparing the Parameters of Two Populations (pg. 200)
5.6 Comparing Many Populations: Analysis of Variance (pg. 208)
5.7 Design of Experiments (pg. 212)
5.8 Goodness of Fit (pg. 215)
5.9 Nonparametric Tests (pg. 219)
5.10 Notes (pg. 222)
5.11 Exercises (pg. 222)
6. Multivariate Models (pg. 225)
6.1 Multivariate Data (pg. 225)
6.2 Multivariate Modeling (pg. 230)
6.3 Multivariate Normal Distribution (pg. 238)
6.4 Multivariate Bernoulli Distribution (pg. 244)
6.5 Principal Component Analysis (pg. 246)
6.6 Dimensionality Reduction and Class Separability (pg. 260)
6.7 Encoding/Decoding Data (pg. 261)
6.8 Feature Embedding (pg. 264)
6.9 Singular Value Decomposition (pg. 267)
6.10 Notes (pg. 271)
6.11 Exercises (pg. 272)
7. Regression (pg. 277)
7.1 The Idea (pg. 277)
7.2 Simple Linear Regression (pg. 279)
7.3 Probabilistic Interpretation (pg. 282)
7.4 Analysis of Variance for Regression (pg. 286)
7.5 Prediction (pg. 288)
7.6 Vector-Matrix Notation (pg. 290)
7.7 Generalizing the Linear Model (pg. 293)
7.8 Regression Using Iterative Optimization (pg. 296)
7.9 Online Learning (pg. 307)
7.10 Model Selection and the Bias/Variance Trade-off (pg. 312)
7.11 Cross-Validation (pg. 314)
7.12 Feature Selection (pg. 325)
7.13 Regularization (pg. 330)
7.14 k-Fold Resampling (pg. 333)
7.15 Exercises (pg. 337)
8. Classification (pg. 343)
8.1 Introduction (pg. 343)
8.2 Bayesian Decision Theory (pg. 344)
8.3 Parametric Classification (pg. 346)
8.4 Multivariate Case (pg. 349)
8.5 Losses and Rejects (pg. 357)
8.6 Information Retrieval (pg. 361)
8.7 Logistic Regression (pg. 363)
8.8 Notes (pg. 378)
8.9 Exercises (pg. 378)
9. Clustering (pg. 381)
9.1 Introduction (pg. 381)
9.2 k-Means Clustering (pg. 384)
9.3 Normal Mixtures and Soft Clustering (pg. 393)
9.4 Mixtures of Mixtures for Classification (pg. 398)
9.5 Radial Basis Functions (pg. 402)
9.6 Mixtures of Experts (pg. 406)
9.7 Notes (pg. 411)
9.8 Exercises (pg. 412)
10. Nearest Neighbors (pg. 415)
10.1 The Story So Far (pg. 415)
10.2 Nonparametric Methods (pg. 416)
10.3 Kernel Density Estimation (pg. 418)
10.4 Nonparametric Classification (pg. 422)
10.5 k-Nearest Neighbors (pg. 425)
10.6 Smoothing Models (pg. 431)
10.7 Distance Measures (pg. 435)
10.8 Notes (pg. 437)
10.9 Exercises (pg. 437)
11. Artificial Neural Networks (pg. 439)
11.1 Why We Care about the Brain (pg. 439)
11.2 The Perceptron (pg. 446)
11.3 Training a Perceptron (pg. 449)
11.4 Learning Boolean Functions (pg. 450)
11.5 The Multilayer Perceptron (pg. 456)
11.6 The Autoencoder (pg. 467)
11.7 Deep Learning (pg. 470)
11.8 Improving Convergence (pg. 472)
11.9 Structuring the Network (pg. 474)
11.10 Recurrent Networks (pg. 480)
11.11 Composite Architectures (pg. 481)
11.12 Notes (pg. 485)
11.13 Exercises (pg. 486)
A. Linear Algebra (pg. 489)
A.1 Vectors and Matrices (pg. 489)
A.2 Vector Projections (pg. 491)
A.3 Similarity of Vectors (pg. 492)
A.4 Square Matrices (pg. 493)
A.5 Linear Dependence, Rank, and Inverse Matrices (pg. 493)
A.6 Positive Definite Matrices (pg. 494)
A.7 Trace and Determinant (pg. 494)
A.8 Matrix-Vector Product (pg. 494)
A.9 Eigenvalues and Eigenvectors (pg. 495)
A.10 Matrix Decomposition (pg. 495)
B. Calculus (pg. 497)
B.1 Calculus of a Single Variable (pg. 497)
B.2 Optimization (pg. 500)
B.3 Multivariable Calculus (pg. 504)
B.4 Multivariable Optimization (pg. 509)
B.5 Least Squares (pg. 520)
References (pg. 523)
Index (pg. 527)

Ethem Alpaydın

Ethem Alpaydın is Professor in the Department of Computer Engineering at Özyeğin University and a member of the Science Academy, Istanbul. He is the author of the widely used textbook Introduction to Machine Learning (MIT Press), now in its fourth edition.
