Machine Learning in Production

From Models to Products

by Christian Kästner

ISBN: 9780262049726 | Copyright 2025


A practical and innovative textbook detailing how to build real-world software products with machine learning components, not just models.

Traditional machine learning texts focus on how to train and evaluate models, while MLOps books focus on streamlining model development and deployment. But neither focuses on how to build actual products that deliver value to users. This practical textbook, by contrast, details how to responsibly build products with machine learning components, covering the entire development lifecycle from requirements and design to quality assurance and operations. Machine Learning in Production brings an engineering mindset to the challenge of building systems that are usable, reliable, scalable, and safe under real-world conditions of uncertainty, incomplete information, and resource constraints. Based on the author's popular course at Carnegie Mellon, this pioneering book integrates foundational knowledge in software engineering and machine learning to provide the holistic view needed to create not only prototype models but production-ready systems.

• Integrates coverage of cutting-edge research, existing tools, and real-world applications
• Provides students and professionals with an engineering view for production-ready machine learning systems
• Proven in the classroom
• Offers supplemental resources including slides, videos, exams, and further readings

Cover (pg. Cover)
Contents (pg. v)
Acknowledgments (pg. xiii)
I. Setting the Stage (pg. 1)
1. Introduction (pg. 3)
1.1 Motivating Example: An Automated Transcription Start-up (pg. 4)
1.2 Data Scientists and Software Engineers (pg. 7)
1.3 Machine-Learning Challenges in Software Projects (pg. 10)
1.4 A Foundation for MLOps and Responsible Engineering (pg. 16)
1.5 Summary (pg. 18)
1.6 Further Readings (pg. 19)
2. From Models to Systems (pg. 23)
2.1 ML and Non-ML Components in a System (pg. 23)
2.2 Beyond the Model (pg. 29)
2.3 On Terminology (pg. 35)
2.4 Summary (pg. 36)
2.5 Further Readings (pg. 37)
3. Machine Learning for Software Engineers, in a Nutshell (pg. 39)
3.1 Basic Terms: Machine Learning, Models, Predictions (pg. 39)
3.2 Technical Concepts: Model Parameters, Hyperparameters, Model Storage (pg. 40)
3.3 Machine-Learning Pipelines (pg. 42)
3.4 Foundation Models and Prompting (pg. 43)
3.5 On Terminology (pg. 45)
3.6 Summary (pg. 45)
3.7 Further Readings (pg. 45)
II. Requirements Engineering (pg. 47)
4. When to Use Machine Learning (pg. 49)
4.1 Problems That Benefit from Machine Learning (pg. 49)
4.2 Tolerating Mistakes and ML Risk (pg. 50)
4.3 Continuous Learning (pg. 51)
4.4 Costs and Benefits (pg. 52)
4.5 The Business Case: Machine Learning as Predictions (pg. 53)
4.6 Summary (pg. 54)
4.7 Further Readings (pg. 54)
5. Setting and Measuring Goals (pg. 57)
5.1 Scenario: Self-Help Legal Chatbot (pg. 57)
5.2 Setting Goals (pg. 58)
5.3 Measurement in a Nutshell (pg. 62)
5.4 Summary (pg. 68)
5.5 Further Readings (pg. 69)
6. Gathering Requirements (pg. 71)
6.1 Scenario: Fall Detection with a Smartwatch (pg. 72)
6.2 Untangling Requirements (pg. 72)
6.3 Eliciting Requirements (pg. 80)
6.4 How Much Requirements Engineering and When? (pg. 86)
6.5 Summary (pg. 87)
6.6 Further Readings (pg. 88)
7. Planning for Mistakes (pg. 91)
7.1 Mistakes Will Happen (pg. 91)
7.2 Designing for Failures (pg. 95)
7.3 Hazard Analysis and Risk Analysis (pg. 102)
7.4 Summary (pg. 109)
7.5 Further Readings (pg. 109)
III. Architecture and Design (pg. 113)
8. Thinking like a Software Architect (pg. 115)
8.1 Quality Requirements Drive Architecture Design (pg. 116)
8.2 The Role of Abstraction (pg. 119)
8.3 Common Architectural Design Challenges for ML-Enabled Systems (pg. 122)
8.4 Codifying Design Knowledge (pg. 124)
8.5 Summary (pg. 130)
8.6 Further Readings (pg. 131)
9. Quality Attributes of ML Components (pg. 133)
9.1 Scenario: Detecting Credit Card Fraud (pg. 133)
9.2 From System Quality to Model and Pipeline Quality (pg. 134)
9.3 Common Quality Attributes (pg. 135)
9.4 Constraints and Trade-offs (pg. 141)
9.5 Summary (pg. 143)
9.6 Further Readings (pg. 144)
10. Deploying a Model (pg. 145)
10.1 Scenario: Augmented Reality Translation (pg. 145)
10.2 Model Inference Function (pg. 146)
10.3 Feature Encoding (pg. 147)
10.4 Model Serving Infrastructure (pg. 150)
10.5 Deployment Architecture Trade-offs (pg. 153)
10.6 Model Inference in a System (pg. 160)
10.7 Documenting Model-Inference Interfaces (pg. 164)
10.8 Summary (pg. 166)
10.9 Further Readings (pg. 168)
11. Automating the Pipeline (pg. 171)
11.1 Scenario: Home Value Prediction (pg. 171)
11.2 Supporting Evolution and Experimentation by Designing for Change (pg. 172)
11.3 Pipeline Thinking (pg. 173)
11.4 Stages of Machine-Learning Pipelines (pg. 174)
11.5 Automation and Infrastructure Design (pg. 181)
11.6 Summary (pg. 184)
11.7 Further Readings (pg. 185)
12. Scaling the System (pg. 187)
12.1 Scenario: Google-Scale Photo Hosting and Search (pg. 188)
12.2 Scaling by Distributing Work (pg. 189)
12.3 Data Storage at Scale (pg. 190)
12.4 Distributed Data Processing (pg. 200)
12.5 Distributed Machine-Learning Algorithms (pg. 213)
12.6 Performance Planning and Monitoring (pg. 215)
12.7 Summary (pg. 215)
12.8 Further Readings (pg. 216)
13. Planning for Operations (pg. 219)
13.1 Scenario: Blogging Platform with Spam Filter (pg. 220)
13.2 Service Level Objectives (pg. 220)
13.3 Observability (pg. 222)
13.4 Automating Deployments (pg. 224)
13.5 Infrastructure as Code and Virtualization (pg. 226)
13.6 Orchestrating and Scaling Deployments (pg. 228)
13.7 Elevating Data Engineering (pg. 230)
13.8 Incident Response Planning (pg. 230)
13.9 DevOps and MLOps Principles (pg. 232)
13.10 DevOps and MLOps Tooling (pg. 234)
13.11 Summary (pg. 237)
13.12 Further Readings (pg. 237)
IV. Quality Assurance (pg. 239)
14. Quality Assurance Basics (pg. 241)
14.1 Testing (pg. 242)
14.2 Code Review (pg. 248)
14.3 Static Analysis (pg. 249)
14.4 Other Quality Assurance Approaches (pg. 250)
14.5 Planning and Process Integration (pg. 252)
14.6 Summary (pg. 254)
14.7 Further Readings (pg. 254)
15. Model Quality (pg. 257)
15.1 Scenario: Cancer Prognosis (pg. 257)
15.2 Defining Correctness and Fit (pg. 258)
15.3 Measuring Prediction Accuracy (pg. 265)
15.4 Model Evaluation beyond Accuracy (pg. 282)
15.5 Test Data Adequacy (pg. 298)
15.6 Model Inspection (pg. 300)
15.7 Summary (pg. 300)
15.8 Further Readings (pg. 301)
16. Data Quality (pg. 307)
16.1 Scenario: Inventory Management (pg. 307)
16.2 Data Quality Challenges (pg. 308)
16.3 Data-Quality Checks (pg. 312)
16.4 Drift and Data-Quality Monitoring (pg. 319)
16.5 Data Quality Is a System-Wide Concern (pg. 324)
16.6 Summary (pg. 329)
16.7 Further Readings (pg. 329)
17. Pipeline Quality (pg. 333)
17.1 Silent Mistakes in ML Pipelines (pg. 333)
17.2 Code Review for ML Pipelines (pg. 335)
17.3 Testing Pipeline Components (pg. 335)
17.4 Static Analysis of ML Pipelines (pg. 348)
17.5 Process Integration and Test Maturity (pg. 349)
17.6 Summary (pg. 350)
17.7 Further Readings (pg. 350)
18. System Quality (pg. 353)
18.1 Limits of Modular Reasoning (pg. 353)
18.2 System Testing (pg. 356)
18.3 Testing Component Interactions and Safeguards (pg. 359)
18.4 Testing Operations (Deployment, Monitoring) (pg. 360)
18.5 Summary (pg. 361)
18.6 Further Readings (pg. 362)
19. Testing and Experimenting in Production (pg. 363)
19.1 A Brief History of Testing in Production (pg. 363)
19.2 Scenario: Meeting Minutes for Video Calls (pg. 365)
19.3 Measuring System Success in Production (pg. 366)
19.4 Measuring Model Quality in Production (pg. 367)
19.5 Designing and Implementing Quality Measures with Telemetry (pg. 372)
19.6 Experimenting in Production (pg. 377)
19.7 Summary (pg. 384)
19.8 Further Readings (pg. 384)
V. Process and Teams (pg. 387)
20. Data Science and Software Engineering Process Models (pg. 389)
20.1 Data-Science Process (pg. 389)
20.2 Software-Engineering Process (pg. 393)
20.3 Tensions between Data Science and Software Engineering Processes (pg. 397)
20.4 Integrated Processes for AI-Enabled Systems (pg. 399)
20.5 Summary (pg. 404)
20.6 Further Readings (pg. 404)
21. Interdisciplinary Teams (pg. 407)
21.1 Scenario: Fighting Depression on Social Media (pg. 407)
21.2 Unicorns Are Not Enough (pg. 408)
21.3 Conflicts within and between Teams Are Common (pg. 409)
21.4 Coordination Costs (pg. 411)
21.5 Conflicting Goals and T-Shaped People (pg. 417)
21.6 Groupthink (pg. 419)
21.7 Team Structure and Allocating Experts (pg. 420)
21.8 Learning from DevOps and MLOps Culture (pg. 423)
21.9 Summary (pg. 427)
21.10 Further Readings (pg. 428)
22. Technical Debt (pg. 431)
22.1 Scenario: Automated Delivery Robots (pg. 431)
22.2 Deliberate and Prudent Technical Debt (pg. 432)
22.3 Technical Debt in Machine-Learning Projects (pg. 434)
22.4 Managing Technical Debt (pg. 436)
22.5 Summary (pg. 438)
22.6 Further Readings (pg. 438)
VI. Responsible ML Engineering (pg. 441)
23. Responsible Engineering (pg. 443)
23.1 Legal and Ethical Responsibilities (pg. 443)
23.2 Why Responsible Engineering Matters for ML-Enabled Systems (pg. 445)
23.3 Facets of Responsible ML Engineering (pg. 449)
23.4 Regulation Is Coming (pg. 451)
23.5 Summary (pg. 454)
23.6 Further Readings (pg. 455)
24. Versioning, Provenance, and Reproducibility (pg. 457)
24.1 Scenario: Debugging a Loan Decision (pg. 458)
24.2 Versioning (pg. 459)
24.3 Data Provenance and Lineage (pg. 465)
24.4 Reproducibility (pg. 468)
24.5 Putting the Pieces Together (pg. 471)
24.6 Summary (pg. 473)
24.7 Further Readings (pg. 473)
25. Explainability (pg. 477)
25.1 Scenario: Proprietary Opaque Models for Recidivism Risk Assessment (pg. 477)
25.2 Defining Explainability (pg. 478)
25.3 Explaining a Model (pg. 482)
25.4 Explaining a Prediction (pg. 485)
25.5 Explaining Data and Training (pg. 491)
25.6 The Dark Side of Explanations (pg. 492)
25.7 Summary (pg. 493)
25.8 Further Readings (pg. 493)
26. Fairness (pg. 495)
26.1 Scenario: Mortgage Applications (pg. 496)
26.2 Fairness Concepts (pg. 497)
26.3 Measuring and Improving Fairness at the Model Level (pg. 507)
26.4 Fairness Is a System-Wide Concern (pg. 513)
26.5 Summary (pg. 529)
26.6 Further Readings (pg. 529)
27. Safety (pg. 535)
27.1 Safety and Reliability (pg. 536)
27.2 Improving Model Reliability (pg. 537)
27.3 Building Safer Systems (pg. 541)
27.4 The AI Alignment Problem (pg. 547)
27.5 Summary (pg. 548)
27.6 Further Readings (pg. 549)
28. Security and Privacy (pg. 551)
28.1 Scenario: Content Moderation (pg. 551)
28.2 Security Requirements (pg. 552)
28.3 Attacks and Defenses (pg. 554)
28.4 ML-Specific Attacks (pg. 555)
28.5 Threat Modeling (pg. 566)
28.6 Designing for Security (pg. 570)
28.7 Data Privacy (pg. 575)
28.8 Summary (pg. 580)
28.9 Further Readings (pg. 580)
29. Transparency and Accountability (pg. 583)
29.1 Transparency of the Model’s Existence (pg. 583)
29.2 Transparency of How the Model Works (pg. 584)
29.3 Human Oversight and Appeals (pg. 587)
29.4 Accountability and Culpability (pg. 589)
29.5 Summary (pg. 590)
29.6 Further Readings (pg. 591)
Index (pg. 595)

Christian Kästner

Christian Kästner is Associate Professor of Computer Science at Carnegie Mellon University.

eTextbook
Go paperless today! Available online anytime, nothing to download or install.

Features

  • Bookmarking
  • Note taking
  • Highlighting