Foundations of Computer Vision

by Antonio Torralba, Phillip Isola, and William T. Freeman

ISBN: 9780262378673 | Copyright 2024


An accessible, authoritative, and up-to-date computer vision textbook offering a comprehensive introduction to the foundations of the field that incorporates the latest deep learning advances.

Machine learning has revolutionized computer vision, but the methods of today have deep roots in the history of the field. Providing a much-needed modern treatment, this accessible and up-to-date textbook comprehensively introduces the foundations of computer vision while incorporating the latest deep learning advances. Taking a holistic approach that goes beyond machine learning, it addresses fundamental issues in the task of vision and the relationship of machine vision to human perception. Foundations of Computer Vision covers topics not standard in other texts, including transformers, diffusion models, statistical image models, issues of fairness and ethics, and the research process. To emphasize intuitive learning, concepts are presented in short, lucid chapters alongside extensive illustrations, questions, and examples. Written by leaders in the field and honed by a decade of classroom experience, this engaging and highly teachable book offers an essential next-generation view of computer vision.  

• Up-to-date treatment integrates classic computer vision and deep learning
• Accessible approach emphasizes fundamentals and assumes little background knowledge
• Student-friendly presentation features extensive examples and images
• Proven in the classroom
• Instructor resources include slides, solutions, and source code

Contents (pg. vii)
Preface (pg. xxi)
Notation (pg. xxvii)
1 The Challenge of Vision (pg. 1)
1.1 Introduction (pg. 1)
1.2 Vision (pg. 1)
1.2.1 The Input: The Structure of Ambient Light (pg. 2)
1.2.2 The Output: Measuring Light Versus Measuring Scene Properties (pg. 3)
1.3 Theories of Vision (pg. 4)
1.3.1 The Origins of the Science of Perception (pg. 5)
1.3.2 Helmholtz: Perception as Inference (pg. 9)
1.3.3 Gestalt Psychology and Perceptual Organization (pg. 11)
1.3.4 Gibson's Ecological Approach to Visual Perception (pg. 15)
1.3.5 The Neural Mechanisms of Visual Perception (pg. 18)
1.3.6 Marr's Computational Theory of Vision (pg. 24)
1.3.7 Computer Vision (pg. 26)
1.3.8 Learning-Based Vision (pg. 27)
1.4 What's Next? (pg. 31)
1.5 Concluding Remarks (pg. 32)
I Foundations (pg. 33)
2 A Simple Vision System (pg. 35)
2.1 Introduction (pg. 35)
2.2 A Simple World: The Blocks World (pg. 35)
2.3 A Simple Image Formation Model (pg. 36)
2.4 A Simple Goal (pg. 38)
2.5 From Images to Edges and Useful Features (pg. 38)
2.6 From Edges to Surfaces (pg. 42)
2.7 Generalization (pg. 49)
2.8 Concluding Remarks (pg. 51)
3 Looking at Images (pg. 53)
3.1 Introduction (pg. 53)
3.2 Looking at Individual Pixels (pg. 53)
3.3 The More You Look, the More You See (pg. 55)
3.4 The Eye of the Artist (pg. 57)
3.5 Tree Shadows and Image Formation (pg. 58)
3.6 Horizontal or Vertical (pg. 59)
3.7 Motion Blur (pg. 60)
3.8 Accidents Happen (pg. 62)
3.9 Cues for Support (pg. 63)
3.10 Looking at Raindrops (pg. 64)
3.11 Plato's Cave (pg. 65)
3.12 How Do You Know Something Is Wet? (pg. 65)
3.13 Concluding Remarks (pg. 66)
4 Computer Vision and Society (pg. 67)
4.1 Introduction (pg. 67)
4.2 Fairness (pg. 67)
4.3 Ethics (pg. 72)
4.4 Concluding Remarks (pg. 73)
II Image Formation (pg. 75)
5 Imaging (pg. 77)
5.1 Introduction (pg. 77)
5.2 Light Interacting with Surfaces (pg. 77)
5.3 The Pinhole Camera and Image Formation (pg. 79)
5.4 Concluding Remarks (pg. 88)
6 Lenses (pg. 89)
6.1 Introduction (pg. 89)
6.2 Lensmaker's Formula (pg. 91)
6.3 Imaging with Lenses (pg. 97)
6.4 Concluding Remarks (pg. 106)
7 Cameras as Linear Systems (pg. 107)
7.1 Introduction (pg. 107)
7.2 Flatland (pg. 107)
7.3 Cameras as Linear Systems (pg. 108)
7.4 More General Imagers (pg. 110)
7.5 Concluding Remarks (pg. 116)
8 Color (pg. 117)
8.1 Introduction (pg. 117)
8.2 Color Physics (pg. 117)
8.3 Color Perception (pg. 125)
8.4 Spatial Resolution and Color (pg. 131)
8.5 Concluding Remarks (pg. 134)
III Foundations of Learning (pg. 135)
9 Introduction to Learning (pg. 137)
9.1 Introduction (pg. 137)
9.2 Learning from Examples (pg. 138)
9.3 Learning without Examples (pg. 140)
9.4 Key Ingredients (pg. 140)
9.5 Empirical Risk Minimization: A Formalization of Learning from Examples (pg. 141)
9.6 Learning as Probabilistic Inference (pg. 142)
9.7 Case Studies (pg. 142)
9.8 Learning to Learn (pg. 149)
9.9 Concluding Remarks (pg. 150)
10 Gradient-Based Learning Algorithms (pg. 151)
10.1 Introduction (pg. 151)
10.2 Technical Setting (pg. 151)
10.3 Basic Gradient Descent (pg. 152)
10.4 Learning Rate Schedules (pg. 153)
10.5 Momentum (pg. 153)
10.6 What Kinds of Functions Can Be Minimized with Gradient Descent? (pg. 155)
10.7 Stochastic Gradient Descent (pg. 159)
10.8 Concluding Remarks (pg. 160)
11 The Problem of Generalization (pg. 161)
11.1 Introduction (pg. 161)
11.2 Underfitting and Overfitting (pg. 161)
11.3 Regularization (pg. 165)
11.4 Rethinking Generalization (pg. 167)
11.5 Three Tools in the Search for Truth: Data, Priors, and Hypotheses (pg. 167)
11.6 Concluding Remarks (pg. 173)
12 Neural Networks (pg. 175)
12.1 Introduction (pg. 175)
12.2 The Perceptron: A Simple Model of a Single Neuron (pg. 175)
12.3 Multilayer Perceptrons (pg. 177)
12.4 Activations Versus Parameters (pg. 179)
12.5 Deep Nets (pg. 180)
12.6 Deep Learning: Learning with Neural Nets (pg. 184)
12.7 Catalog of Layers (pg. 186)
12.8 Why Are Neural Networks a Good Architecture? (pg. 189)
12.9 Concluding Remarks (pg. 190)
13 Neural Networks as Distribution Transformers (pg. 191)
13.1 Introduction (pg. 191)
13.2 A Different Way of Plotting Functions (pg. 191)
13.3 How Deep Nets Remap a Data Distribution (pg. 193)
13.4 Binary Classifier Example (pg. 194)
13.5 How High-Dimensional Datapoints Get Remapped by Deep Nets (pg. 196)
13.6 Concluding Remarks (pg. 198)
14 Backpropagation (pg. 199)
14.1 Introduction (pg. 199)
14.2 The Trick of Backpropagation: Reuse of Computation (pg. 200)
14.3 Backward for a Generic Layer (pg. 201)
14.4 The Full Algorithm: Forward, Then Backward (pg. 203)
14.5 Backpropagation Over Data Batches (pg. 204)
14.6 Example: Backpropagation for an MLP (pg. 205)
14.7 Backpropagation through DAGs: Branch and Merge (pg. 212)
14.8 Parameter Sharing (pg. 214)
14.9 Backpropagation to the Data (pg. 214)
14.10 Concluding Remarks (pg. 216)
IV Foundations of Image Processing (pg. 217)
15 Linear Image Filtering (pg. 219)
15.1 Introduction (pg. 219)
15.2 Signals and Images (pg. 219)
15.3 Systems (pg. 223)
15.4 Convolution (pg. 227)
15.5 Cross-Correlation Versus Convolution (pg. 235)
15.6 System Identification (pg. 238)
15.7 Concluding Remarks (pg. 239)
16 Fourier Analysis (pg. 241)
16.1 Introduction (pg. 241)
16.2 Image Transforms (pg. 241)
16.3 Fourier Series (pg. 241)
16.4 Continuous and Discrete Waves (pg. 243)
16.5 The Discrete Fourier Transform (pg. 247)
16.6 Useful Transforms (pg. 251)
16.7 Discrete Fourier Transform Properties (pg. 255)
16.8 A Family of Fourier Transforms (pg. 261)
16.9 Fourier Analysis as an Image Representation (pg. 262)
16.10 Fourier Analysis of Linear Filters (pg. 267)
16.11 Concluding Remarks (pg. 272)
V Linear Filters (pg. 273)
17 Blur Filters (pg. 275)
17.1 Introduction (pg. 275)
17.2 Box Filter (pg. 276)
17.3 Gaussian Filter (pg. 279)
17.4 Binomial Filters (pg. 283)
17.5 Concluding Remarks (pg. 286)
18 Image Derivatives (pg. 287)
18.1 Introduction (pg. 287)
18.2 Discretizing Image Derivatives (pg. 287)
18.3 Gradient-Based Image Representation (pg. 291)
18.4 Image Editing in the Gradient Domain (pg. 292)
18.5 Gaussian Derivatives (pg. 293)
18.6 High-Order Gaussian Derivatives (pg. 295)
18.7 Derivatives of Binomial Filters (pg. 299)
18.8 Image Gradient and Directional Derivatives (pg. 301)
18.9 Image Laplacian (pg. 302)
18.10 A Simple Model of the Early Visual System (pg. 305)
18.11 Sharpening Filter (pg. 307)
18.12 Retinex (pg. 309)
18.13 Concluding Remarks (pg. 313)
19 Temporal Filters (pg. 315)
19.1 Introduction (pg. 315)
19.2 Modeling Sequences (pg. 315)
19.3 Modeling Sequences in the Fourier Domain (pg. 317)
19.4 Temporal Filters (pg. 318)
19.5 Concluding Remarks (pg. 324)
VI Sampling and Multiscale Image Representations (pg. 325)
20 Image Sampling and Aliasing (pg. 327)
20.1 Introduction (pg. 327)
20.2 Aliasing (pg. 327)
20.3 Sampling Theorem (pg. 329)
20.4 Reconstruction (pg. 334)
20.5 Ideal Reconstruction (pg. 334)
20.6 A Family of 2D Spatial Samplings (pg. 338)
20.7 Anti-Aliasing Filter (pg. 340)
20.8 Spatiotemporal Sampling (pg. 342)
20.9 Concluding Remarks (pg. 342)
21 Downsampling and Upsampling Images (pg. 345)
21.1 Introduction (pg. 345)
21.2 Example: Aliasing-Based Adversarial Attack (pg. 345)
21.3 Downsampling (pg. 346)
21.4 Upsampling (pg. 358)
21.5 Concluding Remarks (pg. 363)
22 Filter Banks (pg. 365)
22.1 Introduction (pg. 365)
22.2 Gabor Filters (pg. 365)
22.3 Steerable Filters and Orientation Analysis (pg. 374)
22.4 Motion Analysis (pg. 380)
22.5 Concluding Remarks (pg. 383)
23 Image Pyramids (pg. 385)
23.1 Introduction (pg. 385)
23.2 Image Pyramids and Multiscale Image Analysis (pg. 386)
23.3 Linear Image Transforms (pg. 387)
23.4 Gaussian Pyramid (pg. 388)
23.5 Laplacian Pyramid (pg. 390)
23.6 Steerable Pyramid (pg. 395)
23.7 A Pictorial Summary (pg. 397)
23.8 Concluding Remarks (pg. 399)
VII Neural Architectures for Vision (pg. 401)
24 Convolutional Neural Nets (pg. 403)
24.1 Introduction (pg. 403)
24.2 Convolutional Layers (pg. 404)
24.3 Nonlinear Filtering Layers (pg. 414)
24.4 A Simple CNN Classifier (pg. 415)
24.5 A Worked Example (pg. 417)
24.6 Feature Maps in CNNs (pg. 420)
24.7 Receptive Fields (pg. 423)
24.8 Spatial Outputs (pg. 424)
24.9 CNN as a Sliding Filter (pg. 425)
24.10 Why Process Images Patch by Patch? (pg. 426)
24.11 Popular CNN Architectures (pg. 427)
24.12 Concluding Remarks (pg. 430)
25 Recurrent Neural Nets (pg. 431)
25.1 Introduction (pg. 431)
25.2 Recurrent Layer (pg. 433)
25.3 Backpropagation through Time (pg. 433)
25.4 Stacking Recurrent Layers (pg. 435)
25.5 Long Short-Term Memory (pg. 436)
25.6 Concluding Remarks (pg. 437)
26 Transformers (pg. 439)
26.1 Introduction (pg. 439)
26.2 A Limitation of CNNs: Independence between Far Apart Patches (pg. 439)
26.3 The Idea of Attention (pg. 440)
26.4 A New Data Type: Tokens (pg. 440)
26.5 Token Nets (pg. 444)
26.6 The Attention Layer (pg. 445)
26.7 The Full Transformer Architecture (pg. 453)
26.8 Permutation Equivariance (pg. 455)
26.9 CNNs in Disguise (pg. 456)
26.10 Masked Attention (pg. 458)
26.11 Positional Encodings (pg. 460)
26.12 Comparing Fully Connected, Convolutional, and Self-Attention Layers (pg. 462)
26.13 Concluding Remarks (pg. 463)
VIII Probabilistic Models of Images (pg. 465)
27 Statistical Image Models (pg. 467)
27.1 Introduction (pg. 467)
27.2 How Do We Tell Noise from Texture? (pg. 469)
27.3 Independent Pixels (pg. 470)
27.4 Dead Leaves Model (pg. 474)
27.5 The Gaussian Model (pg. 477)
27.6 The Wavelet Marginal Model (pg. 482)
27.7 Nonparametric Markov Random Field Image Models (pg. 489)
27.8 Concluding Remarks (pg. 490)
28 Textures (pg. 493)
28.1 Introduction (pg. 493)
28.2 A Few Notes about Human Perception (pg. 494)
28.3 Heeger-Bergen Texture Analysis and Synthesis (pg. 496)
28.4 Efros-Leung Texture Analysis and Synthesis Model (pg. 501)
28.5 Connection to Deep Generative Models (pg. 503)
28.6 Concluding Remarks (pg. 504)
29 Probabilistic Graphical Models (pg. 505)
29.1 Introduction (pg. 505)
29.2 Simple Examples (pg. 505)
29.3 Directed Graphical Models (pg. 509)
29.4 Inference in Graphical Models (pg. 510)
29.5 Simple Example of Inference in a Graphical Model (pg. 511)
29.6 Belief Propagation (pg. 512)
29.7 Loopy Belief Propagation (pg. 520)
29.8 Relationship of Probabilistic Graphical Models to Neural Networks (pg. 523)
29.9 Concluding Remarks (pg. 523)
IX Generative Image Models and Representation Learning (pg. 525)
30 Representation Learning (pg. 527)
30.1 Introduction (pg. 527)
30.2 Problem Setting (pg. 527)
30.3 What Makes for a Good Representation? (pg. 528)
30.4 Autoencoders (pg. 530)
30.5 Predictive Encodings (pg. 533)
30.6 Self-Supervised Learning (pg. 535)
30.7 Imputation (pg. 536)
30.8 Abstract Pretext Tasks (pg. 537)
30.9 Clustering (pg. 537)
30.10 Contrastive Learning (pg. 542)
30.11 Concluding Remarks (pg. 547)
31 Perceptual Grouping (pg. 549)
31.1 Introduction (pg. 549)
31.2 Why Group? (pg. 550)
31.3 Segments (pg. 551)
31.4 Edges, Boundaries, and Contours (pg. 555)
31.5 Layers (pg. 556)
31.6 Emergent Groups (pg. 556)
31.7 Concluding Remarks (pg. 557)
32 Generative Models (pg. 559)
32.1 Introduction (pg. 559)
32.2 Unconditional Generative Models (pg. 561)
32.3 Learning Generative Models (pg. 563)
32.4 Density Models (pg. 565)
32.5 Energy-Based Models (pg. 567)
32.6 Gaussian Density Models (pg. 570)
32.7 Autoregressive Density Models (pg. 572)
32.8 Diffusion Models (pg. 576)
32.9 Generative Adversarial Networks (pg. 579)
32.10 Concluding Remarks (pg. 581)
33 Generative Modeling Meets Representation Learning (pg. 583)
33.1 Introduction (pg. 583)
33.2 Latent Variables as Representations (pg. 584)
33.3 Technical Setting (pg. 585)
33.4 Variational Autoencoders (pg. 586)
33.5 Do VAEs Learn Good Representations? (pg. 598)
33.6 Generative Adversarial Networks Are Representation Learners Too (pg. 600)
33.7 Concluding Remarks (pg. 601)
34 Conditional Generative Models (pg. 603)
34.1 Introduction (pg. 603)
34.2 A Motivating Example: Image Colorization (pg. 603)
34.3 Conditional Generative Models Solve Multimodal Structured Prediction (pg. 608)
34.4 A Tour of Popular Conditional Models (pg. 609)
34.5 Structured Prediction in Vision (pg. 613)
34.6 Image-to-Image Translation (pg. 614)
34.7 Concluding Remarks (pg. 620)
X Challenges in Learning-Based Vision (pg. 621)
35 Data Bias and Shift (pg. 623)
35.1 Introduction (pg. 623)
35.2 Out-of-Distribution Generalization (pg. 625)
35.3 A Toy Example (pg. 627)
35.4 Dataset Bias (pg. 630)
35.5 Sources of Bias (pg. 632)
35.6 Adversarial Shifts (pg. 636)
35.7 Concluding Remarks (pg. 637)
36 Training for Robustness and Generality (pg. 639)
36.1 Introduction (pg. 639)
36.2 Data Augmentation (pg. 639)
36.3 Adversarial Training (pg. 643)
36.4 Toward General-Purpose Vision Models (pg. 643)
36.5 Concluding Remarks (pg. 644)
37 Transfer Learning and Adaptation (pg. 645)
37.1 Introduction (pg. 645)
37.2 Problem Setting (pg. 645)
37.3 Finetuning (pg. 646)
37.4 Learning from a Teacher (pg. 649)
37.5 Prompting (pg. 651)
37.6 Domain Adaptation (pg. 653)
37.7 Generative Data (pg. 654)
37.8 Other Kinds of Knowledge that Can Be Transferred (pg. 655)
37.9 A Combinatorial Catalog of Transfer Learning Methods (pg. 655)
37.10 Sequence Models through the Lens of Adaptation (pg. 656)
37.11 Concluding Remarks (pg. 656)
XI Understanding Geometry (pg. 657)
38 Representing Images and Geometry (pg. 659)
38.1 Introduction (pg. 659)
38.2 Homogeneous and Heterogeneous Coordinates (pg. 660)
38.3 2D Image Transformations (pg. 661)
38.4 Lines and Planes in Homogeneous Coordinates (pg. 668)
38.5 Image Warping (pg. 670)
38.6 Implicit Image Representations (pg. 671)
38.7 Concluding Remarks (pg. 673)
39 Camera Modeling and Calibration (pg. 675)
39.1 Introduction (pg. 675)
39.2 3D Camera Projections in Homogeneous Coordinates (pg. 676)
39.3 Camera-Intrinsic Parameters (pg. 678)
39.4 Camera-Extrinsic Parameters (pg. 683)
39.5 Full Camera Model (pg. 685)
39.6 A Few Concrete Examples (pg. 686)
39.7 Camera Calibration (pg. 692)
39.8 Concluding Remarks (pg. 699)
40 Stereo Vision (pg. 701)
40.1 Introduction (pg. 701)
40.2 Stereo Cues (pg. 702)
40.3 Model-Based Methods (pg. 706)
40.4 Learning-Based Methods (pg. 717)
40.5 Evaluation (pg. 719)
40.6 Concluding Remarks (pg. 719)
41 Homographies (pg. 721)
41.1 Introduction (pg. 721)
41.2 Homography (pg. 722)
41.3 Creating Image Panoramas (pg. 727)
41.4 Concluding Remarks (pg. 730)
42 Single View Metrology (pg. 731)
42.1 Introduction (pg. 731)
42.2 A Few Notes about Perception of Depth by Humans (pg. 732)
42.3 Linear Perspective (pg. 735)
42.4 Measuring Heights Using Parallel Lines (pg. 741)
42.5 3D Metrology from a Single View (pg. 749)
42.6 Camera Calibration from Vanishing Points (pg. 753)
42.7 Concluding Remarks (pg. 755)
43 Learning to Estimate Depth from a Single Image (pg. 757)
43.1 Introduction (pg. 757)
43.2 Monocular Depth Cues (pg. 757)
43.3 3D Representations (pg. 758)
43.4 Supervised Methods for Depth from a Single Image (pg. 761)
43.5 Unsupervised Methods for Depth from a Single Image (pg. 764)
43.6 Concluding Remarks (pg. 767)
44 Multiview Geometry and Structure from Motion (pg. 769)
44.1 Introduction (pg. 769)
44.2 Structure from Motion (pg. 769)
44.3 Sparse SfM (pg. 771)
44.4 Concluding Remarks (pg. 780)
45 Radiance Fields (pg. 783)
45.1 Introduction (pg. 783)
45.2 What Is a Radiance Field? (pg. 784)
45.3 Representing Radiance Fields with Parameterized Functions (pg. 787)
45.4 Rendering Radiance Fields (pg. 789)
45.5 Fitting a Radiance Field to Explain a Scene (pg. 793)
45.6 Beyond Radiance Fields: The Rendering Equation (pg. 797)
45.7 Concluding Remarks (pg. 798)
XII Understanding Motion (pg. 799)
46 Motion Estimation (pg. 801)
46.1 Introduction (pg. 801)
46.2 Motion Perception in the Human Visual System (pg. 802)
46.3 Matching-Based Motion Estimation (pg. 804)
46.4 Does the Human Visual System Use Matching to Estimate Motion? (pg. 808)
46.5 Concluding Remarks (pg. 811)
47 3D Motion and Its 2D Projection (pg. 813)
47.1 Introduction (pg. 813)
47.2 3D Motion and Its 2D Projection (pg. 813)
47.3 Concluding Remarks (pg. 822)
48 Optical Flow Estimation (pg. 823)
48.1 Introduction (pg. 823)
48.2 2D Motion Field and Optical Flow (pg. 823)
48.3 Model-Based Approaches (pg. 826)
48.4 Concluding Remarks (pg. 834)
49 Learning to Estimate Motion (pg. 835)
49.1 Introduction (pg. 835)
49.2 Learning-Based Approaches (pg. 835)
49.3 Concluding Remarks (pg. 839)
XIII Understanding Vision with Language (pg. 841)
50 Object Recognition (pg. 843)
50.1 Introduction (pg. 843)
50.2 A Few Notes About Object Recognition in Humans (pg. 844)
50.3 Image Classification (pg. 847)
50.4 Object Localization (pg. 854)
50.5 Class Segmentation (pg. 863)
50.6 Instance Segmentation (pg. 865)
50.7 Concluding Remarks (pg. 867)
51 Vision and Language (pg. 869)
51.1 Introduction (pg. 869)
51.2 Background: Representing Text as Tokens (pg. 869)
51.3 Learning Visual Representations from Language Supervision (pg. 871)
51.4 Translating between Images and Text (pg. 877)
51.5 Text as a Visual Representation (pg. 882)
51.6 Visual Question Answering (pg. 883)
51.7 Concluding Remarks (pg. 884)
XIV On Research, Writing and Speaking (pg. 885)
52 How to Do Research (pg. 887)
52.1 Introduction (pg. 887)
52.2 Research Advice (pg. 887)
52.3 Concluding Remarks (pg. 891)
53 How to Write Papers (pg. 893)
53.1 Introduction (pg. 893)
53.2 Organization (pg. 894)
53.3 General Writing Tips (pg. 896)
53.4 Concluding Remarks (pg. 901)
54 How to Give Talks (pg. 903)
54.1 Introduction (pg. 903)
54.2 Very Short Talks (2–10 minutes) (pg. 903)
54.3 Preparation (pg. 904)
54.4 Nervousness (pg. 905)
54.5 Your Distracted Audience (pg. 905)
54.6 Ways to Engage the Audience (pg. 905)
54.7 Show Yourself to the Audience (pg. 906)
54.8 Concluding Remarks (pg. 907)
XV Closing Remarks (pg. 909)
55 A Simple Vision System—Revisited (pg. 911)
55.1 Introduction (pg. 911)
55.2 A Simple Neural Network (pg. 911)
55.3 From 2D Images to 3D (pg. 914)
55.4 Large Language Model-Based Scene Understanding (pg. 916)
55.5 Unsolved Solved Computer Vision Problems (pg. 918)
55.6 Concluding Remarks (pg. 918)
Bibliography (pg. 921)
Index (pg. 943)

Antonio Torralba

Antonio Torralba is Professor and Head of the AI+D faculty in the Department of Electrical Engineering and Computer Science at MIT, where he is a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Phillip Isola

Phillip Isola is Associate Professor of Electrical Engineering and Computer Science at MIT, where he is a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

William T. Freeman

William T. Freeman is Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science at MIT, where he is a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He is also a research manager at Google Research in Cambridge, Massachusetts.
