Introduction to Natural Language Processing

by Jacob Eisenstein

ISBN: 9780262042840 | Copyright 2019

A survey of computational methods for understanding, generating, and manipulating human language, offering a synthesis of classical representations and algorithms with contemporary machine learning techniques.

This textbook provides a technical perspective on natural language processing—methods for building computer software that understands, generates, and manipulates human language. It emphasizes contemporary data-driven approaches, focusing on techniques from supervised and unsupervised machine learning. The first section establishes a foundation in machine learning by building a set of tools that will be used throughout the book and applying them to word-based textual analysis. The second section introduces structured representations of language, including sequences, trees, and graphs. The third section explores different approaches to the representation and analysis of linguistic meaning, ranging from formal logic to neural word embeddings. The final section offers chapter-length treatments of three transformative applications of natural language processing: information extraction, machine translation, and text generation. End-of-chapter exercises include both paper-and-pencil analysis and software implementation.
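
As a taste of the word-based textual analysis covered in the first section, here is a minimal sketch (not from the book) of a bag-of-words Naïve Bayes classifier with add-one smoothing, written in plain Python; the toy documents, labels, and function names are invented for illustration.

    # Illustrative sketch only: a bag-of-words Naive Bayes classifier
    # with add-one (Laplace) smoothing. Toy data and names are invented.
    from collections import Counter, defaultdict
    import math

    def train(docs):
        # docs: list of (token_list, label) pairs
        label_counts = Counter()
        word_counts = defaultdict(Counter)
        vocab = set()
        for tokens, label in docs:
            label_counts[label] += 1
            word_counts[label].update(tokens)
            vocab.update(tokens)
        return label_counts, word_counts, vocab

    def predict(tokens, label_counts, word_counts, vocab):
        total = sum(label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in label_counts:
            # log prior plus smoothed log likelihood of each token
            score = math.log(label_counts[label] / total)
            denom = sum(word_counts[label].values()) + len(vocab)
            for w in tokens:
                score += math.log((word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    docs = [("good great fun".split(), "pos"),
            ("bad awful boring".split(), "neg")]
    model = train(docs)
    print(predict("great fun".split(), *model))  # expected: "pos"

Chapter 2 develops the probabilistic reasoning behind this kind of model and contrasts it with discriminative alternatives such as logistic regression.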

The text synthesizes and distills a broad and diverse research literature, linking contemporary machine learning techniques with the field's linguistic and computational foundations. It is suitable for use in advanced undergraduate and graduate-level courses and as a reference for software engineers and data scientists. Readers should have a background in computer programming and college-level mathematics. After mastering the material presented, students will have the technical skill to build and analyze novel natural language processing systems and to understand the latest research in the field.

Contents (pg. v)
Preface (pg. ix)
Notation (pg. xiii)
1: Introduction (pg. 1)
1.1 Natural Language Processing and Its Neighbors (pg. 1)
1.2 Three Themes in Natural Language Processing (pg. 5)
I. Learning (pg. 11)
2: Linear Text Classification (pg. 13)
2.1 The Bag of Words (pg. 13)
2.2 Naïve Bayes (pg. 17)
2.3 Discriminative Learning (pg. 24)
2.4 Loss Functions and Large-Margin Classification (pg. 28)
2.5 Logistic Regression (pg. 34)
2.6 Optimization (pg. 37)
2.7 *Additional Topics in Classification (pg. 40)
2.8 Summary of Learning Algorithms (pg. 42)
3: Nonlinear Classification (pg. 47)
3.1 Feedforward Neural Networks (pg. 48)
3.2 Designing Neural Networks (pg. 50)
3.3 Learning Neural Networks (pg. 53)
3.4 Convolutional Neural Networks (pg. 61)
4: Linguistic Applications of Classification (pg. 67)
4.1 Sentiment and Opinion Analysis (pg. 67)
4.2 Word Sense Disambiguation (pg. 71)
4.3 Design Decisions for Text Classification (pg. 74)
4.4 Evaluating Classifiers (pg. 78)
4.5 Building Datasets (pg. 85)
5: Learning without Supervision (pg. 91)
5.1 Unsupervised Learning (pg. 91)
5.2 Applications of Expectation-Maximization (pg. 99)
5.3 Semi-Supervised Learning (pg. 102)
5.4 Domain Adaptation (pg. 105)
5.5 *Other Approaches to Learning with Latent Variables (pg. 109)
II. Sequences and Trees (pg. 117)
6: Language Models (pg. 119)
6.1 N-Gram Language Models (pg. 120)
6.2 Smoothing and Discounting (pg. 122)
6.3 Recurrent Neural Network Language Models (pg. 127)
6.4 Evaluating Language Models (pg. 132)
6.5 Out-of-Vocabulary Words (pg. 134)
7: Sequence Labeling (pg. 137)
7.1 Sequence Labeling as Classification (pg. 137)
7.2 Sequence Labeling as Structure Prediction (pg. 139)
7.3 The Viterbi Algorithm (pg. 140)
7.4 Hidden Markov Models (pg. 145)
7.5 Discriminative Sequence Labeling with Features (pg. 149)
7.6 Neural Sequence Labeling (pg. 158)
7.7 *Unsupervised Sequence Labeling (pg. 161)
8: Applications of Sequence Labeling (pg. 167)
8.1 Part-of-Speech Tagging (pg. 167)
8.2 Morphosyntactic Attributes (pg. 173)
8.3 Named Entity Recognition (pg. 175)
8.4 Tokenization (pg. 176)
8.5 Code Switching (pg. 177)
8.6 Dialogue Acts (pg. 178)
9: Formal Language Theory (pg. 183)
9.1 Regular Languages (pg. 184)
9.2 Context-Free Languages (pg. 198)
9.3 *Mildly Context-Sensitive Languages (pg. 209)
10: Context-Free Parsing (pg. 215)
10.1 Deterministic Bottom-Up Parsing (pg. 216)
10.2 Ambiguity (pg. 219)
10.3 Weighted Context-Free Grammars (pg. 222)
10.4 Learning Weighted Context-Free Grammars (pg. 227)
10.5 Grammar Refinement (pg. 231)
10.6 Beyond Context-Free Parsing (pg. 238)
11: Dependency Parsing (pg. 243)
11.1 Dependency Grammar (pg. 243)
11.2 Graph-Based Dependency Parsing (pg. 248)
11.3 Transition-Based Dependency Parsing (pg. 253)
11.4 Applications (pg. 261)
III. Meaning (pg. 267)
12: Logical Semantics (pg. 269)
12.1 Meaning and Denotation (pg. 270)
12.2 Logical Representations of Meaning (pg. 270)
12.3 Semantic Parsing and the Lambda Calculus (pg. 274)
12.4 Learning Semantic Parsers (pg. 280)
13: Predicate-Argument Semantics (pg. 289)
13.1 Semantic Roles (pg. 291)
13.2 Semantic Role Labeling (pg. 295)
13.3 Abstract Meaning Representation (pg. 302)
14: Distributional and Distributed Semantics (pg. 309)
14.1 The Distributional Hypothesis (pg. 309)
14.2 Design Decisions for Word Representations (pg. 311)
14.3 Latent Semantic Analysis (pg. 313)
14.4 Brown Clusters (pg. 315)
14.5 Neural Word Embeddings (pg. 317)
14.6 Evaluating Word Embeddings (pg. 322)
14.7 Distributed Representations beyond Distributional Statistics (pg. 324)
14.8 Distributed Representations of Multiword Units (pg. 327)
15: Reference Resolution (pg. 333)
15.1 Forms of Referring Expressions (pg. 334)
15.2 Algorithms for Coreference Resolution (pg. 339)
15.3 Representations for Coreference Resolution (pg. 348)
15.4 Evaluating Coreference Resolution (pg. 353)
16: Discourse (pg. 357)
16.1 Segments (pg. 357)
16.2 Entities and Reference (pg. 359)
16.3 Relations (pg. 362)
IV. Applications (pg. 377)
17: Information Extraction (pg. 379)
17.1 Entities (pg. 381)
17.2 Relations (pg. 387)
17.3 Events (pg. 395)
17.4 Hedges, Denials, and Hypotheticals (pg. 397)
17.5 Question Answering and Machine Reading (pg. 399)
18: Machine Translation (pg. 405)
18.1 Machine Translation as a Task (pg. 405)
18.2 Statistical Machine Translation (pg. 410)
18.3 Neural Machine Translation (pg. 415)
18.4 Decoding (pg. 423)
18.5 Training toward the Evaluation Metric (pg. 424)
19: Text Generation (pg. 431)
19.1 Data-to-Text Generation (pg. 431)
19.2 Text-to-Text Generation (pg. 437)
19.3 Dialogue (pg. 440)
Appendix A: Probability (pg. 447)
A.1 Probabilities of Event Combinations (pg. 447)
A.2 Conditional Probability and Bayes' Rule (pg. 449)
A.3 Independence (pg. 451)
A.4 Random Variables (pg. 451)
A.5 Expectations (pg. 452)
A.6 Modeling and Estimation (pg. 453)
Appendix B: Numerical Optimization (pg. 455)
B.1 Gradient Descent (pg. 456)
B.2 Constrained Optimization (pg. 456)
B.3 Example: Passive-Aggressive Online Learning (pg. 457)
Bibliography (pg. 459)
Index (pg. 509)
Jacob Eisenstein

Jacob Eisenstein is an Associate Professor in the School of Interactive Computing at the Georgia Institute of Technology.
