Human-Centered Data Science

An Introduction

by Aragon, Guha, Kogan, Muller, Neff

ISBN: 9780262367585 | Copyright 2022

Best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of large datasets.

Human-centered data science is a new interdisciplinary field that draws from human-computer interaction, social science, statistics, and computational techniques. This book, written by founders of the field, introduces best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of very large datasets. It offers a brief and accessible overview of many common statistical and algorithmic data science techniques, explains human-centered approaches to data science problems, and presents practical guidelines and real-world case studies to help readers apply these methods.

The authors explain how data scientists' choices are involved at every stage of the data science workflow—and show how a human-centered approach can enhance each one, by making the process more transparent, asking questions, and considering the social context of the data. They describe how tools from social science might be incorporated into data science practices, discuss different types of collaboration, and consider data storytelling through visualization. The book shows that data science practitioners can build rigorous and ethical algorithms and design projects that use cutting-edge computational tools and address social concerns.

Contents (pg. vii)
Acknowledgments (pg. ix)
1. Data Science to Human-Centered Data Science (pg. 1)
Emergence of Human-Centeredness in Data Science (pg. 3)
Doing Human-Centered Data Science (pg. 4)
About This Book: Themes (pg. 7)
About This Book: Stories, Audience, and Our Purpose (pg. 9)
Book Outline (pg. 10)
Who We Are (pg. 12)
2. The Data Science Cycle (pg. 13)
Elements of the Data Science Cycle (pg. 14)
Feedback and Iteration (pg. 27)
Models and Pipelines (pg. 28)
Conclusion (pg. 28)
Recommended Reading (pg. 29)
3. Interrogating Data Science (pg. 31)
Measurement Plans (pg. 32)
Discovery and Capture (pg. 35)
Curation of Data (pg. 40)
Design of Data (pg. 40)
Creation of Data (pg. 45)
Privacy and Reidentification (pg. 46)
Conclusion (pg. 49)
Recommended Reading (pg. 50)
4. Techniques and Tools for Data Science Models (pg. 51)
Machine Learning Models (pg. 51)
Statistical Models (pg. 60)
Overfitting and Underfitting (pg. 66)
Bias Detection and Mitigation Tools (pg. 66)
Human Decision-Making in Data Science Models (pg. 69)
Visualization (pg. 69)
Visual Analytics (pg. 72)
Visualization Tools (pg. 72)
Conclusion (pg. 72)
Recommended Reading (pg. 73)
5. Human-Centered Approaches to Data Science Problems (pg. 75)
Asking Good Questions (pg. 75)
Ethics (pg. 77)
An Example of Ethics Practices and Principles to Develop and Extend (pg. 80)
Fairness (pg. 84)
Designing Projects for Others to Build On (pg. 85)
Thinking about Your Process and Practice (pg. 86)
Conclusion (pg. 91)
Recommended Reading (pg. 91)
6. Human-Centered Data Science Methods (pg. 93)
Social Science Methods for Rethinking Data Science (pg. 93)
Data Science and Context: Why This Matters (pg. 94)
Thick Data and Its Importance for Data Science (pg. 96)
Quantitative Social Science Methods (pg. 97)
Computational Methods in the Social Sciences (pg. 98)
Qualitative Social Science Methods (pg. 99)
Transforming Methods: Design and Critical Approaches to Data Science (pg. 100)
Mixed Methods (pg. 105)
Combining Methods and Extending Data Science (pg. 105)
Collecting Context-Rich Data (pg. 105)
Incorporating Context into Data Science Methods (pg. 107)
Mixing Methods to Incorporate Context into Data Science Methods (pg. 108)
Visualization and Reflection (pg. 110)
Data Science Ethnography (pg. 110)
Iterating among Levels of Analysis (pg. 111)
Conclusion (pg. 112)
Recommended Reading (pg. 113)
7. Collaborations across and beyond Data Science (pg. 115)
Working in Teams (pg. 115)
Working with Other Disciplines (pg. 116)
Pi-shaped People (pg. 117)
Disciplinary Bias (pg. 117)
Working with Data Science Teams (pg. 118)
Understanding Patterns of Human-AI Collaborations (pg. 121)
Working in Organizations (pg. 123)
Working with Communities (pg. 125)
Working with People’s Data (pg. 126)
Conclusion (pg. 127)
Recommended Reading (pg. 128)
8. Storytelling with Data (pg. 129)
Why Stories Matter to Data Science (pg. 130)
The Importance of Stories in General (pg. 132)
Visualizations as Storytelling (pg. 133)
How Stories Work (pg. 134)
Storytelling as Communication (pg. 136)
How to Talk about Data Science (pg. 137)
How to Talk about Human-Centered Data Science (pg. 137)
Visualizing Is Part of the Story (pg. 137)
Risk Management for Business Leaders (pg. 138)
What Drives Policymakers (pg. 140)
Working with Experts (pg. 141)
Working with Communities (pg. 141)
Working with Journalists and the Media (pg. 143)
Storytelling Strategies for Data Science (pg. 144)
Conclusion (pg. 145)
Recommended Reading (pg. 145)
9. The Future of Human-Centered Data Science (pg. 147)
Human-Centered Data Science as Ethical Responsibility (pg. 147)
Human-Centered Data Science as Looking in the Right Places (pg. 148)
Human-Centered Data Science as a Collective Practice (pg. 149)
Human-Centered Data Science as Communication (pg. 150)
Human-Centered Data Science as Action (pg. 150)
Looking Forward in Human-Centered Data Science (pg. 151)
Glossary (pg. 153)
References (pg. 163)
Index (pg. 179)

Cecilia Aragon

Cecilia Aragon is Professor in the Department of Human Centered Design and Engineering at the University of Washington.

Shion Guha

Shion Guha is Assistant Professor in the Faculty of Information at the University of Toronto.

Marina Kogan

Marina Kogan is Assistant Professor in the School of Computing at the University of Utah.

Michael Muller

Michael Muller is a research staff member at IBM Research.

Gina Neff

Gina Neff is Director of the Minderoo Centre for Technology and Democracy at the University of Cambridge and Professor of Technology and Society at the Oxford Internet Institute and the Department of Sociology at the University of Oxford. She is the author of Venture Labor: Work and the Burden of Risk in Innovative Industries and coauthor of Self-Tracking and Human-Centered Data Science (both published by the MIT Press).
