Cloud Computing for Science and Engineering

by Foster, Gannon

ISBN: 9780262364737 | Copyright 2017

The emergence of powerful, always-on cloud utilities has transformed how consumers interact with information technology, enabling video streaming, intelligent personal assistants, and the sharing of content. Businesses, too, have benefited from the cloud, outsourcing much of their information technology to cloud services. Science, however, has not fully exploited the advantages of the cloud. Could scientific discovery be accelerated if mundane chores were automated and outsourced to the cloud? Leading computer scientists Ian Foster and Dennis Gannon argue that it could, and in this book they offer a guide to cloud computing for students, scientists, and engineers, with advice and many hands-on examples.

The book surveys the technology that underpins the cloud, new approaches to technical problems enabled by the cloud, and the concepts required to integrate cloud services into scientific work. It covers managing data in the cloud and programming cloud storage services; computing in the cloud, ranging from deploying single virtual machines or containers, to supporting basic interactive science experiments, to gathering clusters of machines for data analytics; using the cloud as a platform for automating analysis procedures, machine learning, and analyzing streaming data; building your own cloud with open source software; and cloud security.

The book is accompanied by a website, Cloud4SciEng.org, that provides a variety of supplementary material, including exercises, lecture slides, and other resources helpful to readers and instructors.

Contents (pg. v)
Website (pg. xi)
Acknowledgments (pg. xiii)
Preface (pg. xv)
Chapter 1: Orienting in the Cloud Universe (pg. 1)
1.1 Cloud: Computer, Assistant, and Platform (pg. 1)
1.2 The Cloud Landscape (pg. 3)
1.3 A Guide to This Book (pg. 7)
1.4 Accessing the Cloud: Web, APIs, and SDKs (pg. 8)
1.5 Tools Used in This Book (pg. 12)
1.6 Summary (pg. 15)
1.7 Resources (pg. 16)
Part I: Managing Data in the Cloud (pg. 17)
Chapter 2: Storage as a Service (pg. 21)
2.1 Three Motivating Examples (pg. 22)
2.2 Storage Models (pg. 23)
2.3 The Cloud Storage Landscape (pg. 29)
2.4 Summary (pg. 35)
2.5 Resources (pg. 36)
Chapter 3: Using Cloud Storage Services (pg. 37)
3.1 Two Access Methods: Portals and APIs (pg. 37)
3.2 Using Amazon Cloud Storage Services (pg. 38)
3.3 Using Microsoft Azure Storage Services (pg. 42)
3.4 Using Google Cloud Storage Services (pg. 46)
3.5 Using OpenStack Cloud Storage Services (pg. 50)
3.6 Transferring and Sharing Data with Globus (pg. 51)
3.7 Summary (pg. 56)
3.8 Resources (pg. 57)
Part II: Computing in the Cloud (pg. 59)
Chapter 4: Computing as a Service (pg. 63)
4.1 Virtual Machines and Containers (pg. 64)
4.2 Advanced Computing Services (pg. 66)
4.3 Serverless Computing (pg. 67)
4.4 Pros and Cons of Public Cloud Computing (pg. 68)
4.5 Summary (pg. 70)
4.6 Resources (pg. 71)
Chapter 5: Using and Managing Virtual Machines (pg. 73)
5.1 Historical Roots (pg. 74)
5.2 Amazon’s Elastic Compute Cloud (pg. 75)
5.3 Azure VMs (pg. 80)
5.4 Google Cloud VM Services (pg. 82)
5.5 Jetstream VM Services (pg. 82)
5.6 Summary (pg. 83)
5.7 Resources (pg. 84)
Chapter 6: Using and Managing Containers (pg. 85)
6.1 Container Basics (pg. 85)
6.2 Docker and the Hub (pg. 87)
6.3 Containers for Science (pg. 90)
6.4 Creating Your Own Container (pg. 91)
6.5 Summary (pg. 92)
6.6 Resources (pg. 93)
Chapter 7: Scaling Deployments (pg. 95)
7.1 Paradigms of Parallel Computing in the Cloud (pg. 96)
7.2 SPMD and HPC-style Parallelism (pg. 97)
7.3 Many Task Parallelism (pg. 107)
7.4 MapReduce and Bulk Synchronous Parallelism (pg. 108)
7.5 Graph Dataflow Execution and Spark (pg. 109)
7.6 Agents and Microservices (pg. 110)
7.7 HTCondor (pg. 128)
7.8 Summary (pg. 128)
7.9 Resources (pg. 129)
Part III: The Cloud as Platform (pg. 131)
Chapter 8: Data Analytics in the Cloud (pg. 135)
8.1 Hadoop and YARN (pg. 136)
8.2 Spark (pg. 137)
8.3 Amazon Elastic MapReduce (pg. 143)
8.4 Azure HDInsight and Data Lake (pg. 147)
8.5 Amazon Athena Analytics (pg. 149)
8.6 Google Cloud Datalab (pg. 150)
8.7 Summary (pg. 157)
8.8 Resources (pg. 158)
Chapter 9: Streaming Data to the Cloud (pg. 161)
9.1 Scientific Stream Examples (pg. 162)
9.2 Basic Design Challenges of Streaming Systems (pg. 166)
9.3 Amazon Kinesis and Firehose (pg. 167)
9.4 Kinesis, Spark, and the Array of Things (pg. 170)
9.5 Streaming Data with Azure (pg. 175)
9.6 Kafka, Storm, and Heron Streams (pg. 180)
9.7 Google Dataflow and Apache Beam (pg. 184)
9.8 Apache Flink (pg. 187)
9.9 Summary (pg. 189)
9.10 Resources (pg. 190)
Chapter 10: Machine Learning in the Cloud (pg. 191)
10.1 Spark Machine Learning Library (MLlib) (pg. 192)
10.2 Azure Machine Learning Workspace (pg. 197)
10.3 Amazon Machine Learning Platform (pg. 202)
10.4 Deep Learning: A Shallow Introduction (pg. 204)
10.5 Amazon MXNet Virtual Machine Image (pg. 212)
10.6 Google TensorFlow in the Cloud (pg. 215)
10.7 Microsoft Cognitive Toolkit (pg. 218)
10.8 Summary (pg. 220)
10.9 Resources (pg. 222)
Chapter 11: The Globus Research Data Management Platform (pg. 225)
11.1 Challenges and Opportunities of Distributed Data (pg. 226)
11.2 The Globus Platform (pg. 226)
11.3 Identity and Credential Management (pg. 230)
11.4 Building a Remotely Accessible Service (pg. 240)
11.5 The Research Data Portal Design Pattern (pg. 243)
11.6 The Portal Design Pattern Revisited (pg. 250)
11.7 Closing the Loop: From Portal to Graph Service (pg. 252)
11.8 Summary (pg. 255)
11.9 Resources (pg. 255)
Part IV: Building Your Own Cloud (pg. 257)
Chapter 12: Building Your Own Cloud with Eucalyptus (pg. 261)
12.1 Implementing Cloud Infrastructure Abstractions (pg. 262)
12.2 Deployment Planning (pg. 263)
12.3 Single-cluster Eucalyptus Cloud (pg. 267)
12.4 Summary (pg. 281)
12.5 Resources (pg. 281)
Chapter 13: Building Your Own Cloud with OpenStack (pg. 283)
13.1 OpenStack Core Services (pg. 284)
13.2 HPC in an OpenStack Environment (pg. 284)
13.3 Considerations for Scientific Workloads (pg. 285)
13.4 OpenStack Deployment (pg. 288)
13.5 Example Deployment (pg. 289)
13.6 Summary (pg. 295)
13.7 Resources (pg. 296)
Chapter 14: Building Your Own SaaS (pg. 297)
14.1 The Meaning of SaaS (pg. 298)
14.2 SaaS Architecture (pg. 299)
14.3 SaaS and Science (pg. 301)
14.4 The Globus Genomics Bioinformatics System (pg. 303)
14.5 The Globus Research Data Management Service (pg. 307)
14.6 Summary (pg. 310)
14.7 Resources (pg. 310)
Part V: Security and Other Topics (pg. 311)
Chapter 15: Security and Privacy (pg. 315)
15.1 Thinking about Security in the Cloud (pg. 315)
15.2 Role-based Access Control (pg. 319)
15.3 Secure Data in the Cloud (pg. 320)
15.4 Secure Your VMs and Containers (pg. 324)
15.5 Secure Access to Cloud Software Services (pg. 327)
15.6 Summary (pg. 327)
15.7 Resources (pg. 328)
Chapter 16: History, Critiques, Futures (pg. 329)
16.1 Historical Perspectives (pg. 329)
16.2 Critiques (pg. 332)
16.3 Futures (pg. 335)
16.4 Resources (pg. 339)
Chapter 17: Jupyter Notebooks (pg. 341)
17.1 Environment (pg. 342)
17.2 The Notebooks (pg. 342)
17.3 Resources (pg. 344)
Chapter 18: Afterword: A Discovery Cloud (pg. 345)
Bibliography (pg. 347)
Index (pg. 367)