Week 4, Day 2 (Wednesday, Jan 29)
In-class exercise
- How are these terms relevant or related to your app:
- data center
- virtual machine
- virtualization
- provisioning
- overprovisioned
- underprovisioned
- elastic computing
- utilization
- throughput
- latency
- API
- If you were to define an overall SLA for your service, what kinds of targets would you have to set?
- Does your ID generation algorithm prevent conflicts? How, or why not?
- What platform-level, cluster-level, and application-level software is being used in your app?
- How does your app scale?
- How could you change it to scale better?
- What other existing apps might use a similar platform? Why? (Hint: Video.)
- What metric did you choose for your AutoScaler/CloudWatch alarm? Why?
- If a worker fails while encoding an image, what happens? Can your system recover?
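For the last question, it may help to see one common recovery mechanism in miniature. The sketch below is a toy, in-memory queue that imitates an SQS-style visibility timeout: a received message is hidden rather than removed, and it reappears if no worker deletes it in time. The class `VisibilityQueue`, its methods, and the explicit `now` argument (seconds on a made-up clock) are all hypothetical names for illustration. This is not the SQS API, and your own app may recover differently.

```python
import collections


class VisibilityQueue:
    """Toy in-memory queue with an SQS-style visibility timeout (hypothetical)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.ready = collections.deque()  # messages waiting for a worker
        self.in_flight = {}               # message -> time it becomes visible again

    def send(self, message):
        self.ready.append(message)

    def receive(self, now):
        # First, return any message whose timeout has expired to the ready queue.
        for message, deadline in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[message]
                self.ready.append(message)
        if not self.ready:
            return None
        message = self.ready.popleft()
        # Hide the message instead of removing it.
        self.in_flight[message] = now + self.visibility_timeout
        return message

    def delete(self, message):
        # A worker calls this only after it has finished the job.
        self.in_flight.pop(message, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.send("encode image-1")

first = queue.receive(now=0)    # worker A takes the job, then crashes
hidden = queue.receive(now=10)  # still hidden, so no other worker sees it
retry = queue.receive(now=30)   # timeout expired: worker B gets the same job
queue.delete(retry)             # worker B finishes and deletes the message
done = queue.receive(now=100)   # nothing left to do
```

Note the consequence for your design: a job can be delivered more than once, so encoding the same image twice has to be harmless.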
Distributed/Cloud systems structure
From There Is No Getting Around It: You Are Building a Distributed System.
Cloud applications are distributed systems—designing for the cloud is designing a distributed system
Every application is different
- Many off-the-shelf components won’t perform well enough
Note how much other stuff is necessary to build an application:
Source: p. 66 of There Is No Getting Around It: You Are Building a Distributed System, Copyright ACM, 2013.
The many questions to ask about a distributed service:
- Will the system have regions or be global?
- Single- or multiple-tenant?
- SLAs (for availability, latency, throughput, consistency, durability, …)
- Security
- Usage tracking
- Deployment and configuration management
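To make the SLA item concrete, an availability target can be turned into a downtime budget with simple arithmetic. The sketch below assumes a 30-day measurement period; the function name `downtime_budget_minutes` and the sample targets are illustrative choices, not values from the article.

```python
def downtime_budget_minutes(availability, period_days=30):
    """Minutes of downtime allowed per period for a given availability target."""
    total_minutes = period_days * 24 * 60
    return (1.0 - availability) * total_minutes


for target in (0.99, 0.999, 0.9999):
    budget = downtime_budget_minutes(target)
    print(f"{target:.2%} availability -> {budget:.1f} min per 30 days")
```

A 99.9% target, for example, leaves roughly 43 minutes of downtime in a 30-day month, which is not much time to notice, diagnose, and fix a failure by hand.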
Reading guide for next class
Read There Is No Getting Around It: You Are Building a Distributed System, from Messaging (p. 67) up to but not including Platform Components (p. 68).
These two sections are short but dense. For the “Messaging” section,
consider how your use of SQS in Assignment 2 fits the author's
description. Also note how the author considers each of the factors (geographies, etc.) for this subservice.
For the “Automating Failover” section, focus on the basic need: We need
to design how the system recovers when a subcomponent fails.
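
As a warm-up for that reading, here is a minimal sketch of one very simple recovery design: the caller tries each replica in order and uses the first one that answers. This is client-side failover chosen for illustration, not necessarily the approach the article describes, and the names `call_with_failover`, `broken_primary`, and `healthy_backup` are hypothetical.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            errors.append(exc)  # record the failure and move to the next replica
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def broken_primary(request):
    # Stand-in for a subcomponent that has failed.
    raise ConnectionError("primary is down")


def healthy_backup(request):
    # Stand-in for a working replica.
    return f"handled {request!r} on backup"


result = call_with_failover([broken_primary, healthy_backup], "job-42")
print(result)
```

Even this small version raises design questions worth keeping in mind while you read: how the caller learns which replicas exist, and what should happen when every replica fails.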