Week 4, Day 2 (Wednesday, Jan 29)

In-class exercise

How are these terms relevant or related to your app:
- data center
- virtual machine
- virtualization
- provisioning
- overprovisioned
- underprovisioned
- elastic computing
- utilization
- throughput
- latency
- API
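To make several of these terms concrete, here is a minimal Python sketch that relates provisioning, utilization, and elastic computing. It assumes a made-up workload measured in requests per second, a fixed per-VM capacity, and illustrative thresholds (30% and 80%) for calling a fleet over- or underprovisioned. The function names are hypothetical and are not part of any cloud provider's API.

```python
import math


def utilization(demand_rps: float, capacity_rps_per_vm: float, num_vms: int) -> float:
    """Fraction of provisioned capacity in use (above 1.0 means demand exceeds capacity)."""
    return demand_rps / (capacity_rps_per_vm * num_vms)


def classify(util: float, low: float = 0.3, high: float = 0.8) -> str:
    """Label a fleet using illustrative (assumed) utilization thresholds."""
    if util > high:
        return "underprovisioned"  # too little capacity: latency and queueing grow
    if util < low:
        return "overprovisioned"   # paying for idle VMs
    return "ok"


def vms_needed(demand_rps: float, capacity_rps_per_vm: float, target_util: float = 0.6) -> int:
    """Elastic computing: choose a fleet size that keeps utilization at or below target_util."""
    return max(1, math.ceil(demand_rps / (capacity_rps_per_vm * target_util)))


if __name__ == "__main__":
    u = utilization(900, 100, 4)
    print(u, classify(u))
    print(vms_needed(900, 100))
```

With 900 requests per second and 4 VMs that each handle 100, utilization is 2.25, so the fleet is underprovisioned; an elastic system would grow to 15 VMs to bring utilization down to the assumed 60% target.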
If you were to define an overall SLA for your service, what kinds of targets would you have to set?
Does your ID generation algorithm prevent conflicts? How, or why not?
What platform-level, cluster-level, and application-level software is being used in your app?
How does your app scale?
How could you change it to scale better?
What other existing apps might use a similar platform? Why? (Hint: Video.)
What metric did you choose for your AutoScaler/CloudWatch alarm? Why?
If a worker fails while encoding an image, what happens? Can your system recover?
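For the ID-generation question, one way to reason about conflicts is to compare two common schemes. The Python sketch below contrasts random UUIDs with IDs built from a unique worker name plus a local counter. `WorkerScopedIdGenerator` is a hypothetical class, and the sketch assumes every worker is given a distinct `worker_id`; it is an illustration, not the scheme your assignment necessarily used.

```python
import itertools
import threading
import uuid


def random_id() -> str:
    """Scheme 1: a random UUID (122 random bits); collisions are possible but extremely unlikely."""
    return uuid.uuid4().hex


class WorkerScopedIdGenerator:
    """Scheme 2 (hypothetical): IDs of the form '<worker_id>-<sequence>'.

    IDs cannot collide across workers as long as every worker has a distinct
    worker_id and no worker restarts its counter under the same worker_id.
    """

    def __init__(self, worker_id: str) -> None:
        self.worker_id = worker_id
        self._counter = itertools.count()
        self._lock = threading.Lock()  # lets threads inside one worker share the counter safely

    def next_id(self) -> str:
        with self._lock:
            seq = next(self._counter)
        return f"{self.worker_id}-{seq}"


if __name__ == "__main__":
    a = WorkerScopedIdGenerator("worker-a")
    b = WorkerScopedIdGenerator("worker-b")
    print(a.next_id(), a.next_id(), b.next_id())
    print(random_id())
```

The worker-scoped scheme is conflict-free only under its assumptions: if a crashed worker restarts with the same `worker_id`, its counter starts again at 0 and it reissues IDs it already handed out. An ID based only on a timestamp has a similar weakness, because two workers can read the same clock value.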

Distributed/Cloud systems structure

From “There Is No Getting Around It: You Are Building a Distributed System”.

Cloud applications are distributed systems, so designing for the cloud means designing a distributed system

Every application is different

Many off-the-shelf components won’t perform well enough

Note how many other services are needed to build even a simple application:

The distributed services of an image resize service
Source: p. 66 of “There Is No Getting Around It: You Are Building a Distributed System”, Copyright ACM, 2013.

The many questions to ask about a distributed service:

- Will the system have regions or be global?
- Single- or multiple-tenant?
- SLAs (for availability, latency, throughput, consistency, durability, …)
- Security
- Usage tracking
- Deployment and configuration management
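SLA targets are usually stated as numbers that can be checked. As a hedged Python sketch, the code below converts an availability target into a downtime budget, assuming a 30-day month, and computes a latency percentile with the nearest-rank method. The function names and the sample latencies are illustrative only.

```python
import math

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(availability: float, period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of downtime allowed per period by an availability target such as 0.999."""
    return (1 - availability) * period_minutes


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=99 for a p99 latency target."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} -> {downtime_budget_minutes(target):.1f} min/month")
    latencies_ms = [20.0] * 98 + [250.0, 900.0]  # made-up measurements
    print("p99 latency:", percentile(latencies_ms, 99), "ms")
```

Each extra "nine" cuts the budget by a factor of ten: 99% allows 432 minutes of downtime per 30-day month, 99.9% allows 43.2 minutes, and 99.99% allows about 4.3 minutes.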

Reading guide for next class

Read “There Is No Getting Around It: You Are Building a Distributed System” from “Messaging” (p. 67) up to, but not including, “Platform Components” (p. 68).

These two sections are short but dense. For the “Messaging” section, consider how your use of SQS in Assignment 2 fits the author’s description. Also note how the author considers each of the factors (geographies, etc.) for this subservice.
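When comparing your SQS usage to the “Messaging” section, it may help to recall how SQS treats a worker that fails mid-task. A received message is hidden for the queue’s visibility timeout, and if the worker does not delete it before the timeout expires, the message becomes visible again and another worker can receive it. The Python sketch below is a toy in-memory model of that behavior, not the boto3 API. `ToyQueue` and its methods are hypothetical, and the model simplifies SQS by using the message ID as the receipt handle.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class _Message:
    body: str
    visible_at: float = 0.0  # time at which the message can be received again


class ToyQueue:
    """Toy, in-memory model of SQS-style visibility timeouts (not the boto3 API).

    Simplification: the message ID doubles as the receipt handle.
    """

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: dict[int, _Message] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = _Message(body)
        return msg_id

    def receive(self, now: float) -> Optional[tuple[int, str]]:
        """Return (msg_id, body) for one visible message and hide it for visibility_timeout."""
        for msg_id, msg in self._messages.items():
            if msg.visible_at <= now:
                msg.visible_at = now + self.visibility_timeout
                return msg_id, msg.body
        return None

    def delete(self, msg_id: int) -> None:
        """Called by a worker only after it has finished the job."""
        self._messages.pop(msg_id, None)


if __name__ == "__main__":
    q = ToyQueue(visibility_timeout=30)
    q.send("resize image-42")
    print(q.receive(now=0))    # worker A takes the job, then crashes without deleting it
    print(q.receive(now=10))   # still hidden from other workers
    print(q.receive(now=31))   # timeout expired: the job is redelivered to worker B
    q.delete(0)                # worker B finishes and deletes the message
    print(q.receive(now=100))  # queue is now empty
```

Because a job can be delivered more than once under this model, workers should be able to process the same job twice without harm.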

For the “Automating Failover” section, focus on the basic need: We need to design how the system recovers when a subcomponent fails.
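One basic building block for automating failover is a heartbeat-based failure detector. The Python sketch below assumes a single primary/standby pair whose nodes periodically report heartbeats, and it promotes the standby when the primary has been silent longer than a timeout. `FailoverMonitor` is a hypothetical name, and real systems need considerably more than this.

```python
class FailoverMonitor:
    """Hypothetical heartbeat-based failover for one primary/standby pair."""

    def __init__(self, primary: str, standby: str, timeout: float, now: float = 0.0) -> None:
        self.primary = primary
        self.standby = standby
        self.timeout = timeout
        self._last_heartbeat = {primary: now, standby: now}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` reported itself alive at time `now`."""
        self._last_heartbeat[node] = now

    def _alive(self, node: str, now: float) -> bool:
        return now - self._last_heartbeat[node] <= self.timeout

    def check(self, now: float) -> str:
        """Promote the standby if the primary timed out and the standby is still alive."""
        if not self._alive(self.primary, now) and self._alive(self.standby, now):
            self.primary, self.standby = self.standby, self.primary
        return self.primary


if __name__ == "__main__":
    m = FailoverMonitor("node-a", "node-b", timeout=5)
    m.heartbeat("node-a", 1)
    m.heartbeat("node-b", 1)
    print(m.check(3))          # node-a is still primary
    m.heartbeat("node-b", 7)   # node-a has gone silent since t=1
    print(m.check(8))          # node-b has been promoted
```

A timeout cannot tell a crashed primary from a slow or partitioned one, so a system built this way must also make sure the old primary stops serving once it has been replaced; otherwise both nodes may act as primary at the same time.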