Walking through the table we used on Friday.
Source: Table 2, p. 78 of The Tail at Scale, Copyright ACM 2013.
The categories of analysis:
SLA guarantees
The standard design questions for any system also apply (versioning, upgrades, …).
Many systems have a “leader” instance that assigns work to the other instances.
server.py
, assigning tasks to the worker.py
instances.What happens when the “leader” fails? Do you bring up a new leader automatically or have the operations staff do it?
Carried over from Friday.
Read There Is No Getting Around It: You Are Building a Distributed System, from Platform Components (p. 68) up to and including Platform Usage Collection (p. 69).
Two key points from these sections: