Review of Weeks 3–5 (Week 6, Wednesday)

Latency: Consider the following three percentile tables:

Table 1
Percentile of completed subrequests	Time (ms)
50	300
75	350
99	375

Table 2
Percentile of completed subrequests	Time (ms)
50	1000
75	1100
99	1200

Table 3
Percentile of completed subrequests	Time (ms)
50	500
75	1000
99	1750

Question 1: Assume your SLA for latency is a 99th percentile of 900 ms. For the response times in each table, what approach would you use to achieve your SLA?

Answer:

Table 1: For these response times, even the slowest responses are well under our SLA. The simplest algorithm, distributing the work across the workers, will be more than enough.
Table 2: For these response times, even the 50th percentile is too slow. The system must be rearchitected to speed up the basic operations. No amount of hedging or tying will make the system fast enough.
Table 3: These response times are a case where hedged or tied requests will help. Half the results are returned in 500 ms and three-quarters in 1000 ms, but the last 25% form a long, slow tail. Hedged or tied requests will shrink that tail. We might not get down to our 900 ms goal but we could get close.
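
The reasoning in this answer can be sketched as a rule of thumb in Python. This is only an illustration: the helper name suggest_approach and the two thresholds (compare the 99th percentile to the SLA, then the 50th) are assumptions that mirror the three cases above, not a general method.

```python
# Percentile tables from the question: {percentile: time in ms}.
TABLES = {
    "Table 1": {50: 300, 75: 350, 99: 375},
    "Table 2": {50: 1000, 75: 1100, 99: 1200},
    "Table 3": {50: 500, 75: 1000, 99: 1750},
}
SLA_MS = 900  # 99th-percentile target from Question 1


def suggest_approach(table, sla_ms=SLA_MS):
    """Hypothetical rule of thumb mirroring the three cases in the answer."""
    if table[99] <= sla_ms:
        # Even the slow tail already meets the SLA.
        return "simple distribution"
    if table[50] > sla_ms:
        # The typical response misses the SLA, so the tail is not the problem.
        return "rearchitect"
    # Typical responses are fast enough; only the tail is slow.
    return "hedged or tied requests"


for name, table in TABLES.items():
    print(name, "->", suggest_approach(table))
```

The middle case matters most: hedging and tying only shorten the tail, so they cannot help when the median itself misses the target.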

Question 2: If your service were something like a Web search, where a "good-enough" answer is sufficient, what approaches might you use for each table?

Answer:

Table 1: As in the first case, these response times are so fast that we don't need any special handling to meet our target.
Table 2: Again, these times remain too slow. We don't even get half the results within our 900 ms target. That's not close to "good enough".
Table 3: Given these response times, we could elect to return a result to the user when we only have 50–70% of the data. That would keep our latencies within the 900 ms target.
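
The 50–70% figure for Table 3 can be checked with a rough estimate. The sketch below assumes completion grows linearly between the listed percentiles, which the tables do not actually tell us; the helper name percent_done_by is hypothetical.

```python
TABLE_1 = {50: 300, 75: 350, 99: 375}
TABLE_2 = {50: 1000, 75: 1100, 99: 1200}
TABLE_3 = {50: 500, 75: 1000, 99: 1750}


def percent_done_by(table, deadline_ms):
    """Estimate the percentage of subrequests finished by deadline_ms.

    Assumes linear interpolation between table rows. Returns None when the
    deadline falls below the lowest listed percentile, because the table
    gives no information there.
    """
    points = sorted(table.items())  # [(percentile, time_ms), ...]
    if deadline_ms < points[0][1]:
        return None
    if deadline_ms >= points[-1][1]:
        # At least the highest listed percentile has completed.
        return float(points[-1][0])
    for (p_lo, t_lo), (p_hi, t_hi) in zip(points, points[1:]):
        if t_lo <= deadline_ms < t_hi:
            return p_lo + (p_hi - p_lo) * (deadline_ms - t_lo) / (t_hi - t_lo)


print(percent_done_by(TABLE_1, 900))  # at least the 99th percentile is done
print(percent_done_by(TABLE_2, 900))  # None: fewer than 50% are done
print(percent_done_by(TABLE_3, 900))  # between the 50th and 75th percentile
```

Under that linear assumption, Table 3 has about 70% of its subrequests complete at 900 ms, which is consistent with returning a partial result at that point.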

Sharding and replication

Question 1: Define sharding. Define replication. What is the difference?

Answer:

sharding: Breaking the primary key range of a dataset into sub-ranges ('shards') so that requests to different shards can be handled independently.
replication: Creating identical copies of a dataset or shard of a dataset so that requests to different replicas can be handled independently.

Both methods increase the number of requests that can be handled independently. Sharding splits different data across multiple instances, while replication spreads identical copies of some data across multiple instances.
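
As a minimal sketch of how the two ideas combine, a request can first be mapped to a shard by key range and then sent to any replica of that shard. The shard boundaries, the replica count, and the function names below are all hypothetical, chosen only for illustration.

```python
import bisect
import random

# Hypothetical layout: four shards over keys 0-999, three replicas each.
SHARD_STARTS = [0, 250, 500, 750]  # first key held by each shard
REPLICAS_PER_SHARD = 3


def shard_for(key):
    """Sharding: each key belongs to exactly one sub-range of the key space."""
    return bisect.bisect_right(SHARD_STARTS, key) - 1


def route(key, rng=random):
    """Replication: any identical copy of the chosen shard can serve the request."""
    shard = shard_for(key)
    replica = rng.randrange(REPLICAS_PER_SHARD)
    return shard, replica


print(route(42))   # shard 0, some replica
print(route(600))  # shard 2, some replica
```

The shard is determined by the data requested, while the replica is a free choice, which is why replicas can be picked at random or by load.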

Question 2: Assume that you have a database storing data about 1000 products, numbered 0–999. The products numbered 0–399 are accessed 1000 times/hr, the products numbered 400–799 are accessed 500 times/hr, and the products numbered 800–999 are accessed 100 times/hr.

You want to maximize parallelism using some combination of sharding and replication. You have up to 10000 servers. How might you divide your 1000 products into shards and replicas to maximize the parallelism? Note that you'll want to assign the most servers to the products that have the most requests. Your answer will probably not assign all 10000 servers.

Answer: The problem asks you to shard and replicate the data. There are many ways of organizing this. Here is one. Start by setting up shards that each receive the same number of accesses:

Sharding of the 1000 products
Accesses/hr/product	Products/shard	Accesses/hr/shard	Products	Shards
1000	1	1000	400	400
500	2	1000	400	200
100	10	1000	200	20

This gives 620 shards in total, each accessed 1000 times/hr. Replicating the 620 shards as many times as possible across 10000 servers, we get 16 replicas of each shard (9920 servers), with 80 servers left over.
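
The arithmetic for this particular scheme can be reproduced in a few lines of Python. The variable names are illustrative, and the target of 1000 accesses/hr per shard is the choice made in this answer, not the only valid one.

```python
# (accesses/hr/product, number of products) for each tier in the question.
TIERS = [(1000, 400), (500, 400), (100, 200)]
TARGET_PER_SHARD = 1000  # accesses/hr/shard chosen in this answer
SERVERS = 10_000

total_shards = 0
for rate, n_products in TIERS:
    # Pack enough products into a shard to reach the target access rate.
    products_per_shard = TARGET_PER_SHARD // rate
    shards = n_products // products_per_shard
    print(rate, products_per_shard, shards)
    total_shards += shards

# Give every shard the same number of replicas, one replica per server.
replicas, leftover = divmod(SERVERS, total_shards)
print(total_shards, replicas, leftover)
```

Because every shard sees the same load, giving each the same number of replicas spreads requests evenly across the servers in use.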

Descriptive questions

Question: What is the primary risk the network poses to a system?

Answer: The network will rarely fail completely but it is extremely likely that it will degrade in quality and capacity at times. Your application must be ready to handle such degradation. (Distributed, p. 69)

Question: Define authentication and authorization.

Answer: Authentication is the process of determining that a user is who they claim to be. Authorization is the process of determining whether an authenticated user can do an operation they have requested.

Question: List four categories of requirement that might be specified in an SLA.

Answer: Distributed lists five: availability, latency, throughput, consistency, and durability. Not all of these will be specified for every service but you do need to consider them all.

Question: Why don’t the percentile tables for response time include a row for 100%?

Answer: There is no upper bound on the very longest time a response might take. The worker may have crashed, resulting in infinite response time (you will never get an answer) or it may just take a very long time.
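
A small illustration of this point, using made-up sample data: with one crashed worker among 100 subrequests, the 99th percentile is still a finite number, but the 100th is not. The nearest-rank percentile helper below is one common definition, used here as an assumption.

```python
import math

# Made-up data: 99 subrequests finish in 100-198 ms, one worker crashes.
samples = list(range(100, 199)) + [math.inf]


def percentile(values, p):
    """Nearest-rank percentile for integer p, using integer arithmetic."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


print(percentile(samples, 50))   # finite
print(percentile(samples, 99))   # finite
print(percentile(samples, 100))  # inf: the crashed worker never answers
```

A single lost response is enough to make the 100% row meaningless, while the lower percentiles remain useful.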

Question: What are the twin purposes of measuring the performance of your system?

Answer: 1. Monitoring the system in real time to detect failures and overloads, and 2. Analytics to determine usage trends. (Distributed, p. 70)