Percentile of completed subrequests | Time (ms) |
---|---|
50 | 300 |
75 | 350 |
99 | 375 |

Percentile of completed subrequests | Time (ms) |
---|---|
50 | 1000 |
75 | 1100 |
99 | 1200 |

Percentile of completed subrequests | Time (ms) |
---|---|
50 | 500 |
75 | 1000 |
99 | 1750 |

Question 1: Assume your latency SLA requires the 99th percentile of response times to be at most 900 ms. Given the response times in each table, what approach would you use to achieve your SLA?
Answer:
Question 2: If your service were something like a Web search, where a "good-enough" answer is sufficient, what approaches might you use for each table?
Answer:
Question 1: Define sharding. Define replication. What is the difference?
Answer:
Both methods increase the number of requests that can be handled independently. Sharding splits different data across multiple instances, while replication spreads identical copies of some data across multiple instances.
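The distinction can be made concrete with a minimal sketch. The names `shard_for` and `ReplicaSet` are hypothetical and chosen only for illustration, and hashing with round-robin reads is just one of several possible routing schemes: sharding maps each key to exactly one instance, while replication lets any of several identical copies serve a read.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Sharding: each key lives on exactly one shard, picked by a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ReplicaSet:
    """Replication: every replica holds the same data, so reads can rotate."""

    def __init__(self, replica_names):
        self.replica_names = list(replica_names)
        self._next = 0

    def pick_replica(self) -> str:
        # Round-robin over identical copies (an assumed, simple policy).
        name = self.replica_names[self._next % len(self.replica_names)]
        self._next += 1
        return name


replicas = ReplicaSet(["a", "b", "c"])
picks = [replicas.pick_replica() for _ in range(4)]
```

A given key always maps to the same shard, whereas successive reads of the same data may land on different replicas.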
Question 2: Assume that you have a database storing data about 1000 products, numbered 0–999. The products numbered 0–399 are accessed 1000 times/hr, the products numbered 400–799 are accessed 500 times/hr, and the products numbered 800–999 are accessed 100 times/hr.
You want to maximize parallelism using some combination of sharding and replication. You have up to 10000 servers. How might you divide your 1000 products into shards and replicas to maximize the parallelism? Note that you'll want to assign the most servers to the products that have the most requests. Your answer will probably not assign all 10000 servers.
Answer: The problem asks you to shard and replicate the data. There are many ways to organize this; here is one. Start by setting up shards that each receive the same number of accesses:
Accesses/hr/product | Products/shard | Accesses/hr/shard | Products | Shards |
---|---|---|---|---|
1000 | 1 | 1000 | 400 | 400 |
500 | 2 | 1000 | 400 | 200 |
100 | 10 | 1000 | 200 | 20 |
This gives 620 shards in total, each accessed 1000 times/hr. Replicating each shard as many times as possible across 10000 servers gives 16 replicas per shard (620 × 16 = 9920 servers), with 80 servers left over.
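The arithmetic in this answer can be checked with a short script. This is only a sketch of the one scheme described here; the function name `plan` and the constant names are illustrative, and other divisions of the products are equally valid.

```python
TOTAL_SERVERS = 10_000
TARGET_ACCESSES_PER_SHARD = 1000

# (accesses/hr per product, number of products), taken from the question
TIERS = [(1000, 400), (500, 400), (100, 200)]


def plan(tiers, total_servers, target):
    """Size shards so each receives `target` accesses/hr, then replicate evenly."""
    rows = []
    for rate, products in tiers:
        products_per_shard = target // rate
        shards = products // products_per_shard
        rows.append((rate, products_per_shard, shards))
    total_shards = sum(shards for _, _, shards in rows)
    replicas = total_servers // total_shards
    leftover = total_servers - replicas * total_shards
    return rows, total_shards, replicas, leftover


rows, total_shards, replicas, leftover = plan(
    TIERS, TOTAL_SERVERS, TARGET_ACCESSES_PER_SHARD
)
print(total_shards, replicas, leftover)  # 620 16 80
```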
Question: What is the primary risk the network poses to a system?
Answer: The network will rarely fail completely but it is extremely likely that it will degrade in quality and capacity at times. Your application must be ready to handle such degradation. (Distributed, p. 69)
Question: Define authentication and authorization.
Answer: Authentication is the process of determining that a user is who they claim to be. Authorization is the process of determining whether an authenticated user can do an operation they have requested.
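As a toy illustration of the two steps, the sketch below keeps them as separate checks. The user table, permission table, and function names are all hypothetical, and the unsalted SHA-256 hash is a simplification; a real service would use a salted, deliberately slow password hash.

```python
import hashlib
import hmac

# Hypothetical in-memory tables, for illustration only.
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}
PERMISSIONS = {"alice": {"read"}}


def authenticate(user: str, password: str) -> bool:
    """Authentication: is this user who they claim to be?"""
    expected = USERS.get(user)
    if expected is None:
        return False
    supplied = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, supplied)


def authorize(user: str, operation: str) -> bool:
    """Authorization: may this (already authenticated) user do this operation?"""
    return operation in PERMISSIONS.get(user, set())
```

In this sketch, `alice` with the right password is authenticated and may `read`, but a `write` request from her fails authorization even though her identity is established.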
Question: List four categories of requirement that might be specified in an SLA.
Answer: Distributed lists five: availability, latency, throughput, consistency, and durability. Not all of these will be specified for every service but you do need to consider them all.
Question: Why don’t the percentile tables for response time include a row for 100%?
Answer: There is no upper bound on the very longest time a response might take. The worker may have crashed, resulting in infinite response time (you will never get an answer) or it may just take a very long time.
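A small sketch shows why the 100th percentile is uninformative. The sample data is invented to resemble the first table, with one request that never completes modeled as an infinite response time; the nearest-rank definition of a percentile used here is one common convention, assumed for illustration.

```python
import math


def percentile(sorted_times, p: int):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not sorted_times or not 0 < p <= 100:
        raise ValueError("need samples and an integer p in 1..100")
    n = len(sorted_times)
    rank = (p * n + 99) // 100  # integer ceiling of p * n / 100
    return sorted_times[rank - 1]


# Invented sample: 99 completed requests plus one crashed worker.
times = sorted([300] * 50 + [350] * 25 + [375] * 24 + [math.inf])

p99 = percentile(times, 99)    # 375: unaffected by the crashed worker
p100 = percentile(times, 100)  # inf: dominated by the single worst case
```

A single stuck request is enough to make the 100% row infinite, while the lower percentiles still describe what nearly all requests experience.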
Question: What are the twin purposes of measuring the performance of your system?
Answer: 1. Monitoring the system in real time to detect failures and overloads, and 2. Analytics to determine usage trends. (Distributed, p. 70)