Suppose our image processing application (Assignment 2) included its own image storage function rather than using Amazon S3. Some of the images on our site might become very popular (say, a striking photograph of a celebrity). When an image is popular, we want to retrieve it with low latency. We also want to ensure our images are retained even if there are disk crashes. (Recall that many data centers intentionally use disks with relatively low reliabilities.)
How would you use the approaches of sharding and replication (from Week 2 Wednesday) to decrease the latency of your image retrievals? For your sharding, how might you group images?
Build versus buy is not a simple choice
Testing is difficult
Operating the system requires careful monitoring (Week 9)
On an unsigned sheet of paper, briefly complete:
Work on Assignment 3 and review the previous readings.