Paxos Made Live
“Paxos Made Simple” mentions that a common optimization is to have a distinguished proposer. This single-master design raises a few questions
- How can a replica become a coordinator?
- If the coordinator does not change between Paxos instances, there is an optimization we can use. What is it, and why does it work?
- Why can't we serve read requests directly from the master's copy of the database?
- Following on from the previous question, why does Chubby introduce master leases? How would read requests be served without master leases?
- How is the master lease renewed? Why is the lease timeout on the master shorter than that on the replicas?
- During a network partition, a new master is elected, but the old master may still exist in the other partition. When the partition heals, what bad case can arise between the two masters? How does Chubby defend against it?
- How is master turnover detected? What are its semantics?
To help reduce the log size, we need to take snapshots, which introduces a few classic problems of combining a log with snapshots
- Why is it the application that takes the snapshot?
- What is in the snapshot handle, and why is it there?
- Steps of taking a snapshot
Other problems
- One premise of “Paxos Made Simple” is that failures are non-Byzantine. However, a disk may be emptied, or a file may be corrupted. How does Chubby mitigate these two threats?
- Paxos requires replicas to generate ever-increasing proposal numbers from disjoint sets. Give some example schemes
- What kind of system is Paxos in terms of CAP?
- What is the purpose of safety mode and liveness mode testing?
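
As a starting point for the question on proposal numbers, here is a minimal sketch of two common schemes. The first gives each replica a unique id in 0..n-1 and has it pick the smallest number larger than any it has seen whose remainder modulo n equals its id. The second uses (round, replica id) pairs compared lexicographically. The function names and parameters are hypothetical, chosen only for illustration.

```python
def next_proposal_number(replica_id, num_replicas, highest_seen):
    """Modulo scheme: return the smallest s > highest_seen
    such that s % num_replicas == replica_id.

    Replicas with different ids can never produce the same number,
    because their numbers fall in different residue classes.
    """
    base = highest_seen + 1
    # Distance from base to the next number in this replica's residue class.
    offset = (replica_id - base) % num_replicas
    return base + offset


def next_proposal_pair(replica_id, highest_seen_round):
    """Pair scheme: return (round, replica_id).

    Python compares tuples lexicographically, so a higher round always
    wins, and the replica id breaks ties within a round.
    """
    return (highest_seen_round + 1, replica_id)


# Three replicas, ids 0, 1, 2. Replica 1 has seen nothing yet (-1),
# then learns that some replica has used number 5.
first = next_proposal_number(replica_id=1, num_replicas=3, highest_seen=-1)
second = next_proposal_number(replica_id=1, num_replicas=3, highest_seen=5)
```

In both schemes a replica only needs to remember the highest number it has seen, and no coordination is required to keep the sets disjoint.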