Distributed databases often exaggerate the strength of their safety properties. For the last year or so, I’ve been building tools to analyze database consistency, especially during network partitions. Since IP networks are nowhere near as reliable as we’d like to think, designing for node and network failure is a critical aspect of building reliable systems. In exploring this problem, I’ve discovered bugs, undocumented behaviors, and success stories alike, in eight major distributed systems. In my talk at Factual (Tuesday 12/3/13), I’ll be discussing the latest installation of this project, known as Jepsen.
We’ll learn about the distribution models and integrity constraints of Zookeeper, a CP tree-structured datastore; NuoDB, a replicated SQL database; Kafka, a sharded and redundant queuing system; and Cassandra, a Dynamo-style column store. Along the way, we’ll learn how timestamps, replication, and failoversystems interact, what properties are realistically achievable in distributed systems, and key questions to ask when designing modern software.