Notes on multi-DC replication of Kafka

May 1, 2017

Based on https://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka

Put replicas directly across DCs

If a DC fails, the consumer just switches to another replica.
When the DC recovers, the existing replica catch-up mechanism kicks in.
If replicas span regions, latency will be high, and replication demands a lot of cross-DC bandwidth, which is hard for Kafka.
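
A minimal sketch of what "replicas across DCs" means for placement, not Kafka's actual assignment algorithm. Kafka's rack awareness (the broker.rack setting) aims for a similar spread; here the broker IDs, the DC names and the helper names are all made up, and each DC is assumed to hold the same number of brokers.

```python
from itertools import zip_longest


def interleave_by_dc(brokers_by_dc):
    """Order brokers so that neighbours in the list sit in different DCs."""
    columns = zip_longest(*(brokers_by_dc[dc] for dc in sorted(brokers_by_dc)))
    return [b for column in columns for b in column if b is not None]


def assign_replicas(brokers_by_dc, num_partitions, replication_factor):
    """Give each partition consecutive brokers from the interleaved list.

    Hypothetical helper: assumes equal broker counts per DC, so that
    consecutive brokers (including the wrap-around) alternate DCs.
    """
    ordered = interleave_by_dc(brokers_by_dc)
    if replication_factor > len(ordered):
        raise ValueError("replication factor exceeds number of brokers")
    return {
        p: [ordered[(p + i) % len(ordered)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


brokers = {"dc1": [1, 2], "dc2": [3, 4]}
assignment = assign_replicas(brokers, num_partitions=4, replication_factor=2)
for partition, replicas in assignment.items():
    print(partition, replicas)
```

With two DCs and a replication factor of 2, every partition keeps one copy in each DC, so losing a DC still leaves a replica to read from.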

Active-Passive

Consumers can run in either the active or the passive DC.
Upon failure, the passive DC becomes the new active; consumers may need to switch too.
Note that offsets may not match between the two clusters, because of the producer's at-least-once semantics.
Real-time consumers normally just start from the end and accept data loss.
When the failed DC comes back, changes must be mirrored with MirrorMaker (MM) from the new active back to the former active, and the cross-DC offset problem is again hard to manage.
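
A toy simulation of why offsets drift between the two clusters. Logs are plain Python lists and the offset is the list index; mirror() and resume_from_end() are made-up helpers, not Kafka or MM APIs. One retried send in the mirroring path is enough to shift every later offset.

```python
def mirror(source_log, retried_offsets):
    """Copy source_log into a new log with at-least-once delivery.

    For each source offset in retried_offsets the message is appended twice,
    simulating a producer retry after a lost acknowledgement.
    """
    target = []
    for offset, message in enumerate(source_log):
        target.append(message)
        if offset in retried_offsets:
            target.append(message)  # duplicate caused by the retry
    return target


def resume_from_end(log):
    """Real-time strategy from the notes: skip to the end, accept the gap."""
    return len(log)


active = ["m0", "m1", "m2", "m3"]
passive = mirror(active, retried_offsets={1})

committed = 3  # the consumer's next message in the active cluster is m3
print("active :", active[committed])
print("passive:", passive[committed])  # same offset, different message
print("resume at end:", resume_from_end(passive))
```

Reusing the committed offset on the passive cluster lands on a different message, which is why a consumer either tolerates re-reads or loss, or needs some other way to find its position.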

Active-Active

Each DC has an active and an aggregate cluster.
On failure and recovery, there is no need to reconfigure MM.
To avoid the aggregate cluster, we can prefix topics with a DC tag and configure MM to mirror remote topics only. Consumers then need to subscribe to the topics with both DC tags.
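
A sketch of the DC-tag idea, assuming a naming convention of "<dc>.<topic>" (the convention, the DC names and both helper functions are assumptions, not from the slides). The first regex is the kind of pattern one could hand to MM as its topic whitelist; the second is what a consumer would subscribe with.

```python
import re


def remote_topics_regex(local_dc, all_dcs):
    """Regex matching only topics tagged with a remote DC.

    Mirroring only remote-tagged topics means a topic is never copied back
    to the DC it came from, so there is no mirroring loop.
    """
    remote = sorted(dc for dc in all_dcs if dc != local_dc)
    return "(" + "|".join(re.escape(dc) for dc in remote) + r")\..*"


def subscription_regex(topic, all_dcs):
    """Regex a consumer could use to read one logical topic from every DC tag."""
    tags = "|".join(re.escape(dc) for dc in sorted(all_dcs))
    return "(" + tags + r")\." + re.escape(topic)


dcs = ["dc1", "dc2"]
mm_pattern = remote_topics_regex("dc1", dcs)       # used by MM running in dc1
consumer_pattern = subscription_regex("clicks", dcs)

for name in ["dc1.clicks", "dc2.clicks", "dc2.orders"]:
    print(
        name,
        "mirrored into dc1:", bool(re.fullmatch(mm_pattern, name)),
        "read by clicks consumer:", bool(re.fullmatch(consumer_pattern, name)),
    )
```

Each cluster then holds its own "dc1.*" topics plus mirrored copies of the other DC's topics, and the consumer sees the union without a separate aggregate cluster.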

How to make DB data available in all DCs

Active-active: the same consumer runs concurrently in both DCs.
Active-passive: only one consumer per DC at any given time.

Ideally the DB replication policy should be the same as the Kafka cluster’s.