KIP-98: Exactly once processing
Sequence of actions
-
producer ask the broker to discover TC
-
producer ask the TC for a producer id (PID), and bump the PID version number/epoch to avoid zombie PID => synchronous Note: this PID/epoch is to identify the producer, so that TC can keep track of the current sequence number from the producer => necessary to implement idempotent semantics
-
rollback or forward any transactions left incomplete by the previous instance of the producer => synchrounous
-
Producer marks locally the transaciton has begun, and for TC it doesn’t begin until the first record is sent
-
producer add topicPartiion to the transaction on TC, so that we can mark commit or abort marker on each transaction Like most states in Kafka, TC state is backed by a log
-
producer produce to broker with PID, epoch, and sequence number Here broker will accept only if the sequence number = current stored + 1, if seqence # > current stored + 1, that makes fatal error. We need the PID so that later consumer can decide if messages from that PID can be materialized.
-
producer send Map<TopicPartitions, OffsetAndMetadata> and a groupId to TC of course this will be logged in the transaction log
-
producer will send a TxnOffsetCommitRequest to the consumer coordinator to persist the offsets in the __consumer-offsets topic
-
producer issues an EndTxnRequest to TC, with additional data indicating whether the transaction is to be committed or aborted
-
TC writes a PREPARE_COMMIT or PREPARE_ABORT message to the transaction log. Notice here the steps are similar to commit step in 1PC
-
TC begins the process of writing the command messages known as COMMIT (or ABORT) markers to the leader of each TopicPartition which is part of the transcation
-
Each broker will write a COMMIT(PID) or ABORT(PID) to the log, this message shows that whether the messages with the given PID must be delivered to the user or dropped. Cosumer will buffer messages which have a PID until it reads a corresponding COMMIT or ABORT message until it sees such message. Speical case: __consumer-offsets being one of the Topic Partitions in the transaction
13.After all the commit or abort markers are written the data logs, the TC writes the final COMMITTED or ABORTED message to the transaction log,