What triggers the txnLockFast failure?
Source version: TiDB 2.1
What is the txnLockFast error?
txnLockFast is triggered when a read transaction's read_start_ts falls between a write transaction's write_start_ts and write_commit_ts. Upon seeing this, the read transaction backs off and retries later with the same read_start_ts, so that snapshot isolation is preserved. The effect of txnLockFast is similar to a pessimistic lock, but with higher latency.
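The condition is easy to state as code. A minimal sketch, assuming the simplified view above (bare uint64 timestamps; readMustWait is an illustrative name, and in reality the reader only sees the writer's lock, not its future write_commit_ts):

package main

import "fmt"

// readMustWait is an illustrative check for the txnLockFast condition
// described above. In the real system the reader cannot know the writer's
// future write_commit_ts; it only sees the writer's lock, which is why it
// must back off and retry rather than decide immediately.
func readMustWait(readStartTS, writeStartTS, writeCommitTS uint64) bool {
	return readStartTS > writeStartTS && readStartTS < writeCommitTS
}

func main() {
	fmt.Println(readMustWait(15, 10, 20)) // true: the read falls inside the write window
	fmt.Println(readMustWait(25, 10, 20)) // false: the write committed before the read snapshot
}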
tidb_tikvclient_backoff_seconds_count
The alert is triggered by the expression increase( tidb_tikvclient_backoff_seconds_count[10m] ) > 10.
tidb_tikvclient_backoff_seconds_count is a counter, and the [10m] selector turns it into a range vector covering the last 10 minutes. increase() computes how much the counter grew over that window, so the expression fires when more than 10 backoffs happened in the last 10 minutes (for example, a counter going from 125 to 140 gives an increase of about 15, which trips the alert).
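Note that this expression counts every kind of backoff, not only txnLockFast. If the backoff metric carries a type label (an assumption worth verifying against your deployment's metrics), the alert can be narrowed to this failure alone:

increase( tidb_tikvclient_backoff_seconds_count{type="txnLockFast"}[10m] ) > 10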
Data structures
Retriever is the interface that wraps the basic Get and Seek methods. It is implemented by BufferStore -> unionStore -> tikvSnapshot -> tikvTxn.
Transaction defines the interface for operations inside a transaction. It is implemented by tikvTxn -> TxnState.
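A sketch of the two interfaces as just described, with stand-in Key and Iterator types; the method sets approximate the 2.1 kv package rather than copying it verbatim:

package kvsketch

import "context"

// Key and Iterator stand in for TiDB's kv.Key and kv.Iterator.
type Key []byte

type Iterator interface {
	Valid() bool
	Key() Key
	Value() []byte
	Next() error
	Close()
}

// Retriever: approximate shape of kv.Retriever in TiDB 2.1.
type Retriever interface {
	Get(k Key) ([]byte, error)
	Seek(k Key) (Iterator, error)
}

// Transaction: approximate shape of kv.Transaction, a Retriever plus
// write and lifecycle methods.
type Transaction interface {
	Retriever
	Set(k Key, v []byte) error
	Delete(k Key) error
	Commit(ctx context.Context) error
	Rollback() error
}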
// Scanner supports tikv scan.
type Scanner struct {
	snapshot     *tikvSnapshot
	batchSize    int // number of pairs fetched per scan RPC
	valid        bool
	cache        []*pb.KvPair // current batch of key-value pairs from tikv
	idx          int          // position of the current pair within cache
	nextStartKey []byte       // where the next forward batch starts
	endKey       []byte
	eof          bool
	// Used for reverse scan.
	reverse    bool
	nextEndKey []byte
}
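For orientation, a sketch of how a *Scanner is typically driven; Scanner exposes the kv.Iterator-style Valid/Key/Value/Next/Close method set. scanAll and process are hypothetical names, and the fragment assumes it sits alongside the Scanner type above:

// scanAll drives a Scanner to exhaustion; process is a hypothetical
// consumer callback.
func scanAll(scanner *Scanner, process func(k, v []byte) error) error {
	defer scanner.Close()
	for scanner.Valid() {
		if err := process(scanner.Key(), scanner.Value()); err != nil {
			return err
		}
		// Next may fetch another batch from tikv and may resolve a lock on
		// the current key, which is one of the txnLockFast trigger points
		// shown below.
		if err := scanner.Next(); err != nil {
			return err
		}
	}
	return nil
}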
What triggers txnLockFast?
- Commit check
TxnState also implements BatchGet, but it seems unused. tikvTxn also implements Get, but it is not used except in testing.
// Get is part of the Retriever interface (irrelevant code elided).
func (us *unionStore) Get(k Key) ([]byte, error) {
	// ...
	// On a miss with the presume-key-not-exists option set, record a lazy
	// condition pair to be verified at commit time instead of reading from
	// tikv now.
	us.markLazyConditionPair(k, nil, e.(error))
	// ...
}

// Commit is part of the Transaction interface (irrelevant code elided).
func (txn *tikvTxn) Commit(ctx context.Context) error {
	// ...
	txn.us.CheckLazyConditionPairs()
	// ...
}

func (us *unionStore) CheckLazyConditionPairs() error {
	// ...
	us.snapshot.BatchGet(keys) // this snapshot read may trigger txnLockFast
	// ...
}
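To make this path concrete, here is a toy model in plain Go (not TiDB code) of the lazy condition pair idea: an insert presumes its key is absent instead of reading immediately, and commit verifies every presumption with one snapshot batch read, the read that can back off with txnLockFast:

package main

import (
	"errors"
	"fmt"
)

// toyTxn models only the lazy "key must not exist" bookkeeping.
type toyTxn struct {
	writes     map[string][]byte
	presumeNil map[string]struct{} // keys presumed absent, checked at commit
}

// Insert presumes the key does not exist instead of reading it now.
func (t *toyTxn) Insert(k string, v []byte) {
	t.writes[k] = v
	t.presumeNil[k] = struct{}{}
}

// Commit replays all presumptions with one batch read, like
// CheckLazyConditionPairs calling snapshot.BatchGet. In TiDB that batch
// read is the point that can back off with txnLockFast.
func (t *toyTxn) Commit(batchGet func(keys []string) map[string][]byte) error {
	keys := make([]string, 0, len(t.presumeNil))
	for k := range t.presumeNil {
		keys = append(keys, k)
	}
	found := batchGet(keys)
	for _, k := range keys {
		if _, ok := found[k]; ok {
			return errors.New("key already exists: " + k)
		}
	}
	return nil // proceed with two-phase commit (elided)
}

func main() {
	txn := &toyTxn{writes: map[string][]byte{}, presumeNil: map[string]struct{}{}}
	txn.Insert("a", []byte("1"))
	snapshot := func(keys []string) map[string][]byte {
		return map[string][]byte{"a": []byte("old")} // "a" already exists
	}
	fmt.Println(txn.Commit(snapshot)) // key already exists: a
}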
- Scan keys in tikv
func (s *Scanner) Next() error {
	// ...
	s.resolveCurrentLock(bo, current)
	// ...
}

func (s *Scanner) resolveCurrentLock(bo *Backoffer, current *pb.KvPair) error {
	// ...
	// A point get through the snapshot resolves the lock on the current
	// key; like BatchGet above, it may back off with txnLockFast.
	s.snapshot.get(bo, kv.Key(current.Key))
	// ...
}
- Coprocessor
// handleCopResponse checks the coprocessor Response for region split and lock,
// returns more tasks when that happens, or handles the response if no error.
// If we're handling a streaming coprocessor response, lastRange is the range
// of the last successful response, otherwise it's nil. (Irrelevant code elided.)
func (worker *copIteratorWorker) handleCopResponse(bo *Backoffer, resp *copResponse, task *copTask, ch chan<- *copResponse, lastRange *coprocessor.KeyRange) ([]*copTask, error) {
	// ...
	logutil.Logger(context.Background()).Debug("coprocessor encounters",
		zap.Stringer("lock", lockErr))
	ok, err1 := worker.store.lockResolver.ResolveLocks(bo, []*Lock{NewLock(lockErr)})
	if err1 != nil {
		return nil, errors.Trace(err1)
	}
	if !ok {
		// The lock could not be resolved yet: back off with boTxnLockFast
		// and let the task be retried.
		if err := bo.Backoff(boTxnLockFast, errors.New(lockErr.String())); err != nil {
			return nil, errors.Trace(err)
		}
	}
	// ...
}
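All three trigger paths share one shape: try to resolve the lock, and if it is still alive, back off with boTxnLockFast and retry the same read with an unchanged read_start_ts. A self-contained sketch of that loop with toy types (readWithBackoff and its parameters are illustrative, not TiDB's Backoffer API):

package main

import (
	"errors"
	"fmt"
	"time"
)

// lock is a toy stand-in for TiDB's Lock.
type lock struct{ key string }

// readWithBackoff models the shared pattern: a read that hits a lock tries
// to resolve it; if the lock is still alive it sleeps (the txnLockFast
// backoff) and retries the same read, keeping read_start_ts unchanged so
// the snapshot stays consistent.
func readWithBackoff(readStartTS uint64, read func(ts uint64) (*lock, []byte, error),
	resolve func(*lock) bool) ([]byte, error) {

	sleep := 100 * time.Millisecond // base backoff; toy value
	for attempt := 0; attempt < 10; attempt++ {
		l, val, err := read(readStartTS)
		if err != nil {
			return nil, err
		}
		if l == nil {
			return val, nil // no lock encountered
		}
		if resolve(l) {
			continue // lock resolved, retry immediately
		}
		time.Sleep(sleep) // back off, then retry with the SAME read_start_ts
		sleep *= 2        // toy exponential growth; the real Backoffer also caps and jitters
	}
	return nil, errors.New("backoff budget exhausted")
}

func main() {
	tries := 0
	read := func(ts uint64) (*lock, []byte, error) {
		tries++
		if tries < 3 {
			return &lock{key: "k"}, nil, nil // locked for the first two reads
		}
		return nil, []byte("value"), nil
	}
	resolve := func(*lock) bool { return false } // writer still alive
	v, err := readWithBackoff(42, read, resolve)
	fmt.Println(string(v), err)
}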