System design: Dapper
Design a Dapper-like system for tracing and monitoring
Requirements: drill down into a given request and see its sequence of actions
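- Embed the following fields in the header of every request
- Trace id: id of the root request, shared by every session in the trace
- Parent session id: the session that called this request
- Current session id: the session handling this request
- Seq number: marks the layer and is used to de-dup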
- Sample so that we write no more than 1k records per sec
- Can just store them in a table keyed by (traceid, session)
- Upon viewing, load all sessions of the trace on the fly and compute the tree (cheap, since < 100 sessions per trace)
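The "load all sessions and compute the tree" step can be sketched as below. This is a minimal illustration, not a definitive implementation: the Session record, its field names, and the reading of seq as "order among siblings, also used to drop duplicate writes" are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    # Hypothetical row shape; field names are assumptions, not from a real schema.
    trace_id: str
    session_id: str
    parent_id: Optional[str]  # None for the root request
    seq: int                  # assumed: order among siblings, also used to de-dup
    name: str


def build_tree(rows):
    """Return (root, children) for all sessions loaded for one trace."""
    # De-dup: keep one row per (session_id, seq); repeated writes are assumed identical.
    unique = {(r.session_id, r.seq): r for r in rows}.values()
    children = defaultdict(list)
    root = None
    for s in unique:
        if s.parent_id is None:
            root = s
        else:
            children[s.parent_id].append(s)
    for kids in children.values():
        kids.sort(key=lambda s: s.seq)
    return root, children


def render(root, children, depth=0):
    """Flatten the tree into indented lines, one per session."""
    lines = ["  " * depth + root.name]
    for child in children.get(root.session_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


example_rows = [
    Session("t1", "a", None, 0, "frontend"),
    Session("t1", "b", "a", 0, "auth"),
    Session("t1", "c", "a", 1, "search"),
    Session("t1", "d", "c", 0, "index"),
    Session("t1", "c", "a", 1, "search"),  # duplicate write, dropped by de-dup
]
root, children = build_tree(example_rows)
print("\n".join(render(root, children)))
```

With fewer than 100 sessions per trace, one pass over the rows plus a small sort per parent is enough, so nothing needs to be precomputed at write time.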
Requirements: calculate avg latency by node id and time
- Time-series DB, similar in design to Prometheus
- For each metric, we maintain (timestamp, machineid, sequence) -> (labels, value)
- Suppose 100 metrics are sent per sec. Taking 3 months as roughly 100 days, that is 100 * 86400 * 100 = 864M, i.e. about 1B data points, so plotting the data over 3 months scans at most about 1 billion rows.
- The scan can be parallelized per point on the graph: each plotted point aggregates its own time range independently of the others
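A rough sketch of the per-point aggregation follows, assuming rows are kept sorted by (timestamp, machineid, sequence) and that the value is a latency in ms. The function names, the in-memory list standing in for the table, and the thread pool standing in for parallel workers are all assumptions for illustration.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Each row: (timestamp_sec, machine_id, sequence, latency_ms), sorted by key.


def bucket_average(rows, timestamps, start, end, machine_id):
    """Average latency of one machine over the time range [start, end)."""
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    values = [r[3] for r in rows[lo:hi] if r[1] == machine_id]
    return sum(values) / len(values) if values else None


def plot_points(rows, machine_id, start, end, step):
    """Compute one (bucket_start, avg_latency) pair per point on the graph."""
    timestamps = [r[0] for r in rows]
    starts = list(range(start, end, step))
    # Buckets cover disjoint time ranges, so each one can be computed independently.
    with ThreadPoolExecutor() as pool:
        averages = pool.map(
            lambda s: bucket_average(rows, timestamps, s, min(s + step, end), machine_id),
            starts,
        )
        return list(zip(starts, averages))


example_rows = sorted([
    (0, "m1", 0, 10.0),
    (30, "m1", 1, 20.0),
    (30, "m2", 0, 99.0),
    (60, "m1", 2, 40.0),
    (130, "m1", 3, 50.0),
])
print(plot_points(example_rows, "m1", 0, 180, 60))
```

Because the key starts with the timestamp, each bucket maps to one contiguous range of rows, which is what lets the buckets be handed to separate workers.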