Design Problem: Pagination
Problem
- composite PK for message (msg_id, order_id, post_id)
- We want to paginate by the time
- data is sharded across DBs
- partition key is id % shards to ensure even distribution among shards
- We want to be able to go to a SPECIFIC page with known page size.
Of course complete view can do that, but can we do better?
100% accurate: next page only
- From product PoV, allows only next page, instead of going to the specific page.
- Every page we fetch, we remember the end of the timestamp at the current page
- When we look for the next page, we just use that last timestamp to pull the full page from each shard, and aggregate them in memory
Mostly accurate: guess the shard
- We just assume the data is evenly distrubuted
- Therefore, for each specific page we can compute exactly the offset, and page size. Both will be the original size/number of shards
100% accurate: good perf but more complicated
- Similar to the random partition method, pull pages from sharded db based on guessed offsets, but real page number, call the returned page RS
- Find the min_time of all 3 pages returned
- Query again with the time between min_time and the max time returned by each page in RS
- Find the virtual offset of min_time of in each shard’s page, and now we know the exact offset of min_time from the global PoV. Note that min_time global offset will be less than the target offset for sure.
- Now that with global view, we have enough information in memory to calculate the real offset from each record