Note: capacity estimation, storage design, and computation design are interdependent, so revising one will keep triggering updates to the others.

Requirements

  • Each kiosk has multiple delivery boxes, of different dimensions
  • When a user places an order, the system should return up to 3 kiosks that can accept the user’s delivery
  • Handle North American (NA) traffic

Capacity estimation

  • 500 million NA users; assume each places 1 order on Black Friday. Peak traffic ≈ 500 million / 90K seconds per day × 4 (peak multiplier) ≈ 22K TPS, rounded up to 25K TPS max
  • Each kiosk has 100 boxes. Assuming 10% of people buy on a normal day, one kiosk can support a population of 1K, so 500K kiosks are more than enough for 500 million users
  • Each kiosk covers a 0.5 mile radius with little overlap
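
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate, using the figures from the bullets; the constant names are made up for illustration, and the factor of 4 is taken as-is from the notes.

```python
# Back-of-the-envelope numbers taken from the capacity bullets.
USERS = 500_000_000        # NA users, each placing 1 order on Black Friday
SECONDS_PER_DAY = 90_000   # the notes round 86,400 up to 90K
PEAK_MULTIPLIER = 4        # multiplier used in the notes
BOXES_PER_KIOSK = 100
DAILY_BUY_RATE = 0.10      # assumed share of people ordering on a normal day

avg_tps = USERS / SECONDS_PER_DAY
peak_tps = avg_tps * PEAK_MULTIPLIER

# 100 boxes serve 100 daily orders, i.e. a population of 100 / 10% = 1K.
people_per_kiosk = BOXES_PER_KIOSK / DAILY_BUY_RATE
kiosks_needed = USERS / people_per_kiosk

print(f"average TPS:      {avg_tps:,.0f}")
print(f"peak TPS:         {peak_tps:,.0f}")
print(f"people per kiosk: {people_per_kiosk:,.0f}")
print(f"kiosks needed:    {kiosks_needed:,.0f}")
```

The computed peak is a little over 22K TPS, so planning for 25K TPS leaves some headroom.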

Storage layer

  • Kiosk needs to keep track of:
      • geohash (needs an index for range search)
      • number of boxes
  • Box needs to keep track of:
      • dimensions (H, L, W)
      • parent kiosk
  • Status needs to keep track of:
      • mapping from box to package IDs (one-to-many)
      • parent kiosk, if status is kept in a separate service
  • Since kiosk info is rarely updated, kiosk and box info can be served from read slaves or a cache to scale out reads.
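
One possible shape for these three records, as a minimal sketch. The class and field names (Kiosk, Box, BoxStatus, kiosk_id, and so on) are hypothetical and not part of the notes; an actual schema would be DB tables with an index on geohash.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Kiosk:
    kiosk_id: str
    geohash: str    # indexed, so nearby kiosks can be range-searched
    num_boxes: int


@dataclass(frozen=True)
class Box:
    box_id: str
    kiosk_id: str   # parent kiosk
    height: float
    length: float
    width: float


@dataclass
class BoxStatus:
    box_id: str
    kiosk_id: str   # kept here too in case status lives in a separate service
    package_ids: List[str] = field(default_factory=list)  # one box, many packages

    @property
    def is_empty(self) -> bool:
        # A box with no packages assigned is available.
        return not self.package_ids
```

Kiosk and Box are frozen because they rarely change and are safe to cache; BoxStatus is the mutable, write-heavy part.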

Computation sequence

  • Fetch the nearest 10 kiosks from one of the DB’s read slaves by range searching on geohash
  • Fetch the empty boxes (< 1K total, since 10 kiosks × 100 boxes) in those kiosks, and filter in memory for boxes that fit, i.e., W, L, and H each greater than the package’s W, L, and H
  • Return to the user up to 3 kiosks that have a fitting empty box
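
The sequence above can be sketched in memory as follows. This is an illustration under stated assumptions, not a definitive implementation: a sorted list of (geohash, kiosk_id) pairs stands in for the indexed DB column, all function and field names are hypothetical, and a single geohash prefix only approximates "nearest" (a real lookup would also check neighboring cells and rank by distance). Package rotation is not attempted.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Tuple

Dims = Tuple[float, float, float]  # (W, L, H)


@dataclass(frozen=True)
class Box:
    box_id: str
    kiosk_id: str
    dims: Dims
    empty: bool


def kiosks_in_cell(index: List[Tuple[str, str]], prefix: str,
                   limit: int = 10) -> List[str]:
    """Range-search a sorted (geohash, kiosk_id) list for one geohash prefix."""
    lo = bisect_left(index, (prefix, ""))
    # Geohash characters are all below "\x7f", so this bounds the prefix range.
    hi = bisect_left(index, (prefix + "\x7f", ""))
    return [kiosk_id for _, kiosk_id in index[lo:hi][:limit]]


def fits(box: Dims, package: Dims) -> bool:
    """True if every box dimension is greater than the package's."""
    return all(b > p for b, p in zip(box, package))


def find_kiosks(index: List[Tuple[str, str]],
                boxes_by_kiosk: Dict[str, List[Box]],
                prefix: str, package: Dims,
                max_results: int = 3) -> List[str]:
    """Return up to max_results kiosks that have a fitting empty box."""
    result: List[str] = []
    for kiosk_id in kiosks_in_cell(index, prefix):
        boxes = boxes_by_kiosk.get(kiosk_id, [])
        if any(b.empty and fits(b.dims, package) for b in boxes):
            result.append(kiosk_id)
            if len(result) == max_results:
                break
    return result


# Example data: two kiosks in cell "9q8yy", one elsewhere.
index = sorted([("9q8yyk", "k1"), ("9q8yym", "k2"), ("9q9p1d", "k3")])
boxes = {
    "k1": [Box("b1", "k1", (10, 10, 10), empty=True)],
    "k2": [Box("b2", "k2", (30, 30, 30), empty=True)],
    "k3": [Box("b3", "k3", (50, 50, 50), empty=True)],
}
print(find_kiosks(index, boxes, "9q8yy", (20, 20, 20)))
```

In the example, k3 is outside the searched cell and k1's only box is too small, so only k2 qualifies.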