1. Mutex lock/unlock is in the same range as a main-memory access: ~100 ns = 0.0001 ms. L2 cache access is ~7 ns.
  2. Reading 1 MB sequentially from RAM - 0.25 ms.
  3. Ping between AWS AZs. Remember to allow ICMP (which is its own IP protocol, not TCP/UDP) in the security group rules.
    • Same AZ, 0.3 ms
    • ap-northeast-1a to ap-northeast-1c, 2.3 ms
    • ap-northeast-1a to ap-northeast-1d, 1.6 ms
    • ap-northeast-1c to ap-northeast-1d, 0.8 ms
  4. Reading 1 MB over the network: 10 ms. Note this number seems to include the cost of copying from kernel space to user space.
  5. Reading 1 MB sequentially from a spinning disk: 30 ms, excluding seek time. Sequential read throughput is ~1 GB/s for SSD and ~4 GB/s for RAM. A random 4 KB read from SSD is about 0.15 ms.
    • Disk seek is ~10 ms on a traditional (spinning) disk, ~0.1 ms on an SSD.
    • For a traditional disk, writes perform about the same as reads, but add one full rotation plus block size / transfer rate.
    • SSDs are log-structured and have high raw error rates, but these are masked by the on-device logic. They may not survive multiple power outages, but mean time to failure is > 6 years.
    • For SSDs, sequential writes improve throughput over random writes (250 MB/s vs 25 MB/s) while latency is roughly the same. Sequential and random reads have roughly the same throughput and latency.
  6. Round trip between California and the Netherlands is ~150 ms. SF to NYC is about 40 ms.
  7. Compressing 1 KB with Zippy is in the same range as sending 1 KB over a 1 Gbps network: ~0.01 ms.
  8. An empty object in Java has an 8-byte header (some sources claim 12 bytes, which is the case on 64-bit JVMs with compressed class pointers). The actual object size is padded up to the nearest multiple of 8 bytes, so it is larger than the naive estimate.
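
To make these numbers concrete, the sketch below (plain Python, using only the figures listed above) compares how long it takes to move 1 GB sequentially from RAM, SSD, and the network. It is back-of-envelope arithmetic, not a benchmark.

```python
# Back-of-envelope: time to move 1 GB sequentially, using the numbers above.
# Order-of-magnitude figures only.

GB = 1024  # in MB

ram_ms_per_mb = 0.25      # 1 MB sequential read from RAM
network_ms_per_mb = 10    # 1 MB over the network (incl. kernel->user copy)
ssd_mb_per_s = 1024       # ~1 GB/s sequential SSD read

print(f"RAM:     {GB * ram_ms_per_mb / 1000:.2f} s")      # ~0.26 s
print(f"SSD:     {GB / ssd_mb_per_s:.2f} s")              # ~1 s
print(f"Network: {GB * network_ms_per_mb / 1000:.2f} s")  # ~10 s
```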

Numbers for capacity planning

These are just rules of thumb. The strongest data point is a PoC on the projected workload.

  • Nginx
    • at least 20k concurrent connections per machine
  • Tomcat
    • Between 200 and 2k QPS per instance. Use Little's law to estimate (see the worked sketch after this list).
      • Timeouts for microservices: try not to go above 1 s; aim for 200 ms at the 90th percentile.
      • Each day has 86.4k seconds, though for estimates we probably want to use 40k rather than ~80k to account for off-peak hours.
      • Without other supporting data, we can assume 80% of traffic arrives within 20% of the time, i.e., peak is 4x the average traffic.
    • Default thread pool size = 200. Max number of threads per OS is ~3-5k; 500 threads in a single process is OK.
    • For an OLTP workload, the most likely bounding factor is blocking calls, followed by CPU. During daily peaks, CPU usage should be between 10 and 30 percent.
  • Redis
    • 20k to 50k QPS, mostly bounded by network rather than CPU
    • CPU usage should not be > 15%
  • MySQL
    • Default max_connections = 151. Note that the RDS default is derived from the instance type. Normally 1.5k peak requests/sec per instance is no problem, but it is something to watch out for when it reaches 3k rps.
    • InnoDB's default lock wait timeout (innodb_lock_wait_timeout) is 50 s, i.e., a blocked transaction waits at least 50 s before it gives up.
    • In an OLTP workload, each txn should take no more than 10 ms on average at the DB layer.
    • A c5.4xlarge (16 vCPUs, 32 GB RAM) should be enough to handle a 500 TPS OLTP workload; needing more than that usually means a problem with the code.
  • Kafka
    • 95th percentile latency at 5 ms, with replication factor = 3, within the same DC
    • Most likely network- and memory-bound rather than CPU-bound
    • ~2M records per second on 3 cheap nodes in their standard benchmark
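
As a worked example of the Little's-law estimate referenced in the Tomcat section, the sketch below sizes a Tomcat fleet. The daily request count is a made-up input; the other constants (40k effective seconds, 4x peak factor, 200 ms p90 target, 200-thread default pool) come from the notes above.

```python
# Little's law: concurrency L = arrival rate (lambda) * latency (W).
# Sketch of the Tomcat sizing estimate; requests_per_day is a hypothetical
# input, the other constants come from the notes above.

requests_per_day = 10_000_000          # assumed workload, not from the notes
effective_seconds_per_day = 40_000     # ~86.4k s/day, reduced for off-peak hours
peak_factor = 4                        # 80% of traffic in 20% of the time
p90_latency_s = 0.2                    # 200 ms target at the 90th percentile
threads_per_instance = 200             # Tomcat default thread pool size

avg_qps = requests_per_day / effective_seconds_per_day
peak_qps = avg_qps * peak_factor
concurrent_requests = peak_qps * p90_latency_s            # Little's law
instances = max(1, -(-concurrent_requests // threads_per_instance))  # ceil

print(f"avg {avg_qps:.0f} QPS, peak {peak_qps:.0f} QPS, "
      f"~{concurrent_requests:.0f} in-flight requests -> {instances:.0f}+ Tomcat instances")
```

At these assumed numbers, a single instance exactly saturates the default 200-thread pool at peak, which is consistent with the 200-2k QPS per-instance range above and with blocking calls, not CPU, being the usual bounding factor.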

AWS cost saving tips

  • Recycle unattached EBS volumes; they are not reclaimed when you retire the EC2 instances they were attached to (see the sketch after this list).
  • You rarely need more than 15k IOPS outside the storage layer; 30k is more than enough for most storage layers.
  • On-demand instances should not be more than 1/3 of the fleet; the rest should be a mixture of RIs and spot instances.
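
A minimal sketch for the first tip, assuming boto3 with credentials and region already configured: list EBS volumes in the "available" (unattached) state so they can be reviewed and deleted.

```python
import boto3

# Sketch: list EBS volumes in the "available" state, i.e. not attached to
# any instance, so they can be reviewed and deleted to save cost.
# Assumes AWS credentials/region are already configured for boto3.

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_volumes")

for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
    for vol in page["Volumes"]:
        print(vol["VolumeId"], vol["Size"], "GiB", vol["VolumeType"])
        # After review, delete with: ec2.delete_volume(VolumeId=vol["VolumeId"])
```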