(Un)reliability of timing code
Suppose we have code
long epochBefore = currentTime()
//A
remoteSlowCall()
//B
long epochAfter = currentTime()
long callTime = epochAfter - epochBefore
//C
- Suppose we have full GC happened at A or B, then callTime is much longer than the actual cost! For the same reason, in production, STW is a common cause for read timeout
- For the similar reason, with epochBefore, use only epochAfter at C is not reliable either, because it includes GC time! Therefore, we need to record BOTH epochBefore and epochAfter to perform sanity check and detect if gc possibly happened, and extend potential expiry time/time out to take GC time into consideration