Codeforces Notes

2019-02-19 00:00:00 +0000

113B: Petr#

  • find all S(begin) and S(ends)
  • for each i..j, calculate its rolling hash, add it to the corresponding map hash -> (start, end), if i is in S(begin), and j is in S(end)
  • for each entry in the map, add the size of the entry - # of duplicates to the final answer

633C: Spy Syndrome 2

  • reverse the whole sentence
  • caluclate the rolling hash of each word in the dictionary
  • DP on the suffix size. The overall run time will be O(n), because for each check, we reduce the overall problem size by 1,because we search for the FIRST matching one

965C: Greedy Arkady

  • bsearch on the times A gets, note that D is no more than 1000, we can brute force on all possible Ds, and find the max possible factor A gets
  • k * D * x >= n, K * (D - 1) * x <= n - x, also, x <= M
  • The target function is D * x

448D: Multiplication Table

  • bsearch on answer, at least iteration we can afford an O(n) iteration, i.e., just calculate through all rows to see if number < the target number matches, also keep track of equal coutns

Never Split the Difference

2019-02-07 00:00:00 +0000

  • Try NOT to speed up process to get a “Yes”, very easy to leave the impression that you are ignoring your counterpart.
  • Avoid direct/assertive voice - easy to put people into defensive mode. Use a positive/playful voice by default
  • Try to trigger “No” from counter part, so that the other part feels safe. You can force a “No” by mislabelling
    • If very hard for the counterpart to say “No”, just walk away, it is too early
  • When you talk about numbers, use precise, odd ones to so that it APPEARS to be thought after
  • SLOW DOWN
  • NEVER use “I understand”
  • Liar is more likely to use longer sentences, third-party pronoun, and more complex sentences
  • Use the other side’s own name - they want to be called!
  • The more I, me, my one uses, the less impact one has in decision making. Other questions to identify deal-breakers:
    • How does this affect the rest of them?
    • How on board are people with this call?
    • What do your team see as the main challenge in this area?
  • Labelling, which works well against analyst such as myself
    • It seems/looks/sounds like… DO NOT use I, otheriwse it looks like you are self-centered!
    • pause 3-4 secs
    • Rinse and repeat
    • I didn’t say that is the actual case, I just said it seems that.
  • Anger shows your passion, but reduces your counterpart’s congintive activity due to the emotional mode. You can de-escalate by proposing a time-out. Try to calm emotion before listing theories and facts
  • Something doesn’t make sense means chance to discover unknowns! Best discovered during face-to-face communications, near breaks or meeting ends. Hard to discover during emails
  • The word “fair” is often used so the other side goes defensive and gives concessions. Common cases:
    • We just want what is fair -> OK I am sorry, let us get back to where the fair part begins
    • We have given you a fair offer -> Mirror: Fair? + Label: It seems that you have evidence to support it.
    • Powerful: Stop me at any time if you feel I am being unfair
  • List the worst thing the other side could say and say them first, this will encourage the counterpart to say what the opposite is true
  • When you sell yourself to the manager, sell yourself as ways to prove their own intelligence and that they can broadcast to the rest of the company, i.e., you are the proof of HIS importance, let him have a stake in your success
  • Power Sales Negotiators know that splitting the difference does not mean splitting it down the middle. Just split the difference twice and the split becomes 75 percent/25 percent. Furthermore, you may be able to get the other party to split the difference three or more times.
  • The first thing to remember is that you should never offer to split the difference yourself, but always encourage the other person to offer to split the difference.

How to negotiate with an accommodator

  • Time spend communicating is a sign of building relationship, which they value
  • Ask what/how questions. This lets the counterpart feel you need their intelligence to overcome the problem and they have the control on defining success. Such feeling appeals much to people with egos. It also makes people feel it is THEIR idea.
    • Avoid binary questions
    • Avoid when/where questions which can be answered without too much thinking
    • Avoid why questions, because it is easy to trigger people into fight or flight mode
    • Concretely, AFTER they agree with something for the first time. Label and summarize their answer to get a “that’s right”, and follow up with a what/how questions
      • “What do you see as being the most difficult thing”
      • “What is the biggest challenge you face?”
      • “What is the success measure”
      • “how will we know we are on track”
      • What about this is important to you?
      • How can I help to make this better for us?
      • How would you like me to proceed?
      • What is it that brought us into this situation?
    • How can we solve this problem?
    • What’s the objective? / What are we trying to accomplish here?
    • How am I supposed to do that?
      • “how do we address things if we are off-track”
  • Communicating is good sign, and slience suggests angry
  • Easy to be distracted and not good time manager
  • Hesitation comes more often in tone and body language, and NOT in words
  • Accommodators are more likely to oversell/overpromise.

How to negotiate with an assertive

  • Analytical is the worst type match with assertives. Analyst type does not enjoy “What”/”How” questions.
  • To assertives, getting things done is more important than making it perfect.
  • They need to think you understand them before listening to your pov
  • They will keep talking on every moment of slience. This leaves the room for mirroring technique
    • Use a slow, calm voice to create authority and trustworthiness. Note this voice is normally used to make a point and should NOT be the default voice
    • “I’m sorry…”
    • Repeat the last 3 words or the last critical 1-3 words of what the assertive just said
    • Keep slient and let the other side speak. This also helps reveal their strategy
    • Rinse and repeat if needed
  • Try NOT to offer trade-offs/reciprocity, because assertives will ask for more. Conversely, if they give up something, they expect something equal or more in return.
  • Get “That’s right” from them, because that is a sign they FEEL they own the conclusion.
    • Can follow up with a “According to $(name)…”
    • On the other hand, “you are right” means almost nothing, and should not definitely taken as sign of progress
    • I will try. You are right = I plan to fail. In this case, drill again with a “how” when you get a “that’s right”
  • When the negotiation is going off-track/losing focus, just invite them to re-engage

On ThreadPoolExecutor

2019-01-30 00:00:00 +0000

  • NOT recommend to create via Executors static factory, instead, use ThreadPoolExecutor. Recommend to use newCacheThreadExecutor, defaults to ThreadPoolExecutor
  • Note that in general try NOT to create new threads explicitly. Let TP provide one.
  • To handle exception in the worker
    • try-catch inside the runnable
    • submit() to execute, Future.get() to handle ex
    • TPE.afterExecute()
    • pass in Thread.UncaughtExceptionHandler
  • Constructor params
    • corePoolSize
    • long keepAliveTime
      • When # of threads > cores, idle threads will wait this long before got terminated
      • Defaults to 60s
    • maximumPoolSzie, i.e., we can use it to set constant size TP.
    • workQueue
      • Capacity defaults to Integer.MAX_VALUE!
      • Insertion is done by BlockingQueue.offer(), which will not throw exception or block.
    • threadFactory
    • RejectedExecutionHandler handler - when out of thread range or queue capacity
  • The state of TPE,i.e., highest 3 bits of the atomic ctl value
    • RUNNING
    • SHUTDOWN: shutdown(), no new task will be accepted, but will finish executing the jobs in the task queue
    • STOP: shutdownNow(), no new task, not executing tasks in the q
    • TIDYING: all tasks completed, will execute terminated()
    • TERMINIATED: after termiated() is run
  • submit() returns Future vs execute() returns nothing

Building pieces of the TPE

class Worker extends AbstractQueuedSynchronizer implements Runnable
{
       Runnable firstTask;
       final Thread thread;
       volatile long completedTasks;
}
  • thread is created in the worker’s ctor, by the thread factory passed in
  • Worker is a nested class of the TPE. In fact, TPE has a reference to Worker too
class FutureTask{
        Callable callable;
        private volatile int state;
        private Object outcome; // non-volatile, protected by state reads/writes
        private volatile Thread runner;
}

Future<?> submit(Runnable task) {
        RunnableFuture<Void> tftask = new Futuretask<T>(runnable,  null); //uses Executors.callable(runanble, null)
        execute(tfask);
        return ftask;
}
  • runner is CASed as the Thread.currentThread() from null. This also acts as the mutex check
  • callable is turned from runnable by the Executor. Callable returns result and may throw exceptions

TPE.execute()

void execute(Runnable command) {
        int c = ctl.get();

        if(wokerCountOf(c) < corePoolsize) {
                if(addWorker(command, true))
                        return;
                c = ctil.get();
        }

        if(isRunning (c) && workQueue.offer(command)){ //added to the blocking queue successfully
                int recheck = ctl.get();
                if(!isRunning (recheck) && remove(command))
                        reject(command);
                else if (wokerCountOf(recheck) == 0)
                        addWorker(null, false);
        } else if (!addWorker(command, false)){
                        reject(command);
        }
}
  • TPE.execute() is the interface to outside, i.e., directly called by submit()
  • If TPE is running, and we can add the command to the task q. Note that we have to double check the state again because the offer operation is not an atomic step of isRunning and try removing it it. Similarly, the two else if logics are quite similar because we essentialy performed the check twice.
  • If isRunning try adding the worker. Note that addWorker side checks the TPE state, so when the TPE is not running, the new task will not be run

TPE.runWorker(Worker w):

while (task != null || (task = getTask()) != null) {
    w.lock();
    if ((runStateAtLeast(ctl.get(), STOP) ||
         (Thread.interrupted() &&
          runStateAtLeast(ctl.get(), STOP))) &&
        !wt.isInterrupted())
        wt.interrupt();
    try {
        beforeExecute(wt, task);
        Throwable thrown = null;
        try {
            task.run();
        } catch (Throwable x) {
            thrown = x; throw new Error(x);
        } finally {
            afterExecute(task, thrown);
        }
    } finally {
        task = null;
        w.completedTasks++;
        w.unlock();
    }
}
  • This is called from worker and, by extension, worker’s thread since worker is an inner class
  • Standard lock() and then unlock() inside finally
  • getTask() checks if it can poll a task with keepAliveTime, if it couldn’t work, it will return a null. This is how the non-core threads are terminated

Shutting down TPE

void shutdown() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try{
                checkShutdownAccess();
                advanceRunState(SHUTDOWN);//CAS in a loop to set state. No actual interruption
                interruptIdleWorkers();
                onShutdown();
        }finally{
                mainLock.unlock();
        }

        tryTerminate();

}

List<Runnable> shutdownNow() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try{
                checkShutdownAccess();
                advanceRunState(STOP);
                interruptWorkers(); //uses Thread.interrupt(), which just sets the mark, needs Thread.interrupted() inside that thread
                tasks = drainQueue();
        }finally{
                mainLock.unlock();
        }

        tryTerminate();
        return tasks;
}

threadPool.shutdown(); //have to wait for the termination explicitly
while(!threadPool.awaitTermination(60, TimeUnit.SECONDS)){
        //busy waiting here, most likely you need to add the max retry count
}

FutureTask.run()

if (state != NEW ||
    !UNSAFE.compareAndSwapObject(this, runnerOffset,
                                 null, Thread.currentThread()))
    return;
try {
    Callable<V> c = callable;
    if (c != null && state == NEW) {
        V result;
        boolean ran;
        try {
            result = c.call();
            ran = true;
        } catch (Throwable ex) {
            result = null;
            ran = false;
            setException(ex);
        }
        if (ran)
            set(result);
    }
} finally {
    // runner must be non-null until state is settled to
    // prevent concurrent calls to run()
    runner = null;
    // state must be re-read after nulling runner to prevent
    // leaked interrupts
    int s = state;
    if (s >= INTERRUPTING)
        handlePossibleCancellationInterrupt(s);
}
  • CAS at the start uses current thread id as the optimistic lock. Hence, runner is set to null in the finally

On CMS GC

2019-01-28 00:00:00 +0000

Commonly used in old gen GC. New gen uses ParNew

Heap size

  • normally 2-6 G, higher than that becomes much more debatable.
  • common to set min heap size same as max heap size to reduce OS scheduling cost

Goal: shortest stop the world time.

  • tag objects directed referenced by GC roots
  • Remark: fix changed introduced during the concurrent tag stage
  • By default GC thread 3 = (CPU + 3) /4
  • Still need to mark & sweep for old ge
  • Old gen will have frag problem that may trigger full gc too much
  • New gen GC strategy is same as ParNew
  1. STW, init mark: mark old gen objects DIRECTLY reachable from the GC root, and old gen objects referred by the new gen.
  2. Concurrent mark: use GC root tracing to tag all garbages, user thread still running
  3. Pre-sweep: tag old gen survival objects
  4. Interruptable pre-sweep
  5. STW: retag, scan GC roots + new gen + old gen objects with dirty cards
  6. concurrent sweep: mark unreachable object
  7. concrrent reset: reset gc state and enter the next GC cycle

Manage oneself

2019-01-21 00:00:00 +0000

Reading notes on the essay of same name by Peter Drucker

  • Most self-evaluation on one’s own strength and weakness are not reliable - has to go through proper try-feedback loop over some time, AT LEAST couple of months
  • Focusing on improving strength instead of weakness
  • Things need to figure out on people, again, NOT by self-evaluation:
  • Reader or listener, i.e., needs preparation or on the spot
  • Work with people or alone?
  • Advisor or decision maker?
  • Under stress or predicative environment?
  • Big corporation or small one?
  • When short term results are in conflict with long-term growths. A company’s value will determine the priority
  • A Plan > 18 months is hard to stay clear and specific
  • The key to up-management is to understand in which your boss works in the most efficient way, and adjust to that
  • Communicate with your 360 clearly:
    • This is what I am good at.
    • This is how I work.
    • These are my values.
    • This is the contribution I plan to concentrate on and the results I should be expected to deliver
    • And ask them the same thing (NOT via plain self-evaluation though!)
  • The existence of trust between people does not necessarily mean that they like one another. It means that they understand one another
  • Don’t try hard to change people’s behaviors

(Un)reliability of timing code

2019-01-20 00:00:00 +0000

Suppose we have code

long epochBefore = currentTime()
//A
remoteSlowCall()
//B
long epochAfter = currentTime()

long callTime = epochAfter - epochBefore
//C
  • Suppose we have full GC happened at A or B, then callTime is much longer than the actual cost! For the same reason, in production, STW is a common cause for read timeout
  • For the similar reason, with epochBefore, use only epochAfter at C is not reliable either, because it includes GC time! Therefore, we need to record BOTH epochBefore and epochAfter to perform sanity check and detect if gc possibly happened, and extend potential expiry time/time out to take GC time into consideration

On red black tree

2019-01-20 00:00:00 +0000

Invariants

  • Root node is black, leaf nodes are EMPTY black nodes
  • Parent and child can NOT be both R
  • From any node, the paths to ALL leaves pass through same # of B

Insertion

  • Use the standard BST way to insert a R leaf node, and then try to fix the invariant
  • Case 1: P is B, we are good, because the old paths to the P can not extened to the current node (the new leaf that replaced the old P), with no additional cost
  • Case 2: P is R, and sibling of P is R too
  • Paint U and P B, and G R
  • RECURSIVELY fix the invariants with G (now R) as the leaf param
  • Case 3: P is R, and U is B or U does not exist
  • if necessary, left rotate or right rotate P so that C is on the “outside” of the G, i.e., it is not the left-right or right-left of G. No color change at this step
  • Then rotate the grandparent to make C the new grand parent, and paind the old G red, C black

Removal

  • Idea similar to BST - we just find the in-order predecessor or successor to the node , swap the value, and then delete that predecessor/successor node
  • However, the actual cases are complicated. If both P and child are B, there are SIX cases to consider

On Load Balancing

2019-01-14 00:00:00 +0000

  • Nginx does level 7 load balancing to the individual services based on url/service. The server behind LBs repond the request directly, because all servers will have the same VIP bound to their loopback interface
  • LVS does level 4 load balancing that LBs multiple Nginx instances. It uses keepalived, which is on active-standby model, to ensure LVS’s HA. Note that we don’t use keepalived on Nginx, since LVS will handle that.
    • Use lvs/f5 IN FRONT OF nginx, e.g. f5 can do 100k qps, and use the same keepalived + virtual IP trick on lvs, i.e., LVS will share the same VIP. This should be enough for most cases.
    • If a single LVS is not enougu, we can let DNS point to multiple virtual ips backed by the lvl 4 load balancer
  • All level 4 LB instances in the same cluster will have the same virtual IP and same virtual MAC address. Note that level 4 LB can easily handle 100k qps
  • DR mode: to change mac address on the data link layer. LB will not change ip address, but change target’s MAC address. All physical servicers and LB servers have the same virtual IP. The web server can respond to the user directly, bypassing LB
    • Use ARP to broacast VIP, and machine with such IP will reply with its own MAC address, but only LB can respond to the ARP reqeust
  • Problem with source IP hash with lvl 4 load balancing
    • When source IP is NAT gateway for large LAN
    • When mobile client reconnects, hard to go back to the original server with consistent hashing

Hashmap and ConcurrentHashmap

2019-01-07 00:00:00 +0000

Hashmap

  • Array init size defaults to 16. Double the size when it is 75% full
  • Length is always a power of two, because hash%length==hash&(length-1)
    • How to avoid/optimize rehash when resizing
  • When array length > 64 and some link list(LL) length > 8, this LL will be changed to red-black tree(RBT)
  • When iterating entrys, prefer EntrySet to retrive k & v at the same time.

Hashmap’s concurrency problem

  • When insert two elements with the same hash, while there is no existing node at the entry, in putVal()

    if ((p = tab[i = (n - 1) & hash]) == null) 
            tab[i] = newNode(hash, key, value, null);

and one node will be overwritten

  • When resize(), this may introduce a cycle in the link

ConcurrentHashMap

Node type


static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;

//Note that Node.setValue gives UnsupportedOperationException(),i.e., Node itself is immutable
//....

Members

    /**
     * The array of bins. Lazily initialized upon first insertion.
     * Size is always a power of two. Accessed directly by iterators.
     */
    transient volatile Node<K,V>[] table;

    /**
     * The next table to use; non-null only while resizing.
     */
    private transient volatile Node<K,V>[] nextTable;

Put

  • Assuming the normal case, I deleted/commented/erased out some lines for the ease of reading
  • CHM does not allow key or value to be null
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        int hash = spread(key.hashCode());
        int binCount = 0;//size in the list to decide if need to convert to RBT
        for (Node<K,V>[] tab = table;;) {
	Node<K,V> f;//current node at tbl 
	int n; //size of array, always a power of two 
	int i; //index at the tbl array
	int fh; //mod hash value
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
	    // (n-1) & hash, fast mod with 2^n, i,e. first time ever at this slot
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
//can't CAS, spin again
            }
            else if ((fh = f.hash) == MOVED)
	    //resizing is underway, let it finish first, may create a thread to speed up resizing
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
		if (tabAt(tab, i) == f) { 
		//now that we are in mutex, first check if the node value remains valid
                        if (fh >= 0) {
                            binCount = 1;
			    //linked list upsert
                        }
                        else if (f instanceof TreeBin) {
                            binCount = 2;
			    //RBT upsert
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
		    if (oldVal != null)
			//update case. no need to resize
                        return oldVal; 
                    break; //conclude spinning 
                }
            }
        }
	//attempt to help resize in case of insert
        addCount(1L, binCount); 
        return null;
    }
  • get() does not use any mutex or CAS, why does it work?
    • Every time we update the table, we acquire the mutex first, i.e., in effect only 1 thread updating volatiles. Therefore, reading volatiles becomes thread safe

Concurrent resizing

Inside transfer()

  • Mark copied node and empty node as ForwardingNode. When iterating, when we see such nodes, move to the next
  • Uses MIN_TRANSFER_STRIDE to lower bound the range
  • No new thread is spawned. It reuses threads in put().

On G1 GC

2019-01-03 00:00:00 +0000

How does G1 separate new gen and old gen?

Concepts

  • Collet Set(CSet): collecatable-region set. Survivors in the CSet will be moved to antoher region. LT 1% of heap size
  • Remembmered Set(RS): RS records Points-into,i.e., object references in other regions that use this region’s objects. It is a (region address, set of card table index) hashtable
  • Snapshot-At-The-Beginning (SATB): previous bitmap and next bitmap
  • Eviction failure (EF): new gen not able to find regions to accept survivors
  • jdk9 default, separate heap into same size regions. Some for new gen, and some for old gen.
    • Each block can be one of Eden/Survivor/Old. Note that new gen will still do recycle/promotion when it is full.
    • New gen GC is similar to ParNew, will STW and copy to survivor or old gen
  • Designed to handle cases where stop-the-world is unacceptable,e.g., HFT.
  • Humogous Object goes directly to old gen, may use mutliple continous regions to hold it

Steps (for mixed GC)

  • Initital Marking (IM)
    • concurrently clean up next marking bitmap,
    • pause all user threads, and scan and tag all DIRECTLY reachable objects from root in each region
    • Put the top value to the next top at mark start(TAMS)
    • resume all user threads
    • Root region scan - G1 scans survivor regions, and mark all referenced objects
  • Concurrent Marking
    • scan based on the objects found in IM
    • Use remembered set logs to track what is object relationship is changed concurrently
  • Final marking pause
    • process RSL and update RS.
    • process SATB logs
  • Live Data Counting and Cleanup
    • Note that the cleanup trigger time is based on max gc pause allowed and a few other steps

Optimization & Tuning

  • Goal is to avoid full gc and EF
  • Don’t set NewRatio yourself
  • Prioritize maxGCPauseMillis (default 200 ms) instead of configing others
  • Default Heap space reachs 45% usage will trigger concurrent

On Java Synchronized

2018-12-23 00:00:00 +0000

Synchronized

  • every object has a monitor lock, montiorenter and monitorexit
  • Synchronized static methods are synchronized on the class object of the class the synchronized static method belongs to.
  • Every thread has an available monitor record list, and a global available list, every locked object is assocated with a montior, and monitor has an owner section to store the thread holding the the monitor
  • Mark word in object header,e.g., hashcode, ago, bias, lock tag. Object header = mark word + type pointer
  • Klass Pointer to the class’s meta data
  • JVM will take note of the thread id/address of Thread pointer as the owner of the syncrhonzied object

CAS

  • Heavily used by atomic types in java.util.concurrent
  • ABA problem: the value changes in the sequence of A-B-A, so from CAS pov the value is not changed since the last change on A, but it did in fact!
  • By default PreBlockSpin = 10. Note that locks can only upgrade but not downgrade
  • LongAdder in Java 8
    • If the current threads too much, it will split the addtions into a cell array, each with its own CAS
    • If one failed to CAS one cell, LongAdder will try a different cell.
    • The final result is the sum of base and all values from cells

Biased lock

  • Thread will store its id in the lock object’s mark word. When entering and leaving sync block it will not use CAS to lock/unlock, it will CAS the thread id instead. When the thread id is different from the current one:
    • If biased mark is 0, means it is LW lock already, use LW’s CAS to acquire lock
    • If biased mark is 1, just CAS to change thread id to current id. If this CAS failed, will ask the cancel the biased lock, and pause the thread holding the biased lock to check if the that thread is still active
  • Thread will not release biased lock itself. When no bytecode is running, the thread holding biased lock checks if the lock object is locked or not, and will cancel biased lock to restore to no-lock(01) or lightweight lock (00), depending on the answer

Lightweight lock

  • Other thread will spin to try acquiring the lock but will not block
  • if the sync obj lock is 01, jvm will setup a Lock Record in the stack frame to store the lock object’s Mark Word, and then use CAS to change object’s mark word to pointer to the Lock Record, and set Lock Record’s owner to object’s Mark Word, if good, then Mark Word’s lock status is 00
  • if update failed, jvm will check object’s mark word is pointing to current thread’s stack frame, if only 1 waiting, the thread will just spin, if the thread spins too many times, or one thread is hold, one spinning, and a third coming, LWL will upgrade to HWL
  • When release, try CAS to replace Mark Word back into object header. if failed, means there is contention with the lock, release the lock, upgrade to HWL, and wake up blocked threads

Heavyweight lock

  • When upgrading to heavy weight lock, lock status changed to “10”, Mark Word stores pointer to HWL, all waiting threads wil block
  • The synched object will point to the created monitor object, i.e., the monitor inside the object, which in turns based on MutexLock from OS - blocking and waking up requiring OS, i.e., from user mode to kernel mode (hence “heavy”)

On JVM Class Loading

2018-12-23 00:00:00 +0000

Class load sequence of actions

  • load - done by the class loader
    • get byte stream via FQTN
    • transform byte stream to metaspace runtime data structure
    • create java.lang.Class for this type
  • Linking - Verification, including file format, meta data, byte code, symbol reference
  • Linking - Prep
    • allocate memories for class variable and assign init values inside metaspace
    • However for final value, at this same it will be that value already
  • Linking - Analyze/Resolution. Note this step can be delayed after init
    • Compiler generate symbolic ref including class name, class method name, params type, and return types
    • the class does not even know its own method and string’s address yet. Need to replace symbolic refrence with direct reference
  • Init
    • execute type constructor clinit() method
    • () is to construct types: which has static code block or static assign value
    • () method is to execute instances, which runs () first
    • Init of class variables will be in the order of declaration

Types of class loader

  • Bootstrap ClassLoader: load JAVA_HOME/lib and boot classpath JAVA_HOME/lib/rt/jar
  • Extension ClassLoader: load JAVA_HOME/libext and java.ext.dirs. Parent is Bootstrap CL
  • Application/System ClassLoader: classpath in the normal sense. Parent is Extension CL. Loads from ENV VAR and class.path
  • User ClassLoader: inherits form Application ClassLoader. Note classes loaded by this CL can be unloaded, but not the other 3

Parent delegation model for loadClass()

  • See if the requested type has already been loaded into this class loader’s namespace (via findLoadedClass()). If so, return the Class instance for that already-loaded type.
  • Otherwise, delegate to this class loader’s parent loader. If the parent returns a Class instance, return that same Class instance.
  • Otherwise, invoke findClass(), which should attempt to locate or produce an array of bytes that define the desired type. If successful, findClass() should pass those bytes to defineClass(), which will attempt to import the type and return a Class instance. If findClass() returns a Class instance, loadClass() returns that same Class instance.
  • Otherwise, findClass() completes abruptly with some exception, and loadClass() completes abruptly with the same exception.
  • When the loadClass() method of a class loader successfully loads a type, it returns a java.lang.Class object to represent the newly loaded type.
  • Note that tomcat does not really follow this model

Custum class loader

  • Types loaded through the bootstrap class loader will always be reachable and never be unloaded. Only types loaded through user-defined class loaders can become unreachable and be unloaded by the virtual machine. A type is unreachable if its Class instance is found to be unreachable through the normal process of garbage collecting the heap.
  • Use case:
    • Byte encrption: use tool to encrypt class file. when running, use customized ClassLoader to decrypt file content and then load the decrypted bytecode
    • hot load class file
    • load class at specific location, e.g., URL
  • to uniquely identify a type loaded into JVM requires the fully qualified name and the defining class loader. Every Class object has reference to its class loader

When will java init the class?

  • create class instance type
  • use class/interfaces’ static val/method. Note that before that the interface of the class will not be inited. Note that this class’s instanct variable types will not be inited
  • Class.forName(). Note that Classloader.loadClass() will not init the class immediately
  • init a class’s subclass
  • Tagged during JVM’s start

On OKR

2018-12-12 00:00:00 +0000

Objective

  1. In a single sentence, qualitative goal
  2. Should be inspirational, not conform to status quo or describe business as usual
  3. the goal should be time bound, but not too long
  4. Try setting only one

Key Results

  1. most often three
  2. quantitative metrics to meature if objective has been reached by the end of the period, should include evidence of completion
  3. difficult but not impossible, i.e., 50% of failing, as if it is truly random
  4. something happened because of something you did => describe impacts instead of activities
  5. can be time-related if not by default end of the whole period
  6. Don’t give different weights to key results even though they contribute different level of values=> it is about tendency, not perfection
  7. much of the value in OKRs comes from the conversation on what matters
  8. baking OKR into weekly meetings and weekly status mail
  9. DO NOT CHANCE OKR, let it fail and set better goal next time
  10. If you can’t keep on track longer than a week. Then you are not ready for OKR yet. This is especailly true for startups. OKR should be reviewed weekly or biweekly to ensure focus and progress
  11. Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes
  12. rather than using a single number, the best assessment is usually a set of measurements. By choosing multiple metrics, we can design a solution without the unintended consequences that occur when optimizing for a narrow objective.

From YC

  • that the closer to an individual level you get the less useful OKRs are. The backward process of, “we know what we are going to do, how to we make it fit the OKR formula”
  • Spotify decided to stop using OKRs - they grow too quickly so that OKR changes a lot
  • OKRs are designed to exist in levels, “trickling down” in a way that narrows them down more and more as they reach specific teams and individuals
  • OKRs in particular are hugely reliant on leadership defining goals for others to align with.
  • For a single team, OKR is used to associate task with goals, instead of vice-versa as intended

Inter-process communication mechanisms

2018-12-04 00:00:00 +0000

  • File IO
  • Signals, e.g., SIGTERM, SIGSTOP
  • Socket - a form of file descriptor
    • Uses socket descriptor as handle. Underlying it associates with IP + port combo of LOCAL node
    • a concept used mostly in the transport layer
  • Unix domain sockets. similar to the internet socket, but all communications happens with the kernel. Referred to by the process as inode
  • MQ in the OS
  • Pipe
    • parent process creates the anonymous pipe, and let the child process inherit it
    • appearts to be normal FD, but no seek capability
    • In Unix, processes of the same pipe are started at the same time. The buffering ensures that no data is lost
  • Named pipe. appears as a file. Processes use it instead of standard streams
  • Shared memory, e.g., mmap file, note it is different from sendfile
    • Pros: without shared memory, message,pipe,and files have to copy from user space’s buffer to kernel space’s IPC buffer, and again from kernel to user space
    • In most operating systems the memory region mapped actually is the kernel’s page cache (file cache), meaning that no copies need to be created in user space
    • In the IPC context, normally it is non-persist file, i.e., when the last process is finished the data is lost
    • The memory-mapped approach has its cost in minor page faults—when a block of data is loaded in page cache, but is not yet mapped into the process’s virtual memory space
    • Aside from the popular IPC usage, most common usage is the process loader

On Little's Law

2018-12-03 00:00:00 +0000

Num of items in the system = rate of items arriving in the system * time each item stay in the system

In terms of performance analysis, this means:

  • Num of executors we provide >= # of our expected throughput * # of expected latencies
  • Conversely, given the size of resource/thread/conneciton pools, and the average latencies we observce, we can infer our max expected throughput
  • Tomcat has default thread pool size of 200. Mysql’s max_connections defaults to 151.

On JVM GC

2018-11-27 00:00:00 +0000

When is full gc triggered?

Concepts

  • safe point: points where you can pause user thread,i.e., needed for GC
  • safe area: for sleep or blocked threads. In the code, reference relationship won’t change
  • Scavengc GC: triggered when Eden failed get more space. Eden region only
  • From OS pov, perm is in the jvm process’s HEAP.
  • Default max thread stack size is 1MB
  • Note that SWAP usage along with GC is a common source of problem
  • NIO’s memory is mostly in Kernal memory’s System area and PageCache area

GC roots

  • referred by local variables and operand stacks in jvm stack frame
  • method area, referred by class static
  • method area, referred by constant
  • local method stack, referred by JNI

Copying GC

  • Use only half of space, gc, and then move the survior into the other region, so that the other region has no frag
  • Often used by new gen,e.g., Serial, ParNew, Parrallel Scanvenge (at new gen), Serial old (for old gen), because new gen has less surviors and more garbages than older gen
  • New gen often uses ParNew

Compacting GC

  • move live objects toward one end of the heap. Note this idea is similar to Copying GC but NOT same!
  • may use a table of object handles, so that only handle -> actual actual address needs updated during copyign
  • Good for old gen where relativly less garbages exist,i.e., less need to move
  • Used in Mark-compact, which is common for old gens

Mark and Sweep GC

  • Separate of mark phase and sweep stage (and potentially compact stage to defrag), often used for old gen. Will stop the world and generate memory fragments
  • For each unreferenced objects, need to also mark if the finalizer has been run or not. The sweep phase will include finalization. Therefore, object may be resurrected by finalizer which makes the object reachable, and GC can not decide if change the object back to reachable to unreachable until after all resurrectable objects’ finalizers are run
  • Parallel Scanvenge at old gen, Parallel old,

Method area gc

Decide if constant and types are still in use. Need to satify ALL of

  • all instance are gced
  • the ClassLoader for this type is gced. Hence, types (not its instances!) loaded by the bootstrap classloader is never gced
  • java.lang.Class object for that type is not referenced/unreachable at all

Generational GC

The young generation is divided into Eden/From Survivor(S0)/To Survivor(S1). Size ratio 8:1:1, default to 1/15 of the heap size

  • The majority of newly created objects are located in the Eden space. Use copying GC
  • After a GC in the Eden space, the objects are piled up into the Survivor space, where other surviving objects already exist.
  • If (Eden + From Surivior) obj size is less than To Survivor space, move them to To Surivior, otherwise, move them to old gen directly. increase these To Survivor object age by 1, when it hits age 15, they enter old gen
  • After minor GC, if Eden is not enough to for the new object, the new object goes to old gen directly
  • To Surivior half size, and with equal age, any objects older than this age enter the old gen

  • when doing minor GC, will determin if old gen’s max continious space > total object sizes in the new gen, this is ONE of the steps of deciding if we should do full gc instead
  • For high throughput requirement, more likely to have bigger young gen and smaller old gens, so that most short term objects can be gced ASAP

Bump the pointer(BP) vs TLABs (Thread-Local Allocation Buffers)

  • BP just keep track of top pointer in Eden, and see if the new object can fit it in before updating the pointer
  • But in a multi-thread case, BP will cause lock contention. Therefore, TLAB gives each thread a small, thread-local space in Eden

Old gen obj refering to young gen

  • Use a 512 byte card table to record which old gen objects are referring to young gen
  • managed by a write barrier

Train algorithm

  • each gen divided into blocks of same size
  • each gc runs only on one block
  • for objects promoted to older gen, they are added to the tail of the older gen’s train

InnoDB lock behavior under repeatable read

2018-11-19 00:00:00 +0000

  • By default InnoDB is on repeatable read isolation level.
  • S - shared locks, X - exclusive locks
  • The actual lock is on the clustered index, in most cases this means the PK index.
  • A consistent read(CR) is done by mult-versioning and reading a snaphost, i.e., NO lock involved
  • Gap lock(GL) exists to prevent other txns from inserting into the gap. Therefore, conflicting GL can co-exist, and GL is a concept applied to cluster index
  • S, X locks are row level locks (UPDATE/DELETE), but LOCK TABLE..READ and LOCK TABLE …WRITE are on full table lock

Suppose we have a table t like this

Id Val
2 200
4 400
SELECT * FROM t WHERE Val = 200 -- CR
SELECT * FROM t WHERE Val > 200 -- CR

---- S lock on all rows, GL covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val = 200 LOCK IN SHARE MOD
---- S lock on all rows, CL covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val > 200 LOCK IN SHARE MOD 

---- X lock on all rows, gap lock covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val = 200 FOR UPDATE 
---- X lock on all rows, gap lock covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val > 200 FOR UPDATE 

SELECT * FROM t WHERE Id = 2 -- CR
SELECT * FROM t WHERE Id > 2 -- CR

-- S on the row, NO gap lock or next-key lock
SELECT * FROM t WHERE Id = 2 LOCK IN SHARE MODE 
-- S on the row, and GL on (2, 4], (5, +INF)
SELECT * FROM t WHERE Id > 2 LOCK IN SHARE MODE 

-- X on the row
SELECT * FROM t WHERE Id = 2 FOR UPDATE 
-- X on the row, and GL on (2, 4], (5, +INF)
SELECT * FROM t WHERE Id > 2 FOR UPDATE 

Similarly, for UPDATE and DELETE

-- record lock because it hits the clustered index
UPDATE t_user set ago = 10 where uid = 1; 

-- table lock(gap lock, X next-key lock) because it does not hit clustered index
UPDATE t_user set age = 10 where uid != 1; 

For insert, will use X on the inserted record, but will not lock the range BEFORE this record. At the same time, it will add an insert intention lock, which does NOT block the gap or different key inserting into the same gap, as long as not the same postion within the gap

INSERT INTO t values (20);
commit;

--txn A
INSERT INTO t values(11)

--txn B
--IIK will be in effect, and B will not be blocked
INSERT INTO t values(12) 

Note that concurrent insert is different from concurrent select for update + insert. The gap lock will start taking effect!

Generic, covariance, and contravariance

2018-11-05 00:00:00 +0000

  • a <: b means a is a subtype of b
  • The overall goal of covariance and contravariance is to reduce the case where type check passed but runtime gives typing error.
  • The main use case is when the type is in a higher-order function or generic collections.
    • Adding a type to a generic collection is somewhat similar to the function call in typing rules, i.e., the generic type is the parameter type
    • Reading from a generic colleciton is somewhat similar to the function return in typing rules, i.e., The generic type beais the returned type

Covariance

  • Return types are covariants for function subtyping, that is, if tsub <: tsuper then t -> tsub <: t -> tsuper
  • Use to mark the upper bound of generic (at super).
    • Required when we return a generic collection, so that we have a guarantee that we can get super at least from collection
    • No guarantee on what we can write, because we don’t know the concretely underlying type, which will be casted to on read.
  • Class and types are different! Class defines an object’s behavior, a type describes what fields an object has and what messages it can respond to. Subtyping is a question of substitutsubbility and what we want to flag as a type error

Contravariance

  • Argument types are contravariant for function subtyping, that is, if tsub <: tsuper, then tsuper -> t <: tsub -> t
    • Suppose you wrap your tsuper -> t type in a collection or a higher-order function. If you pass a tsub -> t type, then during execution, the underlying type always expects passing in tsub. However, because the signature is tsuper-> t, we can still pass the tsuper to the concrete tsub-> t function without violating type check => contradiction!
  • Similar reasoning on why in Java you can’t pass List to a method accepting List, because the method will add an actual Object to the underlying List
    • Note that in the source they are all stored in an object array, and cast to the type on read
    • We can’t pass List to List functions either, due to the covariance rule
  • Use to mark the lower bound of generic (at subtype). This means we are 100% safe to add only the subtype and its subtypes to the generic collections

On Handling Silence

2018-11-02 00:00:00 +0000

Source

  • The subtle power of uncomfortable silences
  • How to Handle Silence, the Worst Kind of Feedback

  • Comforatble silence margin: 3-4 seconds. Also, use open-ended question instead of letting people make a choice. Note that we often feel the slience longer than it actually is.
  • Uncomfortable silence actually helps the buyer guide the conversation
  • If you are losing the room, be slient for 3-5 secs, and let people zoom out. Once they are back, touch base again.
  • Pause 3-5 before and after the important points to improve the impact of critical points - try not to ramble! Similarly, before starting the presentation, stay silent for 3-5 seconds and look around the audience
  • During listening, if you can’t resist on thinking what you want to say instead of actual listening, focus on staying slient.
  • If the slience approaching 1 min, offer to come back to the point later and move on.

On select(), poll(), and epoll()

2018-11-02 00:00:00 +0000

  • select() requires the kernel to iterate through all FDs passed to the select() call, check their state and register CBs. When an event on any FD happend, kernel has to iterate through these FDs again to deregister CB. Note that every time you get a new connection with accept() system call, you will get a new FD
  • Linux uses epoll_ctl() to manage the registration. Call epoll_wait to wait for updates about the list of files you’re interested in.
  • Level-triggered epoll() inherits from the “thundering herd” semantics form select() - get list of every FDs you are interested that is operatable. This means everyone will wake up from epoll_wait() and try accept() - all but one will get EAGAIN (Resource temporarily unavailable)!
  • In edge-triggered, only 1 thread is triggered, but the threads may wake up unneccearily only to get EAGAIN, because on awake, it does not/can not know how many connections kernel received originally - use level-triggered + EPOLLEXCLUSIVE to wake up 1 thread at a time
  • In the case of writing to FD with level triggered, your thread will get constant notification that FD is ready, even though you may not be able to write yet because your resource to FD is not available yet.
  • In the case of writing to FD with edge triggered, you get triggered by FD is ready ONCE and write at much as you want until your write(2), and then you wait for the next signal
  • Reading from FD with level-triggered is useful when you don’t want to consume all data at once and want epoll to keep triggring while data is available. However, for performance, probably want to use edge-triggered with all data buffered and will be eventually handled
  • Poll() is very similar to select() and returns more status,i.e., it also requires the full list of FDs to watch on each invocation
  • Poll() can perform better than select() if you have a sparse set of FDs you are interested, because select() accepts a RANGE of FDs and bitsets for FDs you want for reads/writes/exceptions

  • It is also called IO multiplexing because one thread can watch over the change of multiple FDs, i.e., the processing thread itself is reused.
  • Note that call to select() and poll() are still blocking