Notes on Fescar

2019-03-04 00:00:00 +0000

Fescar is derived from GTS (global transaction service), which has been on AliCloud since 2016. Right now it is at v0.2, and, according to their plan, still 2 major releases from GA

What problems do existing solutions have?

XA is the most common non-intrusive solution. However,

Need to be MySQL 5.7+
Long lock of txn resources
XA implmentations require heavyweight application server,.e.g, WebLogic/WebSphere

Common intrusive solutions has high dev costs:

reliable message delivery
TCC
Saga

What happens when you want to issue a new global txn?

Modules

Transaction Coordinator (TC) maintains the state of gloal txns.
Transaction Manager (TM) sends open/commit/rollback txn requests to TC
Resource manager (RM) maintains the branch txns

TM sends the “open new global txn” request to TC. TC will generate a globally unique id, XID.
XID will be passed along the microservice trace
RM will register the branch txn with the same XID on TC
TM will start the global commit or rollback
TC will commit/rollback all branch txns under the same XID

The flow quite similar to XA’s, the key differences are:

XA’s RM is on the DB layer. This RM is on the application layer
Fescar commits directly during phase 1: it is not strictly 2PC! See below for rollback discussion

How is global rollback implemented?

RM writes the rollback log including the data before and after, and will commit the data update and rollback log in the same local txn.

When RM receives the rollback request from TC, it finds the local rollback log by XID and branch ID included in the rollback request, and execute the reversed SQL.

What is the isolation level?

TC will maintain a global exculsive write lock, by default it is read UNCOMMITED.

What if my resource does not support transaction natively?

Use the manual transaction(MT) mode. You need to implement your own prepre-commit-rollback APIs for 2PC.

Codeforces Notes

2019-03-04 00:00:00 +0000

220B: Little Elephant and Array

Idea 1: offline band sum

First, we need to process all queries offline, and order them by the end index so that we scan from left to right by the end index.

We need to maintain the band start where a number x has exactly x times from the start to the current end. To update the band quickly, use the band-to-delta technique,i.e.,maintain a +1 at the start of the end, and -1 at the end of the band, overall contribution is the sum over the band.

This means, for each entry we see:

if cnt[x] LT X, do nothing
if cnt[x] GTE X, delta[previous X times occurance ]++ => start the current valid gap
if cnt[x] GTE X + 1, set delta[previous x-1 times occurance]-= 2 => potential end of the current gap, and cancel the end of the previous gap
if cnt[x] GTE X + 1, set delta[previous x + 1 times occurance]– => cancel the start of the previous gap

Idea 2: online brute force

Claim: if we just consider numbers cnt(x) >= x, such potential candidates count LTE 2 * sqrt(n)!!!

Proof: If x > sqrt(n), then we know we can not have more than sqrt(n) distinct x. If x LT sqrt(n), then we have at most sqrt(n) distinct choices of x

So we can just brute force calucalte the number of appearances and answer the queries online

219D: Choosing Capital for Treeland

Set orientiation as 1 as the root

For other nodes, we calculate the following things:

in-cost, min cost to convert when every node is reachable is from the parent, this can be done by one single DFS
fix parent cost: # of in-direction edges from 1 to v
reverse parent cost: # of reverse direction dedges from v to 1

For each candidate, the cost = cost to fix orientation 1 - fix parent cost + reverse parent cost

439D: Devu and his Brother

precalc how many effort to update the min/max/values to the next value,
bsearch on effort.
Given the effort, we bsearch on the target number, and see if the total effort LTE effort.

472D: Design Tutorial: Inverse the Problem

Think Kruskal’s MST theorem. For each node, we know its closest neighbor must be a single edge. So we can devise a similar reverse-MST algorithm, and then DFS on each node to verify the construction

InnoDB deadlock cases

2019-02-25 00:00:00 +0000

Base case

CREATE TABLE t (
id int(10) primary key
)engine=innodb

insert into t values(1);
insert into t values(3);
insert into t values(10);

Gap lock

--txn A
set session autocommit=0;
delete from t where id = 5;

--txn B
insert into t values(0); -- proceed
insert into t values(2); -- proceed
insert into t values(7); -- in conflict with gap lock (3, 10], blocked

In show engine innodb status, it will be lock_mode x locks gap

S + X

--txn A
set session autocommit=0;
insert into t values(7); 

--txn B
insert into t values(7);

Sequence of actions:

A acquires S on id = 7
B acquires S on id = 7
A tries to acquire X on id = 7, but blocked by B
B tries to acquire X, but blocked by A

shared gap lock

--txn A
delete from t where id = 6
insert into t values(5)

--txn B
delete from t where id = 7;
insert into t values(8)

Sequence of actions:

A acquires shared gap lock on (3, 10] on delete
B acquires S gap lock on (3, 10] on delete
A wants X gap lock on (3, 10], blocked
B wants X gap lock on (3, 10], blocked

Common solutions to DL

acquirng/relase in the same order
timed wait, and then release if not able to acquire (defaults to 50s in mysql)

Codeforces Notes

2019-02-19 00:00:00 +0000

113B: Petr#

find all S(begin) and S(ends)
for each i..j, calculate its rolling hash, add it to the corresponding map hash -> (start, end), if i is in S(begin), and j is in S(end)
for each entry in the map, add the size of the entry - # of duplicates to the final answer

633C: Spy Syndrome 2

reverse the whole sentence
caluclate the rolling hash of each word in the dictionary
DP on the suffix size. The overall run time will be O(n), because for each check, we reduce the overall problem size by 1,because we search for the FIRST matching one

965C: Greedy Arkady

bsearch on the times A gets, note that D is no more than 1000, we can brute force on all possible Ds, and find the max possible factor A gets
k * D * x >= n, K * (D - 1) * x <= n - x, also, x <= M
The target function is D * x

448D: Multiplication Table

bsearch on answer, at least iteration we can afford an O(n) iteration, i.e., just calculate through all rows to see if number < the target number matches, also keep track of equal coutns

Never Split the Difference

2019-02-07 00:00:00 +0000

Try NOT to speed up process to get a “Yes”, very easy to leave the impression that you are ignoring your counterpart.
Avoid direct/assertive voice - easy to put people into defensive mode. Use a positive/playful voice by default
Try to trigger “No” from counter part, so that the other part feels safe. You can force a “No” by mislabelling
- If very hard for the counterpart to say “No”, just walk away, it is too early
When you talk about numbers, use precise, odd ones to so that it APPEARS to be thought after
SLOW DOWN
NEVER use “I understand”
Liar is more likely to use longer sentences, third-party pronoun, and more complex sentences
Use the other side’s own name - they want to be called!
The more I, me, my one uses, the less impact one has in decision making. Other questions to identify deal-breakers:
- How does this affect the rest of them?
- How on board are people with this call?
- What do your team see as the main challenge in this area?
Labelling, which works well against analyst such as myself
- It seems/looks/sounds like… DO NOT use I, otheriwse it looks like you are self-centered!
- pause 3-4 secs
- Rinse and repeat
- I didn’t say that is the actual case, I just said it seems that.
Anger shows your passion, but reduces your counterpart’s congintive activity due to the emotional mode. You can de-escalate by proposing a time-out. Try to calm emotion before listing theories and facts
Something doesn’t make sense means chance to discover unknowns! Best discovered during face-to-face communications, near breaks or meeting ends. Hard to discover during emails
The word “fair” is often used so the other side goes defensive and gives concessions. Common cases:
- We just want what is fair -> OK I am sorry, let us get back to where the fair part begins
- We have given you a fair offer -> Mirror: Fair? + Label: It seems that you have evidence to support it.
- Powerful: Stop me at any time if you feel I am being unfair
List the worst thing the other side could say and say them first, this will encourage the counterpart to say what the opposite is true
When you sell yourself to the manager, sell yourself as ways to prove their own intelligence and that they can broadcast to the rest of the company, i.e., you are the proof of HIS importance, let him have a stake in your success
Power Sales Negotiators know that splitting the difference does not mean splitting it down the middle. Just split the difference twice and the split becomes 75 percent/25 percent. Furthermore, you may be able to get the other party to split the difference three or more times.
The first thing to remember is that you should never offer to split the difference yourself, but always encourage the other person to offer to split the difference.

How to negotiate with an accommodator

Time spend communicating is a sign of building relationship, which they value
Ask what/how questions. This lets the counterpart feel you need their intelligence to overcome the problem and they have the control on defining success. Such feeling appeals much to people with egos. It also makes people feel it is THEIR idea.
- Avoid binary questions
- Avoid when/where questions which can be answered without too much thinking
- Avoid why questions, because it is easy to trigger people into fight or flight mode
- Concretely, AFTER they agree with something for the first time. Label and summarize their answer to get a “that’s right”, and follow up with a what/how questions
  - “What do you see as being the most difficult thing”
  - “What is the biggest challenge you face?”
  - “What is the success measure”
  - “how will we know we are on track”
  - What about this is important to you?
  - How can I help to make this better for us?
  - How would you like me to proceed?
  - What is it that brought us into this situation?
- How can we solve this problem?
- What’s the objective? / What are we trying to accomplish here?
- How am I supposed to do that?
  - “how do we address things if we are off-track”
Communicating is good sign, and slience suggests angry
Easy to be distracted and not good time manager
Hesitation comes more often in tone and body language, and NOT in words
Accommodators are more likely to oversell/overpromise.

How to negotiate with an assertive

Analytical is the worst type match with assertives. Analyst type does not enjoy “What”/”How” questions.
To assertives, getting things done is more important than making it perfect.
They need to think you understand them before listening to your pov
They will keep talking on every moment of slience. This leaves the room for mirroring technique
- Use a slow, calm voice to create authority and trustworthiness. Note this voice is normally used to make a point and should NOT be the default voice
- “I’m sorry…”
- Repeat the last 3 words or the last critical 1-3 words of what the assertive just said
- Keep slient and let the other side speak. This also helps reveal their strategy
- Rinse and repeat if needed
Try NOT to offer trade-offs/reciprocity, because assertives will ask for more. Conversely, if they give up something, they expect something equal or more in return.
Get “That’s right” from them, because that is a sign they FEEL they own the conclusion.
- Can follow up with a “According to $(name)…”
- On the other hand, “you are right” means almost nothing, and should not definitely taken as sign of progress
- I will try. You are right = I plan to fail. In this case, drill again with a “how” when you get a “that’s right”
When the negotiation is going off-track/losing focus, just invite them to re-engage

On ThreadPoolExecutor

2019-01-30 00:00:00 +0000

NOT recommend to create via Executors static factory, instead, use ThreadPoolExecutor. Recommend to use newCacheThreadExecutor, defaults to ThreadPoolExecutor
Note that in general try NOT to create new threads explicitly. Let TP provide one.
To handle exception in the worker
- try-catch inside the runnable
- submit() to execute, Future.get() to handle ex
- TPE.afterExecute()
- pass in Thread.UncaughtExceptionHandler
Constructor params
- corePoolSize
- long keepAliveTime
  - When # of threads > cores, idle threads will wait this long before got terminated
  - Defaults to 60s
- maximumPoolSzie, i.e., we can use it to set constant size TP.
- workQueue
  - Capacity defaults to Integer.MAX_VALUE!
  - Insertion is done by BlockingQueue.offer(), which will not throw exception or block.
- threadFactory
- RejectedExecutionHandler handler - when out of thread range or queue capacity
The state of TPE,i.e., highest 3 bits of the atomic ctl value
- RUNNING
- SHUTDOWN: shutdown(), no new task will be accepted, but will finish executing the jobs in the task queue
- STOP: shutdownNow(), no new task, not executing tasks in the q
- TIDYING: all tasks completed, will execute terminated()
- TERMINIATED: after termiated() is run
submit() returns Future vs execute() returns nothing

Building pieces of the TPE

class Worker extends AbstractQueuedSynchronizer implements Runnable
{
       Runnable firstTask;
       final Thread thread;
       volatile long completedTasks;
}

thread is created in the worker’s ctor, by the thread factory passed in
Worker is a nested class of the TPE. In fact, TPE has a reference to Worker too

class FutureTask{
        Callable callable;
        private volatile int state;
        private Object outcome; // non-volatile, protected by state reads/writes
        private volatile Thread runner;
}

Future<?> submit(Runnable task) {
        RunnableFuture<Void> tftask = new Futuretask<T>(runnable,  null); //uses Executors.callable(runanble, null)
        execute(tfask);
        return ftask;
}

runner is CASed as the Thread.currentThread() from null. This also acts as the mutex check
callable is turned from runnable by the Executor. Callable returns result and may throw exceptions

TPE.execute()

void execute(Runnable command) {
        int c = ctl.get();

        if(wokerCountOf(c) < corePoolsize) {
                if(addWorker(command, true))
                        return;
                c = ctil.get();
        }

        if(isRunning (c) && workQueue.offer(command)){ //added to the blocking queue successfully
                int recheck = ctl.get();
                if(!isRunning (recheck) && remove(command))
                        reject(command);
                else if (wokerCountOf(recheck) == 0)
                        addWorker(null, false);
        } else if (!addWorker(command, false)){
                        reject(command);
        }
}

TPE.execute() is the interface to outside, i.e., directly called by submit()
If TPE is running, and we can add the command to the task q. Note that we have to double check the state again because the offer operation is not an atomic step of isRunning and try removing it it. Similarly, the two else if logics are quite similar because we essentialy performed the check twice.
If isRunning try adding the worker. Note that addWorker side checks the TPE state, so when the TPE is not running, the new task will not be run

TPE.runWorker(Worker w):

while (task != null || (task = getTask()) != null) {
    w.lock();
    if ((runStateAtLeast(ctl.get(), STOP) ||
         (Thread.interrupted() &&
          runStateAtLeast(ctl.get(), STOP))) &&
        !wt.isInterrupted())
        wt.interrupt();
    try {
        beforeExecute(wt, task);
        Throwable thrown = null;
        try {
            task.run();
        } catch (Throwable x) {
            thrown = x; throw new Error(x);
        } finally {
            afterExecute(task, thrown);
        }
    } finally {
        task = null;
        w.completedTasks++;
        w.unlock();
    }
}

This is called from worker and, by extension, worker’s thread since worker is an inner class
Standard lock() and then unlock() inside finally
getTask() checks if it can poll a task with keepAliveTime, if it couldn’t work, it will return a null. This is how the non-core threads are terminated

Shutting down TPE

void shutdown() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try{
                checkShutdownAccess();
                advanceRunState(SHUTDOWN);//CAS in a loop to set state. No actual interruption
                interruptIdleWorkers();
                onShutdown();
        }finally{
                mainLock.unlock();
        }

        tryTerminate();

}

List<Runnable> shutdownNow() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try{
                checkShutdownAccess();
                advanceRunState(STOP);
                interruptWorkers(); //uses Thread.interrupt(), which just sets the mark, needs Thread.interrupted() inside that thread
                tasks = drainQueue();
        }finally{
                mainLock.unlock();
        }

        tryTerminate();
        return tasks;
}

threadPool.shutdown(); //have to wait for the termination explicitly
while(!threadPool.awaitTermination(60, TimeUnit.SECONDS)){
        //busy waiting here, most likely you need to add the max retry count
}

FutureTask.run()

if (state != NEW ||
    !UNSAFE.compareAndSwapObject(this, runnerOffset,
                                 null, Thread.currentThread()))
    return;
try {
    Callable<V> c = callable;
    if (c != null && state == NEW) {
        V result;
        boolean ran;
        try {
            result = c.call();
            ran = true;
        } catch (Throwable ex) {
            result = null;
            ran = false;
            setException(ex);
        }
        if (ran)
            set(result);
    }
} finally {
    // runner must be non-null until state is settled to
    // prevent concurrent calls to run()
    runner = null;
    // state must be re-read after nulling runner to prevent
    // leaked interrupts
    int s = state;
    if (s >= INTERRUPTING)
        handlePossibleCancellationInterrupt(s);
}

CAS at the start uses current thread id as the optimistic lock. Hence, runner is set to null in the finally

On CMS GC

2019-01-28 00:00:00 +0000

Commonly used in old gen GC. New gen uses ParNew

Heap size

normally 2-6 G, higher than that becomes much more debatable.
common to set min heap size same as max heap size to reduce OS scheduling cost

Goal: shortest stop the world time.

tag objects directed referenced by GC roots
Remark: fix changed introduced during the concurrent tag stage
By default GC thread 3 = (CPU + 3) /4
Still need to mark & sweep for old ge
Old gen will have frag problem that may trigger full gc too much
New gen GC strategy is same as ParNew

STW, init mark: mark old gen objects DIRECTLY reachable from the GC root, and old gen objects referred by the new gen.
Concurrent mark: use GC root tracing to tag all garbages, user thread still running
Pre-sweep: tag old gen survival objects
Interruptable pre-sweep
STW: retag, scan GC roots + new gen + old gen objects with dirty cards
concurrent sweep: mark unreachable object
concrrent reset: reset gc state and enter the next GC cycle

Manage oneself

2019-01-21 00:00:00 +0000

Reading notes on the essay of same name by Peter Drucker

Most self-evaluation on one’s own strength and weakness are not reliable - has to go through proper try-feedback loop over some time, AT LEAST couple of months
Focusing on improving strength instead of weakness
Things need to figure out on people, again, NOT by self-evaluation:
Reader or listener, i.e., needs preparation or on the spot
Work with people or alone?
Advisor or decision maker?
Under stress or predicative environment?
Big corporation or small one?
When short term results are in conflict with long-term growths. A company’s value will determine the priority
A Plan > 18 months is hard to stay clear and specific
The key to up-management is to understand in which your boss works in the most efficient way, and adjust to that
Communicate with your 360 clearly:
- This is what I am good at.
- This is how I work.
- These are my values.
- This is the contribution I plan to concentrate on and the results I should be expected to deliver
- And ask them the same thing (NOT via plain self-evaluation though!)
The existence of trust between people does not necessarily mean that they like one another. It means that they understand one another
Don’t try hard to change people’s behaviors

(Un)reliability of timing code

2019-01-20 00:00:00 +0000

Suppose we have code

long epochBefore = currentTime()
//A
remoteSlowCall()
//B
long epochAfter = currentTime()

long callTime = epochAfter - epochBefore
//C

Suppose we have full GC happened at A or B, then callTime is much longer than the actual cost! For the same reason, in production, STW is a common cause for read timeout
For the similar reason, with epochBefore, use only epochAfter at C is not reliable either, because it includes GC time! Therefore, we need to record BOTH epochBefore and epochAfter to perform sanity check and detect if gc possibly happened, and extend potential expiry time/time out to take GC time into consideration

On red black tree

2019-01-20 00:00:00 +0000

Invariants

Root node is black, leaf nodes are EMPTY black nodes
Parent and child can NOT be both R
From any node, the paths to ALL leaves pass through same # of B

Insertion

Use the standard BST way to insert a R leaf node, and then try to fix the invariant
Case 1: P is B, we are good, because the old paths to the P can not extened to the current node (the new leaf that replaced the old P), with no additional cost
Case 2: P is R, and sibling of P is R too
Paint U and P B, and G R
RECURSIVELY fix the invariants with G (now R) as the leaf param
Case 3: P is R, and U is B or U does not exist
if necessary, left rotate or right rotate P so that C is on the “outside” of the G, i.e., it is not the left-right or right-left of G. No color change at this step
Then rotate the grandparent to make C the new grand parent, and paind the old G red, C black

Removal

Idea similar to BST - we just find the in-order predecessor or successor to the node , swap the value, and then delete that predecessor/successor node
However, the actual cases are complicated. If both P and child are B, there are SIX cases to consider

On Load Balancing

2019-01-14 00:00:00 +0000

Nginx does level 7 load balancing to the individual services based on url/service. The server behind LBs repond the request directly, because all servers will have the same VIP bound to their loopback interface
LVS does level 4 load balancing that LBs multiple Nginx instances. It uses keepalived, which is on active-standby model, to ensure LVS’s HA. Note that we don’t use keepalived on Nginx, since LVS will handle that.
- Use lvs/f5 IN FRONT OF nginx, e.g. f5 can do 100k qps, and use the same keepalived + virtual IP trick on lvs, i.e., LVS will share the same VIP. This should be enough for most cases.
- If a single LVS is not enougu, we can let DNS point to multiple virtual ips backed by the lvl 4 load balancer
All level 4 LB instances in the same cluster will have the same virtual IP and same virtual MAC address. Note that level 4 LB can easily handle 100k qps
DR mode: to change mac address on the data link layer. LB will not change ip address, but change target’s MAC address. All physical servicers and LB servers have the same virtual IP. The web server can respond to the user directly, bypassing LB
- Use ARP to broacast VIP, and machine with such IP will reply with its own MAC address, but only LB can respond to the ARP reqeust
Problem with source IP hash with lvl 4 load balancing
- When source IP is NAT gateway for large LAN
- When mobile client reconnects, hard to go back to the original server with consistent hashing

Hashmap and ConcurrentHashmap

2019-01-07 00:00:00 +0000

Hashmap

Array init size defaults to 16. Double the size when it is 75% full
Length is always a power of two, because hash%length==hash&(length-1)
- How to avoid/optimize rehash when resizing
When array length > 64 and some link list(LL) length > 8, this LL will be changed to red-black tree(RBT)
When iterating entrys, prefer EntrySet to retrive k & v at the same time.

Hashmap’s concurrency problem

When insert two elements with the same hash, while there is no existing node at the entry, in putVal()

    if ((p = tab[i = (n - 1) & hash]) == null) 
            tab[i] = newNode(hash, key, value, null);

and one node will be overwritten

When resize(), this may introduce a cycle in the link

ConcurrentHashMap

Node type

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;

//Note that Node.setValue gives UnsupportedOperationException(),i.e., Node itself is immutable
//....

Members

    /**
     * The array of bins. Lazily initialized upon first insertion.
     * Size is always a power of two. Accessed directly by iterators.
     */
    transient volatile Node<K,V>[] table;

    /**
     * The next table to use; non-null only while resizing.
     */
    private transient volatile Node<K,V>[] nextTable;

Put

Assuming the normal case, I deleted/commented/erased out some lines for the ease of reading
CHM does not allow key or value to be null

    final V putVal(K key, V value, boolean onlyIfAbsent) {
        int hash = spread(key.hashCode());
        int binCount = 0;//size in the list to decide if need to convert to RBT
        for (Node<K,V>[] tab = table;;) {
	Node<K,V> f;//current node at tbl 
	int n; //size of array, always a power of two 
	int i; //index at the tbl array
	int fh; //mod hash value
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
	    // (n-1) & hash, fast mod with 2^n, i,e. first time ever at this slot
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
//can't CAS, spin again
            }
            else if ((fh = f.hash) == MOVED)
	    //resizing is underway, let it finish first, may create a thread to speed up resizing
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
		if (tabAt(tab, i) == f) { 
		//now that we are in mutex, first check if the node value remains valid
                        if (fh >= 0) {
                            binCount = 1;
			    //linked list upsert
                        }
                        else if (f instanceof TreeBin) {
                            binCount = 2;
			    //RBT upsert
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
		    if (oldVal != null)
			//update case. no need to resize
                        return oldVal; 
                    break; //conclude spinning 
                }
            }
        }
	//attempt to help resize in case of insert
        addCount(1L, binCount); 
        return null;
    }

get() does not use any mutex or CAS, why does it work?
- Every time we update the table, we acquire the mutex first, i.e., in effect only 1 thread updating volatiles. Therefore, reading volatiles becomes thread safe

Concurrent resizing

Inside transfer()

Mark copied node and empty node as ForwardingNode. When iterating, when we see such nodes, move to the next
Uses MIN_TRANSFER_STRIDE to lower bound the range
No new thread is spawned. It reuses threads in put().

On G1 GC

2019-01-03 00:00:00 +0000

How does G1 separate new gen and old gen?

Concepts

Collet Set(CSet): collecatable-region set. Survivors in the CSet will be moved to antoher region. LT 1% of heap size
Remembmered Set(RS): RS records Points-into,i.e., object references in other regions that use this region’s objects. It is a (region address, set of card table index) hashtable
Snapshot-At-The-Beginning (SATB): previous bitmap and next bitmap
Eviction failure (EF): new gen not able to find regions to accept survivors
jdk9 default, separate heap into same size regions. Some for new gen, and some for old gen.
- Each block can be one of Eden/Survivor/Old. Note that new gen will still do recycle/promotion when it is full.
- New gen GC is similar to ParNew, will STW and copy to survivor or old gen
Designed to handle cases where stop-the-world is unacceptable,e.g., HFT.
Humogous Object goes directly to old gen, may use mutliple continous regions to hold it

Steps (for mixed GC)

Initital Marking (IM)
- concurrently clean up next marking bitmap,
- pause all user threads, and scan and tag all DIRECTLY reachable objects from root in each region
- Put the top value to the next top at mark start(TAMS)
- resume all user threads
- Root region scan - G1 scans survivor regions, and mark all referenced objects
Concurrent Marking
- scan based on the objects found in IM
- Use remembered set logs to track what is object relationship is changed concurrently
Final marking pause
- process RSL and update RS.
- process SATB logs
Live Data Counting and Cleanup
- Note that the cleanup trigger time is based on max gc pause allowed and a few other steps

Optimization & Tuning

Goal is to avoid full gc and EF
Don’t set NewRatio yourself
Prioritize maxGCPauseMillis (default 200 ms) instead of configing others
Default Heap space reachs 45% usage will trigger concurrent

On Java Synchronized

2018-12-23 00:00:00 +0000

Synchronized

every object has a monitor lock, montiorenter and monitorexit
Synchronized static methods are synchronized on the class object of the class the synchronized static method belongs to.
Every thread has an available monitor record list, and a global available list, every locked object is assocated with a montior, and monitor has an owner section to store the thread holding the the monitor
Mark word in object header,e.g., hashcode, ago, bias, lock tag. Object header = mark word + type pointer
Klass Pointer to the class’s meta data
JVM will take note of the thread id/address of Thread pointer as the owner of the syncrhonzied object

CAS

Heavily used by atomic types in java.util.concurrent
ABA problem: the value changes in the sequence of A-B-A, so from CAS pov the value is not changed since the last change on A, but it did in fact!
By default PreBlockSpin = 10. Note that locks can only upgrade but not downgrade
LongAdder in Java 8
- If the current threads too much, it will split the addtions into a cell array, each with its own CAS
- If one failed to CAS one cell, LongAdder will try a different cell.
- The final result is the sum of base and all values from cells

Biased lock

Thread will store its id in the lock object’s mark word. When entering and leaving sync block it will not use CAS to lock/unlock, it will CAS the thread id instead. When the thread id is different from the current one:
- If biased mark is 0, means it is LW lock already, use LW’s CAS to acquire lock
- If biased mark is 1, just CAS to change thread id to current id. If this CAS failed, will ask the cancel the biased lock, and pause the thread holding the biased lock to check if the that thread is still active
Thread will not release biased lock itself. When no bytecode is running, the thread holding biased lock checks if the lock object is locked or not, and will cancel biased lock to restore to no-lock(01) or lightweight lock (00), depending on the answer

Lightweight lock

Other thread will spin to try acquiring the lock but will not block
if the sync obj lock is 01, jvm will setup a Lock Record in the stack frame to store the lock object’s Mark Word, and then use CAS to change object’s mark word to pointer to the Lock Record, and set Lock Record’s owner to object’s Mark Word, if good, then Mark Word’s lock status is 00
if update failed, jvm will check object’s mark word is pointing to current thread’s stack frame, if only 1 waiting, the thread will just spin, if the thread spins too many times, or one thread is hold, one spinning, and a third coming, LWL will upgrade to HWL
When release, try CAS to replace Mark Word back into object header. if failed, means there is contention with the lock, release the lock, upgrade to HWL, and wake up blocked threads

Heavyweight lock

When upgrading to heavy weight lock, lock status changed to “10”, Mark Word stores pointer to HWL, all waiting threads wil block
The synched object will point to the created monitor object, i.e., the monitor inside the object, which in turns based on MutexLock from OS - blocking and waking up requiring OS, i.e., from user mode to kernel mode (hence “heavy”)

On JVM Class Loading

2018-12-23 00:00:00 +0000

Class load sequence of actions

load - done by the class loader
- get byte stream via FQTN
- transform byte stream to metaspace runtime data structure
- create java.lang.Class for this type
Linking - Verification, including file format, meta data, byte code, symbol reference
Linking - Prep
- allocate memories for class variable and assign init values inside metaspace
- However for final value, at this same it will be that value already
Linking - Analyze/Resolution. Note this step can be delayed after init
- Compiler generate symbolic ref including class name, class method name, params type, and return types
- the class does not even know its own method and string’s address yet. Need to replace symbolic refrence with direct reference
Init
- execute type constructor clinit() method
- () is to construct types: which has static code block or static assign value
- () method is to execute instances, which runs () first
- Init of class variables will be in the order of declaration

Types of class loader

Bootstrap ClassLoader: load JAVA_HOME/lib and boot classpath JAVA_HOME/lib/rt/jar
Extension ClassLoader: load JAVA_HOME/libext and java.ext.dirs. Parent is Bootstrap CL
Application/System ClassLoader: classpath in the normal sense. Parent is Extension CL. Loads from ENV VAR and class.path
User ClassLoader: inherits form Application ClassLoader. Note classes loaded by this CL can be unloaded, but not the other 3

Parent delegation model for loadClass()

See if the requested type has already been loaded into this class loader’s namespace (via findLoadedClass()). If so, return the Class instance for that already-loaded type.
Otherwise, delegate to this class loader’s parent loader. If the parent returns a Class instance, return that same Class instance.
Otherwise, invoke findClass(), which should attempt to locate or produce an array of bytes that define the desired type. If successful, findClass() should pass those bytes to defineClass(), which will attempt to import the type and return a Class instance. If findClass() returns a Class instance, loadClass() returns that same Class instance.
Otherwise, findClass() completes abruptly with some exception, and loadClass() completes abruptly with the same exception.
When the loadClass() method of a class loader successfully loads a type, it returns a java.lang.Class object to represent the newly loaded type.
Note that tomcat does not really follow this model

Custum class loader

Types loaded through the bootstrap class loader will always be reachable and never be unloaded. Only types loaded through user-defined class loaders can become unreachable and be unloaded by the virtual machine. A type is unreachable if its Class instance is found to be unreachable through the normal process of garbage collecting the heap.
Use case:
- Byte encrption: use tool to encrypt class file. when running, use customized ClassLoader to decrypt file content and then load the decrypted bytecode
- hot load class file
- load class at specific location, e.g., URL
to uniquely identify a type loaded into JVM requires the fully qualified name and the defining class loader. Every Class object has reference to its class loader

When will java init the class?

create class instance type
use class/interfaces’ static val/method. Note that before that the interface of the class will not be inited. Note that this class’s instanct variable types will not be inited
Class.forName(). Note that Classloader.loadClass() will not init the class immediately
init a class’s subclass
Tagged during JVM’s start

On OKR

2018-12-12 00:00:00 +0000

Objective

In a single sentence, qualitative goal
Should be inspirational, not conform to status quo or describe business as usual
the goal should be time bound, but not too long
Try setting only one

Key Results

most often three
quantitative metrics to meature if objective has been reached by the end of the period, should include evidence of completion
difficult but not impossible, i.e., 50% of failing, as if it is truly random
something happened because of something you did => describe impacts instead of activities
can be time-related if not by default end of the whole period
Don’t give different weights to key results even though they contribute different level of values=> it is about tendency, not perfection
much of the value in OKRs comes from the conversation on what matters
baking OKR into weekly meetings and weekly status mail
DO NOT CHANCE OKR, let it fail and set better goal next time
If you can’t keep on track longer than a week. Then you are not ready for OKR yet. This is especailly true for startups. OKR should be reviewed weekly or biweekly to ensure focus and progress
Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes
rather than using a single number, the best assessment is usually a set of measurements. By choosing multiple metrics, we can design a solution without the unintended consequences that occur when optimizing for a narrow objective.

From YC

that the closer to an individual level you get the less useful OKRs are. The backward process of, “we know what we are going to do, how to we make it fit the OKR formula”
Spotify decided to stop using OKRs - they grow too quickly so that OKR changes a lot
OKRs are designed to exist in levels, “trickling down” in a way that narrows them down more and more as they reach specific teams and individuals
OKRs in particular are hugely reliant on leadership defining goals for others to align with.
For a single team, OKR is used to associate task with goals, instead of vice-versa as intended

Inter-process communication mechanisms

2018-12-04 00:00:00 +0000

File IO
Signals, e.g., SIGTERM, SIGSTOP
Socket - a form of file descriptor
- Uses socket descriptor as handle. Underlying it associates with IP + port combo of LOCAL node
- a concept used mostly in the transport layer
Unix domain sockets. similar to the internet socket, but all communications happens with the kernel. Referred to by the process as inode
MQ in the OS
Pipe
- parent process creates the anonymous pipe, and let the child process inherit it
- appearts to be normal FD, but no seek capability
- In Unix, processes of the same pipe are started at the same time. The buffering ensures that no data is lost
Named pipe. appears as a file. Processes use it instead of standard streams
Shared memory, e.g., mmap file, note it is different from sendfile
- Pros: without shared memory, message,pipe,and files have to copy from user space’s buffer to kernel space’s IPC buffer, and again from kernel to user space
- In most operating systems the memory region mapped actually is the kernel’s page cache (file cache), meaning that no copies need to be created in user space
- In the IPC context, normally it is non-persist file, i.e., when the last process is finished the data is lost
- The memory-mapped approach has its cost in minor page faults—when a block of data is loaded in page cache, but is not yet mapped into the process’s virtual memory space
- Aside from the popular IPC usage, most common usage is the process loader

On Little's Law

2018-12-03 00:00:00 +0000

Num of items in the system = rate of items arriving in the system * time each item stay in the system

In terms of performance analysis, this means:

Num of executors we provide >= # of our expected throughput * # of expected latencies
Conversely, given the size of resource/thread/conneciton pools, and the average latencies we observce, we can infer our max expected throughput
Tomcat has default thread pool size of 200. Mysql’s max_connections defaults to 151.

On JVM GC

2018-11-27 00:00:00 +0000

When is full gc triggered?

Concepts

safe point: points where you can pause user thread,i.e., needed for GC
safe area: for sleep or blocked threads. In the code, reference relationship won’t change
Scavengc GC: triggered when Eden failed get more space. Eden region only
From OS pov, perm is in the jvm process’s HEAP.
Default max thread stack size is 1MB
Note that SWAP usage along with GC is a common source of problem
NIO’s memory is mostly in Kernal memory’s System area and PageCache area

GC roots

referred by local variables and operand stacks in jvm stack frame
method area, referred by class static
method area, referred by constant
local method stack, referred by JNI

Copying GC

Use only half of space, gc, and then move the survior into the other region, so that the other region has no frag
Often used by new gen,e.g., Serial, ParNew, Parrallel Scanvenge (at new gen), Serial old (for old gen), because new gen has less surviors and more garbages than older gen
New gen often uses ParNew

Compacting GC

move live objects toward one end of the heap. Note this idea is similar to Copying GC but NOT same!
may use a table of object handles, so that only handle -> actual actual address needs updated during copyign
Good for old gen where relativly less garbages exist,i.e., less need to move
Used in Mark-compact, which is common for old gens

Mark and Sweep GC

Separate of mark phase and sweep stage (and potentially compact stage to defrag), often used for old gen. Will stop the world and generate memory fragments
For each unreferenced objects, need to also mark if the finalizer has been run or not. The sweep phase will include finalization. Therefore, object may be resurrected by finalizer which makes the object reachable, and GC can not decide if change the object back to reachable to unreachable until after all resurrectable objects’ finalizers are run
Parallel Scanvenge at old gen, Parallel old,

Method area gc

Decide if constant and types are still in use. Need to satify ALL of

all instance are gced
the ClassLoader for this type is gced. Hence, types (not its instances!) loaded by the bootstrap classloader is never gced
java.lang.Class object for that type is not referenced/unreachable at all

Generational GC

The young generation is divided into Eden/From Survivor(S0)/To Survivor(S1). Size ratio 8:1:1, default to 1/15 of the heap size

The majority of newly created objects are located in the Eden space. Use copying GC
After a GC in the Eden space, the objects are piled up into the Survivor space, where other surviving objects already exist.
If (Eden + From Surivior) obj size is less than To Survivor space, move them to To Surivior, otherwise, move them to old gen directly. increase these To Survivor object age by 1, when it hits age 15, they enter old gen
After minor GC, if Eden is not enough to for the new object, the new object goes to old gen directly
To Surivior half size, and with equal age, any objects older than this age enter the old gen
when doing minor GC, will determin if old gen’s max continious space > total object sizes in the new gen, this is ONE of the steps of deciding if we should do full gc instead
For high throughput requirement, more likely to have bigger young gen and smaller old gens, so that most short term objects can be gced ASAP

Bump the pointer(BP) vs TLABs (Thread-Local Allocation Buffers)

BP just keep track of top pointer in Eden, and see if the new object can fit it in before updating the pointer
But in a multi-thread case, BP will cause lock contention. Therefore, TLAB gives each thread a small, thread-local space in Eden

Old gen obj refering to young gen

Use a 512 byte card table to record which old gen objects are referring to young gen
managed by a write barrier

Train algorithm

each gen divided into blocks of same size
each gc runs only on one block
for objects promoted to older gen, they are added to the tail of the older gen’s train

InnoDB lock behavior under repeatable read

2018-11-19 00:00:00 +0000

By default InnoDB is on repeatable read isolation level.
S - shared locks, X - exclusive locks
The actual lock is on the clustered index, in most cases this means the PK index.
A consistent read(CR) is done by mult-versioning and reading a snaphost, i.e., NO lock involved
Gap lock(GL) exists to prevent other txns from inserting into the gap. Therefore, conflicting GL can co-exist, and GL is a concept applied to cluster index
S, X locks are row level locks (UPDATE/DELETE), but LOCK TABLE..READ and LOCK TABLE …WRITE are on full table lock

Suppose we have a table t like this

Id	Val
2	200
4	400

SELECT * FROM t WHERE Val = 200 -- CR
SELECT * FROM t WHERE Val > 200 -- CR

---- S lock on all rows, GL covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val = 200 LOCK IN SHARE MOD
---- S lock on all rows, CL covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val > 200 LOCK IN SHARE MOD 

---- X lock on all rows, gap lock covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val = 200 FOR UPDATE 
---- X lock on all rows, gap lock covering all intervals in (-INF, INF)
SELECT * FROM t WHERE Val > 200 FOR UPDATE 

SELECT * FROM t WHERE Id = 2 -- CR
SELECT * FROM t WHERE Id > 2 -- CR

-- S on the row, NO gap lock or next-key lock
SELECT * FROM t WHERE Id = 2 LOCK IN SHARE MODE 
-- S on the row, and GL on (2, 4], (5, +INF)
SELECT * FROM t WHERE Id > 2 LOCK IN SHARE MODE 

-- X on the row
SELECT * FROM t WHERE Id = 2 FOR UPDATE 
-- X on the row, and GL on (2, 4], (5, +INF)
SELECT * FROM t WHERE Id > 2 FOR UPDATE 

Similarly, for UPDATE and DELETE

-- record lock because it hits the clustered index
UPDATE t_user set ago = 10 where uid = 1; 

-- table lock(gap lock, X next-key lock) because it does not hit clustered index
UPDATE t_user set age = 10 where uid != 1; 

For insert, will use X on the inserted record, but will not lock the range BEFORE this record. At the same time, it will add an insert intention lock, which does NOT block the gap or different key inserting into the same gap, as long as not the same postion within the gap

INSERT INTO t values (20);
commit;

--txn A
INSERT INTO t values(11)

--txn B
--IIK will be in effect, and B will not be blocked
INSERT INTO t values(12) 

Note that concurrent insert is different from concurrent select for update + insert. The gap lock will start taking effect!

Previous Page: 11 of 23 Next