On IO

2019-05-04 00:00:00 +0000

Sync/async vs blocking/non-blocking
- first one is about the whether the caller get the result immedately, and second one is aobut the where is caller waits
- In the context of IO, Sync often means blocking
One example Sync + non-blocking will be self-spinning, which is what selector in java NIO does. The selector itself is sync + blocking, but underlying it will be busy polling the sockets, i.e., epoll() sys call.
In fact, most non-blocking is implmented by self-spinning, but encapsulated but encapsulated as sync/blocking.
Java AIO is purely async + non-blockng, it returns immediately and you need to pass the callback function at construction
Cocurrency is the fact that multiple tasks can progress over a period of time. Parallelism means they can progress AT THE SAME TIME

Zero-copy

Note that data needs to be copied from kernel space buffer to user space buffer, and vice versa. This motivates the zero-copy techinique
- Direct buffer maps the kernel space address and user process’s address into the SAME physical memory
- mmap maps the file/object into user process’s address space. Note that we still need to copy from kernel read buffer to kernel socket buffer if we want to send to the network drive.
- Sendfile means data transfer happens only inside the kernel space, and kernel buffer’s memeory address and offset will be recorded into socket buffer. This saves the cpu copy from kernel read buffer to kernel socket buffer
- In java, MapppedByteBuffer, DirectByteBuffer, Channel-to-Channel
- In later Linux versions, use DMA to copy file content to kernel’s read buffer, and then pass the FD with data location and length info to the socket buffer, and DMA direct transfers from kernel buffer to protocol engine/NIC buffer. Note in the who process no CPU is involved

Unix IO models

Blocking IO: process uses recvfrom, which copies from kernel space to user space
Non blocking: polling to check if kernel has prepared the data yet, still via recvfrom
Event-driven: process ues sigaction, kernel will notific with SIGIO, and then process recvfrom
Io-reuse: multiple process’s IO can register on the smae channel, i.e., select(). Note that after select(), the process has to wait until at least one required data is ready and then return
Async io: aio_read with FD, buffer pointer, buffer size, after the kernel prepared the data, will copy it to the user space, and send signal

Manage energy, not time

2019-04-29 00:00:00 +0000

Energy can be renewed via rituals. Need to make them unconcious and automatic as quickly as possible. If you can completely disenage and change channel, it may take only a few minutes
Change energy depleting behavior! Reduce fight or flight case if you can! It will reduce the ability of thiking clearly
Lower energy also means harder to manage emotion. Note that not feeling positively means people will not be able to perform well
Small meals along with day, once every 3 hours
The energy cycle is between 90 to 120 mins
Express appreciation to other is also a way to regain energy. The more specific the appreciation, the better
Deep inhale and exhale, 5-6 second is effect way to defuse
To gain positivity, ask
- What the other side in this conflict would say and in what ways might it be true
- How do I view this situation in 6 month?
- How do I learn and grow from it, regardless of outcome
Try not to answer email or phone during your true focus session, check the email only during fixed time:w
Identify the most important task the NIGHT before work
To ask people to the top two work experiences where they find they are in their sweet spots. Then deconstruct them to find the motivating factor.
- However, defining value often too predictive. Instead, ask what people dislike!
Schedule relaxation and fun activities and stick to it, even if you think you don’t need it! Double the break you think need!
For every 30 mins of focused work, take 10 mins break. Note that each person has difference attention span. Identify your own!
Detailed daily plan demotivates people.

Notes on TiDB Operator

2019-04-23 00:00:00 +0000

TO = TiDB Operator TPR = (k8s) third party resource PV = persistent volume

An operator is an application-specific controller mostly to handle stateful applications. The controller watches changes to etcd (state behind k8s api server), and executes corresponding changes
Use k8s’s TPR API to encapsulate ops domain knowledge into the operator, and then inject the module to the k8s. TO to K8s is like bash to OS
We should use local volume to persist for prod, but we can use networked volume for backup. Know that local PV uses special rules for failover
- When one machine is down, the pod’s local PV may not be restored => stateful set itself is not enough
- In k8s cases, to do IOPS isolation, have to use one PV one physical disk. Bind mount does not support volume and IOPS separation
- Due to security concernts, load network PV to other nodes is dangerous,e.g., double write due to eager write
Set resource limit same as request to guarantee QoS

Debug pending pod

When a pod is pending, it means the required resources are not satisfied. The most common cases are:

CPU, memory or storage insufficient
Storage class not exist or no PV available
By default TiDB service is exposed using NodePort. You can modify it to ClusterIP which will disable access from outside of the cluster. Or modify it to LoadBalancer if the underlying Kubernetes supports this kind of service.
It’s recommended to do horizontal scaling other than vertical scaling when workload increases. Just modify the replicas of PD, TiKV and TiDB in values.yaml file.
By default the monitor data is not persistent, when the monitor pod is killed for some reason, the data will be lost. This can be avoided by specifying monitor.persistent to true in values.yaml file.

How TO works

Create a k8s config map for tidb storage. Storage’s physical resource is inside the config, e.g., machine’s IP, disk’s corresponding folder
Create a tidb-volume TPR, this resource reads the physical disk status and converts the disk resources into a TPR too
Create a volume controller to mark the usage status of the resources, VC also montiors config map’s physical resource in RT
Package the local disk resource boot instances
Create daemonset to maintain disk’s resource usage, e.g., when a tikv node is down, you need to delete and clear data.

On writing weekly report

2019-04-10 00:00:00 +0000

The presence of rigid, email-based status reports comes down to control, a lack of imagination, and a lack of trust in the organization. However, strategic assessment of the week is still needed, i.e., what is working and what is not
- Often a good candidate for 1:1

Common parts:

Date and list of stakeholders
Previous’s week’s plan
This week’s progress
Next week’s plan

Previous’s week’s plan

Quote objectives, as reminder to the management. Keep it CONCISE

This week’s progress

May order by status, keep list of completed milestones, may keep them at the front, espeically if it is ahead of schedule.
What issues and challenges did we run into? What need to be addressed?
Any part of context we need to update? Include both before value and after value
What results, not effort, did we get?
If possible, include completion percetage and scheduled completion date. Note that status (is it on track?) is more important than progress
Organize by projects, NOT employee’s tasks!
Mention projects even if it has NO progress
May use metric dashboards and other visualization tool when there are too many metrics.
Maintain a daily log on activities
Do not wait for better result. Send report in-schdule, and we can update in person later.
Focus on accompolishments than tasks! Task is for ICs!

Next week’s plan

Objectives for the next week. Should include all items from the “In Progress” section
May include important events, e.g., meetings
Milestones should be SHARP, we can even trade being measurable for sharpness
May include relationship with long term goals
Managers shall not just read, but make comments in the employee’s report. Giving employees feedback is crucially important.
Weekly report is an effective way to get credits. Also, improves your visibility
Send it 24 hours before the report

References

On measuring performance

2019-04-04 00:00:00 +0000

Getting performance data

A better qustion to ask during performance testing is, while maintaining my SLO, what is the max throughput I can achieve?
Develop test harness early, so that you can enough time to refine experienments and metrics to collect - this is a time consuming process!
One script to automate the running of the script, and another one to automate the generation of the report from the collected data
Add additional tests to ensure config change is indeed deployed
Defend against fast but incorrect implmentation. Include correctness check as non-timed part of the experiment,e.g., check data structure invariant after the timed portion
Micro benchmark is for certain aspects of the system. They are never good representation of the overall system health. Use maro-benchmark that represents the real-life workload to show the overall picture
What you measured is often response time instead of service time. Due to queueing and bufffering, customers may experience latency while you do not see it on your monitor.
Introduce a base line across the benchmarks and compare with the baseline instead of comparing with each other.
Baseline is often the optimal or state of the art solution
Do the same run twice in a row to verify unintended caching effect, and do it twice AFTER some other runs so to see the cold + hot cache case
Use neighbors of the regular/power strides and random points, so that you are very clear if you ran into corner cases or not
Run untimed warm-ups

Interpreting performance data

Many published performance nubmers measure what is optimized or the underlying system optimizes what is the measures
- Seperate calibration workload from evaluation workload(?). Otherwise, no way to show the predictive power of the model, which is the most important part of the model
Need to show clearly the trade off areas and regression performance numbers,i.e., which numbers are better, which stays same, and which becomes worse?
If subset of test is used, explain why some can be omitted. Note the omitted case may display a different trend at all
Throughput and load analysis should come together
Percetile and average metrics are very misleading, although they are good ways to tell stories. Rely on histrogram instead if possible.
- Double spikes distribution often the sign of multple behaviorial groups
- The “shoulder” of the gamma distribution often a sign of a small but separate behaviorial groups
Due to number of requests to serve a single user, even good 99.9% percentile means pretty likely your user will hit a bad latency. Max latency carries a lot of weight in showing problems.
Performance degradation easily got muffled in avg metrics. Basically averages are a good way to fool client to make them feel safe.
- Raw average must accompany with standard deviation. May also need Student’s t-test for significance
- Don’t use mean over a set of different benchmarks, even worse if they are normalized. Use geometric mean instead
- Use geometric means for multiplication problems, and harmonic mean for speeds, e.g., when speed s1 for t1 and speed s2 for t2.
Give complete results instead of ratios
Compare the exact data points instead of graphs over the same interval

On percentile

The most important thing about a metric is its distribution, and P99 does not say anything about the remaining 1% and previous 98%. Therefore, we can construct many counter-intutitive cases, and normally can not apply transitivity and composition directly

Suppose P(99,X) = 100ms, X consists of two sequential steps A and B, P(99,A) = 60ms

Can P(99,B) > 100 ms? No
If slowest 1% of A is matched with slowest 1% of B, then we can conclude P(99,B) = 40 ms
Can P(99,B) in (40, 100] ms? Yes
- suppose P(98,A) = 1ms, P(100,A) = 60ms, P(100,B) = P(98,B) = 99ms, P(1, B) = 1ms
- Then we can constuct the distribution as P1(X) = 61ms, P(99,X) = 100ms, P(100,X) = 159ms
Can P(99,B) in (0, 40) ms?
- Yes, suppose P(99,B) = 1ms, P(100,B) = 40ms, P(99,A) = P(1, A) 60ms, P(100,A) = 99ms
- Then we can construct P(99,X) = P(100,X) = 100ms, P(98, X) = 61ms

X consists of two sequential steps A and B, P(99,A) = 60ms, P(99,B) = 40ms

Can P(99,X) > 100ms?
Can P(99,X) be less than 95ms?
Yes to both cases, just use the two constructions we have in the last sections

X sends requests to M, which will batch requests into a single DB request

Can P(99,X) > P(99,M)? Yes, due to batched waits
Can P(99,M) > P(99,X)? No, the only thing we can confirm is P99 of composition >= P99 of parts

In Prometheus

Prometheus only records count and sum of buckets.
If max bucket is too small, qunatile will record only the max bucket. The end effect is a straight line
If bucket range is too big, too many data will fall into the same bucket, with the assumption that they are evenly distributed within the same bucket

References

Problems from the weekend

2019-04-01 00:00:00 +0000

exawizards2019_c: Snuke the Wizard

Note that after the each operation, the relative position of golems with the same letter remains the same. This means the golems moved out will form two continous bands ends at the left and right end respectively (may overlap). So we just bsearch on each’s end points

exawizards2019_d: Modulo Operations

The only effective sequence is strictly decreasing. Therefore, we sort the set in decreasing order and do a knapsack-ish DP. Note the value is only 1e5, so it can fit into memory.

Consider the i’s step, f(x, i) = sum of results over all permutations, the permutation can be in two cases

s[i] is the first of the permutation, i.e., this part contributes f(x %s[i], i + 1)
s[i] is NOT the first of the permutation, i.e., it can be in any of the N - i - 1 positions, each of them is in effect an i + 1, case, i.e, this part contributes (N-i -1) * f(x, i+ 1)

1105D: Kilani and the Game

Just BFS on each player in turn, don’t expand if the neighbor is already colored. Also keep track to see if everyone can progress

On PACELC

2019-03-27 00:00:00 +0000

Note that C,A, and P mean very specific things in this context.

C: linearizability among all servers’ requests, as if exeuting on a single node. If B starts after A, then B can’t see the state before A
- This definition is different from the C in ACID
- Snapshot and MVCC are by design non-linearizable
A: every request to ANY NON-FAILED node must return a response if it is successful or failure
- Node failure is outside the scope of CAP!
- Returning failure still gives A
P: network is allowed to lose any messages from one node to another.
- Note this does not mean packet loss - in the proof they use a TCP-ish protocol already. However, from a node’s perspective you can not distinguish between a failed node and partitioned node
- Retrying communication indefinitely is in essence choosing C over A.
Trivial CP system: ignore all requests, return “no result” => Not being available at all is enough to be CP
Trivial AP system: return v0 to all requests
It is possible for a system to be none of CP, AP, or CA! Most systems don’t need atomic consistency or perfect availability during P anyway.

PACELC

Network partitions(P) are too likely to happen, so CA is useless in practice, but else(E), even when P is not happening, the system should ideally have the perfect CA, i.e., the trade-off between C and latency(L).

If a system chooses to provide C over A in the presence of P, it can:

refuse to respond to some requests, e.g., don’t write in two phase commit
shutdown entirely
Only r/w if the data’s master node is inside the partitioned component

Case studies

Paxos
- AP? no when the link to the leader is cut. Note AP protocols in general need conflict resolution
- CA? because it is async model, so liveness can not be guaranteed
2PC
- CP? not consistent
- AP? not available, because we can not do any multi-node action
- Not even CA in this sense
  - What if master failed? the view is not consistent!
  - When master failed and prepared, all slaves are locked
  - A common CA solution is the full strict quorum control
Master-slave replication
- Not A - when client is partitioned from the leader, but not follower
Cassandra
- Not A - when client is partitioned from the leader, but not follower during quorum read

Codeforces Notes

2019-03-26 00:00:00 +0000

It is one of those greedy problems where you use the extreme value/boundary conditions as the starting points of optimization!!!

Suppose you have the longest LHS, then you need to right the longest RHS to minimize wait, we union the merge, and keep going until we merged all.

1032D: Barcelonian Distance

A, B can either leave, the avenue by x or y axis, so we have 4 choices on top of manhantann distance.

490D: Chocolate

First op: reduce a 2 factor Second operation: increase a 2 factor and then increase 3

So we just reduce the 3 factor to be the same, and then remove the 2 facters. Note that we can not apply the second ops to only 1 side!

993B: Open Communication

The answer has only 9 * 8 * 7 = 504 choices, for each pair, we see if both exist in the set, if there is only 1 possible pair, we know it is good, > 1 uncertain, = 0 , invalid

Notes on the Mythical Man-Month

2019-03-25 00:00:00 +0000

Chapter 1. The Tar Pit

Effort to create standalone program effort * 3 = programming system, i.e., integratedly tested with other components
Effort to create debugg progrom * 3 = programming product, which includes comprehensive tests and doc
What shipped to user is programming system product

Chapter 2. The Mythical Man-Month

Don’t mix effort with progress during estimation.
Man-month is an dangerous metric as it gives the false impression that manpower and time are interchangable
Rule of thumb
- 1/3 planning
- 1/6 coding
- 1/4 component test and early system test
- 1/4 complete system test
Hard to esitmate without hard data. Often need to have a backbone to defend again hunchs

Chapter 3. The Surgical Team

Given the 10x rule, if a 200-man project has 25 managers who are the most competent and experienced programmers, fire the 175 troops and put the managers back to programming.
Surgeon in the team settles the difference. The system should be the product of at most TWO minds to maintain conceptual integrity
Small sharp team is best for throughput but not enough output for big project
- Solution: clean separation of architecture and implementation

Chapter 4. Aristocracy, Democracy, and System Design

Design of an implementation, given the architecture, should have much design creativity. Again, need clean separation of architecture and implementation
Architecture, implementation, and realization can be done in parallel

Chapter 5. The Second-System Effect

When an architect sees an estimate he disagrees with, he can either cut the design or suggest a cheaper implementation, but since this part is builder’s responsibility, the architect should:
- just suggests and does NOT dictates!
- accept another design that meets the goal as well
- suggest QUIETLY and PRIVATELY
- FORGO credit for suggested improvement
The second system is the most dangerous due to one’s tendenacy to over-engineer and inability/not enough data points to see which part is common and which is particular. The solution is to add people who built > 2 systems.

Chapter 6. Passing the Word

Manual should be written by at most two people to keep conceptutal consistency
Explicitly define part of architecture that is not prescribed as what is
Often need both formal and prose definitions, although only one should be set as standard. “Never go to sea with two chronometers; take one or three.”
An implementation can serve as formal definition, e.g., simulator, although this approach involves great tradeoffs

Chapter 7. Why Did the Tower of Babel Fail?

The communication style is more like network rather than tree
Producer in the team is mostly for inter-team communication; technical director inside team and mostly technical
- Producer as the boss is better for a larger subtree of a big project. In this case, producer and technical director must see alike on technical philosophy. Main techinical issues must be talked out privately.
- For smaller team, producer should be the technical director’s right hand man, e.g., surgical team
- They can be the same person only when it is small 4-6 man team

Chapter 8. Calling the Shot

Do NOT estimate by estimating the coding effort and then multiply by 6. The error will be greatly magnified
Similarly, don’t extrapolate the small system/sprint effort to bigger system
Only 50% of an engineer’s time on paper is on programming and debugging in bigger projects (> hundreads of man hours), the rest is on interruptive tasks,vacation, meetings into consideration. Conversely, team often missed their schedule by one-half

Chapter 9. Ten Pounds in a Five-Pound Sack

Fostering a total-system, user-oriented attitube is among the top priorites of the programming manager. Otherwise, you may have many local optimization yet does not have positive impact on the user (common in larger projects)

Chapter 10. The Documentary Hypothesis

Project manager actuall need to care about only a small set of docs. These docs need to actively maintained:w
- Only 20% is data-based, rest is communication, and this core set of docs should be enough to fit most needs

Chapter 11. Plan to Throw One Away

Plan to retire the current design gracefully even when you are designing for the current version - you do not want to frustrate your user
May need to keep 2-3 engineers as the fire brigade
Senior people must be ready to manage groups or building things themselves
Repairing/Maintanence is an entropy increasing process (although can be slow if done carefully), while building new one is entropy decreasing
Maintanence is at least 40% of development costs
Fixing a detect has 20% - 50% to introduce a new one, because even minimized fix may have far-reaching effects

Chapter 12. Sharp Tools

One toolmaker per team is fair, but assemble is common team just for tooling often is not productive
Assign shared resources in 4-6 hour blocks, shorter than that is debatable
Build a performance simulator, outside in, top down. Start it very early. Listen when it speaks.

Chapter 13. The Whole and the Parts

Common scaffolding: dummy component, miniature file (for io formating example), tools/auxillary programs
Release quanta should be either large and wise spaced, or very small and frequently

Chapter 14. Hatching a Catastrophe

Milestones must be concrete, specific, and measurable, clearly defined.
- BAD examples are “coding”, “debugging”, “planning”
- Good exmaples are “Spec signed”, “all test cases passed”
- More importatnt to be sharp-edged and unambiguous than easily verifible by the boss
Estimate accuracy improves only AFTER project starts
- Under-estimate does not improve much until about 3 weeks before scheduled confirmation
Team should expect sharp milestons from the manager. Fuzzy milestone is a DISSERVICE to the team
Missing schedule/milestone is a moral killer! If this one is missed, try hitting the next one!
The boss must distinguish between action and status information EXPLICITLY. Do NOT act on problems when it is reviewing status time
Scheduled date from project manager, and estimated date from line manager. Project manager need to keep track of both
Preparing PERT is the job of the boss and the managers reporting to him. Maybe delegated to a separate “Plan and Control” team
The PERT chart with its frequent sharp milestones is the basis for true status, whether coorperatively
This team just gathers information and has no authority, so that line-manager can focus on making decisions

Chapter 16: No Silver Bullet—Essence and Accident in Software Engineering

Essential complexity often grows non-linearly with size
The hardest part of software is arriving at the complete and consistent spec. Much of time is on debugging that SPEC
Grow incrementally instead of building the software. IMPOSSIBLE to come up with complete spec, even with if customer working directly with the engineers
How to grow great designers?
- The best are often not the most experienced
- Conitnous mentoring with career profile carefully kept
- Apprenticeship, solo desigh, tech leadership, formal education, give these as different episodes
- Let growing designers to interact with each other

Chapter 17. “No Silver Bullet” Refined

Motivational factor increases productivity, while environmental and accidental factors does not improve productiviey, but only drags down when it is negative
The complexity often stems from the organizational malfunctions. Modelling it instead of solving the problem is highly debatable

Chapter 19. The Mythical Man-Month after 20 Years

Even the team as small as four can have separate architect and team manager. For large teams, have a recursion of architects.
Architects should guess on user attributes and frequency distributions, since a survey of target user is often too expensive, but write down the explicit guess process. Better to be explicit than to be wrong
Success factor: quality of people » organization > management » tools or technical approaches
One can move missions successfully. But in every case of attempts to move projects, the new team in fact started over, in spite of having good documentation, some well-advanced designs, and some of the people from the sending team.

Codeforces Notes

2019-03-25 00:00:00 +0000

832D: Misha, Grisha and Underground

Of the 3 nodes, which two share the most parents? We can calculate lca with binary lifting with O(logN) or with sparse tables!!!

1000D: Yet Another Problem On a Subsequence

Just DP from the head. Need to pre-compute binomial coefficent though.

965D: Single-use Stones

Bsearch on the answer, and finally, instead of answering it one by one one, we compress the paths altogether. This means, at each step, the stones we hit must be a continous band

999D: Equalize the Remainders

Calculate the current remainder on each. Start from the highest remainder, and try filling the gaps. If there is remainding ones, increase to 1 first

660D: Number of Parallelograms

sort by (slope, length)

Codeforces Notes

2019-03-22 00:00:00 +0000

961D: Pair Of Lines

Take first two points, and see if they can form a candidate line, and then the remaining dots must be on a different line

If does not work, then the third point must be on the same line as either first or second point.

To check if a, b, c are on the same line, calculate 2d CROSS PRODUCT of (b- a) * (c - a), will be 0 if (b - a) and (c - a) are colinear!!!

1012B: Chemical table

Covert a cell into an edge in bipartite graph!!!

For every path of length 3 we can create a fusion, and we can replace the fusion by reducing the total # of edges by 2.

If a component is connected, we must be able to find a path of length 3, then we can just find the longest path and keep performing the fusion, until max length of path is 2, i.e., simple components

Between simple components, we just need one additional edge to create a new component

702D: Road to Post Office

Regardless of the input value, just get the min of 4 cases: all walk, all drive, drive until the last seg and then walk, drive first leg and then walk

853B: Jury Meeting

Two pointer on k segments, on each day, we need to update, for each jury, the cheapest arrival rate (lower), the cheapest departure date (higher), is it enough the satify all requirement, what is the current cost

1118D2: Coffee and Coursework (Hard Version)

Bsaerch on the number of days, and then just take the highest (number of days) of coffee, minus the ones, not contributing

1017D: The Wu

Note that we have only 2^12 possible choices, so we can just pre-compute the answer for 2^24 = 16 mil iterations

500D: New Year Santa Network

For each edge, calculate how likely it is appear in a path between (a, b). For every delta, reduce the result by 3 * delta * probablility

405D: Toy Sum

The greedy intuition is easy, but I was stuck at proving it!!!

Claim: We can always find a solution if and only if size of X is no longer than half of S

Proof:

If n > S/2, the lower bound of X is higher than the upper bound
If n LTE S/2, Suppose X has A symmetric pairs, then the remaining possible symmetric pair numbers are s/2 - A - (n - A) = s/2 - n + 2A. If we add the constraint in, we can see that we just have enough “idle” pairs!!!

615D: Multipliers

This problem is harder that it appears!!!

What is the number of divisors d(x)? Also, f(x) = x^d(x)/2 f(a *b ) = f(b)^d(a) * f(a)^d(b)

On Salary Negotiation

2019-03-20 00:00:00 +0000

Suppose you are a hiring manager, what negotiation tactics your counterpart may use?

Make sure you negotiate. Ask for some time to think about it before making a decision
- Only AFTER they really like you, AFTER the interview. Most likely via email rather than real time
- Negotiation is an important knowledge discovery skill at work. Good way/time to demonstrate it.
Giving out current salary is debatable, giving a range is fine
Have basic idea on expected range. Do not get surprised or become unsure
Do not get emotional, e.g.,
- using another person’s salary as reference explicitly
- Avoid “I think”, “I feel”, “I need”
- Hardline on your expectation.
You can pick up (easily) $5,000 in salary negotiations. The salary part is actually not THAT huge in the total compensation of the employee
- Even if number itself can not change, you can ask for other form of concession, e.g., project assignments, travel opportunities, professional development opportunities

Common arguments

Q: “I really need a number to move the process forward.”/”This form needs a number.”
A: Focus on discovering fits and mutual interests. Giving out number, as discussed above, will compromise your position, although in the end you may have no choice
Q: “We want to figure out whether you’re an appropriate candidate for the position”
A: Then dive deep with it
Q: “I have to go to $EXTERNAL_AUTHORITY to get approval of that.”
A: Refocus the discussion on things which are within the hiring manager’s personal authority.

On re-negotiation

Make a List of Accomplishments. Do not mentions things other than that, e.g., context, requirement, times at the company, or your PERSONAL needs
Compare Industry Salaries
Prepare Your Case First
Schedule a meeting so that both sides are available, prepared and ready to have a fruitful discussion.
Anything is better than nothing
Don’t give ultmatum
give options, instead of just money
Timing is important. Don’t wait until auto raise or busy time. Also gives time for them to reach a decision
Keep it non-personal, and NOT emotional
View it as good negotiation skill practice

References

Codeforces Notes

2019-03-18 00:00:00 +0000

1114D: Flood Fill

By the rule, the final color must be either the leftmost color or the rightmost color, regardless of the starting point we picked.

cost(L,R, left) = 0 = cost(L,R, right) if [L, R] is within a seg already

cost(L, R, left) = min( cost(prevSegL, R, left), cost(prevSegL, R, right)) + 1 

cost(L, R, right) = min( cost(L, prevSegR, left), cost(L, prevSegR, right)) + 1 

459C: Pashmak and Buses

If we record the bus number for each students, this means that we can mark the arrangement as a k-based numbers!!!

This means that for each student, the condition translates to if there are two duplicates!!!

460C: Present

Bsearch on the min height, then scan from left to right to greedly assign bands

Problems for the day

2019-03-17 00:00:00 +0000

agc031 - B: Reversi

Solved this during the contest. By default, DP[i] = DP[i - 1], i.e., do nothing case.

To calculate DP[i], maintain the sum of DP[j], s.t., c[j + 1] = c[i] and c[j] != c[i]

agc031 - C: Differ by 1 Bit

Rather standard pattern of guess what does not work/what proprties the solution has -> prove that it is actually sound AND complete by construction, which is often induction

Each number differs only by 1 bit. Therefore, the oddity of # of 1 bits in A and B must differ (in total 2^n - 1 steps)

Claim: we can always find such solution as long as A and B has different oddity in terms of # of 1 bits

Proof: by induction. N = 1: trivial. Suppose it is true for N = k;

when N = k + 1, We can construct a solution by taking a different bit between A and B out, find a C, and construct A_takeout ….C , C….B_takeout bits.

Because the position we take out is has 0 and 1 in A and B, respectively, we add the that missing bit back to the left and right half to restore the sequence.

The implementation can be done recursively. Note that we need to pass taken out bits and corresponding values during recursive calls

983B: XOR-pyramid

I had problem spotting the recursive relationships!!! The problem is trivial once you figure this part out. One thing i should do is to draw out the interation steps in the form of trees/pyramids

650B: Image Preview

Standard bsearch + two pointer

808D: Array Division

Find the lowset number s.t., prefix sum > half. We know that at least 1 number must be moved from the array

Option 1: just take it out, and move to suffix

Option 2: take one out, and replace it with the suffix

708B: Recover the String

Given the # of 00s and 11s, we know exactly number of 0s and 1s

The final answer form is 0000111101111000

Notes on Toyota Kata

2019-03-15 00:00:00 +0000

Understand desired direction

Set up a vision on how work should be done in an ideal state. Process focused and NOT outcome focused, i.e., not a business vision
Example for software engineering:
- Zero defects
- Every check-in to production
- Highest value first, on demand
- Motivated people
Higher level’s target condition becomes next level’s vision

Grasp current position

Collect process metrics AND outcome metrics

Set the next challenge/target condition

Focus on process instead of outcome
Should take one step closer to the vision AND slight above your current knowledge threshould (“Frugality”)
based on the current condition - should be in absolute numbers
Set expiry date 1-3 months
Use hyphothese - experiment - compare prediction and actual outcome to move forward to the target condition.

Repeatedly move to new target condition until you reached the vision

Context, not control

2019-03-14 00:00:00 +0000

It is OK not to please the boss, it is not OK not to serve the business. Information hiding is unacceptable
Ideally employees do not guess what the manager wants and executes on that decision
Employees need to learn filter only relevent topics due to the rich contextual information
Values are shown by who gets rewarded or let go.

What should be included in the context

This is where most cross-team discussion should happen.

Metrics
- including definition of success
Assumptions
Objectives
- Result oriented. Focus on customer impact.
- Timeframe/urgency/importance
- How precise the solution needs to be? No error/can correct errors/experimental
Clearly-defined roles
- including key stakeholders, which is manager’s job to figure out

What should be avoided

Top-down decision making
Management approval, or managers make technical decisions for the team
Committes
Cross team discussion on tactics. Each team should be able to execute tactics without management approval

Comparison of flushing mechanisms: InnoDB vs Kafka

2019-03-13 00:00:00 +0000

InnoDB

Table and index data are cached in InnoDB buffer pool. It is recommend to keep innodb_buffer_pool_size 50 to 80 percent of system memory, by default it is only 128 MB.
If buffer pool does not have the data, the data will be loaded from the page cache, which is in the system’s address space (and from storage to page cache first if have not done so)
mmap data files directly into its user process address space
Uses fsync (innodb_flush_method) to flush the log to the disk immediately at commit (innodb_flush_log_at_trx_commit) or every 1 second (innodb_flush_log_at_timeout). Double buffering + sequential write means that this log flushing is fast.

Kafka

By default, Kafka fsync asynchronously, i.e., the broker returns ACK after data is copied into page cache
Also mmap its data file into user’s process address space. Note that it is mapped to off-heap memory in JVM to prevent the GC of actual log data object. (How to check JVM’s native memory usage though?)
During the send, Kafka copies directly from the page cache to the network card, and socket buffer copies a file descriptor but no data. Note all these data copying happens within the kernel space, i.e., no user -> kernel space copying happened.
Note that if JVM heap size is too huge, we may not leave enough to the OS cache, which is the actual busy area! On produciton 6G is probably enough, but leave 20G+ to OS cache on a 32G box

Note: fsync just flush buffer specific to an open file

Questions for interviewers

2019-03-11 00:00:00 +0000

This is a personal list based on web sources, order by my priority. As an interviewer, be prepared to answer these questions or you can sell your company as part of the answers.

Easy questions

How do people share learnings? How often?
What is the hardest part of the job?
What’s your favorite part about working at the company?
What motivated you to join the organisation?

Tough questions

How to define the success of this role 6 months into the job?
- How is performance measuring done?
- Who are the stakeholders of the metrics?
- How often are reviews conducted?
What is the need behind hiring for this role?
- What happened to the last person in this role?
From background/resume, what parts would make you doubt i would be a great fit for the company?
I have read that you have just launched x. What will that mean for the growth of this role?

Migrating from MySql to Google Cloud Spanner

2019-03-09 00:00:00 +0000

For existing MySql project, the migration cost to Spanner is often too high, including lock-in with GCP and code changes

Main cost changes:

No incremental id generation
HAVE to specify index to use for EACH query
No official JPA support. Have to change code to use JDBC directly
Change child table schema so that the PK of the parent table is the prefix of child table’s PK
Limit the index creation to no more than 3 per DAY per db
Avoid more than 30 DDL statements that require validation or index backfill in a given 7-day period

Migration process (copied from the official docs):

Convert your schema and data model.
Translate any SQL queries.
Migrate your application to use Cloud Spanner in addition to MySQL.
Bulk export your data from MySQL and import your data into Cloud Spanner using Cloud Dataflow.
Maintain consistency between both databases during your migration.
Migrate your application away from MySQL.

Reactor pattern in solving c10k problems

2019-03-07 00:00:00 +0000

One example is how kafka handles it.

Acceptor thread accepts connections, and pass them to processor threads
Each processor thread select() on multiple connecitons, i.e., the sync event demultiplexer in the textbook reactor pattern
On each event, processor sends the arriving requests to the request queue.
Pooled worker threads will fetech from the requeust queue and process it. Note implicitly, the queue and its maintainance logic acts as the dispatcher.
Woker thread puts the response to each processor’s response queue.
Processor thread gets result from its own response queue and answer the client

Previous Page: 10 of 23 Next

Zero-copy

Unix IO models

Debug pending pod

How TO works

Previous’s week’s plan

This week’s progress

Next week’s plan

References

Getting performance data

Interpreting performance data

On percentile

Suppose P(99,X) = 100ms, X consists of two sequential steps A and B, P(99,A) = 60ms

X consists of two sequential steps A and B, P(99,A) = 60ms, P(99,B) = 40ms

X sends requests to M, which will batch requests into a single DB request

In Prometheus

References

exawizards2019_c: Snuke the Wizard

exawizards2019_d: Modulo Operations

1105D: Kilani and the Game

PACELC

Case studies

1060D: Social Circles

1032D: Barcelonian Distance

490D: Chocolate

993B: Open Communication

Chapter 1. The Tar Pit

Chapter 2. The Mythical Man-Month

Chapter 3. The Surgical Team

Chapter 4. Aristocracy, Democracy, and System Design

Chapter 5. The Second-System Effect

Chapter 6. Passing the Word

Chapter 7. Why Did the Tower of Babel Fail?

Chapter 8. Calling the Shot

Chapter 9. Ten Pounds in a Five-Pound Sack

Chapter 10. The Documentary Hypothesis

Chapter 11. Plan to Throw One Away

Chapter 12. Sharp Tools

Chapter 13. The Whole and the Parts

Chapter 14. Hatching a Catastrophe

Chapter 16: No Silver Bullet—Essence and Accident in Software Engineering

Chapter 17. “No Silver Bullet” Refined

Chapter 19. The Mythical Man-Month after 20 Years

832D: Misha, Grisha and Underground

1000D: Yet Another Problem On a Subsequence

965D: Single-use Stones

999D: Equalize the Remainders

660D: Number of Parallelograms

961D: Pair Of Lines

1012B: Chemical table

702D: Road to Post Office

853B: Jury Meeting

1118D2: Coffee and Coursework (Hard Version)

1017D: The Wu

500D: New Year Santa Network

405D: Toy Sum

615D: Multipliers

Common arguments

On re-negotiation

References

1114D: Flood Fill

459C: Pashmak and Buses

460C: Present

agc031 - B: Reversi

agc031 - C: Differ by 1 Bit

983B: XOR-pyramid

650B: Image Preview

808D: Array Division

708B: Recover the String

Understand desired direction

Grasp current position

Set the next challenge/target condition

What should be included in the context

What should be avoided

InnoDB

Kafka

Easy questions

Tough questions