Code reading notes: sync-diff inspector
Major data structures
// TableConfig is the config of table.
type TableConfig struct {
    // table's origin information
    TableInstance
    // columns be ignored, will not check this column's data
    IgnoreColumns []string `toml:"ignore-columns"`
    // field should be the primary key, unique key or field with index
    Fields string `toml:"index-fields"`
    // select range, for example: "age > 10 AND age < 20"
    Range string `toml:"range"`
    // set true if comparing sharding tables with target table, should have more than one source tables.
    IsSharding bool `toml:"is-sharding"`
    // saves the source tables's info.
    // may have more than one source for sharding tables.
    // or you want to compare table with different schema and table name.
    // SourceTables can be nil when source and target is one-to-one correspondence.
    SourceTables    []TableInstance `toml:"source-tables"`
    TargetTableInfo *model.TableInfo
    // collation config in mysql/tidb
    Collation string `toml:"collation"`
}
type Diff struct {
    sourceDBs         map[string]DBConfig
    targetDB          DBConfig
    chunkSize         int
    sample            int
    checkThreadCount  int
    useChecksum       bool
    useCheckpoint     bool
    onlyUseChecksum   bool
    ignoreDataCheck   bool
    ignoreStructCheck bool
    tables            map[string]map[string]*TableConfig
    fixSQLFile        *os.File
    report            *Report
    tidbInstanceID    string
    tableRouter       *router.Table
    cpDB              *sql.DB
    ctx               context.Context
}
// Bound represents a bound for a column
type Bound struct {
    Column   string `json:"column"`
    Lower    string `json:"lower"`
    Upper    string `json:"upper"`
    HasLower bool   `json:"has-lower"`
    HasUpper bool   `json:"has-upper"`
}
// ChunkRange represents chunk range
type ChunkRange struct {
    ID           int      `json:"id"`
    Bounds       []*Bound `json:"bounds"`
    Where        string   `json:"where"`
    Args         []string `json:"args"`
    State        string   `json:"state"`
    columnOffset map[string]int
}
How is diff done
CheckTableData
- Loads the checkpointed chunk data from the context. On the first run, it calls SplitChunks (see below).
- Spawns multiple check workers to check chunks via checkChunkDataEqual.
- UseChecksum decides whether we go for checksum mode or row-by-row mode.
- The checksum is computed by dbutil.GetCRC32Checksum, bounded by chunk.Where, which is a SQL statement formatted by
fmt.Sprintf(
"SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', %s, CONCAT(%s)))AS UNSIGNED)) AS checksum FROM %s WHERE %s;",
strings.Join(columnNames, ", "), strings.Join(columnIsNull, ", "), TableName(schemaName, tableName), limitRange)
- An example checksum SQL would be
SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', id, name, age, CONCAT(ISNULL(id), ISNULL(name), ISNULL(age))))AS UNSIGNED)) AS checksum
FROM test.test WHERE id > 0 AND id < 10;
How are chunks split
- The Range config in the toml is passed to getChunksForTable
- Two ways of splitting: bucket splitting and random splitting. The default config goes with random splitting
Random splitting
cnt, err := dbutil.GetRowCount(context.Background(), table.Conn, table.Schema, table.Table, limits, nil)
chunkCnt := (int(cnt) + chunkSize - 1) / chunkSize
chunks, err := splitRangeByRandom(table.Conn, NewChunkRange(), chunkCnt, table.Schema, table.Table, columns, s.limits, s.collation)
- Randomly pick chunkCnt number of ids as the splitting IDs
SELECT `id` FROM (
SELECT `id`, rand() rand_value FROM `test`.`test`
WHERE `id` COLLATE "latin1_bin" > 0 AND `id` COLLATE "latin1_bin" < 100
ORDER BY rand_value LIMIT 5) rand_tmp
ORDER BY `id` COLLATE "latin1_bin";
- and create new chunks based on these separator id values
Summary Info
CREATE TABLE IF NOT EXISTS `sync_diff_inspector`.`summary`(
    `schema` varchar(64), `table` varchar(64),
    `chunk_num` int not null default 0,
    `check_success_num` int not null default 0,
    `check_failed_num` int not null default 0,
    `check_ignore_num` int not null default 0,
    `state` enum('not_checked', 'checking', 'success', 'failed') DEFAULT 'not_checked',
    `config_hash` varchar(50),
    `update_time` datetime ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY(`schema`, `table`));
- Updates every 10 secs
SELECT `state`, COUNT(*) FROM @checkpointSchemaName.@chunkTableName
WHERE `instance_id` = ? AND `schema` = ? AND `table` = ?
GROUP BY `state`;
UPDATE `@checkpointSchemaName`.`@summaryTableName`
SET `chunk_num` = ?, `check_success_num` = ?, `check_failed_num` = ?, `check_ignore_num` = ?, `state` = ?
WHERE `schema` = ? AND `table` = ?
Chunk Info
CREATE TABLE IF NOT EXISTS `sync_diff_inspector`.`chunk`(
    `chunk_id` int,
    `instance_id` varchar(64),
    `schema` varchar(64),
    `table` varchar(64),
    `range` text,
    `checksum` varchar(20),
    `chunk_str` text,
    `state` enum('not_checked', 'checking', 'success', 'failed', 'ignore', 'error') DEFAULT 'not_checked',
    `update_time` datetime ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY(`schema`, `table`, `instance_id`, `chunk_id`));
AtCoder notes
C - Candles
ans = dist between ends + min distance from either end
C - Strange Bank
- No need for memoization. Note that in the knapsack solution, we just need >= and A[0] instead of checking >
- Note that the powers-of-6 coins and the powers-of-9 coins don't intersect, so we just brute force on the amount paid with 6s, and the remainder is covered by 9s, followed by 1s (sketch below)
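A minimal sketch of this idea (my own illustration, not the official code): the cost of paying an amount purely with powers of 6 (or of 9) is its digit sum in that base, so we brute force the split.

public class StrangeBank {
    // digit sum of x in the given base = number of withdrawals using powers of that base
    static int digitSum(int x, int base) {
        int s = 0;
        for (; x > 0; x /= base) s += x % base;
        return s;
    }

    static int solve(int n) {
        int best = Integer.MAX_VALUE;
        for (int sixPart = 0; sixPart <= n; sixPart++) {
            best = Math.min(best, digitSum(sixPart, 6) + digitSum(n - sixPart, 9));
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(solve(127)); // 4, e.g. 81 + 36 + 9 + 1
    }
}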
C - String Transformation
We can prove that the transformation retains the isomorphism if it starts with one. So we can just verify that the final mapping is an isomorphism
C - K-th Substring
- Brute force won't work because there will be |s| ** 2 substrings
- K no more than 5 means that the top K answer will not be longer than 5 anyway. Otherwise, we can reduce the length of the answer and improve it
- So we just sort all 25000 strings, each with length 5
C - Factors of Factorial
Calculate the prime factorization of each number; the number of divisors is then the PRODUCT of (exponent + 1) over all primes
D - Rain Flows into Dams
Since N is odd, we can do the odd-even (alternating sum) trick and sum everything up together
C - AB Substrings
Corner cases:
- no BA
- no BX and AX
The general theme is that every time you see a counting formula with a minus component, have a discussion on the cases where the sum could be negative and where the positive term could be 0
C - String Transformation
Every swap will keep the isomorphism, and we can transform to any isomorphism by swapping. Therefore, we just need to ensure the final isomorphism is good, because it is always reachable
C - Snuke Festival
Good for coding exercise
D - Flipping Signs
Simple problem, but similar to two segment insight
D - DivRem Number
N = q * m + q = q * (m + 1), with q < m. Just need to find all divisor pairs (sketch below)
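A short sketch of the divisor-pair enumeration (my own illustration): every favorite m satisfies (m + 1) | N, so enumerate divisors d of N and keep m = d - 1 whenever q = N / d < m.

public class DivRemNumber {
    static long solve(long n) {
        long ans = 0;
        for (long s = 1; s * s <= n; s++) {
            if (n % s != 0) continue;
            // both divisors of the pair; when s * s == n, d = sqrt(n) never qualifies, so double-processing is harmless
            for (long d : new long[]{s, n / s}) {
                long m = d - 1;
                if (m > 0 && n / d < m) ans += m; // q = n / d must be strictly smaller than m
            }
        }
        return ans;
    }

    public static void main(String[] args) {
        System.out.println(solve(8)); // 3 + 7 = 10
    }
}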
D - Remainder Reminder
Similar to divisor finding: iterate over all possible divisors b >= max(2, K + 1) to generate a, check how many full b "bands" we have, and count mod >= K in the last partial band. Note that the mod band starts at 1! So the partial band contributes N % b - K + 1 pairs
AtCoder notes
D - Harlequin
- One state is "some pile has an odd count"; its counterpart is "all have even", and these map to the win/lose conditions
C - Align
- It is obvious the solution will be zigzagged. Therefore, we can generate a formula for the final answer
- We can store and sort the coefficients, and apply them to the sorted factors - way cleaner than my classification solution
B - Simplified mahjong
- When all A[i] != 0, it is obvious the answer will be floor(S/2) - by sorting we will reach the upper bound
- When some A[i] == 0, we know the problem can be decomposed into segments
- So we just sort the cards, calculate the size of the 0-separated segments, and sum up the results of applying (1)
D - Coloring Dominoes
- first col needs classification
- I miscounted 2 * 2 + 2 * 2 case, should have 3 choices instead of 2
C - Base -2 Number
- Classify based on the mod 4, 00, 01, 10, 11, add 4 to the original number before shifting
B - ABC
- Replace BC with D and form a list of new strings
- The inversion cost can be calculated by a scan/bubble-sort pass: every time we find a needed swap, the cost increases by the number of to-be-swapped items
D - Blue and Red Balls
- Stars and bars. Separating K into i buckets (which could be empty) has choose(K + i - 1, K) ways; in the non-empty case it becomes choose(K - 1, K - i). Note it is a direct application of the formula
- My first guess was incorrect. It was too high, and didn't consider the combination case
D - Coloring Edges on Tree
The official approach is cleaner than mine
- We know that answer >= deg(v). In fact, max(deg(v)) is sufficient to give a solution
- We can just BFS, and keep assigning colors from 1 to K while skipping the color of the edge from grandparent to parent. Such a construction ensures the conditions are satisfied
- Note that DFS is not feasible here because we lose the coloring information on siblings
E - Colorful Hats 2
The key insight is that given the colors at 1…i, the color at i+1 can be uniquely determined by picking one color that is the same as a predecessor's. The multiplication factor acts as the "pick"
Index 0 acts as the sentinel value
I got WA with my formula - I tried to classify the 2-color and 3-color cases. This implies that at each index the pick is not unique even with the multiplier - a classic anti-pattern
D - Lucky PIN
The official brute force solution is cleaner than mine
ans = 0
for pin in range(1000):
    ds = [pin // 100, (pin // 10) % 10, pin % 10]
    found = 0
    for ch in S:
        if found < 3 and int(ch) == ds[found]:
            found += 1
    if found == 3:
        ans += 1
It also has a DP approach with 240M ops and 240M memory
C - HonestOrUnkind2
The official solution is cleaner than mine. Mainly because even though I used a bitmask, I still used a second array to track correctness, which is not needed given the bitmask
B. Counting of Trees
For reasons unknown, I implemented the count in a very verbose way… Just a simple dict and update is enough. Also, I missed the check that D[0] has to be 0 due to the condition in the problem
D. Disjoint Set of Common Divisors
The answer is the prime factorization of the GCD. Note that prime factorization can run in sqrt(N), and after sqrt(N), if the remaining number is not 1, then we know that number is the only prime factor left, and it is prime.
Proof: by contradiction
- Suppose the remaining number is not prime; then we would have divided it in a previous step, since all divisors no more than sqrt(N) have been discovered
- Because it is prime, we know it is the only possible divisor left; otherwise, it would be a product of two divisors > sqrt(N) => contradiction
- Note that by this proof, during coding we don't have to limit the upper bound to a tight sqrt(N); as long as we exhaust all divisors under an upper bound, it still holds (sketch below)
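A sketch of the sqrt(N) factorization described above (my own illustration): the answer is the number of distinct prime factors of gcd(A, B), plus one for 1.

public class DisjointCommonDivisors {
    static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

    static int solve(long a, long b) {
        long g = gcd(a, b);
        int count = 1;                     // 1 is coprime with everything
        for (long p = 2; p * p <= g; p++) {
            if (g % p == 0) {
                count++;
                while (g % p == 0) g /= p; // strip this prime completely
            }
        }
        if (g > 1) count++;                // whatever remains after sqrt(g) is a single prime
        return count;
    }

    public static void main(String[] args) {
        System.out.println(solve(12, 18)); // gcd = 6 = 2 * 3 -> {1, 2, 3} -> 3
    }
}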
A - 01 Matrix
A construction solution that I was not able to come up with. One lesson learned is that in matrix solutions, at least one of the rows or columns is "nicely" cut into bands (as few as possible), and then we twist the other dimension to fit the requirements
C - Strawberry Cakes
The insight is similar to above, cut the matrix into bands and then twist it in the other dimension. Otherwise, implementation will be messy
D - Powerful Discount Tickets
- I used the bsearch approach. One case to cover is when the remaining tickets are not enough to cover everything
- The official solution uses a PQ; the key insight is that floor(X / 2**Y) = floor(floor(X / 2**(Y-1)) / 2) (sketch below)
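A sketch of the PQ idea (my own illustration): because halving Y times equals halving Y-1 times and then once more, applying tickets one at a time to the current maximum is optimal.

import java.util.Collections;
import java.util.PriorityQueue;

public class PowerfulDiscountTickets {
    static long minTotal(long[] prices, int tickets) {
        PriorityQueue<Long> pq = new PriorityQueue<>(Collections.reverseOrder());
        for (long p : prices) pq.add(p);
        for (int i = 0; i < tickets; i++) {
            pq.add(pq.poll() / 2); // spend one ticket on the most expensive item
        }
        long sum = 0;
        for (long p : pq) sum += p;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(minTotal(new long[]{2, 13, 8}, 3)); // 13->6->3, 8->4 => 2 + 3 + 4 = 9
    }
}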
D - Ki
- Python version gives TLE. C++ version is fine
- Note that when we traverse the tree, after we visit a node we init the children's properties on the spot, i.e., every time we reach a node, all its parent / parent-edge info is already known
A - Triangle
- The key insight that I missed is that suppose one vertex is at (0, 0), then the area of the triangle is abs(x2 * y3 - x3 * y2)/2
- Another key insight is to set another ONE coordinate to 1 and the other to 10^9 to cover the max value case, so we convert to the Euclidean division formula:
a = bq + r = b(q + 1) - (b - r)
However, I still don't understand why I need an additional %b to pass the last test case
D - Enough Array
- Idea 1: prefix sum with bsearch
- Idea 2: two pointers. Note that when you move the head pointer, the checked index should be i instead of i + 1 (sketch below)
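A two-pointer sketch (my own illustration, assuming the task is to count contiguous subarrays of positive numbers with sum >= K): once the window starting at head first reaches K, every longer extension also works.

public class EnoughArray {
    static long countSubarrays(long[] a, long k) {
        int n = a.length;
        long count = 0, sum = 0;
        int tail = 0;
        for (int head = 0; head < n; head++) {
            while (tail < n && sum < k) sum += a[tail++];
            if (sum < k) break;        // even the full suffix starting at head is not enough
            count += n - tail + 1;     // subarrays [head, tail-1], [head, tail], ..., [head, n-1]
            sum -= a[head];            // move the head pointer; subtract a[head], not a[head + 1]
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSubarrays(new long[]{6, 1, 2, 7}, 10)); // 2: [6,1,2,7] and [1,2,7]
    }
}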
System design: service registry
Requirements
- Data consistency
- How long does it take to discover an offline service?
- How many service instances can it support?
- How many machines do you need to run this registry?
Eureka
- P2P: every node serves outside requests; when a service is registered on one node, the registration is gossiped to the other nodes
- The read-only cache's default sync interval is 30 sec; services refresh on a 30 sec interval
- Default: offline services are detected every 60 sec; a heartbeat expires after 90 sec
My Design
How Spring resolves circular dependency between beans
- An object's instantiation is done via reflection in ApplicationContext.getBean(), and after that the object's properties are set during its init
- When object A needs another object B, we recursively call ApplicationContext.getBean() to get B, and then inject it into A. At this point the objects are created, but their properties are not set yet
- Suppose A has B, and B has A as a member; on init:
- A is instantiated first
- A finds that B is needed, so getBean() is called on B
- B is instantiated
- B finds that A is needed, so getBean() is called on A. At this time, A's reference is already registered, so Spring returns the reference to the A created in the first step
- B sets its member reference to A, finishes its init, and returns
- A now gets the reference to B, sets its member reference to B, and returns
Implementation
protected Object getSingleton(String beanName, boolean allowEarlyReference) {
    // singletonObjects is the {bean_name : fully created singleton} map
    Object singletonObject = this.singletonObjects.get(beanName);
    if (singletonObject == null && isSingletonCurrentlyInCreation(beanName)) {
        // bean is marked as in creation, but not fully created yet
        synchronized (this.singletonObjects) {
            // earlySingletonObjects holds early references to beans still being initialized
            singletonObject = this.earlySingletonObjects.get(beanName);
            if (singletonObject == null && allowEarlyReference) {
                // no early reference yet: create it from the registered ObjectFactory now
                ObjectFactory<?> singletonFactory = this.singletonFactories.get(beanName);
                if (singletonFactory != null) {
                    singletonObject = singletonFactory.getObject();
                    this.earlySingletonObjects.put(beanName, singletonObject);
                    this.singletonFactories.remove(beanName); // make sure we init only once
                }
            }
        }
    }
    return (singletonObject != NULL_OBJECT ? singletonObject : null);
}
RocksDB in the context of TiKV
RocksDB concepts
- Column family: Each k-v belongs to only 1 CF. CFs share the same WAL (to support atomic writes), but different memtable and table files. Note classic CF mapping would be (row_key, a sorted list of (column, value)), but in RocksDB it is just (cf, key, value)
- Write buffer stores the memtable. Block cache is where RocksDB caches data in memory for reads
- Memtable flushes are by default scheduled on the HIGH thread pool, while compactions run on the LOW thread pool. A stalled memtable flush can stall writes, increasing p99 latency. Set the thread pool size with max_background_jobs (sketch below)
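A minimal sketch of setting this via the RocksJava binding (assuming org.rocksdb is on the classpath; the path and value are made up for illustration):

import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class TuneBackgroundJobs {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        // one shared budget for flushes + compactions; RocksDB decides the high/low split internally
        try (Options options = new Options().setCreateIfMissing(true).setMaxBackgroundJobs(4);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-demo")) {
            db.put("key".getBytes(), "value".getBytes());
        }
    }
}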
Compaction
- Size-tiered compaction strategy: small SSTables are merged into medium SSTables. Trades read and space amplification for write amplification. Compaction merges all sorted runs in one level to create a new sorted run in the next level.
- A common approach for tiered is to merge sorted runs of similar size, without having the notion of levels
- Level-based compaction strategy: each level has at most 1 sorted run. Some data goes to the next level if the current level is approaching its limit. Trades read/write amplification for space amplification. In the paper it is an all-to-all merge, but in RocksDB it is some-to-some
- Leveled compaction in RocksDB is also tiered+leveled. There can be N sorted runs at the memtable level courtesy of the max_write_buffer_number option– only one is active for writes, the rest are read-only waiting to be flushed. A memtable flush is similar to tiered compaction – the memtable output creates a new sorted run in L0 and doesn’t read/rewrite existing sorted runs in L0.
- Subcompactions speed up a compaction job by partitioning it among multiple threads.
Read & write amplification
- RA counts the number of disk reads to perform a query. It is defined separately for point query and range queries
- Flash-based storage can be written to only a finite number of times, so write amplification will decrease the flash lifetime
- LSM with level-based compaction has better write amplification than B-tree
How TiKV uses RocksDB
- All data in a TiKV node shares two RocksDB instances. One is for data, and the other is for Raft log.
- The default RocksDB instance stores KV data in the default, write and lock CFs
- raft logs are stored as region_id/log_id -> value
- Data is stored as key + bit-inverted ts -> value. The ts is inverted so that the largest (newest) ts sorts first
- TiKV uses a prefix bloom filter, i.e., it works on a key prefix instead of the whole key
- Use TableProperties in Split Check, MVCC GC, and compact region range
- When adding a replica on a new server, TiKV generates an SST snapshot and sends it to the new server directly
- Disk space will be released only when tombstones have been compacted
On LSM tree
Motivations
- More writes than reads, but we still need to keep an index for retrieval, e.g., history tables, logs, and other time-series data
- The original paper focuses on the hardware cost LSM brings, and claims it is much better than B-tree. The root cause is the lower disk arm cost
Overview
- Multiple levels of trees; for simplicity, we assume 2 levels: c0 and c1
- c0 is a small, in-memory AVL tree. Implementation-wise, it is the MemTable backed by a WAL, which stores keys sorted
- c1 is an on disk B tree, but its hot node’s pages are most likely in memory
- In practice, the LSM tree typically consists of multiple levels, each representing a sorted run of data. Levels closer to the top hold newer and more frequently accessed data, while lower levels contain older and less frequently accessed data.
- On insertion, the new index entry goes to c0, and will be merged into c1 when c0 becomes too big
- Retrieval looks at c0 first and then c1
- The design trades single-key read performance for key-range-scan and write performance
In memory c0
- Recovery is done by reconstruction from the record logs, since c0 hosts only the most recent inserts. However, we still need to define the replay starting point carefully
- Deletion is handled as a special deletion node
- When the MemTable is too big, convert the current MemTable to an Immutable MemTable (IMT), and create a new MemTable
On disk c1
- Single-page nodes on each level below the root are placed together in multi-page disk blocks, each node 100% full
- No direct writes to c1 except during merges, so we don't have to read-modify-write an entire block due to random writes
- On-disk data is sorted runs with no overlapping key ranges
- Implementation-wise it is the SSTable. Again, keys are stored sorted
Read
- Start from the lower-level SSTables and keep going deeper until we first find the key
Update
- Just like insert. This means we accept that multiple versions of the same key exist in different SSTables
Delete
- insert a k-del mark. The value will be truly deleted when SSTables merge
How to merge
- The idea is similar to merge sort. Assume we have a pointer, for each tree, to the next leaf to merge
- During the merge the indices are still available, except for short locking periods
- Start from the smallest value of c0, and load the next multi-page block of c1 into memory
- Generate the new multi-page blocks as in merge sort, and write them to the rightmost position of c1. Note that parent node references are updated too, and old c1 blocks are not recycled immediately, for recovery purposes
- The merged leaves from c0 are removed from memory
- The process repeats until the rightmost c0 leaf is written to c1
- Merge process itself will checkpoint and flush to disks periodically
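A toy sketch of the merge direction (my own illustration, not the paper's rolling-merge algorithm): newer entries win, and delete markers finally drop the value.

import java.util.Map;
import java.util.TreeMap;

public class LsmMergeSketch {
    static final String TOMBSTONE = "<del>";

    static TreeMap<String, String> merge(TreeMap<String, String> newerC0, TreeMap<String, String> olderC1) {
        TreeMap<String, String> merged = new TreeMap<>(olderC1);
        for (Map.Entry<String, String> e : newerC0.entrySet()) {
            if (TOMBSTONE.equals(e.getValue())) {
                merged.remove(e.getKey());            // the deletion node finally removes the value
            } else {
                merged.put(e.getKey(), e.getValue()); // the newer version wins
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> c0 = new TreeMap<>(Map.of("a", "2", "b", TOMBSTONE));
        TreeMap<String, String> c1 = new TreeMap<>(Map.of("a", "1", "b", "1", "c", "1"));
        System.out.println(merge(c0, c1)); // {a=2, c=1}
    }
}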
Prometheus file structure
- Every two hours' data goes in the same block. Each block is a directory with chunks, one metadata file, and one index file
- On average each sample takes no more than 2 bytes
- Almost every section starts with a length field, and ends with a CRC32 checksum
Index file
- Symbol table: a sorted list of strings, which are used in label pairs
- Series (more below): a list of series. Each series holds its label set and its chunks (more below) within the block
- List of postings: Each posting is a list of series references that contain a given label pair
- Posting offset table: a list of entries, each mapping a label pair to the offset of its series list in the postings section. This section is sorted
- Table of content: entry point to the entire index and points to various sections in the file.
Series section in the index file
- Each series holds the number of labels, followed by tuples of symbol table references
- The next part is the number of indexed chunks; each chunk index contains the min time and max time of the chunk, and its position in the chunk file. Note that all timestamps except the first one are stored as deltas
Chunk
max 512 MB; each record of a chunk is len + encoding + data + CRC
WAL
128MB by default. A segment is written to in pages of 32KB. The format mostly borrows the idea from RocksDB
- type
- len
- CRC
- data
WAL accepts
- Series records encode the labels that identify a series and its unique ID
- Sample records encode samples as a list of triples (series_id, timestamp, value)
- Tombstone records encode tombstones as a list of triples (series_id, min_time, max_time)
Example
Suppose I have one service deployed on two machines; each will output a {service: sid, node: nid} label set
Inside the index file
- Symbol table: [n1, n2, node, service, sid]
- Series:
- [2, 0, 3, 4]
- min time in c1, delta for max in c1, offset in c1
- [2, 1, 3, 4]
- min time in c2, delta for max in c2, offset in c2
- Postings:
- [0], [0, 1], [1]
- Posting offset table:
- ((node, n1), 0), ((node, n2), 2), ((service, sid), 1)
Note that in this design, we only have a direct index on single label pairs, plus the raw data. This means for most queries we will have to merge postings to find the series entries, and then scatter-merge
On working in Japan
Notes from HBR articles
- What we can learn from Japanese Management
- How to negotiate in Japan
Working with interpreters
- If possible, share the draft doc/presentation beforehand
- Speak loudly and slowly, use simple words
- Explain the same idea in 2-3 different ways as checksum
- Pause every ~30 seconds to let the interpreters catch up
- Numbers are easy to mistranslate; write them down if possible
- Do not interrupt interpreters
- Use positive form instead of negative wording. Also avoid double negatives
- Use body movements
- The Japanese decision process is very democratic - ringi means buy-in from all stakeholders, instead of just the LCA of those stakeholders. (I got burned by the slow process many times)
- When the negotiation is stuck, keep the channel open and ping from time to time
- Ask Japanese lower levels for more info on friends and enemies (nemawashi?)
On Product Management
Reading notes on 俞军产品方法论. Page numbers in brackets
- A/B tests lowered the barrier to becoming a PM, and a weaker PM can use a good result to slack off on thinking about customers (13)
- Bar of PM (23)
- to understand the domain model of a user - able to predict with reasonable accuracy the effect of an iteration
- research the user model, and analyze with real individuals as samples
- For a user with consistent preference and cognition, no special scenario is needed. Otherwise, extra work is needed to guide the user so that he maximizes expected utility by conducting the desired behavior
- Hierarchy is more efficient than a democratic process in the context of a company. Note that hierarchy here means professionals and domain experts. Each hierarchy inside the company needs to have higher efficiency than the market, and each expert needs to make better-than-average decisions at his position.
- Outcome of the product (useful to write after action report)
- financial gain
- domain know-how, more accurate self-evaluation and awareness, and innovation
- a team which works well with each other
- trust between user and upstream/downstream stakeholders. This reduces transaction cost of new transactions and products
- Once owning something, selling it means some form of loss. For the same amount of change, the negative utility of a loss is about 2.5 times the positive utility of a gain.
- It is problematic to analyze the user as a natural person, i.e., # of registered users is a flawed metric - the company may ignore what utilities it provides and how many user requirements are satisfied. Similarly, the same DAU may mean much more user value if the product satisfies more user requirements as it goes
- Tech metric improvements may have low correlation with the utility the user feels. This means we should decompose products into utilities, analyze the iteration's effect on the utilities, and then predict the user behavior change caused by utility changes
- source of transaction cost (112)
- bounded rationality
- opportunism: mistrust between parties means increased monitoring cost
- uncertainty and complexity: have to price them in
- small number/volume market
- information asymmetry
- atmosphere
- Transaction cost in the context of market (113)
- search and measure
- compare and decide
- execution and warranty
- The user's trust may be far off from the facts. In this case extra work is needed to fix that (115)
- Standardize the product to make quality easy to manage, and thus reduce transaction cost (115)
- for bigger platforms, it is about trade off instead of innovation and design (132)
- convert users new to the domain asap, this will increase the conversion cost for competitors (136)
- Data-driven slows down the decision process, and only possible when data is cheap and readily available (142)
- Extra profit comes from people with good judgement - often not data driven (146)
- When you have an idea, write it down and try to disprove it, and update it with new info and context in the future
- User should mean a combination of requirements in a certain scenario (157)
- If your manager or partner lacks critical thinking skills, agree and praise on minor points to make them feel that you are similar; only after that insist on major points (159)
- Discover, but do not create, requirements (283)
- Use (price user willing to pay) - (company cost) to decide if it is good business
- PM needs to have judgement on the decision’s certainty and feasibility (288)
- Care about user feedback but not suggestion
- PM needs to decide when to data-driven and when to use data only as reference point (315)
- A PM should be able to split the difference. (320)
- Avoid right but aggressive person
- PM should have good sense in decision making, commercial, and professional/domain. (326)
- If only one product, then CEO or founder is the biggest PM
How to read source code
- Start from easier ones, e.g., Spring Boot
- Read the dependency first, e.g., kafka -> zookeeper -> java.concurrent
- Create a hello world, add debug points. Go top-down upfront only after you are familiar with the codebase
- Try NOT to understand every single line, just focus on the most important flow
- Draw graph
- Need to repeat this a few times to truly understand it
Finding problem to work on
- Wait for problems before you introduce solutions (esp. processes or infrastructure)
- Your boss expects you to give input on what should be worked on. Use your down time to research that
- If you hear other teams complaining about a problem, don't bring in the solution yourself. Wait until they raise it, or inform that team's manager and let him decide
- Need to convince all stakeholders that they want you to solve this problem. This buy-in process will take a while. If it is too tiring, then maybe you should not solve it
- If you find a problem in another team with clear examples, and they are not able to address it to your satisfaction, it is time to escalate to your manager, but not before that.
- Solving unowned problem is fun from time to time, but should not be normal - you are not following the right company/boss
- To build trust, teach your team how to show work, and fix how to give feedback
- Talking to customers directly is more effective than any observation data. Observation data needs a formal process for evaluation
- To affect people's opinions, work not only on the person himself, but also on whoever can directly affect that person. Build common interest between the people around the authority
- Easier to advance your career with profit center than cost center
On ReentrantLock
- ReentrantLock is just an encapsulation of AQS: it has a Sync object which extends AQS
- Need to implement tryAcquire in AQS
vs synchronized
- Synchronized is based on JVM, RL is based on AQS in JDK
- RL supports lockInterruptibly, so that other waiting threads can stop waiting
- RL supports fair locks; synchronized is an unfair lock
- synchronized uses notify()/notifyAll(), i.e., only one Condition for the whole monitor. No such limitation for ReentrantLock
State: 0 means no lock. Every acquire +1, every release -1
ReentrantReadWriteLock (RRWL) uses 16 bits of the state for the write lock and 16 bits for the read lock
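A minimal non-reentrant mutex on top of AQS, adapted from the pattern in the AQS javadoc (ReentrantLock's real Sync additionally handles reentrancy and fairness):

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

public class SimpleMutex {
    private static class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int acquires) {
            return compareAndSetState(0, 1); // state 0 -> 1 means we own the lock
        }

        @Override
        protected boolean tryRelease(int releases) {
            setState(0);                     // back to unlocked
            return true;
        }
    }

    private final Sync sync = new Sync();

    public void lock()   { sync.acquire(1); }
    public void unlock() { sync.release(1); }
}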
On setting goals
- A stretch goal means difficult and/or novel. Note that a series of small wins by itself does not lead to stretch goals, because the path to a stretch goal is uncertain and cannot be broken down up front. However, small wins build the resources, energy, and learning needed for stretch goals
- Slack resources + recent wins = stretch goal. Be self-disruptive
- no resource + recent wins = small wins with incremental successes.
- slack resources + recent loss = experiment for small losses, i.e., mildly risky experiments. This builds resilience and confidence in the org
- no resource + recent loss = small wins with incremental successes. However, orgs in such states are most likely to use stretch goals
- Slack resources have to be consciously built. One way is to focus on learning to enhance existing resources
High Output Management
Reading notes on this famous book by Andy Grove
- Manager is responsible for the output of the neighboring orgs they influence
- Leverage is best done via information gathering, decision making, and “nudging” others by providing suggestions
- For mission-oriented meeting, the meeting chair should be responsible that stakeholders are well prepared
- Centralized for leverage, decentralized for speed
- The author favors a matrix management structure, one dimension for function, one for mission.
- The decision making will be done via this peer group
- When complexity, uncertainty, and ambiguity (CUA) are low, behavior is influenced by expectations; when they are high, culture is the main driver. Which is more suitable depends on the context
- However, when one's focus is more on self-interest rather than group interest, low CUA means people will be driven by incentives, and such people are not manageable in high CUA
- It is not a problem if people left for money or perks. It is a problem when they feel unheard or unappreciated
- When people want to quit, just listen. Don’t argue. Wait until the prepared part is let out, so the real issue will emerge
- Ask for time before you come back with solution
- Pair the output indicator with a quality indicator
- Measure output not activity
- Batch up issues and questions to reduce interruption
- No more than 8 people for decision-making meetings. No more than 25% of time spent on ad-hoc mission meetings
- turning the workplace into a playing field where subordinates become athletes dedicated to performing at the limit of their capabilities
- Don’t make friend with people you manage if you find it hard to deliver a hard performance review. Otherwise, it is OK
- You need to be optimist to build a product, but that also means you are more likely to ignore bad news
- For conflicts, for each opinion pick the lead, and form a meeting with their lowest common ancestor to break ties
- Don’t make your own decision
- Make sure you get involved
- Breaking tie also means the conflicting sides give up responsibility
- People follow company values way more than strategy
Running meetings
- Mission-oriented meeting: solves one problem. It should have no more than eight people; beyond that it is hard to reach a consensus. Once the meeting is over, it is the responsibility of the chairman to send minutes that summarize the discussion, the decisions made, and the actions to be taken.
- Operation reviews: process-oriented, i.e., scheduled at regular intervals, presenting results to people not related to the area. Supervisors should ask questions
On deadline and time estimation
Goal
- 80% of requirements can be released in 2 weeks
- Deadline is the best driver: so small deadlines
- The true end of task - complete work and get evaluation
- An outcome is typically defined by a quantitative metric: a number that is both meaningful to the business and measurable by the team.
- It is not easy to define the correct metrics and the infrastructure to measure them. Part of the reason OKR uses multiple iterations to refine the metrics
- Work complicates to fill the available time.
- 80% of requirement dev can be done in a week
- 80% of releases can be done 1 hour after the submit
- Assign a coordinator for progress coordination if >= 3 people are involved - most likely the owner
- The key to the problem is almost always a blocked requirement rather than blocked resources
- Don’t commit a hard deadline unless you have to - your schedule will become highly inflexible
Estimation
- Underestimates do not change significantly during the activity until about three weeks before the scheduled completion
- Imagine 3 times the effort from a debugged program to a product, and 3 times the effort from a debugged program to an integrated system
- Insider tends to under-estimate time (even with prior knowledge on similar problems), outsiders tend to over-estimate
- However, if we break down tasks, the segmentation effect tells us that the sum of sub-task estimates is greater than the single estimate allocated to the whole task
Add/Remove elements from Java ArrayList inside foreach
foreach is syntactic sugar for that collection's iterator.
In ArrayList’s Itr
int cursor;                      // index of next element to return
int lastRet = -1;                // index of last element returned; -1 if no such
int expectedModCount = modCount; // modCount is from the list

public boolean hasNext() {
    return cursor != size;       // size is from the list
}
If we call ArrayList.remove() inside foreach to remove an element from the list (other than the second-to-last one, which just ends the loop silently)
- hasNext() will return true because cursor no longer equals the shrunken size, and Itr.next() will be triggered
- next() will call checkForComodification()
final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}
At this moment, because we didn't remove via the iterator, expectedModCount will be less than modCount, and the exception will be thrown!
Note that Itr has a remove() but does not have an add()
public void remove() {
    if (lastRet < 0) // this check ensures that we can't call Itr.remove() twice in a row
        throw new IllegalStateException();
    checkForComodification();
    try {
        ArrayList.this.remove(lastRet);
        cursor = lastRet;
        lastRet = -1;
        expectedModCount = modCount;
    } catch (IndexOutOfBoundsException ex) {
        throw new ConcurrentModificationException();
    }
}
To recap: ArrayList keeps a modCount. Its internal Itr type keeps an expectedModCount and implements Iterator, i.e., it is the instance you get from ArrayList.iterator(); the pair is used to detect concurrent changes, while iterator.remove() is OK because it resyncs expectedModCount (demo below).
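A small demo of both paths (my own illustration):

import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class ForeachRemoveDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c", "d"));

        // removing through the list inside foreach -> CME on the next call to next()
        try {
            for (String s : list) {
                if ("b".equals(s)) list.remove(s);
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("got CME as expected");
        }

        // removing through the iterator resyncs expectedModCount, so no exception
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if ("c".equals(it.next())) it.remove();
        }
        System.out.println(list); // [a, d]
    }
}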
On Java volatile
Happens-before rules
- For a lock, all ops before the lock release happen before ops after the lock is acquired
- A write to a volatile (and all ops before that write) happens before a subsequent read of that volatile (and all ops after that read)
- Thread start: ops before thread.start() happen before everything inside the started thread
- volatile forces the program to write the value immediately to main memory in an atomic action, bypassing the CPU cache. Reading bypasses the CPU cache too, or forces the cached value to expire early
- This means if we have only one writer, and all reader/writers are volatile, then the program is thread safe. However, it is not true if you have more than one writer
- Visibility rule: if a thread reads a volatile value, then ALL variables visible to the thread will be re-read from memory. This means the order of instructions matters! Only reads that come AFTER the volatile read benefit; if a read sits before it, or would be reordered before it by the compiler, that variable is NOT affected. Luckily, the JVM forbids reordering reads across a volatile read
- Note that ++ is syntactic sugar; it is compiled into a read-update-write operation - so you can't make it safe with volatile alone! (demo below)
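A quick demonstration of why volatile alone cannot make ++ safe (my own illustration); AtomicInteger's CAS loop is the usual fix:

import java.util.concurrent.atomic.AtomicInteger;

public class VolatileCounterDemo {
    static volatile int volatileCount = 0;           // visibility only; ++ is still read-modify-write
    static final AtomicInteger atomicCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCount++;                     // not atomic: increments can be lost
                atomicCount.incrementAndGet();       // atomic CAS loop
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("volatile: " + volatileCount + " (likely < 200000)");
        System.out.println("atomic:   " + atomicCount.get() + " (always 200000)");
    }
}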
Java singleton with volatile
public class Singleton {
    private static volatile Singleton _instance;

    private Singleton() {
    }

    public static Singleton getInstance() {
        if (_instance == null) {
            synchronized (Singleton.class) {
                if (_instance == null)
                    _instance = new Singleton();
            }
        }
        return _instance;
    }
}
Note that without this volatile the code is incorrect:
- The compiler might reorder so that _instance is assigned the memory address before the instance finishes its init
- This means another thread may get an object with only default field values
Eager loading with a static block
class King {
    private static King kingInstance;
    static {
        kingInstance = new King();
    }
    private King() {
    }
    public static King getKingInstance() {
        return kingInstance;
    }
}
class King {
    private static final King kingInstance = new King();
    static King getInstance() {
        return kingInstance;
    }
    private King() {
    }
}
Lazy loading with internal static class
public class King {
    private King() {
    }
    private static class KingInstance {
        private static final King KING_INSTANCE = new King();
    }
    public static King getInstance() {
        return KingInstance.KING_INSTANCE;
    }
}
- The static internal class is not initialized at class loading time; it is initialized only at the first call of getInstance()
- The JVM's class initialization is thread safe, so the instance is created only once
On decision making
- Try to keep only ONE owner who is also the decision maker. Decision by committee is OK if it is weighted
- Make sure input provider feels heard. Record people’s inputs and decisions
- After that, vote with weights, privately, and the whole team will commit to the decision
- The team should agree on how long they will commit to the decision before revisiting it. The decision maker has an emergency button to revisit it sooner
- Email the minutes to all stakeholders. In the email, include the input providers and each option's pros and cons.
- Avoid the analysis paralysis
- Have a deadline by which the decision has to be made.
- Don’t wait for the complete info to arrive
- Delegate the decision making to someone on the team if possible.
- If you are OK with whatever decision the delegate makes
- Once done so, try not to override the decision in the last minute, which is a sign it should not be delegated!
- The approver does not veto on decision itself, but on the quality of the decision
- People start with opinion rather than fact. So just ask people to search for facts that defend this decision
- Decision is rarely between right and wrong, but between different trade-offs.
On innodb index
Why does innodb pick B+ tree over B tree
- Non-leaves no longer store the data, so each node can store more children, i.e., the tree height will be lower
- Linked list between leaves means range search/scan is easier
- Leaves go to disk, non-leaves are good for memory
How to estimate index size
- In a B+ tree, each node fits into a page, with k keys, k+1 pointers to children, and 2 pointers to siblings
- An OS page is about 4KB, innodb is 16KB
- Estimate a 64-bit key + 64-bit address = 16 bytes, so each node will point to about 1k children
- Therefore, a B+ tree of 3 levels (including the root and leaves) can host almost 1B rows
- For 1B rows, the total index for one key will be around 16 GB. In practice it will be more, considering padding and PKs wider than a long
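A back-of-the-envelope check of the numbers above (my own arithmetic, with the simplifying 16-byte-entry assumption):

public class BPlusTreeSizeEstimate {
    public static void main(String[] args) {
        long pageSize = 16 * 1024;           // InnoDB page size: 16 KB
        long entrySize = 8 + 8;              // 64-bit key + 64-bit pointer
        long fanout = pageSize / entrySize;  // ~1024 children per node

        long rowsAt3Levels = fanout * fanout * fanout;    // root -> internal -> leaf
        System.out.println("fanout ~ " + fanout);                  // 1024
        System.out.println("rows at 3 levels ~ " + rowsAt3Levels); // ~1.07e9

        long indexBytes = rowsAt3Levels * entrySize;      // leaf entries dominate the size
        System.out.println("index size ~ " + indexBytes / (1 << 30) + " GB"); // ~16 GB
    }
}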