atCoder Notes
arc098_b: Xor Sum 2
By case discussion, we can see that we can treat each bit independently: XOR equals sum exactly when every bit appears at most once in the subarray (no carries). For each right endpoint and each bit, calculate the furthest left endpoint that keeps the bit appearing at most once, and add the most restrictive (minimum) such window to the answer (two-pointer sketch below).
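A minimal two-pointer sketch of the same observation (names are mine): a window [l, r] is valid exactly when no bit occurs twice inside it, and validity is monotone in l, so we can slide.

```java
// Count subarrays where XOR equals sum, i.e., every bit occurs at most
// once inside the window (no carries). Two-pointer sketch.
public class XorSum2 {
    static long countValidSubarrays(long[] a) {
        int[] cnt = new int[41];          // per-bit counts inside the window
        long ans = 0;
        int l = 0;
        for (int r = 0; r < a.length; r++) {
            for (int b = 0; b < 41; b++) if ((a[r] >> b & 1) == 1) cnt[b]++;
            while (hasDuplicateBit(cnt)) { // shrink until no bit occurs twice
                for (int b = 0; b < 41; b++) if ((a[l] >> b & 1) == 1) cnt[b]--;
                l++;
            }
            ans += r - l + 1;             // every window ending at r, starting >= l
        }
        return ans;
    }
    static boolean hasDuplicateBit(int[] cnt) {
        for (int c : cnt) if (c > 1) return true;
        return false;
    }
    public static void main(String[] args) {
        System.out.println(countValidSubarrays(new long[]{2, 5, 4, 6})); // prints 5
    }
}
```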
arc098_c: Range Minimum Queries
Let's fix X first and discuss cases!!!
Extreme case: suppose X is the min value of the whole array; to optimize the target value, we just pick the min Q values.
Now suppose we try all possible X values. Each X separates the array into bands, each of which has values >= X. A band of length L >= K contributes its smallest L - K + 1 values (each removal shrinks the band, and a window may never include a value < X). Combine these values, sort them, and take the smallest Q entries; the candidate answer is the Q-th smallest minus X. Note that if we have < Q entries, the X value is too big (see the sketch below).
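A hedged sketch of the fix-X loop above (names are mine; the per-band "smallest L - K + 1 values" rule is my reading of the approach):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Fix X = the minimum removed value; collect the removable values from each
// maximal band of elements >= X, then check whether Q removals fit.
public class RangeMinimumQueries {
    static long solve(int[] a, int K, int Q) {
        long best = Long.MAX_VALUE;
        for (int x : a) {                            // try each value as X
            List<Integer> pool = new ArrayList<>();
            List<Integer> band = new ArrayList<>();
            for (int i = 0; i <= a.length; i++) {
                if (i < a.length && a[i] >= x) {
                    band.add(a[i]);
                } else {                             // the band ends here
                    if (band.size() >= K) {
                        Collections.sort(band);
                        // each removal shrinks the band, so at most L - K + 1 removals
                        pool.addAll(band.subList(0, band.size() - K + 1));
                    }
                    band.clear();
                }
            }
            if (pool.size() < Q) continue;           // X is too big
            Collections.sort(pool);
            best = Math.min(best, (long) pool.get(Q - 1) - x);
        }
        return best;
    }
    public static void main(String[] args) {
        System.out.println(solve(new int[]{4, 3, 6, 9, 1}, 3, 2)); // prints 1
    }
}
```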
agc025_b: RGB Coloring
We can view the combination as painting the two colors independently, i.e., choose a cells for the first color and b cells for the second with A * a + B * b = K. For each a there is at most one valid b, and each such pair introduces (n choose a) * (n choose b) ways (sketch below).
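Since the matching b is forced for each a, the whole count is one loop over a with precomputed binomials. A sketch (the modulus here is an assumption; check the actual statement):

```java
// Sum C(n, a) * C(n, b) over all (a, b) with A*a + B*b = K, modulo MOD.
public class RgbColoring {
    static final long MOD = 1_000_000_007L;   // assumed; verify against the statement

    static long solve(int n, long A, long B, long K) {
        long[] fact = new long[n + 1], inv = new long[n + 1];
        fact[0] = 1;
        for (int i = 1; i <= n; i++) fact[i] = fact[i - 1] * i % MOD;
        inv[n] = modPow(fact[n], MOD - 2);
        for (int i = n; i > 0; i--) inv[i - 1] = inv[i] * i % MOD;
        long ans = 0;
        for (long a = 0; a <= n; a++) {
            long rest = K - A * a;
            if (rest < 0 || rest % B != 0) continue;
            long b = rest / B;
            if (b > n) continue;                  // no valid b for this a
            ans = (ans + comb(fact, inv, n, (int) a) * comb(fact, inv, n, (int) b)) % MOD;
        }
        return ans;
    }
    static long comb(long[] fact, long[] inv, int n, int k) {
        return fact[n] * inv[k] % MOD * inv[n - k] % MOD;
    }
    static long modPow(long base, long e) {
        long r = 1; base %= MOD;
        for (; e > 0; e >>= 1, base = base * base % MOD) if ((e & 1) == 1) r = r * base % MOD;
        return r;
    }
    public static void main(String[] args) {
        System.out.println(solve(4, 1, 2, 4));
    }
}
```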
How To Drive Change as a Software Engineer - My Notes
- Persuading a bigger team is MUCH harder than a smaller one, but working with people IS the work in a people-organization!
- Social proof often matters as much as, if not more than, facts and logic.
- Presentation matters! It can change people's views even if there is nothing they don't already know.
- When getting a "NO", ask what you need to do to get a "YES", and ideally let the other side commit to that.
- Basic principle: keep the feedback loop short so that you can gain momentum for your work.
- Find who might support you, pitch your ideas to them individually, and gather them together so that you can have a self-reinforcing consensus.
TiDB in OLAP
Note: all links are in Chinese
- Use case: aggregate isolated MySQL clusters for OLAP. They run TiSpark on top of it.
- The production TiDB cluster has tens of nodes, with tens of TBs of data.
- To keep the production table size in check, they regularly purge production tables and move old entries to separate archive tables.
- Need to watch out for the performance issues of DDL operations on production and archive tables.
- For migration, they use the official TiDB syncer tool to run TiDB as a MySQL read slave.
- Daily archive volume: > 100 million rows, > 100 GB. Current TiDB volume: "tens of TBs".
- As of the post's writing, TiDB Binlog relies on Kafka.
- Use TiDB as a data warehouse. It replaces AliCloud's ADS and ES.
- ADS has cost issues, and ES has difficulty handling complex queries, with high dev/ops costs.
- Current cluster: 5 nodes, each with 16 cores and 32 GB RAM.
- Sync data from AliCloud's DRDS to TiDB, and run TiSpark against TiKV, TiDB's storage layer.
- 2 TB of raw data incoming per day.
A Restaurant Merchant/Order/Cashier Saas
- TiDB supports the operational datastore. Check the link to see the test queries they run.
- Before: RDS -> Mongo via Kafka -> Hive. After: RDS -> TiDB via Kafka -> Hive. TiSpark queries both TiDB and Hive.
- Cluster setup: 8 nodes, 5 of them for the storage layer. Each TiKV/storage node has 16 cores/128 GB RAM with two 1.8 TB SSDs.
- Peak QPS 23K. Data volume: "a couple of TBs".
Another Restaurant Merchant/Cashier SaaS
- Near-real-time complex queries. TiDB replaces Solr.
- Current deployment: 8 nodes. Storage layer on 16 cores and 64 GB RAM.
- They need real-time analysis capabilities.
- Their raw data source is MySQL databases, but they don't want to spend too much effort on a MySQL -> Hive/HBase ETL pipeline.
- They need Spark support, so they chose TiSpark + TiDB.
Centroid Decomposition
Motivation
Given a tree, how many paths of length k are there?
Centroid
- Claim: We can find a vertex v in the tree s.t. none of the subtrees has size > N/2 when v is the root of the tree.
- Proof: Pick an arbitrary v as a candidate; if one of its subtrees has size > N/2, move to that subtree's root. The size of the offending subtree strictly decreases, so this walk terminates at a centroid.
- Note that finding the centroid is a simple DFS that computes subtree sizes.
Centroid decomposition
- Starting from the whole tree, find the centroid
- Find the centroid of each remaining subtree, and connect it to the centroid found in the previous step
- Apply recursively, until we form a new tree
Properties of Centroid Tree
- The new tree contains all nodes of the original tree => by tree induction
- The height of the tree is O(log N) => because at each level, the size of the max subtree is reduced by at least half
- The construction of the tree takes O(n log n), because at each level we traverse each node in the tree at most once, and there are O(log N) levels
- Consider any path A - B in the original tree; the path can be broken down into A - C - B, where C is the LCA of A and B in the centroid tree.
Proof: Both A and B lie in the component for which C was chosen as centroid, and they were separated into different components when C was removed, i.e., C is the earliest-chosen centroid on the original A - B path.
- So we decompose the given tree into O(n log n) different paths, each from a centroid to a node in its component. Again, the total # is O(n log n) because at each level we introduce O(n) paths, and there are O(log N) levels in total (see the sketch below).
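A compact sketch of the construction described above (names are mine):

```java
import java.util.ArrayList;
import java.util.List;

// Centroid decomposition sketch: repeatedly find the centroid of each
// component and recurse; parentInCentroidTree encodes the resulting
// O(log N)-high centroid tree (-1 for the overall root centroid).
public class CentroidDecomposition {
    static List<Integer>[] g;
    static int[] size, parentInCentroidTree;
    static boolean[] removed;          // already chosen as a centroid

    static int computeSize(int v, int p) {
        size[v] = 1;
        for (int to : g[v]) if (to != p && !removed[to]) size[v] += computeSize(to, v);
        return size[v];
    }

    // Walk toward the heavy subtree until no subtree exceeds compSize / 2.
    static int findCentroid(int v, int p, int compSize) {
        for (int to : g[v])
            if (to != p && !removed[to] && size[to] > compSize / 2)
                return findCentroid(to, v, compSize);
        return v;
    }

    static void decompose(int v, int centroidParent) {
        int c = findCentroid(v, -1, computeSize(v, -1));
        removed[c] = true;
        parentInCentroidTree[c] = centroidParent;
        for (int to : g[c]) if (!removed[to]) decompose(to, c);
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        int n = 7;
        g = new List[n];
        for (int i = 0; i < n; i++) g[i] = new ArrayList<>();
        int[][] edges = {{0, 1}, {1, 2}, {1, 3}, {3, 4}, {4, 5}, {4, 6}};
        for (int[] e : edges) { g[e[0]].add(e[1]); g[e[1]].add(e[0]); }
        size = new int[n];
        parentInCentroidTree = new int[n];
        removed = new boolean[n];
        decompose(0, -1);
        for (int v = 0; v < n; v++)
            System.out.println(v + " -> centroid parent " + parentInCentroidTree[v]);
    }
}
```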
CS Academy Notes
Round 77: Two Rows
DP state = cost(col, isFirstRow, isFirstRowFilled, isSecondRowFilled, isFirstPlayer)
Round 77: Expected Lcp
Uses trie
Round 78: Banned Digits
Standard digit DP - with a common prefix fixed, count how many numbers we can assemble given the constraint (sketch below):
- place a possible digit < the next digit of N, then multiply by powers of the # of remaining allowed digits
- or keep the same digit, and keep going
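A hedged sketch of that digit DP, assuming the task is: count the numbers in [1, N] that use only allowed digits (I don't have the exact statement at hand):

```java
// Counts numbers in [1, N] (N given as a decimal string) whose digits are
// all in the allowed set. Shorter lengths are counted in closed form.
public class BannedDigits {
    static long count(String n, boolean[] allowed) {
        int len = n.length();
        int k = 0;                         // number of allowed digits
        for (boolean b : allowed) if (b) k++;
        int kNonZero = allowed[0] ? k - 1 : k;
        long ans = 0;
        long pow = 1;                      // k^(length - 1)
        for (int l = 1; l < len; l++) {    // numbers strictly shorter than N
            ans += kNonZero * pow;         // non-zero first digit, rest anything allowed
            pow *= k;
        }
        // same length: keep a prefix equal to N's, then place a smaller digit
        for (int i = 0; i < len; i++) {
            int d = n.charAt(i) - '0';
            for (int x = (i == 0 ? 1 : 0); x < d; x++)
                if (allowed[x]) ans += power(k, len - i - 1);
            if (!allowed[d]) return ans;   // cannot keep matching the prefix
        }
        return ans + 1;                    // N itself is made of allowed digits
    }
    static long power(long b, int e) {
        long r = 1;
        while (e-- > 0) r *= b;
        return r;
    }
    public static void main(String[] args) {
        boolean[] allowed = new boolean[10];
        for (int d : new int[]{1, 3, 5, 7, 9}) allowed[d] = true;
        System.out.println(count("100", allowed)); // odd-digit numbers <= 100: prints 30
    }
}
```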
Round 78: Strange Matrix
Uses segment tree
Google Code Jam 2018 Round 1
1A: Waffle Choppers
Consider one dimension at a time: we can deterministically decide which rows to cut. If a row is empty - it doesn't matter!!! Either way would do.
1A: Bit Party
- bsearch on the final answer
- Note that R <= C, so we can just calculate the capacity for each cashier!!!
1B: Mysterious Road Signs
- Every time we see a new segment, we can assume its left end is the start of a new segment, and keep scanning until we have more than two possible choices. Then we know it is the end of the segment, and we need to start again. Of course, we need to try both directions.
- The large dataset can be computed with divide and conquer - just guess among the 3 possible cases: all in one segment, left works with left, and left works with right => T(S) = 2T(S/2) + O(S) gives O(S log S)
atCoder Notes
arc096_b: Static Sushi
Try each of the four possible routes (clockwise only, counterclockwise only, or either direction with one turn back), so for each sushi we need to calculate:
- the net gain at this point from the start clockwise, and the max possible net gain seen so far
- the net gain at this point from the start counterclockwise, and the max possible net gain seen so far
Then we scan each point, and combine its net gain with the max possible net gain seen so far from the opposite direction.
arc097_b: Equals
We can form the pairs into disjoint sets (union-find); inside each set, count how many indices also have their value in the same set (sketch below).
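A union-find sketch of this (1-indexed; p and the swap pairs here are hypothetical inputs):

```java
// Swapping inside a connected component can realize any arrangement of its
// values, so position i can hold value i exactly when i and p[i] share a set.
public class Equals {
    static int[] parent;
    static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }
    static void union(int a, int b) { parent[find(a)] = find(b); }

    static int solve(int[] p, int[][] swapPairs) {   // p and pairs are 1-indexed
        int n = p.length - 1;
        parent = new int[n + 1];
        for (int i = 1; i <= n; i++) parent[i] = i;
        for (int[] pr : swapPairs) union(pr[0], pr[1]);
        int ans = 0;
        for (int i = 1; i <= n; i++) if (find(i) == find(p[i])) ans++;
        return ans;
    }
    public static void main(String[] args) {
        int[] p = {0, 5, 3, 1, 4, 2};                // p[0] unused
        int[][] pairs = {{1, 3}, {5, 4}};
        System.out.println(solve(p, pairs));         // prints 2
    }
}
```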
arc097_c: Sorted and Sorted
Hard problem for me!!!
The inversion number is the cardinality of the inversion set. It is a common measure of the sortedness of a permutation or sequence.
Standard comparison sorting algorithms can be adapted to compute the inversion number in time O(n log n).
From the constraints in the statement, at any moment the set of placed balls must be of the form "i black balls numbered 1 through i, and j white balls numbered 1 through j", so use an O(n^2) DP over (i, j) to find the min inversion # = cost.
agc024_b: Backfront
Find the longest run of consecutive values (v, v+1, ...) whose positions are increasing, starting at each value, and take the longest one; the answer is N minus that length.
agc024_c: Sequence Growing Easy
- x(1) = 0, and x(i + 1) - x(i) <= 1
- Compute the lower bound for the number of operations, and claim that the lower bound is achievable. I got stuck at the lower bound part!!!
- The observation is that we need to compute the # of differing consecutive pairs; each entry contributes either 1 or x(i).
Zipkin in the Context of Spring Cloud
- Such a tracing tool is good for latency/performance troubleshooting and investigating frequently-occurring bugs. HOWEVER, you still need a traditional ELK-ish logging infrastructure to do customer support, mainly because the sampling rate is kept low for performance.
- Need to configure the log message format to follow Sleuth's format. By default it listens to the stdout appender.
- The out-of-the-box Zipkin jar/docker doesn't handle service discovery and registration. You need to wrap zipkin-server in your own Spring Boot app to register it.
- To disable Zipkin via config, you need to turn off both Sleuth and Zipkin.
- Zipkin doesn't have access control; you probably need to build a reverse proxy for this.
- Need to create the Zipkin DB/tables before connecting. It won't auto-create them.
- If you don't see some trace, make sure the very beginning of the trace is recorded, i.e., if the very first receiver of the request is not recorded, the trace won't appear.
Codeforces Notes
959D: Mahmoud and Ehab and another array construction task
hard problem for me!!!
961D: Pair Of Lines
Take 3 points; if they are on the same line, keep scanning and exclude all points on it => the remaining points should all be on the same (second) line.
Otherwise, try each of the 3 possible lines (through a pair of the 3 points) as the first line.
960D: Full Binary Tree Queries
- The order of operations doesn't matter.
- So we can just keep track of the shifts in both directions at each level.
- Every level we move up, we need to convert the current shift into the parent level's shift; since the positions at a level are contiguous, we can apply the delta as a shift with mod operations.
962D: Merge Equals
- Keep track of each number's occurrences.
- From the min # to the max #, get its occurrences, sort them by index, and for each merged pair populate the 2 * X entry with the right value.
- Scan through the elements again and populate the remaining elements.
Each operation reduces the size of the array by 1, so it is O(n log n).
963B: Destruction of a Tree
If |V| is even, then it is impossible to win, because |E| = |V| - 1 is odd, and we take out only an even # of edges at each step.
Claim: if |V| is odd, then we can always do it.
Proof: by induction on the size of the tree.
- By induction, we can delete all subtrees with an EVEN # of nodes, and the parity of the # of nodes in the current tree is preserved.
- The current root must then have an even degree: each remaining subtree has an ODD # of nodes, so if the degree were odd, the total # of nodes would be even - a contradiction.
- Now we delete the root, and induct on each remaining subtree, which again has an ODD # of nodes.
CS Academy Notes
Round 74: Two Squares
Brute-force search on the TL corner of the first square, then update all possible choices of the second square given the chosen square. In the end, pick between the best remaining square with its TL corner outside the chosen square and the best one within it.
Round 74: Minimum by Xor
Need to print all subsets, but how to prove it?!!!
Round 75: Race Cars
Essentially bsearch on each way, and see which one can be better
Round 75: Electric Cars
Sort by arrival time, and start simulation
- maintain a list of interrupted cars, the current time, the current charge, and the remaining charge needed
- for each incoming car, try to finish the current car first
- if we can't, add the current car to the interrupted list, and switch to the new car
- if we can, keep removing the head of the list and try finishing those cars, adding the last one back with its remainder
- in the end, finish the current car and all cars in the list
Round 76: Candy Boxes
just keep track of budget on each box, and assign them in sequence
Round 76: Pyramids
- For each column, we need only the lowest point; then we add points from left to right and take out all shadowed points. Note the new point itself can be shadowed. This works because if the rightmost point can't shadow the new point, the points to its left can't either!!!
- To calculate the intersection between pyramids, we can use math or bsearch. Note that if we deduct intersections at the end of each step, we can just compute the end points from left to right, one by one!!!
Codeforces Notes
954C: Matrix Walk
- Scan through all diffs; each must be either 1 or a fixed delta => from which we find y.
- Now we go through the steps again and deduce x, i.e., we just need to ensure that the columns match.
954D: Fight Against Traffic
Compute all dists from s, and all dists from t.
For each pair (i, j), see if dist(s,i) + dist(j,t) + 1 or dist(s,j) + dist(t,i) + 1 is smaller.
955C: Sad powers
Generate all possible answers for p >= 3, and discard all perfect squares!!!
At each query, we get the in-range count with two binary searches, and perfect squares can be counted via sqrt math (sketch below).
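A sketch of the precomputation and the per-query counting (1e18 as the assumed upper bound):

```java
import java.util.TreeSet;

// Precompute all perfect powers x^p with p >= 3 up to 1e18 that are NOT
// perfect squares (~1e6 values); squares are counted directly with sqrt.
public class SadPowers {
    static long[] powers;

    static void precompute() {
        TreeSet<Long> set = new TreeSet<>();
        long limit = (long) 1e18;
        for (long x = 2; x * x * x <= limit; x++) { // x <= 1e6
            long v = x * x;
            while (v <= limit / x) {
                v *= x;                             // v = x^p, p >= 3
                if (floorSqrt(v) * floorSqrt(v) != v) set.add(v);
            }
        }
        powers = set.stream().mapToLong(Long::longValue).toArray();
    }

    static long floorSqrt(long v) {                 // exact, despite double rounding
        long s = (long) Math.sqrt((double) v);
        while (s * s > v) s--;
        while ((s + 1) * (s + 1) <= v) s++;
        return s;
    }

    static long countUpTo(long v) {                 // perfect powers <= v, v >= 0
        return upperBound(powers, v) + floorSqrt(v); // non-squares + squares 1,4,9,...
    }

    static int upperBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        precompute();
        long l = 1, r = 100;
        System.out.println(countUpTo(r) - countUpTo(l - 1)); // prints 13
    }
}
```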
934D: A Determined Cleanup
Polynomial long division
957D: Riverside Curio
Hard problem for me!!!
- Note the total # of marks on day i: t(i) = m(i) + d(i) + 1, where m(i) is the # of marks above the water and d(i) the # below.
- Lower bound: t(i) >= max(t(i-1), m(i) + 1).
- Each day leaves at most 1 new mark => t(i) >= t(j) - (j - i) for j > i.
Claim: if we find a solution that satisfies all 3 conditions, then we have a valid sequence of actions, and this solution is optimal.
Proof: Going backwards, we lower the values as much as the constraints allow, even to the point of being too low, then lower-bound each with m(i) + 1 to satisfy condition 2.
After that we run forward again to pad the values we reduced too much in the first pass. This simulation also ensures that the solution is valid (sketch below).
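A sketch of the two passes as I reconstruct them; t(i) is the total # of marks by day i, and the answer sums the submerged marks t(i) - m(i) - 1:

```java
// Backward pass lowers the totals as far as the constraints allow, forward
// pass restores monotonicity; answer = total submerged marks over all days.
public class RiversideCurio {
    static long solve(int[] m) {
        int n = m.length;
        long[] t = new long[n];
        t[n - 1] = m[n - 1] + 1;
        for (int i = n - 2; i >= 0; i--)                // t(i) >= t(i+1) - 1 and >= m(i) + 1
            t[i] = Math.max(m[i] + 1, t[i + 1] - 1);
        long ans = 0;
        for (int i = 0; i < n; i++) {
            if (i > 0) t[i] = Math.max(t[i], t[i - 1]); // t must be non-decreasing
            ans += t[i] - m[i] - 1;                     // marks strictly below water on day i
        }
        return ans;
    }
    public static void main(String[] args) {
        System.out.println(solve(new int[]{0, 1, 0, 3, 0, 2})); // sample: prints 6
    }
}
```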
On OAuth and Open ID Connect
Terms
-
Grants: the mode/scenario/method a client acquires the access token, as per offical OAuthSpec
-
Resource owner: the entity that can grant access to the resource
-
Resource server: the server hosting the resource
-
Client: the entity that wants to access the resource. Client will need authorization from resource owner.
-
Authorization Server: the entity trusted by the resource server. AS will issue access token to client, once resource owner authenticates and authorizes client’s request. Often called identify provider
-
Scope: the permission/access client wants to resources
Notes
- OAuth is an AUTHORIZATION protocol but NOT an authentication one. How authentication happens is undefined, but it normally uses OpenID Connect, which builds on top of OAuth.
- The OAuth flow is actually very similar to OpenID (NO CONNECT)'s. The key difference is that the former returns the access token, while the latter returns just a signed claim that the client got authorization from the user. Hence, using the OAuth workflow for login is often called pseudo-authentication.
- Which OAuth grant to use depends greatly on the context.
a. Client Credentials: authorizing a machine to access resources with no need for permission from a user, i.e., machine-to-machine authorization = the client is the resource owner. Example: a cron job that uses an API to import to a DB; the application can use the OAuth access token to call the API on behalf of ITSELF.
b. Otherwise, Authorization Code (ACG): needs permission from the user, and the client is a web app that runs on a server. This is considered safest, because the access token is issued to the web server hosting the client, without going through the user's web browser.
c. Otherwise, Resource Owner Password Credentials: the client is trusted with user credentials. Note that this flow is legacy, and should be used ONLY WHEN ACG is not available.
d. Otherwise, ACG if the client is a native app.
e. Otherwise, the client is an SPA: implicit grant.
- Authorization is mostly role-based instead of ACL-based. Note that there exists a standard DB model for RBAC.
Authorization is mostly role-based instead of ACL-based. Note that there exists a standard DB model for RBAC.
Using JWT in the context of Spring Cloud
Authorization and Role Management in Microservices
The discussions I found give VERY divergent opinions on how to implement authorization in a microservice environment. I will just list my findings and conclusions here.
- Most authorization rules I found in microservice environments are role-based instead of ACL-based. Consider a dedicated user role/identity service to handle the user-role mapping. However, be careful, because this user-role mapping service can quickly become the kitchen sink of contexts from different domains.
- Often a service has very domain-specific authorization rules and roles. In this case, such logic and roles should remain within the service, but the trade-off is that we need to replicate the main user database, or at least access a centralized user-role service.
- Most likely you will have some global default roles, e.g., admin, billing admin. These roles should stay with the general global auth service, i.e., when we generate the JWT, we should attach the applicable global roles to the JWT. Moreover, these admin roles should be the only roles that can bootstrap service-specific roles and assign them to the users.
- JWT should have only global role info, because:
- JWT is normally part of the header, which is limited to 8KB.
- We don't want to leak other services' contexts.
- This setup suggests that role population happens twice - once at the auth service, once at the actual request-processing service - and the actual role/permission check is decentralized instead of centralized.
- Favor coarse-grained roles, modeled on how the organization works.
JWT
- Three parts: header, payload, and signature. This means microservices need to know the public key to verify the signature.
- Normally sent in the Authorization header as the bearer token, i.e., using the Bearer schema.
- The API gateway will forward the request with the JWT to the service, which in turn will decide whether to grant the resource or not.
- We may choose to let the user use a reference token, and the API gateway will translate the reference token into the JWT. Note the reference token can come from an OAuth server. This also suggests that the JWT stays within the private network, and only the reference token lives in the user's browser session.
Note that Spring Cloud Security is exclusively for OAuth2 scenarios. Treat it as a completely different thing from Spring Security! Also, Spring Security has a very specific, pre-defined authentication flow, from which you can find the names of many auto-wired/injected classes.
Common Choices
- Spring cloud security
- keycloak by JBoss
- OpenID-Connect-Java-Spring-Server
- Apereo CAS
Workflow
- Client logs in to acquire access token
- Client sends access token to the API gateway
- Gateway uses auth server to validate token and convert to JWT
- Gateway sends jwt token along with requests to backend microservices
- Every microservice has a JWT client to decode the user context inside the JWT
On the authentication service side, generate the JWT and add it as the "Authorization: Bearer" token. Consider writing it in an authentication filter that extends RequestHeaderAuthenticationFilter.
On the resource service side, to parse the incoming JWT, we have two implementation options
Option 1
- Add an AUTHORIZATION filter that extends BasicAuthenticationFilter.
- This filter will parse the token text into a UsernamePasswordAuthenticationToken.
- This authentication token will be placed in the context via SecurityContextHolder.getContext().setAuthentication(authentication) (sketch below).
- We also need to extend WebSecurityConfigurerAdapter. Inside protected void configure(HttpSecurity http) throws Exception, we configure authorization for each path. Note that we need to explicitly set the CORS config here to enable traffic from all sources. This class also sets which AuthenticationManager to init/pass in/config.
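A hedged sketch of the Option 1 filter, assuming a jjwt-style parser and an HMAC shared secret; SECRET, the claim layout, and the empty authority list are placeholders, not a prescribed setup:

```java
import java.io.IOException;
import java.util.Collections;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.security.authentication.AuthenticationManager;
import org.springframework.security.authentication.UsernamePasswordAuthenticationToken;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.security.web.authentication.www.BasicAuthenticationFilter;

import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;

// Option 1 sketch: parse the bearer JWT and place the resulting
// authentication into the security context.
public class JwtAuthorizationFilter extends BasicAuthenticationFilter {
    private static final String SECRET = "change-me";   // hypothetical shared secret

    public JwtAuthorizationFilter(AuthenticationManager authManager) {
        super(authManager);
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws IOException, ServletException {
        String header = request.getHeader("Authorization");
        if (header != null && header.startsWith("Bearer ")) {
            Claims claims = Jwts.parser()
                    .setSigningKey(SECRET)
                    .parseClaimsJws(header.substring("Bearer ".length()))
                    .getBody();
            // No credentials needed: the signature already proves authenticity.
            UsernamePasswordAuthenticationToken authentication =
                    new UsernamePasswordAuthenticationToken(
                            claims.getSubject(), null, Collections.emptyList());
            SecurityContextHolder.getContext().setAuthentication(authentication);
        }
        chain.doFilter(request, response);
    }
}
```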
Option 2 (less common)
- Add an AUTHENTICATION filter that extends RequestHeaderAuthenticationFilter. This filter will have a reference to the AuthenticationManager, which, by the design of Spring Security, passes requests through a list of AuthenticationProviders.
- This also means we need to extend AuthenticationProvider to parse the token and authenticate.
- Don't forget to inject our filter and provider into the context.
atCoder Notes
arc093_b: Grid Components
We build a grid with 41 rows and 100 columns:
rows 0 - 20: fill with #
rows 21 - 41: fill with .
for each additional ., iterate to add one at (2i, 2j)
for each additional #, iterate to add one at (21 + 2i, 21 + 2j)
agc022_b: GCD Sequence
Overall, a hard problem for me!!!
Note that the constraint is equivalent to saying gcd(a(i), S) != 1.
Note that we have to pick about 2/3 of all numbers, so we can define S to be a multiple of small primes => S = 6K.
So we pick numbers of the form 6k + 2, 6k + 3, 6k + 4, and 6k, and make sure we pick both 2-type and 3-type elements to force the overall gcd to 1.
Note that the N <= 5 case is different from the N >= 6 case.
arc095_c: Symmetric Grid
Overall a hard problem for me!!!
- Notice that the order of ops doesn't matter, so we try all row ops and then all col ops.
- When W is odd, we know column (W + 1)/2 must itself be a palindrome.
- Keep track of unused pairs of cols, see if any of the reversed strings are the same, and populate the assignment.
- However, H! possible row combos are too slow, so we consider only all possible pairings of rows: 11!! = 11 * 9 * 7 * ... * 1.
On Refactoring
- Win the trust and respect of your teammates before attempting any major rewrite.
- Project-level refactoring should not go in parallel with day-to-day requirements dev. Use a mock refactoring to discover potential dependency problems, and refactor those in day-to-day dev. Finally, halt dev for 2-3 days and focus on the project refactoring.
- Refactoring rarely-changed code gives little benefit and most likely doesn't outweigh the cost, especially since a lot of such legacy code is hard to maintain and extend.
- Refactor as little as possible (change ONLY what you need); unit tests are the basis for refactoring.
- Start with very coarse-grained end-to-end tests, and gradually add more, finer-grained tests. End-to-end, integration, then unit tests, in that order.
- Don't start making changes until you have a suite of tests that can spot behavior changes. There are times when things that appear to be bugs are actually weird features that the business needs to function.
- Focus more on the user scenario than the code logic.
- To solve branch conflicts, use feature flags to hide development in progress.
- Open to extension, closed to changes, because changing existing code easily introduces new bugs.
- Refactor code to be extendable, let existing unit tests cover that (only change code), and then enhance with TDD (only add code).
- Refactoring across services is hard. Options:
- Don't do it - perfectly valid
- One big version change
- Long-term ownership can't be ambiguous. Build a temp team to build the third service - Conway's law
atCoder Notes
arc091_b: Remainder Reminder
Just iterate through a = i * b + K, for all b in [K + 1, N].
arc091_c: LISDL
A >= 1, B >= 1, A + B <= N + 1.
Claim: A * B >= N. Proof: I got stuck here!!! (This is Erdős–Szekeres: if N > A * B, any sequence of length N has an increasing subsequence longer than A or a decreasing one longer than B.)
arc092_a: 2D Plane 2N Points
I used maximal bipartite matching; given N = 100, it can be solved with Ford-Fulkerson as a network flow problem. Note that this algo has a simplified form for implementation!!!
Note that this approach costs O(V * E), because for each node we essentially traverse all edges in a DFS, with a visited set, while updating the matching.
Official solution
Similar to all pseudo-bipartite matching problems, we can just sort one side, try matching the head each time, and take the most restricted matching at that moment, so that we leave more possibilities to the remaining ones. Note that translating it to bipartite matching loses too much contextual information, and thus gives a harder implementation (sketch below).
Note the core reason the greedy approach works: supposing the optimal solution didn't match the current head or matched it differently, we can swap the matching while maintaining the count.
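A sketch of the greedy with a TreeMap over the candidate red y's (a red point (a, b) can match a blue point (c, d) iff a < c and b < d):

```java
import java.util.Arrays;
import java.util.TreeMap;

// Sweep blue points by x; every red point with smaller x becomes a
// candidate, and we match the candidate with the largest y that still
// fits - the most "restricted" red point.
public class TwoDPlanePoints {
    static int maxPairs(int[][] red, int[][] blue) {
        Arrays.sort(red, (p, q) -> p[0] - q[0]);
        Arrays.sort(blue, (p, q) -> p[0] - q[0]);
        TreeMap<Integer, Integer> candidateYs = new TreeMap<>(); // red y -> count
        int i = 0, matched = 0;
        for (int[] b : blue) {
            while (i < red.length && red[i][0] < b[0])
                candidateYs.merge(red[i++][1], 1, Integer::sum);
            Integer y = candidateYs.lowerKey(b[1]);              // largest red y < blue y
            if (y != null) {
                matched++;
                if (candidateYs.merge(y, -1, Integer::sum) == 0) candidateYs.remove(y);
            }
        }
        return matched;
    }
    public static void main(String[] args) {
        int[][] red = {{2, 0}, {3, 1}, {1, 3}};
        int[][] blue = {{4, 2}, {0, 4}, {5, 5}};
        System.out.println(maxPairs(red, blue)); // prints 2
    }
}
```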
arc092_b: Two Sequences
Consider each bit independently.
CS Academy Notes
Round 70: Min Distances
The classic shortest-path relationship is that minD(a, b) = min over all c of (minD(a, c) + minD(c, b)).
Note that when we can't find a c that satisfies this relationship, either our graph is invalid, or (a, b) must be a direct edge!!!
So we just populate the matrix and re-run Floyd-Warshall (sketch below).
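A sketch of the validity check plus the "must be direct" test:

```java
// Re-run Floyd-Warshall on the claimed distance matrix d. If any entry
// improves, the matrix was inconsistent; an edge (a, b) must be direct
// iff no intermediate c attains d[a][b].
public class MinDistances {
    static boolean isValid(long[][] d) {
        int n = d.length;
        for (int k = 0; k < n; k++)
            for (int a = 0; a < n; a++)
                for (int b = 0; b < n; b++)
                    if (d[a][k] + d[k][b] < d[a][b]) return false; // shorter path found
        return true;
    }

    static boolean mustBeDirect(long[][] d, int a, int b) {
        for (int c = 0; c < d.length; c++)
            if (c != a && c != b && d[a][c] + d[c][b] == d[a][b]) return false;
        return true;
    }

    public static void main(String[] args) {
        long[][] d = {{0, 1, 2}, {1, 0, 1}, {2, 1, 0}};
        System.out.println(isValid(d));              // true
        System.out.println(mustBeDirect(d, 0, 2));   // false: 0 -> 1 -> 2 attains it
    }
}
```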
Round 71: Binary Differences
Claim: the set must be contiguous.
Proof: suppose we find the max diff; we can keep taking off the left and right ends and reduce the diff by 0, 1, or 2, as we like.
So we can just find the max and min possible k values. Note that for ease of calculation, we can map f(0) = 1 and f(1) = -1.
Round 71: Russian Dolls
It is easy to calculate the # gained by reducing to the next level; keep track of the max # we have seen so far, and stop when the new # >= the max seen.
Round 72: Beautiful Matrix
only 4 possible cases
- swapping the top row
- swapping the bottom row
- swapping the left col
- swapping the right col
Round 72: Spring Cleaning
So the graph is a cycle or a sun-like graph. It is easy to see we can clean all but 1 node. So just DFS and print the nodes in reverse order, except the last one.
Round 73: Binary Isomorphism
Hard problem for me!!!
Codeforces Notes
931D: Peculiar apple-tree
Claim: if a level has an odd # of apples, it will contribute one apple in the end!!!
Proof: by induction on levels; when we move the apples to the level above, the cancellation rule preserves the parity => at the root level, even means empty, odd means 1 => so all even cases cancel out at the root.
937D: Sleepy Game
We need to find a path s.t.
- it has even length
- the final node has no outgoing edges
So we split each node into two states - arriving at the node with an even or odd path length so far - and build a graph on these states!!!
Then we just DFS and try to find one such path, and also keep track of the existence of cycles, so that we can ensure at least a draw.
940D: Alena And The Heater
For each new digit we process, we can see that:
- in the 11110 case, we can push down the ub of r
- in the 00001 case, we can increase the ub of l
- in the 11111 case, we can increase the lb of r
- in the 00000 case, we can decrease the ub of l
- if the last 4 digits are not all 0s or all 1s, we cannot infer anything
In the end, we take the ub of r and the lb of l, and see if they are consistent.
946D: Timetable
The official answer uses a naive DP, although I doubt it would pass the runtime check.
950C: Zebras
Simple greedy solution would work
950D: A Leapfrog in the Array
Note that all shifts happen from even indices, while starting at an odd index.
Let's consider the moment when an odd index is first "merged" with the band to its right:
- we know for sure the length of the band to the right
- the odd indices to its right haven't moved yet
The key insight I missed is that, at each step, the jump length doubles going forward, and conversely halves going backward!!!
Proof: in a jump, we go from p + (n - p/2) to p, i.e., the jump length is n - p/2,
so the first jump length is 2 * n - from (jumping to 2 * (from - n)), and the second jump length becomes 4 * n - 2 * from; we can see that the jump length doubles at each step!!!
So for every even index, we compute the jump lengths in reverse until the jump length is 1 => that is the first "merge" made, and half of that is the answer.
948D: Perfect Security
Use a trie to optimize the O(n^2) solution (sketch below).
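A sketch of the binary trie (30-bit values, multiplicity counts; extractClosest assumes the trie is non-empty):

```java
// Binary trie over fixed-width values: insert all key values, then for each
// query greedily walk toward the matching bit to minimize the XOR, deleting
// the chosen key along the way.
public class MinXorTrie {
    static final int BITS = 30;
    int[][] child = new int[2][1];     // child[bit][node]; grown on demand
    int[] count = new int[1];          // how many stored values pass each node
    int nodes = 1;

    void insert(int value) {
        int cur = 0;
        for (int b = BITS - 1; b >= 0; b--) {
            int bit = value >> b & 1;
            if (child[bit][cur] == 0) {
                int nn = newNode();    // allocate BEFORE storing (arrays may grow)
                child[bit][cur] = nn;
            }
            cur = child[bit][cur];
            count[cur]++;
        }
    }

    // Returns the stored value whose XOR with `value` is minimal, removing it.
    int extractClosest(int value) {
        int cur = 0, res = 0;
        for (int b = BITS - 1; b >= 0; b--) {
            int bit = value >> b & 1;  // prefer the matching bit (XOR contribution 0)
            int next = child[bit][cur];
            if (next == 0 || count[next] == 0) { bit ^= 1; next = child[bit][cur]; }
            res |= bit << b;
            count[next]--;
            cur = next;
        }
        return res;
    }

    int newNode() {
        if (nodes == count.length) {
            count = java.util.Arrays.copyOf(count, nodes * 2);
            child[0] = java.util.Arrays.copyOf(child[0], nodes * 2);
            child[1] = java.util.Arrays.copyOf(child[1], nodes * 2);
        }
        return nodes++;
    }

    public static void main(String[] args) {
        MinXorTrie t = new MinXorTrie();
        for (int key : new int[]{3, 7, 5}) t.insert(key);
        System.out.println(8 ^ t.extractClosest(8)); // min XOR over keys: prints 11
    }
}
```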
Vault DR Drills
Drill 1: A couple of vault instances down
- Log into each downed instance and unseal the vault.
- Make sure we can still read values from vault via the command line/curl.
- Check consul and make sure the vault instances are back online.
Drill 2: The whole vault cluster is down and unable to restart
- Retrieve the last stable AMI, or rerun packer to generate one.
- Use tf to generate the new vault ASG, but with the same consul tags as the previous one.
- Log into each instance in the new cluster, unseal the instance, and verify that the instance is healthy on consul.
- Use tf to destroy the old vault cluster's ASG.
Drill 3: Restore vault data stored on consul
- Take a snapshot of the current consul cluster.
- Create a new consul cluster with new tags.
- Restore the consul cluster from the snapshot, and deploy a new vault cluster on top of the new consul cluster.
- Unseal the vault with the original keys and verify that reads succeed.
Internal Services on HTTPS
HTTPS requires an SSL/TLS certificate, which in turn requires a domain name for your internal service.
Two options
- Use a .local domain, e.g., our vault is on vault.service.local. Note that .local is a TLD reserved by internet standards (for mDNS).
- Use a private subdomain of a public domain, e.g., vault.internal.mycompany.com.
Based on my research, there is no consensus on which way is preferred. The major benefit of the second approach is that you can sign the certificate with a public CA, whereas in the first case you will need your own CA to sign it.
In either case, this means you need to set up DNS resolution, e.g., via dnsmasq and an internal DNS resolver, so that domains with these special prefixes will be resolved by your company's internal DNS.
If you use the first approach, make sure you add the CA cert to the system SSL store, NSS (which curl uses), and the Java keytool.
Another catch is when you want to place a (most likely internal) ELB in front of your internal service. The ELB needs to be a TCP ELB, so that it won't try to terminate the SSL connection.