817D: Imbalanced Array
Why I didn’t solve it
I got the idea, but I had problem handling the tiebreaker case!!!, i.e., what if within the same segment, there are multiple minival values, which one to pick?
To handle the tie breaker, we just calculate for each value, how many values this value is the LEFTMOST smallest value, i.e., to handle tie breaker, we introduce a secondary condition that helps us identify.
Similarly, for the rightside, we just use a silghtly different procecure to calculate the first number to the right that is greater than the current one
Codeforces Notes
703D: Mishka and Interesting sum (!!!)
Insights
1. answer to query = XOR of all elements XOR distinct elements
2. To answer XOR of distinct elements => we need to solve "find # of distinct values in a segment", need to help of Fenwicks Tree
3. Need to process queries in sorted by ending index way, so that we update each entry in the segment tree only once
4. Final runtime is O(nlogn)
144D: Missile Silos
A modification of dijkstra’s algorithm, try adding to the answer when we are expanding
679B: Bear and Tower of Cubes(!!!)
Consider the upper bound m, the higher upper bound the better result is
Assume the first block we pick is max a s.t. a^3 <= m
Then answer is 1 + num(m - a^3)
If the first block we pick is (a-1), the then answer is 1 + num(a^3 - 1 - (a-1)^3)
by the same logic, we can see the only possible choice that yield the best solution is either block with side a or a-1, because if we pick a-2, the answer will be 1 + num((a-1)^3 - (a -2)^3), which can not be better do the greedy propety of the upper bound m
During implementation, when we are calcuate n-1 cases, we need to take the cube too, because otherwise, the two branch will remain on the same magnitute => linear solution!!!After taking the max-1 cube in the same step, the second branch shrinks quickly and will not pose problem to us
Educational Codeforces Round 22
813B: The Golden Age
Need to take care of long long overflow. I use a log based solution to limit the max value for a and b. The official solution uses
while x <= L/ a
x *= a;
and calculate the powers inside iteration
813D: Two Melodies(!!!)
Since we care only the last entry, we will keep track of dp[x][y] => last index of the each list
Obviously, we can enforce x > y due to symmetric rule,
The key is that to avoid intersecitons, we will update dp[x][y] only with dp[i][y], with i != y and i < x, because based on our dp definitely, we know that dp[i][y] gives no interseciton, and x has not been used by sequence ending at either i or y.
Conversely, if we update dp with dp[x][i], we can not guarantees that y has not been used by sequence ending at x yet.
Above proves that our scheme is sound. It is also complete because we cover all possible cases with a given y.
Another case to watch out for is the empty melody case, i.e., we need to decide when/where to start a new melody. To fix it, we change index from 0-based to 1-based, and mark dp[0][y] as the case with a single melody
then we start y from 0 to n, and then scan x from y + 1 to n need to know max of dp[k][y] s.t. a[i] % 7 = a[k] % 7 need to know max of dp[k][y] s.t. a[i] - 1 = a[k]
so 4 cases to consider when we update dp[i][y]
Codeforces Notes
121B: Lucky Transformation
Note that when we are a 447 or 477 at the starting at the odd position, we run into an infinite case. Therefore, we just need to detect such infinite case.
61D: Eternal Victory
Notice it is a connected graph with n-1 edges, so it is a tree So we have 2 cases:
- cover all nodes and then return to the root
- cover all nodes and without returning to the node
so we can just DP on these 2 cases
487B: Strip
when we start from the left, and expanding, when we first reach the case where max - min > s, we will have to break current strip into two. To maintain max flexibilty for later choices, and satifies the min length requirement, we will just keep the last violating one
arc075E: Meaningful Mean
Why I did not solve it
-
I did realize that the problem can be transformed into, given number a[i], how many numbers a[j] < a[i] while j < i
-
I also realize that we can use some sort of segment-tree DS to solve the problem above. But I didn’t have enough experience with such DS
Official answer
1. compress the values to 0 and N
2. Use a fenwick tree, here we can calculate prefix sum, in this case, total number of elements < compress value, quickly with update supported, and add answer to the final answer
Codeforces Notes
319B: Psychos in a Line (!!!)
This problem is not easy even though it appears so.
Idea 1
Observe that we definitely want to know something about the number that
1. j < i
2. a[j] > a[i]
3. j is the biggest about all that satifies 1 and 2
if a[i] < a[i - 1], then time[i] = 0;
if a[i] > a[i - 1], then
1. if time[i - 1] = -1, i.e., always alive, then time[i] = -1;
2. if time[i - 1] >= 0, and there exists j that satifies the conditions we mentioned above, time[i] = max(time [(j...i)] + 1. Note that this upper bound is achievable and thus is the optimal answer
3. Otherwise, time [i - 1] = -1, it is always alive;
To calculate j, we can use a stack to keep track of numbers, and popping until we reach the one that is greater than the current one, and then add the i onto the stack
Note that number of steps = time to kill + 1
Also note that, the answer for case 3 2 1 is 1 instead of 0!
Idea 2
Consider a brute force O(n^2) solution
1. For each turn, we maintin the state of the array, and we add new item one by one to all turns
2. If end of the list > current to add, we mark current dead at turn i + 1, and add it to all at turn 0 to i
3. The final answer is the max of all kill times
Now we need to optimize it to O(n)
1. For space, obviously, we need to keep track of only the right most entry
2. For time, instead of iterating all entries, we just put it on the stack, and remember them top to bottom
3. each interval got put on stack at most once.
359D: Pair of Numbers(!!!)
My idea
Assume they are all distinct Suppose we have one such collection, it must be of V shape in terms of values
So we start with each local minimal points, and try expanding, because of that v shape-thing, we know we cover each point at most once So we try calculating longest right-expanding sequence, and then reverse and calculate the longest left-expanding sequence, and then sum it up
Note the corner case where all values are same, we will have multiple entries
Official solution
Binarly search on the value of r-l, use a data structure to answer GCD(l,r) and min(l, r),e.g., a segment tree-ish DS
358D: Dima and Hares(!!!)
Brute force DP seems to be O(n^3). How can we improve that?
Notice that we have a lot of overcalculation here: place 1 and then 3 is same as placing 3 and then 1,i.e., we need to find a different angle.
Consider the leftmost entry. We have two choices
1. Fill 0 with 1 is empty => equivalent of totalCost(1...n, leftFilled) + cost(0, single)
2. Fill 0 while 1 is filled already => equiavlent of totalCost(1..n, leftNotFiled) + cost(0, double)
For easier coding, we can reason the same thing but start with the rightmost
Codeforces Notes
333B: Chips(!!!)
Assume there is no obstables, what is the maximal number we can add?
-
Because of rule 2 and 3, we can not add chips to both ends
-
Because of rule 2, given i, we can add to either row i or col i, but not both
-
However, if we add to (i, 1), we can also add to (n, i), (n - i + 1, n), (1, n - i + 1)
-
Therefore, the maximal possible is around 2 * n. Note that if n is even, the middle row is the special case.
-
Also, need to try all from 2 to n -1, but need to consider the case where previously added chips blocks later additions!!!
115B: Lawnmower(!!!)
However, I had problem implementing the solution quickly and cleanly. The reason being my greedy algorithm is not concise enough.
Suppose we’re on a row, facing right. This strategy say that we need to move to the right as long as there is a weed to the right of us either on this row or on the row directly below us.
Note that this insight can handle the empty row cleanly. Also, we need to track of the last row to calculate how many move down operations we need
34D: Road Map
The old map represents a parent-child relationship easily. Therefore, we can construct the graph and do a simple tree traversal to update the new map
367B: Sereja ans Anagrams
Notice that p is fixed. We can just group entries by ps, and then do a slide window scan to match all possible starting indices
776D: The Door Problem
My WA approach(!!!)
Suppose there exists a solution, what property it has?
-
If the door is closed : then t(d1) + t(d2) is even, i.e., they have the same oddity
-
If the door is open: then t(d1) + t(d2) is odd, they have the same oddity
Note that each switch has only 2 choices, even or odd, any numbers can be reduced to this choice.
Conversely, if we can find an arrangement that satifies such odditiy requirment, then we know in the end all doors are open => i.e., we are good, and this condition is strong enough.
So what we can do
- In case 1, we union them
- For case 2, we union all it opponents
- In the end, we scan through all contradictions again, to make sure they are not unioned, i.e., no conflict
Official Approach
Since we care only the oddity, this means we can reduce to toggles to 2 cases, either 0 or 1 Model switch as nodes and door as edges, try coloring node from switch 1, and see if we can introduce any conflicts
Note that implementation wise, it is OK to have duplicates edges in our DS.Although need to consider the case of multiple components - I missed that case!!!
Codeforces Notes
799D: Field expansion
I tried a greedy approach, but couldn’t prove it. This is often the sign a greedy approach won’t work.
Insight
-
Obviously, we only care highest factors
-
The key insight I missed is that given the input is only 10^6, number of factors <= 34, i.e., we can do a brute force dp approach!!!
maxV[i][h] = max w value when we are at factor i, with h value h
max[i][h] = MAX(max[i-1][h] * f[i], max[i-1][h/f[i]])
when h, w is greater than the bound, we will use a dummy value for that!
140D: New Year Contest
Obviously, we pick the quickest problems, as long as total sum <= 720
Claim: we should solve problems in the order of length, that gives the best solution
Proof: consider the the last different choice. If the finish the time is before midnight, we can just make changes, and won’t affect the overall result. But if the finish time of that block is after midnight, let us move the longest unassigned one after the different choice
penalt after <= oldPentalty - min(diffFinishTime, longestLen) + 0, + 0 because the end time is the same => i.e., swapping such means optimal solution
229B: Planets (!!!)
Idea similar to dijkstra, the only different is that upon expanding, see if we need to wait
I tried re-inserting entry but got TLE, so we just wait until we can expand to next edge.
The speical case is when we are at n already
Codeforces Notes
811C: Vladik and Memorable Trip
Note the English transaltion of the problem is a bit misleading: the problem actually means for each city’s traveller, either all on the same segment, or none selected at all, i.e., impossible to have case where some people are going to city x while some won’t
This means that we will have the cascading effect, i.e, we need to merge all overlapping [l, r] segments until there is no overlapping one, if we ever want to pick one city in the overlapping segments
after that we can just do a brute force DP, for each r that is an end, look for all start l, can calculate the XOR, and update the DP result.
808D: Array Division
since all numbers are positive, we can calculate the prefix sum, and then for each number, we look for (half of total) + that number , see if there exists one such prefix sum.
807D: Dynamic Problem Scoring(!!!)
What I did wrong
I was trying to brute force search on the final band of problems, but realize that one new account will affect the band of ALL past problems bands!!!
Insights
-
From a new account, implementation wise, we don’t need to submit wrong answers at all, since only the total account number matters
-
For each problem, we either need increase its good rate or lower good rate
-
Consider the case when we only need to increase the good rate. To minimize number of accounts created, each account should submit right to as many problems as possible, so as to improve the pass rate of multiple questions, e.g., we can do better with one account submit right to two problems than two accounts each submit to one??
-
When we need to lower the good rate, the account just do nothing at that problem, and is optimal
-
Therefore, we have an optimal scheme that satifies 3 and 4 at the same time, the key insight being because of 2, the two cases don’t interfere with each other
-
Note that we can not binary search on answers, because we can not pile on new account when there is unsolved problems => which will only lower the rate of the problems we want to increase
-
Worse case, we just need to increase n by 32 times, to dilute all problems to 500 or make it 3000, so its 32 * 5 * 120
794C: Naming Company
Consider the final string will have 2/n chars from each person. Obviously, Oleg will take the smallest n/2 chars, and Igor biggest n/2 chars from their own sets, since we can improve the answer from each’s perspective otherwise.
When O is playing
if current O’s smallest < I’s largest, O will need to fill the left most unoccupied with its own smallest, otherwise I will fill it with largest => not optimal
if current O’s smallest >= I’s largest, O will need to fill the right most unoccupied with O’s largest
When I is playing
if current O’s smallest < I’s largest, I’s largest goes left most
if current O’s smallest >= I’s largest, I’s smallest goes to right most
What I did wrong
The tie breaker case, the player should not occupy the the leftmost position with their own char, instead, they should try to force the opponent to take that position, while preserve their best candidate for the next round!!!
809B: Glad to see you!
Why I didn’t solve it
I realizee that I can use bsearch and detect the direction of points by asking (i, i+1). However, I got stuck at handling the answer to my query => i.e., what does the answer mean?
Insights
-
(mid, mid+1) returns true => there must exist one point to the left of mid, including mid
-
returns false => there must exist one point to the right of mid+1, include mid + 1
-
So we can just do a bsearch, and we know there must exist an point in our range, and we can definitely find one => when r - l = 1
-
Now that we repeat similar bsearch in [1, mid) and (mid + 1, n], we know that at least one points in the intervals for sure
-
Note that to root out false positvies, we need to specifically check the second point’s validity by issuing another query!!!
-
Need to handle the case of first return is 1 or n with special care!!!
arc073c: Ball Coloring
Insights
-
Obviously, the global max and min value will appear in the final formula.
-
When global min is red, we need to paint all smaller value in each bag red to minize red max, this also anchors blue max to the global max and max the blue min => since it is the bigger of each bag already => i.e., it must be optimal
-
From now on, we try checking all locally optimal solutions, classified the rank of red min, and the globally optimal solution would be the best of the locally optimal solutions.
-
When the globally second smallest value is in the same bag as globally min value, we must paint global min to blue, which is exactly same case as 2)
-
When the globally second smallest value is not in the same bag as the globaly min value, the optimal solution is exactly same as case 2, except that for the globally min pair, we paint global min as blue, and its counter part is red
-
Therefore, we can check all locally optimal points one by one, by doing such flipping, and increase the possible rmin values. Notice that this impliies both the max value and min value will be blue in these classes
This turned out to be a very hard problem for me to solve. The second try gives WA too!!!
793C: Mice problem
What I did wrong
-
since the given precision is 1e-6, the epsilon should be 1e-12, because of the base offraction arithmetic is (1e-6)^2
-
To calculate the range of valid time, i calculate the segment as a whole, instead of separating them into two dimesnions. This introduced additional float point calculation and introduced errors!!!
Idea
1. Mouses in ranges <=> each mouse is in range
2. Mouse in range <=> x is in x-axis range and y is in y-axis range. So we just need to search for a time range that satifies such condition.
3. Therefore, we just calculate the range it takes to stay in x-axis range and y-axis range, respectively, if there exists a common intersection, we know it is the answer, otherwise, it is impossible
4. When v < 0, we can just mirror the whole thing to the opposite sign => the result is still the same yet saves our coding
Reflections
Seems taht I keep having trouble with problems where we are searching for things that satifies both conditions at the same time. The main difficulty almost always seems that the two conditions interfere with each other.
Therefore, the key step is too process the two conditions in two steps, with each step handling one, without worrying about the interference of the other. Ideas include
1. search for first condition, when we are inside the first condition, we look for the second condition
2. Separate the construction, such that, each part handles one condition, and the final answer is a simple composition of parts, with proof that both parts can be reached at the same time
3. similar to 1, filter by the first conditon, and then filter by the second conditon, the result should should be an intersection of the two parts
As for this problem, we need to concisously transalte the orignal condition to two conditions and then handle the two in two steps. This way is easier to code, and in general often is the only way to solve the problem!
799C: Fountains
When one is C and one is D, simple greedily choose the highest beatuty within limit for each type
When both fountains are of the same type, call them F1, F2, if we search for all F1, we need to answer the query, max beauty for all price < T - P(F1). Therefore, we can build such array while we scan through, moreoever, since maxB(v1) >= maxB(v2) whenever v1 > v2, we can bsearch on the weight or value.
Pseudo code
sort the (p, b) tuple by p
maxB(0) = 0
ans = 0;
for i in 1 to n
maxB(i) = max(maxB(i - 1), PB(i).second)
if(PB(i).first < totalValue)
ans = max(ans, PB(i).second + bsearch(0, i-1))
807C: Success Rate
Why I didn’t solve it
I realize that the direct math insight might be too complicated, so I tried turning it into a search problem. Given the input range, I realize that it could be some sort of binary search, but I couldn’t model it into a clear cut range problem => the ranges are mixed. My later attempt at a math properties does not work either
Idea 1: Binary search
Instead of focusing on p/q, we can introduce a varaible t and binary search on t from 1 to 1e9, i.e., consider the final y value = p * t, we know if t is a good solution, t + 1 must be a good solution too. To test t, we just need to calculate number of new correct submissions in the formula, the result should be within [0, p * t - y]
Idea 2: Maths
based on the insight the final target we reach must be of form p * t / q * t (t >= 1), we know dy = q * t - y, dx = p * t -x, with 0 <= dx <= dy
Therefore, t >= ceiling(x / p), t >= ceil((y - x) /(q - p)). For the formula we can obviously see the two corner cases: p = 0 and q = p
Notes on mutil-dc replication of kafka
Based on https://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka
Put replicas directly across DCs
-
on failed dc, the consumer will just switch to another replica.
-
when dc recovers, the existing catch up mechanism will kick in
-
If replicas are across region, latency will be high, plus demands a lot of cross dc bandwidth from replicator - hard for Kafka
Active-Passive
-
consumer on either active or passive dc
-
upon failure, the passive dc becomes the new active, consumer may need to switch too
-
note that offset may not match, because of at-least-once semantics of producer
-
normally for real time consumers, just start from the end and accept data loss
-
When the dc comes back, need to MM back changes from the new active to the former active - hard to manage the DC offset problem again.
Active-Active
-
each one has an active and an aggregate cluster
-
On failure and recovery, no need to reconfigure MM
-
To avoid aggregate cluster, we can preifx topics with DC tag,and confg MM to mirror remote topic only. and consumer need to sub to topics with both DC tags
How to make DB data available in all DCs
- active-active: same consumer concurrently in both DCs
- active-passive: only one consumer per dc at any given time
Ideally DB replication policy should be same as kafka cluster’s
Code Jam 2017 1C
What I did right
1. I recognize that to advance to round 2, I have to solve B-large
2. When seeing a cycle, I realize that I should add the first item in the list to the end of list again, so that all segments are covered
3. I see the greedy insights
What I did wrong
I was not able to prove the greedy insights!!!
Proof
There are 3 types of intervals, a C-J interval, a J-J interval, and C-C interval. Note that our target function is the sum of switches introduced in each intervals.
Obviously, for each C-J interval, we increase the final answer by one
For each J-J interval, we don’t have to increase final answer at all,if we can plug in the gap with the same interval when we have enough J time left. Otherwise, we will have to increase the number of swtiches by 2. Similar argument applies to C-C interval.
Therefore, our greedy solution can give the best value in each category at the SAME time (This is what I couldn’t convince myself!!!) => our overall solution is optimal.
Proof: When we have filled maximum number of J-J interval, but still some J-J interval there. Since each miunte is covered, we know C-C interval doesn’t contribute any to the final result, and we have miniized J-J’s contributions
When we have filled all J-J interval, with more Js remaining, it is the exactly same case as above, except
Now in retrospect, for greedy problems during contest, I should probably go directly for solution instead of spending all time trying to prove it.
KIP-98: Exactly once processing
Sequence of actions
-
producer ask the broker to discover TC
-
producer ask the TC for a producer id (PID), and bump the PID version number/epoch to avoid zombie PID => synchronous Note: this PID/epoch is to identify the producer, so that TC can keep track of the current sequence number from the producer => necessary to implement idempotent semantics
-
rollback or forward any transactions left incomplete by the previous instance of the producer => synchrounous
-
Producer marks locally the transaciton has begun, and for TC it doesn’t begin until the first record is sent
-
producer add topicPartiion to the transaction on TC, so that we can mark commit or abort marker on each transaction Like most states in Kafka, TC state is backed by a log
-
producer produce to broker with PID, epoch, and sequence number Here broker will accept only if the sequence number = current stored + 1, if seqence # > current stored + 1, that makes fatal error. We need the PID so that later consumer can decide if messages from that PID can be materialized.
-
producer send Map<TopicPartitions, OffsetAndMetadata> and a groupId to TC of course this will be logged in the transaction log
-
producer will send a TxnOffsetCommitRequest to the consumer coordinator to persist the offsets in the __consumer-offsets topic
-
producer issues an EndTxnRequest to TC, with additional data indicating whether the transaction is to be committed or aborted
-
TC writes a PREPARE_COMMIT or PREPARE_ABORT message to the transaction log. Notice here the steps are similar to commit step in 1PC
-
TC begins the process of writing the command messages known as COMMIT (or ABORT) markers to the leader of each TopicPartition which is part of the transcation
-
Each broker will write a COMMIT(PID) or ABORT(PID) to the log, this message shows that whether the messages with the given PID must be delivered to the user or dropped. Cosumer will buffer messages which have a PID until it reads a corresponding COMMIT or ABORT message until it sees such message. Speical case: __consumer-offsets being one of the Topic Partitions in the transaction
13.After all the commit or abort markers are written the data logs, the TC writes the final COMMITTED or ABORTED message to the transaction log,
Code Jam 2017 1A
A: Alphabet Cake
The key mistake I made during contest is that I didn’t notice the condition “no initial appears more than once on the cake.”!!!! Idea 1: Recursion ———-
1. If there is only 1 letter, we can fill the whole square with that letter
2. If there is more than 1 letter, we can make a cut repeatedly until we reach that base case, but how to make a cut?
Idea 2: Greedy
1. for each row, expand to all existing letters to its right
2. for each row, expand to all existing letters to its left
3. the remaining rows must be completely emtpy, we just run two copy passes to ensure they are copied, one from top to bottom, and then one
from bottom to top
B: Ratatouille
The intuition is easy, but how to prove it?!!! Goal: we try to find the optimal solution which also has the minimum number of total servings
So we just sort the ranges by upper bound, and start with the the smallest possible servings, if we can form a kit, go, otherwise, discard all ranges that is not enough to make a kit
Proof: Suppose we have an optimal solution s(1)…s(2)…s(n)..s(n+1) and our solution is o(1)….o(n), while sum of s is smallest among all possible optimal solutions, and s picks lowest upper bound whenever possble for each ingredient
-
By induction, we can prove that, when s(1)…s(i) = o(1)…o(n), we pick the same solution
-
consider the first s(i) s.t. s(i) < o(i), this means when we are trying forming s(i), some of the ranges are picked by previous choices already, but it is not picked in S before i => contradiciton
-
consider the case s(n+1) does not exist, the argument is similar
Codeforces Notes
432D: Prefixes and Suffixes
-
If we have a prefix that matches a suffix, then we have z[i] = n - i
-
To know how many times prefix of length n - i appeared in the substring, we just need to look for number of indecies s.t. z[index] >= n - i
-
Therefore, we first run z, and then calculate the prefix sum of the inverse array, the answer will be ps[n] - ps[n - i - 1]
-
Note that in z-algorithm, z[0] = 0, in this case , we need to hard set z[0] = n, so that we don’t ignore the whole string case
535D: Tavas and Malekas
-
If there is no overlap between the end of previous match and start of the next match, we can fill in whatever we want, because what we get is a subsequence, so extra match is OK
-
If there is overlap between the indices, we need to make sure for the overlapped region (next start, previous end), the prefix is same as suffix, Otherwise, the result is invalid
772D: Volatile Kite
During the contest, I did see the triangle case is enough to infer the insight, and moving the 3 points in a certain direction. But instead of moving them to remove an angle, I tried let 2 points meet first, which proved to be too greedy
If we just want to reduce an angle to a straight line, once we draw the triangle case, it becomes obvious the radius should be half the distance between B and AC, where A, B, C are in clockwise order