Codeforces notes

2017-07-18 00:00:00 +0000

828C: String Reconstruction(!!!)

A brute force way will timeout. What can we do to remove duplicate operations?

My idea

So we need to track of continuous intervals, and skip them if it is already filled.

We can keep each segment’s previous empty and next empty index, with path compression to help run time. However, this gives TLE?!

Official solution

Sort strings by index, and keep track of the index before which all has been seen. At each step, either increase index and fill partial strings, or ignore the current string because it is within the processed prefix

831C: Jury Marks

Calculate prefix sum of scores, and sort both deltas. We know the min value of B must match [0, k - n] of the delta array, so we can just brute froce on each potential match to see if it is feasible to backtrack, calculate the backtrack and add the calculated result to the result set

Note that once we fix ONE mapping between b(i) and a(j), the whole sequence becomes deterministic, i.e., the answer <= k - n + 1

Weekend Contests

2017-06-26 00:00:00 +0000

agc076b: Built?

A naive MST approach is too slow, namely, there are too many potential edges. Now that we are interested in an optimal solution, can we use some greedy insight to eliminate edges that are not actually required? This is where I got stuck!!!

What I should have done is to experiment simpler cases, and see if there is any edge that I know will never appear in the final solution.

Consider 3 points, a, b, c, with xa < xb < xc, we know the edge from xa to xc will never appear in the final solution, because we can just keep xa-xb, and xb-xc in the final MST anyway. Therefore, we just need to the add the edge between its immediate neighbors, and reduce number of edges to O(n)

821D: Okabe and City

Easy to see that we can model it as a shortest path problem. But the problem is how to reduce the number of edges in the naive approach. Similar to the atCoder problem, I am stuck proving/reducing number of edges!!!

Insights

  1. In the final solution, we know we will light each row and column at most once. Otherwise, there is cycle in our path

  2. This means we will visit each node in 3 ways ``` a. by adjacent cells

b. by lighting the row above or below or same

c. by lighting the col above or below or same

```

  1. Therefore, during BFS traversal, we need to update all 3 cases, because of insight 1), in case b and c , each cell gets updates once max each (If we use a pq to dequeue next to visit, we know the first time we light the row/column we absolutely need it, and this by 1, is the first and only time we light it in the solution) . Overall cost linear to k

  2. To detect if we can reach the destination, we need to check if the node itself is reachable, or anything from the last and second last row/col is reachable!!!

agc016c: +/- Rectangle

2017-06-22 00:00:00 +0000

Key insight I missed!!!

If sum of any h by w rectangle is negative, how come the sum as a whole is positve?

Consider the 1 row case, it becomes clear that if such thing can happen only if W % w != 0. Otherwise, the sum of the whole row can be broken down the the sum of W/w non-intersecting subrows => but the sum of all these negative numbers can not be postivie.

By extension, in H by W case, as long as we can’t divide cleanly into h by w squares, we can fulfill such need. We can prove by constructing one solution for all such cases.

Construction

We will assume sum of each h by w rectangel is -1. Then total number of complete squares ncs= W/w * H/h. Number of padding cells npc = H * W - ncs * h * w.

So we can fill each cell by value v = ncs/npc + 1, when i % h != h-1 || j % w != w- 1 Otherwise, we fill the cell by value -v * (h * w - 1) - 1

817D: Imbalanced Array

2017-06-15 00:00:00 +0000

Why I didn’t solve it

I got the idea, but I had problem handling the tiebreaker case!!!, i.e., what if within the same segment, there are multiple minival values, which one to pick?

To handle the tie breaker, we just calculate for each value, how many values this value is the LEFTMOST smallest value, i.e., to handle tie breaker, we introduce a secondary condition that helps us identify.

Similarly, for the rightside, we just use a silghtly different procecure to calculate the first number to the right that is greater than the current one

Codeforces Notes

2017-06-07 00:00:00 +0000

703D: Mishka and Interesting sum (!!!)

Insights

1. answer to query = XOR of all elements XOR distinct elements

2. To answer XOR of distinct elements => we need to solve "find # of distinct values in a segment", need to help of Fenwicks Tree

3. Need to process queries in sorted by ending index way, so that we update each entry in the segment tree only once

4. Final runtime is O(nlogn)

144D: Missile Silos

A modification of dijkstra’s algorithm, try adding to the answer when we are expanding

679B: Bear and Tower of Cubes(!!!)

Consider the upper bound m, the higher upper bound the better result is

Assume the first block we pick is max a s.t. a^3 <= m

Then answer is 1 + num(m - a^3)

If the first block we pick is (a-1), the then answer is 1 + num(a^3 - 1 - (a-1)^3)

by the same logic, we can see the only possible choice that yield the best solution is either block with side a or a-1, because if we pick a-2, the answer will be 1 + num((a-1)^3 - (a -2)^3), which can not be better do the greedy propety of the upper bound m

During implementation, when we are calcuate n-1 cases, we need to take the cube too, because otherwise, the two branch will remain on the same magnitute => linear solution!!!After taking the max-1 cube in the same step, the second branch shrinks quickly and will not pose problem to us

Educational Codeforces Round 22

2017-06-06 00:00:00 +0000

813B: The Golden Age

Need to take care of long long overflow. I use a log based solution to limit the max value for a and b. The official solution uses

while x <= L/ a
	x *= a;

and calculate the powers inside iteration

813D: Two Melodies(!!!)

Since we care only the last entry, we will keep track of dp[x][y] => last index of the each list

Obviously, we can enforce x > y due to symmetric rule,

The key is that to avoid intersecitons, we will update dp[x][y] only with dp[i][y], with i != y and i < x, because based on our dp definitely, we know that dp[i][y] gives no interseciton, and x has not been used by sequence ending at either i or y.

Conversely, if we update dp with dp[x][i], we can not guarantees that y has not been used by sequence ending at x yet.

Above proves that our scheme is sound. It is also complete because we cover all possible cases with a given y.

Another case to watch out for is the empty melody case, i.e., we need to decide when/where to start a new melody. To fix it, we change index from 0-based to 1-based, and mark dp[0][y] as the case with a single melody

then we start y from 0 to n, and then scan x from y + 1 to n need to know max of dp[k][y] s.t. a[i] % 7 = a[k] % 7 need to know max of dp[k][y] s.t. a[i] - 1 = a[k]

so 4 cases to consider when we update dp[i][y]

Codeforces Notes

2017-06-04 00:00:00 +0000

121B: Lucky Transformation

Note that when we are a 447 or 477 at the starting at the odd position, we run into an infinite case. Therefore, we just need to detect such infinite case.

61D: Eternal Victory

Notice it is a connected graph with n-1 edges, so it is a tree So we have 2 cases:

  1. cover all nodes and then return to the root
  2. cover all nodes and without returning to the node

so we can just DP on these 2 cases

487B: Strip

when we start from the left, and expanding, when we first reach the case where max - min > s, we will have to break current strip into two. To maintain max flexibilty for later choices, and satifies the min length requirement, we will just keep the last violating one

arc075E: Meaningful Mean

2017-06-03 00:00:00 +0000

Why I did not solve it

  1. I did realize that the problem can be transformed into, given number a[i], how many numbers a[j] < a[i] while j < i

  2. I also realize that we can use some sort of segment-tree DS to solve the problem above. But I didn’t have enough experience with such DS

Official answer

1. compress the values to 0 and N 

2. Use a fenwick tree, here we can calculate prefix sum, in this case, total number of elements < compress value,  quickly with update supported, and add answer to the final answer 

Codeforces Notes

2017-06-02 00:00:00 +0000

319B: Psychos in a Line (!!!)

This problem is not easy even though it appears so.

Idea 1

Observe that we definitely want to know something about the number that

1. j < i

2. a[j] > a[i]

3. j is the biggest about all that satifies 1 and 2

if a[i] < a[i - 1], then time[i] = 0;

if a[i] > a[i - 1], then

1. if time[i - 1] = -1, i.e., always alive, then time[i] = -1; 

2. if time[i - 1] >= 0, and there exists j that satifies the conditions we mentioned above,  time[i] = max(time [(j...i)] + 1. Note that this upper bound is achievable and thus is the optimal answer

3. Otherwise, time [i - 1] = -1, it is always alive;

To calculate j, we can use a stack to keep track of numbers, and popping until we reach the one that is greater than the current one, and then add the i onto the stack

Note that number of steps = time to kill + 1

Also note that, the answer for case 3 2 1 is 1 instead of 0!

Idea 2

Consider a brute force O(n^2) solution

1. For each turn, we maintin the state of the array, and we add new item one by one to all turns

2. If end of the list > current to add, we mark current dead at turn i + 1, and add it to all at turn 0 to i 

3. The final answer is the max of all kill times

Now we need to optimize it to O(n)

1. For space, obviously, we need to keep track of only the right most entry

2. For time, instead of iterating all entries, we just put it on the stack, and remember them top to bottom 

3. each interval got put on stack at most once.

359D: Pair of Numbers(!!!)

My idea

Assume they are all distinct Suppose we have one such collection, it must be of V shape in terms of values

So we start with each local minimal points, and try expanding, because of that v shape-thing, we know we cover each point at most once So we try calculating longest right-expanding sequence, and then reverse and calculate the longest left-expanding sequence, and then sum it up

Note the corner case where all values are same, we will have multiple entries

Official solution

Binarly search on the value of r-l, use a data structure to answer GCD(l,r) and min(l, r),e.g., a segment tree-ish DS

358D: Dima and Hares(!!!)

Brute force DP seems to be O(n^3). How can we improve that?

Notice that we have a lot of overcalculation here: place 1 and then 3 is same as placing 3 and then 1,i.e., we need to find a different angle.

Consider the leftmost entry. We have two choices

1. Fill 0 with 1 is empty => equivalent of totalCost(1...n, leftFilled) + cost(0, single)  
2. Fill 0 while 1 is filled already => equiavlent of totalCost(1..n, leftNotFiled) + cost(0, double)

For easier coding, we can reason the same thing but start with the rightmost

Codeforces Notes

2017-06-01 00:00:00 +0000

333B: Chips(!!!)

Assume there is no obstables, what is the maximal number we can add?

  1. Because of rule 2 and 3, we can not add chips to both ends

  2. Because of rule 2, given i, we can add to either row i or col i, but not both

  3. However, if we add to (i, 1), we can also add to (n, i), (n - i + 1, n), (1, n - i + 1)

  4. Therefore, the maximal possible is around 2 * n. Note that if n is even, the middle row is the special case.

  5. Also, need to try all from 2 to n -1, but need to consider the case where previously added chips blocks later additions!!!

115B: Lawnmower(!!!)

However, I had problem implementing the solution quickly and cleanly. The reason being my greedy algorithm is not concise enough.

Suppose we’re on a row, facing right. This strategy say that we need to move to the right as long as there is a weed to the right of us either on this row or on the row directly below us.

Note that this insight can handle the empty row cleanly. Also, we need to track of the last row to calculate how many move down operations we need

34D: Road Map

The old map represents a parent-child relationship easily. Therefore, we can construct the graph and do a simple tree traversal to update the new map

367B: Sereja ans Anagrams

Notice that p is fixed. We can just group entries by ps, and then do a slide window scan to match all possible starting indices

776D: The Door Problem

My WA approach(!!!)

Suppose there exists a solution, what property it has?

  1. If the door is closed : then t(d1) + t(d2) is even, i.e., they have the same oddity

  2. If the door is open: then t(d1) + t(d2) is odd, they have the same oddity

Note that each switch has only 2 choices, even or odd, any numbers can be reduced to this choice.

Conversely, if we can find an arrangement that satifies such odditiy requirment, then we know in the end all doors are open => i.e., we are good, and this condition is strong enough.

So what we can do

  1. In case 1, we union them
  2. For case 2, we union all it opponents
  3. In the end, we scan through all contradictions again, to make sure they are not unioned, i.e., no conflict

Official Approach

Since we care only the oddity, this means we can reduce to toggles to 2 cases, either 0 or 1 Model switch as nodes and door as edges, try coloring node from switch 1, and see if we can introduce any conflicts

Note that implementation wise, it is OK to have duplicates edges in our DS.Although need to consider the case of multiple components - I missed that case!!!

Codeforces Notes

2017-05-31 00:00:00 +0000

799D: Field expansion

I tried a greedy approach, but couldn’t prove it. This is often the sign a greedy approach won’t work.

Insight

  1. Obviously, we only care highest factors

  2. The key insight I missed is that given the input is only 10^6, number of factors <= 34, i.e., we can do a brute force dp approach!!!

maxV[i][h] = max w value when we are at factor i, with h value h

max[i][h] = MAX(max[i-1][h] * f[i], max[i-1][h/f[i]])

when h, w is greater than the bound, we will use a dummy value for that!

140D: New Year Contest

Obviously, we pick the quickest problems, as long as total sum <= 720

Claim: we should solve problems in the order of length, that gives the best solution

Proof: consider the the last different choice. If the finish the time is before midnight, we can just make changes, and won’t affect the overall result. But if the finish time of that block is after midnight, let us move the longest unassigned one after the different choice

penalt after <= oldPentalty - min(diffFinishTime, longestLen) + 0, + 0 because the end time is the same => i.e., swapping such means optimal solution

229B: Planets (!!!)

Idea similar to dijkstra, the only different is that upon expanding, see if we need to wait

I tried re-inserting entry but got TLE, so we just wait until we can expand to next edge.

The speical case is when we are at n already

Codeforces Notes

2017-05-30 00:00:00 +0000

811C: Vladik and Memorable Trip

Note the English transaltion of the problem is a bit misleading: the problem actually means for each city’s traveller, either all on the same segment, or none selected at all, i.e., impossible to have case where some people are going to city x while some won’t

This means that we will have the cascading effect, i.e, we need to merge all overlapping [l, r] segments until there is no overlapping one, if we ever want to pick one city in the overlapping segments

after that we can just do a brute force DP, for each r that is an end, look for all start l, can calculate the XOR, and update the DP result.

808D: Array Division

since all numbers are positive, we can calculate the prefix sum, and then for each number, we look for (half of total) + that number , see if there exists one such prefix sum.

807D: Dynamic Problem Scoring(!!!)

What I did wrong

I was trying to brute force search on the final band of problems, but realize that one new account will affect the band of ALL past problems bands!!!

Insights

  1. From a new account, implementation wise, we don’t need to submit wrong answers at all, since only the total account number matters

  2. For each problem, we either need increase its good rate or lower good rate

  3. Consider the case when we only need to increase the good rate. To minimize number of accounts created, each account should submit right to as many problems as possible, so as to improve the pass rate of multiple questions, e.g., we can do better with one account submit right to two problems than two accounts each submit to one??

  4. When we need to lower the good rate, the account just do nothing at that problem, and is optimal

  5. Therefore, we have an optimal scheme that satifies 3 and 4 at the same time, the key insight being because of 2, the two cases don’t interfere with each other

  6. Note that we can not binary search on answers, because we can not pile on new account when there is unsolved problems => which will only lower the rate of the problems we want to increase

  7. Worse case, we just need to increase n by 32 times, to dilute all problems to 500 or make it 3000, so its 32 * 5 * 120

794C: Naming Company

2017-05-29 00:00:00 +0000

Consider the final string will have 2/n chars from each person. Obviously, Oleg will take the smallest n/2 chars, and Igor biggest n/2 chars from their own sets, since we can improve the answer from each’s perspective otherwise.

When O is playing

if current O’s smallest < I’s largest, O will need to fill the left most unoccupied with its own smallest, otherwise I will fill it with largest => not optimal

if current O’s smallest >= I’s largest, O will need to fill the right most unoccupied with O’s largest

When I is playing

if current O’s smallest < I’s largest, I’s largest goes left most

if current O’s smallest >= I’s largest, I’s smallest goes to right most

What I did wrong

The tie breaker case, the player should not occupy the the leftmost position with their own char, instead, they should try to force the opponent to take that position, while preserve their best candidate for the next round!!!

809B: Glad to see you!

2017-05-22 00:00:00 +0000

Why I didn’t solve it

I realizee that I can use bsearch and detect the direction of points by asking (i, i+1). However, I got stuck at handling the answer to my query => i.e., what does the answer mean?

Insights

  1. (mid, mid+1) returns true => there must exist one point to the left of mid, including mid

  2. returns false => there must exist one point to the right of mid+1, include mid + 1

  3. So we can just do a bsearch, and we know there must exist an point in our range, and we can definitely find one => when r - l = 1

  4. Now that we repeat similar bsearch in [1, mid) and (mid + 1, n], we know that at least one points in the intervals for sure

  5. Note that to root out false positvies, we need to specifically check the second point’s validity by issuing another query!!!

  6. Need to handle the case of first return is 1 or n with special care!!!

arc073c: Ball Coloring

2017-05-20 00:00:00 +0000

Insights

  1. Obviously, the global max and min value will appear in the final formula.

  2. When global min is red, we need to paint all smaller value in each bag red to minize red max, this also anchors blue max to the global max and max the blue min => since it is the bigger of each bag already => i.e., it must be optimal

  3. From now on, we try checking all locally optimal solutions, classified the rank of red min, and the globally optimal solution would be the best of the locally optimal solutions.

  4. When the globally second smallest value is in the same bag as globally min value, we must paint global min to blue, which is exactly same case as 2)

  5. When the globally second smallest value is not in the same bag as the globaly min value, the optimal solution is exactly same as case 2, except that for the globally min pair, we paint global min as blue, and its counter part is red

  6. Therefore, we can check all locally optimal points one by one, by doing such flipping, and increase the possible rmin values. Notice that this impliies both the max value and min value will be blue in these classes

This turned out to be a very hard problem for me to solve. The second try gives WA too!!!

793C: Mice problem

2017-05-14 00:00:00 +0000

What I did wrong

  1. since the given precision is 1e-6, the epsilon should be 1e-12, because of the base offraction arithmetic is (1e-6)^2

  2. To calculate the range of valid time, i calculate the segment as a whole, instead of separating them into two dimesnions. This introduced additional float point calculation and introduced errors!!!

Idea

1. Mouses in ranges <=> each mouse is in range

2. Mouse in range <=> x is in x-axis range and y is in y-axis range. So we just need to search for a time range that satifies such condition.

3. Therefore, we just calculate the range it takes to stay in x-axis range and y-axis range, respectively, if there exists a common intersection, we know it is the answer, otherwise, it is impossible

4. When v < 0, we can just mirror the whole thing to the opposite sign => the result is still the same yet saves our coding

Reflections

Seems taht I keep having trouble with problems where we are searching for things that satifies both conditions at the same time. The main difficulty almost always seems that the two conditions interfere with each other.

Therefore, the key step is too process the two conditions in two steps, with each step handling one, without worrying about the interference of the other. Ideas include

1. search for first condition, when we are inside the first condition, we look for the second condition

2. Separate the construction, such that, each part handles one condition, and the final answer is a simple composition of parts, with proof that both parts can be reached at the same time

3. similar to 1, filter by the first conditon, and then filter by the second conditon, the result should should be an intersection of the two parts

As for this problem, we need to concisously transalte the orignal condition to two conditions and then handle the two in two steps. This way is easier to code, and in general often is the only way to solve the problem!

799C: Fountains

2017-05-12 00:00:00 +0000

When one is C and one is D, simple greedily choose the highest beatuty within limit for each type

When both fountains are of the same type, call them F1, F2, if we search for all F1, we need to answer the query, max beauty for all price < T - P(F1). Therefore, we can build such array while we scan through, moreoever, since maxB(v1) >= maxB(v2) whenever v1 > v2, we can bsearch on the weight or value.

Pseudo code

sort the (p, b) tuple by p
maxB(0) = 0
ans = 0;

for i in 1 to n
  maxB(i) = max(maxB(i - 1), PB(i).second) 
  if(PB(i).first < totalValue)
    ans = max(ans, PB(i).second + bsearch(0, i-1)) 

807C: Success Rate

2017-05-07 00:00:00 +0000

Why I didn’t solve it

I realize that the direct math insight might be too complicated, so I tried turning it into a search problem. Given the input range, I realize that it could be some sort of binary search, but I couldn’t model it into a clear cut range problem => the ranges are mixed. My later attempt at a math properties does not work either

Instead of focusing on p/q, we can introduce a varaible t and binary search on t from 1 to 1e9, i.e., consider the final y value = p * t, we know if t is a good solution, t + 1 must be a good solution too. To test t, we just need to calculate number of new correct submissions in the formula, the result should be within [0, p * t - y]

Idea 2: Maths

based on the insight the final target we reach must be of form p * t / q * t (t >= 1), we know dy = q * t - y, dx = p * t -x, with 0 <= dx <= dy

Therefore, t >= ceiling(x / p), t >= ceil((y - x) /(q - p)). For the formula we can obviously see the two corner cases: p = 0 and q = p

Notes on mutil-dc replication of kafka

2017-05-01 00:00:00 +0000

Based on https://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka

Put replicas directly across DCs

  1. on failed dc, the consumer will just switch to another replica.

  2. when dc recovers, the existing catch up mechanism will kick in

  3. If replicas are across region, latency will be high, plus demands a lot of cross dc bandwidth from replicator - hard for Kafka

Active-Passive

  1. consumer on either active or passive dc

  2. upon failure, the passive dc becomes the new active, consumer may need to switch too

  3. note that offset may not match, because of at-least-once semantics of producer

  4. normally for real time consumers, just start from the end and accept data loss

  5. When the dc comes back, need to MM back changes from the new active to the former active - hard to manage the DC offset problem again.

Active-Active

  1. each one has an active and an aggregate cluster

  2. On failure and recovery, no need to reconfigure MM

  3. To avoid aggregate cluster, we can preifx topics with DC tag,and confg MM to mirror remote topic only. and consumer need to sub to topics with both DC tags

How to make DB data available in all DCs

  1. active-active: same consumer concurrently in both DCs
  2. active-passive: only one consumer per dc at any given time

Ideally DB replication policy should be same as kafka cluster’s

Code Jam 2017 1C

2017-04-30 00:00:00 +0000

What I did right

1. I recognize that to advance to round 2, I have to solve B-large

2. When seeing a cycle, I realize that I should add the first item in the list to the end of list again, so that all segments are covered

3. I see the greedy insights 

What I did wrong

I was not able to prove the greedy insights!!!

Proof

There are 3 types of intervals, a C-J interval, a J-J interval, and C-C interval. Note that our target function is the sum of switches introduced in each intervals.

Obviously, for each C-J interval, we increase the final answer by one

For each J-J interval, we don’t have to increase final answer at all,if we can plug in the gap with the same interval when we have enough J time left. Otherwise, we will have to increase the number of swtiches by 2. Similar argument applies to C-C interval.

Therefore, our greedy solution can give the best value in each category at the SAME time (This is what I couldn’t convince myself!!!) => our overall solution is optimal.

Proof: When we have filled maximum number of J-J interval, but still some J-J interval there. Since each miunte is covered, we know C-C interval doesn’t contribute any to the final result, and we have miniized J-J’s contributions

When we have filled all J-J interval, with more Js remaining, it is the exactly same case as above, except

Now in retrospect, for greedy problems during contest, I should probably go directly for solution instead of spending all time trying to prove it.