KIP-98: Exactly once processing

2017-04-22 00:00:00 +0000

Sequence of actions

  1. producer ask the broker to discover TC

  2. producer ask the TC for a producer id (PID), and bump the PID version number/epoch to avoid zombie PID => synchronous Note: this PID/epoch is to identify the producer, so that TC can keep track of the current sequence number from the producer => necessary to implement idempotent semantics

  3. rollback or forward any transactions left incomplete by the previous instance of the producer => synchrounous

  4. Producer marks locally the transaciton has begun, and for TC it doesn’t begin until the first record is sent

  5. producer add topicPartiion to the transaction on TC, so that we can mark commit or abort marker on each transaction Like most states in Kafka, TC state is backed by a log

  6. producer produce to broker with PID, epoch, and sequence number Here broker will accept only if the sequence number = current stored + 1, if seqence # > current stored + 1, that makes fatal error. We need the PID so that later consumer can decide if messages from that PID can be materialized.

  7. producer send Map<TopicPartitions, OffsetAndMetadata> and a groupId to TC of course this will be logged in the transaction log

  8. producer will send a TxnOffsetCommitRequest to the consumer coordinator to persist the offsets in the __consumer-offsets topic

  9. producer issues an EndTxnRequest to TC, with additional data indicating whether the transaction is to be committed or aborted

  10. TC writes a PREPARE_COMMIT or PREPARE_ABORT message to the transaction log. Notice here the steps are similar to commit step in 1PC

  11. TC begins the process of writing the command messages known as COMMIT (or ABORT) markers to the leader of each TopicPartition which is part of the transcation

  12. Each broker will write a COMMIT(PID) or ABORT(PID) to the log, this message shows that whether the messages with the given PID must be delivered to the user or dropped. Cosumer will buffer messages which have a PID until it reads a corresponding COMMIT or ABORT message until it sees such message. Speical case: __consumer-offsets being one of the Topic Partitions in the transaction

13.After all the commit or abort markers are written the data logs, the TC writes the final COMMITTED or ABORTED message to the transaction log,

Code Jam 2017 1A

2017-04-20 00:00:00 +0000

A: Alphabet Cake

The key mistake I made during contest is that I didn’t notice the condition “no initial appears more than once on the cake.”!!!! Idea 1: Recursion ———-

1. If there is only 1 letter, we can fill the whole square with that letter

2. If there is more than 1 letter, we can make a cut repeatedly until we reach that base case, but how to make a cut?

Idea 2: Greedy

1. for each row, expand to all existing letters to its right

2. for each row, expand to all existing letters to its left

3. the remaining rows must be completely emtpy, we just run two copy passes to ensure they are copied, one from top to bottom, and then one
   from bottom to top

B: Ratatouille

The intuition is easy, but how to prove it?!!! Goal: we try to find the optimal solution which also has the minimum number of total servings

So we just sort the ranges by upper bound, and start with the the smallest possible servings, if we can form a kit, go, otherwise, discard all ranges that is not enough to make a kit

Proof: Suppose we have an optimal solution s(1)…s(2)…s(n)..s(n+1) and our solution is o(1)….o(n), while sum of s is smallest among all possible optimal solutions, and s picks lowest upper bound whenever possble for each ingredient

  1. By induction, we can prove that, when s(1)…s(i) = o(1)…o(n), we pick the same solution

  2. consider the first s(i) s.t. s(i) < o(i), this means when we are trying forming s(i), some of the ranges are picked by previous choices already, but it is not picked in S before i => contradiciton

  3. consider the case s(n+1) does not exist, the argument is similar

Codeforces Notes

2017-04-16 00:00:00 +0000

432D: Prefixes and Suffixes

  1. If we have a prefix that matches a suffix, then we have z[i] = n - i

  2. To know how many times prefix of length n - i appeared in the substring, we just need to look for number of indecies s.t. z[index] >= n - i

  3. Therefore, we first run z, and then calculate the prefix sum of the inverse array, the answer will be ps[n] - ps[n - i - 1]

  4. Note that in z-algorithm, z[0] = 0, in this case , we need to hard set z[0] = n, so that we don’t ignore the whole string case

535D: Tavas and Malekas

  1. If there is no overlap between the end of previous match and start of the next match, we can fill in whatever we want, because what we get is a subsequence, so extra match is OK

  2. If there is overlap between the indices, we need to make sure for the overlapped region (next start, previous end), the prefix is same as suffix, Otherwise, the result is invalid

772D: Volatile Kite

During the contest, I did see the triangle case is enough to infer the insight, and moving the 3 points in a certain direction. But instead of moving them to remove an angle, I tried let 2 points meet first, which proved to be too greedy

If we just want to reduce an angle to a straight line, once we draw the triangle case, it becomes obvious the radius should be half the distance between B and AC, where A, B, C are in clockwise order

471D: MUH and Cube Walls

2017-04-15 00:00:00 +0000

The key insight here is to introduce a difference array, i.e., the different between previous and next entry.

Obviously, if the pattern exists in the wall, then the same range of in the difference pattern match too.

Claim: if we can find the difference array A in the B’s difference array, then in the same range in the orignal arrays, we can find the pattern in the wall by shifting pattern up or down as a whole Proof: when we locate such pattern match in the difference array, we can then add a delta to make the first element match, and then because each of the following diff entry matches, delta + prefixSum[i] match for all entries in the pattern range => this means the actual shift matches

Therefore, we convert both to the diff array, contactiate 2, with a speical char separate the concact, and then run Z-algorithm, and we look for all i where z[i] = length(p) - 1. -1 because for this diff array we start with a[1] - a[0], a[0] is completely free so that we can shift it up or down!!!

796D: Police Stations

2017-04-11 00:00:00 +0000

Insights

1. Consider if a node u can be reached from two police nodes, v and w, with distance d(v) and d(w), assume d(v) <= d(w), then we know the path w....u can be broken up. More specifically, the last edge can be deleted, since we know it is redundant for sure,i.e., in an optimal solution, we know for sure that such d(v) <= d(w) case won't happen, i.e., each node is cover by exactly one police node. 

2.At the same time, it becomes obvious that when such condition suffices, this solution is optimal => otherwise the requires each node is covered is no longer true!

3. Therefore, whenever we can discover a node covered by 2 police nodes, we can delete at least 1 edge, until we reach the optimal solution

But how to proceed from here?!!!

Instead of trying delete edges, we can add edges from police nodes!

Therefore, we start from each police node, and expand them until we cover the whole tree. Note that because we want to choose d(v) instead of d(w), we have to expand the police nodes in a breadth first way, so that every time we reach a new node, we know it is the shortest distance to any police node

arc071b: ###

2017-04-08 00:00:00 +0000

Total area: sum of all i < j, k < l, with (x(j) - x(i)) * (y(l) - y(k))

Note that because (i,j) pair is independent of (k, l) pair, we can break up the sum and associate each of them with each dimentions!!! i.e., the final answer becomes

sum of all i < j x(j) - x(i)

times

sum of all k < l y(l) - y(k)

To calculate each sum term, we notice that for each item i, it appears (i - 1) times as the LHS of the delta, and n - i times as the RHS of the delta, i.e.,

sum over all i, (i - 1) x(i) - (n - i) x(i) = (2 * i - n - 1) * x(i) !

Z-Function: Notes form e-maxx-eng

2017-04-02 00:00:00 +0000

z[i] = the LCP between S and the suffix of S starting at i

1. Index is 0-based

2. z[0] is not well defined, normally not a problem

We keep track of (l, r), the rightmost segment that matches the prefix of the string so far, and now we process i

1. if i > r, then compute z[i] in naive algorithem, and upate (l, r) if z[i] > 0, because the new r = i + z[i] - 1 

2. if i <= r, note that s[l..r] = s[0...r-l], so s[i...r] = s[i-l...r-l], so to compute z[i], we can skip the first z[i-l] chars starting
   from z => they will be match anyway, provided that i + z[i-l] - 1 <= r. Otherwise, we can only skip the first  r - i + 1  chars

3. after what we skip, we will have to compute the new entries and update (l, r)

Implementation Details

1. l, r, z[0] are all inited to 0, we start populating z[i] from 1

2. we update (l, r) only when i + z[i] - 1 > r

Runtime proof

Case by case analysis, prove that in each iteration of the trivial algorithm, we will increase r

Usage: search text for pattern

1. Contatinate a new string s.t. s = pattern + speical char + text. 

2. Calculte z-function for the new string

3. For i in [len[p] + 1, len(s) + len(p)], check if z[i] = len(p). Note that z[i] can not be > len(p) due to our speical character rule 

Usage: number of distince substrings in a string

Usage: String compression

Given a string of length, find a string t of shortest length s.t. s can be represtned as a contactenation of one or more copies of t.

loop throug all i s..t. i divides n, stop at the first i s.t. i + z[i] = n, then the strin s can be compressed to the length i

789B: Masha and geometric depression

2017-03-31 00:00:00 +0000

This costed me 3 WAs during the contest!!!

What I did wrong

Given the sequence goes from beginning to end, in order, I should at least try validating in that order, i.e., classification based on the case if b1 can be written.

Instead, I classified on the value of q, and therefore missed one case where abs(b1) > l, i.e., I worried about the latter cases without thinking if the former, dependent cases are even valid to begin with.

787C: Berzerk

2017-03-27 00:00:00 +0000

The kernel of this problem is to handle the unvisited state

Why the DFS approach does not work?!!!

We may have a case where we visit an ancestor when we have no conclusion on it yet, and thus we will conclude the state as LOOP
prematurely

In other words, a DFS suggests that when we visit that ancestor again, there is no conclusion on that ancestor yet, and therefore, we can
not form any conclusion on the child node we are on yet --> A deadlock here because by DFS, the ancestor can do nothing either!

Concretely, in our DFS we have an implicit ancetor -> child order in the final steps already, but we can not rule out child -> ancetor
possiblity in a winning play. That is why DFS will terminate prematurely!

Then how do we break the premature loop? => Don’t form it in the first place! We start from the blackhole, which has edges from only 1 side

Insight

1. We can use an event-driven approach, similar to 781B 

2. If the current state is losing, then mark all neighbors as winning, and expand on those neighbor
s. Compared with DFS which builds on unknowns, we expand on known states, 

3. If the current state is winning, then all 1 bad count to all neighbors, if any neighbor is full of bad notes, we conclude it is in a bad
   state, and enter that node 

4. Use arrays instead of separate variables to make coding easier

5. When calculating the previous pos we from from, need to exclude 1, which is invalid case

Problems from Suffix Array - A Programming Contest Approach

2017-03-21 00:00:00 +0000

Consider a string of length n (1 <= n <= 100000). Determine its minimum lexicographic rotation.

Idea

  1. to handle rotation, we just cocatinate the string again. But how do we handle to problem of fixed size n?
  2. If we forget about the fixed size n requirement, we can just construct a suffix array, and the first entry will be lex smallest.
  3. Now if we look at the first entry in the suffix array with length at least n, we know that a rotation must be its prefix. Moreoever, all rotations are covered in our suffix array already
  4. Now consider the first and second such suffixes, we know that lexigraphically, the n-size of first prefix <= n-size of the second prefix. Therefore, the rotation starting at i <= the rotation staring at i+1
  5. Apply such arguments to all rotations appearing in the suffix array, we know the first entry has the lex smallest rotation

Codeforces Notes

2017-03-20 00:00:00 +0000

770C: Online Courses in BSU

Build a dependency graph, and then DFS on all k main courses. In the end, order by the post finish time

769C: Cycle in Maze

Note that we want lexicographically mininal cycle, this means we don’t need to find minimal length at all, because to get the lexi graphically smallest cycle, we need to insert D as many times as we can

also note, we want to find cycle with length exactly k, k = mahanttan distance + 2 * number opposite the distances. i.e., given K, we know how many extra D and Ls we can insert, if we know the the shortest path we need to k

So we need to calcualte the shortest path from the desitination to all grids, and then from the starting point, we try inserting D as much as we can, until step + shortestPath(grid) = k, and then L, and then U

785D: Anton and School - 2(!!!)

Consider when we are at a ( bracket, so we need to know: # of (s before i, and number of )s after i, and any same number combination would work.

Turns out there is a closed formula for that! When we have a answer-ish string, i.e., starts with x (s followed by y )s, the total choice is (x+y choose x)

Proof

  1. Consider our combination, # of opening brackets in x picked o => x - o closing brackets are picked, however, at this time we also have x - o open brackets UNPICKED (because we have in total x opening brackets). Therefore, each pick maps to a combination

  2. Each of the mapping translation is equivalent and reversible. Therefore, this closed formula is sound and complete

Note that we are going through the string. To avoid duplicate, we need to mark that the current ( we run into must be selected. This change the formula to choose(lb + rb -1, lb)!!!

791D: Bear and Tree Jumps

2017-03-19 00:00:00 +0000

Idea

Suppose k = 1, what the answer woule be?

For each edges, it will be the combination of the vs on the two sides, i.e., suppose the edge connects to (u, v), with v as the root of the subtree, the contribution of that edge will be S(v) * n - S(v), and the final answer is the sum of contribution over all edges

Insight: we have a closed form for this answer, and yet, it can be expressed as sum(L /1), where L is the length of a single path

When k > 1, we know the final answer should be around sum(L/k), to be more specific, it will be sum(ceil(L/k))

Now we can separate the final answer into 2 parts

1. sum(L/k) = sum(L/1) / k = (answer in k = 1 case ) / k

2. sum(f(L, k)), where f(L, k) = k - (L % k), or 0 if L % k = 0 over all L

Now the problem becomes calculating the second part. The key is to swap sum over order, i.e., instead of caculating over all L, we can calculate over all k!!!

That is, for each remainder, how many paths we have with L s.t. L % k = remainder, for each remainder, we need to add remainder * numer to the second part

How to calculate the number of paths?!!!

Insight

1. We decompose any shortest path into 3 parts: a -> LCA -> root of the tree -> LCA -> b

2. Therefore, we need to keep track of number of paths with length L % k, which starts at the root and passes through each node

3. During bottom-up calculation, we will calculate only the count in 2, and update the final answer 

4. since in DFS we add child one by one, we need to update the answer, and then update thr subtree root state based on the child state

Implementation

def dfs(root, parent, depth) =

count[root][depth %k] = 1 //just reach the root of this subtree

for each child
  dfs(child, root, depth + 1) //calculate bottom up

  for pathLenPassingRootFromOrigin in [0.. k) //for simplicity, we call the root of the whole tree graph Origin
    for pathLenPassingChildFromOrigin in [0..k)
        int distMod = (pathLenPassingRootFromOrigin + pathLenPassingChildFromOrigin - 2 * depth) % k + k % k;
        int needToAdd = k - distMod

/*
need to
multiply here, because count[root] represents all visited child of the root including itself, and yet we are visiting a new child, so any
new path we found on the child can form a combinition with discovered paths going through roots so far.
*/
        answer += needToAdd * count[root][pathLenPassingRootFromOrigin]  * count_subtree[child][pathLenPassingChildFromOrigin]

/*
done processing that child, we need to update the counts of the paths with we have discovered so far, not e if we have path from origin to
child with length L, then we also have a pth from origin to parent with lenght L!

*/
  for pathLenPassingRootFromOrigin in [0..k)
        count[root][pathLenPasingRootFromOrigin] += count[child][pathLenPassingRootFromOrigin]
    

Reflections on Reflection

2017-03-18 00:00:00 +0000

Reflection is commonly used with annotation

1. JUnit look for the @Test annotaion, and then use method.invoke(ourObject, null)

2. Similarly, we can introduce our custom anotation, and then find it via method.isAnnotationPresnet(OurAnnoation.class), so that we then can method.invoke

Reflection is commonly used for class instantiation

1. Often we want to specify want class to run in config, we can just specify the full calss path in the AppSettings, and then instantiate it via Class.forName and then class.newIntance. Note that with Guice, we can just use Injector.getInstance(class)

2. Similarly, we can pass the class name as part of serialization form, and on the consumer side, along with the states, and on the consumer side, use Constructor.newInstance or class.newInstance

3.One other use is when we don't want to supply all parameters to constructer, e.g., via dependency injection. We can just supply only the types we need, and then custruct the array with the unsupplied with null

agc011 D: Half Reflector

2017-03-18 00:00:00 +0000

TOOHARD

Since interaction happens only between 2 neighbors, let’s focus on the interaction when the ball is between 2 neighbors

Claim: When a ball is between two neighbors, the left neighbor must be of form A
Proof: Try out both cases of A and B
Claim: When after the ball left the right neighbor, it will never come back to the this gap
Proof: Consider both cases AB and AA, it will be in the form of AA and BA, respectively, after it left, and we run into the same case again! By induction this means the ball will move from left to right gaps, one by one

Claim: After ball left the whole sequence, the state of each one is the negation of the next one before it starts
Proof: By the previous claim, we know once we leave the current gap, we are done. So we look at AA and AB gap, and we notice that the left item, A in this case, is just the negation of right item pre-action, i.e., we proof by complete search

Note that the last one will always be A, as shown in the AA and AB case

Codeforces Notes

2017-03-12 00:00:00 +0000

283B: Cow Program

  1. Note that there are two steps, so we need 2 states for entry.
  2. Because cycle may exists, we use DFS pseudo-code, i.e., we recursive call without parent and then check result instead of doing it before the recursive call.

25D: Roads not only in Berland

1. use DFS to detect redundant edges in a tree

2. For each compoent, add that edge to any node in that component (can force order here)

274B: Zero Tree

Just start from the root and cascade down

452B: 4-point polyline

Follow the example, we can see the pattern is a Z-shaped polyline along the diagonal. use this takes top 3 of the 4 longest line segments in the square.

Alernatively, we can try including both diagonals, and compare results to see which are the best

611D: New Year and Ancient Prophecy(!!!)

A naive DP solution gives O(n^3), how to improve it?

1. Use a prefix sum to calculate total ways(n - l, l1)

2. to do quick comparision, we just need DP to calc the first index where the seg starts at and the seg starts at b are different

Codeforces Notes

2017-03-10 00:00:00 +0000

371D: Vessels

The idea is similar to union-find’s path compression, i.e.,we need to know the next empty ship, therefore, every time we do a pour, we can do a path compression.

Implementation-wise, the fill() should return the next possible vessel which may have the capacity, if we fill at that vessel again. This means this could the vessel itself. This is to handle the case where the fill does not cover the existing capacity=> Otherwise, we will short circuit the link , skipping the current vessel regardless of whether it can accept more

91B: Queue(!!!)

For each number, we need to find the position of the right most number that is less than the current number.

Note that min(i, n) <= min(i+1, n) for all i, and thus forms a monotonic functions. The right most one will be the highest i such that min(i,n) < target and min(i+1, n) > target

461B: Appleman and Tree(!!!)

Doing a count(v) dp seems hard, this suggests the DP state should have more granularity

Since we need tree with exactly 1 sub tree, we consider the sub problem

1. ways(v, i) = number of ways to cut tree at v, with the cut part at v has i black vertex , while the rest all have exactly 1 black vertex.
   This separate i for the root is to help us merge between child and parent

2. for each child u, consider the case if u is not included in the subtree, update ways(v, 0) and ways(v, 1)

3. for each child u, consider the case if u IS included in the subtree rooted at v after cut, updates ways(v, 0) and ways(v, 1)

4. Final answer is ways(root, 1)

781B: Innokenty and a Football League

2017-03-09 00:00:00 +0000

During the contest, I got the idea that we can start with an all first arrangement, and gradually improve it step by step, until we reach a final form.

What I missed is to prove that the each step will replace some a with b, and not versa(!), if a solution exists

1. If two share the same first form, then both must choose the second form, because if one chooses first form and one chooses 2nd form, the
   extra rule in the problem will be violated

2. Therefore, we can just discover all first form collisions, and replace them with second form

3. After that, we can not revert any second form to first form, because that would mean the reverted first form has another second form
   whose first form is same as this reverted first form (same reasoning as in step 1)

4. This means, from now on, if we have any collsions of second form, we know we can not make any other changes, and thus answer is no

5. This also means, from now on, if we have a collision of first form and second form, we can only go from first to second, and not second
   to first

6. So we scan again, and update all the second form collision, and then a final scan for sanity check

Now for the soundness and completeness proof

1. If the algo gives YES, of course, there will be no collsion, because each step solves one of the 3 possible cases of collisions  => the
   algorithm is sound
 

2. If a solution does exist, and our algo gives no => this means we discover a second form collision,

3. If this second form collsion is from a first form collsion, then by our rules we can have no answer => contradition

4. If the second form collsion is the result of fixing a first-second form collsion, similar to the reasoning we give in the algorithm. we are there because
   any other option will yield an NO, i.e., we know for each change we made, it HAS TO be in any solution => contradiction

5. This also means our algo gives the mimium number of changes

But implementation wise, every time we switch from a first form to a second form, we may potentially introduce new conflicts between the newly-introduced second form and existing first form. Therefore, this switch should be done is a graph-like, event-driven way. Instead of scanning the array once and call it done!

validate(v)
  if v uses first form
      done

  if v uses second form
      loop all u in  1...n save v
      if u uses first form, and first(u) = second(v)
          mark u to use second form
          validate(u)

The for loop will be triggered O(n) times, with each one O(n) run time => O(n^2) in total

Codeforces Notes

2017-02-27 00:00:00 +0000

538D: Weird Chess

Start with one piece, let it move to all potential positions, and then check other pieces one by one, if it can attack a piece that is not really the case, take that move out of final answer

In the end, do a final vefification of answers, make sure the pattern is same

550D: Regular Bridge(!!!)

k = 1, trivial case k = 2, no solution? Total edges: n * k / 2

after we take out a bridge, each composnent should have (|C| * k - 1)/2 edges, i.e., when k is even there is no solution. when k is odd, easy to construct a solution, with each component a complete graph minus the bridge, but note that the skipped edges should be 2-3, 4-5, instead of 2-3, 3-4!!!

To make coding easider, we can assign 1..k+ 2 to the left half, and 1 + k+2 … 2k +4 to the right half

763B: Timofey and rectangles

since all lengths are odd, we know that for the bottom right corner, the oddity of the corrdinate must not be same for adjancent rectangles, i.e, at least the oddity of x-coordinate or y-coornidate are different. Therefore, we can just assign color in this schema, this means even if there is adject rectangles, we will guarantee that they have different color

Codeforces Round 402

2017-02-26 00:00:00 +0000

Another round at 5am….

779A: Pupils Redistribution

I failed this A problem!!! To calcualte the final answer, i focused on the case where a[digit] - b[digit] > 0. However, I didn’t consider the case when a[digit] < b[digit] , and the delta is odd. This means an exchange can never happen!

For B, spend 10 mins on the index/length of the proper prefix/suffix length, should have realized that when i set start = n-1, it is the last index of the prefix:w

C is at least k. At first I coded the case = k, wasted time on changing that.

I wasted too much time on D. I should have realized that when it is hard to get insight to solve it in a closed form, probably should try do searching + simulation, i.e., find a small search space or binary search

Codeforces Round 401

2017-02-25 00:00:00 +0000

777B: Games of Credit Cards

This problem took me too much time!!! Probably because I had to wake up at 4 am….

To give max hits, map M’s 9 to S’s 8, 8 to 7,….etc. With carry over potentially positive balances

To recieve min # of hits, map S’s 9 to M’9, the hit is whenever there is a positve delta

Note that we care about only positive deltas, we ignore negative delta, because in the end we will use left over postive deltas to fill those negative deltas, because two strings have same length, we know they will match

777C:Alyona and Spreadsheet

for each row, calculate the earliest start of the increasing subsequenceo

when answering the query [l,r], just look up r and find if answer <= l

777D:Cloud of Hashtags

Instead of calculating min # of removal, we calculate max # of keeps, because adding letter only “increases” a word, we start from the tail, and try adding letter as much as we can

so the last one, we can keep all letters the previous one, we can keep max prefix <= last one
keep going until the first one

776C: Molly’s Chemicals

Calculate the prefix sum, and find how many prefix sum before the current one, such that ps(current) - ps(previoius) = power of k

Since we care only about the previous prevous sums, we can calculate final answer while we build that, so that we don’t have to worry about interference of prefix sums after i

Overall run time O(n * log(k, 10^14))

Corner case k = 1 or -1 !!!