KIP-98: Exactly once processing
Sequence of actions
-
producer ask the broker to discover TC
-
producer ask the TC for a producer id (PID), and bump the PID version number/epoch to avoid zombie PID => synchronous Note: this PID/epoch is to identify the producer, so that TC can keep track of the current sequence number from the producer => necessary to implement idempotent semantics
-
rollback or forward any transactions left incomplete by the previous instance of the producer => synchrounous
-
Producer marks locally the transaciton has begun, and for TC it doesn’t begin until the first record is sent
-
producer add topicPartiion to the transaction on TC, so that we can mark commit or abort marker on each transaction Like most states in Kafka, TC state is backed by a log
-
producer produce to broker with PID, epoch, and sequence number Here broker will accept only if the sequence number = current stored + 1, if seqence # > current stored + 1, that makes fatal error. We need the PID so that later consumer can decide if messages from that PID can be materialized.
-
producer send Map<TopicPartitions, OffsetAndMetadata> and a groupId to TC of course this will be logged in the transaction log
-
producer will send a TxnOffsetCommitRequest to the consumer coordinator to persist the offsets in the __consumer-offsets topic
-
producer issues an EndTxnRequest to TC, with additional data indicating whether the transaction is to be committed or aborted
-
TC writes a PREPARE_COMMIT or PREPARE_ABORT message to the transaction log. Notice here the steps are similar to commit step in 1PC
-
TC begins the process of writing the command messages known as COMMIT (or ABORT) markers to the leader of each TopicPartition which is part of the transcation
-
Each broker will write a COMMIT(PID) or ABORT(PID) to the log, this message shows that whether the messages with the given PID must be delivered to the user or dropped. Cosumer will buffer messages which have a PID until it reads a corresponding COMMIT or ABORT message until it sees such message. Speical case: __consumer-offsets being one of the Topic Partitions in the transaction
13.After all the commit or abort markers are written the data logs, the TC writes the final COMMITTED or ABORTED message to the transaction log,
Code Jam 2017 1A
A: Alphabet Cake
The key mistake I made during contest is that I didn’t notice the condition “no initial appears more than once on the cake.”!!!! Idea 1: Recursion ———-
1. If there is only 1 letter, we can fill the whole square with that letter
2. If there is more than 1 letter, we can make a cut repeatedly until we reach that base case, but how to make a cut?
Idea 2: Greedy
1. for each row, expand to all existing letters to its right
2. for each row, expand to all existing letters to its left
3. the remaining rows must be completely emtpy, we just run two copy passes to ensure they are copied, one from top to bottom, and then one
from bottom to top
B: Ratatouille
The intuition is easy, but how to prove it?!!! Goal: we try to find the optimal solution which also has the minimum number of total servings
So we just sort the ranges by upper bound, and start with the the smallest possible servings, if we can form a kit, go, otherwise, discard all ranges that is not enough to make a kit
Proof: Suppose we have an optimal solution s(1)…s(2)…s(n)..s(n+1) and our solution is o(1)….o(n), while sum of s is smallest among all possible optimal solutions, and s picks lowest upper bound whenever possble for each ingredient
-
By induction, we can prove that, when s(1)…s(i) = o(1)…o(n), we pick the same solution
-
consider the first s(i) s.t. s(i) < o(i), this means when we are trying forming s(i), some of the ranges are picked by previous choices already, but it is not picked in S before i => contradiciton
-
consider the case s(n+1) does not exist, the argument is similar
Codeforces Notes
432D: Prefixes and Suffixes
-
If we have a prefix that matches a suffix, then we have z[i] = n - i
-
To know how many times prefix of length n - i appeared in the substring, we just need to look for number of indecies s.t. z[index] >= n - i
-
Therefore, we first run z, and then calculate the prefix sum of the inverse array, the answer will be ps[n] - ps[n - i - 1]
-
Note that in z-algorithm, z[0] = 0, in this case , we need to hard set z[0] = n, so that we don’t ignore the whole string case
535D: Tavas and Malekas
-
If there is no overlap between the end of previous match and start of the next match, we can fill in whatever we want, because what we get is a subsequence, so extra match is OK
-
If there is overlap between the indices, we need to make sure for the overlapped region (next start, previous end), the prefix is same as suffix, Otherwise, the result is invalid
772D: Volatile Kite
During the contest, I did see the triangle case is enough to infer the insight, and moving the 3 points in a certain direction. But instead of moving them to remove an angle, I tried let 2 points meet first, which proved to be too greedy
If we just want to reduce an angle to a straight line, once we draw the triangle case, it becomes obvious the radius should be half the distance between B and AC, where A, B, C are in clockwise order
471D: MUH and Cube Walls
The key insight here is to introduce a difference array, i.e., the different between previous and next entry.
Obviously, if the pattern exists in the wall, then the same range of in the difference pattern match too.
Claim: if we can find the difference array A in the B’s difference array, then in the same range in the orignal arrays, we can find the pattern in the wall by shifting pattern up or down as a whole Proof: when we locate such pattern match in the difference array, we can then add a delta to make the first element match, and then because each of the following diff entry matches, delta + prefixSum[i] match for all entries in the pattern range => this means the actual shift matches
Therefore, we convert both to the diff array, contactiate 2, with a speical char separate the concact, and then run Z-algorithm, and we look for all i where z[i] = length(p) - 1. -1 because for this diff array we start with a[1] - a[0], a[0] is completely free so that we can shift it up or down!!!
796D: Police Stations
Insights
1. Consider if a node u can be reached from two police nodes, v and w, with distance d(v) and d(w), assume d(v) <= d(w), then we know the path w....u can be broken up. More specifically, the last edge can be deleted, since we know it is redundant for sure,i.e., in an optimal solution, we know for sure that such d(v) <= d(w) case won't happen, i.e., each node is cover by exactly one police node.
2.At the same time, it becomes obvious that when such condition suffices, this solution is optimal => otherwise the requires each node is covered is no longer true!
3. Therefore, whenever we can discover a node covered by 2 police nodes, we can delete at least 1 edge, until we reach the optimal solution
But how to proceed from here?!!!
Instead of trying delete edges, we can add edges from police nodes!
Therefore, we start from each police node, and expand them until we cover the whole tree. Note that because we want to choose d(v) instead of d(w), we have to expand the police nodes in a breadth first way, so that every time we reach a new node, we know it is the shortest distance to any police node
arc071b: ###
Total area: sum of all i < j, k < l, with (x(j) - x(i)) * (y(l) - y(k))
Note that because (i,j) pair is independent of (k, l) pair, we can break up the sum and associate each of them with each dimentions!!! i.e., the final answer becomes
sum of all i < j x(j) - x(i)
times
sum of all k < l y(l) - y(k)
To calculate each sum term, we notice that for each item i, it appears (i - 1) times as the LHS of the delta, and n - i times as the RHS of the delta, i.e.,
sum over all i, (i - 1) x(i) - (n - i) x(i) = (2 * i - n - 1) * x(i) !
Z-Function: Notes form e-maxx-eng
z[i] = the LCP between S and the suffix of S starting at i
1. Index is 0-based
2. z[0] is not well defined, normally not a problem
We keep track of (l, r), the rightmost segment that matches the prefix of the string so far, and now we process i
1. if i > r, then compute z[i] in naive algorithem, and upate (l, r) if z[i] > 0, because the new r = i + z[i] - 1
2. if i <= r, note that s[l..r] = s[0...r-l], so s[i...r] = s[i-l...r-l], so to compute z[i], we can skip the first z[i-l] chars starting
from z => they will be match anyway, provided that i + z[i-l] - 1 <= r. Otherwise, we can only skip the first r - i + 1 chars
3. after what we skip, we will have to compute the new entries and update (l, r)
Implementation Details
1. l, r, z[0] are all inited to 0, we start populating z[i] from 1
2. we update (l, r) only when i + z[i] - 1 > r
Runtime proof
Case by case analysis, prove that in each iteration of the trivial algorithm, we will increase r
Usage: search text for pattern
1. Contatinate a new string s.t. s = pattern + speical char + text.
2. Calculte z-function for the new string
3. For i in [len[p] + 1, len(s) + len(p)], check if z[i] = len(p). Note that z[i] can not be > len(p) due to our speical character rule
Usage: number of distince substrings in a string
Usage: String compression
Given a string of length, find a string t of shortest length s.t. s can be represtned as a contactenation of one or more copies of t.
loop throug all i s..t. i divides n, stop at the first i s.t. i + z[i] = n, then the strin s can be compressed to the length i
789B: Masha and geometric depression
This costed me 3 WAs during the contest!!!
What I did wrong
Given the sequence goes from beginning to end, in order, I should at least try validating in that order, i.e., classification based on the case if b1 can be written.
Instead, I classified on the value of q, and therefore missed one case where abs(b1) > l, i.e., I worried about the latter cases without thinking if the former, dependent cases are even valid to begin with.
787C: Berzerk
The kernel of this problem is to handle the unvisited state
Why the DFS approach does not work?!!!
We may have a case where we visit an ancestor when we have no conclusion on it yet, and thus we will conclude the state as LOOP
prematurely
In other words, a DFS suggests that when we visit that ancestor again, there is no conclusion on that ancestor yet, and therefore, we can
not form any conclusion on the child node we are on yet --> A deadlock here because by DFS, the ancestor can do nothing either!
Concretely, in our DFS we have an implicit ancetor -> child order in the final steps already, but we can not rule out child -> ancetor
possiblity in a winning play. That is why DFS will terminate prematurely!
Then how do we break the premature loop? => Don’t form it in the first place! We start from the blackhole, which has edges from only 1 side
Insight
1. We can use an event-driven approach, similar to 781B
2. If the current state is losing, then mark all neighbors as winning, and expand on those neighbor
s. Compared with DFS which builds on unknowns, we expand on known states,
3. If the current state is winning, then all 1 bad count to all neighbors, if any neighbor is full of bad notes, we conclude it is in a bad
state, and enter that node
4. Use arrays instead of separate variables to make coding easier
5. When calculating the previous pos we from from, need to exclude 1, which is invalid case
Problems from Suffix Array - A Programming Contest Approach
Consider a string of length n (1 <= n <= 100000). Determine its minimum lexicographic rotation.
Idea
- to handle rotation, we just cocatinate the string again. But how do we handle to problem of fixed size n?
- If we forget about the fixed size n requirement, we can just construct a suffix array, and the first entry will be lex smallest.
- Now if we look at the first entry in the suffix array with length at least n, we know that a rotation must be its prefix. Moreoever, all rotations are covered in our suffix array already
- Now consider the first and second such suffixes, we know that lexigraphically, the n-size of first prefix <= n-size of the second prefix. Therefore, the rotation starting at i <= the rotation staring at i+1
- Apply such arguments to all rotations appearing in the suffix array, we know the first entry has the lex smallest rotation
Codeforces Notes
770C: Online Courses in BSU
Build a dependency graph, and then DFS on all k main courses. In the end, order by the post finish time
769C: Cycle in Maze
Note that we want lexicographically mininal cycle, this means we don’t need to find minimal length at all, because to get the lexi graphically smallest cycle, we need to insert D as many times as we can
also note, we want to find cycle with length exactly k, k = mahanttan distance + 2 * number opposite the distances. i.e., given K, we know how many extra D and Ls we can insert, if we know the the shortest path we need to k
So we need to calcualte the shortest path from the desitination to all grids, and then from the starting point, we try inserting D as much as we can, until step + shortestPath(grid) = k, and then L, and then U
785D: Anton and School - 2(!!!)
Consider when we are at a ( bracket, so we need to know: # of (s before i, and number of )s after i, and any same number combination would work.
Turns out there is a closed formula for that! When we have a answer-ish string, i.e., starts with x (s followed by y )s, the total choice is (x+y choose x)
Proof
-
Consider our combination, # of opening brackets in x picked o => x - o closing brackets are picked, however, at this time we also have x - o open brackets UNPICKED (because we have in total x opening brackets). Therefore, each pick maps to a combination
-
Each of the mapping translation is equivalent and reversible. Therefore, this closed formula is sound and complete
Note that we are going through the string. To avoid duplicate, we need to mark that the current ( we run into must be selected. This change the formula to choose(lb + rb -1, lb)!!!
791D: Bear and Tree Jumps
Idea
Suppose k = 1, what the answer woule be?
For each edges, it will be the combination of the vs on the two sides, i.e., suppose the edge connects to (u, v), with v as the root of the subtree, the contribution of that edge will be S(v) * n - S(v), and the final answer is the sum of contribution over all edges
Insight: we have a closed form for this answer, and yet, it can be expressed as sum(L /1), where L is the length of a single path
When k > 1, we know the final answer should be around sum(L/k), to be more specific, it will be sum(ceil(L/k))
Now we can separate the final answer into 2 parts
1. sum(L/k) = sum(L/1) / k = (answer in k = 1 case ) / k
2. sum(f(L, k)), where f(L, k) = k - (L % k), or 0 if L % k = 0 over all L
Now the problem becomes calculating the second part. The key is to swap sum over order, i.e., instead of caculating over all L, we can calculate over all k!!!
That is, for each remainder, how many paths we have with L s.t. L % k = remainder, for each remainder, we need to add remainder * numer to the second part
How to calculate the number of paths?!!!
Insight
1. We decompose any shortest path into 3 parts: a -> LCA -> root of the tree -> LCA -> b
2. Therefore, we need to keep track of number of paths with length L % k, which starts at the root and passes through each node
3. During bottom-up calculation, we will calculate only the count in 2, and update the final answer
4. since in DFS we add child one by one, we need to update the answer, and then update thr subtree root state based on the child state
Implementation
def dfs(root, parent, depth) =
count[root][depth %k] = 1 //just reach the root of this subtree
for each child
dfs(child, root, depth + 1) //calculate bottom up
for pathLenPassingRootFromOrigin in [0.. k) //for simplicity, we call the root of the whole tree graph Origin
for pathLenPassingChildFromOrigin in [0..k)
int distMod = (pathLenPassingRootFromOrigin + pathLenPassingChildFromOrigin - 2 * depth) % k + k % k;
int needToAdd = k - distMod
/*
need to
multiply here, because count[root] represents all visited child of the root including itself, and yet we are visiting a new child, so any
new path we found on the child can form a combinition with discovered paths going through roots so far.
*/
answer += needToAdd * count[root][pathLenPassingRootFromOrigin] * count_subtree[child][pathLenPassingChildFromOrigin]
/*
done processing that child, we need to update the counts of the paths with we have discovered so far, not e if we have path from origin to
child with length L, then we also have a pth from origin to parent with lenght L!
*/
for pathLenPassingRootFromOrigin in [0..k)
count[root][pathLenPasingRootFromOrigin] += count[child][pathLenPassingRootFromOrigin]
Reflections on Reflection
Reflection is commonly used with annotation
1. JUnit look for the @Test annotaion, and then use method.invoke(ourObject, null)
2. Similarly, we can introduce our custom anotation, and then find it via method.isAnnotationPresnet(OurAnnoation.class), so that we then can method.invoke
Reflection is commonly used for class instantiation
1. Often we want to specify want class to run in config, we can just specify the full calss path in the AppSettings, and then instantiate it via Class.forName and then class.newIntance. Note that with Guice, we can just use Injector.getInstance(class)
2. Similarly, we can pass the class name as part of serialization form, and on the consumer side, along with the states, and on the consumer side, use Constructor.newInstance or class.newInstance
3.One other use is when we don't want to supply all parameters to constructer, e.g., via dependency injection. We can just supply only the types we need, and then custruct the array with the unsupplied with null
agc011 D: Half Reflector
TOOHARD
Since interaction happens only between 2 neighbors, let’s focus on the interaction when the ball is between 2 neighbors
Claim: When a ball is between two neighbors, the left neighbor must be of form A
Proof: Try out both cases of A and B
Claim: When after the ball left the right neighbor, it will never come back to the this gap
Proof: Consider both cases AB and AA, it will be in the form of AA and BA, respectively, after it left, and we run into the same case again! By induction this means the ball will move from left to right gaps, one by one
Claim: After ball left the whole sequence, the state of each one is the negation of the next one before it starts
Proof: By the previous claim, we know once we leave the current gap, we are done. So we look at AA and AB gap, and we notice that the left item, A in this case, is just the negation of right item pre-action, i.e., we proof by complete search
Note that the last one will always be A, as shown in the AA and AB case
Codeforces Notes
283B: Cow Program
- Note that there are two steps, so we need 2 states for entry.
- Because cycle may exists, we use DFS pseudo-code, i.e., we recursive call without parent and then check result instead of doing it before the recursive call.
25D: Roads not only in Berland
1. use DFS to detect redundant edges in a tree
2. For each compoent, add that edge to any node in that component (can force order here)
274B: Zero Tree
Just start from the root and cascade down
452B: 4-point polyline
Follow the example, we can see the pattern is a Z-shaped polyline along the diagonal. use this takes top 3 of the 4 longest line segments in the square.
Alernatively, we can try including both diagonals, and compare results to see which are the best
611D: New Year and Ancient Prophecy(!!!)
A naive DP solution gives O(n^3), how to improve it?
1. Use a prefix sum to calculate total ways(n - l, l1)
2. to do quick comparision, we just need DP to calc the first index where the seg starts at and the seg starts at b are different
Codeforces Notes
371D: Vessels
The idea is similar to union-find’s path compression, i.e.,we need to know the next empty ship, therefore, every time we do a pour, we can do a path compression.
Implementation-wise, the fill() should return the next possible vessel which may have the capacity, if we fill at that vessel again. This means this could the vessel itself. This is to handle the case where the fill does not cover the existing capacity=> Otherwise, we will short circuit the link , skipping the current vessel regardless of whether it can accept more
91B: Queue(!!!)
For each number, we need to find the position of the right most number that is less than the current number.
Note that min(i, n) <= min(i+1, n) for all i, and thus forms a monotonic functions. The right most one will be the highest i such that min(i,n) < target and min(i+1, n) > target
461B: Appleman and Tree(!!!)
Doing a count(v) dp seems hard, this suggests the DP state should have more granularity
Since we need tree with exactly 1 sub tree, we consider the sub problem
1. ways(v, i) = number of ways to cut tree at v, with the cut part at v has i black vertex , while the rest all have exactly 1 black vertex.
This separate i for the root is to help us merge between child and parent
2. for each child u, consider the case if u is not included in the subtree, update ways(v, 0) and ways(v, 1)
3. for each child u, consider the case if u IS included in the subtree rooted at v after cut, updates ways(v, 0) and ways(v, 1)
4. Final answer is ways(root, 1)
781B: Innokenty and a Football League
During the contest, I got the idea that we can start with an all first arrangement, and gradually improve it step by step, until we reach a final form.
What I missed is to prove that the each step will replace some a with b, and not versa(!), if a solution exists
1. If two share the same first form, then both must choose the second form, because if one chooses first form and one chooses 2nd form, the
extra rule in the problem will be violated
2. Therefore, we can just discover all first form collisions, and replace them with second form
3. After that, we can not revert any second form to first form, because that would mean the reverted first form has another second form
whose first form is same as this reverted first form (same reasoning as in step 1)
4. This means, from now on, if we have any collsions of second form, we know we can not make any other changes, and thus answer is no
5. This also means, from now on, if we have a collision of first form and second form, we can only go from first to second, and not second
to first
6. So we scan again, and update all the second form collision, and then a final scan for sanity check
Now for the soundness and completeness proof
1. If the algo gives YES, of course, there will be no collsion, because each step solves one of the 3 possible cases of collisions => the
algorithm is sound
2. If a solution does exist, and our algo gives no => this means we discover a second form collision,
3. If this second form collsion is from a first form collsion, then by our rules we can have no answer => contradition
4. If the second form collsion is the result of fixing a first-second form collsion, similar to the reasoning we give in the algorithm. we are there because
any other option will yield an NO, i.e., we know for each change we made, it HAS TO be in any solution => contradiction
5. This also means our algo gives the mimium number of changes
But implementation wise, every time we switch from a first form to a second form, we may potentially introduce new conflicts between the newly-introduced second form and existing first form. Therefore, this switch should be done is a graph-like, event-driven way. Instead of scanning the array once and call it done!
validate(v)
if v uses first form
done
if v uses second form
loop all u in 1...n save v
if u uses first form, and first(u) = second(v)
mark u to use second form
validate(u)
The for loop will be triggered O(n) times, with each one O(n) run time => O(n^2) in total
Codeforces Notes
538D: Weird Chess
Start with one piece, let it move to all potential positions, and then check other pieces one by one, if it can attack a piece that is not really the case, take that move out of final answer
In the end, do a final vefification of answers, make sure the pattern is same
550D: Regular Bridge(!!!)
k = 1, trivial case k = 2, no solution? Total edges: n * k / 2
after we take out a bridge, each composnent should have (|C| * k - 1)/2 edges, i.e., when k is even there is no solution. when k is odd, easy to construct a solution, with each component a complete graph minus the bridge, but note that the skipped edges should be 2-3, 4-5, instead of 2-3, 3-4!!!
To make coding easider, we can assign 1..k+ 2 to the left half, and 1 + k+2 … 2k +4 to the right half
763B: Timofey and rectangles
since all lengths are odd, we know that for the bottom right corner, the oddity of the corrdinate must not be same for adjancent rectangles, i.e, at least the oddity of x-coordinate or y-coornidate are different. Therefore, we can just assign color in this schema, this means even if there is adject rectangles, we will guarantee that they have different color
Codeforces Round 402
Another round at 5am….
779A: Pupils Redistribution
I failed this A problem!!! To calcualte the final answer, i focused on the case where a[digit] - b[digit] > 0. However, I didn’t consider the case when a[digit] < b[digit] , and the delta is odd. This means an exchange can never happen!
For B, spend 10 mins on the index/length of the proper prefix/suffix length, should have realized that when i set start = n-1, it is the last index of the prefix:w
C is at least k. At first I coded the case = k, wasted time on changing that.
I wasted too much time on D. I should have realized that when it is hard to get insight to solve it in a closed form, probably should try do searching + simulation, i.e., find a small search space or binary search
Codeforces Round 401
777B: Games of Credit Cards
This problem took me too much time!!! Probably because I had to wake up at 4 am….
To give max hits, map M’s 9 to S’s 8, 8 to 7,….etc. With carry over potentially positive balances
To recieve min # of hits, map S’s 9 to M’9, the hit is whenever there is a positve delta
Note that we care about only positive deltas, we ignore negative delta, because in the end we will use left over postive deltas to fill those negative deltas, because two strings have same length, we know they will match
777C:Alyona and Spreadsheet
for each row, calculate the earliest start of the increasing subsequenceo
when answering the query [l,r], just look up r and find if answer <= l
777D:Cloud of Hashtags
Instead of calculating min # of removal, we calculate max # of keeps, because adding letter only “increases” a word, we start from the tail, and try adding letter as much as we can
so the last one, we can keep all letters
the previous one, we can keep max prefix <= last one
keep going until the first one
776C: Molly’s Chemicals
Calculate the prefix sum, and find how many prefix sum before the current one, such that ps(current) - ps(previoius) = power of k
Since we care only about the previous prevous sums, we can calculate final answer while we build that, so that we don’t have to worry about interference of prefix sums after i
Overall run time O(n * log(k, 10^14))
Corner case k = 1 or -1 !!!