CEPC'09

Solutions' Sketches

Here you can find solution sketches for all the problems. They are by no means meant to be a comprehensive description, but you should be able to get the general idea in case you are stuck with any task.

Accountant notes

The problem statement can be easily reformulated as the following pattern matching problem: you are given an alphabet consisting of two separate parts: terminals, denoted by small letters, and nonterminals denoted by large letters. We consider two strings over this alphabet similar iff there exists a permutation of all the nonterminals which transforms one string into another. For example string "aXbYcX" is similar to "aZbVcZ", but it is not similar to "aZbZcZ". We say that one string occurs in another one if it is similar to its substring. The task is to find (first) occurrences of a given set of strings in a large text.

The defined above similarity is an equivalence relation thus it might be useful to assign a representative (canonical) string to each equivalence class. An obvious choice for such representation would be the the (lexicographically) smallest string but it turns out to be not very convenient to work with. Therefore, the representation that we are going to use will actually be defined over a completely different alphabet (so in fact it is not going to belong to the given equivalence class).

If you think about it for a moment, the only important thing about the nonterminals is equality, we need to know which of them are the same and which are not. Therefore, we replace each nonterminal with a number, which is the distance from the previous occurrence of the same nonterminal. In case the nonterminal is completely new, we use INF (or, more precisely, a number which is greater or equal to the length of prefix read so far). So, we will encode "aXbYcX" (and "aZbVcZ" as well) as "a2b4c4".

How do we decide if one string is similar to another? In the ideal world (that is, if we had chosen representative to actually be a representative) it would suffice to compare representatives. This works quite well for two strings of same length, however we need to find occurrence of one string in another one, which is possibly longer, and thus the fragment may contain numbers larger than the length of the prefix. For example a text "YcaXbYcX" is encoded as "1ca4b6c4", which makes it difficult to realize that it contains "a2b4c4".

To check if we have found an occurrence, you just need to replace all numbers greater than their position by their position itself, which will effectively convert "a4b6c4" to "a2b4c4", and now you can compare it byte by byte.

This leads to the following algorithm (I hope that reader is familiar with the Aho-Corasick Algorithm, otherwise it is not going to make much sense):

Convert patterns and text from the input to the language of terminals and nonterminals. This can be done using hashtables in roughly linear time. Care must be taken to separate lines from each other.
Replace each nonterminal with the distance from its previous occurrence, or its position, if there is no previous occurrence. Using a hash table, you can do this in linear time as well.
Put all patterns in a TRIE, just like in the Aho-Corasick algorithm. If you store children in hashtables you can do that in linear time.
Rules for navigating in this tree are almost the same as in the original algorithm, unless you try to visit child N, where N is greater or equalto the depth of current node. Then you should try to visit child whose number is the same as the depth of current node.
For each node of the TRIE find the backlink edge using a single BFS pass over the TRIE.
For each node corresponding to one of the pattern we are looking for, add its number to the list of matches.
Once you have built the automaton, you just perform usual Aho-Corasick Algorithm, remembering that there is a special rule for numbers greater than current depth. Whenever you visit a node that has not null pointer to the list of matches, and is not marked as reported you report all of them recursively, and mark all of them as reported. This is necessary to avoid n^2 complexity.

We assumed the program may use a STL map which increases the complexity from O(n) to O(nlgn).

Better and faster!

The problem asks you to maintain the checksum of a given string while replacing its letters. It helps to think about the string as a sequence of bits, then the checksum is easily seen to be CRC32. As you probably know, to calculate CRC32 we consider the input sequence as a polynomial w(x) and calculate its remainder when divided by a fixed polynomial p(x). So, flipping a single bit at position i results in a polynomial w'(x)=w(x)+xⁱ and thus to compute checksum of the modified text we only need to know the checksum of the original text and xⁱ mod p(x). All those values of xⁱ mod p(x) can be precomputed in linear time, resulting in a constant time processing of a single query.

Cave

We will count maximum possible level of fuel for each position [i, i+1]. Having it, counting total capacity of the cave is easy. To understand the solution we should think that we pour fuel to the cave. It leaks according to physics laws and forms several (or less, or more) ponds. Ponds are separated by rocks of the floor. The surface of each pond is horizontal. Moving along it to the left (we remember that our pond is two dimentsional) or to the right we may touch the ceiling (when the level of ceiling and the surface of the pond are equal) but not hit it (when the level of ceiling is strictly smaller than the surface of the pond). We calculate the level of fuel in each place of the cave, considering separately obstacles to the left of this position and to the right.

We start from obstacles positioned to the left. For each i we count, what is the highest possible level of the surface of the pond in this point, assuming that the cave ends at i + 1 (i.e. positions to the right of i do not matter). We go from left to right and maintain a level. For the position i, we know that drawing a horizontal line at level level to the left (i, 0], we touch the floor before we hit the ceiling. We also know that it is the highest value with this property. To keep above properties going from position i - 1 to i we consider following cases:

if floor[i] > level then we can rise the level level := floor[i]; we maintain the properties, as:
- we touch the floor just at point;
- greater level should be correct for previous position; as it is not, thus level is optimal;
if ceil[i] < level then we must lower the level level := ceil[i], otherwise we hit the ceiling at position i; we can also notice that it is not necessary to lower it more, as previous value was accessible for position i - 1.
if floor[i] ≤ level ≤ ceil[i] we can keep the value of level unchanged; is correct and can not be raised due to restrictions, which were already valid for the position i - 1.

After the previous phase and its reverse we have counted for each position two levels: let us denote them leftLevel and rightLevel. It is easy to see, that a correct and optimal level for each position is level = minium(lefLevel, rightLevel). Pouring the fuel to this level at this place is safe - we can draw the surface of the pond to the left and to the right from position i without hitting the ceiling. We can also see that the pond surface behaves according to phisics laws - it is horizontal, without any "jumps". If it was the opposite, we can find to neighbouring positions [i, i + 1) and [i + 1, i + 2], where the optimal level differs. But it is not possible if there was neither ceiling nor floor touching the surface.

Analysis of the cave requires two passes: from left to right and vice versa. It works in time O(n) and requires O(n) memory cells, where n is the length of the cave. The result is within range of long integers.

Decision

This was supposed to be the simplest task. The only thing you have to do is to construct the corresponding graph, where each vertex corresponds to a single cell and two vertices are connected with an edge iff their corresponding cells share a dark edge, and find its connected components using, for example, depth-first search.

Everybody may get lost in space

You might have seen a simpler version of this during the trial session. In the modified variant, you are asked to compute the volume of a union of three spheres. In case of just two spheres, you might be tempted to derive a closed form formula. Or, even better, to look it up here (as long as you can find the volume of the intersection, you can deal with the union as well). Unfortunately, the formula given there is slightly wrong and misses a case when the centers are close to each other (and, if you are trying to derive it by yourself, it is quite easy to make this, or another, mistake). But the problem asks about the case of three spheres. If you start trying to derive the closed form formula for such case, the number of special cases seems to be almost overwhelming. So, we are going to try another approach instead.

Consider the intersection of the union with a vertical plane (consisting of all points with the same x coordinate). Let the area of this union of (at most) three circles be f(x). What we really need to compute is the definite integral of f(x)! Fortunately, we are only required to compute this value up to a (relative) accuracy of 10^-6. So, we might try numerical integration. In its simplest form it work by cutting the whole interval into a number of slices and approximating the volume in each such slice with the are in the middle multiplied by the width of a single slice (this is called the midpoint rule, to get better accuracy you should try the trapezoidal or Simpson's rule). Unfortunately, computing the area of the union of as many as three circles does not seem like something very simple to code... but there is one trick we can use to simplify the computation: the choice of the cutting (vertical) plane was rather arbitrary, we can choose any other direction! Or, in other words, we can rotate all three spheres at once so that the y and z coordinates of two of them are the same. Then you can observe that the intersection will consist of just two circles. Calculating the area of the union of just two circles is a simple math exercise, with almost no special cases at all.

(False) faces

In this problem you are asked to check if the number of perfect matchings in a given bipartite graph is divisible by 4. As you might know, computing this number is rather difficult (although there are efficient approximation algorithms).

It turns out that reformulating the problem in algebraic terms makes it much easier. Let M be the matrix such that M[i,j]=1 if there is an edge from the ith left vertex to the jth right vertex (and 0 otherwise). Then what we are looking for is the permanent per(M) modulo 4, where per(M) = ∑_{π∈S_n} ∏_i=1,...,nM[i,&pi(i)]. You certainly have seen a similar formula before det(M)=∑_{π∈S_n}(-1)^sgn(π) ∏_i=1,...,nM[i,&pi(i)]. The question is: are they related? Well, not really, this (-1) changes a lot. But, as long as we are interested in calculating the value modulo 2, (-1) is the same as 1 so det(M)=per(M) modulo 2. This does not give a complete solution to the original problem yet, but gets us a little bit closer: we can check if 2 divides det(M), if not then 4 certainly does not divide per(M). Which solves a constant fraction of all possible input instances :)

Now we can focus on the case when 2 divides det(M). In such case the rows of M (call them v₁,...,v_n) are linearly dependent over Z₂, meaning that we can find 0/1 coefficients a₁,...,a_n such that the vector t=a₁*v₁ + ... + a_n*v_n has all coordinates even and (without losing the generality, as we can always reorder the rows) a₁=1. We will use the following notion: per(v₁,...,v_n) is the permanent of the matrix consisting of rows v₁,...,v_n. Then observe that the permanent is a multilinear function, meaning that per(M)=per(t,v₂,...,v_n) - ∑_i>1 a_i per(v_i,v₂,...,v_n). Thus to calculate per(M) we only have to compute all those summands. If we work modulo 4, this is much easier than the original problem! Observe that:

per(t,v₂,...,v_n) is the permanent of a matrix with all the values in the first row even. So, it is the same as 2*per(t',v₂,...,v_n), for some t'. Computing this value modulo 4 requires computing per(t',v₂,...,v_n) modulo 2, which we already know how to do.
per(v_i,v₂,...,v_n) is the permanent of a matrix with some row appearing twice. Without losing the generality assume it is the second one. Then the value we are looking for is ∑_i≠j v₂(i) v₂(j) per([v₃,...,v_n] with columns i,j removed). By reordering the summands we get that it is 2 ∑_i<j v₂(i) v₂(j) per([v₃,...,v_n] with columns i,j removed), which (again) reduces to computing the value of determinant modulo 2.

The above reasoning shows that checking if 4 divides per(M) reduces to n² calls to computing the determinant modulo 2. This gives us a polynomial time algorithm but is way too slow to work for n=300. Fortunately, one can observe that the matrices we are reducing the original problem to, are quite similar and does not need to perform Gaussian elimination on each of them from scratch. This results in just O(n³+n*M(n)) time algorithm, where M(n) is the time required to execute Gaussian elimination on a binary matrix (this can be done really efficiently by working with whole words, instead of single bits).

Garlands

We are given a sequence of natural numbers w₁,...,w_n and asked to split it into m (m-1 if we stick to the definition from the problem statement) fragments of even length such that each fragment is:

nonempty,
consisting of at most 2d numbers.
of even length,

in a way which minimizes the maximum weight of a half-fragment. The first observation is that it is much easier to work with the decision version of the problem, in which we are given some upper bound on the weight of all half-fragments and are asked to check if there exists a valid partition. If we know how to solve this variant, we can solve the original version as well, using a simple binary search.

You might be tempted to make the following observation: if there is a partition into x fragments, there is a partition into x+1 fragments as well. This would make a simple dynamic programming solution possible. Unfortunately, such observation is WRONG! which can be seen even in the sample test case. Fortunately, its slightly more complicated variant is true: if there is a partition into x fragments, there is a partition into x+2 fragments as well. So, you can use the following dynamic programming solution: let t[i][j] be the minimum number of valid fragments necessary to split w_i,...,w_n into odd (if j=1)/even (if j=0) number of fragments. Each t[i][j] can be computed in just O(d) operations (and, in most cases, even faster as you can stop as soon as the weight of the first half-fragment exceeds the given bound).

Hypervisor MacrOS

See here.

Island

Imagine that you start with all the cells flooded and decrease the current time. Then some cells become unflooded and either create a new area, or join some already existing areas (maybe just one). So, you can simulate the whole process efficiently using a union-find data structure.

Jack's socks

It turned out that our test data for this problem was rather weak, and unfortunately quite a few teams got a (wrong) heuristic accepted.

The solution we were looking for was as follows: assume that the given graph is connected and contains a bridge e (which is an edge after removing which the graph ceases to be connected). Assume that after removing e you are left with two connected components consisting of n₁ and n₂ vertices, respectively. Then if both n₁ and n₂ are even, e cannot belong to any perfect matching! so it can be safely removed from the graph. On the other hand, if both those value are odd, e belongs to all perfect matchings (if any) so you can safely remove its ends from the graph. So as long as the graph is not biconnected, you can reduce the problem. The question is, what if there is no bridge?

It turns out that in such case, there is either no perfect matching, or more than one. This rather simple to prove if the graph is bipartite: assume that there is at least one perfect matching. Orient each edge from left to right if it belongs to this matching, and from right to left otherwise. Observe that if the resulting graph contains a nontrivial strongly connected component, you can flip edges belonging to a simple directed cycle to get another perfect matching. So, assume that all those components are singletons and thus the graph is acyclic. Take its first vertex v in (any) topological ordering and without losing the generality assume that it is a left vertex. It does not have any incoming edges, on the other hand it is matched so it must have exactly one outgoing edge. It is easy to see that this edge is a bridge in the original graph.

By looking at the execution of Edmond's algorithm for finding the maximum cardinality matching in a nonbipartite graph, you can transform the above proof to the general case.

Knowledge for the masses

We call each rail a row, a bookrack - an item, maximal group of consecutive boockracks that are not separated by gaps - a block. Number of items in a block is its length. We will analyze each row separately and will find a cost of making a gap at position [i, i + 1] for 0 ≤ i ≤ L. Then results obtained for each row will be simply combined to the result for the whole library.

We start scanning a row from left to right. While scanning we count b₁, b₂, . . . , b_k-1 - numbers of items in consecutive blocks. Let us suppose we are encountering the k-th block now, and an item I in this block occupying initially positions from i to j. Let r be the number of I in its block (counting from left). Then, it is possible to form gaps:

at position [i - 1, i] with cost r - 1;
at position [i - 2, i - 1] with cost b_k-1 + r - 1;
at position [i - 3, i - 2] with cost b_k-2 + b_k-1 + r - 2, etc.

If k ≤ (i - j) then we have new costs for positions from i - k up to i - 1. If k > (i - j) then we stop counting new costs at position j, as moving item I is not necessary to form a gap at positions to the left of it. We repeat the same procedure scanning blocks from right to left and taking better cost from this two possibilities. Complexity of analysis is at most O(n) - number of items in a row, plus O(L), which bounds the number of possible gap positions in the row.

We analyze in constant time each position of each row, where a gap can be formed. Thus the total time can be bounded by O(LR) and usually is smaller as not all positions are available. Memory of size O(L) is necessary to store partial solutions and solutions for each row.

Screen-friendly version of this page