http://www.spoj.com/problems/ROCK/
The constraints seem to be set based on an O(N^3) expected solution, and even then they offer a generous margin.
The problem CAN be solved in O(N), making it much more interesting than it seems at first glance.
There has already been a discussion about increasing the difficulty or adding a version with higher constraints after some people found the fairly obvious O(N^2) solution, but I don't think there was any follow-up on that from the problem setter.
Given that I don't know of any harder version of this problem to solve and everybody seems to think this is O(N^2) at best, I thought it might be at least worthwhile to post an explanation of how it can be solved in O(N).
SPOILERS BELOW
Problem statement, in short:
Given a binary string X,
- split it into any number of substrings
- so as to maximize S, the total length of the substrings with more 1s than 0s.
Print S.
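For reference, the "fairly obvious" O(N^2) solution mentioned above is a plain DP over prefixes; a sketch (the function name is mine):

```python
def rock_obvious(x):
    """O(N^2) DP: best[i] = maximum S achievable over the first i characters."""
    n = len(x)
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]           # x[i-1] ends up in a discarded piece
        ones = 0
        for j in range(i - 1, -1, -1):  # candidate last piece x[j:i]
            if x[j] == '1':
                ones += 1
            if 2 * ones > i - j:        # more 1s than 0s: acceptable
                best[i] = max(best[i], best[j] + (i - j))
    return best[n]
```

Unacceptable pieces of any length are covered by the single-character discard transition, since best is non-decreasing.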
We reduce the problem to computing the minimum number of acceptable substrings that include all 1s
Let's call c0(w) the number of 0s and c1(w) the number of 1s in the string w. Let val(w) = c1(w) - c0(w). Acceptable substrings w have val(w) >= 1.
We easily notice that c1(X) <= S <= 2 * c1(X) - 1: making every 1 its own substring already achieves S = c1(X), while an acceptable substring w has length c0(w) + c1(w) = 2 * c1(w) - val(w) <= 2 * c1(w) - 1, so summing over the acceptable substrings gives the upper bound.
The upper bound is reached only if we can make one large substring containing all the 1s; if not, we "lose" at least one more 1. As a matter of fact, for each acceptable substring we use, we "lose" one 1.
Therefore, start with the substrings formed by the runs of 1s, and try to join them together into substrings with val(w) >= 1. The optimal solution will have a minimal number of substrings, smin(X), and all of them will have val(w) = 1, except possibly when smin(X) is 1.
Regarding the second part of that last claim: assume we have a substring with val(w) > 1. Then either c1(X) > c0(X), w is all of X and S is |X| (i.e. N, the length of X), or there are some 0s we can add to w until its value drops to 1.
We conclude
S = min(2 * c1(X) - smin(X), |X|)
Since c1(X) and |X| are known, we only need to find a way to compute smin(X)
We compute smin(X) using DP
Let's encode X as blocks (n1,p1)(n2,p2)... where X consists of n1 0s, then p1 1s, then n2 0s, then p2 1s, and so on.
For example, 0010111101100000 would be encoded as (2,1)(1,4)(1,2) - notice we ignore the trailing 0s; the leading 0s are also irrelevant.
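This encoding is a one-pass run-length scan; a minimal sketch (the function name is mine):

```python
import re

def blocks(x):
    """Run-length encode a binary string into blocks (n_i, p_i):
    n_i 0s followed by p_i 1s.  Trailing 0s match no block, so they drop out;
    leading 0s simply fold into n_1."""
    return [(len(m.group(1)), len(m.group(2)))
            for m in re.finditer(r'(0*)(1+)', x)]
```

For the example above, `blocks("0010111101100000")` returns `[(2, 1), (1, 4), (1, 2)]`.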
We can recursively define smin based on i - the number of blocks used - and v - the minimum required value of the last substring. Using oo to denote infinity, we have:
smin(1, v) = 1, for -oo < v <= p1
smin(1, v) = oo, for p1 < v < oo
smin(i, v) = min(smin(i - 1, 1) + 1, smin(i-1, v - pi + ni)) for -oo < v <= pi
smin(i, v) = smin(i-1, v - pi + ni) for pi < v < oo
When we want to integrate the ith block into our solution, we have 2 options:
* We extend the final substring with ni 0s and pi 1s. If the final substring had value vold, the new value for the final substring will be vnew = vold - ni + pi and therefore vold = vnew + ni - pi. The number of substrings is unchanged.
* We skip the 0s and start a new substring with pi 1s. This is only relevant in cases where the final substring was acceptable - i.e. had a value of at least 1. The value of the new final substring will be pi and there will be one more substring.
I leave it as an exercise to the reader to find reasonable bounds to replace -oo and oo when iterating over v. What I will say is the interval for useful values of v will be O(N) long and combined with an O(N) iteration over i this gives us O(N^2).
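Putting the recurrence together, an O(N^2) implementation might look like this (a sketch; I take the loose bounds [-N, N] for v, which works because the last substring's value always lies in that interval, and clamp lookups at the edges):

```python
import re

INF = float('inf')

def solve_quadratic(x):
    """Compute S = min(2*c1(X) - smin(X), |X|) via the O(N^2) recurrence."""
    c1 = x.count('1')
    if c1 == 0:
        return 0
    # blocks (n_i, p_i): n_i 0s followed by p_i 1s; trailing 0s drop out
    bl = [(len(m.group(1)), len(m.group(2)))
          for m in re.finditer(r'(0*)(1+)', x)]
    N = len(x)
    lo, hi = -N, N                       # loose but safe bounds for v
    def get(tbl, v):                     # smin is constant below lo, oo above hi
        return INF if v > hi else tbl[max(v, lo) - lo]
    n1, p1 = bl[0]
    cur = [1 if v <= p1 else INF for v in range(lo, hi + 1)]
    for n, p in bl[1:]:
        prev = cur
        start_new = get(prev, 1) + 1     # new substring 1^p after an acceptable one
        cur = [min(get(prev, v + n - p), # or: extend last substring with 0^n 1^p
                   start_new if v <= p else INF)
               for v in range(lo, hi + 1)]
    return min(2 * c1 - get(cur, 1), N)
```

The answer we ultimately read off is smin(k, 1), since the final substring must itself be acceptable.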
We calculate smin(i,v) efficiently
First of all, notice that to compute smin(i, v) we only need values of smin(i-1, .), so only O(N) memory is required to keep the tables smin(v) and smin_prev(v). Furthermore, notice that to obtain smin from smin_prev we perform two operations:
1) shifting smin_prev by ni - pi. A naive implementation is O(N), but instead of shifting the elements of the array we can simply shift the array pointer in O(1).
2) replacing the entries at positions v <= pi with smin_prev(1) + 1, whenever they exceed that value. This can be shown to cost an amortized O(N) over the entire computation of smin.
Therefore, the cost of computing S is O(N) shifts at O(1) each, plus an amortized O(N) for the replacements, i.e. O(N) overall.
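To make the idea concrete, here is a sketch of the faster version (names are mine). The table is padded so that shifting is just an offset update; since the table stays non-decreasing in v, the entries to replace form a contiguous segment, which I locate with bisect here - that makes this sketch O(N log N); the amortized replacement scheme described above would bring it down to O(N):

```python
import re
from bisect import bisect_right

def solve_fast(x):
    """S via smin with a shifted-pointer table.  A[v + s + 2N] holds smin(v);
    the padding of size 2N on each side keeps every lookup inside the array."""
    c1 = x.count('1')
    if c1 == 0:
        return 0
    bl = [(len(m.group(1)), len(m.group(2)))
          for m in re.finditer(r'(0*)(1+)', x)]
    N = len(x)
    INF = float('inf')
    n1, p1 = bl[0]
    A = [1] * (p1 + 2 * N + 1) + [INF] * (2 * N - p1)
    s = 0                                # cumulative shift
    for n, p in bl[1:]:
        c = A[1 + s + 2 * N] + 1         # smin_prev(1) + 1: start a new substring
        s += n - p                       # shift: smin(v) = smin_prev(v + n - p)
        j0 = bisect_right(A, c)          # A is non-decreasing: first entry > c
        jp = p + s + 2 * N               # index of v = p after the shift
        if j0 <= jp:                     # replace entries at v <= p exceeding c
            A[j0:jp + 1] = [c] * (jp + 1 - j0)
    smin = A[1 + s + 2 * N]              # smin(k, 1)
    return min(2 * c1 - smin, N)
```

The replacement keeps A non-decreasing: everything left of j0 is at most c, the segment becomes exactly c, and everything right of it was already greater than c.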