BLOCKS
It was a great problem, thanks!
Theorem 1.
If the blocks in the initial and final positions are same, the puzzle is solvable in 3 steps.
Proof
We will first shuffle columns, then rows, then columns again.
Let's match positions in the initial position with positions in the final position in an arbitrary way, such that the color is same. Therefore, we have n^2 quadruples of numbers:
(x1,y1,x2,y2)
Every (x1,y1) appears exactly once and every (x2,y2) appears exactly once.
We want move the block (x1,y1,x2,y2) via some (x1,y3): (x1,y1) -> (x1,y3) -> (x2,y3) -> (x2,y2), so that all (x1,y3)'s are distinct and all (x2,y3)'s are distinct.
In other words, for each y3 we want to choose n quadruples with distinct x1's and distinct x2's.
Define A[i,j] to be the number of quadruples with x1=i and x2=j. The sum of every row and of every column of this matrix is n. For y3=1, we want to choose n positive elements of this matrix, each from a different row and different column. Then we will subtract 1 from those elements and continue. (in other words, subtract a permutation matrix). In the second step, the sum of each row and column will be (n-1).
It remains to be proven that in a matrix with nonnegative integer entries such that each row and each column sums up to the same (positive) number, we can choose one element in each row and in each column.
This is such an interesting property. It took me a long time to prove it and at some points I was convinced that it is not always possible.
It is not enough that the sum of each row and each column is positive (not equal)! For example:
033330
300003
300003
300003
300003
033330
OK - now why the property is true.
It is an immediate consequence of the Hall's Marriage Theorem. We are matching rows to columns. The theorem states that it is possible to match set A to distinct elements of B iff for each subset S of A, the number of elements matched by S is greater than or equal to |S|.
In our case, choose a subset S of rows. Suppose the sum of each row and column is c. Then, the sum of these rows is |S|*c. The matched columns contain all the positive entries out of these rows, so the sum of these columns must be at least |S|*c, hence there must be at least |S| of them!
QED
Now the algorithm.
From the theorem it is easy to check whether the puzzle is solvable in 3 steps. It is also easy to check if it is solvable in 0 or 1 steps. What remains is to check whether it is solvable in 2 steps, if it is solvable at all (the colors are same).
Suppose we are permuting columns first, and then the rows (then we transpose the inputs and check again).
For each color calculate the set of initial and final positions for that color. We want to match the initial positions (x1,y1) to the final positions (x2,y2) so that all the intermediate squares (x1,y2) are distinct.
If there is just one copy of a color, we just match and mark the square (x1,y2) as used.
Now what happens if there are two copies (that was the most possible) of a color? If the initial positions are in the same column or final positions are in the same row, again there is just one possibility really (mark squares (x1,y2) and (x1,y2') as used).
OK, but we are left with some colors for which there are two options - either mark (x1,y2) and (x1',y2') OR (x1,y2') and (x1',y2) as used.
Now this part of my algorithm was pretty dumb
(come on, I was tired) But I don't know if it can be done much better.
For some colors, one (or both) of these options may not be possible, due to the fact that some squares have already been marked. Then we know what to do - choose the other option.
I just do this repeatedly until all the squares in question are unmarked for each of the colors left.
Now guess what I did next.... Hopefully we are left with not so many colors (I guess 0 most of the time), so just brute-force all 2^K possibilities!