If you compute the similarity between pairs of columns and try to find an ordering that maximises the similarity between adjacent columns (equivalently, minimises the dissimilarity), you've basically got a version of the travelling salesman problem.
That's NP-hard in general, but this instance can be made significantly easier to solve than a random one (in wall-clock time, anyway).
You could pick a similarity metric such that adjacent columns have some bound on worst-case similarity - this allows you to reject many adjacencies. This could be chosen empirically by analyzing other known images. You could additionally model the probability distributions of similarity for adjacent columns, thus allowing you to reject further adjacencies and sets of adjacencies.
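A minimal sketch of that rejection idea, assuming the image is a NumPy array of shape (height, width, 3) and using a hypothetical, empirically chosen threshold:

```python
import numpy as np

def dissimilarity_matrix(img, threshold=None):
    """Pairwise dissimilarity between columns of an RGB image.

    img: array of shape (height, width, 3).
    threshold: hypothetical empirically chosen cutoff; pairs whose
    dissimilarity exceeds it are marked infeasible (set to infinity).
    """
    h, w, _ = img.shape
    cols = img.astype(np.int64)
    d = np.zeros((w, w))
    for i in range(w):
        for j in range(w):
            # Sum of absolute per-pixel, per-channel differences
            d[i, j] = np.abs(cols[:, i] - cols[:, j]).sum()
    if threshold is not None:
        d[d > threshold] = np.inf  # reject adjacencies that are too dissimilar
    return d
```

The rejected entries then shrink the search space before any annealing or tour construction starts.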
By using all this structure you should be able to do much better than this simulated annealing - which I have yet to see complete successfully on my machine, at any rate.
Is the author here? What energy metric is this using? Total variation?
I like your idea of rejecting pairs of columns that are too dissimilar. You are correct that my simulated annealing algorithm does not completely unscramble the image, and I believe this is because of the horizontal symmetry ambiguity.
The energy metric being used is simply the sum of absolute differences across all 3 channels for all horizontal adjacent pairs of pixels.
In pseudo-Python notation: sum(abs(pix[x,y,ch] - pix[x+1,y,ch]) for x in range(width-1) for y in range(height) for ch in range(3))
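For reference, an equivalent vectorized version of that metric, assuming pix is a NumPy array of shape (height, width, 3) (so the axes are swapped relative to the pix[x,y,ch] indexing above):

```python
import numpy as np

def energy(pix):
    """Sum of absolute differences between horizontally adjacent pixels,
    over all three channels. Equivalent to the triple loop above, but
    computed with array slicing instead of explicit loops."""
    left = pix[:, :-1].astype(np.int64)
    right = pix[:, 1:].astype(np.int64)
    return int(np.abs(right - left).sum())
```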
Cool - that's TV on RGB. There may be improvements on the channel selection in the literature in terms of "selectivity" for real images - e.g. HSV or something else. That might be an easy change that would bump your performance.
Also, you could apply some nonlinear function to the calculated TV to reflect your rejection penalties - e.g. if it's greater than some threshold x, make the energy arbitrarily high. The thing is, you would then want to reject the whole family of solutions containing that adjacency in your annealing, which doesn't seem super straightforward to me.
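The per-adjacency penalty itself could be as simple as this sketch (cutoff being the hypothetical, empirically chosen threshold):

```python
def penalized_energy(tv, cutoff):
    """Identity below the cutoff; effectively inadmissible above it.
    Applied per adjacent column pair before summing into the total energy."""
    return tv if tv <= cutoff else float("inf")
```

The hard part, as noted above, is making the annealer avoid the whole family of states sharing a rejected adjacency, rather than just pricing one state out.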
The other thing I see at the end is that it hits a local optimum consisting of multiple subsets of correct adjacencies (up to reflection), but there's no way to permute those subsets to get a better solution, as the temperature is too low. You'd almost want to "group" those subsets and then rerun on the 8 or so segments, searching just over permutation and reflection. On the other hand, it does suggest that a "greedy" approach gets you pretty close - finding a locally optimal solution solves part of the global optimization. Then you can reduce the final problem to an exhaustive search of a much smaller space.
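That final exhaustive search over a handful of grouped segments could be sketched like this (segments and seam_cost are assumed inputs: each segment is a list of column indices already known to be correctly adjacent, and seam_cost gives the cost of placing two columns side by side):

```python
from itertools import permutations, product

def best_arrangement(segments, seam_cost):
    """Brute-force search over orderings and reflections of pre-grouped
    segments. Only feasible for a handful of segments, since there are
    n! * 2^n candidate arrangements."""
    best, best_cost = None, float("inf")
    for order in permutations(segments):
        for flips in product((False, True), repeat=len(order)):
            seq = []
            for seg, flip in zip(order, flips):
                seq.extend(reversed(seg) if flip else seg)
            # Total cost of the seams between consecutive columns
            cost = sum(seam_cost(a, b) for a, b in zip(seq, seq[1:]))
            if cost < best_cost:
                best, best_cost = seq, cost
    return best, best_cost
```

For 8 segments that's 8! * 2^8 ≈ 10 million arrangements, which is still cheap compared to annealing over raw columns.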
> I believe this is because of the horizontal symmetry ambiguity.
One quick test for this would be to make the strips 2 pixels wide.
It's interesting to watch the random walk that it does, and I think that problem would remain. Once it has made some wider strips that should be adjacent but aren't, the strips tend to randomly move between them, but there is no reason to prefer progress in one direction or the other.
I'd be interested to see what happens if you only allowed moves in one direction, e.g. only allowed a column to be inserted to the right of where it was.
Although the travelling salesman problem is NP-hard, and it’s possible to construct hard-to-solve instances, there are algorithms that work very well for most instances.
I just tried using LKH[0] to reconstruct these scrambled images, and it does it perfectly in a fraction of a second.
Interesting experiment! It took me a minute to understand why you mentioned the TSP, but then I realized that you can model each column as a destination and the pixel difference between two columns as the distance. Once I got that, I can certainly appreciate that a state-of-the-art TSP solver will solve the scrambled images perfectly in no time. =)
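To illustrate just the modelling (not the solver - LKH does far more than this), a crude nearest-neighbour tour over that column-distance matrix might look like:

```python
import numpy as np

def greedy_tour(d, start=0):
    """Nearest-neighbour tour through the column 'cities', given a
    pairwise dissimilarity matrix d. A stand-in to show the TSP framing;
    a real solver like LKH would refine this far beyond greedy quality."""
    n = d.shape[0]
    tour = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = tour[-1]
        nxt = min(remaining, key=lambda j: d[last, j])  # cheapest seam next
        tour.append(nxt)
        remaining.add  # no-op removed below
        remaining.remove(nxt)
    return tour
```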
Interesting, although I guess I should have expected a TSP solver to do well. FWIW simulated annealing can also be used to solve TSP instances. Although clearly it doesn't do too well in this case.
Anyway, I only mentioned it to point out it isn't a simple matter of calculating the similarities between columns and 'sorting' them; you need a more sophisticated algorithm.
Interesting to see how badly simulated annealing does by not being able to step over so many hills.
LKH is a bit unfair, as this problem cries out for a travelling salesman formulation, but SA doing so badly is still astonishing, given how often it is used for optimization problems. Maybe dedicated solvers should be used more often.