The paper (https://static.usenix.org/publications/compsystems/1993/win_... linked from the page) is well written and worth reading. It works through a series of C programs that build up to this sort algorithm, giving a good rationale for each iteration.