Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It is interesting to see the use of the A* path finding algorithm for finding optimal matchings of nodes.

The approach from Chawathe et. al splits nodes from the before/after trees into chains by their label in the syntax grammar, and then runs myers’ longest common subsequence on each pair of chains. Some parameters t, f are used to have an approximate ‘equals’ method for subtrees.

This iteratively builds a set of matchings between equivalent nodes from the old and new trees. Here’s the paper https://dl.acm.org/doi/10.1145/235968.233366

I’d be curious to see if this approach handles re-ordering of nodes better. The ‘fastMatch’ algorithm described above will typically miss matching cases where a node that is not order sensitive (i.e a function in a namespace can be moved somewhere else in that namespace).



I've looked at several tree diffing papers, including Chawathe et al, but it's not always easy to assess whether their approach would suit difftastic.

I know that sdiff by Arun Isaac is based on applying that paper to Scheme: https://archive.fosdem.org/2021/schedule/event/sexpressiondi... , although he also reports performance challenges.

Most tree diffing papers that I've seen focus on either (1) providing a minimal diff and accepting the performance cost or (2) providing a relatively minimal diff and focusing on the performance.

I've generally found that you need a minimal diff to get a good result, so papers in (2) are less applicable. I've also found several cases where there are several possible minimal diffs, but there's a clear 'correct' answer from the user's perspective.

Difftastic doesn't handle moves: the edit set is add, remove, or replace similar comment. If you reorder functions, it will take the largest unchanged subset. Moves are hard to model in a diffing algorithm, but they're also very hard to display coherently in the UI.

I know a few code forge websites (e.g. Phabricator) show moves in a fairly comprehensible way, although they're all based on line-based diffs.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: