Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I love flattened ASTs. One of my favorite uses of them is for inline markup in pulldown-cmark (see [1] for a brief description).

Here's a sketch of the problem being solved. The raw input is broken into a sequence of nodes, and something like * is turned into a "MaybeEmphasis" node, as it has the potential to turn into emphasis, or remain text if no match for it is found.

Another pass goes through those nodes in sequence, using a stack to find potential matches (the rules for whether a pair of such nodes actually match are quite fiddly and complicated). When a match is found, the MaybeEmphasis node is turned into the appropriate emphasis node, and the entire sequence of nodes between open and close is snipped out and made into a subtree of the new node. This is a somewhat unusual tree transformation, and a straightforward implementation could easily be O(n). But with the flattened AST representation, it can all be done O(1), no matter the number of nodes or stack depth.

For those interested in details, the tree representation is in [2] - there's basically a "child" and "next" index along with the node body - and the tree surgery on emphasis match is in [3].

The performance is excellent. I think pulldown-cmark may not be the single fastest CommonMark parser out there, but it's certainly competitive, and a lot faster than approaches that do, for example, an allocation per node.

[1]: https://fullyfaithful.eu/pulldown-cmark/

[2]: https://github.com/raphlinus/pulldown-cmark/blob/b7e709c0bd6...

[3]: https://github.com/raphlinus/pulldown-cmark/blob/b7e709c0bd6...



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: