
This is sort of a distinction without a difference. It's an autoregressive sequence model either way; the real difference is how you encode data into (and decode it back out of) a sequence of tokens.

LLMs are autoregressive sequence models where the role of the graph convolutional encoder here is filled by a BPE tokenizer (also a learned model, just a much simpler one than the encoder used here). That this works suggests you can probably port the idea to other domains by designing similarly clever codecs that map their feature spaces into discrete token sequences.
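
To make that concrete, here's a toy sketch of what such a codec could look like: a hand-rolled uniform-binning quantizer that maps continuous feature vectors to discrete token ids and approximately back. This is entirely made up for illustration (not the encoder from the paper, and a real system would learn the mapping, e.g. a VQ codebook or BPE-style merges, rather than fixing it by hand):

  import numpy as np

  # Hypothetical toy codec: per-dimension uniform binning of continuous
  # features into discrete token ids, with approximate decoding back to
  # bin centers. Purely illustrative, not anything from the paper.
  class UniformBinCodec:
      def __init__(self, low: float, high: float, n_bins: int = 256):
          self.low, self.high, self.n_bins = low, high, n_bins
          edges = np.linspace(low, high, n_bins + 1)
          # Bin centers used when decoding tokens back to values.
          self.centers = (edges[:-1] + edges[1:]) / 2

      def encode(self, x: np.ndarray) -> np.ndarray:
          """Map each feature value to a token id in [0, n_bins)."""
          scaled = (x - self.low) / (self.high - self.low)
          return np.clip((scaled * self.n_bins).astype(int), 0, self.n_bins - 1)

      def decode(self, tokens: np.ndarray) -> np.ndarray:
          """Map token ids back to approximate feature values."""
          return self.centers[tokens]

  codec = UniformBinCodec(low=-1.0, high=1.0, n_bins=64)
  features = np.array([-0.93, 0.02, 0.41, 0.88])
  tokens = codec.encode(features)    # e.g. array([ 2, 32, 45, 60])
  recovered = codec.decode(tokens)   # approximately the original features
  print(tokens, recovered)

Once the features are token ids, the downstream model is just a plain autoregressive sequence model; all the domain knowledge lives in the codec.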

(Everything is feature engineering if you squint hard enough.)


