Word embeddings are also "prior" to an LLM's facility with any given natural language. Tokens are not the most basic representational substrate in LLMs; rather, it is the embeddings that capture sub-word information and that the model actually computes over. LLMs are a lot more interesting than people give them credit for.
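To make the point concrete, here is a minimal sketch (assuming a PyTorch-style embedding layer; the vocabulary size, model dimension, and token IDs are illustrative, not taken from any particular model): the very first thing an LLM does is swap discrete token IDs for continuous vectors, and everything downstream operates on those vectors, not on the tokens themselves.

```python
import torch
import torch.nn as nn

# Illustrative sizes; real models use much larger values.
vocab_size, d_model = 50_000, 768

# The embedding table: one learned vector per token ID.
embedding = nn.Embedding(vocab_size, d_model)

# A tokenizer would produce integer IDs like these for a piece of text
# (the IDs here are arbitrary, for illustration only).
token_ids = torch.tensor([[101, 2054, 2003, 42]])

# The lookup replaces discrete IDs with continuous vectors; attention and
# MLP layers only ever see these vectors, never the raw token IDs.
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 4, 768])
```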