Not sure what you mean, but a transformer model can only predict one token at a time. The final output layer needs as many nodes (or neurons, if you will) as there are distinct tokens in the vocabulary. So a large token vocabulary is expensive, which is why GPT-3 uses only about 50,000 tokens (and LLaMA about 32,000) and relies on BPE to find a set of useful tokens. They can still express every possible English text because the token vocabulary contains the whole Latin alphabet.
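As a quick sketch (assuming you have the tiktoken library installed; its "gpt2" encoding is the roughly 50k-token byte-level BPE vocabulary that GPT-2 uses, and GPT-3's is essentially the same):

    import tiktoken

    # Load the ~50k byte-level BPE vocabulary used by GPT-2.
    enc = tiktoken.get_encoding("gpt2")

    print(enc.n_vocab)                # ~50257 distinct tokens -> size of the output layer
    print(enc.encode("Hello world"))  # common English words map to single tokens
    print(enc.encode("zqxjkv"))       # rare strings fall back to smaller pieces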
Unicode 15 has nearly 150,000 characters, and the CJK languages collectively have even more distinct characters than that, because Han unification maps many variants to a single code point.
A model like GPT-3 can only output a very primitive version of Chinese. My question is how models actually built for Chinese deal with this, and specifically how tokenization works in that case.
Yes, but the tokens map to byte sequences, not characters. There are only 256 distinct byte values, so a GPT model can easily be trained to produce any character as its UTF-8 bytes. The real question is how sensible or learnable the UTF-8 byte form of Chinese characters is, but that is a problem for the model, not the tokenizer.
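To illustrate (plain Python, no tokenizer involved): a Chinese character is just a short sequence of UTF-8 bytes, and any byte-level vocabulary can represent it.

    # Each CJK character encodes to 3 bytes in UTF-8.
    text = "中文"
    raw = text.encode("utf-8")
    print(list(raw))            # [228, 184, 173, 230, 150, 135]
    print(raw.decode("utf-8"))  # round-trips back to '中文'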
I'm pretty sure the token vocabulary also contains a token for each of the 256 possible byte values, precisely to cover such cases.
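You can check this with tiktoken (again assuming the "gpt2" encoding; the exact token IDs it prints don't matter):

    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    tokens = enc.encode("中文")
    print(tokens)  # a handful of token IDs, some covering single UTF-8 bytes

    # Each token decodes to a byte sequence; together they rebuild the text.
    print([enc.decode_single_token_bytes(t) for t in tokens])
    print(enc.decode(tokens))  # '中文'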