runeblaze 59 days ago \| on: DeepSeek OCR

Each text token is often a subword unit, but in VLMs the visual tokens live in a semantic space. A vector in semantic space can carry far more information than a single subword slice, so it compresses much more. Disclaimer: not an expert, this is off the top of my head.
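A rough back-of-the-envelope sketch of the capacity argument, with made-up numbers: the vocabulary size, embedding width, and float precision below are hypothetical placeholders, not figures from DeepSeek OCR or any specific model. It only compares raw storage ceilings, so treat it as an upper bound on what a continuous token could hold, not a measurement of what it actually encodes.

```python
import math

# All three constants are illustrative assumptions, not values from any real model.
VOCAB_SIZE = 50_000     # hypothetical subword vocabulary size
EMBED_DIM = 1024        # hypothetical width of one visual token embedding
BITS_PER_VALUE = 16     # hypothetical precision per dimension (e.g. 16-bit floats)


def bits_per_discrete_token(vocab_size: int) -> float:
    """Max information in one token picked from a finite vocabulary."""
    return math.log2(vocab_size)


def bits_per_continuous_token(dim: int, bits_per_value: int) -> int:
    """Raw storage ceiling of one continuous embedding vector."""
    return dim * bits_per_value


text_bits = bits_per_discrete_token(VOCAB_SIZE)
vision_bits = bits_per_continuous_token(EMBED_DIM, BITS_PER_VALUE)
ratio = vision_bits / text_bits

print(f"discrete subword token: at most {text_bits:.1f} bits")
print(f"continuous visual token: at most {vision_bits} bits of raw storage")
print(f"ceiling ratio: about {ratio:.0f}x")
```

The real usable capacity of a visual token is far below that ceiling, but the gap in headroom is the intuition behind the claim: a discrete ID is capped at log2 of the vocabulary size, while a continuous vector is not.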