I think they will, though. I think the enormous corpus of video data and the supercluster that powers self-driving development are the machine vision analog of the internet-scale text data that gave rise to LLMs. Once the data is there, we'll see the same moment for vision models that text prediction models had, where an enormous foundation model becomes much, much better, especially at zero-shot tasks.
I would guess the plan is to build the foundational machine vision tech that becomes the core of robotics sensing. Not just Optimus but every robot arm in a factory, robot mule, etc. I don't think everything will have LIDAR if it's proven to be unnecessary.