They trained on text *and* audio *and* images. The model accepts tokens of all t...

famouswaffles · on May 13, 2024

It can also directly output images. Some examples are up on the page. Though with how little coverage that's gotten, I'm not sure users will ever get to play with it.

modeless · on May 13, 2024

People are saying that GPT-4o still uses DALL-E for image generation. I think its own image output doesn't match the quality of dedicated image models yet, which is understandable. I bet it can't generate music as well as Suno or Udio either. But the direction is clear, and I'm sure someday it will generate great images, music, and video. You'll be able to do a video call with it where it generates its own avatar in real time. And they'll add more outputs for keyboard/mouse/touchscreen control, and eventually robot control. GPT-7o is going to be absolutely wild.