
They trained on text and audio and images. The model accepts tokens of all three types. And it can directly output audio as well as text.


It can also directly output images. Some examples are up on the page. Though with how little coverage that's gotten, I'm not sure users will ever be able to play with it.


People are saying that GPT-4o still uses DALL-E for image generation. I think its native image output doesn't match the quality of dedicated image models yet, which is understandable. I bet it can't generate music as well as Suno or Udio either. But the direction is clear, and I'm sure someday it will generate great images, music, and video. You'll be able to do a video call with it where it generates its own avatar in real time. And they'll add more outputs for keyboard/mouse/touchscreen control, and eventually robot control. GPT-7o is going to be absolutely wild.



