One of the authors here. Short story: check out our web demo (on desktop Chrome) for real-time music generation and editing! Music is made on the fly by our diffusion models as you control the transition between two random genres. Every time you make a change, the model generates 30s of 44kHz stereo in 0.3s on a GPU. Genre transitions are one of the most fun use cases we've found.
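If you're curious how a transition control like that could work mechanically, it maps naturally onto prompt-embedding interpolation: the slider blends the conditioning vectors of two genre prompts before sampling. Here's a minimal illustrative sketch in Python; embed_prompt and sample_diffusion are hypothetical stand-ins, not our actual API:

    import numpy as np

    def embed_prompt(text: str) -> np.ndarray:
        # Hypothetical text encoder stub; a real system would use a learned embedder.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(512)

    def sample_diffusion(cond: np.ndarray, seconds: float = 30.0) -> np.ndarray:
        # Hypothetical sampler stub; a real model would return generated stereo audio.
        return np.zeros((2, int(44_100 * seconds)))

    def genre_transition(genre_a: str, genre_b: str, t: float) -> np.ndarray:
        # t in [0, 1]: 0 is pure genre_a, 1 is pure genre_b.
        a, b = embed_prompt(genre_a), embed_prompt(genre_b)
        cond = (1.0 - t) * a + t * b  # linear blend of the two conditionings
        return sample_diffusion(cond)

    audio = genre_transition("lo-fi hip hop", "baroque harpsichord", t=0.4)

The real system involves a lot more than a linear blend, but that's the shape of the control surface.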
Longer story: the web demo is our way of earning your curiosity. The Riffusion crew released our free iOS app today, and that's the real juice. We're thinking constantly about how to create a new instrument with generative music, and for us this is a wildly fun experience.
At the core, we've been training music foundation models that are high quality, diverse, controllable, and an order of magnitude faster than anything we've seen before. This required deep investment across the whole stack, from compact audio representations to model architectures to deployment infrastructure. The result is fast enough to enable totally new music-making workflows.
The primary flow is to capture or upload a photo or video, optionally add some personalization, and the app generates original lyrics and music from that context. The next time you see your dog or kid do something funny, you can turn that into a vibe in seconds. We've been having a blast creating music in moments we never would have before.
The most remarkable part is the editing, a taste of which is in the web demo. In the edit view you can go deep, adding multiple prompts on a timeline and controlling their strengths and interactions. You can mix genres into something totally new, or make transitions like going from an operatic aria to death metal. The near-real-time editing flow is something that feels totally new to me, like trying GarageBand when it first came out.
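To make the timeline idea concrete, here's one illustrative way to think about it (again a hypothetical sketch, not our production code): each prompt active at a given moment contributes its embedding weighted by its strength, and the normalized blend becomes the conditioning at that point in the clip.

    from dataclasses import dataclass
    import numpy as np

    def embed_prompt(text: str) -> np.ndarray:
        # Hypothetical text encoder stub, as in the earlier sketch.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(512)

    @dataclass
    class TimelinePrompt:
        text: str
        start: float     # seconds
        end: float       # seconds
        strength: float  # user-controlled weight

    def conditioning_at(prompts: list[TimelinePrompt], t: float) -> np.ndarray:
        # Strength-weighted average of all prompts active at time t.
        active = [p for p in prompts if p.start <= t < p.end]
        if not active:
            return np.zeros(512)
        total = sum(p.strength for p in active)
        cond = np.zeros(512)
        for p in active:
            cond += (p.strength / total) * embed_prompt(p.text)
        return cond

    timeline = [
        TimelinePrompt("operatic aria", start=0.0, end=18.0, strength=1.0),
        TimelinePrompt("death metal", start=12.0, end=30.0, strength=0.8),
    ]
    cond = conditioning_at(timeline, t=15.0)  # overlap region: a weighted mix

Where prompts overlap, you get a mix weighted by their strengths, which is what makes transitions like the aria-to-death-metal example feel continuous rather than spliced.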
There’s a ton more we’re excited to launch in the coming weeks as we work to create more musicians in the world. Excited to hear what you all create!