I couldn't agree more. I feel that the block-coding and rasterized approaches that are ubiquitous in audio codecs (even the modern "neural" ones) are a dead end for the fine-grained control that musicians will want. They're just fine for text-to-music interfaces, of course.
I'm working on a sparse audio codec that's mostly focused on "natural" sounds at the moment, and uses some (very roughly) physics-based assumptions to promote a sparse representation.
https://blog.cochlea.xyz/sparse-interpretable-audio-codec-pa...