Hacker News | psyq123's comments

It is not really 1D: to perform any time/frequency transform (FFT, (M)DCT, etc.) you need a block of samples from the time domain, so you are essentially transforming one 2D representation (intensity over time) into another 2D representation (magnitude, or magnitude plus phase, over frequency). This is why MP3-style codecs usually offer multiple frame (or "window") lengths: typically a longer one for high frequency resolution and a shorter one for high temporal resolution.
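To make the point concrete, here is a minimal sketch (not from any particular codec) of how a 1D signal becomes a 2D time/frequency representation: slice it into overlapping windowed frames and FFT each frame. The frame and hop sizes are illustrative assumptions.

```python
import numpy as np

def stft_magnitude(x, frame_len=1024, hop=512):
    """Split x into overlapping Hann-windowed frames and FFT each one.

    Returns a 2D array of shape (num_frames, frame_len // 2 + 1):
    one axis is time (frame index), the other is frequency (bin index).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Real FFT per frame: 1D samples in, 2D magnitudes out.
    return np.abs(np.fft.rfft(frames, axis=1))
```

A longer `frame_len` gives finer frequency bins but smears transients in time; a shorter one does the opposite, which is the trade-off the multiple window lengths address.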


That’s exactly what I mean. Break the 1D audio up into a 2D representation over time and frequency, train the model in that space with added diffusion noise, and have it generate denoised output in the same space.

