Just for the fun of it, I've been reading a bit about the STM32H7 since you mentioned it. One half-baked idea for getting down to a single interrupt per scan line would be to use multiple DMA channels to drive the same SPI peripheral: one for the display SS and the other for the touch SS. Have a scanline ISR start both transfers at the same time and rely on the well-defined DMA channel priorities to deconflict them. It seems like this could possibly work, given that the high-level DMA documentation says it can control SS.
What I don't know at the moment is how exactly the DMA channel actually controls SS. Does the channel itself have knowledge of the signal, or does it just poke the SPI peripheral at the right time? Said another way, are SS pins assigned to DMA channels, or are they assigned to the SPI peripheral? There's also clock frequency and data mode that would need to be associated with the DMA channel. My guess is it's the latter and my idea wouldn't work.
Both options seem like they would have been reasonable from a design perspective. Being able to associate a specific slave device with a DMA channel and letting the hardware arbitrate the bus would certainly be a beneficial feature, but the implementation would be more complicated. Letting the SPI peripheral do all the configuration and state management, and simply enabling DMA to shovel bytes in and out of the FIFO, seems the simplest. My guess is that it's the absolute simplest implementation, and the SPI peripheral just has a setting to automatically de-assert SS when its TX FIFO and shift register are empty.
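For what it's worth, if that guess is right, the whole configuration would live in the SPI peripheral rather than in the DMA channel. A rough sketch of what that looks like with the STM32 HAL, using hardware-managed NSS; `hspi1`, `scanline_buf`, and `SCANLINE_BYTES` are placeholder names, and I haven't verified this against the H7 reference manual:

```c
/* Sketch only: the SPI peripheral owns NSS and DMA just moves bytes.
 * hspi1, scanline_buf, and SCANLINE_BYTES are placeholder names. */
SPI_HandleTypeDef hspi1;

void spi_display_init(void) {
    hspi1.Instance = SPI1;
    hspi1.Init.Mode = SPI_MODE_MASTER;
    hspi1.Init.NSS = SPI_NSS_HARD_OUTPUT;        /* peripheral drives NSS */
    hspi1.Init.NSSPMode = SPI_NSS_PULSE_DISABLE; /* hold NSS low across the frame */
    hspi1.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_8;
    HAL_SPI_Init(&hspi1);
}

void start_scanline(uint8_t *scanline_buf, uint16_t len) {
    /* DMA shovels bytes into the FIFO; NSS timing stays with the SPI block. */
    HAL_SPI_Transmit_DMA(&hspi1, scanline_buf, len);
}
```

Note the clock prescaler, mode, and NSS behavior are all per-SPI-peripheral settings here, not per-DMA-channel, which is what the guess predicts.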
Practically, though, I don't think it's too crazy to drive this particular display and touch screen with per-scanline interrupts. Assuming 30 Hz and 240 lines, that's a baseline of 7.2 kHz IRQs. If you want 400 Hz touch samples, that's only 400 Hz more IRQs. Go crazy and sample the touch screen at 2.8 kHz as long as there's enough bus bandwidth. While it feels a bit inelegant for the scanlines not to be harmonic with the touch updates, the resulting jitter is at a sufficiently short timescale that it doesn't matter. Also, for this class of hardware, you're devoting a significant chunk of RAM to any sort of framebuffer. Budgeting a modest amount of CPU to push the framebuffer to the display seems completely reasonable.
As you mentioned, it seems like the original intent of the designers was for folks to treat the display hardware as an SPI-accessible frame buffer. They were probably targeting the 8-bit AVRs popularly used in 3D printers, and update rate was probably the last of their requirements. They probably didn't even consider someone trying to drive the display with DMA. The fact that you're able to drive 50 fps is quite remarkable, on top of your achievement of fully offloading it on the RP2040.
Sadly, SS is associated with an SPI peripheral, and only one peripheral will drive a given set of pins, so as far as I can tell this won't work without software manually controlling the nCS lines.
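The manual-nCS workaround is at least cheap to sketch: toggle a GPIO per device and chain the touch transfer off the display transfer's completion callback. Something along these lines with the STM32 HAL; all the names (`hspi1`, `DISP_CS_*`, `TOUCH_CS_*`, the buffers) are hypothetical placeholders:

```c
/* Sketch: software-controlled chip selects around back-to-back DMA transfers.
 * All names (hspi1, DISP_CS_*, TOUCH_CS_*, buffers) are placeholders. */
void scanline_isr(void) {
    HAL_GPIO_WritePin(DISP_CS_GPIO_Port, DISP_CS_Pin, GPIO_PIN_RESET); /* assert display CS */
    HAL_SPI_Transmit_DMA(&hspi1, scanline_buf, SCANLINE_BYTES);
}

void HAL_SPI_TxCpltCallback(SPI_HandleTypeDef *hspi) {
    HAL_GPIO_WritePin(DISP_CS_GPIO_Port, DISP_CS_Pin, GPIO_PIN_SET);     /* release display */
    HAL_GPIO_WritePin(TOUCH_CS_GPIO_Port, TOUCH_CS_Pin, GPIO_PIN_RESET); /* assert touch CS */
    HAL_SPI_TransmitReceive_DMA(&hspi1, touch_cmd, touch_rsp, TOUCH_BYTES);
}
```

Of course this reintroduces a second interrupt per scan line (the transfer-complete callback), which is exactly what the two-channel idea was trying to avoid.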
> 7.2 kHz IRQs
The M7 takes at least 14 cycles to enter an IRQ handler and 12 to exit. Your compiler will push at least 2 registers (per ABI alignment requirements) and pop 2 at the end; that's 6 more cycles, before you've done anything, and that's assuming zero-wait-state flash (unlikely). Add in, say, 100 cycles for your IRQ handler's actual code, and suddenly 1 MIPS is gone. On top of that, your main code can see latency spikes of over 132 cycles, which may matter (I have an M7 project where random latency over 6 cycles breaks the required timings, for example).
That's my point, though. 1 MIPS is less than 1% of the available CPU cycles for a typical M7! :-) As long as you can tolerate the jitter caused by interrupts at all, it seems like it should be fine.
6 cycles is pretty darn tight! I'm curious what you're controlling.