It's neat that the author was able to completely offload the work to the PIO. I'...

dmitrygr · on April 10, 2023

The fact that touch and display are on the same SPI bus makes it impossible to just set up a repeating DMA to the display. On a non-RP2040 it wouldn’t be possible to drive it easily.

Most DMA units wouldn’t allow you to feed a scan line at a time and then poll touch. They aren’t that flexible. You’d be stuck with a lot of IRQs.

sgtnoodle · on April 10, 2023

The last time I worked with DMA was on an Atmel Cortex-M7. Definitely a higher end MCU, though. It supported arbitrarily complex DMA chaining via linked lists in RAM. That being said, doing ~10Khz or so IRQs just to kick off a DMA transfer wouldn't have been much CPU in the grand scheme of things.

For the touch input, is it necessary to truly sample at a high rate continuously, or is it more a need to oversample and take an average? If it's a need for oversampling, you could sample the touch screen between frames rather than interleaved between scanlines. Likewise, you could read from the SD card between frames as well. Writing to an SD card would be pretty terrible, given that cards often block for hundreds of milliseconds.

Once again, awesome work with the PIO. I imagine you're running the display at its theoretical update limit unless you try overclocking?

dmitrygr · on April 10, 2023

Between frames would be too rare for touch. You really do not want touch at just 45Hz. The extra samples are there to be averaged for smoothing. The touch controller here is noisy and the board layout Waveshare did … does not follow the data sheet recommendations for lowering noise.

SD between refreshes is a poor idea for the reason you mention, yes.

And yes, the screen interface is run as fast as it can go per the data sheet. Overclocking the MCU won't help as we are not cycle bound there. Overclocking the SPI bus will work but not far. Past 70MHz it gets glitchy.

sgtnoodle · on April 10, 2023

I specifically meant rapidly sampling the touch screen N times between frames rather than 1 time between frames. Depending on the characteristics of the EMI, it seems like that could work just as well.

dmitrygr · on April 10, 2023

But in terms of temporal resolution you still have 45Hz which is too low for good inking.
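To put rough numbers on the inking point, here is a minimal Python sketch. The 450 px/s pen speed and the 400Hz comparison rate are assumptions picked for illustration, and `average_burst` / `point_spacing_px` are hypothetical helper names, not anything from the actual firmware. It shows that averaging a burst of reads between frames smooths the noise but still yields one point per frame, so the spacing between inked points is set by the frame rate.

```python
from statistics import mean

FRAME_HZ = 45            # display refresh rate mentioned in the thread
PEN_SPEED_PX_S = 450.0   # ASSUMPTION: illustrative pen speed, not measured


def average_burst(samples):
    """Collapse one burst of raw (x, y) touch reads into a single smoothed point."""
    xs, ys = zip(*samples)
    return (mean(xs), mean(ys))


def point_spacing_px(pen_speed_px_s, report_rate_hz):
    """Distance the pen travels between two consecutive reported points."""
    return pen_speed_px_s / report_rate_hz


# Four noisy reads taken back to back between two frames (made-up values).
burst = [(100, 50), (102, 49), (98, 51), (100, 50)]
print("smoothed point:", average_burst(burst))

# One averaged point per frame vs. an assumed 400Hz interleaved report rate.
print("spacing at 45Hz:", point_spacing_px(PEN_SPEED_PX_S, FRAME_HZ), "px")
print("spacing at 400Hz:", point_spacing_px(PEN_SPEED_PX_S, 400), "px")
```

At that assumed pen speed the points land 10 px apart at 45Hz and roughly 1.1 px apart at 400Hz, no matter how many raw reads go into each average.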

crest · on April 10, 2023

You could use the DMA engine to reconfigure the function routing using a pair of DMA channels.

dmitrygr · on April 10, 2023

Sadly in many chips this is not possible. E.g. the STM32H7 has 3 types of DMA units and each has a complex table of where they can read and write from. Two kinds of DMA cannot write to anywhere that contains a DMA controller.

The RP2040 is rather unique in how powerful its DMA units are due to their reach. But it also has faults. Its DMA units seemingly cannot access GPIO, since that is bolted directly to the core (via the single cycle IO port). The only way to affect pins from DMA is via PIO as far as I can tell.

sgtnoodle · on April 10, 2023

Reminds me of a memorable Ethernet peripheral. The peripheral would use the MCU's RAM for its own buffering. There were frame headers that both the peripheral and CPU would need to manipulate. Being an M7 the CPU had a data cache, though, and so the CPU couldn't safely manipulate the headers without first completely disabling the cache.

I ended up using a DMA channel to indirectly manipulate the headers, bypassing the data cache. It was pretty silly but worked, and was way more efficient than disabling the data cache.

dmitrygr · on April 10, 2023

On an M7, you can use the MPU to assign a non-cacheable attribute to a memory area. As small as 32 bytes and as large as 4GB. Any size representable as 2^n - k * 2^(n - 3) for k = 0..7 and n >= 8.
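For anyone who wants to see what that formula produces, a small Python sketch follows. It only evaluates the expression above; `effective_region_sizes` is a hypothetical helper name, and reading k as "number of the region's 8 equal subregions that are disabled" is my interpretation of the formula rather than something spelled out in the comment.

```python
def effective_region_sizes(n):
    """Sizes 2**n - k * 2**(n - 3) for k = 0..7, i.e. a 2**n-byte region
    with k of its 8 equal subregions disabled (interpretation, see above)."""
    if n < 8:
        raise ValueError("the formula is only stated for n >= 8")
    subregion = 1 << (n - 3)          # one eighth of the region
    return [(1 << n) - k * subregion for k in range(8)]


# Smallest case covered by the formula: a 256-byte region.
print(effective_region_sizes(8))
```

For n = 8 this gives every multiple of 32 bytes from 256 down to 32, which is where the 32-byte lower bound comes from.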

sgtnoodle · on April 10, 2023

The particular driver was being retrofitted into a large enough variety of existing firmwares that we really wanted to avoid introducing any sort of MPU or link time configuration. The DMA channel approach was the least impactful path.

sgtnoodle · on April 10, 2023

Just for the fun of it, I've been reading a bit about the STM32H7 since you mentioned it. One half-baked idea I have to achieve only one interrupt per scan line would be to use multiple DMA channels to drive the same SPI peripheral, one for the display SS and the other for the touch SS. Have a scanline ISR start both transfers at the same time, and rely on the well defined DMA channel priorities to deconflict them. It seems like this could possibly work given that the high level DMA documentation says it can control SS.

What I don't know at the moment is how exactly the DMA channel actually controls SS. Does the channel itself have knowledge of the signal, or does it just poke the SPI peripheral at the right time? Said another way, are SS pins assigned to DMA channels, or are they assigned to the SPI peripheral? There's also clock frequency and data mode that would need to be associated with the DMA channel. My guess is it's the latter and my idea wouldn't work.

Both options seem like they would have been reasonable from a design perspective. Being able to associate a specific slave device with a DMA channel and letting the hardware arbitrate the bus would certainly be a beneficial feature. That implementation would be more complicated, though; letting the SPI peripheral do all the configuration and state management, and simply enabling DMA to shovel bytes in and out of the FIFO, seems the simplest. My guess is that it's the absolute simplest implementation, and the SPI peripheral just has a setting to automatically de-assert SS when its TX FIFO and shift register are empty.

Once again practically, though, I don't think it's too crazy to drive this particular display and touch screen with per-scanline interrupts. Assuming 30Hz 240 lines, that's a baseline of 7.2Khz IRQs. If you want 400Hz touch samples, that's only 400Hz more IRQs. Go crazy and sample the touch screen at 2.8Khz as long as there's enough bus bandwidth. While it feels a bit inelegant to not have the scanlines be harmonic with the touch updates, the resulting jitter is at a sufficiently short timescale that it doesn't matter. Also, for this class of hardware, you're devoting a significant chunk of RAM to any sort of framebuffer. Expecting to budget a modest amount of CPU to push the framebuffer to the display seems completely reasonable.
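As a quick sanity check on those rates, here is a back-of-the-envelope Python sketch. The function name `total_irq_rate_hz` is hypothetical, and it assumes one IRQ per scanline plus one IRQ per touch sample, which is the scheme described above rather than any measured behavior.

```python
def total_irq_rate_hz(frame_hz, lines_per_frame, touch_hz):
    """One IRQ per scanline plus one IRQ per touch sample (assumed scheme)."""
    return frame_hz * lines_per_frame + touch_hz


# Numbers taken from the paragraph above: 30Hz refresh, 240 lines.
for touch_hz in (0, 400, 2800):
    rate = total_irq_rate_hz(30, 240, touch_hz)
    print(f"touch at {touch_hz} Hz -> {rate} IRQs per second")
```

The scanlines alone account for 7200 IRQs per second; adding 400Hz touch brings it to 7600, and 2.8Khz touch brings it to 10000.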

As you mentioned, it seems like the original intent of the designers was for folk to treat the display hardware as a SPI accessible frame buffer. They were probably targeting 8-bit AVRs popularly used in 3D printers? Update rate was probably the last of their requirements. They probably didn't even consider someone trying to drive the display with DMA. The fact that you're able to drive 50fps is quite remarkable, on top of your achievement of fully offloading it on the rp2040.

dmitrygr · on April 10, 2023

Sadly SS is associated with an SPI peripheral, and only one will drive the same set of pins, so this will not work without software to manually control nCS lines as far as I can tell.

> 7.2Khz IRQs

M7 takes at least 14 cycles to enter an IRQ handler, and 12 to exit. Your compiler will push at least 2 registers (as per ABI alignment requirements) and pop 2 at the end, that is 6 cycles more, and that is before you've done anything, AND assuming 0-wait-state flash (unlikely). Add in, say, 100 cycles for your IRQ handler's actual code, and suddenly 1MIPS is gone, AND your main code can have latency spikes of over 132 cycles, which may matter (I have an M7 project where random latency over 6 cycles breaks the required timings, for example).
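Spelled out as arithmetic, using only the cycle counts quoted above. This is a rough Python sketch, not a measurement: the names are hypothetical, the 300MHz clock is an assumption chosen for illustration, and "1MIPS" is treated loosely as one million cycles per second.

```python
ENTRY_CYCLES = 14        # IRQ entry, figure from the comment above
EXIT_CYCLES = 12         # IRQ exit, figure from the comment above
PUSH_POP_CYCLES = 6      # pushing and popping 2 registers, from the comment above
ASSUMED_CLOCK_HZ = 300_000_000   # ASSUMPTION: illustrative M7 core clock


def cycles_per_irq(handler_cycles):
    """Total cycles one interrupt steals from the main code."""
    return ENTRY_CYCLES + EXIT_CYCLES + PUSH_POP_CYCLES + handler_cycles


def cycles_per_second(handler_cycles, irq_rate_hz):
    """Cycles spent on interrupts every second at a given IRQ rate."""
    return cycles_per_irq(handler_cycles) * irq_rate_hz


per_irq = cycles_per_irq(100)              # 100-cycle handler body
per_second = cycles_per_second(100, 7200)  # 7.2Khz scanline IRQs
print("cycles per IRQ:", per_irq)
print("cycles per second:", per_second)
print("share of assumed clock:", per_second / ASSUMED_CLOCK_HZ)
```

That comes to 132 cycles per interrupt and 950,400 cycles per second at 7.2Khz, which is where the roughly 1MIPS figure and the 132-cycle latency spike come from.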

sgtnoodle · on April 11, 2023

That's my point, though. 1 MIPS is less than 1% of the available CPU cycles for a typical M7! :-) As long as you can tolerate the jitter caused by interrupts at all, it seems like it should be fine.

6 cycles is pretty darn tight! I'm curious what you're controlling.

dmitrygr · on April 11, 2023

I was pretending to be a memory stick (the Sony memory storage device). And thus had to reply to external commands. On tight deadlines. Very tight.