TL;DR: The authors show that if you simplify Mamba so that each state-space layer's transition matrix A is a scalar multiple of the identity (a special case of a diagonal A), the state-space transformation can be rewritten as a form of causal linear attention. That equivalence is the "duality" in the title. The key practical benefit is that the attention form maps onto matrix multiplications, enabling faster, more hardware-efficient training on GPUs.
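To make the duality concrete, here is a minimal NumPy sketch (dimensions, variable names, and the single-channel setup are illustrative choices, not taken from the paper). With a scalar-times-identity transition, the recurrence h_t = a_t * h_{t-1} + B_t * x_t unrolls to a weighted sum over past inputs, which is exactly a causally masked attention matrix applied to x:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                       # sequence length, state size (illustrative)
a = rng.uniform(0.5, 1.0, T)      # per-step scalar decay: A_t = a_t * I
B = rng.standard_normal((T, N))   # input projections ("keys")
C = rng.standard_normal((T, N))   # output projections ("queries")
x = rng.standard_normal(T)        # one input channel

# Recurrent (SSM) form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t . h_t
h = np.zeros(N)
y_rec = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Attention form: y = (L * (C @ B.T)) @ x, where the causal mask
# L[t, s] = a_{s+1} * ... * a_t for s <= t (and 0 otherwise)
L = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        L[t, s] = np.prod(a[s + 1 : t + 1])  # empty product -> 1.0 when s == t
y_attn = (L * (C @ B.T)) @ x

print(np.allclose(y_rec, y_attn))  # the two forms agree
```

The attention form replaces a sequential scan with dense matrix products, which is the structure GPUs execute efficiently; the paper's actual algorithm is blockwise and multi-channel, but the scalar identity above is the core of the reduction.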