From a keynote perspective, I guess software optimizations are less cinematic than new or beautiful-looking hardware, and not everyone can appreciate them.
Raw GEMM computation was never the real bottleneck. Feeding the matmuls, i.e. memory bandwidth, is where it’s at, especially on the newer GPUs.
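To put rough numbers on the "feeding the matmuls" point, here is a minimal roofline-style sketch. It assumes each matrix is moved to or from memory exactly once (an idealization) and uses 2-byte elements. The hardware figures `PEAK_FLOPS` and `MEM_BW` are made-up round numbers for illustration, not the specs of any particular GPU, and the function names are hypothetical.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming A and B are
    each read once and C is written once (idealized, no re-reads)."""
    flops = 2 * m * n * k  # one multiply + one add per inner-product term
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved


def is_memory_bound(m, n, k, peak_flops, mem_bandwidth, bytes_per_elem=2):
    """True if the GEMM's intensity falls below the machine's ridge point
    (peak FLOP/s divided by bytes/s), i.e. bandwidth limits it, not math."""
    ridge = peak_flops / mem_bandwidth
    return gemm_arithmetic_intensity(m, n, k, bytes_per_elem) < ridge


# Hypothetical accelerator: illustrative round numbers only.
PEAK_FLOPS = 1.0e15  # 1 PFLOP/s of dense low-precision math (assumed)
MEM_BW = 3.0e12      # 3 TB/s of memory bandwidth (assumed)

# Skinny matmul (batch of 1 against a large weight matrix): ~1 FLOP/byte.
print(is_memory_bound(1, 8192, 8192, PEAK_FLOPS, MEM_BW))

# Large square matmul: thousands of FLOPs/byte, well above the ridge.
print(is_memory_bound(8192, 8192, 8192, PEAK_FLOPS, MEM_BW))
```

Under these assumptions the ridge sits around 333 FLOPs per byte, so any shape below that is limited by bandwidth rather than math. Whether a given workload is memory-bound depends on its shapes; and if peak compute grows faster than bandwidth from one generation to the next, that ridge moves up and more shapes fall under it.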