Optimized [selective state update kernels for Mamba](https://github.com/flashinfer-ai/flashinfer/tree/main/flashinfer/mamba) have been added to FlashInfer, but they do not have any benchmark coverage yet. We need to add microbenchmark support for these kernels.
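
As a starting point for discussion, here is a minimal sketch of what a microbenchmark harness could look like. It is not FlashInfer's benchmarking code and it does not call any FlashInfer API: `bench` and `toy_state_update` are hypothetical names, and the workload is a pure-Python stand-in for the recurrence `h = exp(dt * A) * h + dt * B * x`. A real GPU benchmark would call the actual kernel instead and would need device synchronization (for example, CUDA events) around the timed region, which this CPU-only sketch leaves out.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def bench(fn: Callable[[], object], warmup: int = 5, iters: int = 20) -> Dict[str, float]:
    """Run fn() `warmup` times untimed, then `iters` times timed.

    Returns median/min/max wall-clock latency in microseconds.
    Hypothetical helper: wall-clock timing only, no GPU synchronization.
    """
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return {
        "median_us": statistics.median(samples),
        "min_us": min(samples),
        "max_us": max(samples),
    }


def toy_state_update(state: List[float], x: List[float], dt: float, a: float, b: float) -> None:
    """Placeholder workload (assumption, not the FlashInfer kernel).

    Applies state[i] = exp(dt * a) * state[i] + dt * b * x[i] in place.
    """
    decay = math.exp(dt * a)
    for i in range(len(state)):
        state[i] = decay * state[i] + dt * b * x[i]


if __name__ == "__main__":
    # Sweep a few illustrative problem sizes; these values are arbitrary.
    for n in (64, 256, 1024):
        state = [0.0] * n
        x = [1.0] * n
        stats = bench(lambda: toy_state_update(state, x, dt=0.1, a=-1.0, b=0.5))
        print(f"n={n:5d}  median={stats['median_us']:.2f} us  "
              f"min={stats['min_us']:.2f} us  max={stats['max_us']:.2f} us")
```

Reporting the median rather than the mean keeps one-off stalls from skewing the result, and the warmup runs keep first-call overhead out of the timed samples.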