#37681 adds support in the 2D matmul for running DRAM matmuls with interleaved activations and batched, height-sharded weights. Unit tests were added to exercise the two MLA prefill matmuls that required this support, but we should also add nightly test cases that cover a wider range of inputs.
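
As a possible starting point, the nightly sweep could be driven by a shape grid along these lines. This is only a sketch: the dimension values, the 32-element alignment check, the 4D shape layout, and all names (`candidate_shapes`, `CASES`, etc.) are assumptions for illustration and are not taken from #37681. The actual matmul call and sharding configuration are deliberately left out.

```python
from itertools import product

# Assumed alignment for the M/K/N dimensions; adjust to the real constraint.
TILE = 32

# Hypothetical value ranges, chosen only to illustrate the sweep.
BATCHES = [1, 2, 8]
M_DIMS = [32, 128, 1024]
K_DIMS = [64, 512]
N_DIMS = [32, 256]


def candidate_shapes():
    """Return (activation_shape, weight_shape) pairs for every combination."""
    cases = []
    for b, m, k, n in product(BATCHES, M_DIMS, K_DIMS, N_DIMS):
        # Reject values that break the assumed alignment early.
        if any(d % TILE != 0 for d in (m, k, n)):
            raise ValueError(f"dims must be multiples of {TILE}: {(m, k, n)}")
        activation = (1, b, m, k)  # assumed layout: interleaved activations
        weight = (1, b, k, n)      # assumed layout: batched weights
        cases.append((activation, weight))
    return cases


CASES = candidate_shapes()
```

The resulting list could then be passed to `pytest.mark.parametrize` in a nightly test, with the shapes from the two existing MLA prefill unit tests added explicitly so they stay covered.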