[Bug]: [pyexecutor] SM121 appears to be unintentionally excluded from MLA block-reuse and chunked-prefill allowlists

### System Info

* CPU architecture: N/A (code inspection issue)
* GPU: N/A (issue identified through source analysis)
* TensorRT-LLM branch: main
* TensorRT-LLM commit: current main branch at time of investigation
* OS: N/A
* Additional information:

  * This issue was identified through source-code inspection and review of the MLA capability gating logic.
  * No specific hardware was required to observe the behavior.
  * The report concerns the SM allowlists used by the MLA block-reuse and chunked-prefill feature gates.


### Who can help?

@kaiyux 

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

## Summary

While reviewing the MLA capability gating logic in `tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`, I noticed that both MLA KV-cache reuse and MLA chunked prefill are gated by the following SM allowlist:

```python
[90, 100, 103, 120]
```

SM121 is excluded from both checks.

Relevant code:

```python
if kv_cache_config.enable_block_reuse and sm_version not in [
    90, 100, 103, 120
]:
    ...
```

```python
if enable_chunked_context and sm_version not in [
    90, 100, 103, 120
]:
    ...
```
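To make the effect of the two checks concrete, here is a minimal standalone model of the gate. It is not the actual TensorRT-LLM code: `apply_mla_gates`, `GateResult`, and the warning text are hypothetical names chosen for illustration. Because the bodies of the real checks are elided above, the fallback (switch the feature off and warn) is assumed from the behavior described later in this report.

```python
import warnings
from dataclasses import dataclass

# Literal copied from the two checks quoted above.
MLA_SM_ALLOWLIST = [90, 100, 103, 120]


@dataclass
class GateResult:
    """Hypothetical container for the feature flags after gating."""
    enable_block_reuse: bool
    enable_chunked_context: bool


def apply_mla_gates(sm_version: int, enable_block_reuse: bool,
                    enable_chunked_context: bool) -> GateResult:
    """Model of the gate: a requested feature is switched off, with a
    warning, when the SM version is not in the allowlist."""
    if enable_block_reuse and sm_version not in MLA_SM_ALLOWLIST:
        warnings.warn(f"MLA block reuse disabled on SM{sm_version}")
        enable_block_reuse = False
    if enable_chunked_context and sm_version not in MLA_SM_ALLOWLIST:
        warnings.warn(f"MLA chunked prefill disabled on SM{sm_version}")
        enable_chunked_context = False
    return GateResult(enable_block_reuse, enable_chunked_context)
```

In this model, `apply_mla_gates(120, True, True)` keeps both flags set, while `apply_mla_gates(121, True, True)` clears both and issues two warnings.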

I could not find any code, comments, tests, documentation, or commit history indicating that SM121 is intentionally unsupported for MLA block reuse or MLA chunked prefill.

At the same time, multiple other locations in the repository treat SM120 and SM121 as the same Blackwell family.

Examples include:

* `fused_moe_cute_dsl_b12x.py`
* `deep_ep_low_latency.py`
* `eagle3_dynamic_tree.py`
* several integration tests using `(120, 121)` checks

Additionally, the MLA XQA JIT path contains:

```cpp
// SM121 uses the same cubin target as SM120 (sm_120f) for compatibility.
```

in:

```text
cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/src/nvrtcWrapper.cpp
```

## Steps to reproduce the behavior

1. Review the MLA feature gating logic in:
   `tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`
2. Observe that SM121 is excluded from both MLA block-reuse and chunked-prefill allowlists.
3. Compare this behavior against other SM120/SM121 checks throughout the repository and the MLA XQA JIT kernel support path.
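The comparison in step 3 can be partly automated. The sketch below is a rough heuristic of my own, not an existing repository tool: it looks for integer list or tuple literals that contain 120 but not 121. The names `find_lists_missing_sm` and `scan_checkout` are hypothetical, and the regex only catches plain numeric literals, so it will miss checks expressed in other ways and may flag unrelated number lists.

```python
import re
from pathlib import Path

# Matches a bracketed or parenthesized literal of two or more integers,
# e.g. "[90, 100]" or "(120, 121)", including literals spanning several lines.
_INT_LIST = re.compile(r"[\[(]\s*(\d+(?:\s*,\s*\d+)+)\s*,?\s*[\])]")


def find_lists_missing_sm(source: str, present: int = 120, missing: int = 121):
    """Return (line_number, values) for each integer literal in `source`
    that contains `present` but not `missing`."""
    hits = []
    for match in _INT_LIST.finditer(source):
        values = [int(v) for v in re.split(r"\s*,\s*", match.group(1))]
        if present in values and missing not in values:
            # Line number of the opening bracket, 1-indexed.
            line_number = source.count("\n", 0, match.start()) + 1
            hits.append((line_number, values))
    return hits


def scan_checkout(root: str):
    """Yield (path, line_number, values) for every Python file under `root`."""
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, values in find_lists_missing_sm(text):
            yield path, line_number, values
```

Running `scan_checkout` on a local checkout gives candidate locations to review by hand. Each hit still needs to be read in context to decide whether it is an SM allowlist at all.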

## Minimal example

The existing unit test pattern in:

```text
tests/unittest/_torch/executor/test_py_executor_creator_mla_cache_reuse_sync.py
```

can be adapted with:

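```python
# `_run_create_py_executor` is assumed to be the helper used by the existing
# test in the file above; `monkeypatch` is the pytest fixture.
kv_cache_reuse, runtime_cache_reuse = _run_create_py_executor(
    monkeypatch,
    sm_version=121,
    kv_cache_quant_algo=QuantAlgo.NO_QUANT,
)
# Behavior described in this report: both flags come back disabled on SM121.
assert not kv_cache_reuse
assert not runtime_cache_reuse
```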

Under the current implementation, SM121 follows the unsupported-SM fallback path and MLA cache reuse is disabled.


### Expected behavior

If SM121 is intended to be supported similarly to SM120 for MLA execution, I would expect SM121 to be included in the MLA capability allowlists.

Alternatively, if SM121 is intentionally unsupported, it would be helpful to document the architectural limitation or rationale for the exclusion.

In either case, I would expect the behavior to be explicitly documented.
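If the maintainers confirm that SM121 should be supported, one possible shape for the change is sketched below. This is a suggestion under that assumption, not the project's actual code: the constant and function names are hypothetical, and whether 121 belongs in the set is exactly the question this issue asks.

```python
# Hypothetical shared constant. The current code repeats the literal
# [90, 100, 103, 120] in both checks; adding 121 here is the proposal,
# pending confirmation that SM121 is meant to be supported.
MLA_REUSE_AND_CHUNKED_PREFILL_SM_VERSIONS = frozenset({90, 100, 103, 120, 121})


def mla_reuse_and_chunked_prefill_supported(sm_version: int) -> bool:
    """Return True when both MLA feature gates would accept `sm_version`."""
    return sm_version in MLA_REUSE_AND_CHUNKED_PREFILL_SM_VERSIONS
```

Using one named constant for both gates would keep the two allowlists from drifting apart and gives a natural place for a comment explaining any intentional exclusion.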


### Actual behavior

SM121 takes the unsupported-SM fallback path, so both MLA KV-cache reuse and MLA chunked prefill are disabled.

Specifically:

* `kv_cache_config.enable_block_reuse` is forced to `False`
* `attn_runtime_features.cache_reuse` is forced to `False`
* MLA chunked prefill is disabled when requested

The runtime emits warnings indicating that these features are unsupported on SM121.


### Additional notes

This report is primarily a request for clarification.

I investigated whether the exclusion of SM121 was intentional and was unable to find:

* comments indicating MLA is unsupported on SM121
* tests expecting SM121 to be disabled
* documentation describing an SM121 limitation
* commit history explicitly excluding SM121

Because SM121 appears to share the same `sm_120f` MLA kernel target as SM120, I wanted to confirm whether the current allowlists are intentional or whether SM121 was unintentionally omitted when MLA support was expanded to additional architectures.

If the current behavior is intentional, I would appreciate any context on the limitation. If not, I would be happy to help with a follow-up fix and regression test.


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.

[Bug]: [pyexecutor] SM121 appears to be unintentionally excluded from MLA block-reuse and chunked-prefill allowlists #15344

System Info

Who can help?

Information

Tasks

Reproduction

Summary

Steps to reproduce the behavior

Minimal example

Expected behavior

actual behavior

additional notes

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: [pyexecutor] SM121 appears to be unintentionally excluded from MLA block-reuse and chunked-prefill allowlists #15344

Description

System Info

Who can help?

Information

Tasks

Reproduction

Summary

Steps to reproduce the behavior

Minimal example

Expected behavior

actual behavior

additional notes

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions