deepseek full model test takes enormous time in decode mode during kv cache prefill

Runtime log shows, it takes 25mins to run prefill on reference model for seq_len=128.

This is observed on dual and quad machine

Test Command: `pytest -svv models/demos/deepseek_v3/tests/test_model.py::test_forward_pass[mode_decode_seq_1_batch_32_pos_random-True-device_params0]`

```
[1,0]<stderr>: 2026-02-11 00:25:53.604 | INFO     | models.demos.deepseek_v3.utils.test_utils:run_reference_with_attention:315 - Reference attention config: seq_len=126 chunk_size=2080 use_chunked_processing=False
[1,0]<stderr>: 2026-02-11 00:50:49.658 | INFO     | models.demos.deepseek_v3.tests.test_model:generate_reference_io:141 - Reference model output shape: torch.Size([256, 1, 129280])
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deepseek full model test takes enormous time in decode mode during kv cache prefill #37585

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

deepseek full model test takes enormous time in decode mode during kv cache prefill #37585

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions