deepseek full model test takes enormous time in decode mode during kv cache prefill #37585

@kpaigwar

Description

The runtime log shows that prefill on the reference model takes about 25 minutes for seq_len=128.

This is observed on both dual and quad machines.

Test command:

pytest -svv models/demos/deepseek_v3/tests/test_model.py::test_forward_pass[mode_decode_seq_1_batch_32_pos_random-True-device_params0]

[1,0]<stderr>: 2026-02-11 00:25:53.604 | INFO     | models.demos.deepseek_v3.utils.test_utils:run_reference_with_attention:315 - Reference attention config: seq_len=126 chunk_size=2080 use_chunked_processing=False
[1,0]<stderr>: 2026-02-11 00:50:49.658 | INFO     | models.demos.deepseek_v3.tests.test_model:generate_reference_io:141 - Reference model output shape: torch.Size([256, 1, 129280])
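For context, the log shows `use_chunked_processing=False` because the configured `chunk_size` (2080) exceeds the actual `seq_len` (126), so the reference prefill runs as one monolithic pass. Below is a minimal, hypothetical sketch of what chunked prefill means in general (this is not the repo's implementation; `step_fn`, `chunked_prefill`, and the list-based cache are illustrative assumptions):

```python
def chunked_prefill(tokens, chunk_size, step_fn):
    """Hypothetical sketch: process a prompt in fixed-size chunks.

    Each call to step_fn sees one chunk of the prompt plus the cache
    accumulated from earlier chunks, instead of the whole sequence at
    once. When chunk_size >= len(tokens), this degenerates to a single
    full-sequence pass, which matches use_chunked_processing=False in
    the log above.
    """
    cache = []  # stand-in for a KV cache; grows as chunks are processed
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        # step_fn consumes the new chunk and the cache built so far
        cache.extend(step_fn(chunk, cache))
    return cache
```

With `chunk_size=2080` and a 126-token prompt, the loop body runs exactly once, so chunking provides no benefit here; the 25-minute wall time comes entirely from the single reference-model forward pass.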
