INFERENG-6076: Fix granite-4.0-h-small TP for HPU#81

Open
wjhrdy wants to merge 1 commit into `main` from `INFERENG-6076/granite-hpu-tp-fix`

Conversation

@wjhrdy (Contributor) commented Apr 14, 2026

Summary

  • Reduce tensor-parallel-size from 2 to 1 for ibm-granite/granite-4.0-h-small performance server config
  • granite-4.0-h-small is a hybrid Mamba/Attention model (1.8B params) with GQA group counts that are not divisible by 2, causing `assert n_groups % self.tp_size == 0` to fail on HPU
  • TP=1 is sufficient for this small model and matches the pattern of other working HPU models
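The constraint behind the failing assertion can be sketched as follows. This is a minimal illustration (not vLLM source code), and the odd group count used below is hypothetical — the actual group count for granite-4.0-h-small is not stated in this PR:

```python
# Sketch of the tensor-parallel divisibility constraint described above:
# a model's group count must split evenly across tensor-parallel ranks,
# mirroring the failing `assert n_groups % self.tp_size == 0` check.

def valid_tp_sizes(n_groups: int, candidates=(1, 2, 4, 8)):
    """Return the candidate TP sizes that divide the group count evenly."""
    return [tp for tp in candidates if n_groups % tp == 0]

# A hypothetical odd group count rules out TP=2 but always allows TP=1:
print(valid_tp_sizes(9))  # -> [1]
print(valid_tp_sizes(8))  # -> [1, 2, 4, 8]
```

TP=1 is always valid under this check, which is why dropping the tensor-parallel size is a safe fix for a model that fits on one card.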

Test plan

  • Re-run HPU smoke tests via `ocp-test.yml` with `config_ref=INFERENG-6076/granite-hpu-tp-fix` to verify granite-4.0-h-small passes

🤖 Generated with Claude Code

granite-4.0-h-small is a hybrid Mamba/Attention model with GQA group
counts that are not divisible by 2, causing an assertion error on HPU
when tensor-parallel-size=2. Reduce to 1 since the model is small
enough (1.8B params) to run on a single card.
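The change itself amounts to a one-line config edit. The fragment below is a hypothetical sketch — the actual file path and key names in this repo's performance server config are not shown in the PR:

```yaml
# Hypothetical server config fragment illustrating the fix described above
model: ibm-granite/granite-4.0-h-small
# tensor-parallel-size: 2   # fails: n_groups % tp_size == 0 assertion on HPU
tensor-parallel-size: 1     # model is small enough to run on a single card
```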

Signed-off-by: Willy Hardy <whardy@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
