Summary
find_possible_tp() in src/planner/capacity_planner.py:795 returns all integer divisors of num_attention_heads as valid tensor parallelism values. For example, Qwen2.5-14B has 40 attention heads, so the function returns {1, 2, 4, 5, 8, 10, 20, 40}.
However, vLLM only supports TP values that are powers of 2 (1, 2, 4, 8). This means TP=5, TP=10, etc. are returned as valid but would fail at deployment time.
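To illustrate the gap, here is a minimal sketch of the divisor logic and the proposed power-of-2 filter. The actual find_possible_tp() implementation is not reproduced here; both functions below are hypothetical reconstructions for demonstration only.

```python
def find_possible_tp(num_attention_heads: int) -> set[int]:
    # Assumed current behavior: every integer divisor of the
    # attention head count is treated as a valid TP size.
    return {d for d in range(1, num_attention_heads + 1)
            if num_attention_heads % d == 0}

def find_possible_tp_pow2(num_attention_heads: int) -> set[int]:
    # Possible fix: keep only divisors that are powers of 2,
    # using the bit trick d & (d - 1) == 0 (true only for powers of 2).
    return {d for d in find_possible_tp(num_attention_heads)
            if d & (d - 1) == 0}

# Qwen2.5-14B has 40 attention heads.
print(sorted(find_possible_tp(40)))       # [1, 2, 4, 5, 8, 10, 20, 40]
print(sorted(find_possible_tp_pow2(40)))  # [1, 2, 4, 8]
```

Under this sketch, TP=5 and TP=10 fall out of the candidate set, which would have prevented the "5x H200" recommendation described below.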
Observed behavior
A recommendation was generated with "5x H200" (TP=5) for Qwen2.5-14B-Instruct, which is not a valid vLLM configuration.
Questions
- Should find_possible_tp() filter to powers of 2 only?
- Are there other inference frameworks where non-power-of-2 TP is valid?
- If filtering to powers of 2, should this be done in find_possible_tp() itself or at the call sites?