
find_possible_tp() may return invalid TP values for vLLM #157

@anfredette

Summary

find_possible_tp() in src/planner/capacity_planner.py:795 returns all integer divisors of num_attention_heads as valid tensor parallelism values. For example, Qwen2.5-14B has 40 attention heads, so the function returns {1, 2, 4, 5, 8, 10, 20, 40}.
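The behavior described above can be reproduced with a short standalone sketch. It is not a copy of the code at capacity_planner.py:795, which is not shown here. `divisors_of` is a hypothetical name, and the square-root-bounded pair scan is only one common way to enumerate divisors.

```python
import math


def divisors_of(n: int) -> set[int]:
    """Return every positive integer divisor of n (the behavior described above)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result: set[int] = set()
    # Divisors come in pairs (d, n // d); scanning up to sqrt(n) finds both halves.
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            result.add(d)
            result.add(n // d)
    return result


# Qwen2.5-14B: 40 attention heads
print(sorted(divisors_of(40)))  # [1, 2, 4, 5, 8, 10, 20, 40]
```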

However, vLLM only supports TP values that are powers of 2 (1, 2, 4, 8). For Qwen2.5-14B, this means TP=5, TP=10, TP=20, and TP=40 are returned as valid but would fail at deployment time.
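Taking the power-of-2 rule above as the constraint, the check reduces to a single bit test: a positive integer n is a power of 2 exactly when n & (n - 1) == 0. The minimal sketch below applies that test to the Qwen2.5-14B candidates. `is_power_of_two` is a hypothetical helper name.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


# Divisors of 40, as returned for Qwen2.5-14B
candidates = {1, 2, 4, 5, 8, 10, 20, 40}

valid = sorted(tp for tp in candidates if is_power_of_two(tp))
rejected = sorted(tp for tp in candidates if not is_power_of_two(tp))
print(valid)     # [1, 2, 4, 8]
print(rejected)  # [5, 10, 20, 40]
```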

Observed behavior

A recommendation was generated with "5x H200" (TP=5) for Qwen2.5-14B-Instruct, which is not a valid vLLM configuration.

Questions

  • Should find_possible_tp() filter to powers of 2 only?
  • Are there other inference frameworks where non-power-of-2 TP is valid?
  • If filtering to powers of 2, should this be done in find_possible_tp() itself or at the call sites?
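To make the first option in the last question concrete, here is a hypothetical sketch that filters inside the function and exposes an opt-out flag for call sites. Both `find_possible_tp_sketch` and its `power_of_two_only` parameter are illustrative assumptions, not the current signature of find_possible_tp().

```python
import math


def find_possible_tp_sketch(num_attention_heads: int, power_of_two_only: bool = True) -> set[int]:
    """Candidate TP sizes: divisors of num_attention_heads, optionally limited to powers of 2.

    Hypothetical sketch; `power_of_two_only` is an illustrative parameter, not part of
    the existing find_possible_tp() signature.
    """
    if num_attention_heads < 1:
        raise ValueError("num_attention_heads must be a positive integer")
    # Collect each divisor pair (i, n // i) found while scanning up to sqrt(n).
    divisors = {
        d
        for i in range(1, math.isqrt(num_attention_heads) + 1)
        if num_attention_heads % i == 0
        for d in (i, num_attention_heads // i)
    }
    if not power_of_two_only:
        return divisors
    # d & (d - 1) == 0 holds exactly when d is a power of two (every d here is > 0).
    return {d for d in divisors if d & (d - 1) == 0}


print(sorted(find_possible_tp_sketch(40)))                           # [1, 2, 4, 8]
print(sorted(find_possible_tp_sketch(40, power_of_two_only=False)))  # [1, 2, 4, 5, 8, 10, 20, 40]
```

Filtering in one place means no call site can forget the check. The flag keeps non-power-of-2 sizes reachable if some framework turns out to accept them. Filtering at the call sites instead would leave find_possible_tp() as a pure divisor enumerator and push the framework-specific rule to each caller.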
