Add support for tensors/heads not divisible by GPUs #3

Triggered via pull request on February 3, 2025 at 18:37
Status: Failure
Total duration: 4m 10s

Workflow: pre-commit.yml (on: pull_request)
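
Assuming the workflow runs Ruff through pre-commit, the same checks can usually be reproduced locally with `pre-commit run --all-files`, or by invoking Ruff directly with `ruff check .`.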

Annotations (10 errors)
vllm/config.py:690:9: F841 Local variable `total_num_attention_heads` is assigned to but never used
vllm/config.py:692:9: F841 Local variable `tensor_parallel_size` is assigned to but never used
vllm/model_executor/layers/fused_moe/layer.py:453:9: F841 Local variable `tp_rank` is assigned to but never used
vllm/model_executor/layers/linear.py:290:81: E501 Line too long (93 > 80)
vllm/model_executor/layers/linear.py:377:13: F841 Local variable `shard_size` is assigned to but never used
vllm/model_executor/layers/linear.py:678:9: F841 Local variable `tp_size` is assigned to but never used
vllm/model_executor/layers/linear.py:1127:9: F841 Local variable `tp_rank` is assigned to but never used
vllm/model_executor/layers/linear.py:1151:13: F841 Local variable `shard_size` is assigned to but never used
vllm/model_executor/layers/quantization/base_config.py:43:81: E501 Line too long (83 > 80)
vllm/model_executor/layers/quantization/fp8.py:170:9: F841 Local variable `tp_chunk` is assigned to but never used
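
All ten findings come from two Ruff rules: F841 (a local variable is assigned but never read, typically leftover state from a refactor) and E501 (a line exceeds the configured 80-character limit). A minimal sketch of what each rule flags and a typical fix; the function and variable names below are hypothetical, not the actual vllm code:

```python
# Illustrative sketch of the two Ruff rules behind the annotations.
# All names here are hypothetical, not the actual vllm code.

# F841: local variable assigned but never used.
def heads_per_rank_before(total_heads: int, tp_size: int) -> int:
    remainder = total_heads % tp_size  # F841: `remainder` is never read
    return total_heads // tp_size

def heads_per_rank_after(total_heads: int, tp_size: int) -> int:
    # Fix: drop the dead assignment, or actually use the value.
    if total_heads % tp_size != 0:
        raise ValueError("heads not divisible by tensor parallel size")
    return total_heads // tp_size

# E501: line longer than 80 characters.
# Fix: wrap long strings or expressions inside parentheses.
ERROR_MSG = (
    "Total number of attention heads must be divisible by "
    "tensor parallel size."
)
```

Ruff's default dummy-variable pattern exempts names like `_`, so renaming can silence F841, but deleting the dead assignment is usually the cleaner fix when the value is truly unused.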