
Add support for Llama 3.1 8B #655

Open
AnishPahilajani wants to merge 2 commits into vllm-project:main from AnishPahilajani:Llama3.1-enablement

Conversation

@AnishPahilajani

Description

Added support for the Llama 3.1 8B model.

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

Signed-off-by: AnishPahilajani <anishhp13@gmail.com>

    # Log once upfront that we detected the model
    logger.info(
        "Llama 3.1 8b dense model with tensor parallel size 4 detected. "
Collaborator


Is this a dense model?

Author


Yes

Signed-off-by: AnishPahilajani <anishhp13@gmail.com>
        )

    @classmethod
    def configure_llama_3_1_8b(cls, vllm_config: VllmConfig):
Collaborator


It looks like this is all copy-pasted, which I would rather not do. I think @tjohnson31415 has been working on cleaning this up a bit to be more reusable; we should sync up on that.
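
To illustrate the kind of reuse being suggested, here is a minimal sketch of a data-driven helper that several models could share instead of per-model copy-pasted classmethods. ModelSpec, MODEL_SPECS, and apply_model_spec are hypothetical names, not the actual vllm-spyre code or the #669 refactor; the only source-grounded values are the Llama 3.1 8B TP=4 settings quoted elsewhere in this thread.

    # A minimal sketch of the reuse idea: per-model defaults live in data and
    # a single helper applies them. All names here are hypothetical.
    from __future__ import annotations

    import os
    from dataclasses import dataclass, field


    @dataclass
    class ModelSpec:
        """Per-model defaults that would otherwise be hard-coded per classmethod."""
        max_model_len: int
        max_num_seqs: int
        env_vars: dict[str, str] = field(default_factory=dict)


    # Hypothetical registry keyed by (model name, tensor parallel size).
    MODEL_SPECS: dict[tuple[str, int], ModelSpec] = {
        ("meta-llama/Llama-3.1-8B-Instruct", 4): ModelSpec(
            max_model_len=32768,
            max_num_seqs=32,
            env_vars={"VLLM_DT_MAX_BATCH_TKV_LIMIT": "131072"},
        ),
    }


    def apply_model_spec(model: str, tp_size: int) -> ModelSpec | None:
        """Look up the spec for a model and export its environment defaults."""
        spec = MODEL_SPECS.get((model, tp_size))
        if spec is None:
            return None
        for key, value in spec.env_vars.items():
            # Only provide defaults; never override what the user already set.
            os.environ.setdefault(key, value)
        return spec

The point is only the shape: per-model differences live in data, and one code path applies them.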

@tjohnson31415
Collaborator

RE: #655 (comment)

After the configuration refactor from #669 is complete and merged, adding new model support should just be a bit of YAML:

  # Llama 3.1 8b
  meta-llama/Llama-3.1-8B-Instruct:
    architecture:
      model_type: llama
      num_hidden_layers: 32
      max_position_embeddings: 131072
      hidden_size: 4096
      vocab_size: 128256
      num_key_value_heads: 8
      num_attention_heads: 32

    # Continuous batching configurations
    continuous_batching_configs:
      - tp_size: 4
        max_model_len: 32768
        max_num_seqs: 32
        device_config:
          env_vars:
            VLLM_DT_MAX_BATCH_TKV_LIMIT: 131072  # 128k
            FLEX_HDMA_P2PSIZE: 268435456  # 256MB
            FLEX_HDMA_COLLSIZE: 33554432  # 32MB
          num_gpu_blocks_override:
            default: 8192

TODO: could add other CB configs or tune numbers
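
For context, here is a sketch of how an entry like the one above might be consumed once the refactor lands. The real loader lives in #669 and is not shown in this thread, so the file name model_configs.yaml and the top-level layout are assumptions; this is illustrative only and requires PyYAML.

    # Illustrative loader for a YAML entry shaped like the snippet above.
    import yaml

    with open("model_configs.yaml") as f:  # hypothetical file name
        configs = yaml.safe_load(f)

    llama = configs["meta-llama/Llama-3.1-8B-Instruct"]
    assert llama["architecture"]["model_type"] == "llama"

    # Pick the continuous batching config that matches the requested TP size.
    tp_size = 4
    cb = next(c for c in llama["continuous_batching_configs"]
              if c["tp_size"] == tp_size)

    print(cb["max_model_len"], cb["max_num_seqs"])          # 32768 32
    print(cb["device_config"]["env_vars"])                  # batch/HDMA defaults
    print(cb["device_config"]["num_gpu_blocks_override"])   # {'default': 8192}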

