
Overhaul configuration application and validation.#669

Open
tjohnson31415 wants to merge 29 commits into main from model-configurator

Conversation

@tjohnson31415
Collaborator

@tjohnson31415 tjohnson31415 commented Jan 28, 2026

Description

Refactors the model configuration system from validation-only to a comprehensive registry-based approach:

  • Registry Architecture: Centralized ModelConfigRegistry manages configurations from unified model_configs.yaml
  • Pattern Matching: Model matching with complexity scoring to prioritize quantized over base models
  • Configuration Application: ModelConfigurator applies environment variables and other model- and device-specific configuration
  • Structured Data: Dataclasses replace ad-hoc dictionaries for type safety
  • Comprehensive Testing: 110+ test cases (~1,500 lines) with unit, integration, and error handling coverage
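For readers skimming the diff, the matching flow described above can be sketched in miniature: a registry holds (pattern, config) pairs loaded from YAML, and the most specific matching pattern wins, so a quantized variant outranks its base model. All names below are illustrative simplifications, not the PR's actual classes:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchitecturePattern:
    """A set of constraints matched against a model's HF config."""
    model_type: str
    attributes: dict = field(default_factory=dict)

    @property
    def complexity_score(self) -> int:
        # More constrained patterns score higher and win ties.
        return len(self.attributes)

    def matches(self, hf_config: dict) -> bool:
        if hf_config.get("model_type") != self.model_type:
            return False
        return all(hf_config.get(k) == v for k, v in self.attributes.items())


class ModelConfigRegistry:
    """Holds (pattern, config) pairs, e.g. loaded from model_configs.yaml."""

    def __init__(self):
        self._entries = []

    def register(self, pattern: ArchitecturePattern, config: dict) -> None:
        self._entries.append((pattern, config))

    def match(self, hf_config: dict) -> Optional[dict]:
        candidates = [(p, c) for p, c in self._entries if p.matches(hf_config)]
        if not candidates:
            return None
        # Most specific (highest complexity score) pattern wins, so a
        # quantized-model pattern beats the plain base-model pattern.
        return max(candidates, key=lambda pc: pc[0].complexity_score)[1]


registry = ModelConfigRegistry()
registry.register(ArchitecturePattern("granite"), {"variant": "base"})
registry.register(
    ArchitecturePattern(
        "granite", {"quantization_config": {"quant_method": "fp8"}}),
    {"variant": "fp8"},
)
fp8_config = {"model_type": "granite",
              "quantization_config": {"quant_method": "fp8"}}
print(registry.match(fp8_config))  # {'variant': 'fp8'}
```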

Key Changes

  • Consolidates known_model_configs.json and supported_configs.yaml into single model_configs.yaml with YAML anchors for reuse
  • Moves 150+ lines of hardcoded Granite configuration from SpyrePlatform to declarative YAML
  • Adds VLLM_SPYRE_REQUIRE_KNOWN_CONFIG environment variable for strict validation
  • Configuration summaries track expected vs. actual values with override warnings
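To illustrate the YAML-anchor reuse mentioned above, a hypothetical fragment of model_configs.yaml (keys and values are made up for illustration, not the actual schema):

```yaml
# Shared settings are written once under an anchor...
granite_8b_base: &granite_8b_base
  env:
    EXAMPLE_RUNTIME_FLAG: "1"   # placeholder, not a real variable
  max_model_len: 32768

models:
  - name: granite-3.3-8b-instruct
    <<: *granite_8b_base
  - name: granite-3.3-8b-instruct-fp8
    <<: *granite_8b_base        # ...and reused via the merge key,
    quantization: fp8           # with only the quantized bits added
```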

Benefits

  • Extensibility: Add models via YAML only (no code changes)
  • Maintainability: Centralized configuration, structured data types
  • Testability: Isolated component testing
  • Documentation: Self-documenting YAML + comprehensive README

Backward Compatibility

Maintains all existing model configurations and runtime behavior. New validation is opt-in via VLLM_SPYRE_REQUIRE_KNOWN_CONFIG=1.
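As a sketch of what that opt-in gate could look like (illustrative logic only; the real check lives in the plugin code):

```python
import os


def check_known_config(matched_config) -> str:
    """Illustrative opt-in strict-validation gate, not the plugin's code."""
    strict = os.environ.get("VLLM_SPYRE_REQUIRE_KNOWN_CONFIG", "0") == "1"
    if matched_config is not None:
        return "ok"
    if strict:
        raise ValueError(
            "no entry in model_configs.yaml matches this model/device and "
            "VLLM_SPYRE_REQUIRE_KNOWN_CONFIG=1 was set")
    # Default: warn but keep the old permissive runtime behavior.
    return "warn"


os.environ["VLLM_SPYRE_REQUIRE_KNOWN_CONFIG"] = "0"
print(check_known_config(None))  # warn
```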

Made with IBM Project Bob

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

Expected default for all models is 1024 and set by CLI arg default.

…stry

@@ -0,0 +1,220 @@
# Model Configuration Schema
Collaborator

insanely 🌶️🌶️🌶️

@tjohnson31415
Collaborator Author

bot:test
MARKERS="spyre and prefix_caching"

"Granite model detected. For backwards compatibility, "
"defaulting --max-num-batched-tokens to 1024"
)
vllm_config.scheduler_config.max_num_batched_tokens = 1024
Collaborator

does this disallow user overrides?

Collaborator Author

@tjohnson31415 tjohnson31415 Feb 3, 2026

This is to match the current behavior (which is less than ideal):

  • --max-num-batched-tokens has a default of 1024 (for vllm serve), but user can set a different value
  • detecting a granite model will force override the user setting and default
  • VLLM_DT_CHUNK_LEN will override either of the above

So the current override mechanism for Granite is to set VLLM_DT_CHUNK_LEN...

Note that this behavior is tested in test_cli_args.py and I didn't change any of the test assertions there (just had to change how a granite 8b model was mocked).
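The precedence described in the bullets above (CLI default < user CLI value < Granite force-override < VLLM_DT_CHUNK_LEN) can be sketched with a hypothetical helper, just to make the "less than ideal" part concrete:

```python
import os


def resolve_chunk_size(cli_value, is_granite: bool) -> int:
    # 1. `vllm serve` defaults --max-num-batched-tokens to 1024
    value = 1024
    # 2. an explicit user CLI value replaces the default...
    if cli_value is not None:
        value = cli_value
    # 3. ...but Granite detection force-overrides both back to 1024
    if is_granite:
        value = 1024
    # 4. VLLM_DT_CHUNK_LEN overrides everything above
    env_value = os.environ.get("VLLM_DT_CHUNK_LEN")
    if env_value is not None:
        value = int(env_value)
    return value


os.environ.pop("VLLM_DT_CHUNK_LEN", None)
print(resolve_chunk_size(2048, is_granite=True))  # 1024: user value clobbered
os.environ["VLLM_DT_CHUNK_LEN"] = "4096"
print(resolve_chunk_size(2048, is_granite=True))  # 4096: env var wins
```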

Collaborator

ooof, I guess I got things mixed up. So that's overridable for all other cases like using micro models or running with a different configuration here, but not when we match on the 32x32k @ TP4 case 🤔

We've documented that this is overridable so I do think it might be worth changing to make it so here. We could follow up quickly in another PR if we want to keep this one as just the refactor though. WDYT?

Collaborator Author

Yeah... setting the CLI arg default was the last piece to be implemented, but it should have been the solution instead of using VLLM_DT_CHUNK_LEN. Your comment does point out that technically I should check for TP=4 here and not just the model name.

We've documented that this is overridable so I do think it might be worth changing to make it so here.

Worth changing the documentation or changing the behavior (edge-case breaking change)?

I figured we'd just fix this in the next major release and have the CLI arg be the only way to configure the chunk size.

Collaborator

edge-case breaking change

is it a breaking change or a bugfix? 😉

I figured we'd just fix this in the next major release and have the CLI arg be the only way to configure the chunk size.

sounds good to me

config_summary = configurator.configure(vllm_config)
logger.info(config_summary.format_log_message())
# TODO: This is a temporary check for backwards compatibility that should be
# removed when we can make breaking changes.
Collaborator

Is the intent to put max-num-batched-tokens into the configurator once we rip out the non-chunked-prefill code paths?

Collaborator Author

@tjohnson31415 tjohnson31415 Feb 3, 2026

So I initially did have max-num-batched-tokens in the configurator, but it was getting messy with this extra overriding.
We don't currently have a chunk size per model / configuration, so it was simpler to just leave it out. It should be easy enough to add back later if we do want per-model defaults in there.

@@ -1,91 +0,0 @@
"""Tests for model-specific overrides for granite"""
Collaborator

just going through the tests now- did these move or are we deleting them?

I had assumed that we would want to keep these tests to ensure that the granite models get configured the same

Collaborator Author

@tjohnson31415 tjohnson31415 Feb 3, 2026

Moved to test_integration.py under TestModelMatching and TestGraniteVersionAwareOverrides:
https://github.com/vllm-project/vllm-spyre/pull/669/changes#diff-293f09ec74a17c687d32f5503ca971d7ba5213d4aa6607283dec15d8179d8f75R37

@joerunde
Collaborator

joerunde commented Feb 4, 2026

Looks like the spyre tests are failing for all the fp8 use cases, can we merge main in here and re-run?


def test_match_granite_3_3_cb_config(self, registry, granite_3_3_hf_config, create_vllm_config):
"""Test matching granite-3.3-8b-instruct with CB config and getting configurator."""
vllm_config = create_vllm_config(
Collaborator

🌶️ , I'd never thought of passing a method back in a fixture

Collaborator Author

I wouldn't have either.
Thanks Bob for making it fancy; but I'll change it to a plain function 😅

"""Tests for applying configuration settings."""

@patch.dict(os.environ, {}, clear=True)
def test_apply_configuration_sets_env_vars(
Collaborator

Should these test cases be parameterized to run on both the granite 3 and granite 4 configs? I think we'd want to ensure that granite 4 keeps configuring the same way too

Collaborator

ah sorry I see class TestGraniteVersionAwareOverrides

Seems like these tests have a lot of overlap - do we need to keep both? I like the specificity here but it is a whole lot of testin'

@pytest.mark.parametrize(
"model_name, sendnn_configured, sendnn_version, expected_blocks",
[
("granite-3.3-8b-instruct", True, (0, 0, 0), 8192),
Collaborator

can we use the config fixtures that we already have? granite_3_3_hf_config and granite_4_hf_config

# Quantization config gets extra weight as it's a key differentiator
if attr_name == "quantization_config" and isinstance(attr_value, dict):
# Base score for having quantization_config
score += 10
Collaborator

Was there a specific reason that 10 was picked here?

Collaborator Author

Nope, it was just Bob's preference... I'm not a fan of the complexity score stuff anyhow, so I'll revisit this.
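For context, the weighting under discussion works roughly like this (a simplified sketch of the quoted snippet, not the exact code; the 10 is arbitrary, which is the reviewer's point):

```python
def complexity_score(pattern_attrs: dict) -> int:
    """Simplified version of the scoring being discussed."""
    score = 0
    for attr_name, attr_value in pattern_attrs.items():
        if attr_name == "quantization_config" and isinstance(attr_value, dict):
            # Extra weight so quantized-model patterns outrank base patterns
            # that differ only in this attribute.
            score += 10
            score += len(attr_value)  # one more point per constrained key
        else:
            score += 1
    return score


print(complexity_score({}))                         # 0
print(complexity_score({"num_hidden_layers": 32}))  # 1
print(complexity_score(
    {"quantization_config": {"quant_method": "fp8"}}))  # 11
```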

"""
registry = ModelConfigRegistry.get_instance()
if not registry._initialized:
registry.initialize()
Collaborator

Thoughts on adding an escape hatch here for users to author their own yaml and configure the path with an environment variable or something similar?
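One shape such an escape hatch could take (VLLM_SPYRE_MODEL_CONFIG_PATH is a made-up variable name for illustration):

```python
import os
import tempfile
from pathlib import Path

# Bundled default shipped with the package (path is illustrative)
DEFAULT_CONFIG_PATH = Path("model_configs.yaml")


def resolve_config_path() -> Path:
    """Hypothetical escape hatch: an env var lets users point the registry
    at their own YAML file instead of the bundled one."""
    override = os.environ.get("VLLM_SPYRE_MODEL_CONFIG_PATH")
    if not override:
        return DEFAULT_CONFIG_PATH
    path = Path(override)
    if not path.is_file():
        raise FileNotFoundError(
            f"VLLM_SPYRE_MODEL_CONFIG_PATH points at {override}, "
            "which does not exist")
    return path


with tempfile.NamedTemporaryFile(suffix=".yaml") as f:
    os.environ["VLLM_SPYRE_MODEL_CONFIG_PATH"] = f.name
    assert resolve_config_path() == Path(f.name)
os.environ.pop("VLLM_SPYRE_MODEL_CONFIG_PATH")
```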

def test_complexity_score_minimal_pattern(self):
"""Test complexity score for minimal pattern (no attributes)."""
pattern = ArchitecturePattern(model_name="test-model", model_type="llama")
assert pattern.complexity_score == 0
Collaborator

Are there also any tests that check that the highest complexity config is actually matched?

(There's a lot of diff here and I don't immediately see any with ctrl+f "complexity")

* main:
  ⚡   Implement fp8 with chunked prefill with static scaling (#661)
  test: fix test_compare_graphs (#671)
@joerunde
Collaborator

joerunde commented Feb 5, 2026

bot:test
MARKERS="spyre and prefix_caching"

@joerunde
Collaborator

joerunde commented Feb 5, 2026

bot:test
MARKERS="spyre and chunked_prefill"

…llow None in pattern

Collaborator

@joerunde joerunde left a comment

No idea what the venv thing was about on the GHA workers 🤷

LGTM if we wanna go ahead and get this in!
