
Overhaul configuration application and validation.#669

Open
tjohnson31415 wants to merge 29 commits into main from model-configurator

Conversation

@tjohnson31415
Collaborator

@tjohnson31415 tjohnson31415 commented Jan 28, 2026

Description

Refactors the model configuration system from validation-only to a comprehensive registry-based approach:

  • Registry Architecture: Centralized ModelConfigRegistry manages configurations from unified model_configs.yaml
  • Pattern Matching: Model matching with complexity scoring to prioritize quantized over base models
  • Configuration Application: ModelConfigurator applies environment variables and other model- and device-specific configuration
  • Structured Data: Dataclasses replace ad-hoc dictionaries for type safety
  • Comprehensive Testing: 110+ test cases (~1,500 lines) with unit, integration, and error handling coverage
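For readers skimming the diff, the matching flow described above can be sketched in miniature: a registry holds (pattern, config) pairs loaded from YAML, and the most specific matching pattern wins, so a quantized variant outranks its base model. All names below are illustrative simplifications, not the PR's actual classes:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchitecturePattern:
    """A set of constraints matched against a model's HF config."""
    model_type: str
    attributes: dict = field(default_factory=dict)

    @property
    def complexity_score(self) -> int:
        # More constrained patterns score higher and win ties.
        return len(self.attributes)

    def matches(self, hf_config: dict) -> bool:
        if hf_config.get("model_type") != self.model_type:
            return False
        return all(hf_config.get(k) == v for k, v in self.attributes.items())


class ModelConfigRegistry:
    """Holds (pattern, config) pairs, e.g. loaded from model_configs.yaml."""

    def __init__(self):
        self._entries = []

    def register(self, pattern: ArchitecturePattern, config: dict) -> None:
        self._entries.append((pattern, config))

    def match(self, hf_config: dict) -> Optional[dict]:
        candidates = [(p, c) for p, c in self._entries if p.matches(hf_config)]
        if not candidates:
            return None
        # Most specific (highest complexity score) pattern wins, so a
        # quantized-model pattern beats the plain base-model pattern.
        return max(candidates, key=lambda pc: pc[0].complexity_score)[1]


registry = ModelConfigRegistry()
registry.register(ArchitecturePattern("granite"), {"variant": "base"})
registry.register(
    ArchitecturePattern(
        "granite", {"quantization_config": {"quant_method": "fp8"}}),
    {"variant": "fp8"},
)
fp8_config = {"model_type": "granite",
              "quantization_config": {"quant_method": "fp8"}}
print(registry.match(fp8_config))  # {'variant': 'fp8'}
```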

Key Changes

  • Consolidates known_model_configs.json and supported_configs.yaml into single model_configs.yaml with YAML anchors for reuse
  • Moves 150+ lines of hardcoded Granite configuration from SpyrePlatform to declarative YAML
  • Adds VLLM_SPYRE_REQUIRE_KNOWN_CONFIG environment variable for strict validation
  • Configuration summaries track expected vs. actual values with override warnings
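To illustrate the YAML-anchor reuse mentioned above, a hypothetical fragment of model_configs.yaml (keys and values are made up for illustration, not the actual schema):

```yaml
# Shared settings are written once under an anchor...
granite_8b_base: &granite_8b_base
  env:
    EXAMPLE_RUNTIME_FLAG: "1"   # placeholder, not a real variable
  max_model_len: 32768

models:
  - name: granite-3.3-8b-instruct
    <<: *granite_8b_base
  - name: granite-3.3-8b-instruct-fp8
    <<: *granite_8b_base        # ...and reused via the merge key,
    quantization: fp8           # with only the quantized bits added
```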

Benefits

  • Extensibility: Add models via YAML only (no code changes)
  • Maintainability: Centralized configuration, structured data types
  • Testability: Isolated component testing
  • Documentation: Self-documenting YAML + comprehensive README

Backward Compatibility

Maintains all existing model configurations and runtime behavior. New validation is opt-in via VLLM_SPYRE_REQUIRE_KNOWN_CONFIG=1.
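As a sketch of what that opt-in gate could look like (illustrative logic only; the real check lives in the plugin code):

```python
import os


def check_known_config(matched_config) -> str:
    """Illustrative opt-in strict-validation gate, not the plugin's code."""
    strict = os.environ.get("VLLM_SPYRE_REQUIRE_KNOWN_CONFIG", "0") == "1"
    if matched_config is not None:
        return "ok"
    if strict:
        raise ValueError(
            "no entry in model_configs.yaml matches this model/device and "
            "VLLM_SPYRE_REQUIRE_KNOWN_CONFIG=1 was set")
    # Default: warn but keep the old permissive runtime behavior.
    return "warn"


os.environ["VLLM_SPYRE_REQUIRE_KNOWN_CONFIG"] = "0"
print(check_known_config(None))  # warn
```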

Made with IBM Project Bob

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

Expected default for all models is 1024 and set by CLI arg default.

…stry

@@ -0,0 +1,220 @@
# Model Configuration Schema
Collaborator

insanely 🌶️🌶️🌶️

@tjohnson31415
Collaborator Author

bot:test
MARKERS="spyre and prefix_caching"

"Granite model detected. For backwards compatibility, "
"defaulting --max-num-batched-tokens to 1024"
)
vllm_config.scheduler_config.max_num_batched_tokens = 1024
Collaborator

does this disallow user overrides?

Collaborator Author

@tjohnson31415 tjohnson31415 Feb 3, 2026

This is to match the current behavior (which is less than ideal):

  • --max-num-batched-tokens has a default of 1024 (for vllm serve), but user can set a different value
  • detecting a granite model will force override the user setting and default
  • VLLM_DT_CHUNK_LEN will override either of the above

So the current override mechanism for Granite is to set VLLM_DT_CHUNK_LEN...

Note that this behavior is tested in test_cli_args.py and I didn't change any of the test assertions there (just had to change how a granite 8b model was mocked).
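The precedence described in the bullets above (CLI default < user CLI value < Granite force-override < VLLM_DT_CHUNK_LEN) can be sketched with a hypothetical helper, just to make the "less than ideal" part concrete:

```python
import os


def resolve_chunk_size(cli_value, is_granite: bool) -> int:
    # 1. `vllm serve` defaults --max-num-batched-tokens to 1024
    value = 1024
    # 2. an explicit user CLI value replaces the default...
    if cli_value is not None:
        value = cli_value
    # 3. ...but Granite detection force-overrides both back to 1024
    if is_granite:
        value = 1024
    # 4. VLLM_DT_CHUNK_LEN overrides everything above
    env_value = os.environ.get("VLLM_DT_CHUNK_LEN")
    if env_value is not None:
        value = int(env_value)
    return value


os.environ.pop("VLLM_DT_CHUNK_LEN", None)
print(resolve_chunk_size(2048, is_granite=True))  # 1024: user value clobbered
os.environ["VLLM_DT_CHUNK_LEN"] = "4096"
print(resolve_chunk_size(2048, is_granite=True))  # 4096: env var wins
```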

Collaborator

ooof, I guess I got things mixed up. So that's overridable for all other cases like using micro models or running with a different configuration here, but not when we match on the 32x32k @ TP4 case 🤔

We've documented that this is overridable so I do think it might be worth changing to make it so here. We could follow up quickly in another PR if we want to keep this one as just the refactor though. WDYT?

Collaborator Author

Yeah... setting the CLI arg default was the last piece to be implemented, but it should have been the solution instead of using VLLM_DT_CHUNK_LEN. Your comment does point out that technically I should check for TP=4 here and not just the model name.

We've documented that this is overridable so I do think it might be worth changing to make it so here.

Worth changing the documentation or changing the behavior (edge-case breaking change)?

I figured we'd just fix this in the next major release and have the CLI arg be the only way to configure the chunk size.

Collaborator

edge-case breaking change

is it a breaking change or a bugfix? 😉

I figured we'd just fix this in the next major release and have the CLI arg be the only way to configure the chunk size.

sounds good to me

config_summary = configurator.configure(vllm_config)
logger.info(config_summary.format_log_message())
# TODO: This is a temporary check for backwards compatibility that should be
# removed when we can make breaking changes.
Collaborator

Is the intent to put max-num-batched-tokens into the configurator once we rip out the non-chunked-prefill code paths?

Collaborator Author

@tjohnson31415 tjohnson31415 Feb 3, 2026

So I initially did have max-num-batched-tokens in the configurator, but it was getting messy with this extra overriding.
We don't currently have a chunk size per model / configuration, so it was simpler to just leave it out. It should be easy enough to add back later if we do want per-model defaults in there.

@@ -1,91 +0,0 @@
"""Tests for model-specific overrides for granite"""
Collaborator

just going through the tests now- did these move or are we deleting them?

I had assumed that we would want to keep these tests to ensure that the granite models get configured the same

Collaborator Author

@tjohnson31415 tjohnson31415 Feb 3, 2026

Moved to test_integration.py under TestModelMatching and TestGraniteVersionAwareOverrides:
https://github.com/vllm-project/vllm-spyre/pull/669/changes#diff-293f09ec74a17c687d32f5503ca971d7ba5213d4aa6607283dec15d8179d8f75R37

@joerunde
Collaborator

joerunde commented Feb 4, 2026

Looks like the spyre tests are failing for all the fp8 use cases, can we merge main in here and re-run?


def test_match_granite_3_3_cb_config(self, registry, granite_3_3_hf_config, create_vllm_config):
"""Test matching granite-3.3-8b-instruct with CB config and getting configurator."""
vllm_config = create_vllm_config(
Collaborator

🌶️ , I'd never thought of passing a method back in a fixture

Collaborator Author

I wouldn't have either.
Thanks Bob for making it fancy; but I'll change it to a plain function 😅

"""Tests for applying configuration settings."""

@patch.dict(os.environ, {}, clear=True)
def test_apply_configuration_sets_env_vars(
Collaborator

Should these test cases be parameterized to run on both the granite 3 and granite 4 configs? I think we'd want to ensure that granite 4 keeps configuring the same way too

Collaborator

ah sorry I see class TestGraniteVersionAwareOverrides

Seems like these tests have a lot of overlap - do we need to keep both? I like the specificity here but it is a whole lot of testin'

@pytest.mark.parametrize(
"model_name, sendnn_configured, sendnn_version, expected_blocks",
[
("granite-3.3-8b-instruct", True, (0, 0, 0), 8192),
Collaborator

can we use the config fixtures that we already have? granite_3_3_hf_config and granite_4_hf_config

# Quantization config gets extra weight as it's a key differentiator
if attr_name == "quantization_config" and isinstance(attr_value, dict):
# Base score for having quantization_config
score += 10
Collaborator

Was there a specific reason that 10 was picked here?

Collaborator Author

Nope, it was just Bob's preference... I'm not a fan of the complexity score stuff anyhow, so I'll revisit this.
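For context, the weighting under discussion works roughly like this (a simplified sketch of the quoted snippet, not the exact code; the 10 is arbitrary, which is the reviewer's point):

```python
def complexity_score(pattern_attrs: dict) -> int:
    """Simplified version of the scoring being discussed."""
    score = 0
    for attr_name, attr_value in pattern_attrs.items():
        if attr_name == "quantization_config" and isinstance(attr_value, dict):
            # Extra weight so quantized-model patterns outrank base patterns
            # that differ only in this attribute.
            score += 10
            score += len(attr_value)  # one more point per constrained key
        else:
            score += 1
    return score


print(complexity_score({}))                         # 0
print(complexity_score({"num_hidden_layers": 32}))  # 1
print(complexity_score(
    {"quantization_config": {"quant_method": "fp8"}}))  # 11
```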

"""
registry = ModelConfigRegistry.get_instance()
if not registry._initialized:
registry.initialize()
Collaborator

Thoughts on adding an escape hatch here for users to author their own yaml and configure the path with an environment variable or something similar?
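One shape such an escape hatch could take (VLLM_SPYRE_MODEL_CONFIG_PATH is a made-up variable name for illustration):

```python
import os
import tempfile
from pathlib import Path

# Bundled default shipped with the package (path is illustrative)
DEFAULT_CONFIG_PATH = Path("model_configs.yaml")


def resolve_config_path() -> Path:
    """Hypothetical escape hatch: an env var lets users point the registry
    at their own YAML file instead of the bundled one."""
    override = os.environ.get("VLLM_SPYRE_MODEL_CONFIG_PATH")
    if not override:
        return DEFAULT_CONFIG_PATH
    path = Path(override)
    if not path.is_file():
        raise FileNotFoundError(
            f"VLLM_SPYRE_MODEL_CONFIG_PATH points at {override}, "
            "which does not exist")
    return path


with tempfile.NamedTemporaryFile(suffix=".yaml") as f:
    os.environ["VLLM_SPYRE_MODEL_CONFIG_PATH"] = f.name
    assert resolve_config_path() == Path(f.name)
os.environ.pop("VLLM_SPYRE_MODEL_CONFIG_PATH")
```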

def test_complexity_score_minimal_pattern(self):
"""Test complexity score for minimal pattern (no attributes)."""
pattern = ArchitecturePattern(model_name="test-model", model_type="llama")
assert pattern.complexity_score == 0
Collaborator

Are there also any tests that check that the highest complexity config is actually matched?

(There's a lot of diff here and I don't immediately see any with ctrl+f "complexity")

* main:
  ⚡   Implement fp8 with chunked prefill with static scaling (#661)
  test: fix test_compare_graphs (#671)
@joerunde
Collaborator

joerunde commented Feb 5, 2026

bot:test
MARKERS="spyre and prefix_caching"

@joerunde
Collaborator

joerunde commented Feb 5, 2026

bot:test
MARKERS="spyre and chunked_prefill"

…llow None in pattern

Collaborator

@joerunde joerunde left a comment

No idea what the venv thing was about on the GHA workers 🤷

LGTM if we wanna go ahead and get this in!
