Overhaul configuration application and validation.#669
Overhaul configuration application and validation.#669tjohnson31415 wants to merge 29 commits intomainfrom
Conversation
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
b874183 to
dd6414a
Compare
|
👋 Hi! Thank you for contributing to vLLM support on Spyre. We also recommend installing prek and configuring it to check your code before every local commit. |
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Expected default for all models in 1024 and set by CLI arg default. Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
…stry Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
| @@ -0,0 +1,220 @@ | |||
| # Model Configuration Schema | |||
|
bot:test |
| "Granite model detected. For backwards compatibility, " | ||
| "defaulting --max-num-batched-tokens to 1024" | ||
| ) | ||
| vllm_config.scheduler_config.max_num_batched_tokens = 1024 |
There was a problem hiding this comment.
does this disallow user overrides?
There was a problem hiding this comment.
This is to match the current behavior (which is less than ideal):
--max-num-batched-tokenshas a default of 1024 (for vllm serve), but user can set a different value- detecting a granite model will force override the user setting and default
VLLM_DT_CHUNK_LENwill override either of the above
So the current override mechanism for Granite is to set VLLM_DT_CHUNK_LEN...
Note that this behavior is tested in test_cli_args.py and I didn't change any of the test assertions there (just had to change how a granite 8b model was mocked).
There was a problem hiding this comment.
ooof, I guess I got things mixed up. So that's overridable for all other cases like using micro models or running with a different configuration here, but not when we match on the 32x32k @ TP4 case 🤔
We've documented that this is overridable so I do think it might be worth changing to make it so here. We could follow up quickly in another PR if we want to keep this one as just the refactor though. WDYT?
There was a problem hiding this comment.
Yeah... the setting CLI arg defaulting was the last piece to be implemented, but should have been the solution instead of using VLLM_DT_CHUNK_LEN. Your comment does point out that technically I should check for TP=4 here and not just the model name.
We've documented that this is overridable so I do think it might be worth changing to make it so here.
Worth changing the documentation or change the behavior (edge-case breaking change)?
I figured we'd just fix this in the next major release and have the CLI arg be the only way to configure the chunk size.
There was a problem hiding this comment.
edge-case breaking change
is it a breaking change or a bugfix? 😉
I figured we'd just fix this in the next major release and have the CLI arg be the only way to configure the chunk size.
sounds good to me
| config_summary = configurator.configure(vllm_config) | ||
| logger.info(config_summary.format_log_message()) | ||
| # TODO: This is a temporary check for backwards compatibility that should be | ||
| # removed when we can make breaking changes. |
There was a problem hiding this comment.
Is the intent to put max-num-batched-tokens into the configurator once we rip out the non-chunked-prefill code paths?
There was a problem hiding this comment.
So I initially did have max-num-batched-tokens in the configurator, but it was getting messy with this extra overriding.
We dont currently have a chunk size per model / configuration, so it was simpler to just leave it out. Should be easy enough to add back later if we do want per-model defaults in there.
| @@ -1,91 +0,0 @@ | |||
| """Tests for model-specific overrides for granite""" | |||
There was a problem hiding this comment.
just going through the tests now- did these move or are we deleting them?
I had assumed that we would want to keep these tests to ensure that the granite models get configured the same
There was a problem hiding this comment.
Moved to test_integration.py under TestModelMatching and TestGraniteVersionAwareOverrides:
https://github.com/vllm-project/vllm-spyre/pull/669/changes#diff-293f09ec74a17c687d32f5503ca971d7ba5213d4aa6607283dec15d8179d8f75R37
|
Looks like the spyre tests are failing for all the fp8 use cases, can we merge main in here and re-run? |
|
|
||
| def test_match_granite_3_3_cb_config(self, registry, granite_3_3_hf_config, create_vllm_config): | ||
| """Test matching granite-3.3-8b-instruct with CB config and getting configurator.""" | ||
| vllm_config = create_vllm_config( |
There was a problem hiding this comment.
🌶️ , I'd never thought of passing a method back in a fixture
There was a problem hiding this comment.
I woudln't have either.
Thanks Bob for making it fancy; but I'll change it to a plain function 😅
tests/config/test_integration.py
Outdated
| """Tests for applying configuration settings.""" | ||
|
|
||
| @patch.dict(os.environ, {}, clear=True) | ||
| def test_apply_configuration_sets_env_vars( |
There was a problem hiding this comment.
Should these test cases be parameterized to run on both the granite 3 and granite 4 configs? I think we'd want to ensure that granite 4 keeps configuring the same way too
There was a problem hiding this comment.
ah sorry I see class TestGraniteVersionAwareOverrides
Seems like these tests have a lot of overlap - do we need to keep both? I like the specificity here but it is a whole lot of testin'
tests/config/test_integration.py
Outdated
| @pytest.mark.parametrize( | ||
| "model_name, sendnn_configured, sendnn_version, expected_blocks", | ||
| [ | ||
| ("granite-3.3-8b-instruct", True, (0, 0, 0), 8192), |
There was a problem hiding this comment.
can we use the config fixtures that we already have? granite_3_3_hf_config and granite_4_hf_config
vllm_spyre/config/model_config.py
Outdated
| # Quantization config gets extra weight as it's a key differentiator | ||
| if attr_name == "quantization_config" and isinstance(attr_value, dict): | ||
| # Base score for having quantization_config | ||
| score += 10 |
There was a problem hiding this comment.
Was there a specific reason that 10 was picked here?
There was a problem hiding this comment.
Nope, it was just Bob's preference... I'm not a fan of the complexity score stuff anyhow, so I'll revisit this.
| """ | ||
| registry = ModelConfigRegistry.get_instance() | ||
| if not registry._initialized: | ||
| registry.initialize() |
There was a problem hiding this comment.
Thoughts on adding an escape hatch here for users to author their own yaml and configure the path with an environment variable or something similar?
tests/config/test_model_config.py
Outdated
| def test_complexity_score_minimal_pattern(self): | ||
| """Test complexity score for minimal pattern (no attributes).""" | ||
| pattern = ArchitecturePattern(model_name="test-model", model_type="llama") | ||
| assert pattern.complexity_score == 0 |
There was a problem hiding this comment.
Are there also any tests that check that the highest complexity config is actually matched?
(There's a lot of diff here and I don't immediately see any with ctrl+f "complexity")
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
bot:test |
|
bot:test |
…llow None in pattern Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
joerunde
left a comment
There was a problem hiding this comment.
No idea what the venv thing was about on the GHA workers 🤷
LPGTM if we wanna go ahead and get this in!
Description
Refactors the model configuration system from validation-only to a comprehensive registry-based approach:
ModelConfigRegistrymanages configurations from unifiedmodel_configs.yamlModelConfiguratorapplies environment variables and other model-device specific configurationsKey Changes
known_model_configs.jsonandsupported_configs.yamlinto singlemodel_configs.yamlwith YAML anchors for reuseSpyrePlatformto declarative YAMLVLLM_SPYRE_REQUIRE_KNOWN_CONFIGenvironment variable for strict validationBenefits
Backward Compatibility
Maintains all existing model configurations and runtime behavior. New validation is opt-in via
VLLM_SPYRE_REQUIRE_KNOWN_CONFIG=1.Made with IBM Project Bob