Fix Speculative decoding test cases #294
tarukumar merged 2 commits into opendatahub-io:main from vaibhavjainwiz:fix_spec_dec
Conversation
Walkthrough

The command-line argument structure for configuring speculative decoding in two test files was updated. Multiple separate flags for speculative decoding parameters were replaced with a single `--speculative_config` flag that takes a JSON string.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Tester
    participant ModelServingRuntime
    Tester->>ModelServingRuntime: Start with --speculative_config '{...}'
    ModelServingRuntime->>ModelServingRuntime: Parse JSON config for model, tokens, etc.
    ModelServingRuntime->>Tester: Run test with provided speculative decoding parameters
```
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/model_serving/model_runtime/vllm/speculative_decoding/test_granite_7b_lab_draft.py (1)
20-21: Good approach to standardizing the speculative decoding configuration

This change aligns well with the update in the ngram test file, adopting a consistent approach by using a single `--speculative_config` parameter with a JSON string. The configuration properly includes the model path and the number of speculative tokens.

One minor nitpick: consider standardizing the JSON formatting style between files; the draft file has extra spaces in `{ "model"...` while the ngram file doesn't.

```diff
- '{ "model": "/mnt/models/granite-7b-instruct-accelerator", "num_speculative_tokens": 5 }',
+ '{"model": "/mnt/models/granite-7b-instruct-accelerator", "num_speculative_tokens": 5}',
```
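One way to avoid this kind of formatting drift between files is to build the flag value from a dict rather than hand-writing the JSON string. The helper name below is a hypothetical sketch, not part of the test suite; `json.dumps` with default separators produces exactly the compact style suggested above.

```python
import json

# Hypothetical helper: build the --speculative_config value from a dict so the
# JSON formatting is identical in every test file (no stray spaces in braces).
def speculative_config_arg(model: str, num_speculative_tokens: int) -> str:
    config = {"model": model, "num_speculative_tokens": num_speculative_tokens}
    # Default json.dumps separators yield '{"key": value, ...}' with no
    # padding inside the braces.
    return json.dumps(config)

arg = speculative_config_arg("/mnt/models/granite-7b-instruct-accelerator", 5)
print(arg)
# → {"model": "/mnt/models/granite-7b-instruct-accelerator", "num_speculative_tokens": 5}
```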
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tests/model_serving/model_runtime/vllm/speculative_decoding/test_granite_7b_lab_draft.py (1 hunks)
- tests/model_serving/model_runtime/vllm/speculative_decoding/test_granite_7b_lab_ngram.py (1 hunks)
🔇 Additional comments (1)
tests/model_serving/model_runtime/vllm/speculative_decoding/test_granite_7b_lab_ngram.py (1)
20-21: Good refactoring to consolidate the speculative decoding configuration

The change to use a single `--speculative_config` parameter with a JSON string instead of multiple separate flags is a good improvement. It makes the configuration more organized and maintainable by grouping related parameters together.

The JSON format is valid and includes all necessary parameters for ngram-based speculative decoding: model type, number of speculative tokens, and prompt lookup maximum.
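To illustrate the consolidation the review describes, here is a minimal sketch of the before/after shape of the runtime arguments. The individual flag and JSON key names are assumptions for illustration (the review only names the consolidated `--speculative_config` flag and the parameter categories), not the exact flags from the test files.

```python
import json

# Before: several separate flags, one per speculative-decoding parameter
# (flag names below are assumed for illustration).
old_args = [
    "--speculative-model=[ngram]",
    "--num-speculative-tokens=5",
    "--ngram-prompt-lookup-max=4",
]

# After: one flag carrying a single JSON document that groups the same
# parameters together (key names assumed for illustration).
new_config = {
    "model": "[ngram]",
    "num_speculative_tokens": 5,
    "prompt_lookup_max": 4,
}
new_args = [f"--speculative_config={json.dumps(new_config)}"]
print(new_args[0])
```

Grouping the parameters into one JSON value means a new speculative-decoding option needs only a new key in the dict, not a new top-level flag threaded through every test.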
Status of building tag latest: success.
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit