This PR applies partial metric updates even when the PodMetricsClient returns an error because a subset of the metrics failed to process. A partial update is considered better than no update.
A practical issue this fixes: in vLLM v1, if the LoRA adapter is not enabled, the LoRA metrics are not emitted at all, which previously caused all of the pod's metrics to stop being refreshed.
This also makes the client resilient to transient errors where a metric is temporarily missing.
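The behavior above can be sketched as follows. This is a minimal illustration, not the project's actual client: the `Metrics` struct, the `scrape` helper, and the use of a plain `map[string]float64` as the scraped metric families are all assumptions for brevity (the vLLM metric names shown are the commonly exported ones, but treat them as examples). The key point is that each metric is scraped independently, and the partially populated struct is returned alongside the joined error instead of being discarded:

```go
package main

import (
	"errors"
	"fmt"
)

// Metrics holds the subset of metrics the client scrapes from a pod.
// Field names here are illustrative, not the project's actual struct.
type Metrics struct {
	WaitingQueueSize int
	KVCacheUsage     float64
}

// scrape pulls each metric independently; a failure on one metric
// (e.g. LoRA metrics missing in vLLM v1 without LoRA enabled) is
// recorded but does not discard the values already scraped.
func scrape(families map[string]float64) (*Metrics, error) {
	m := &Metrics{}
	var errs []error

	if v, ok := families["vllm:num_requests_waiting"]; ok {
		m.WaitingQueueSize = int(v)
	} else {
		errs = append(errs, errors.New("metric vllm:num_requests_waiting not found"))
	}
	if v, ok := families["vllm:gpu_cache_usage_perc"]; ok {
		m.KVCacheUsage = v
	} else {
		errs = append(errs, errors.New("metric vllm:gpu_cache_usage_perc not found"))
	}
	// Return whatever was scraped alongside the joined error, so the
	// caller can apply a partial update while still logging the failure.
	return m, errors.Join(errs...)
}

func main() {
	// KV-cache metric absent: the partial update still carries queue size.
	m, err := scrape(map[string]float64{"vllm:num_requests_waiting": 3})
	fmt.Println(m.WaitingQueueSize, err != nil)
}
```

The caller then updates the pod's stored metrics from the returned struct regardless of whether the error is non-nil, which is exactly the "partial update is better than no update" policy described above.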
Note 1: For the LoRA metrics, we could technically add a flag indicating whether scraping them can be skipped. That can be a separate follow-up.
Note 2: There should be a separate "conformance test" effort to make sure the supported model servers emit the metrics required by our protocol. The PodMetricsClient shouldn't be responsible for that, so it's safe to optimistically allow partial updates.