add scorer test #3
Merged
This pull request significantly expands the latency prediction infrastructure and improves the configurability and scalability of the system. The main changes include increasing the number of prediction server sidecars from 3 to 10, updating all relevant configuration and service definitions, and making sampling parameters configurable via environment variables. Several image references have also been updated to newer versions. Below are the most important changes grouped by theme:
Latency Prediction Infrastructure Expansion:
- Added seven new prediction server sidecars (`prediction-server-4` through `prediction-server-10`) to the `inferencepool-resources-lp.yaml` manifest, including all necessary container definitions, ports, probes, resources, and dedicated storage volumes for each new server. [1] [2] [3]
- Updated the `PREDICTION_SERVER_URL` environment variable to include all 10 prediction servers, so the main application can route requests to every available predictor.

Configuration and Plugin Updates:
- Added `prefix-cache-scorer` to the scheduling configuration and included it in both the `default` and `slo` scheduling profiles.
- Added the `LATENCY_QUANTILE_ALPHA` setting to the latency prediction configuration, allowing the quantile used for estimation to be adjusted.

Image and Version Updates:
- Updated the `latencypredictor-v3` images to pull from a different registry, reflecting a move to newer builds and possibly a new environment. [1] [2] [3] [4] [5] [6]

Sampling Parameter Improvements:
- Made the sampling parameters (`DefaultSamplingMean` and `MaxSampledTokens`) configurable via environment variables, replacing hardcoded values and making latency prediction sampling easier to tune. [1] [2] [3] [4]

Logging and Debugging:
- Changed the log level from `DEBUG` to `TRACE` for more granular logging during troubleshooting.

These changes collectively enhance the scalability, flexibility, and observability of the latency prediction system.
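Several of the changes above move settings into environment variables: `PREDICTION_SERVER_URL` (now listing all 10 servers), `LATENCY_QUANTILE_ALPHA`, and the formerly hardcoded `DefaultSamplingMean` and `MaxSampledTokens`. The Go sketch below shows one way such settings could be loaded with validation and fallbacks. It is not the code from this PR: the comma-separated URL format, the `SAMPLING_MEAN` and `MAX_SAMPLED_TOKENS` variable names, and every fallback value are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"strconv"
	"strings"
)

// lookupFunc matches os.LookupEnv so a fake environment can be injected.
type lookupFunc func(string) (string, bool)

// HYPOTHETICAL fallbacks and variable names: the PR does not state the real
// defaults or the env var names used for the sampling parameters.
const (
	fallbackQuantileAlpha    = 0.9
	fallbackSamplingMean     = 100.0
	fallbackMaxSampledTokens = 20

	envSamplingMean     = "SAMPLING_MEAN"      // hypothetical name
	envMaxSampledTokens = "MAX_SAMPLED_TOKENS" // hypothetical name
)

// predictorConfig gathers the env-driven settings described in this PR.
type predictorConfig struct {
	ServerURLs          []string
	QuantileAlpha       float64
	DefaultSamplingMean float64
	MaxSampledTokens    int
}

// splitURLs parses a PREDICTION_SERVER_URL value, ASSUMED to be a
// comma-separated list, trimming spaces and dropping empty entries.
func splitURLs(raw string) []string {
	var urls []string
	for _, part := range strings.Split(raw, ",") {
		if u := strings.TrimSpace(part); u != "" {
			urls = append(urls, u)
		}
	}
	return urls
}

// floatEnv returns the parsed value if it lies strictly between lo and hi,
// otherwise the fallback.
func floatEnv(lookup lookupFunc, key string, lo, hi, fallback float64) float64 {
	if raw, ok := lookup(key); ok {
		v, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
		if err == nil && v > lo && v < hi {
			return v
		}
	}
	return fallback
}

// intEnv returns the parsed value if it is positive, otherwise the fallback.
func intEnv(lookup lookupFunc, key string, fallback int) int {
	if raw, ok := lookup(key); ok {
		if v, err := strconv.Atoi(strings.TrimSpace(raw)); err == nil && v > 0 {
			return v
		}
	}
	return fallback
}

// loadPredictorConfig reads every setting through lookup.
func loadPredictorConfig(lookup lookupFunc) predictorConfig {
	raw, _ := lookup("PREDICTION_SERVER_URL") // unset yields an empty list
	return predictorConfig{
		ServerURLs: splitURLs(raw),
		// A quantile level must lie strictly between 0 and 1.
		QuantileAlpha:       floatEnv(lookup, "LATENCY_QUANTILE_ALPHA", 0, 1, fallbackQuantileAlpha),
		DefaultSamplingMean: floatEnv(lookup, envSamplingMean, 0, math.Inf(1), fallbackSamplingMean),
		MaxSampledTokens:    intEnv(lookup, envMaxSampledTokens, fallbackMaxSampledTokens),
	}
}

func main() {
	cfg := loadPredictorConfig(os.LookupEnv)
	fmt.Printf("%d prediction servers, alpha=%.2f, mean=%.1f, maxTokens=%d\n",
		len(cfg.ServerURLs), cfg.QuantileAlpha, cfg.DefaultSamplingMean, cfg.MaxSampledTokens)
}
```

Passing `os.LookupEnv` in production and a map-backed function in tests keeps the loader testable without mutating the process environment. In this sketch invalid values silently fall back to defaults; a real service might prefer to fail fast at startup instead.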