
@kaushikmitr kaushikmitr commented Nov 5, 2025

This pull request expands the latency prediction infrastructure and improves the system's configurability and scalability. The main changes are: increasing the number of prediction server sidecars from 3 to 10, updating all relevant configuration and service definitions accordingly, making the sampling parameters configurable via environment variables, and updating several image references to newer versions. The most important changes, grouped by theme:

Latency Prediction Infrastructure Expansion:

  • Increased the number of latency prediction server sidecars from 3 to 10 in the inferencepool-resources-lp.yaml manifest, including all necessary container definitions, ports, probes, resources, and dedicated storage volumes for each new server (prediction-server-4 through prediction-server-10).
  • Updated the PREDICTION_SERVER_URL environment variable to include all 10 prediction servers, ensuring the main application can route requests to all available predictors.
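
As a rough sketch, the updated env var would enumerate all ten sidecars. The listen ports below (8001 through 8010) are illustrative assumptions, not values taken from the actual manifest:

```yaml
# Hypothetical excerpt from inferencepool-resources-lp.yaml; the sidecar
# ports are assumptions for illustration only.
- name: PREDICTION_SERVER_URL
  value: "http://localhost:8001,http://localhost:8002,http://localhost:8003,http://localhost:8004,http://localhost:8005,http://localhost:8006,http://localhost:8007,http://localhost:8008,http://localhost:8009,http://localhost:8010"
```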

Configuration and Plugin Updates:

  • Added a new plugin type prefix-cache-scorer to the scheduling configuration and included it in both the default and slo scheduling profiles.
  • Introduced a new environment variable LATENCY_QUANTILE_ALPHA for latency prediction configuration, allowing the quantile used for latency estimates to be tuned.
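
A minimal sketch of what these two additions might look like. The field layout follows the common EndpointPickerConfig shape, and the profile contents and the default alpha value are assumptions, not copied from the PR:

```yaml
# Scheduling config sketch (field names and profile contents assumed).
plugins:
- type: prefix-cache-scorer
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: prefix-cache-scorer
- name: slo
  plugins:
  - pluginRef: prefix-cache-scorer
---
# Env var controlling the quantile used for latency estimates,
# e.g. 0.9 for the 90th percentile (the value here is an assumption).
- name: LATENCY_QUANTILE_ALPHA
  value: "0.9"
```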

Image and Version Updates:

  • Updated container image references for the main EPP container, training server, and all prediction servers to use the new latencypredictor-v3 images from a different registry, reflecting a move to newer builds and possibly a new environment.

Sampling Parameter Improvements:

  • Made the Poisson sampling parameters (DefaultSamplingMean and MaxSampledTokens) configurable via environment variables, replacing hardcoded values and improving flexibility for tuning latency prediction sampling.

Logging and Debugging:

  • Moved the log level for composite score calculations in the SLO-aware router from DEBUG to TRACE, so these details are emitted only at the highest verbosity and no longer add noise at ordinary debug levels.
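
The effect of that change can be sketched with the klog/logr verbosity convention used by many Kubernetes components, where a message is emitted only when the configured verbosity meets its level. The numeric values for DEBUG and TRACE below are conventional assumptions, not confirmed from this project's code:

```go
package main

import "fmt"

// Conventional klog/logr verbosity levels (assumed, not taken from the PR).
const (
	DEBUG = 4
	TRACE = 6
)

// shouldLog mimics logr's logger.V(level).Enabled(): a message is emitted
// only when the configured verbosity is at least the message's level.
func shouldLog(verbosity, level int) bool {
	return verbosity >= level
}

func main() {
	// Composite-score logs at TRACE are suppressed at DEBUG verbosity and
	// appear only once verbosity is raised to TRACE or above.
	fmt.Println(shouldLog(DEBUG, TRACE)) // false
	fmt.Println(shouldLog(TRACE, TRACE)) // true
}
```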

These changes collectively enhance the scalability, flexibility, and observability of the latency prediction system.

@BenjaminBraunDev BenjaminBraunDev merged commit 899add9 into BenjaminBraunDev:slo-aware-routing-stage-3 Nov 7, 2025
2 of 4 checks passed