feat: move the EPP build docker file #3555
Signed-off-by: Anna Tchernych <[email protected]>
Walkthrough
Introduces a multi-stage Dockerfile for building and running the GAIE EPP binary, updates the GAIE build script to use GAIE paths and copy a Dockerfile artifact, and applies a patch that adds Dynamo-based pre-request and KV scorer plugins plus a version bump to v0.5.1.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant CLI as Build Script (GAIE)
participant Git as Patch Set
participant Docker as Docker Builder
participant Img as GAIE EPP Image
CLI->>CLI: Prepare GAIE dirs (lib/include)
CLI->>CLI: Copy headers, libs, Dockerfile.epp
CLI->>Git: Apply epp-v0.5.1-dyn2.patch
alt Patch applied
CLI-->>CLI: Proceed
else Patch already applied / skipped
CLI-->>CLI: Continue without error
end
CLI->>Docker: docker build -f container/Dockerfile.epp
Docker->>Docker: Build stage (golang 1.24) compile epp
Docker->>Docker: Runtime stage (ubuntu) add deps, user
Docker-->>Img: Produce GAIE EPP image
sequenceDiagram
autonumber
participant Client
participant Gateway as Inference Gateway
participant PreReq as Pre-Request Plugin (dynamo_inject_workerid)
participant Router as Router
participant Scorer as KV Scorer (dynamo_kv_scorer)
participant Backend
Client->>Gateway: Request
Gateway->>PreReq: Execute pre-request
note right of PreReq: Inject worker ID from Dynamo
PreReq-->>Gateway: Mutated request
Gateway->>Router: Route candidates
Router->>Scorer: Score candidates (KV-aware)
note right of Scorer: Uses KV signals for scoring
Scorer-->>Router: Ranked candidates
Router->>Backend: Dispatch
Backend-->>Client: Response
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (2)
27-38: Dockerfile name mismatch: Makefile expects Dockerfile.dynamo but repo/script provide Dockerfile.epp
The target uses “-f Dockerfile.dynamo” while the artifact is Dockerfile.epp. Builds will fail.
Options:
- Rename/copy the file as Dockerfile.dynamo, or
- Change the Makefile target to use container/Dockerfile.epp, e.g.:
- $(IMAGE_BUILD_CMD) -f Dockerfile.dynamo -t $(IMAGE_TAG) \
+ $(IMAGE_BUILD_CMD) -f container/Dockerfile.epp -t $(IMAGE_TAG) \
Keep build script and Makefile consistent.
803-806: Per-request state stored on Server struct causes data races and cross-request leakage
workerIDHint/tokenDataHint live on Server and are mutated per request. The process can handle multiple streams concurrently; this invites races and wrong association across requests.
Refactor to store per-request state:
- Keep hints in a per-stream/request context struct and pass to handler phases, or
- Use gRPC stream context to carry values between header/body phases, or
- Echo via ext_proc dynamic metadata between phases.
Avoid shared mutable fields on Server for request-local data.
Also applies to: 765-792
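The first refactoring option can be sketched as follows. This is a minimal illustration, not the code from the patch: `requestState`, `handleHeaders`, `handleBody`, and `process` are hypothetical names, and only the field names `workerIDHint` and `tokenDataHint` come from the review comment.

```go
package main

import (
	"fmt"
	"sync"
)

// requestState holds hints that belong to exactly one request/stream.
// The struct is hypothetical; only the field names come from the review.
type requestState struct {
	workerIDHint  string
	tokenDataHint string
}

// Server keeps only shared, read-only configuration. No per-request fields.
type Server struct {
	name string
}

// handleHeaders is a hypothetical header phase: it writes the hint into the
// per-request state instead of onto the Server.
func (s *Server) handleHeaders(st *requestState, workerID string) {
	st.workerIDHint = workerID
}

// handleBody is a hypothetical body phase: it reads the hint written by the
// header phase of the same request.
func (s *Server) handleBody(st *requestState) string {
	return fmt.Sprintf("%s routed to worker %s", s.name, st.workerIDHint)
}

// process models one stream. The state is created per call, so concurrent
// streams never share it.
func (s *Server) process(workerID string) string {
	st := &requestState{}
	s.handleHeaders(st, workerID)
	return s.handleBody(st)
}

func main() {
	srv := &Server{name: "epp"}
	results := make([]string, 4)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = srv.process(fmt.Sprintf("w%d", i))
		}(i)
	}
	wg.Wait()
	for _, r := range results {
		fmt.Println(r)
	}
}
```

Because each call to `process` owns its own `requestState`, the goroutines above share nothing mutable except distinct slice slots, so running this under `go run -race` should stay quiet.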
deploy/inference-gateway/build-epp-dynamo.sh (1)
86-96: Patch path likely incorrect
The repo path shows deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch, but the script uses epp-patches/v0.5.1-2/...
-PATCH_FILE="${DYNAMO_DIR}/deploy/inference-gateway/epp-patches/v0.5.1-2/epp-v0.5.1-dyn2.patch"
+PATCH_FILE="${DYNAMO_DIR}/deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch"
🧹 Nitpick comments (3)
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (2)
615-645: Base64 token_data decode silently ignores malformed inputs
Non-blocking, but consider emitting a debug log when decode/unmarshal fails to aid diagnosis.
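A minimal sketch of that suggestion, assuming the payload is a base64-encoded JSON array of token IDs. The payload shape and the names `decodeTokenData` and `tokensOrNil` are assumptions for illustration and are not taken from the patch.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"log"
)

// decodeTokenData decodes a base64-encoded JSON array of token IDs.
// The []int64 payload shape is an assumption for illustration.
func decodeTokenData(encoded string) ([]int64, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("base64 decode: %w", err)
	}
	var tokens []int64
	if err := json.Unmarshal(raw, &tokens); err != nil {
		return nil, fmt.Errorf("json unmarshal: %w", err)
	}
	return tokens, nil
}

// tokensOrNil keeps the non-blocking behavior (malformed input yields nil)
// but logs the failure so it can be diagnosed.
func tokensOrNil(encoded string) []int64 {
	tokens, err := decodeTokenData(encoded)
	if err != nil {
		log.Printf("debug: ignoring malformed token_data: %v", err)
		return nil
	}
	return tokens
}

func main() {
	good := base64.StdEncoding.EncodeToString([]byte("[1,2,3]"))
	fmt.Println(tokensOrNil(good))          // valid input
	fmt.Println(tokensOrNil("not base64!")) // malformed input is logged, then ignored
}
```

Wrapping the errors with `%w` keeps the decode stage visible in the log line, which is usually enough to tell a transport problem from a schema problem.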
172-174: Nit: flag name in error message
Message uses “model-server-metrics-scheme”; actual flag is “modelServerMetricsScheme”. Align for clarity.
- return fmt.Errorf("unexpected %q value for %q flag, it can only be set to 'http' or 'https'", *modelServerMetricsScheme, "model-server-metrics-scheme")
+ return fmt.Errorf("unexpected %q value for %q flag, it can only be set to 'http' or 'https'", *modelServerMetricsScheme, "modelServerMetricsScheme")
deploy/inference-gateway/build-epp-dynamo.sh (1)
17-17: Harden script error handling
Add -u and pipefail to catch unset vars and pipeline failures.
-set -e # Exit on any error
+set -euo pipefail # Exit on error, unset vars, and pipeline errors
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
container/Dockerfile.epp (1 hunk)
deploy/inference-gateway/build-epp-dynamo.sh (3 hunks)
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (4)
container/Dockerfile.epp (1)
51-52: Verify ldflags package path for CommitSHA/BuildRef
The path points to pkg/epp/metrics. The PR changes show metrics code under pkg/epp/backend/metrics. Ensure the target variables actually exist to avoid “symbol not found” or unused ldflags.
If the variables live elsewhere, update the path accordingly.
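For context, the Go linker's `-X importpath.name=value` flag only takes effect when a package-level string variable with exactly that import path and name exists. The sketch below declares the two variables in package `main` purely for illustration; where they actually live in the EPP code base is what the comment asks to verify, and `versionString` is a hypothetical helper.

```go
package main

import "fmt"

// CommitSHA and BuildRef are intended to be overridden at link time.
// They are declared in package main here for illustration only; the real
// package path in the EPP code base is not confirmed by this sketch.
var (
	CommitSHA = "unknown"
	BuildRef  = "unknown"
)

// versionString is a hypothetical helper that reports the stamped values.
func versionString() string {
	return fmt.Sprintf("commit=%s ref=%s", CommitSHA, BuildRef)
}

func main() {
	fmt.Println(versionString())
}
```

Built with `go build -ldflags "-X main.CommitSHA=abc123 -X main.BuildRef=v0.5.1"`, the binary would report the stamped values; if the import path in the flag does not match the package that declares the variables, the defaults remain.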
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (2)
1306-1324: KV router pipeline global: confirm thread-safety of C API
The pipeline is a shared global used under RLock; multiple concurrent Score calls can hit dynamo_query_worker_selection_and_annotate. Verify the C pipeline is thread-safe; otherwise, serialize calls or create per-goroutine instances.
If not thread-safe, protect with a mutex around the C call or implement a worker pool.
Also applies to: 1476-1527
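If the C pipeline turns out not to be thread-safe, serializing calls could look roughly like this. The sketch uses a pure-Go stand-in (`unsafePipeline`) instead of cgo so that it runs on its own; `serializedPipeline` and `Score` are hypothetical names, and the real call in the patch is `dynamo_query_worker_selection_and_annotate`.

```go
package main

import (
	"fmt"
	"sync"
)

// unsafePipeline stands in for a C pipeline handle that is NOT safe for
// concurrent use. It is hypothetical and replaces the cgo call.
type unsafePipeline struct {
	calls int // mutated without synchronization inside query
}

func (p *unsafePipeline) query(prompt string) int {
	p.calls++
	return len(prompt)
}

// serializedPipeline guards every call with a mutex so that only one
// goroutine is inside the pipeline at a time.
type serializedPipeline struct {
	mu   sync.Mutex
	pipe *unsafePipeline
}

// Score forwards to the underlying pipeline while holding the mutex.
func (s *serializedPipeline) Score(prompt string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.pipe.query(prompt)
}

func main() {
	sp := &serializedPipeline{pipe: &unsafePipeline{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sp.Score("hello")
		}()
	}
	wg.Wait()
	fmt.Println("calls:", sp.pipe.calls) // prints "calls: 100"
}
```

A plain mutex trades throughput for safety. A worker pool or per-goroutine pipeline instances, as the comment mentions, would allow parallel scoring at the cost of more setup.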
949-970: Prefix indexer Get: LGTM
Deep-copying the pod set under RLock avoids races. Good fix.
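The pattern being approved can be illustrated with a simplified stand-in. The types `podSet` and `indexer` below are hypothetical and much smaller than the real prefix indexer; they only show why returning a copy made under the read lock is safe.

```go
package main

import (
	"fmt"
	"sync"
)

// podSet is a hypothetical set of pod names.
type podSet map[string]struct{}

// indexer is a simplified stand-in for the prefix indexer.
type indexer struct {
	mu   sync.RWMutex
	pods map[uint64]podSet // keyed by prefix hash
}

// Get returns a copy of the set, built while holding the read lock, so
// callers never read or write the shared map afterwards.
func (ix *indexer) Get(hash uint64) podSet {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	src := ix.pods[hash]
	out := make(podSet, len(src))
	for name := range src {
		out[name] = struct{}{}
	}
	return out
}

// Add records a pod for a prefix hash under the write lock.
func (ix *indexer) Add(hash uint64, pod string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if ix.pods[hash] == nil {
		ix.pods[hash] = podSet{}
	}
	ix.pods[hash][pod] = struct{}{}
}

func main() {
	ix := &indexer{pods: map[uint64]podSet{}}
	ix.Add(1, "pod-a")
	snapshot := ix.Get(1)
	snapshot["pod-b"] = struct{}{} // mutates only the copy
	fmt.Println(len(snapshot), len(ix.Get(1))) // prints "2 1"
}
```

Returning the internal map directly would let a caller iterate it while `Add` writes to it, which is a data race in Go even when `Get` itself held the lock.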
deploy/inference-gateway/build-epp-dynamo.sh (1)
37-48: Validate the produced static lib name
The script copies libdynamo_llm_capi.a after building “-p libdynamo_llm”. Ensure the crate produces libdynamo_llm_capi.a (staticlib). If not, build the correct crate/package.
If needed:
-cargo build --release -p libdynamo_llm
+cargo build --release -p libdynamo_llm_capi
Or adjust the copied filename to match Cargo output.
Also applies to: 58-59
Overview:
DEP-504