70 commits
6aca289
feat: update to use new kv-cache UDS tokenizer (#609)
zdtsw Feb 18, 2026
e294cfe
fix: Add kustomization file for rbac (#601)
albertoperdomo2 Feb 19, 2026
a0c8d17
Fix panic in SGLang proxy handling of concurrent requests (#632)
yangligt2 Feb 19, 2026
1519a28
Add otel tracing instrumentation (#506)
sallyom Feb 25, 2026
ed55c9c
bump kvc import to v0.5.1-rc2 (#657)
vMaroon Feb 27, 2026
1dae683
pull in v0.6.0 of kvcache (#660)
Gregory-Pereira Feb 28, 2026
f278bfe
deps(go): bump go.opentelemetry.io/otel/sdk from 1.39.0 to 1.40.0 (#661)
dependabot[bot] Feb 28, 2026
9fe0948
deps(go): bump the go-dependencies group across 1 directory with 2 up…
dependabot[bot] Feb 28, 2026
416d4a0
fix(docs): Updatede development guide (#666)
gyliu513 Mar 1, 2026
8437ae1
Optimized request prefill error messages (#652)
learner0810 Mar 1, 2026
f21dcac
fix(makefile): use shell variable for kv-cache path in UDS tokenizer …
gyliu513 Mar 1, 2026
dbb0b5d
fix: remove kustomize dependency (#665)
gyliu513 Mar 1, 2026
70a3fd0
deps(go): bump the kubernetes group with 5 updates (#673)
dependabot[bot] Mar 2, 2026
db7a7ba
deps(go): bump the go-dependencies group across 1 directory with 5 up…
dependabot[bot] Mar 3, 2026
0ccf2ed
deps(actions): bump lycheeverse/lychee-action from 2.7.0 to 2.8.0 (#671)
dependabot[bot] Mar 3, 2026
9903474
ci: add dev image workflow for main and release branches (#668)
pierDipi Mar 3, 2026
a4f9d45
deps(actions): bump crate-ci/typos from 1.43.5 to 1.44.0 (#670)
dependabot[bot] Mar 3, 2026
c34ad8f
fix(ci): update Trivy to v0.69.2 (#675)
pierDipi Mar 4, 2026
c910eeb
Allow sidecar server to reload TLS certificates (#607)
pierDipi Mar 4, 2026
091312c
use trivy action for scanning (#688)
elevran Mar 9, 2026
da0d089
deps(go): bump the go-dependencies group with 7 updates (#692)
dependabot[bot] Mar 10, 2026
927052d
feat(sidecar): simplify TLS command line options with StringSlice fla…
gyliu513 Mar 10, 2026
bd3ba8c
fix terminolgy and add links (#695)
elevran Mar 10, 2026
3ce43fe
replace map[string]bool with map[string]struct{} (#696)
roytman Mar 10, 2026
12c2dd7
add make targets for presubmit (#687)
elevran Mar 10, 2026
7675f18
run newer version with explicit auth tokens (#698)
elevran Mar 10, 2026
f5a626e
remove extra trivy params (#702)
elevran Mar 10, 2026
8112a3a
fix: simplify InferencePool flag to namespace/name format (#685)
gyliu513 Mar 10, 2026
9083ec0
Trivy complains of user without password (#704)
elevran Mar 11, 2026
0f30fa6
fix(test): Add unit test for pd_prerequest.go (#706)
gyliu513 Mar 11, 2026
700325d
remove trivy cache and enable workflow dispatch (#713)
elevran Mar 12, 2026
be97ee1
initial E/PD extension of the sidecar (#643)
roytman Mar 12, 2026
3e62967
Check for uniqueness of media URLs (#717)
roytman Mar 15, 2026
e0f7b8d
move typo checking from tools makefile to main, under lint (#719)
elevran Mar 15, 2026
d1a19ef
rename EncoderPodsHeader according to other constants (#721)
roytman Mar 16, 2026
2078503
Implement Options pattern for sidecar proxy (#697)
Mohamedma96 Mar 16, 2026
fb7e3af
rename common constants (#722)
roytman Mar 16, 2026
b9a4a82
deps(actions): bump dorny/paths-filter from 3 to 4 (#723)
dependabot[bot] Mar 17, 2026
2d38fc1
enable major version updates to gh actions (#714)
elevran Mar 17, 2026
e8e709d
NonCachedTokens defines the minimum number of non-cached tokens requi…
modassarrana89-new Mar 17, 2026
89cbbbb
Add external tokenizer PrepareData plugin and TokenizedPrompt scorer …
acardace Mar 17, 2026
37fac64
Deprecate the workaround used to support vLLM Data Parallel on Istio …
shmuelk Mar 18, 2026
63914ae
build: remove CGO dependency by migrating to pure-Go ZMQ (#728)
elevran Mar 18, 2026
c3f9eb6
deps(go): bump google.golang.org/grpc from 1.79.2 to 1.79.3 (#737)
dependabot[bot] Mar 19, 2026
16211df
use kubectl kustomize and not standalone (#741)
elevran Mar 19, 2026
9ff7747
feat: add idle pod config to active-request-scorer (#669)
dagrayvid Mar 19, 2026
888eaa8
[build] Optimize docker build (local and CICD) (#740)
elevran Mar 19, 2026
4cd7046
feat: speculative indexing for PrecisePrefixCacheScorer (#659)
bongwoobak Mar 19, 2026
a80ba25
remove obsolete hashBlockSize (#748)
roytman Mar 22, 2026
09e438b
feat: add optional Prometheus monitoring to Kind dev environment (#742)
hexfusion Mar 22, 2026
4793819
fix: support podman-docker in e2e tests and Makefile (#730)
hexfusion Mar 23, 2026
fce0fff
Remove UDS tokenizer image build from inference scheduler repo (#739)
elevran Mar 23, 2026
2e9e558
deps(go): bump the kubernetes group with 5 updates (#754)
dependabot[bot] Mar 23, 2026
2da439c
test: add disruption e2e tests for scheduler failure scenarios (#735)
hexfusion Mar 24, 2026
f729b15
Unified Disaggregate Handler (#732)
roytman Mar 24, 2026
62bb5ed
[build] Add test coverage reporting (#749)
elevran Mar 25, 2026
5b233fd
[refactor] Unify sidecar `Config` and `Options` (#751)
elevran Mar 25, 2026
8ce7248
[cicd[ Make coverage comparison optional (#762)
elevran Mar 25, 2026
24008df
temporary fix to use previous simulator image (#766)
roytman Mar 26, 2026
e8d31cc
Combine EncodeHeaderHandler and PrefillHeaderHandler into a single Di…
roytman Mar 26, 2026
2645f86
Prevent mismatch between new and deprecated APIs (#756)
roytman Mar 26, 2026
5192fb5
Revert IGW import to 1.4.0 and update tokenizer plugin accordingly (#…
vMaroon Mar 27, 2026
7b664bb
fix(test): increase test coverage for prefix_based_pd_decider.go (#715)
gyliu513 Mar 27, 2026
7407edd
Run unit, integration tests and builds in a container (#521)
acardace Mar 27, 2026
b65173a
update Makefile to run e2e tests locally with Podman (#775)
roytman Mar 29, 2026
e99a11d
implement context-length-aware plugin (scorer/filter) (#550)
vMaroon Mar 29, 2026
45232c9
Build images use Go builtin cross compilation (#776)
shmuelk Mar 29, 2026
a423802
feat: bump llm-d-kv-cache for MM-aware prefix-cache routing (#772)
vMaroon Mar 29, 2026
f2379f1
change simulator version to v0.8.1 (#773)
mayabar Mar 29, 2026
ea4610d
add missing newline at end of OWNERS, make script robust (#779)
elevran Mar 29, 2026
10 changes: 4 additions & 6 deletions .github/actions/docker-build-and-push/action.yml
@@ -19,15 +19,11 @@ inputs:
  prerelease:
  required: true
  description: indicates whether or not this is a pre-release (not a release) build
- python-version:
- required: false
- description: Python version to use (defaults to 3.12)
- default: '3.12'
  runs:
  using: "composite"
  steps:
  - name: Set up Docker Buildx
- uses: docker/setup-buildx-action@v3
+ uses: docker/setup-buildx-action@v4

  - name: Login to GitHub Container Registry
  run: echo "${{ inputs.github-token }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
@@ -49,7 +45,9 @@ runs:
  fi
  docker buildx build \
  --platform linux/amd64,linux/arm64 \
- --build-arg PYTHON_VERSION=${{ inputs.python-version }} \
+ --cache-from type=gha,scope=${{ inputs.image-name }} \
+ --cache-to type=gha,mode=max,scope=${{ inputs.image-name }} \
+ --build-arg LDFLAGS="-s -w" \
  -t ${{ inputs.registry }}/${{ inputs.image-name }}:${{ inputs.tag }} \
  ${LATEST_TAG} -f ${{ inputs.docker-file }} --push .
  shell: bash
22 changes: 10 additions & 12 deletions .github/actions/trivy-scan/action.yml
@@ -1,19 +1,17 @@
  name: Trivy Scan
- description: Scan container image with Trivy
+ description: Scan container image with official Aqua Security Trivy action
  inputs:
  image:
  required: true
  description: "Image to scan (e.g., 'my-repo/my-image:latest')"

  runs:
  using: "composite"
  steps:
- - name: Install Trivy
- run: |
- wget https://github.com/aquasecurity/trivy/releases/download/v0.44.1/trivy_0.44.1_Linux-64bit.deb
- sudo dpkg -i trivy_0.44.1_Linux-64bit.deb
- shell: bash
-
-
- - name: Scan image
- run: |
- trivy image --severity HIGH,CRITICAL --no-progress ${{ inputs.image }}
- shell: bash
+ - name: Run Trivy vulnerability scanner
+ uses: aquasecurity/trivy-action@0.35.0
+ with:
+ image-ref: ${{ inputs.image }}
+ format: 'table'
+ severity: 'HIGH,CRITICAL'
+ exit-code: '1'
11 changes: 8 additions & 3 deletions .github/dependabot.yml
@@ -24,9 +24,9 @@ updates:
  update-types: ["version-update:semver-major", "version-update:semver-minor"]
  - dependency-name: "sigs.k8s.io/*"
  update-types: ["version-update:semver-major", "version-update:semver-minor"]
- # Ignore major updates for all packages
+ # Ignore major updates for all Go packages
  - dependency-name: "*"
- update-types: ["version-update:semver-major"]
+ update-types: ["version-update:semver-major"]
  groups:
  go-dependencies:
  patterns:
@@ -46,8 +46,13 @@ updates:
  - "release-note-none"
  commit-message:
  prefix: "deps(actions)"
+ # No "ignore" block here: This allows major version updates
+ groups:
+ github-actions:
+ patterns:
+ - "*"

- # 3. Docker base image updates (e.g., for Dockerfile FROM lines)
+ # 3. Docker base image updates
  - package-ecosystem: "docker"
  directory: "/"
  schedule:
2 changes: 1 addition & 1 deletion .github/workflows/auto-assign.yaml
@@ -40,7 +40,7 @@ jobs:
  }
  # Parse OWNERS
- while IFS= read -r line; do
+ while IFS= read -r line || [[ -n "$line" ]]; do
  # Skip comments/empty
  [[ -z "${line// }" || "$line" =~ ^[[:space:]]*# ]] && continue
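Note on the loop change above: `read` returns a non-zero status when it hits end of file, but it still stores any final unterminated text in `line`. Without the extra `-n` check, the last OWNERS entry is silently skipped whenever the file lacks a trailing newline. The following is a minimal sketch of that difference, not workflow code; the function names and the temporary file are assumptions for illustration, and it uses the POSIX `[ -n ... ]` form, which behaves the same as `[[ -n ... ]]` here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of auto-assign.yaml.

# Old loop shape: a last line without a trailing newline is dropped,
# because read reports failure for it and the loop body never runs.
count_lines_old() (
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
)

# New loop shape: when read fails at EOF but "$line" is non-empty,
# the loop body still runs once for the final partial line.
count_lines_new() (
  n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
)

tmp="$(mktemp)"
printf 'approvers:\n- alice\n- bob' > "$tmp"   # deliberately no trailing newline
echo "old loop saw $(count_lines_old "$tmp") lines"
echo "new loop saw $(count_lines_new "$tmp") lines"
rm -f "$tmp"
```

With the three-line sample file, the old loop counts 2 lines and the new loop counts 3.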
2 changes: 1 addition & 1 deletion .github/workflows/check-typos.yaml
@@ -13,5 +13,5 @@ jobs:
  uses: actions/checkout@v6

  - name: Check typos
- uses: crate-ci/typos@v1.43.5
+ uses: crate-ci/typos@v1.44.0

63 changes: 63 additions & 0 deletions .github/workflows/ci-build-images.yaml
@@ -0,0 +1,63 @@
name: Build and Push Container Images

on:
workflow_call:
inputs:
epp-image-name:
required: true
type: string
sidecar-image-name:
required: true
type: string
tag:
required: true
type: string
prerelease:
required: true
type: string
secrets:
GHCR_TOKEN:
required: true

jobs:
build-epp:
runs-on: ubuntu-latest
steps:
- name: Checkout source
uses: actions/checkout@v6

- name: Build and push EPP image
uses: ./.github/actions/docker-build-and-push
with:
docker-file: Dockerfile.epp
tag: ${{ inputs.tag }}
image-name: ${{ inputs.epp-image-name }}
registry: ghcr.io/llm-d
github-token: ${{ secrets.GHCR_TOKEN }}
prerelease: ${{ inputs.prerelease }}

- name: Run Trivy scan on EPP image
uses: ./.github/actions/trivy-scan
with:
image: ghcr.io/llm-d/${{ inputs.epp-image-name }}:${{ inputs.tag }}

build-sidecar:
runs-on: ubuntu-latest
steps:
- name: Checkout source
uses: actions/checkout@v6

- name: Build and push sidecar image
uses: ./.github/actions/docker-build-and-push
with:
docker-file: Dockerfile.sidecar
tag: ${{ inputs.tag }}
image-name: ${{ inputs.sidecar-image-name }}
registry: ghcr.io/llm-d
github-token: ${{ secrets.GHCR_TOKEN }}
prerelease: ${{ inputs.prerelease }}

- name: Run Trivy scan on sidecar image
uses: ./.github/actions/trivy-scan
with:
image: ghcr.io/llm-d/${{ inputs.sidecar-image-name }}:${{ inputs.tag }}
39 changes: 39 additions & 0 deletions .github/workflows/ci-dev.yaml
@@ -0,0 +1,39 @@
name: CI - Dev - Docker Container Image

on:
push:
branches:
- main
- 'release-*'
workflow_dispatch:

jobs:
set-params:
runs-on: ubuntu-latest
outputs:
project_name: ${{ steps.version.outputs.project_name }}
sidecar_name: ${{ steps.version.outputs.sidecar_name }}
tag: ${{ steps.tag.outputs.tag }}
steps:
- name: Set image names
id: version
run: |
repo="${GITHUB_REPOSITORY##*/}"
echo "project_name=${repo}-dev" >> "$GITHUB_OUTPUT"
echo "sidecar_name=llm-d-routing-sidecar-dev" >> "$GITHUB_OUTPUT"

- name: Set branch name as tag
id: tag
run: |
echo "tag=${GITHUB_REF_NAME}" >> "$GITHUB_OUTPUT"

build-and-push:
needs: set-params
uses: ./.github/workflows/ci-build-images.yaml
with:
epp-image-name: ${{ needs.set-params.outputs.project_name }}
sidecar-image-name: ${{ needs.set-params.outputs.sidecar_name }}
tag: ${{ needs.set-params.outputs.tag }}
prerelease: "true"
secrets:
GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
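Note on the `set-params` job above: the EPP image name comes from the `${GITHUB_REPOSITORY##*/}` parameter expansion, which strips the owner prefix, and the tag is the branch name. A minimal sketch of how the pieces combine into the image reference that `ci-build-images.yaml` scans follows. The helper name and the example repository and branch values are assumptions, not taken from a real run.

```shell
#!/bin/sh
# Hypothetical helper mirroring the "Set image names" and
# "Set branch name as tag" steps; not part of ci-dev.yaml.
# $1: a GITHUB_REPOSITORY-style value (owner/repo)
# $2: a GITHUB_REF_NAME-style value (branch name)
dev_image_ref() {
  repo="${1##*/}"                       # drop the longest prefix ending in '/'
  echo "ghcr.io/llm-d/${repo}-dev:${2}"
}

# Example inputs are assumptions for illustration only.
dev_image_ref "llm-d/llm-d-inference-scheduler" "main"
dev_image_ref "llm-d/llm-d-inference-scheduler" "release-0.6"
```

Under these assumed inputs, the first call prints `ghcr.io/llm-d/llm-d-inference-scheduler-dev:main`. Because the tag is the raw branch name, each push to a branch overwrites that branch's dev image.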
101 changes: 57 additions & 44 deletions .github/workflows/ci-pr-checks.yaml
@@ -16,75 +16,88 @@ jobs:
  steps:
  - name: Checkout source
  uses: actions/checkout@v6
- - uses: dorny/paths-filter@v3
+ - uses: dorny/paths-filter@v4
  id: filter
  with:
  filters: |
  src:
  - '**/*.go'
- - '**/*.py'
- - Dockerfile.epp
- - Dockerfile.sidecar
+ - Dockerfile.*
  - Makefile*
  - go.mod
+ - scripts/**
  lint-and-test:
  needs: check-changes
  if: ${{ needs.check-changes.outputs.src == 'true' }}
  runs-on: ubuntu-latest
  steps:
- - name: Free Disk Space (Ubuntu)
- uses: jlumbroso/free-disk-space@main
- with:
- tool-cache: false
-
  - name: Checkout source
  uses: actions/checkout@v6

  - name: Sanity check repo contents
  run: ls -la

- - name: Extract Go version from go.mod
- run: sed -En 's/^go (.*)$/GO_VERSION=\1/p' go.mod >> $GITHUB_ENV
+ - name: Create and resolve Go cache volumes
+ id: go-cache
+ run: |
+ docker volume create llm-d-gomodcache
+ docker volume create llm-d-gobuildcache
+ echo "mod=$(docker volume inspect llm-d-gomodcache -f '{{.Mountpoint}}')" >> $GITHUB_OUTPUT
+ echo "build=$(docker volume inspect llm-d-gobuildcache -f '{{.Mountpoint}}')" >> $GITHUB_OUTPUT

- - name: Set up Go with cache
- uses: actions/setup-go@v6
+ - name: Cache Go modules and build cache
+ uses: actions/cache@v4
  with:
- go-version: "${{ env.GO_VERSION }}"
- cache-dependency-path: ./go.sum
-
- - name: Configure CGO for Python
- run: |
- PYTHON_INCLUDE=$(python3 -c "import sysconfig; print(sysconfig.get_path('include'))")
- echo "CPATH=${PYTHON_INCLUDE}:${CPATH}" >> $GITHUB_ENV
- echo "CGO_ENABLED=1" >> $GITHUB_ENV
- echo "CGO_CFLAGS=$(python3-config --cflags --embed)" >> $GITHUB_ENV
- echo "CGO_LDFLAGS=$(python3-config --ldflags --embed)" >> $GITHUB_ENV
+ path: |
+ ${{ steps.go-cache.outputs.mod }}
+ ${{ steps.go-cache.outputs.build }}
+ key: go-cache-${{ hashFiles('go.sum') }}
+ restore-keys: |
+ go-cache-

- - name: Set PKG_CONFIG_PATH
- run: echo "PKG_CONFIG_PATH=/usr/lib/pkgconfig" >> $GITHUB_ENV
+ - name: Run make lint
+ run: make lint

- - name: Install dependencies
- run: |
- go mod tidy
- sudo -E env "PATH=$PATH" make install-dependencies install-python-deps
+ - name: Run make build
+ shell: bash
+ run: make build

- - name: Run lint checks
- uses: golangci/golangci-lint-action@v9
+ # Restore the baseline saved from the last main push so the compare step
+ # can diff against it. Missing cache (first PR ever) is not an error;
+ # compare-coverage.sh reports all components as "new" and exits 0.
+ - name: Restore main branch coverage baseline
+ if: github.event_name == 'pull_request'
+ uses: actions/cache/restore@v4
  with:
- version: "v2.8.0"
- args: "--config=./.golangci.yml"
- skip-cache: true
- env:
- CGO_ENABLED: ${{ env.CGO_ENABLED }}
- CGO_CFLAGS: ${{ env.CGO_CFLAGS }}
- CGO_LDFLAGS: ${{ env.CGO_LDFLAGS }}
- CPATH: ${{ env.CPATH }}
- PKG_CONFIG_PATH: ${{ env.PKG_CONFIG_PATH }}
+ path: coverage/baseline
+ key: coverage-main

- - name: Run make build
+ - name: Run unit tests
  shell: bash
- run: make build
+ run: make test-unit

+ - name: Compare coverage against main baseline
+ if: github.event_name == 'pull_request'
+ continue-on-error: true
+ shell: bash
+ run: make coverage-compare BASELINE_DIR=coverage/baseline
+
+ # Copy the freshly generated profiles into coverage/baseline so that
+ # the saved path matches the restored path on future PR runs.
+ - name: Stage coverage baseline for caching
+ if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+ shell: bash
+ run: |
+ mkdir -p coverage/baseline
+ cp coverage/*.out coverage/baseline/
+
+ - name: Save coverage baseline (main branch only)
+ if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+ uses: actions/cache/save@v4
+ with:
+ path: coverage/baseline
+ key: coverage-main
+
- - name: Run make test
+ - name: Run make test-e2e
  shell: bash
- run: make test
+ run: make test-e2e
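Note on the coverage steps above: the workflow comment says a missing baseline is not an error and that every component is then reported as "new". The sketch below illustrates only that control flow. It is not the repository's `compare-coverage.sh`, whose contents do not appear in this diff. The function name, the `.pct` file layout (one number per file) and the output format are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the real compare-coverage.sh.
# Assumes each baseline file holds a single number: total coverage percent.
# $1: component name, $2: baseline directory, $3: current coverage percent
compare_component() {
  name="$1"
  baseline_dir="$2"
  current="$3"
  base_file="${baseline_dir}/${name}.pct"

  # No baseline yet (for example the first PR): report and succeed.
  if [ ! -f "$base_file" ]; then
    echo "${name}: new (no baseline)"
    return 0
  fi

  base="$(cat "$base_file")"
  # awk does the floating-point arithmetic that plain sh cannot.
  awk -v n="$name" -v b="$base" -v c="$current" \
    'BEGIN { printf "%s: %.1f%% -> %.1f%% (%+.1f)\n", n, b, c, c - b }'
}

dir="$(mktemp -d)"
compare_component epp "$dir" 71.5      # baseline missing
echo 70.0 > "$dir/epp.pct"
compare_component epp "$dir" 71.5      # baseline present
rm -rf "$dir"
```

In the workflow, `continue-on-error: true` on the compare step means even a non-zero exit from the real script would not fail the job.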