Commit 202dc36 (1 parent: c47c25b)

docs: Update FMA docs with ecosystem context, milestone status, and dependencies

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Gloire Rubambiza <gloire@ibm.com>

File tree: 4 files changed, +62 -30 lines
README.md

Lines changed: 42 additions & 17 deletions
```diff
@@ -1,7 +1,13 @@
-The llm-d-fast-model-actuation repository contains work on one of the
-many areas of work that contribute to fast model actuation. This area
-concerns exploiting techniques in which an inference server process
-dramatically changes its properties and behavior over time.
+The llm-d-fast-model-actuation repository is part of the
+[llm-d](https://github.com/llm-d) ecosystem for serving large
+language models on Kubernetes. FMA lives in the
+[llm-d-incubation](https://github.com/llm-d-incubation) organization,
+where new llm-d components are developed before graduation.
+
+This repository contains work on one of the many areas of work that
+contribute to fast model actuation. This area concerns exploiting
+techniques in which an inference server process dramatically changes
+its properties and behavior over time.
 
 There are two sorts of changes contemplated here. Both are currently
 realized only for vLLM and nvidia's GPU operator, but we hope that
@@ -38,26 +44,45 @@ _server-requesting Pod_, which describes a desired inference server
 but does not actually run it, and (b) a _server-providing Pod_, which
 actually runs the inference server(s).
 
-The topics above are realized by two software components, as follows.
+The topics above are realized by the following software components.
 
-- A vLLM instance launcher, the persistent management process
-  mentioned above. This is written in Python and the source code is in
-  the [inference_server/launcher](inference_server/launcher)
-  directory.
-
-- A "dual-pods" controller, which manages the server-providing Pods
+- A **dual-pods controller**, which manages the server-providing Pods
   in reaction to the server-requesting Pods that other manager(s)
   create and delete. This controller is written in the Go programming
   language and this repository's contents follow the usual conventions
   for one containing Go code.
 
-We are currently in the midst of a development roadmap with three
-milestones. We are currently polishing off milestone 2, which involves
-using vLLM sleep/wake but not the launcher. The final milestone, 3,
-adds the use of the launcher.
+- A **vLLM instance launcher**, the persistent management process
+  mentioned above. This is written in Python and the source code is in
+  the [inference_server/launcher](inference_server/launcher)
+  directory.
+
+- A **launcher-populator** controller, which watches LauncherConfig
+  and LauncherPopulationPolicy custom resources and ensures that the
+  right number of launcher pods exist on each node. This controller is
+  also written in Go.
+
+These controllers are deployed together via a unified Helm chart at
+[charts/fma-controllers](charts/fma-controllers). The chart also
+installs the shared RBAC resources and optional ValidatingAdmissionPolicies.
+
+The repository defines three Custom Resource Definitions (CRDs):
+
+- **InferenceServerConfig** — declares the properties of an inference
+  server (image, command, resources) that server-providing Pods use.
+- **LauncherConfig** — declares the configuration for a launcher
+  process (image, resources, ports) that manages vLLM instances.
+- **LauncherPopulationPolicy** — declares the desired population of
+  launcher pods per node.
+
+These CRD definitions live in [config/crd](config/crd) and the Go
+types are in [pkg/api](pkg/api).
 
-**NOTE**: we are in the midst of a terminology shift, from
-"server-running Pod" to "server-providing Pod".
+The development roadmap has three milestones. Milestone 2, which
+introduced vLLM sleep/wake without the launcher, is finished.
+Milestone 3, which adds launcher-based model swapping where a
+persistent launcher process manages vLLM instances on each node, is
+under implementation.
 
 For further design documentation, see [the docs
 directory](docs/README.md).
```
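To make the new CRD list concrete, here is a minimal sketch of what a LauncherConfig custom resource might look like, built as a plain Python dict. Only the field names mentioned in the diff above (image, resources, ports) are taken from the source; the API group `fma.llm-d.ai`, the version, and every other name are assumptions, not the repository's actual schema from `config/crd`.

```python
# Hypothetical LauncherConfig custom resource, sketched from the fields
# named in the README diff (image, resources, ports). The group/version
# "fma.llm-d.ai/v1alpha1" and all other names are assumed, not taken from
# the real CRD schemas in config/crd.
launcher_config = {
    "apiVersion": "fma.llm-d.ai/v1alpha1",  # assumed group/version
    "kind": "LauncherConfig",
    "metadata": {"name": "example-launcher"},
    "spec": {
        "image": "ghcr.io/example/launcher:latest",  # hypothetical image
        "ports": [{"name": "api", "containerPort": 8080}],
        "resources": {"limits": {"nvidia.com/gpu": "1"}},
    },
}

def looks_like_custom_resource(cr: dict) -> bool:
    """Minimal structural check: the keys every custom resource carries."""
    return all(k in cr for k in ("apiVersion", "kind", "metadata", "spec"))

print(looks_like_custom_resource(launcher_config))  # → True
```

A dict in this shape is what one would hand to a Kubernetes client's custom-object create call; the launcher-populator controller described above would then react to such objects.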

docs/README.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -13,11 +13,15 @@
 
 - [Markdown and Python code quality check](../.github/workflows/python-code-quality.yml)
 - [Go code quality check](../.github/workflows/golangci-lint.yml)
+- [Verify IDL consumption](../.github/workflows/verify-idl-consumption.yml)
 - [Test build of dual-pods controller image](../.github/workflows/build-controller-image.yml)
 - [Test build of launcher image](../.github/workflows/build-launcher-image.yml)
 - [Test build of requester image](../.github/workflows/build-requester-image.yml)
 - [Test build of launcher populator image](../.github/workflows/build-populator-image.yml)
 - [End-to-end testing in CI using a `kind` cluster](../.github/workflows/pr-test-in-kind.yml)
+- [Launcher-based end-to-end testing in CI](../.github/workflows/launcher-based-e2e-test.yml)
+- [End-to-end testing on OpenShift](../.github/workflows/ci-e2e-openshift.yaml)
+- [Signed commits check](../.github/workflows/ci-signed-commits.yaml)
 - [Release – Build Images & Publish Helm Charts to GHCR](../.github/workflows/publish-release.yaml)
 
 # Release
```

docs/dual-pods.md

Lines changed: 8 additions & 10 deletions
```diff
@@ -86,11 +86,10 @@ is not yet supported.
 
 ## Design
 
-Note: this document currently focuses on the design for the second of
-three milestones.
-
-Defining limitation of milestone 2: No use of the launcher. Each
-server-providing Pod runs just one vLLM instance.
+Note: this document covers the design for milestones 2 and 3.
+Milestone 2 (vLLM sleep/wake without the launcher) is finished.
+Milestone 3 (launcher-based model swapping) is under implementation;
+see [launcher](launcher.md) for details on the launcher API.
 
 ### Drawing
 
@@ -508,8 +507,7 @@ ConfigMap is populated with the needed information. The dual-pods
 controller reads the mapping from GPU UUID to index from that
 ConfigMap.
 
-This will change in milestone 3. The launcher will read the
-UUIDs of the GPUs on its node, and the request to launch a vLLM
-instance will carry the list of assigned GPU UUIDs. The launcher will
-translate from UUID to index and put the list of indices in the vLLM
-container's CUDA_VISIBLE_DEVICES.
+In milestone 3, the launcher reads the UUIDs of the GPUs on its node,
+and the request to launch a vLLM instance carries the list of assigned
+GPU UUIDs. The launcher translates from UUID to index and puts the
+list of indices in the vLLM container's CUDA_VISIBLE_DEVICES.
```
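The UUID-to-index translation described in the second hunk can be sketched in a few lines. This is an illustration only, not the launcher's actual code: on a real node the launcher would discover the UUID list via NVML or the GPU operator's ConfigMap, and the helper name is invented here.

```python
# Sketch of the milestone-3 translation from assigned GPU UUIDs to the
# CUDA_VISIBLE_DEVICES index list. The node's UUID list would really come
# from NVML / the GPU operator; it is hard-coded here for illustration.
node_gpu_uuids = [
    "GPU-aaaa1111", "GPU-bbbb2222", "GPU-cccc3333", "GPU-dddd4444",
]  # position i in this list is CUDA device index i

def cuda_visible_devices(assigned_uuids: list[str]) -> str:
    """Translate a launch request's assigned UUIDs into the
    comma-separated index list for CUDA_VISIBLE_DEVICES."""
    index_of = {uuid: i for i, uuid in enumerate(node_gpu_uuids)}
    return ",".join(str(index_of[u]) for u in assigned_uuids)

# A launch request assigned the third and first GPUs on the node:
print(cuda_visible_devices(["GPU-cccc3333", "GPU-aaaa1111"]))  # → 2,0
```

The resulting string would be placed in the vLLM container's environment as `CUDA_VISIBLE_DEVICES=2,0`.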

docs/upstream-versions.md

Lines changed: 8 additions & 3 deletions
```diff
@@ -5,8 +5,13 @@
 
 ## Dependencies
 
-<!-- Add your tracked dependencies using the format below. Remove this comment when populated. -->
-
 | Dependency | Current Pin | Pin Type | File Location | Upstream Repo |
 |-----------|-------------|----------|---------------|---------------|
-<!-- | **example-lib** | `v1.2.3` | tag | `go.mod` line 10 | example-org/example-lib | -->
+| **Go** | `1.24.2` | version | `go.mod` line 3 | [golang/go](https://github.com/golang/go) |
+| **k8s.io/api** | `v0.34.0` | tag | `go.mod` line 7 | [kubernetes/api](https://github.com/kubernetes/api) |
+| **k8s.io/apimachinery** | `v0.34.0` | tag | `go.mod` line 8 | [kubernetes/apimachinery](https://github.com/kubernetes/apimachinery) |
+| **k8s.io/client-go** | `v0.34.0` | tag | `go.mod` line 9 | [kubernetes/client-go](https://github.com/kubernetes/client-go) |
+| **sigs.k8s.io/controller-runtime** | `v0.22.1` | tag | `go.mod` line 12 | [kubernetes-sigs/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) |
+| **vllm/vllm-openai** | `v0.10.2` | tag | `cmd/requester/README.md` | [vllm-project/vllm](https://github.com/vllm-project/vllm) |
+| **vllm (CPU build)** | `v0.15.0` | tag | `dockerfiles/Dockerfile.launcher.cpu` | [vllm-project/vllm](https://github.com/vllm-project/vllm) |
+| **nvidia/cuda** | `12.8.0-base-ubuntu22.04` | tag | `dockerfiles/Dockerfile.requester` | [NVIDIA CUDA](https://hub.docker.com/r/nvidia/cuda) |
```
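Several of the pins in the new table point at `go.mod` lines, so they could be verified mechanically. The sketch below shows one way to do that; the embedded `go.mod` excerpt is illustrative (it just echoes the pins from the table), and a real check would read the repository's actual file.

```python
import re

# Illustrative check of the go.mod pins listed in the dependencies table.
# The excerpt below mirrors the table's pins; a real check would read the
# repository's go.mod from disk instead of a string literal.
GO_MOD = """\
module example/fma

go 1.24.2

require (
\tk8s.io/api v0.34.0
\tk8s.io/apimachinery v0.34.0
\tk8s.io/client-go v0.34.0
\tsigs.k8s.io/controller-runtime v0.22.1
)
"""

def pins(go_mod_text: str) -> dict[str, str]:
    """Extract 'module path -> version' pairs from a require block."""
    return dict(re.findall(r"^\t(\S+) (v\S+)$", go_mod_text, flags=re.M))

print(pins(GO_MOD)["k8s.io/client-go"])  # → v0.34.0
```

A CI step could compare this mapping against the documented table and fail when a `go get -u` bumps a pin without a matching docs update.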
