FMA is currently developed by a small team in a focused development spike. We welcome contributions that align with the project's goals. The FMA project accepts contributions via GitHub pull requests.
There are several ways you can contribute to FMA:
- Reporting Issues: Help us identify and fix bugs by reporting them clearly and concisely.
- Suggesting Features: Share your ideas for new features or improvements.
- Improving Documentation: Help make the project more accessible by enhancing the documentation.
- Submitting Code Contributions: While the project leads maintain final say, code contributions that align with the project's vision are welcome.
This project adheres to the llm-d Code of Conduct and Covenant. By participating, you are expected to uphold this code.
- Developer Slack: Join our developer Slack workspace and participate in the #fast-model-actuation channel to connect with the core maintainers and other contributors, ask questions, and participate in discussions.
- Weekly Meetings: FMA project updates, ongoing work discussions, and Q&A are covered in our weekly project meeting every Tuesday at 8:00 PM ET. Join at meet.google.com/nha-rgkw-qkw.
- Code: Hosted in the llm-d-incubation GitHub organization
- Issues: FMA-specific bugs or issues should be reported in llm-d-incubation/llm-d-fast-model-actuation
- Mailing List: llm-d-contributors@googlegroups.com for document sharing and collaboration
- Social Media: Follow the main llm-d project on social media for the latest news, announcements, and updates:
  - X: https://x.com/_llm_d_
  - LinkedIn: https://linkedin.com/company/llm-d
  - Reddit: https://www.reddit.com/r/llm_d/
  - YouTube: @llm-d-project
We are a small team with defined responsibilities. All proposals must be reviewed by at least one relevant human reviewer, with broader review expected for changes with particularly wide impact.
All features involving public APIs, behavior between core components, or new core repositories/subsystems should be discussed with maintainers before implementation.
Process:
- Create an issue in the FMA repository describing:
  - Summary: A clear description of the proposed change and its outcome
  - Motivation: The problem to be solved, including Goals/Non-Goals and any necessary background
  - Proposal: User stories and enough detail that reviewers can understand what you're proposing
  - Design Details: Specifics of your change, including API specs or code snippets if applicable
  - Alternatives: Alternative implementations considered and why they were rejected
- Discuss in the #fast-model-actuation Slack channel or weekly meeting
- Get review from impacted component maintainers
- Get approval from project maintainers before starting implementation
For changes that fix broken code or add small changes within a component:
- Every bug fix must include a clear description of the bug, how to reproduce it, and how the change addresses it
- Create an issue in the FMA repository or submit a pull request directly for small fixes
- A maintainer must approve the change (within the spirit of the component design and scope of change)
- For moderate size changes, create an RFC issue in GitHub and engage in the #fast-model-actuation Slack channel
The current testing documentation can be found within the respective components of the docs folder.
- All code changes must be submitted as pull requests (no direct pushes)
- All changes must be reviewed and approved by a maintainer other than the author
- All repositories must gate merges on compilation and passing tests
- Pull requests should describe the problem succinctly
- Prefer smaller PRs over larger ones; when a PR adds multiple commits, prefer smaller commits
- Commit messages should have:
  - Short, descriptive titles
  - A description of why the change was needed
  - Enough detail for someone reviewing git history to understand the scope
- DCO Sign-off: All commits must include a valid DCO sign-off line (`Signed-off-by: Name <email@domain.com>`)
  - Required for all contributions per the Developer Certificate of Origin
  - Add automatically with `git commit -s`
  - See PR_SIGNOFF.md for configuration details
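As a concrete illustration of the sign-off workflow above, the sketch below runs `git commit -s` in a throwaway repository and shows the resulting `Signed-off-by` trailer. The name, email, and commit message are placeholders, not project conventions.

```shell
# Demonstrate DCO sign-off in a temporary repository.
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

echo "fix" > fix.txt
git add fix.txt
# -s appends a Signed-off-by trailer built from user.name / user.email
git commit -q -s -m "Fix readiness relay timeout" \
    -m "The relay gave up before slow servers reported ready."

# The trailer is now part of the commit message:
git log -1 --format=%B | grep "Signed-off-by:"
```

Configuring `user.name` and `user.email` once (globally) means every `git commit -s` produces a valid trailer without further setup.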
- Includes: All protocols, API endpoints, internal APIs, command line flags/arguments, and Kubernetes API object type (resource) definitions
- Versioning: We use Semantic Versioning at major version 0 for Go modules and Python packages, which grants freedom to make breaking changes. For Kubernetes API object types we follow the Kubernetes versioning structure and evolution rules (currently at `v1alpha1`). Since the project has no installed base, we currently make changes without regard to backward compatibility.
- Documentation: All APIs must have documented specs describing expected behavior
We use two tiers of testing:
- Behavioral unit tests: Verify individual units of code in isolation
  - Best for fast verification of parts of code, testing different arguments
  - Does not cover interaction between units of code
- End-to-end (e2e) tests: Whole-system testing, including benchmarking
  - Best for preventing end-to-end regressions and verifying overall correctness
  - Execution can be slow
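To make the distinction concrete, here is a toy shell sketch (not project code): the unit-level checks probe individual functions with varied arguments, while the final check exercises the composed pipeline end to end.

```shell
# Toy two-tier testing illustration; all functions here are hypothetical.
set -eu
add() { echo $(( $1 + $2 )); }
double() { echo $(( $1 * 2 )); }
pipeline() { double "$(add "$1" "$2")"; }   # the "whole system"

# Behavioral unit tests: fast, one unit at a time, varied arguments
[ "$(add 2 3)" -eq 5 ]
[ "$(add -1 1)" -eq 0 ]
[ "$(double 4)" -eq 8 ]

# e2e test: slower in real systems, but catches integration regressions
[ "$(pipeline 2 3)" -eq 10 ]
echo "all tiers passed"
```

A unit-level bug in `add` would be caught cheaply by the first tier; a wiring mistake in `pipeline` would only surface in the second.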
Appropriate test coverage is an important part of code review.
Maintain an appropriate security mindset for production serving. The project will establish an email address for responsible disclosure of security issues; disclosures will be reviewed by the project maintainers. Prior to the first GA release we will formalize a security component and process. More details can be found in SECURITY.md.
The repository contains the following deployable components.
| Component | Language | Source | Description |
|---|---|---|---|
| Dual-Pods Controller | Go | `cmd/dual-pods-controller/`, `pkg/controller/dual-pods/` | Manages server-providing Pods (milestone 2) and launched vLLM instances (milestone 3) in reaction to server-requesting Pods. Handles binding, sleep/wake, and readiness relay. |
| Launcher-Populator Controller | Go | `cmd/launcher-populator/`, `pkg/controller/launcher-populator/` | Proactively creates launcher pods on nodes based on LauncherPopulationPolicy CRDs. |
| Requester | Go | `cmd/requester/`, `pkg/server/requester/` | Lightweight binary running in server-requesting Pods. Exposes SPI endpoints for GPU info and readiness relay. |
| Launcher | Python | `inference_server/launcher/` | FastAPI service managing multiple vLLM subprocess instances via REST API. |
| Test Requester | Go | `cmd/test-requester/` | Test binary simulating a requester (does not use real GPUs). |
| Test Server | Go | `cmd/test-server/` | Test binary simulating a vLLM-like inference server. |
| Test Launcher | Python | `dockerfiles/Dockerfile.launcher.cpu` | CPU-based launcher image for testing without GPUs. |
The two controllers are deployed via a single Helm chart in charts/fma-controllers/.
This is an incubating component in the llm-d ecosystem, focused on fast model actuation techniques.
- `api/fma/v1alpha1/`: Custom Resource Definitions (CRDs) and Go types
  - `inferenceserverconfig_types.go`: InferenceServerConfig CRD
  - `launcherconfig_types.go`: LauncherConfig CRD
  - `launcherpopulationpolicy_types.go`: LauncherPopulationPolicy CRD
- `cmd/`: Main applications
  - `dual-pods-controller/`: Controller managing server-providing Pods
  - `launcher-populator/`: Controller managing launcher pod population
  - `requester/`: Requester binary for server-requesting Pods
  - `test-requester/`: Test requester (does not use real GPUs)
  - `test-server/`: Test binary simulating a vLLM-like inference server
- `charts/`: Helm charts for deployment
  - `fma-controllers/`: Unified Helm chart for both controllers
- `config/`: Kubernetes configurations (CRDs, examples, and more; see the cluster-sharing docs for recent extensions)
- `inference_server/`: Python-based inference server components
  - `launcher/`: vLLM instance launcher (persistent management process)
  - `benchmark/`: Benchmarking tools and scenarios
- `docs/`: Documentation (see `docs/README.md` for the full index)
- `test/e2e/`: End-to-end test scripts
  - `run.sh`: Standard dual-pods E2E test
  - `run-launcher-based.sh`: Launcher-based E2E test
- `dockerfiles/`: Container image definitions
  - `Dockerfile.launcher.cpu`: CPU-based launcher image for testing without GPUs
  - `Dockerfile.launcher.benchmark`: GPU-based launcher image (the real deal)
  - `Dockerfile.requester`: Requester application image
- Maintainers are listed in the OWNERS file. The file follows Kubernetes OWNERS conventions for future Prow compatibility but is not currently consumed by automation. Additional OWNERS files can be added per-directory as the project grows.
- Contributors can become maintainers through consistent, quality contributions
FMA is currently in the llm-d-incubation organization, which means:
- Rapid iteration: Greater freedom for testing new ideas and approaches
- Components may change significantly as we learn
- Best effort support: Not yet ready for production use
- Graduation path: Working toward integration with core llm-d components
Graduation criteria are defined by the llm-d organization (not this repo). This repo tracks its progress toward meeting those criteria. See the llm-d organization documentation for details.