
Commit 84e60e1

Mike's comments
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
1 parent 87478d2 commit 84e60e1

2 files changed: +28 -57 lines changed


CONTRIBUTING.md

Lines changed: 26 additions & 46 deletions
@@ -1,8 +1,6 @@
 ## Contributing Guidelines

-Thank you for your interest in contributing to llm-d Fast Model Actuation (FMA). Community involvement is highly valued and crucial for the project's growth and success. The FMA project accepts contributions via GitHub pull requests. This outlines the process to help get your contribution accepted.
-
-To ensure a clear direction and cohesive vision for the project, the project leads have the final decision on all contributions. However, these guidelines outline how you can contribute effectively to FMA.
+FMA is currently developed by a small team in a focused development spike. We welcome contributions that align with the project's goals. The FMA project accepts contributions via GitHub pull requests.

 ## How You Can Contribute

@@ -32,7 +30,7 @@ This project adheres to the llm-d [Code of Conduct and Covenant](CODE_OF_CONDUCT

 ## Contributing Process

-We follow a **lazy consensus** approach: changes proposed by people with responsibility for a problem, without disagreement from others, within a bounded time window of review by their peers, should be accepted.
+We are a small team with defined responsibilities. All proposals must be reviewed by at least one relevant human reviewer, with broader review expected for changes with particularly wide impact.

 ### Types of Contributions

@@ -70,13 +68,11 @@ The current testing documentation can be found within the respective components
 * **All code changes** must be submitted as pull requests (no direct pushes)
 * **All changes** must be reviewed and approved by a maintainer other than the author
 * **All repositories** must gate merges on compilation and passing tests
-* **All experimental features** must be off by default and require explicit opt-in

 ## Commit and Pull Request Style

 * **Pull requests** should describe the problem succinctly
-* **Rebase and squash** before merging
-* **Use minimal commits** and break large changes into distinct commits
+* **Prefer smaller PRs** over larger ones; when a PR adds multiple commits, prefer smaller commits
 * **Commit messages** should have:
   * Short, descriptive titles
   * Description of why the change was needed
@@ -88,43 +84,42 @@ The current testing documentation can be found within the respective components

 ## API Changes and Deprecation

-* **No breaking changes**: The no-breaking-changes policy will apply once we reach GA
-* **Includes**: All protocols, API endpoints, internal APIs, command line flags/arguments
-* **Exception**: Bug fixes that don't impact significant number of consumers (As the project matures, we will be stricter about such changes - Hyrum's Law is real)
-* **Versioning**: All protocols and APIs should be versionable with clear forward and backward compatibility requirements. A new version may change behavior and fields. For Go modules and Python packages use semver v0.x.x. For Kubernetes API object types we use the Kubernetes versioning structure and evolution rules
+* **Includes**: All protocols, API endpoints, internal APIs, command line flags/arguments, and Kubernetes API object type (resource) definitions
+* **Versioning**: We use [Semantic Versioning](https://semver.org) at major version 0 for Go modules and Python packages, which grants freedom to make breaking changes. For Kubernetes API object types we use the Kubernetes versioning structure and evolution rules (currently at `v1alpha1`). Since the project has no installed base, we currently make changes without regard to backward compatibility.
 * **Documentation**: All APIs must have documented specs describing expected behavior

 ## Testing Requirements

 We use two tiers of testing:

-1. **Behavioral tests**: Fast verification of code parts, testing different arguments
+1. **Behavioral unit tests**: Fast verification of individual units of code, testing different arguments
    * Best for fast verification of parts of code, testing different arguments
-   * Doesn't cover interactions between code
+   * Does not cover interaction between units of code
 2. **End-to-end (e2e) tests**: Whole system testing including benchmarking
-   * Best for preventing end to end regression and verifying overall correctness
+   * Best for preventing end-to-end regression and verifying overall correctness
    * Execution can be slow

-Strong e2e coverage is required for deployed systems to prevent functional regression. Appropriate test coverage is an important part of code review.
+Appropriate test coverage is an important part of code review.

 ## Security

 Maintain appropriate security mindset for production serving. The project will establish a project email address for responsible disclosure of security issues that will be reviewed by the project maintainers. Prior to the first GA release we will formalize a security component and process. More details on security can be found in the [SECURITY.md](./SECURITY.md) file.

 ## Project Structure and Ownership

-The repository contains the following deployable components:
+The repository contains the following deployable components.

 | Component | Language | Source | Description |
 |---|---|---|---|
-| **Dual-Pods Controller** | Go | `cmd/dual-pods-controller/`, `pkg/controller/dual-pods/` | Manages server-providing Pods in reaction to server-requesting Pods. Handles binding, sleep/wake, and readiness relay. |
+| **Dual-Pods Controller** | Go | `cmd/dual-pods-controller/`, `pkg/controller/dual-pods/` | Manages server-providing Pods (milestone 2) and launched vLLM instances (milestone 3) in reaction to server-requesting Pods. Handles binding, sleep/wake, and readiness relay. |
 | **Launcher-Populator Controller** | Go | `cmd/launcher-populator/`, `pkg/controller/launcher-populator/` | Proactively creates launcher pods on nodes based on `LauncherPopulationPolicy` CRDs. |
 | **Requester** | Go | `cmd/requester/`, `pkg/server/requester/` | Lightweight binary running in server-requesting Pods. Exposes SPI endpoints for GPU info and readiness relay. |
 | **Launcher** | Python | `inference_server/launcher/` | FastAPI service managing multiple vLLM subprocess instances via REST API. |
-| **Test Requester** | Go | `cmd/test-requester/` | Test binary simulating a requester with GPU allocation. |
+| **Test Requester** | Go | `cmd/test-requester/` | Test binary simulating a requester (does not use real GPUs). |
 | **Test Server** | Go | `cmd/test-server/` | Test binary simulating a vLLM-like inference server. |
+| **Test Launcher** | Python | `dockerfiles/Dockerfile.launcher.cpu` | CPU-based launcher image for testing without GPUs. |

-The two controllers are deployed via Helm charts in `charts/`.
+The two controllers are deployed via a single Helm chart in `charts/fma-controllers/`.

 ### Core Organization (`llm-d-incubation/llm-d-fast-model-actuation`)

@@ -140,59 +135,44 @@ This is an **incubating component** in the llm-d ecosystem, focused on fast mode
 * **`cmd/`**: Main applications
   * `dual-pods-controller/`: Controller managing server-providing Pods
   * `launcher-populator/`: Controller managing launcher pod population
-  * `requester/`: Example requester application
-  * `test-requester/`: Test requester with GPU allocation
-  * `test-server/`: Test server application
+  * `requester/`: Requester binary for server-requesting Pods
+  * `test-requester/`: Test requester (does not use real GPUs)
+  * `test-server/`: Test binary simulating a vLLM-like inference server

 * **`charts/`**: Helm charts for deployment
-  * `dual-pods-controller/`: Helm chart for dual-pods controller
-  * `launcher-populator/`: Helm chart for launcher-populator controller
+  * `fma-controllers/`: Unified Helm chart for both controllers

-* **`config/`**: Kubernetes configurations
-  * `crd/`: CRD YAML definitions
-  * `examples/`: Example configurations and deployments
+* **`config/`**: Kubernetes configurations (CRDs, examples, and more — see [cluster-sharing docs](docs/cluster-sharing.md) for recent extensions)

 * **`inference_server/`**: Python-based inference server components
   * `launcher/`: vLLM instance launcher (persistent management process)
   * `benchmark/`: Benchmarking tools and scenarios

-* **`docs/`**: Documentation
-  * `dual-pods.md`: Dual-pods architecture documentation
-  * `launcher.md`: Launcher component documentation
-  * `e2e-recipe.md`: End-to-end testing guide
-  * `local-test.md`: Local testing instructions
+* **`docs/`**: Documentation (see [`docs/README.md`](docs/README.md) for full index)

 * **`test/e2e/`**: End-to-end test scripts
   * `run.sh`: Standard dual-pods E2E test
   * `run-launcher-based.sh`: Launcher-based E2E test

 * **`dockerfiles/`**: Container image definitions
-  * `Dockerfile.launcher.cpu`: CPU-based launcher image
-  * `Dockerfile.launcher.benchmark`: Benchmark launcher image
+  * `Dockerfile.launcher.cpu`: CPU-based launcher image for testing without GPUs
+  * `Dockerfile.launcher.benchmark`: GPU-based launcher image (the real deal)
   * `Dockerfile.requester`: Requester application image

 ### Component Ownership

-* **Maintainers** are listed in the [OWNERS](OWNERS) file. The file follows Kubernetes conventions for future Prow compatibility but is not currently consumed by automation. Additional OWNERS files can be added per-directory as the project grows.
+* **Maintainers** are listed in the [OWNERS](OWNERS) file. The file follows [Kubernetes OWNERS conventions](https://www.kubernetes.dev/docs/guide/owners/) for future Prow compatibility but is not currently consumed by automation. Additional OWNERS files can be added per-directory as the project grows.
 * **Contributors** can become maintainers through consistent, quality contributions
-* Code ownership follows Kubernetes project conventions with OWNERS files

 ### Incubation Status

 FMA is currently in the **llm-d-incubation** organization, which means:

 * **Rapid iteration**: Greater freedom for testing new ideas and approaches
-* **Experimental features**: Components may change significantly as we learn
+* **Components may change significantly** as we learn
 * **Best effort support**: Not yet ready for production use
 * **Graduation path**: Working toward integration with core llm-d components

-### Graduation Criteria
-
-To graduate to the core `llm-d` organization, FMA must demonstrate:
+### Graduation

-1. **Stability**: Proven reliability in test environments
-2. **Performance**: Measurable improvements in model actuation speed
-3. **Documentation**: Complete user and developer documentation
-4. **Testing**: Comprehensive unit, integration, and E2E test coverage
-5. **Community adoption**: Active use and feedback from early adopters
-6. **API maturity**: Stable APIs ready for production use
+Graduation criteria are defined by the llm-d organization (not this repo). This repo tracks its progress toward meeting those criteria. See the llm-d organization documentation for details.

PR_SIGNOFF.md

Lines changed: 2 additions & 11 deletions
@@ -1,14 +1,6 @@
 # Git Commit Signoff and Signing

-**NOTE**: "sign-off" is different from "signing" a commit. The former
-indicates your assent to the repository's terms for contributors, the
-latter adds a cryptographic signature that is rarely displayed. See
-[the git
-book](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work)
-about signing. For commit signoff, do a web search on `git
-signoff`. GitHub has a concept of [a commit being
-"verified"](https://docs.github.com/en/authentication/managing-commit-signature-verification)
-that extends the Git concept of signing.
+**NOTE:** "DCO sign-off" is different from commit "signing". The former affirms your compliance with the DCO, while the latter adds a cryptographic signature that is rarely displayed. See [the git book](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work) about signing. For commit signoff, do a web search on `git signoff`. GitHub has a concept of [a commit being "verified"](https://docs.github.com/en/authentication/managing-commit-signature-verification) that extends the Git concept of signing.

 In order to get a pull request approved, you must first complete a DCO
 sign-off for each commit that the request is asking to add to the
@@ -20,8 +12,7 @@ repository](https://github.com/llm-d/llm-d/blob/main/DCO). In
 the case of an individual, DCO sign-off is accomplished by doing a Git
 "sign-off" on the commit.

-We prefer that commits contributed to this repository be signed and
-GitHub verified, but this is not strictly necessary or enforced.
+Commits contributed to this repository must be signed and GitHub verified, as enforced by the [signed commits CI check](.github/workflows/ci-signed-commits.yaml).

 ## Commit Sign-off
