Skip to content

Commit 87478d2

Browse files
Mike's comments
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
1 parent 9036294 commit 87478d2

File tree

2 files changed

+25
-134
lines changed

2 files changed

+25
-134
lines changed

CONTRIBUTING.md

Lines changed: 24 additions & 122 deletions
Original file line numberDiff line numberDiff line change
@@ -63,93 +63,16 @@ For changes that fix broken code or add small changes within a component:
6363

6464
## Feature Testing
6565

66-
The FMA project includes several components that require different testing approaches:
66+
The current testing documentation can be found within the respective components of the [docs folder](docs/).
6767

68-
### Testing FMA Components
69-
70-
#### 1. Dual-Pods Controller Testing
71-
72-
The dual-pods controller manages server-providing Pods in reaction to server-requesting Pods:
73-
74-
* **Unit tests**: Test controller logic in `cmd/dual-pods-controller/`
75-
* **Integration tests**: Verify Pod creation, deletion, and lifecycle management
76-
* **E2E tests**: Run the full controller in a Kubernetes cluster (kind or OpenShift)
77-
* Use `test/e2e/run.sh` for end-to-end testing
78-
* Verify server-requesting Pods trigger server-providing Pod creation
79-
* Test resource allocation and GPU assignment
80-
81-
#### 2. Launcher Testing
82-
83-
The vLLM instance launcher is a persistent management process written in Python:
84-
85-
* **Unit tests**: Located in `inference_server/launcher/tests/`
86-
* `test_launcher.py`: Tests launcher logic and vLLM instance management
87-
* `test_gputranslator.py`: Tests GPU resource translation
88-
* **Integration tests**: Test launcher with actual vLLM instances
89-
* Verify model loading and unloading
90-
* Test sleep/wake functionality
91-
* Validate model swapping capabilities
92-
* **Benchmark tests**: Use `inference_server/benchmark/` for performance testing
93-
* Run scenarios defined in `scenarios.py`
94-
* Measure startup latency and model swap times
95-
96-
#### 3. Launcher-Populator Controller Testing
97-
98-
The launcher-populator controller ensures the right number of launcher pods exist on each node:
99-
100-
* **Unit tests**: Test controller logic in `cmd/launcher-populator/`
101-
* **Integration tests**: Verify LauncherConfig and LauncherPopulationPolicy handling
102-
* **E2E tests**: Validate launcher pod distribution across nodes
103-
* Use `test/e2e/run-launcher-based.sh` for launcher-based testing
104-
105-
#### 4. Custom Resource Definitions (CRDs)
106-
107-
Test the three CRDs defined in `config/crd/`:
108-
109-
* **InferenceServerConfig**: Verify server configuration properties
110-
* **LauncherConfig**: Test launcher process configuration
111-
* **LauncherPopulationPolicy**: Validate launcher pod population rules
112-
113-
### Running Tests
114-
115-
**Go tests:**
116-
117-
```bash
118-
make test
119-
```
120-
121-
**Python tests:**
122-
123-
```bash
124-
cd inference_server/launcher
125-
python -m pytest tests/
126-
```
127-
128-
**E2E tests:**
129-
130-
```bash
131-
# Standard dual-pods test
132-
./test/e2e/run.sh
133-
134-
# Launcher-based test
135-
./test/e2e/run-launcher-based.sh
136-
```
137-
138-
**Benchmark tests:**
139-
140-
```bash
141-
cd inference_server/benchmark
142-
python benchmark_base.py
143-
```
144-
145-
### Code Review Requirements
68+
## Code Review Requirements
14669

14770
* **All code changes** must be submitted as pull requests (no direct pushes)
14871
* **All changes** must be reviewed and approved by a maintainer other than the author
14972
* **All repositories** must gate merges on compilation and passing tests
15073
* **All experimental features** must be off by default and require explicit opt-in
15174

152-
### Commit and Pull Request Style
75+
## Commit and Pull Request Style
15376

15477
* **Pull requests** should describe the problem succinctly
15578
* **Rebase and squash** before merging
@@ -163,66 +86,45 @@ python benchmark_base.py
16386
* See [PR_SIGNOFF.md](PR_SIGNOFF.md) for configuration details
16487
* Required for all contributions per [Developer Certificate of Origin](https://developercertificate.org/)
16588

166-
## Code Organization and Ownership
167-
168-
### Components and Maintainers
169-
170-
* **Components** are the primary unit of code organization (repo scope or directory/package/module within a repo)
171-
* **Maintainers** own components and approve changes
172-
* **Contributors** can become maintainers through sufficient evidence of contribution
173-
* Code ownership is reflected in [OWNERS files](https://go.k8s.io/owners) consistent with Kubernetes project conventions
174-
175-
### Experimental Features in FMA
176-
177-
As an incubating component, FMA encourages fast iteration and exploration with these constraints:
178-
179-
1. **Clear identification** as experimental in code and documentation
180-
2. **Default to off** and require explicit enablement for experimental features
181-
3. **Best effort support** only
182-
4. **Removal if unmaintained** with no one to move it forward
183-
5. **No stigma** to experimental or incubating status
184-
185-
**Naming convention**: Experimental flags must include `experimental` in name (e.g., `--experimental-model-swap-v2=true`)
186-
187-
When adding experimental features:
188-
189-
1. Open pull request with clear experimental designation
190-
2. Maintainer reviews and enforces "off-by-default" gating
191-
3. Provide tests for both on/off states
192-
4. Document the experimental nature in code comments and user documentation
193-
5. When graduating a feature, default to on and remove conditional logic after one release
194-
19589
## API Changes and Deprecation
19690

197-
* **No breaking changes**: Once an API/protocol is in GA release (non-experimental), it cannot be removed or behavior changed
91+
* **No breaking changes**: The no-breaking-changes policy will apply once we reach GA
19892
* **Includes**: All protocols, API endpoints, internal APIs, command line flags/arguments
19993
* **Exception**: Bug fixes that don't impact significant number of consumers (As the project matures, we will be stricter about such changes - Hyrum's Law is real)
200-
* **Versioning**: All protocols and APIs should be versionable with clear forward and backward compatibility requirements. A new version may change behavior and fields.
94+
* **Versioning**: All protocols and APIs should be versionable with clear forward and backward compatibility requirements. A new version may change behavior and fields. For Go modules and Python packages use semver v0.x.x. For Kubernetes API object types we use the Kubernetes versioning structure and evolution rules
20195
* **Documentation**: All APIs must have documented specs describing expected behavior
20296

20397
## Testing Requirements
20498

205-
We use three tiers of testing:
99+
We use two tiers of testing:
206100

207-
1. **Unit tests**: Fast verification of code parts, testing different arguments
101+
1. **Behavioral tests**: Fast verification of code parts, testing different arguments
208102
* Best for fast verification of parts of code, testing different arguments
209103
* Doesn't cover interactions between code
210-
2. **Integration tests**: Testing protocols between components and built artifacts
211-
* Best for testing protocols and agreements between components
212-
* May not model interactions between components as they are deployed
213-
3. **End-to-end (e2e) tests**: Whole system testing including benchmarking
104+
2. **End-to-end (e2e) tests**: Whole system testing including benchmarking
214105
* Best for preventing end to end regression and verifying overall correctness
215106
* Execution can be slow
216107

217-
Strong e2e coverage is required for deployed systems to prevent performance regression. Appropriate test coverage is an important part of code review.
108+
Strong e2e coverage is required for deployed systems to prevent functional regression. Appropriate test coverage is an important part of code review.
218109

219110
## Security
220111

221-
Maintain appropriate security mindset for production serving. The project will establish a project email address for responsible disclosure of security issues that will be reviewed by the project maintainers. Prior to the first GA release we will formalize a security component and process.
112+
Maintain appropriate security mindset for production serving. The project will establish a project email address for responsible disclosure of security issues that will be reviewed by the project maintainers. Prior to the first GA release we will formalize a security component and process. More details on security can be found in the [SECURITY.md](./SECURITY.md) file.
113+
114+
## Project Structure and Ownership
115+
116+
The repository contains the following deployable components:
222117

223-
## Project Structure
118+
| Component | Language | Source | Description |
119+
|---|---|---|---|
120+
| **Dual-Pods Controller** | Go | `cmd/dual-pods-controller/`, `pkg/controller/dual-pods/` | Manages server-providing Pods in reaction to server-requesting Pods. Handles binding, sleep/wake, and readiness relay. |
121+
| **Launcher-Populator Controller** | Go | `cmd/launcher-populator/`, `pkg/controller/launcher-populator/` | Proactively creates launcher pods on nodes based on `LauncherPopulationPolicy` CRDs. |
122+
| **Requester** | Go | `cmd/requester/`, `pkg/server/requester/` | Lightweight binary running in server-requesting Pods. Exposes SPI endpoints for GPU info and readiness relay. |
123+
| **Launcher** | Python | `inference_server/launcher/` | FastAPI service managing multiple vLLM subprocess instances via REST API. |
124+
| **Test Requester** | Go | `cmd/test-requester/` | Test binary simulating a requester with GPU allocation. |
125+
| **Test Server** | Go | `cmd/test-server/` | Test binary simulating a vLLM-like inference server. |
224126

225-
The FMA repository is organized as follows:
127+
The two controllers are deployed via Helm charts in `charts/`.
226128

227129
### Core Organization (`llm-d-incubation/llm-d-fast-model-actuation`)
228130

@@ -271,7 +173,7 @@ This is an **incubating component** in the llm-d ecosystem, focused on fast mode
271173

272174
### Component Ownership
273175

274-
* **Maintainers** are listed in the [OWNERS](OWNERS) file
176+
* **Maintainers** are listed in the [OWNERS](OWNERS) file. The file follows Kubernetes conventions for future Prow compatibility but is not currently consumed by automation. Additional OWNERS files can be added per-directory as the project grows.
275177
* **Contributors** can become maintainers through consistent, quality contributions
276178
* Code ownership follows Kubernetes project conventions with OWNERS files
277179

PR_SIGNOFF.md

Lines changed: 1 addition & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -121,18 +121,7 @@ For Windows users, **Git Bash** is also highly recommended.
121121

122122
2. Press `Enter` to select the default option if prompted to set a save-file or passphrase for the key (you may choose to enter a passphrase if desired; this will prompt you to enter the passphrase every time you perform a DCO sign-off).
123123
- The following output should generate a `randomart` image
124-
3. Use the following command to copy the **public** part of the new SSH key to your clipboard:
125-
126-
```shell
127-
clip < ~/.ssh/id_ed25519.pub
128-
```
129-
130-
Note: If you are in a WSL shell, use instead
131-
132-
```shell
133-
clip.exe < ~/.ssh/id_ed25519.pub
134-
```
135-
124+
3. Copy the contents of ~/.ssh/id_ed25519.pub to your clipboard
136125
4. After copying or saving your SSH key, navigate to **Settings** in your Github.
137126
5. Navigate to the **SSH and GPG keys** page under the Access section in the sidebar.
138127
6. Under SSH keys, select **New SSH key**.

0 commit comments

Comments
 (0)