Skip to content

Commit 96e6aca

Browse files
committed
docs: add research metrics framework, reproducibility guide, and engineering standards
1 parent 132cd32 commit 96e6aca

File tree

2 files changed

+403
-11
lines changed

2 files changed

+403
-11
lines changed

CONTRIBUTING.md

Lines changed: 241 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,241 @@
1+
# Contributing & Engineering Standards
2+
3+
This document defines the non-negotiable quality bar for every change to this repository.
4+
Code is reviewed as if it will be read by a distributed systems expert and a security researcher simultaneously.
5+
6+
---
7+
8+
## Mandatory Workflow
9+
10+
### Before writing any code
11+
12+
1. Read **all** existing files in the affected directory
13+
2. State your plan in one paragraph — what changes, what breaks, why this approach
14+
3. Identify which existing tests will be affected
15+
16+
### For every change
17+
18+
1. Write the **test first** (TDD)
19+
2. Run existing tests — confirm they pass before your change
20+
3. Implement the code
21+
4. Run **all** tests again — zero regressions allowed
22+
5. Fix the code if a test fails — never fix the test to hide a real bug
23+
24+
### After every change
25+
26+
```bash
27+
ruff check . && ruff format --check . # Python linting + formatting
28+
mypy --strict <changed_file> # type checking
29+
bandit -r services/ # security scan
30+
```
31+
32+
A change is not done until all three pass with zero errors.
33+
34+
---
35+
36+
## Definition of Done
37+
38+
A task is **DONE** only when all boxes are checked:
39+
40+
- [ ] All new code has tests (happy path + edge case + failure case)
41+
- [ ] All existing tests still pass
42+
- [ ] `ruff check` returns 0 errors
43+
- [ ] `mypy --strict` returns 0 errors
44+
- [ ] `bandit` returns no HIGH or MEDIUM findings
45+
- [ ] No hardcoded secrets, URLs, or magic numbers
46+
- [ ] README updated if observable behaviour changed
47+
- [ ] `CHANGELOG.md` entry added
48+
49+
---
50+
51+
## Code Quality — Python
52+
53+
- **Type hints** on every function signature — no exceptions
54+
- **Docstring** on every public function explaining *why*, not *what*
55+
- **Maximum function length:** 40 lines — decompose if longer
56+
- **No bare `except:`** — always catch specific exceptions (`ValueError`, `HTTPException`, etc.)
57+
- **No hardcoded values** — use environment variables or named constants at module top
58+
- **Async functions** must have explicit timeout handling (`asyncio.wait_for` or `httpx` timeout)
59+
- **Logging levels:**
60+
- `INFO` — state changes (round completed, tenant registered)
61+
- `DEBUG` — data payloads (weight shapes, sample counts)
62+
- `ERROR` — failures requiring operator attention
63+
64+
### Example — correct style
65+
66+
```python
67+
FL_MIN_CLIENTS: int = int(os.getenv("FL_MIN_CLIENTS", "2"))
68+
AGGREGATION_TIMEOUT_S: float = float(os.getenv("FL_AGGREGATION_TIMEOUT_S", "30.0"))
69+
70+
async def aggregate_round(state: FLState) -> None:
71+
"""
72+
Run one FedAvg aggregation round over all pending client updates.
73+
74+
Rejects updates containing NaN weights before averaging to prevent
75+
gradient poisoning from a malfunctioning client.
76+
"""
77+
async with asyncio.wait_for(state.round_lock.acquire(), timeout=AGGREGATION_TIMEOUT_S):
78+
try:
79+
_validate_updates(state.pending_updates)
80+
await _fedavg(state)
81+
except ValueError as exc:
82+
logger.error("Aggregation rejected: %s", exc)
83+
raise
84+
finally:
85+
state.round_lock.release()
86+
```
87+
88+
---
89+
90+
## Code Quality — Tests
91+
92+
Every public function needs **at least 3 test cases**:
93+
94+
| Case | Description |
95+
|------|-------------|
96+
| Happy path | Normal input, expected output |
97+
| Edge case | Empty input, zero samples, boundary values |
98+
| Failure case | What happens when input is malformed or a dep fails |
99+
100+
### Naming convention
101+
102+
```
103+
test_<function>_<condition>_<expected_result>
104+
105+
# Good:
106+
test_fedavg_with_unequal_sample_sizes_weights_proportionally
107+
test_fedavg_with_all_zero_samples_raises_value_error
108+
test_submit_update_with_nan_weights_rejects_and_logs
109+
110+
# Bad:
111+
test_aggregation
112+
test_fedavg_works
113+
```
114+
115+
### Rules
116+
117+
- Use `pytest` fixtures for repeated setup — no copy-paste between tests
118+
- Mock all external services (`httpx`, `kubernetes` API) — never call real endpoints
119+
- Minimum coverage: **80 %** per file, **90 %** for `fl-coordinator/main.py`
120+
121+
---
122+
123+
## Kubernetes / Helm Standards
124+
125+
Every manifest must have:
126+
127+
- `resources.requests` **and** `resources.limits` on every container
128+
- `livenessProbe` **and** `readinessProbe` on every Deployment
129+
- `NetworkPolicy` must be explicit: deny-all default, then allow only named traffic
130+
- Secrets via `secretKeyRef` — never inline in `env[].value`
131+
- Helm values fields must have an inline comment explaining the field
132+
133+
### Example — correct NetworkPolicy
134+
135+
```yaml
136+
# Allow only fl-client pods in fl-enabled namespaces to reach the coordinator.
137+
# All other ingress is denied by the cluster-wide default-deny policy.
138+
apiVersion: networking.k8s.io/v1
139+
kind: NetworkPolicy
140+
metadata:
141+
name: fl-coordinator-ingress
142+
namespace: fl-system
143+
spec:
144+
podSelector:
145+
matchLabels:
146+
app: fl-coordinator
147+
policyTypes:
148+
- Ingress
149+
ingress:
150+
- from:
151+
- namespaceSelector:
152+
matchLabels:
153+
fl-enabled: "true"
154+
podSelector:
155+
matchLabels:
156+
component: fl-client
157+
ports:
158+
- protocol: TCP
159+
port: 8080
160+
```
161+
162+
---
163+
164+
## Federated Learning — Invariants
165+
166+
These must **never** be violated:
167+
168+
| Invariant | Enforcement |
169+
|-----------|-------------|
170+
| Coordinator stores no raw data between rounds | `pending_updates` cleared after every `_aggregate()` call |
171+
| NaN weights are rejected before aggregation | `_validate_updates()` runs before FedAvg |
172+
| Race condition on concurrent submissions is prevented | `asyncio.Lock` wraps all mutations to `pending_updates` |
173+
| Every FL round emits Prometheus metrics | `rounds_completed`, `clients_participated`, `aggregation_duration` |
174+
| Tenant ID validated against registered set | Check `tenant_id in state.registered_tenants` before accepting |
175+
176+
---
177+
178+
## Security Rules
179+
180+
- Never log the `x-fl-secret` value, even partially (`secret[:4]` is still forbidden)
181+
- Rate-limit `/submit-update`: max 10 requests/minute per `tenant_id`
182+
- Validate all inputs with Pydantic — no manual string parsing
183+
- `bandit` scan must return no HIGH or MEDIUM findings before merge
184+
185+
---
186+
187+
## Commit Message Format
188+
189+
```
190+
type(scope): short description (max 72 chars)
191+
192+
WHY this change was needed (1-3 sentences).
193+
What alternatives were considered and why this approach was chosen.
194+
195+
Fixes: #issue-number (if applicable)
196+
```
197+
198+
**Types:** `feat` · `fix` · `test` · `refactor` · `perf` · `docs` · `ci` · `chore`
199+
200+
### Examples
201+
202+
```
203+
feat(fl-coordinator): add NaN weight validation before FedAvg aggregation
204+
205+
Malformed weights from a buggy client can corrupt the global model silently.
206+
Validation rejects the update and logs an error without crashing the round.
207+
Alternative considered: clip NaN to zero — rejected because it hides client bugs.
208+
```
209+
210+
```
211+
test(fl-client): add edge case for zero-sample training round
212+
213+
FedAvg denominator is total_samples; zero samples causes division by zero.
214+
This test confirms the coordinator raises ValueError and skips aggregation.
215+
```
216+
217+
```
218+
fix(networkpolicy): restrict fl-client egress to coordinator only
219+
220+
Without this policy, fl-client pods could reach other tenant namespaces,
221+
violating the data isolation guarantee. Default-deny + explicit allow added.
222+
```
223+
224+
---
225+
226+
## Changelog Format
227+
228+
Add an entry to `CHANGELOG.md` for every non-trivial change:
229+
230+
```markdown
231+
## [Unreleased]
232+
233+
### Added
234+
- NaN weight validation in FL coordinator before FedAvg aggregation (#12)
235+
236+
### Fixed
237+
- NetworkPolicy now correctly restricts fl-client egress to fl-system only (#15)
238+
239+
### Changed
240+
- HPA CPU threshold reduced from 70% to 40% for faster burst response in experiments (#18)
241+
```

0 commit comments

Comments
 (0)