|
| 1 | +# Contributing & Engineering Standards |
| 2 | + |
| 3 | +This document defines the non-negotiable quality bar for every change to this repository. |
| 4 | +Code is reviewed as if it will be read by a distributed systems expert and a security researcher simultaneously. |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## Mandatory Workflow |
| 9 | + |
| 10 | +### Before writing any code |
| 11 | + |
| 12 | +1. Read **all** existing files in the affected directory |
| 13 | +2. State your plan in one paragraph — what changes, what breaks, why this approach |
| 14 | +3. Identify which existing tests will be affected |
| 15 | + |
| 16 | +### For every change |
| 17 | + |
| 18 | +1. Write the **test first** (TDD) |
| 19 | +2. Run existing tests — confirm they pass before your change |
| 20 | +3. Implement the code |
| 21 | +4. Run **all** tests again — zero regressions allowed |
| 22 | +5. Fix the code if a test fails — never fix the test to hide a real bug |
| 23 | + |
| 24 | +### After every change |
| 25 | + |
| 26 | +```bash |
| 27 | +ruff check . && ruff format --check . # Python linting + formatting |
| 28 | +mypy --strict <changed_file> # type checking |
| 29 | +bandit -r services/ # security scan |
| 30 | +``` |
| 31 | + |
| 32 | +A change is not done until all three pass with zero errors. |
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +## Definition of Done |
| 37 | + |
| 38 | +A task is **DONE** only when all boxes are checked: |
| 39 | + |
| 40 | +- [ ] All new code has tests (happy path + edge case + failure case) |
| 41 | +- [ ] All existing tests still pass |
| 42 | +- [ ] `ruff check` returns 0 errors |
| 43 | +- [ ] `mypy --strict` returns 0 errors |
| 44 | +- [ ] `bandit` returns no HIGH or MEDIUM findings |
| 45 | +- [ ] No hardcoded secrets, URLs, or magic numbers |
| 46 | +- [ ] README updated if observable behaviour changed |
| 47 | +- [ ] `CHANGELOG.md` entry added |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## Code Quality — Python |
| 52 | + |
| 53 | +- **Type hints** on every function signature — no exceptions |
| 54 | +- **Docstring** on every public function explaining *why*, not *what* |
| 55 | +- **Maximum function length:** 40 lines — decompose if longer |
| 56 | +- **No bare `except:`** — always catch specific exceptions (`ValueError`, `HTTPException`, etc.) |
| 57 | +- **No hardcoded values** — use environment variables or named constants at module top |
| 58 | +- **Async functions** must have explicit timeout handling (`asyncio.wait_for` or `httpx` timeout) |
| 59 | +- **Logging levels:** |
| 60 | + - `INFO` — state changes (round completed, tenant registered) |
| 61 | + - `DEBUG` — data payloads (weight shapes, sample counts) |
| 62 | + - `ERROR` — failures requiring operator attention |
| 63 | + |
| 64 | +### Example — correct style |
| 65 | + |
| 66 | +```python |
| 67 | +FL_MIN_CLIENTS: int = int(os.getenv("FL_MIN_CLIENTS", "2")) |
| 68 | +AGGREGATION_TIMEOUT_S: float = float(os.getenv("FL_AGGREGATION_TIMEOUT_S", "30.0")) |
| 69 | + |
| 70 | +async def aggregate_round(state: FLState) -> None: |
| 71 | + """ |
| 72 | + Run one FedAvg aggregation round over all pending client updates. |
| 73 | +
|
| 74 | + Rejects updates containing NaN weights before averaging to prevent |
| 75 | + gradient poisoning from a malfunctioning client. |
| 76 | + """ |
| 77 | + async with asyncio.wait_for(state.round_lock.acquire(), timeout=AGGREGATION_TIMEOUT_S): |
| 78 | + try: |
| 79 | + _validate_updates(state.pending_updates) |
| 80 | + await _fedavg(state) |
| 81 | + except ValueError as exc: |
| 82 | + logger.error("Aggregation rejected: %s", exc) |
| 83 | + raise |
| 84 | + finally: |
| 85 | + state.round_lock.release() |
| 86 | +``` |
| 87 | + |
| 88 | +--- |
| 89 | + |
| 90 | +## Code Quality — Tests |
| 91 | + |
| 92 | +Every public function needs **at least 3 test cases**: |
| 93 | + |
| 94 | +| Case | Description | |
| 95 | +|------|-------------| |
| 96 | +| Happy path | Normal input, expected output | |
| 97 | +| Edge case | Empty input, zero samples, boundary values | |
| 98 | +| Failure case | What happens when input is malformed or a dep fails | |
| 99 | + |
| 100 | +### Naming convention |
| 101 | + |
| 102 | +``` |
| 103 | +test_<function>_<condition>_<expected_result> |
| 104 | +
|
| 105 | +# Good: |
| 106 | +test_fedavg_with_unequal_sample_sizes_weights_proportionally |
| 107 | +test_fedavg_with_all_zero_samples_raises_value_error |
| 108 | +test_submit_update_with_nan_weights_rejects_and_logs |
| 109 | +
|
| 110 | +# Bad: |
| 111 | +test_aggregation |
| 112 | +test_fedavg_works |
| 113 | +``` |
| 114 | + |
| 115 | +### Rules |
| 116 | + |
| 117 | +- Use `pytest` fixtures for repeated setup — no copy-paste between tests |
| 118 | +- Mock all external services (`httpx`, `kubernetes` API) — never call real endpoints |
| 119 | +- Minimum coverage: **80 %** per file, **90 %** for `fl-coordinator/main.py` |
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +## Kubernetes / Helm Standards |
| 124 | + |
| 125 | +Every manifest must have: |
| 126 | + |
| 127 | +- `resources.requests` **and** `resources.limits` on every container |
| 128 | +- `livenessProbe` **and** `readinessProbe` on every Deployment |
| 129 | +- `NetworkPolicy` must be explicit: deny-all default, then allow only named traffic |
| 130 | +- Secrets via `secretKeyRef` — never inline in `env[].value` |
| 131 | +- Helm values fields must have an inline comment explaining the field |
| 132 | + |
| 133 | +### Example — correct NetworkPolicy |
| 134 | + |
| 135 | +```yaml |
| 136 | +# Allow only fl-client pods in fl-enabled namespaces to reach the coordinator. |
| 137 | +# All other ingress is denied by the cluster-wide default-deny policy. |
| 138 | +apiVersion: networking.k8s.io/v1 |
| 139 | +kind: NetworkPolicy |
| 140 | +metadata: |
| 141 | + name: fl-coordinator-ingress |
| 142 | + namespace: fl-system |
| 143 | +spec: |
| 144 | + podSelector: |
| 145 | + matchLabels: |
| 146 | + app: fl-coordinator |
| 147 | + policyTypes: |
| 148 | + - Ingress |
| 149 | + ingress: |
| 150 | + - from: |
| 151 | + - namespaceSelector: |
| 152 | + matchLabels: |
| 153 | + fl-enabled: "true" |
| 154 | + podSelector: |
| 155 | + matchLabels: |
| 156 | + component: fl-client |
| 157 | + ports: |
| 158 | + - protocol: TCP |
| 159 | + port: 8080 |
| 160 | +``` |
| 161 | +
|
| 162 | +--- |
| 163 | +
|
| 164 | +## Federated Learning — Invariants |
| 165 | +
|
| 166 | +These must **never** be violated: |
| 167 | +
|
| 168 | +| Invariant | Enforcement | |
| 169 | +|-----------|-------------| |
| 170 | +| Coordinator stores no raw data between rounds | `pending_updates` cleared after every `_aggregate()` call | |
| 171 | +| NaN weights are rejected before aggregation | `_validate_updates()` runs before FedAvg | |
| 172 | +| Race condition on concurrent submissions is prevented | `asyncio.Lock` wraps all mutations to `pending_updates` | |
| 173 | +| Every FL round emits Prometheus metrics | `rounds_completed`, `clients_participated`, `aggregation_duration` | |
| 174 | +| Tenant ID validated against registered set | Check `tenant_id in state.registered_tenants` before accepting | |
| 175 | + |
| 176 | +--- |
| 177 | + |
| 178 | +## Security Rules |
| 179 | + |
| 180 | +- Never log the `x-fl-secret` value, even partially (`secret[:4]` is still forbidden) |
| 181 | +- Rate-limit `/submit-update`: max 10 requests/minute per `tenant_id` |
| 182 | +- Validate all inputs with Pydantic — no manual string parsing |
| 183 | +- `bandit` scan must return no HIGH or MEDIUM findings before merge |
| 184 | + |
| 185 | +--- |
| 186 | + |
| 187 | +## Commit Message Format |
| 188 | + |
| 189 | +``` |
| 190 | +type(scope): short description (max 72 chars) |
| 191 | + |
| 192 | +WHY this change was needed (1-3 sentences). |
| 193 | +What alternatives were considered and why this approach was chosen. |
| 194 | + |
| 195 | +Fixes: #issue-number (if applicable) |
| 196 | +``` |
| 197 | +
|
| 198 | +**Types:** `feat` · `fix` · `test` · `refactor` · `perf` · `docs` · `ci` · `chore` |
| 199 | +
|
| 200 | +### Examples |
| 201 | +
|
| 202 | +``` |
| 203 | +feat(fl-coordinator): add NaN weight validation before FedAvg aggregation |
| 204 | + |
| 205 | +Malformed weights from a buggy client can corrupt the global model silently. |
| 206 | +Validation rejects the update and logs an error without crashing the round. |
| 207 | +Alternative considered: clip NaN to zero — rejected because it hides client bugs. |
| 208 | +``` |
| 209 | +
|
| 210 | +``` |
| 211 | +test(fl-client): add edge case for zero-sample training round |
| 212 | + |
| 213 | +FedAvg denominator is total_samples; zero samples causes division by zero. |
| 214 | +This test confirms the coordinator raises ValueError and skips aggregation. |
| 215 | +``` |
| 216 | +
|
| 217 | +``` |
| 218 | +fix(networkpolicy): restrict fl-client egress to coordinator only |
| 219 | + |
| 220 | +Without this policy, fl-client pods could reach other tenant namespaces, |
| 221 | +violating the data isolation guarantee. Default-deny + explicit allow added. |
| 222 | +``` |
| 223 | +
|
| 224 | +--- |
| 225 | +
|
| 226 | +## Changelog Format |
| 227 | +
|
| 228 | +Add an entry to `CHANGELOG.md` for every non-trivial change: |
| 229 | +
|
| 230 | +```markdown |
| 231 | +## [Unreleased] |
| 232 | +
|
| 233 | +### Added |
| 234 | +- NaN weight validation in FL coordinator before FedAvg aggregation (#12) |
| 235 | +
|
| 236 | +### Fixed |
| 237 | +- NetworkPolicy now correctly restricts fl-client egress to fl-system only (#15) |
| 238 | +
|
| 239 | +### Changed |
| 240 | +- HPA CPU threshold reduced from 70% to 40% for faster burst response in experiments (#18) |
| 241 | +``` |
0 commit comments