Commit 1a6bc2e

research: iris controller dry-run mode analysis
Analyze controller startup flow, scheduling loop, worker sync, autoscaler, and checkpoint systems. Identify all side-effectful operations that need gating for a --dry-run flag.
1 parent b140fd2 commit 1a6bc2e

1 file changed

Lines changed: 177 additions & 0 deletions

@@ -0,0 +1,177 @@
# Iris Controller Dry-Run Mode: Codebase Analysis

## Controller Startup Flow

### Entry point: `lib/iris/src/iris/cluster/controller/main.py`

The controller daemon is a Click CLI:

```
cli -> serve command (main.py:34-239)
```

`serve` takes `--host`, `--port`, `--scheduler-interval`, `--config`, `--log-level`, `--checkpoint-path`, `--checkpoint-interval`.

Startup sequence:
1. Load cluster config from YAML (`load_config()`): `main.py:81`
2. Resolve `remote_state_dir` and `local_state_dir`: `main.py:88-100`
3. Restore or create local SQLite DB from checkpoint: `main.py:106-121`
4. Create provider via `make_provider()`: `main.py:124`
5. Create autoscaler (if not K8sTaskProvider): `main.py:128-165`
6. Create `ControllerConfig` dataclass: `main.py:182-193`
7. Instantiate `Controller(config, provider, autoscaler, db)`: `main.py:196`
8. Call `controller.start()`: `main.py:208`
9. Register SIGTERM/SIGINT handler that checkpoints and stops: `main.py:218-237`
10. Block on `stop_event.wait()`: `main.py:239`

### Controller.__init__: `controller.py:712-799`

Creates: `ControllerDB`, `LogStore`, `ControllerTransitions`, `Scheduler`, `BundleStore`, `ControllerServiceImpl` (RPC), `ControllerDashboard` (HTTP + dashboard UI).

### Controller.start(): `controller.py:832-871`

Spawns background threads depending on provider type:

**Worker provider path** (standard):
- `_run_scheduling_loop` thread: task assignment
- `_run_provider_loop` thread: heartbeats/sync with workers
- `_run_profile_loop` thread: periodic CPU profiling
- `_run_autoscaler_loop` thread (if autoscaler configured)
- uvicorn server thread (dashboard + RPC)

**K8sTaskProvider path** (direct):
- `_run_direct_provider_loop` thread: combined scheduling + sync
- uvicorn server thread
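
The thread layout above can be sketched roughly as follows. This is only an illustration under stated assumptions: `start_loops`, the loop names, and the placeholder loop body are hypothetical, and the uvicorn server thread is left out. It is not the actual `Controller.start()` code.

```python
import threading


def start_loops(is_direct_provider: bool, has_autoscaler: bool, stop: threading.Event):
    """Start one daemon thread per background loop and return the threads."""
    if is_direct_provider:
        # K8sTaskProvider path: one combined scheduling + sync loop.
        names = ["direct-provider-loop"]
    else:
        # Worker provider path: separate loops, autoscaler only if configured.
        names = ["scheduling-loop", "provider-loop", "profile-loop"]
        if has_autoscaler:
            names.append("autoscaler-loop")

    def run(name: str) -> None:
        # Placeholder body: a real loop would schedule, sync, profile, or scale
        # once per interval until the stop event is set.
        while not stop.wait(timeout=0.01):
            pass

    threads = [threading.Thread(target=run, args=(n,), name=n, daemon=True) for n in names]
    for t in threads:
        t.start()
    return threads


stop = threading.Event()
threads = start_loops(is_direct_provider=False, has_autoscaler=True, stop=stop)
thread_names = [t.name for t in threads]
stop.set()
for t in threads:
    t.join()
```

A shared stop event like this gives a dry-run flag one obvious place to decide which loops start at all, as an alternative to gating inside each loop.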

## Main Loop: Scheduling (`_run_scheduling_loop`)

Location: `controller.py:921-936`

Calls `_run_scheduling()` (`controller.py:1202-1315`) each cycle. This is the core scheduling function:

1. **Read reservation claims** from DB: `controller.py:1224`
2. **Clean up stale claims** (dead workers, finished jobs): `controller.py:1225`
3. **Claim workers for reservations**: `controller.py:1226` → `_claim_workers_for_reservations()` at `controller.py:1151`
4. **Read pending tasks** via `_schedulable_tasks()`: `controller.py:1232`
5. **Read healthy workers**: `controller.py:1233`
6. **Filter tasks**: check scheduling deadlines, reservation gates, per-job caps; `controller.py:1250-1273`
7. **Inject reservation taints** on workers: `controller.py:1281-1282`
8. **Create SchedulingContext**: `controller.py:1286-1291`
9. **Phase 1**: Preference pass (steer reservation tasks to claimed workers); `controller.py:1295`
10. **Phase 2**: Normal scheduler `find_assignments()`; `controller.py:1298`
11. **Buffer assignments**: `_buffer_assignments()` → `transitions.queue_assignments()`; `controller.py:1303`
12. **Cache diagnostics** for unassigned jobs: `controller.py:1315`
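
To make the two-phase structure in steps 9 and 10 concrete, here is a toy sketch. Everything in it is an assumption for illustration: the `Task` and `Worker` shapes, the `claimed_by_job` field, the first-fit rule, and the reading of "taints" as "claimed workers are excluded from the normal pass". The real `find_assignments()` is not shown in this analysis and is likely more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: str
    job_id: str


@dataclass
class Worker:
    worker_id: str
    free_slots: int = 1
    claimed_by_job: Optional[str] = None  # hypothetical reservation claim marker


def assign(tasks, workers):
    """Return (task_id, worker_id) pairs: preference pass, then a normal pass."""
    assignments = []
    remaining = []
    # Phase 1: steer tasks to a worker claimed for their job, if one has room.
    for task in tasks:
        target = next(
            (w for w in workers if w.claimed_by_job == task.job_id and w.free_slots > 0),
            None,
        )
        if target is None:
            remaining.append(task)
            continue
        target.free_slots -= 1
        assignments.append((task.task_id, target.worker_id))
    # Phase 2: first-fit over unclaimed workers for everything left over.
    for task in remaining:
        target = next(
            (w for w in workers if w.claimed_by_job is None and w.free_slots > 0),
            None,
        )
        if target is not None:
            target.free_slots -= 1
            assignments.append((task.task_id, target.worker_id))
    return assignments


workers = [Worker("w1"), Worker("w2", claimed_by_job="job-a")]
tasks = [Task("t1", "job-b"), Task("t2", "job-a"), Task("t3", "job-b")]
result = assign(tasks, workers)
```

The point for dry-run purposes is that both phases are pure computation over data already read; only the buffering step that follows writes anything.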

### Where task assignment happens

- `_buffer_assignments()` at `controller.py:1359-1367`: calls `transitions.queue_assignments(command)`, which writes ASSIGNED state to DB and enqueues dispatch batches
- `transitions.queue_assignments()` at `transitions.py:840`: the actual DB mutation

## Provider Sync / Heartbeat Loop (`_run_provider_loop`)

Location: `controller.py:972-987`

Calls `_sync_all_execution_units()` (`controller.py:1454-1553`):

1. **Reap stale workers**: `controller.py:1460`
2. **Drain dispatch batches** for all healthy workers: `transitions.drain_dispatch_all()` at `controller.py:1464`
3. **Provider.sync(batches)**: sends RPCs to workers (start tasks, heartbeat, collect status); `controller.py:1470`
4. **Apply results**: `transitions.apply_heartbeats_batch()` for successes, `transitions.fail_heartbeat()` for failures; `controller.py:1475-1527`
5. **Kill tasks** on workers if needed: `controller.py:1529-1530`
6. **Notify autoscaler** of worker failures: `controller.py:1501-1523`

### Direct provider path (`_sync_direct_provider`)

Location: `controller.py:1005-1019`

For K8sTaskProvider:
1. `transitions.drain_for_direct_provider()`: gets tasks to run, running tasks, tasks to kill
2. `provider.sync(batch)`: applies to K8s
3. `transitions.apply_direct_provider_updates()`: applies results

## Autoscaler Loop (`_run_autoscaler_loop`)

Location: `controller.py:952-970`

Calls `_run_autoscaler_once()` (`controller.py:1555-1572`):
1. Build worker status map: `controller.py:1563`
2. `autoscaler.refresh(worker_status_map)`: probes cloud API for VM status
3. Compute demand entries: `controller.py:1566-1571`
4. `autoscaler.update(demand_entries)`: scales up/down VMs

Also runs periodic checkpoints if configured (`controller.py:966-970`).

## Checkpoint/Archive System

- `write_checkpoint()` at `checkpoint.py:76-130`: SQLite hot-backup → upload to `{remote_state_dir}/controller-state/{epoch_ms}/`
- `download_checkpoint_to_local()` at `checkpoint.py:170-218`: find latest checkpoint, download to local DB dir
- `begin_checkpoint()` at `controller.py:1587-1603`: sets `_checkpoint_in_progress=True`, acquires heartbeat lock, writes checkpoint, unsets flag
- Periodic checkpoint runs in autoscaler loop: `controller.py:966-970`
- Atexit checkpoint: `controller.py:913-919`
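
For readers unfamiliar with SQLite hot backups, the sketch below shows the general technique using the standard library's `sqlite3.Connection.backup()`. It is an assumption that `write_checkpoint()` works along these lines; the function name `write_local_checkpoint`, the file name `controller.sqlite3`, and the use of a local directory in place of the remote upload are all hypothetical.

```python
import sqlite3
import tempfile
import time
from pathlib import Path


def write_local_checkpoint(live_db: sqlite3.Connection, state_dir: Path) -> Path:
    """Copy the live DB into {state_dir}/controller-state/{epoch_ms}/ and return the file path."""
    epoch_ms = int(time.time() * 1000)
    target_dir = state_dir / "controller-state" / str(epoch_ms)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "controller.sqlite3"  # hypothetical file name
    dest = sqlite3.connect(target)
    try:
        # Connection.backup copies a consistent snapshot while the source stays usable.
        live_db.backup(dest)
    finally:
        dest.close()
    return target


# Tiny stand-in for the controller DB.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE tasks (task_id TEXT PRIMARY KEY, state TEXT)")
live.execute("INSERT INTO tasks VALUES ('t1', 'PENDING')")
live.commit()

with tempfile.TemporaryDirectory() as tmp:
    path = write_local_checkpoint(live, Path(tmp))
    restored = sqlite3.connect(path)
    rows = restored.execute("SELECT task_id, state FROM tasks").fetchall()
    restored.close()
live.close()
```

Because the backup only reads the live DB, the local copy step is harmless on its own in dry-run mode; the part that needs gating is the write to the remote state directory.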

## Recommended Approach for --dry-run Flag

### What dry-run should do

Show what the controller *would* do without actually doing it. The controller should:
- ✅ Load config, restore DB, create all objects normally
- ✅ Run the scheduling loop to compute assignments
- ✅ Probe workers (heartbeats) to discover cluster state
- ❌ NOT dispatch task assignments to workers (suppress `transitions.queue_assignments()`)
- ❌ NOT start/stop VMs via autoscaler (`autoscaler.update()` should be read-only or skipped)
- ❌ NOT kill tasks on workers
- ❌ NOT write checkpoints (state hasn't changed)
- ❌ NOT modify reservation claims in DB
- ✅ Log what would happen (assignments, autoscaler decisions)

### Where to add the flag

1. **CLI**: Add `--dry-run` to the `serve` command in `main.py:34`.

2. **ControllerConfig**: Add `dry_run: bool = False` field at `controller.py:604`.

3. **Controller**: Gate side-effectful operations on `self._config.dry_run`:

| Method | Location | What to gate |
|--------|----------|-------------|
| `_buffer_assignments()` | `controller.py:1359` | Skip `transitions.queue_assignments()`, log assignments instead |
| `_claim_workers_for_reservations()` | `controller.py:1151` | Skip `transitions.replace_reservation_claims()`, log claims |
| `_cleanup_stale_claims()` | `controller.py:1119` | Skip DB writes |
| `kill_tasks_on_workers()` | `controller.py:1392` | Skip `buffer_kill`/`buffer_direct_kill` |
| `_mark_task_unschedulable()` | `controller.py:1369` | Skip `transitions.mark_task_unschedulable()` |
| `_sync_all_execution_units()` | `controller.py:1454` | Run heartbeats for probing but skip applying dispatch; or skip entirely |
| `_sync_direct_provider()` | `controller.py:1005` | Skip `provider.sync()` |
| `_run_autoscaler_once()` | `controller.py:1555` | Skip `autoscaler.update()` (scale decisions), keep `autoscaler.refresh()` for status |
| `begin_checkpoint()` | `controller.py:1587` | Skip entirely |
| `_maybe_prune()` | `controller.py:938` | Skip entirely |

4. **Logging in dry-run mode**: In `_run_scheduling()`, after computing `all_assignments`, log them:
```python
# Fragment from inside _run_scheduling(); `all_assignments` holds (task_id, worker_id) pairs.
if self._config.dry_run:
    for task_id, worker_id in all_assignments:
        logger.info("[DRY-RUN] Would assign %s -> %s", task_id, worker_id)
    return
```
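
One way to keep the per-method gating in the table from turning into repeated `if` blocks is a small decorator. The following is a minimal sketch, not the project's code: `skip_in_dry_run` is a hypothetical helper, the `ControllerConfig` and `Controller` classes here are stripped-down stand-ins, and the `queued` list replaces the real DB write.

```python
import functools
import logging
from dataclasses import dataclass

logger = logging.getLogger("dry_run_sketch")


@dataclass
class ControllerConfig:
    dry_run: bool = False  # all other fields omitted in this sketch


def skip_in_dry_run(method):
    """Replace a side-effectful method with a log line when config.dry_run is set."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self._config.dry_run:
            logger.info("[DRY-RUN] Would call %s args=%r kwargs=%r", method.__name__, args, kwargs)
            return None
        return method(self, *args, **kwargs)

    return wrapper


class Controller:
    def __init__(self, config: ControllerConfig):
        self._config = config
        self.queued = []  # stand-in for the DB write in transitions.queue_assignments()

    @skip_in_dry_run
    def _buffer_assignments(self, assignments):
        self.queued.extend(assignments)


live = Controller(ControllerConfig())
live._buffer_assignments([("t1", "w1")])
dry = Controller(ControllerConfig(dry_run=True))
dry._buffer_assignments([("t1", "w1")])
```

Returning `None` only suits methods whose result the caller ignores. Methods such as `_sync_all_execution_units()`, where part of the body should still run, would need an explicit check inside the method instead.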

### Alternative: read-only DB wrapper

Instead of scattering `if dry_run` checks, substitute `ControllerTransitions` with a wrapper or subclass that logs mutations instead of executing them. This is cleaner but requires more upfront work, since `ControllerTransitions` is not designed for composition and callers may depend on the return values of the suppressed calls.
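
As a rough illustration of this alternative, the sketch below uses a delegating proxy rather than a subclass, which is a swapped-in technique and not what the text above literally proposes. `Transitions` is a stand-in with made-up methods (`pending_count` is hypothetical), and the set of mutating method names would have to be maintained by hand.

```python
import logging

logger = logging.getLogger("dry_run_sketch")


class Transitions:
    """Stand-in for ControllerTransitions; only the shape matters here."""

    def __init__(self):
        self.assigned = []

    def queue_assignments(self, command):
        self.assigned.append(command)

    def pending_count(self):
        return 3


class DryRunTransitions:
    """Forward reads to the wrapped object; log and drop calls named in `mutating`."""

    def __init__(self, inner, mutating):
        self._inner = inner
        self._mutating = frozenset(mutating)
        self.suppressed = []  # names of calls that were logged instead of executed

    def __getattr__(self, name):
        # Only reached for attributes not set on the proxy itself.
        attr = getattr(self._inner, name)
        if name not in self._mutating:
            return attr

        def logged(*args, **kwargs):
            logger.info("[DRY-RUN] Would call %s args=%r kwargs=%r", name, args, kwargs)
            self.suppressed.append(name)

        return logged


inner = Transitions()
transitions = DryRunTransitions(inner, mutating={"queue_assignments"})
transitions.queue_assignments({"t1": "w1"})
count = transitions.pending_count()
```

The weak point is the hand-maintained `mutating` set: a new write method that is not listed would silently execute in dry-run mode, so an allow-list of read methods may be the safer default.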

### Simplest viable approach

The easiest path with minimal code changes:

1. Add `dry_run: bool` to `ControllerConfig`
2. In `_run_scheduling()` at line 1301: if dry_run, log assignments and return early instead of calling `_buffer_assignments()`
3. In `_claim_workers_for_reservations()`: if dry_run, skip the `transitions.replace_reservation_claims()` call
4. In `_run_autoscaler_once()`: skip `autoscaler.update()`
5. In `_sync_all_execution_units()`: skip the `provider.sync()` call (no RPCs to workers)
6. Skip checkpoints and pruning

This keeps the scheduling logic running normally (reads are fine) but suppresses assignment and claim writes, autoscaler actions, and all worker RPCs. Note that skipping `provider.sync()` also skips heartbeats, so this variant sees only the cluster state already in the restored DB rather than probing workers live. The dashboard + RPC server still runs so you can inspect state.
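
Step 4 might look roughly like the sketch below. `FakeAutoscaler` is a recording stand-in, and `run_autoscaler_once` is a simplified free function rather than the real `_run_autoscaler_once()` method. Treating `refresh()` as read-only is an assumption taken from the gating table, not something verified against the autoscaler code.

```python
import logging

logger = logging.getLogger("dry_run_sketch")


class FakeAutoscaler:
    """Stand-in that records calls; the real autoscaler talks to a cloud API."""

    def __init__(self):
        self.calls = []

    def refresh(self, worker_status_map):
        self.calls.append("refresh")

    def update(self, demand_entries):
        self.calls.append("update")


def run_autoscaler_once(autoscaler, worker_status_map, demand_entries, dry_run):
    # Status probing is treated as read-only here, so it runs in both modes.
    autoscaler.refresh(worker_status_map)
    if dry_run:
        # Log the demand that would have driven scale-up/scale-down decisions.
        for entry in demand_entries:
            logger.info("[DRY-RUN] Would submit demand %r", entry)
        return
    autoscaler.update(demand_entries)


dry_scaler = FakeAutoscaler()
run_autoscaler_once(dry_scaler, {"w1": "healthy"}, ["tpu-v4: 2"], dry_run=True)
live_scaler = FakeAutoscaler()
run_autoscaler_once(live_scaler, {"w1": "healthy"}, ["tpu-v4: 2"], dry_run=False)
```

Logging the demand entries rather than skipping the function outright also gives a partial answer to the third open question below.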

### Open questions

1. Should dry-run mode still accept new job submissions via RPC? Probably yes for testing, but the jobs would never be dispatched.
2. Should dry-run mode replay from a checkpoint without probing workers? This would be useful for offline analysis ("what would the controller do with this checkpoint?").
3. Should the autoscaler compute and *log* demand without acting? Or skip entirely?
