
Commit 7b1b2a2

Complete reworking of client/server architecture to set up inference with the Raspberry Pi (#15)
This pull request introduces a simplified structure to the code base: an improved web app for the client, WebSocket connections between the client's server and the inference server, proper logging, and better configuration through YAML files. The memory-buffer architecture was dropped in favor of a simpler approach: a job manager abstraction now handles how jobs are created, how they are triggered, and how they log their output back to the client. HTTPS is also enabled so that the Raspberry Pi's web server can let the user run inference on their own camera. In general, the code in place was simplified toward a simpler architecture and configuration to avoid debugging hassle. The following is a detailed view of the commits that made this possible:

* Adapting client: add inference results display to the client. Implements a WebSocket endpoint on the client to receive and display inference results from the server. This includes:
  - A new "/results" WebSocket route in the client web app
  - Updates to the client-side JavaScript to connect to the new endpoint and display results in a dedicated section
  - Modification of the StreamingClient to use a callback to store the latest inference results
  - Updated default ports of client and server to avoid collisions
* Tweaks to improve server architecture
* Small linter changes
* Client refactor: improved logging calls. Uses the logger instead of print; additionally, the client frontend should reflect the logs correctly
* Better logging management server-side. Logging is now more streamlined and allows for better configuration of these outputs to the client. Also encompasses richer logging, such as GPU metrics
* Implement flexible job system with frame collection and video inference. Added a multi-job architecture to support frame buffering and batch inference:
  - Per-job configuration settings
  - A job factory to instantiate new jobs in a type-safe way
  - WIP FrameCollectionJob and VideoInferenceJob implementations
  - A job control API to start, delete, and retrieve ongoing jobs
  - Decoupled WebSocket job creation to allow any job type
  - Polymorphic handling of responses from jobs
* Centralized configuration with YAML files. Introduced a centralized YAML configuration for both serving and training settings, using YAML to configure and Pydantic to load, removing the previous `src/iris/server/config.py` as it is replaced by the new system. Also updated frame collection job defaults for 1 FPS streaming, adjusting trigger conditions and frame skipping. Finally, added memory buffer initialization in the server, enabling it if configured
* Fixes on the client for better display and camera control
* Optimizations to run on the Pi
* Typos and tiny fixes to pyproject.toml
* HTTPS self-signed certificate
* Implement SSH tunneling from the client
* Fixes for UI tunnel connection and server shutdown
* Remove setup for memory buffer, not pursuing that anymore
* Simplified Job classes and a more flexible trigger system for launching them. TriggerConfig.should_trigger() encapsulates trigger logic; jobs manage their own triggers (more responsive than centralized polling). Unified FrameCollectionJob and VideoInferenceJob into one VideoJob class. Also some logging changes
* Improved logging for the job manager, and API triggering of jobs
* Simplified job creation and configuration, plus documentation in the README. This commit simplifies the codebase and the number of variables to configure for job creation
* Small tweaks to improve how the server runs
* Fixes to client logging
* Optimizations to the settings config
* torch_dtype is deprecated, so using dtype
* Fix circular imports for configurations in server
* Improved results from inference in client
* Dataset prep files
* Fixes to let client and server run together better
* More fixes, hopefully, for the inference results display
* Even more fixes, this is tiring
1 parent 41d6bb7 commit 7b1b2a2


44 files changed (+11582, −2816 lines)

.gitignore

Lines changed: 6 additions & 4 deletions
@@ -1,9 +1,11 @@
-data/**/*.mp4
-data/**/*.json
-data/**/*.txt
-data/**/*.csv
+data/**/*
 models/**/*.pth
 !**/.gitkeep
+uv.lock
+
+benchmark*.json
+
+.claude/
 
 # Created by https://www.toptal.com/developers/gitignore/api/vim,latex,linux,macos,synology,jetbrains+all,visualstudiocode,python,jupyternotebooks
 # Edit at https://www.toptal.com/developers/gitignore?templates=vim,latex,linux,macos,synology,jetbrains+all,visualstudiocode,python,jupyternotebooks

.vscode/settings.json

Lines changed: 6 additions & 3 deletions
@@ -31,17 +31,20 @@
     "reportUndefinedVariable": "error",
     "reportMissingImports": "warning"
   },
+  // Disable old linting system
+  "python.linting.enabled": false,
+  "python.linting.pylintEnabled": false,
   // Python Formatting with Ruff
   "[python]": {
     "editor.formatOnSave": true,
     "editor.defaultFormatter": "charliermarsh.ruff",
     "editor.codeActionsOnSave": {
-      "source.fixAll": "explicit",
-      "source.organizeImports": "explicit"
+      "source.fixAll.ruff": "explicit",
+      "source.organizeImports.ruff": "explicit"
     }
   },
   // Ruff Settings
   "ruff.nativeServer": "on",
   // Other Extensions
-  "evenBetterToml.schema.enabled": false,
+  "evenBetterToml.schema.enabled": false
 }

README.md

Lines changed: 246 additions & 0 deletions
A research project done with the AI Team by Myriam Benlamri (Data Science MSc 2nd year) and Marcus Hamelink (Computer Science BSc 3rd year) as a collaborative research semester project.

More info at [https://epflaiteam.ch/projects/iris](https://epflaiteam.ch/projects/iris)

## Set up

### Client

On your Raspberry Pi, generate the self-signed certificate:

```bash
mkdir -p ~/iris-certs
cd ~/iris-certs
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout key.pem \
  -out cert.pem \
  -days 365 \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=$(hostname -I | awk '{print $1}')"
```

To use HTTPS, set `use_ssl: true` under `client` in `config.yaml`. In that case, make sure to connect via the HTTPS address.

Run `uv run iris-client` to start a client instance.

### Server

```bash
uv sync
uv sync --group server
```

#### For training (unsloth)

```bash
uv pip install unsloth
```

#### Running the pipeline

Run `uv run iris-server` to start a server instance.
## Job System

IRIS uses a flexible job system for managing inference tasks. Jobs can be started via the API and triggered in multiple ways.

### Job Types

**1. SingleFrameJob**
- Processes each incoming frame individually with the VLM
- Useful for: real-time inference, testing, continuous monitoring
- Trigger: automatic on every frame (or every Nth frame with `frame_skip`)

**2. VideoJob**
- Collects frames in a buffer, then processes the batch with a video-aware VLM
- Useful for: temporal understanding, action recognition, video summarization
- Supports three trigger modes: periodic (automatic), manual (API), and disabled (job-to-job)
### Trigger Modes

VideoJob supports three triggering modes via the `trigger_mode` parameter:

**PERIODIC (Automatic)**
- Automatically triggers inference when the buffer reaches `buffer_size` frames
- After inference, keeps the last `overlap_frames` frames for temporal continuity
- Configuration example:
```python
{
    "job_type": "video",
    "trigger_mode": "periodic",
    "buffer_size": 8,
    "overlap_frames": 4
}
```
**Use case:** Continuous video analysis (e.g., Qwen2.5-VL logging)

**MANUAL (API-triggered)**
- Buffers frames but only triggers via API call: `POST /jobs/{job_id}/trigger`
- No overlap: the buffer clears after each trigger
- Configuration example:
```python
{
    "job_type": "video",
    "trigger_mode": "manual",
    "buffer_size": 1
}
```
**Use case:** On-demand analysis (e.g., colony counter when the user clicks)

**DISABLED (Buffering Only)**
- Accepts and buffers frames but never processes them
- For future use or conditional triggering
- Configuration example:
```python
{
    "job_type": "video",
    "trigger_mode": "disabled",
    "buffer_size": 8
}
```
**Use case:** Placeholder for future YOLO integration
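The three modes above boil down to one decision function. The commit notes mention `TriggerConfig.should_trigger()` encapsulating this logic; the sketch below is an illustrative reconstruction of that decision, not the repository's actual implementation:

```python
from dataclasses import dataclass
from enum import Enum


class TriggerMode(str, Enum):
    PERIODIC = "periodic"
    MANUAL = "manual"
    DISABLED = "disabled"


@dataclass
class TriggerConfig:
    mode: TriggerMode = TriggerMode.PERIODIC
    buffer_size: int = 8

    def should_trigger(self, buffered: int, manual_requested: bool = False) -> bool:
        """Decide whether a buffered VideoJob should run inference now."""
        if self.mode is TriggerMode.PERIODIC:
            # Fire once the buffer is full.
            return buffered >= self.buffer_size
        if self.mode is TriggerMode.MANUAL:
            # Only fire when POST /jobs/{job_id}/trigger was received.
            return manual_requested
        # DISABLED: buffer only, never fire.
        return False
```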
### Auto-Started VideoJob

When a client connects to `/ws/stream`, a VideoJob is automatically created for that connection:
- Job ID: unique per connection (e.g., `video_job_a3f7b2c1`)
- Mode: PERIODIC
- Buffer: 8 frames (configurable in `config.yaml`)
- Overlap: 4 frames (50% overlap for temporal continuity)
- Cleanup: automatically stopped and removed when the WebSocket disconnects

**No manual job creation needed - just start streaming!**

You can configure defaults in `config.yaml`:
```yaml
jobs:
  video:
    trigger_mode: "periodic"
    buffer_size: 8
    overlap_frames: 4
```
### API Endpoints

**Start a job:**
```bash
POST /jobs/start
Content-Type: application/json

{
    "job_type": "video",
    "prompt": "Describe what you see in the video.",
    "trigger_mode": "periodic",
    "buffer_size": 8,
    "overlap_frames": 4
}
```

**Manually trigger inference:**
```bash
POST /jobs/{job_id}/trigger
```

**Get job status:**
```bash
GET /jobs/{job_id}/status
```

**List active jobs:**
```bash
GET /jobs/active
```

**Stop a job:**
```bash
POST /jobs/{job_id}/stop
```
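For scripting, the start and trigger endpoints can be driven from Python. The sketch below only assembles the request URL and JSON body; the base URL and the suggestion to send it with `requests` are assumptions for illustration, not code from the repository:

```python
import json

BASE_URL = "http://localhost:8001"  # assumed: server host/port from config.yaml


def start_video_job_request(prompt: str, buffer_size: int = 8,
                            overlap_frames: int = 4) -> tuple[str, str]:
    """Build the (url, body) pair for POST /jobs/start."""
    body = json.dumps({
        "job_type": "video",
        "prompt": prompt,
        "trigger_mode": "periodic",
        "buffer_size": buffer_size,
        "overlap_frames": overlap_frames,
    })
    return f"{BASE_URL}/jobs/start", body


def trigger_url(job_id: str) -> str:
    """URL for POST /jobs/{job_id}/trigger (manual inference)."""
    return f"{BASE_URL}/jobs/{job_id}/trigger"


# Hypothetical usage with requests:
#   url, body = start_video_job_request("Describe what you see in the video.")
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```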
### WebSocket Logging

Jobs send progress logs via WebSocket (`/ws/stream`):

```json
{
    "type": "log",
    "job_id": "video-abc123",
    "message": "Buffered frame 3/5",
    "timestamp": 1234567890.123
}
```

Results are also sent via WebSocket:

```json
{
    "type": "result",
    "job_id": "video-abc123",
    "job_type": "video",
    "status": "completed",
    "result": "..."
}
```
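A client consuming `/ws/stream` only needs to branch on the `type` field of each message. A minimal sketch of such a handler, following the log/result shapes above (the formatted strings are illustrative, not the actual client code):

```python
import json


def handle_stream_message(raw: str) -> str:
    """Route a /ws/stream message by its "type" field and format it for display."""
    msg = json.loads(raw)
    if msg["type"] == "log":
        # Progress log from a running job.
        return f'[{msg["job_id"]}] {msg["message"]}'
    if msg["type"] == "result":
        # Completed inference result.
        return f'{msg["job_id"]} {msg["status"]}: {msg["result"]}'
    return f'unknown message type: {msg["type"]}'
```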
### Job Orchestration

Jobs can launch other jobs during execution, enabling conditional workflows:

```python
class YOLOVideoJob(VideoJob):
    async def _run_inference(self):
        # Run YOLO detection
        detections = await self._run_yolo(self.frame_buffer)

        # If an object is detected, launch a VLM job
        if detections["confidence"] > 0.5:
            vlm_config = VideoJobConfig(
                prompt="Describe what the detected object is doing.",
                trigger=TriggerConfig(mode=TriggerMode.DISABLED)
            )
            vlm_job = self.job_factory.create_job(vlm_config, ...)
            await self.queue.submit(vlm_job)
```
### Multi-GPU Support

Set `server.num_workers` in `config.yaml` to utilize multiple GPUs:

```yaml
server:
  num_workers: 2  # Uses 2 GPUs in round-robin
```

Workers are automatically assigned to GPUs: `worker_id % device_count`.
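That round-robin rule is small enough to state as code (a sketch; the real worker presumably obtains `device_count` from `torch.cuda.device_count()`):

```python
def assign_gpu(worker_id: int, device_count: int) -> int:
    """Round-robin GPU assignment: worker_id % device_count."""
    if device_count <= 0:
        raise ValueError("no CUDA devices available")
    return worker_id % device_count


# With num_workers: 2 on a 2-GPU node, workers 0 and 1 map to GPUs 0 and 1.
```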
### Video Inference Notes

**TODO:** The current VideoJob implementation processes only the first frame as a placeholder. Proper video inference requires exploring Qwen2.5-VL's video prompt template, which may support native video input with special tokens for temporal understanding.

See `src/iris/vlm/inference/queue/jobs.py:VideoJob._sync_inference()` for implementation details.
## Workflow with Izar

This assumes you have access to the Izar cluster.

### On Izar
```
cd /path/to/IRIS
Sinteract -t 00:20:00 -g gpu:1 -m 32G -q team-ai
hostname
./run_iris.sh
```

### On personal machine

**Terminal 1**
```
uv run iris-client
```

**Terminal 2**
```
ssh -N -L 8005:[RUN hostname ON NODE TO SEE]:8001 EPFL-USERNAME@izar.hpc.epfl.ch
```

Then go to http://localhost:8006

Important: replace the bracketed placeholder with the compute node's hostname (the output of `hostname` on the node).

config.yaml

Lines changed: 43 additions & 0 deletions
# IRIS Configuration
# Override via environment variables: IRIS_SERVER__PORT=8002

server:
  model_id: "unsloth/Qwen2-VL-7B-Instruct-bnb-4bit"  # Direct model selection (qwen2.5-7b)
  vlm_hardware: null  # Optional: hardware profile (v100, mac, etc.)
  max_queue_size: 10
  num_workers: 1
  host: "0.0.0.0"
  port: 8001
  graceful_shutdown_timeout: 30.0
  enable_log_streaming: true
  log_streaming_min_level: "INFO"
  enable_metrics: true

jobs:
  video:
    trigger_mode: "periodic"  # "periodic" | "manual" | "disabled"
    buffer_size: 8
    overlap_frames: 4

client:
  video:
    width: 640
    height: 480
    fps: 10
    jpeg_quality: 80
    camera_index: 0
  server:
    host: "localhost"
    port: 8001
    use_ssl: false
  web:
    host: "0.0.0.0"
    port: 8006
    use_ssl: true
    cert_dir: "~/iris-certs"  # Directory containing key.pem and cert.pem
  ssh_tunnel:
    enabled: false  # Toggle on for IZAR HPC
    ssh_host: "izar.hpc.epfl.ch"
    ssh_user: "mhamelin"
    ssh_key_path: "~/.ssh/id_rsa"
    remote_host: ""  # Set via UI or config (IZAR compute node hostname)
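The `IRIS_SERVER__PORT=8002` comment implies a nested-override convention with `__` as the section delimiter, typical of Pydantic settings loaders. A stdlib sketch of that convention, assuming the `IRIS_` prefix (the actual loader is Pydantic-based per the commit notes, so this is only an illustration of the mapping):

```python
import os


def load_env_overrides(prefix: str = "IRIS_") -> dict:
    """Collect IRIS_* env vars into a nested dict using "__" as the delimiter.

    e.g. IRIS_SERVER__PORT=8002 -> {"server": {"port": "8002"}}
    """
    overrides: dict = {}
    for key, value in os.environ.items():
        if not key.startswith(prefix):
            continue
        # Strip the prefix, lower-case, and split on the nesting delimiter.
        path = key[len(prefix):].lower().split("__")
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return overrides
```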

configs/vlm/hardware/mac.yaml

Lines changed: 13 additions & 0 deletions
# Apple Silicon (M1/M2/M3) optimization
# Uses Metal Performance Shaders (MPS) for GPU acceleration

model:
  dtype: "float16"  # MPS works well with float16
  attn_implementation: "sdpa"
  low_cpu_mem_usage: true

quantization:
  load_in_8bit: false
  load_in_4bit: false

device: "mps"  # Metal Performance Shaders for Apple Silicon

configs/vlm/hardware/v100.yaml

Lines changed: 10 additions & 22 deletions
@@ -1,25 +1,13 @@
-training:
-  # batch_size: 8
-  # gradient_accumulation_steps: 4 # Effective batch = 32
+# V100 GPU optimization (16GB VRAM)
+# For inference use on IZAR cluster
 
-# model:
-#   torch_dtype: "float16"
+model:
+  dtype: "float16" # V100 doesn't support bfloat16
+  attn_implementation: "sdpa" # V100 doesn't support flash_attention_2
+  low_cpu_mem_usage: true
 
-# accelerate:
-#   use_accelerate: true
-#   mixed_precision: "fp16" # V100 supports fp16, NOT bf16
-#   gradient_checkpointing: true
+quantization:
+  load_in_8bit: false # V100 has enough VRAM for float16
+  load_in_4bit: false
 
-# peft:
-#   use_peft: true
-#   peft_method: "lora"
-#   r: 8
-#   alpha: 16
-#   dropout: 0.1
-
-# quantization:
-#   load_in_4bit: true
-#   bnb_4bit_quant_type: "nf4"
-#   bnb_4bit_compute_dtype: "float16"
-
-# device: "cuda"
+device: "auto"

configs/vlm/serve.yaml

Lines changed: 0 additions & 12 deletions
This file was deleted.
