
Commit 7b1b2a2

Complete reworking of client/server architecture to set up inference with the Raspberry Pi (#15)
This pull request introduces a simplified structure to the code base: an improved web app for the client, WebSocket connections between the client's server and the inference server, proper logging, and better configuration through YAML files. The memory-buffer architecture was dropped in favor of a simpler approach: a job manager abstraction now handles how jobs are created, how they are triggered, and how they log their output back to the client. HTTPS is also enabled so that the Raspberry Pi's web server can let the user run inference on their own camera. In general, the code in place was simplified toward a simpler architecture and configuration to avoid debugging hassle. The following is a detailed view of the commits that made this possible:

* Adapting client: add inference results display to the client. Implements a WebSocket endpoint on the client to receive and display inference results from the server. This includes:
  - A new "/results" WebSocket route in the client web app
  - Updates to the client-side JavaScript to connect to the new endpoint and display results in a dedicated section
  - Modification of the StreamingClient to use a callback to store the latest inference results
  - Updated default ports of client and server to avoid collisions
* Tweaks to improve server architecture
* Small linter changes
* Client refactor: improved logging calls. Uses the logger instead of print; additionally, the client frontend should reflect the logs correctly
* Better logging management server-side. Logging is now more streamlined and allows for better configuration of these outputs to the client. Also encompasses richer logging, such as GPU metrics
* Implement flexible job system with frame collection and video inference. Added a multi-job architecture to support frame buffering and batch inference:
  - Per-job configuration settings
  - A job factory to instantiate new jobs in a type-safe way
  - WIP FrameCollectionJob and VideoInferenceJob implementations
  - A job control API to start, delete, and retrieve ongoing jobs
  - Decoupled WebSocket job creation to allow any job type
  - Polymorphic handling of responses from jobs
* Centralized configuration with YAML files. Introduced a centralized YAML configuration for both serving and training settings, using YAML to configure and Pydantic to load, removing the previous `src/iris/server/config.py` as it is replaced by the new system. Also updated frame collection job defaults for 1 FPS streaming, adjusting trigger conditions and frame skipping. Finally, added memory buffer initialization in the server, enabling it if configured
* Fixes on the client for better display and camera control
* Optimizations to run on the Pi
* Typos and tiny fixes to pyproject.toml
* HTTPS self-signed certificate
* Implement SSH tunneling from the client
* Fixes for UI tunnel connection and server shutdown
* Remove setup for memory buffer, not pursuing that anymore
* Simplified Job classes and a more flexible trigger system for launching them. TriggerConfig.should_trigger() encapsulates trigger logic; jobs manage their own triggers (more responsive than centralized polling). Unified FrameCollectionJob and VideoInferenceJob into one VideoJob class. Also some logging changes
* Improved logging for the job manager, and API triggering of jobs
* Simplified job creation and configuration, plus documentation in the README. This commit simplifies the codebase and the number of variables to configure for job creation
* Small tweaks to improve how the server runs
* Fixes to client logging
* Optimizations to the settings config
* torch_dtype is deprecated, so using dtype
* Fix circular imports for configurations in server
* Improved results from inference in client
* Dataset prep files
* Fixes to let client and server run together better
* More fixes, hopefully, for the inference results display
* Even more fixes, this is tiring
1 parent 41d6bb7 commit 7b1b2a2


44 files changed (+11582, −2816 lines)

.gitignore

Lines changed: 6 additions & 4 deletions
@@ -1,9 +1,11 @@
-data/**/*.mp4
-data/**/*.json
-data/**/*.txt
-data/**/*.csv
+data/**/*
 models/**/*.pth
 !**/.gitkeep
+uv.lock
+
+benchmark*.json
+
+.claude/
 
 # Created by https://www.toptal.com/developers/gitignore/api/vim,latex,linux,macos,synology,jetbrains+all,visualstudiocode,python,jupyternotebooks
 # Edit at https://www.toptal.com/developers/gitignore?templates=vim,latex,linux,macos,synology,jetbrains+all,visualstudiocode,python,jupyternotebooks

.vscode/settings.json

Lines changed: 6 additions & 3 deletions
@@ -31,17 +31,20 @@
     "reportUndefinedVariable": "error",
     "reportMissingImports": "warning"
   },
+  // Disable old linting system
+  "python.linting.enabled": false,
+  "python.linting.pylintEnabled": false,
   // Python Formatting with Ruff
   "[python]": {
     "editor.formatOnSave": true,
     "editor.defaultFormatter": "charliermarsh.ruff",
     "editor.codeActionsOnSave": {
-      "source.fixAll": "explicit",
-      "source.organizeImports": "explicit"
+      "source.fixAll.ruff": "explicit",
+      "source.organizeImports.ruff": "explicit"
     }
   },
   // Ruff Settings
   "ruff.nativeServer": "on",
   // Other Extensions
-  "evenBetterToml.schema.enabled": false,
+  "evenBetterToml.schema.enabled": false
 }

README.md

Lines changed: 246 additions & 0 deletions
A research project done with the AI Team by Myriam Benlamri (Data Science MSc 2nd year) and Marcus Hamelink (Computer Science BSc 3rd year) as a collaborative research semester project.

More info at [https://epflaiteam.ch/projects/iris](https://epflaiteam.ch/projects/iris)

## Set up

### Client

On your Raspberry Pi, generate the self-signed certificate:

```bash
mkdir -p ~/iris-certs
cd ~/iris-certs
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout key.pem \
  -out cert.pem \
  -days 365 \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=$(hostname -I | awk '{print $1}')"
```

To use HTTPS, set `use_ssl: true` under `client` in `config.yaml`. In that case, make sure to connect via the HTTPS address.

Run `uv run iris-client` to start a client instance.

### Server

```bash
uv sync
uv sync --group server
```

#### For training (unsloth)

```bash
uv pip install unsloth
```

#### Running the pipeline

Run `uv run iris-server` to start a server instance.
## Job System

IRIS uses a flexible job system for managing inference tasks. Jobs can be started via the API and triggered in multiple ways.

### Job Types

**1. SingleFrameJob**
- Processes each incoming frame individually with the VLM
- Useful for: real-time inference, testing, continuous monitoring
- Trigger: automatic on every frame (or every Nth frame with `frame_skip`)

**2. VideoJob**
- Collects frames in a buffer, then processes the batch with a video-aware VLM
- Useful for: temporal understanding, action recognition, video summarization
- Supports three trigger modes: periodic (automatic), manual (API), and disabled (job-to-job)
### Trigger Modes

VideoJob supports three triggering modes via the `trigger_mode` parameter:

**PERIODIC (Automatic)**
- Automatically triggers inference when the buffer reaches `buffer_size` frames
- After inference, keeps the last `overlap_frames` frames for temporal continuity
- Configuration example:
```python
{
    "job_type": "video",
    "trigger_mode": "periodic",
    "buffer_size": 8,
    "overlap_frames": 4
}
```
**Use case:** Continuous video analysis (e.g., Qwen2.5-VL logging)

**MANUAL (API-triggered)**
- Buffers frames but only triggers via API call: `POST /jobs/{job_id}/trigger`
- No overlap: the buffer clears after each trigger
- Configuration example:
```python
{
    "job_type": "video",
    "trigger_mode": "manual",
    "buffer_size": 1
}
```
**Use case:** On-demand analysis (e.g., colony counter when the user clicks)

**DISABLED (Buffering Only)**
- Accepts and buffers frames but never processes them
- For future use or conditional triggering
- Configuration example:
```python
{
    "job_type": "video",
    "trigger_mode": "disabled",
    "buffer_size": 8
}
```
**Use case:** Placeholder for future YOLO integration
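The three modes above boil down to one decision function. The commit notes mention `TriggerConfig.should_trigger()` encapsulating this logic; the sketch below is an illustrative reconstruction of that decision, not the repository's actual implementation:

```python
from dataclasses import dataclass
from enum import Enum


class TriggerMode(str, Enum):
    PERIODIC = "periodic"
    MANUAL = "manual"
    DISABLED = "disabled"


@dataclass
class TriggerConfig:
    mode: TriggerMode = TriggerMode.PERIODIC
    buffer_size: int = 8

    def should_trigger(self, buffered: int, manual_requested: bool = False) -> bool:
        """Decide whether a buffered VideoJob should run inference now."""
        if self.mode is TriggerMode.PERIODIC:
            # Fire once the buffer is full.
            return buffered >= self.buffer_size
        if self.mode is TriggerMode.MANUAL:
            # Only fire when POST /jobs/{job_id}/trigger was received.
            return manual_requested
        # DISABLED: buffer only, never fire.
        return False
```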
### Auto-Started VideoJob

When a client connects to `/ws/stream`, a VideoJob is automatically created for that connection:
- Job ID: unique per connection (e.g., `video_job_a3f7b2c1`)
- Mode: PERIODIC
- Buffer: 8 frames (configurable in `config.yaml`)
- Overlap: 4 frames (50% overlap for temporal continuity)
- Cleanup: automatically stopped and removed when the WebSocket disconnects

**No manual job creation needed - just start streaming!**

You can configure defaults in `config.yaml`:
```yaml
jobs:
  video:
    trigger_mode: "periodic"
    buffer_size: 8
    overlap_frames: 4
```
### API Endpoints

**Start a job:**
```bash
POST /jobs/start
Content-Type: application/json

{
    "job_type": "video",
    "prompt": "Describe what you see in the video.",
    "trigger_mode": "periodic",
    "buffer_size": 8,
    "overlap_frames": 4
}
```

**Manually trigger inference:**
```bash
POST /jobs/{job_id}/trigger
```

**Get job status:**
```bash
GET /jobs/{job_id}/status
```

**List active jobs:**
```bash
GET /jobs/active
```

**Stop a job:**
```bash
POST /jobs/{job_id}/stop
```
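For scripting, the start and trigger endpoints can be driven from Python. The sketch below only assembles the request URL and JSON body; the base URL and the suggestion to send it with `requests` are assumptions for illustration, not code from the repository:

```python
import json

BASE_URL = "http://localhost:8001"  # assumed: server host/port from config.yaml


def start_video_job_request(prompt: str, buffer_size: int = 8,
                            overlap_frames: int = 4) -> tuple[str, str]:
    """Build the (url, body) pair for POST /jobs/start."""
    body = json.dumps({
        "job_type": "video",
        "prompt": prompt,
        "trigger_mode": "periodic",
        "buffer_size": buffer_size,
        "overlap_frames": overlap_frames,
    })
    return f"{BASE_URL}/jobs/start", body


def trigger_url(job_id: str) -> str:
    """URL for POST /jobs/{job_id}/trigger (manual inference)."""
    return f"{BASE_URL}/jobs/{job_id}/trigger"


# Hypothetical usage with requests:
#   url, body = start_video_job_request("Describe what you see in the video.")
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```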
### WebSocket Logging

Jobs send progress logs via WebSocket (`/ws/stream`):

```json
{
    "type": "log",
    "job_id": "video-abc123",
    "message": "Buffered frame 3/5",
    "timestamp": 1234567890.123
}
```

Results are also sent via WebSocket:

```json
{
    "type": "result",
    "job_id": "video-abc123",
    "job_type": "video",
    "status": "completed",
    "result": "..."
}
```
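A client consuming `/ws/stream` only needs to branch on the `type` field of each message. A minimal sketch of such a handler, following the log/result shapes above (the formatted strings are illustrative, not the actual client code):

```python
import json


def handle_stream_message(raw: str) -> str:
    """Route a /ws/stream message by its "type" field and format it for display."""
    msg = json.loads(raw)
    if msg["type"] == "log":
        # Progress log from a running job.
        return f'[{msg["job_id"]}] {msg["message"]}'
    if msg["type"] == "result":
        # Completed inference result.
        return f'{msg["job_id"]} {msg["status"]}: {msg["result"]}'
    return f'unknown message type: {msg["type"]}'
```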
### Job Orchestration

Jobs can launch other jobs during execution, enabling conditional workflows:

```python
class YOLOVideoJob(VideoJob):
    async def _run_inference(self):
        # Run YOLO detection
        detections = await self._run_yolo(self.frame_buffer)

        # If an object is detected, launch a VLM job
        if detections["confidence"] > 0.5:
            vlm_config = VideoJobConfig(
                prompt="Describe what the detected object is doing.",
                trigger=TriggerConfig(mode=TriggerMode.DISABLED)
            )
            vlm_job = self.job_factory.create_job(vlm_config, ...)
            await self.queue.submit(vlm_job)
```
### Multi-GPU Support

Set `server.num_workers` in `config.yaml` to utilize multiple GPUs:

```yaml
server:
  num_workers: 2  # Uses 2 GPUs in round-robin
```

Workers are automatically assigned to GPUs: `worker_id % device_count`.
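That round-robin rule is small enough to state as code (a sketch; the real worker presumably obtains `device_count` from `torch.cuda.device_count()`):

```python
def assign_gpu(worker_id: int, device_count: int) -> int:
    """Round-robin GPU assignment: worker_id % device_count."""
    if device_count <= 0:
        raise ValueError("no CUDA devices available")
    return worker_id % device_count


# With num_workers: 2 on a 2-GPU node, workers 0 and 1 map to GPUs 0 and 1.
```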
### Video Inference Notes

**TODO:** The current VideoJob implementation processes only the first frame as a placeholder. Proper video inference requires exploring Qwen2.5-VL's video prompt template, which may support native video input with special tokens for temporal understanding.

See `src/iris/vlm/inference/queue/jobs.py:VideoJob._sync_inference()` for implementation details.
## Workflow with Izar

This assumes you have access to the Izar cluster.

### On Izar
```
cd /path/to/IRIS
Sinteract -t 00:20:00 -g gpu:1 -m 32G -q team-ai
hostname
./run_iris.sh
```

### On personal machine

**Terminal 1**
```
uv run iris-client
```

**Terminal 2**
```
ssh -N -L 8005:[RUN hostname ON NODE TO SEE]:8001 EPFL-USERNAME@izar.hpc.epfl.ch
```

Then go to http://localhost:8006

Important: replace the bracketed placeholder with the compute node's hostname (the output of `hostname` on the node).

config.yaml

Lines changed: 43 additions & 0 deletions
# IRIS Configuration
# Override via environment variables: IRIS_SERVER__PORT=8002

server:
  model_id: "unsloth/Qwen2-VL-7B-Instruct-bnb-4bit"  # Direct model selection (qwen2.5-7b)
  vlm_hardware: null  # Optional: hardware profile (v100, mac, etc.)
  max_queue_size: 10
  num_workers: 1
  host: "0.0.0.0"
  port: 8001
  graceful_shutdown_timeout: 30.0
  enable_log_streaming: true
  log_streaming_min_level: "INFO"
  enable_metrics: true

jobs:
  video:
    trigger_mode: "periodic"  # "periodic" | "manual" | "disabled"
    buffer_size: 8
    overlap_frames: 4

client:
  video:
    width: 640
    height: 480
    fps: 10
    jpeg_quality: 80
    camera_index: 0
  server:
    host: "localhost"
    port: 8001
    use_ssl: false
  web:
    host: "0.0.0.0"
    port: 8006
    use_ssl: true
    cert_dir: "~/iris-certs"  # Directory containing key.pem and cert.pem
  ssh_tunnel:
    enabled: false  # Toggle on for IZAR HPC
    ssh_host: "izar.hpc.epfl.ch"
    ssh_user: "mhamelin"
    ssh_key_path: "~/.ssh/id_rsa"
    remote_host: ""  # Set via UI or config (IZAR compute node hostname)
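The `IRIS_SERVER__PORT=8002` comment implies a nested-override convention with `__` as the section delimiter, typical of Pydantic settings loaders. A stdlib sketch of that convention, assuming the `IRIS_` prefix (the actual loader is Pydantic-based per the commit notes, so this is only an illustration of the mapping):

```python
import os


def load_env_overrides(prefix: str = "IRIS_") -> dict:
    """Collect IRIS_* env vars into a nested dict using "__" as the delimiter.

    e.g. IRIS_SERVER__PORT=8002 -> {"server": {"port": "8002"}}
    """
    overrides: dict = {}
    for key, value in os.environ.items():
        if not key.startswith(prefix):
            continue
        # Strip the prefix, lower-case, and split on the nesting delimiter.
        path = key[len(prefix):].lower().split("__")
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return overrides
```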

configs/vlm/hardware/mac.yaml

Lines changed: 13 additions & 0 deletions
# Apple Silicon (M1/M2/M3) optimization
# Uses Metal Performance Shaders (MPS) for GPU acceleration

model:
  dtype: "float16"  # MPS works well with float16
  attn_implementation: "sdpa"
  low_cpu_mem_usage: true

quantization:
  load_in_8bit: false
  load_in_4bit: false

device: "mps"  # Metal Performance Shaders for Apple Silicon

configs/vlm/hardware/v100.yaml

Lines changed: 10 additions & 22 deletions
@@ -1,25 +1,13 @@
-training:
-  # batch_size: 8
-  # gradient_accumulation_steps: 4 # Effective batch = 32
+# V100 GPU optimization (16GB VRAM)
+# For inference use on IZAR cluster
 
-# model:
-#   torch_dtype: "float16"
+model:
+  dtype: "float16" # V100 doesn't support bfloat16
+  attn_implementation: "sdpa" # V100 doesn't support flash_attention_2
+  low_cpu_mem_usage: true
 
-# accelerate:
-#   use_accelerate: true
-#   mixed_precision: "fp16" # V100 supports fp16, NOT bf16
-#   gradient_checkpointing: true
+quantization:
+  load_in_8bit: false # V100 has enough VRAM for float16
+  load_in_4bit: false
 
-# peft:
-#   use_peft: true
-#   peft_method: "lora"
-#   r: 8
-#   alpha: 16
-#   dropout: 0.1
-
-# quantization:
-#   load_in_4bit: true
-#   bnb_4bit_quant_type: "nf4"
-#   bnb_4bit_compute_dtype: "float16"
-
-# device: "cuda"
+device: "auto"

configs/vlm/serve.yaml

Lines changed: 0 additions & 12 deletions
This file was deleted.
