Immich ML Proxy

A proxy service for Immich ML that adds multi-backend routing, task-aware dispatch, health monitoring, and request/response debugging.

Features

Multi-backend Support: Configure multiple Immich ML backend servers
Task-aware Routing: Keep dependent task sub-types together for tasks like facial-recognition and ocr
CLIP Split Routing: Route clip.textual and clip.visual independently when they should run on different backends
Per-route Policy: Configure each task/modelType route as strict or fallback
Round-robin Load Balancing: Distribute requests across healthy backends for each routed task
Health Monitoring: Continuous health checking with automatic failover
Concurrent Processing: Process independent dispatch groups in parallel for improved performance
Web Configuration UI: Simple web interface for managing backends and routing with real-time health status
Debug Mode: Comprehensive request/response logging and debugging tools
Request Recording: Capture and inspect incoming and outgoing HTTP requests/responses

API Endpoints

GET /

Returns a simple web page with links to the configuration and debug interfaces.

GET /ping

Checks the health status of all configured backends and verifies that the default backend and every strict task/model type route have a healthy backend.

Behavior:

Checks health of all backends in parallel by calling their /ping endpoint
Updates health status for each backend based on response
Verifies that the default backend is healthy (handles all non-routed types)
Verifies strict task routes in taskRouting have a healthy backend
Verifies strict modelTypeRouting targets are healthy
Lets fallback routes pass the check when their routed backend is unhealthy, because those requests can skip to the next fallback level (ultimately defaultBackend)

Response:

Returns "pong" with HTTP 200 if:
- Default backend is healthy
- Every strict task route in taskRouting has at least one healthy backend
- Every strict modelTypeRouting target is healthy
Returns HTTP 503 (Service Unavailable) if:
- No backends are configured
- Default backend is not set or not found in the backends list
- Default backend is unhealthy
- Any strict task route in taskRouting lacks healthy backends
- Any strict modelTypeRouting target is unhealthy
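
The response rules above can be sketched as a single decision function. This is a minimal illustration, not the project's actual handler: the Config type, pingStatus, and the helper names are hypothetical, and treating a route with no explicit policy as strict is an assumption, since this README does not state the default.

```go
package main

import (
	"fmt"
	"net/http"
)

// Config holds only the fields /ping needs. The type is hypothetical;
// the field names follow the keys in config.json.
type Config struct {
	DefaultBackend         string
	Backends               []string          // backend names
	TaskRouting            map[string]string // task -> backend
	ModelTypeRouting       map[string]string // CLIP model type -> backend
	TaskRoutingPolicy      map[string]string // task -> "strict" | "fallback"
	ModelTypeRoutingPolicy map[string]string // model type -> "strict" | "fallback"
}

// assumedDefaultPolicy applies when a route has no explicit policy.
// ASSUMPTION: the README does not state the real default.
const assumedDefaultPolicy = "strict"

func policyOf(policies map[string]string, key string) string {
	if p, ok := policies[key]; ok {
		return p
	}
	return assumedDefaultPolicy
}

// strictRoutesHealthy reports whether every strict route targets a healthy backend.
func strictRoutesHealthy(routes, policies map[string]string, healthy map[string]bool) bool {
	for key, backend := range routes {
		if policyOf(policies, key) == "strict" && !healthy[backend] {
			return false
		}
	}
	return true
}

// pingStatus returns the HTTP status that /ping would answer with.
func pingStatus(cfg Config, healthy map[string]bool) int {
	known := false
	for _, name := range cfg.Backends {
		if name == cfg.DefaultBackend {
			known = true
		}
	}
	// Covers: no backends, default backend unset or unknown, default backend unhealthy.
	if !known || !healthy[cfg.DefaultBackend] {
		return http.StatusServiceUnavailable
	}
	if !strictRoutesHealthy(cfg.TaskRouting, cfg.TaskRoutingPolicy, healthy) ||
		!strictRoutesHealthy(cfg.ModelTypeRouting, cfg.ModelTypeRoutingPolicy, healthy) {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func main() {
	cfg := Config{
		DefaultBackend:         "backend1",
		Backends:               []string{"backend1", "backend2", "backend3"},
		ModelTypeRouting:       map[string]string{"textual": "backend2", "visual": "backend3"},
		ModelTypeRoutingPolicy: map[string]string{"textual": "fallback", "visual": "strict"},
	}
	// backend2 only serves a fallback route, so its failure alone does not fail /ping.
	healthy := map[string]bool{"backend1": true, "backend2": false, "backend3": true}
	fmt.Println(pingStatus(cfg, healthy))
}
```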

POST /predict

Routes inference requests to appropriate backends based on task semantics. Dependent tasks stay grouped, while CLIP can be split by model type and merged back into one response.

Request Parameters:

entries: JSON string containing task configuration with nested structure
- Format: {"taskName": {"type": config, ...}}
image: Image file (optional, multipart form data)
text: Text content (optional, multipart form data)

Behavior:

Normalizes legacy facial_recognition requests/config entries to facial-recognition
Keeps facial-recognition and ocr grouped so their dependent sub-types are forwarded together
Splits clip into independent textual and visual requests
Supports CLIP requests that contain either one model type or both model types
For each dispatch group:
- Uses modelTypeRouting first for split CLIP requests
- Falls back to task routing from taskRouting
- Falls back again to defaultBackend if no task-specific route exists
- Applies per-route policy:
  - strict: use routed backend even if unhealthy; request fails if the backend is down
  - fallback: use routed backend when healthy; if unhealthy or the request fails, skip to the next fallback level within the same request
- Updates backend health status based on response (200 = healthy, other = unhealthy)
Processes dispatch groups concurrently and merges split CLIP responses back into one JSON response
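
The grouping and splitting steps above could look roughly like the following. It is a sketch under stated assumptions: the Group type and buildGroups are hypothetical names, and the sort at the end exists only to make the output order predictable for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Group is a hypothetical unit of work that is forwarded to one backend.
type Group struct {
	Task      string                     // normalized task name
	ModelType string                     // set only for split CLIP groups
	Entries   map[string]json.RawMessage // sub-types forwarded together
}

// buildGroups parses the "entries" form field and applies the dispatch rules:
// legacy names are normalized, clip is split per model type, and every other
// task keeps its sub-types together in one group.
func buildGroups(entriesJSON string) ([]Group, error) {
	var entries map[string]map[string]json.RawMessage
	if err := json.Unmarshal([]byte(entriesJSON), &entries); err != nil {
		return nil, fmt.Errorf("parse entries: %w", err)
	}
	var groups []Group
	for task, types := range entries {
		if task == "facial_recognition" {
			task = "facial-recognition"
		}
		if task != "clip" {
			groups = append(groups, Group{Task: task, Entries: types})
			continue
		}
		for modelType, cfg := range types {
			groups = append(groups, Group{
				Task:      task,
				ModelType: modelType,
				Entries:   map[string]json.RawMessage{modelType: cfg},
			})
		}
	}
	// Map iteration order is random; sort so the result is deterministic.
	sort.Slice(groups, func(i, j int) bool {
		if groups[i].Task != groups[j].Task {
			return groups[i].Task < groups[j].Task
		}
		return groups[i].ModelType < groups[j].ModelType
	})
	return groups, nil
}

func main() {
	groups, err := buildGroups(`{"clip": {"textual": {}, "visual": {}}, "ocr": {"detection": {}, "recognition": {}}}`)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, g := range groups {
		fmt.Printf("task=%s modelType=%q subTypes=%d\n", g.Task, g.ModelType, len(g.Entries))
	}
}
```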

Health Status Updates:

Marked healthy: the backend returns HTTP 200
Marked unhealthy: the backend returns a non-200 status or the connection fails

Response: JSON object with results merged back under their original task keys
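
The routing order and per-route policy described under Behavior can be illustrated with a small fallback chain. This is a sketch, not the proxy's code: Routes, Hop, chainFor, and dispatch are hypothetical names. Two points are assumptions because this README does not specify them: a route without an explicit policy is treated as strict, and the final defaultBackend level is always attempted.

```go
package main

import (
	"errors"
	"fmt"
)

// Routes holds the routing-related config fields (hypothetical type).
type Routes struct {
	DefaultBackend         string
	TaskRouting            map[string]string
	ModelTypeRouting       map[string]string
	TaskRoutingPolicy      map[string]string
	ModelTypeRoutingPolicy map[string]string
}

// Hop is one level of the fallback chain.
type Hop struct {
	Backend string
	Strict  bool
}

var errDown = errors.New("backend down") // used by the demo below

// isStrict treats a missing policy as strict (ASSUMPTION, see lead-in).
func isStrict(policies map[string]string, key string) bool {
	p, ok := policies[key]
	return !ok || p == "strict"
}

// chainFor lists the levels to try: modelTypeRouting (split CLIP only),
// then taskRouting, then defaultBackend. A strict level ends the chain.
func chainFor(r Routes, task, modelType string) []Hop {
	var chain []Hop
	if task == "clip" && modelType != "" {
		if b, ok := r.ModelTypeRouting[modelType]; ok {
			strict := isStrict(r.ModelTypeRoutingPolicy, modelType)
			chain = append(chain, Hop{Backend: b, Strict: strict})
			if strict {
				return chain
			}
		}
	}
	if b, ok := r.TaskRouting[task]; ok {
		strict := isStrict(r.TaskRoutingPolicy, task)
		chain = append(chain, Hop{Backend: b, Strict: strict})
		if strict {
			return chain
		}
	}
	return append(chain, Hop{Backend: r.DefaultBackend})
}

// dispatch walks the chain and records health from each attempt.
func dispatch(chain []Hop, healthy map[string]bool, send func(backend string) error) (string, error) {
	err := errors.New("no backend available")
	for i, hop := range chain {
		last := i == len(chain)-1
		if !hop.Strict && !last && !healthy[hop.Backend] {
			continue // fallback route with an unhealthy backend: skip this level
		}
		if err = send(hop.Backend); err == nil {
			healthy[hop.Backend] = true
			return hop.Backend, nil
		}
		healthy[hop.Backend] = false // failed request marks the backend unhealthy
		if hop.Strict {
			break // strict: the request fails instead of falling back
		}
	}
	return "", err
}

func main() {
	r := Routes{
		DefaultBackend:         "backend1",
		ModelTypeRouting:       map[string]string{"textual": "backend2", "visual": "backend3"},
		ModelTypeRoutingPolicy: map[string]string{"textual": "fallback", "visual": "strict"},
	}
	healthy := map[string]bool{"backend1": true, "backend2": false, "backend3": false}
	send := func(backend string) error {
		if backend == "backend1" {
			return nil
		}
		return errDown
	}
	fmt.Println(dispatch(chainFor(r, "clip", "textual"), healthy, send))
	fmt.Println(dispatch(chainFor(r, "clip", "visual"), healthy, send))
}
```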

GET /config

Returns the web configuration interface.

GET /api/config

Returns current configuration in JSON format.

GET /api/health

Returns health status of all backends in real-time.

Response:

{
  "backend1": {
    "status": "healthy",
    "lastCheck": 1735278000
  },
  "backend2": {
    "status": "unhealthy",
    "lastCheck": 1735278010,
    "error": "connection refused"
  }
}

Status Values:

healthy: Backend is responding correctly
unhealthy: Backend is not responding or returning errors
unknown: Health status not yet checked
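
A client might decode this response as shown below. The struct and function names are hypothetical, and reading lastCheck as Unix seconds is an assumption based on the magnitude of the sample values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// BackendHealth mirrors one entry of the /api/health response.
type BackendHealth struct {
	Status    string `json:"status"`          // healthy, unhealthy, or unknown
	LastCheck int64  `json:"lastCheck"`       // ASSUMPTION: Unix seconds
	Error     string `json:"error,omitempty"` // present only on failure
}

// parseHealth decodes the response body into a map keyed by backend name.
func parseHealth(body []byte) (map[string]BackendHealth, error) {
	var out map[string]BackendHealth
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, fmt.Errorf("decode health: %w", err)
	}
	return out, nil
}

// notHealthy returns the sorted names of backends whose status is not "healthy".
func notHealthy(h map[string]BackendHealth) []string {
	var names []string
	for name, b := range h {
		if b.Status != "healthy" {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	body := []byte(`{
	  "backend1": {"status": "healthy", "lastCheck": 1735278000},
	  "backend2": {"status": "unhealthy", "lastCheck": 1735278010, "error": "connection refused"}
	}`)
	health, err := parseHealth(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, name := range notHealthy(health) {
		b := health[name]
		checked := time.Unix(b.LastCheck, 0).UTC().Format(time.RFC3339)
		fmt.Printf("%s: %s at %s (%s)\n", name, b.Status, checked, b.Error)
	}
}
```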

POST /api/config

Saves configuration.

Request Body:

{
  "defaultBackend": "backend1",
  "backends": [
    {
      "name": "backend1",
      "url": "http://localhost:3003"
    },
    {
      "name": "backend2",
      "url": "http://localhost:3004"
    },
    {
      "name": "backend3",
      "url": "http://localhost:3005"
    }
  ],
  "taskRouting": {
    "facial-recognition": "backend1",
    "search": "backend2"
  },
  "modelTypeRouting": {
    "textual": "backend2",
    "visual": "backend3"
  },
  "taskRoutingPolicy": {
    "search": "strict"
  },
  "modelTypeRoutingPolicy": {
    "textual": "fallback",
    "visual": "strict"
  }
}

GET /debug

Returns the debug monitoring interface.

GET /api/debug/status

Returns current debug status.

Response:

{
  "enabled": true,
  "maxRecords": 100,
  "filterPing": true,
  "recordCount": 42
}

POST /api/debug/toggle

Enables or disables debug mode.

Request Body:

{
  "enabled": true
}

POST /api/debug/max-records

Sets the maximum number of debug records to keep (1-10000).

Request Body:

{
  "maxRecords": 500
}
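
One way to enforce such a limit is a bounded, mutex-guarded slice. The sketch below is illustrative only: Record and Store are hypothetical types, and dropping the oldest records first is an assumption, since this README does not say which records are discarded when the limit is reached.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Record is a hypothetical, trimmed-down debug record.
type Record struct {
	Method string
	Path   string
	Status int
}

// Store keeps at most `limit` records.
type Store struct {
	mu      sync.Mutex
	limit   int
	records []Record
}

func NewStore(limit int) *Store { return &Store{limit: limit} }

// SetMax validates the documented 1-10000 range and trims existing records.
func (s *Store) SetMax(n int) error {
	if n < 1 || n > 10000 {
		return errors.New("maxRecords must be between 1 and 10000")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.limit = n
	s.trim()
	return nil
}

// Add appends a record and evicts the oldest ones if the limit is exceeded.
func (s *Store) Add(r Record) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records = append(s.records, r)
	s.trim()
}

// Records returns a copy so callers cannot mutate the store's slice.
func (s *Store) Records() []Record {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]Record(nil), s.records...)
}

// trim must be called with s.mu held.
func (s *Store) trim() {
	if extra := len(s.records) - s.limit; extra > 0 {
		s.records = append([]Record(nil), s.records[extra:]...)
	}
}

func main() {
	store := NewStore(2)
	for _, p := range []string{"/predict", "/api/config", "/api/health"} {
		store.Add(Record{Method: "GET", Path: p, Status: 200})
	}
	fmt.Println(store.Records())
	fmt.Println(store.SetMax(0))
}
```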

POST /api/debug/filter-ping

Toggles whether /ping health check requests are excluded from debug records.

Request Body:

{
  "filterPing": true
}

GET /api/debug/records

Returns all debug records (incoming and outgoing HTTP requests/responses).

DELETE /api/debug/records

Clears all debug records.

Configuration

Configuration is saved in config.json:

{
  "defaultBackend": "backend1",
  "backends": [
    {
      "name": "backend1",
      "url": "http://localhost:3003"
    },
    {
      "name": "backend2",
      "url": "http://localhost:3004"
    },
    {
      "name": "backend3",
      "url": "http://localhost:3005"
    }
  ],
  "taskRouting": {
    "clip": "backend2",
    "facial-recognition": "backend1"
  },
  "modelTypeRouting": {
    "textual": "backend2",
    "visual": "backend3"
  },
  "taskRoutingPolicy": {
    "clip": "strict",
    "facial-recognition": "strict"
  },
  "modelTypeRoutingPolicy": {
    "textual": "fallback",
    "visual": "strict"
  }
}

Configuration Fields:

defaultBackend: Name of the backend that handles tasks without explicit routing
backends: List of backend servers with name and URL
taskRouting: Maps task names to backend names (e.g., facial-recognition → backend1)
modelTypeRouting: Maps CLIP model types to backend names (e.g., textual → backend2)
taskRoutingPolicy: Optional per-task policy: strict or fallback
modelTypeRoutingPolicy: Optional per-modelType policy: strict or fallback

Dispatch Rules:

facial-recognition and ocr stay grouped and are routed using taskRouting
clip can arrive with textual, visual, or both, and each present model type is routed independently via modelTypeRouting, with fallback to taskRouting["clip"] and then defaultBackend
strict route policy uses the routed backend even if unhealthy; request fails if the backend is down
fallback route policy uses the routed backend when healthy; if unhealthy or the request fails, skips to the next fallback level
All other tasks are routed to the defaultBackend
Health checks verify the default backend, strict task routes, and strict modelType routes; fallback routes do not fail the check

Running

# Install dependencies
go mod download

# Run the service (production mode)
go run main.go

# Run the service with debug mode enabled
go run main.go --debug

# Run on a custom port
go run main.go --port 8080

The service listens on port 3004 by default. Use --port to change it.

Usage Example

Basic Setup

Start the service:
```
go run main.go
```
Visit http://localhost:3004/config to configure backends
Add backend servers and configure task routing
Save configuration

Making Predictions

Send a POST request to http://localhost:3004/predict with multipart form data:

# Request for grouped facial-recognition (routed to backend1)
curl -X POST http://localhost:3004/predict \
  -F "entries={\"facial-recognition\": {\"detection\": {}, \"recognition\": {}}}" \
  -F "image=@photo.jpg"

# Request for split CLIP with both model types present
curl -X POST http://localhost:3004/predict \
  -F "entries={\"clip\": {\"textual\": {}, \"visual\": {}}}" \
  -F "image=@photo.jpg"

# Request for CLIP textual only
curl -X POST http://localhost:3004/predict \
  -F "entries={\"clip\": {\"textual\": {}}}" \
  -F "text=cat on a sofa"

# Request for grouped OCR (types stay together on one backend)
curl -X POST http://localhost:3004/predict \
  -F "entries={\"ocr\": {\"detection\": {}, \"recognition\": {}}}" \
  -F "image=@document.jpg"
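
The same requests can be built from Go with the standard mime/multipart package. The helper name newPredictRequest is hypothetical, and the demo uses an httptest server as a stand-in for the proxy so the sketch runs without a live service.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// newPredictRequest builds a multipart POST for /predict.
// text and image are optional; pass "" or nil to omit them.
func newPredictRequest(baseURL, entries, text string, image io.Reader, imageName string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("entries", entries); err != nil {
		return nil, err
	}
	if text != "" {
		if err := w.WriteField("text", text); err != nil {
			return nil, err
		}
	}
	if image != nil {
		part, err := w.CreateFormFile("image", imageName)
		if err != nil {
			return nil, err
		}
		if _, err := io.Copy(part, image); err != nil {
			return nil, err
		}
	}
	// Close writes the trailing boundary; without it the body is invalid.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/predict", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// Stand-in server that echoes the form fields it received.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "entries=%s text=%s", r.FormValue("entries"), r.FormValue("text"))
	}))
	defer srv.Close()

	req, err := newPredictRequest(srv.URL, `{"clip": {"textual": {}}}`, "cat on a sofa", nil, "")
	if err != nil {
		fmt.Println(err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```

To target a running proxy, replace srv.URL with http://localhost:3004 and pass an opened image file as the image argument.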

Health Monitoring

Check overall health:
```
curl http://localhost:3004/ping
```
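- Returns "pong" if the default backend and every strict task/model type route have a healthy backend
- Returns 503 if no backends are configured, the default backend is missing or unhealthy, or any strict route lacks a healthy backend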
View individual backend health status:
```
curl http://localhost:3004/api/health
```
- Returns health status for all backends with timestamps and error details
Monitor health in real-time:
- Visit http://localhost:3004/config to see live health status for each backend
- Health status refreshes every 5 seconds automatically

Debugging

Enable debug mode:
- Visit http://localhost:3004/debug and click "Enable Debug"
- Or use the API: POST /api/debug/toggle with {"enabled": true}
Make some requests
View recorded requests/responses at http://localhost:3004/debug
Clear records when needed: DELETE /api/debug/records

Project Structure

immich_ml_proxy/
├── main.go              # Main entry point
├── config/
│   └── config.go        # Configuration management (singleton pattern)
├── proxy/
│   └── proxy.go         # Proxy logic and request forwarding
├── handlers/
│   ├── handlers.go      # Main HTTP handlers
│   └── debug.go         # Debug-related handlers
├── debug/
│   └── debug.go         # Debug manager for request/response recording
└── static/
    ├── config.html      # Web configuration interface
    ├── debug.html       # Debug monitoring interface
    ├── shared.css       # Shared styles
    └── shared.js        # Shared header and notice manager

Architecture

Configuration: Thread-safe singleton configuration manager with file persistence and health status tracking
Proxy: Handles request parsing, task-aware grouping/splitting, round-robin load balancing, and concurrent forwarding to backends
Health Monitoring: Continuous health checking with automatic status updates and failover logic
Handlers: HTTP endpoint handlers for configuration, prediction, health monitoring, and debugging
Debug: Comprehensive request/response recording with configurable retention
Middleware: Debug middleware that captures all HTTP traffic when enabled
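
The thread-safe singleton mentioned for the configuration manager is commonly built in Go from sync.Once and sync.RWMutex. The following is a minimal sketch of that pattern, not the contents of config/config.go: Manager, GetManager, and the method names are hypothetical, and file persistence is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Manager guards shared state with a read/write mutex.
type Manager struct {
	mu     sync.RWMutex
	health map[string]string // backend name -> healthy | unhealthy
}

var (
	once     sync.Once
	instance *Manager
)

// GetManager returns the process-wide instance, creating it exactly once
// even when called from many goroutines at the same time.
func GetManager() *Manager {
	once.Do(func() {
		instance = &Manager{health: make(map[string]string)}
	})
	return instance
}

// SetHealth records a status under the write lock.
func (m *Manager) SetHealth(backend, status string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.health[backend] = status
}

// Health reads a status under the read lock; unchecked backends are "unknown".
func (m *Manager) Health(backend string) string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if s, ok := m.health[backend]; ok {
		return s
	}
	return "unknown"
}

func main() {
	var wg sync.WaitGroup
	for i := 1; i <= 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			GetManager().SetHealth(fmt.Sprintf("backend%d", i), "healthy")
		}(i)
	}
	wg.Wait()
	fmt.Println(GetManager().Health("backend1"), GetManager().Health("backend9"))
}
```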

Routing Logic:

Parse request entries and normalize task names such as facial_recognition → facial-recognition
Keep facial-recognition and ocr grouped by task so dependent sub-types stay together
Split clip into textual / visual dispatch groups for whichever model types are present
Route split CLIP groups with modelTypeRouting, otherwise use taskRouting
Apply route policy (strict or fallback) at each routing level
Fall back to defaultBackend when no explicit route is selected
Forward each dispatch group and merge split-task responses back together
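
The final step, forwarding dispatch groups concurrently, can be sketched with a sync.WaitGroup. The names below are hypothetical and the forward function is a stub; how the real proxy merges split CLIP results is not shown here. Results are written by index, so they come back in input order regardless of which backend answers first.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is the outcome of forwarding one dispatch group.
type Result struct {
	Group string
	Body  string
	Err   error
}

// forwardAll runs forward for every group in its own goroutine and waits
// for all of them. Each goroutine writes only to its own slice index, so
// no mutex is needed.
func forwardAll(groups []string, forward func(group string) (string, error)) []Result {
	results := make([]Result, len(groups))
	var wg sync.WaitGroup
	for i, g := range groups {
		wg.Add(1)
		go func(i int, g string) {
			defer wg.Done()
			body, err := forward(g)
			results[i] = Result{Group: g, Body: body, Err: err}
		}(i, g)
	}
	wg.Wait()
	return results
}

func main() {
	// Stub standing in for the HTTP call to the selected backend.
	forward := func(group string) (string, error) {
		return "response for " + group, nil
	}
	groups := []string{"clip.textual", "clip.visual", "facial-recognition"}
	for _, r := range forwardAll(groups, forward) {
		fmt.Printf("%s: %s (err=%v)\n", r.Group, r.Body, r.Err)
	}
}
```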

Health Check Logic:

Check all backends in parallel via /ping endpoint
Verify defaultBackend is healthy (required for non-routed types)
Verify each strict task route in taskRouting has at least one healthy backend
Verify each strict modelTypeRouting backend is healthy
Return healthy only if all conditions are met
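
The first step, checking all backends in parallel, might look like this. It is a sketch under assumptions: checkAll and httpPing are hypothetical names, the 2-second timeout is an arbitrary choice, and the ping function is injected so the fan-out logic can be exercised without real backends.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// checkAll pings every backend concurrently and returns name -> healthy.
// backends maps a backend name to its base URL.
func checkAll(backends map[string]string, ping func(baseURL string) bool) map[string]bool {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	healthy := make(map[string]bool, len(backends))
	for name, url := range backends {
		wg.Add(1)
		go func(name, url string) {
			defer wg.Done()
			ok := ping(url)
			mu.Lock() // maps are not safe for concurrent writes
			healthy[name] = ok
			mu.Unlock()
		}(name, url)
	}
	wg.Wait()
	return healthy
}

// httpPing returns a ping function that treats HTTP 200 from /ping as healthy
// and any other status or connection error as unhealthy.
func httpPing(client *http.Client) func(string) bool {
	return func(baseURL string) bool {
		resp, err := client.Get(baseURL + "/ping")
		if err != nil {
			return false
		}
		defer resp.Body.Close()
		return resp.StatusCode == http.StatusOK
	}
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second} // timeout value is an assumption
	backends := map[string]string{
		"backend1": "http://localhost:3003",
		"backend3": "http://localhost:3005",
	}
	for name, ok := range checkAll(backends, httpPing(client)) {
		fmt.Printf("%s healthy=%v\n", name, ok)
	}
}
```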
