A proxy service for Immich ML with support for multi-backend routing, task-aware dispatch, health monitoring, and comprehensive debugging capabilities.
- Multi-backend Support: Configure multiple Immich ML backend servers
- Task-aware Routing: Keep dependent task sub-types together for tasks like facial-recognition and ocr
- CLIP Split Routing: Route clip.textual and clip.visual independently when they should run on different backends
- Per-route Policy: Configure each task/modelType route as strict or fallback
- Round-robin Load Balancing: Distribute requests across healthy backends for each routed task
- Health Monitoring: Continuous health checking with automatic failover
- Concurrent Processing: Process independent dispatch groups in parallel for improved performance
- Web Configuration UI: Simple web interface for managing backends and routing with real-time health status
- Debug Mode: Comprehensive request/response logging and debugging tools
- Request Recording: Capture and inspect incoming and outgoing HTTP requests/responses
Returns a simple web page with links to the configuration and debug interfaces.
Checks the health status of all configured backends and verifies that each routed task/model type has a healthy backend.
Behavior:
- Checks health of all backends in parallel by calling their /ping endpoint
- Updates health status for each backend based on response
- Verifies that the default backend is healthy (handles all non-routed types)
- Verifies strict task routes in taskRouting have a healthy backend
- Verifies strict modelTypeRouting targets are healthy
- Allows fallback routes to skip to defaultBackend when their routed backend is unhealthy
Response:
- Returns "pong" with HTTP 200 if:
  - Default backend is healthy
  - Every strict task route in taskRouting has at least one healthy backend
  - Every strict modelTypeRouting target is healthy
- Returns HTTP 503 (Service Unavailable) if:
  - No backends are configured
  - Default backend is not set or not found in the backends list
  - Default backend is unhealthy
  - Any strict task route in taskRouting lacks healthy backends
  - Any strict modelTypeRouting target is unhealthy
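The response rules above can be sketched as one pure function. This is a minimal illustration, not the project's code: the `Config` shape, `strictRoutesHealthy`, and `pingStatus` are hypothetical names, and it assumes that a route with no policy entry is not strict (the default policy is not stated here).

```go
package main

import (
	"fmt"
	"net/http"
)

// Config holds only the fields the /ping rules need; backends are reduced to names.
type Config struct {
	DefaultBackend         string
	Backends               []string
	TaskRouting            map[string]string
	ModelTypeRouting       map[string]string
	TaskRoutingPolicy      map[string]string
	ModelTypeRoutingPolicy map[string]string
}

// strictRoutesHealthy reports whether every route marked "strict" targets a
// healthy backend. Assumption: a route without a policy entry is not strict.
func strictRoutesHealthy(routes, policy map[string]string, healthy map[string]bool) bool {
	for key, backend := range routes {
		if policy[key] == "strict" && !healthy[backend] {
			return false
		}
	}
	return true
}

// pingStatus maps configuration plus per-backend health to the documented status code.
func pingStatus(cfg Config, healthy map[string]bool) int {
	defaultKnown := false
	for _, name := range cfg.Backends {
		if name == cfg.DefaultBackend {
			defaultKnown = true
		}
	}
	switch {
	case len(cfg.Backends) == 0, cfg.DefaultBackend == "", !defaultKnown:
		return http.StatusServiceUnavailable
	case !healthy[cfg.DefaultBackend]:
		return http.StatusServiceUnavailable
	case !strictRoutesHealthy(cfg.TaskRouting, cfg.TaskRoutingPolicy, healthy),
		!strictRoutesHealthy(cfg.ModelTypeRouting, cfg.ModelTypeRoutingPolicy, healthy):
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func main() {
	cfg := Config{
		DefaultBackend:    "backend1",
		Backends:          []string{"backend1", "backend2"},
		TaskRouting:       map[string]string{"clip": "backend2"},
		TaskRoutingPolicy: map[string]string{"clip": "strict"},
	}
	health := map[string]bool{"backend1": true, "backend2": false}
	fmt.Println(pingStatus(cfg, health)) // prints 503: the strict clip route is down
}
```

Switching the clip policy to fallback in this sketch yields 200, because only the default backend and strict routes are checked.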
Routes inference requests to appropriate backends based on task semantics. Dependent tasks stay grouped, while CLIP can be split by model type and merged back into one response.
Request Parameters:
- entries: JSON string containing task configuration with nested structure
  - Format: {"taskName": {"type": config, ...}}
- image: Image file (optional, multipart form data)
- text: Text content (optional, multipart form data)
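As a rough illustration of the entries format, the field can be decoded into a two-level map while leaving each per-type config untouched. The `Entries` type and both helper names are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Entries maps task name -> model type -> raw per-type config.
type Entries map[string]map[string]json.RawMessage

// parseEntries decodes the "entries" form field.
func parseEntries(raw string) (Entries, error) {
	var entries Entries
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		return nil, fmt.Errorf("invalid entries field: %w", err)
	}
	return entries, nil
}

// modelTypes lists the model types requested for one task, sorted for stable output.
func modelTypes(entries Entries, task string) []string {
	types := make([]string, 0, len(entries[task]))
	for t := range entries[task] {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

func main() {
	entries, err := parseEntries(`{"clip": {"textual": {}, "visual": {}}}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(modelTypes(entries, "clip")) // prints [textual visual]
}
```

Keeping the inner values as `json.RawMessage` means the proxy side of this sketch never needs to understand model-specific options; it only forwards them.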
Behavior:
- Normalizes legacy facial_recognition requests/config entries to facial-recognition
- Keeps facial-recognition and ocr grouped so their dependent sub-types are forwarded together
- Splits clip into independent textual and visual requests
- Supports CLIP requests that contain either one model type or both model types
- For each dispatch group:
  - Uses modelTypeRouting first for split CLIP requests
  - Falls back to task routing from taskRouting
  - Falls back again to defaultBackend if no task-specific route exists
  - Applies per-route policy:
    - strict: use routed backend even if unhealthy; request fails if the backend is down
    - fallback: use routed backend when healthy; if unhealthy or the request fails, skip to the next fallback level within the same request
  - Updates backend health status based on response (200 = healthy, other = unhealthy)
- Processes dispatch groups concurrently and merges split CLIP responses back into one JSON response
Health Status Updates:
- Backend marked as healthy: Returns HTTP 200
- Backend marked as unhealthy: Returns non-200 status or connection error
Response: JSON object with results merged back under their original task keys
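The grouping and splitting behavior described above might look roughly like the following. `Group`, `normalizeTask`, and `buildGroups` are hypothetical names for this sketch, which covers only how dispatch groups are formed, not forwarding or response merging.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Entries maps task name -> model type -> raw per-type config.
type Entries map[string]map[string]json.RawMessage

// Group is one unit forwarded to a single backend.
// ModelType is set only for split CLIP groups.
type Group struct {
	Task      string
	ModelType string
	Entries   Entries
}

// normalizeTask rewrites the legacy task name to its current form.
func normalizeTask(task string) string {
	if task == "facial_recognition" {
		return "facial-recognition"
	}
	return task
}

// buildGroups keeps non-CLIP tasks whole and splits clip per model type.
func buildGroups(in Entries) []Group {
	var groups []Group
	for rawTask, types := range in {
		task := normalizeTask(rawTask)
		if task != "clip" {
			groups = append(groups, Group{Task: task, Entries: Entries{task: types}})
			continue
		}
		for modelType, cfg := range types {
			groups = append(groups, Group{
				Task:      task,
				ModelType: modelType,
				Entries:   Entries{task: {modelType: cfg}},
			})
		}
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Slice(groups, func(i, j int) bool {
		if groups[i].Task != groups[j].Task {
			return groups[i].Task < groups[j].Task
		}
		return groups[i].ModelType < groups[j].ModelType
	})
	return groups
}

func main() {
	empty := json.RawMessage(`{}`)
	in := Entries{
		"facial_recognition": {"detection": empty, "recognition": empty},
		"clip":               {"textual": empty, "visual": empty},
	}
	for _, g := range buildGroups(in) {
		fmt.Printf("%s %q sub-types=%d\n", g.Task, g.ModelType, len(g.Entries[g.Task]))
	}
}
```

With this input the sketch produces three groups: one per CLIP model type and a single facial-recognition group that carries both of its sub-types.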
Returns the web configuration interface.
Returns current configuration in JSON format.
Returns health status of all backends in real-time.
Response:
{
"backend1": {
"status": "healthy",
"lastCheck": 1735278000
},
"backend2": {
"status": "unhealthy",
"lastCheck": 1735278010,
"error": "connection refused"
}
}
Status Values:
- healthy: Backend is responding correctly
- unhealthy: Backend is not responding or returning errors
- unknown: Health status not yet checked
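A client consuming this response could decode it as shown below. The struct and function names are illustrative, and treating lastCheck as Unix seconds is an assumption based on the magnitude of the sample values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// BackendHealth mirrors one entry of the health response.
type BackendHealth struct {
	Status    string `json:"status"`    // "healthy", "unhealthy", or "unknown"
	LastCheck int64  `json:"lastCheck"` // assumed to be Unix seconds
	Error     string `json:"error,omitempty"`
}

// decodeHealth parses the response body into a map keyed by backend name.
func decodeHealth(body []byte) (map[string]BackendHealth, error) {
	var out map[string]BackendHealth
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// notHealthy returns the sorted names of backends whose status is not "healthy".
func notHealthy(health map[string]BackendHealth) []string {
	var names []string
	for name, h := range health {
		if h.Status != "healthy" {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	body := []byte(`{
		"backend1": {"status": "healthy", "lastCheck": 1735278000},
		"backend2": {"status": "unhealthy", "lastCheck": 1735278010, "error": "connection refused"}
	}`)
	health, err := decodeHealth(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range notHealthy(health) {
		h := health[name]
		when := time.Unix(h.LastCheck, 0).UTC().Format(time.RFC3339)
		fmt.Printf("%s: %s at %s (%s)\n", name, h.Status, when, h.Error)
	}
}
```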
Saves configuration.
Request Body:
{
"defaultBackend": "backend1",
"backends": [
{
"name": "backend1",
"url": "http://localhost:3003"
},
{
"name": "backend2",
"url": "http://localhost:3004"
},
{
"name": "backend3",
"url": "http://localhost:3005"
}
],
"taskRouting": {
"facial-recognition": "backend1",
"search": "backend2"
},
"modelTypeRouting": {
"textual": "backend2",
"visual": "backend3"
},
"taskRoutingPolicy": {
"search": "strict"
},
"modelTypeRoutingPolicy": {
"textual": "fallback",
"visual": "strict"
}
}
Returns the debug monitoring interface.
Returns current debug status.
Response:
{
"enabled": true,
"maxRecords": 100,
"filterPing": true,
"recordCount": 42
}
Enables or disables debug mode.
Request Body:
{
"enabled": true
}
Sets the maximum number of debug records to keep (1-10000).
Request Body:
{
"maxRecords": 500
}
Toggles whether /ping health check requests are excluded from debug records.
Request Body:
{
"filterPing": true
}
Returns all debug records (incoming and outgoing HTTP requests/responses).
Clears all debug records.
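One way to picture the record limit and clearing described above is a small bounded store. This is a hedged sketch: `Record`, `Store`, and their methods are hypothetical, the initial limit of 100 is borrowed from the status example, and dropping the oldest records first is an assumption about eviction order.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Record is a simplified debug record for this sketch.
type Record struct {
	Method string
	Path   string
	Status int
}

// Store keeps at most max records, dropping the oldest first (assumed policy).
type Store struct {
	mu      sync.Mutex
	max     int
	records []Record
}

// NewStore starts with a limit of 100, as in the status example.
func NewStore() *Store { return &Store{max: 100} }

// SetMax validates the documented 1-10000 range and trims existing records.
func (s *Store) SetMax(n int) error {
	if n < 1 || n > 10000 {
		return errors.New("maxRecords must be between 1 and 10000")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.max = n
	s.trim()
	return nil
}

// trim drops the oldest records beyond the limit; callers must hold mu.
func (s *Store) trim() {
	if extra := len(s.records) - s.max; extra > 0 {
		s.records = append([]Record(nil), s.records[extra:]...)
	}
}

// Add appends a record and enforces the limit.
func (s *Store) Add(r Record) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records = append(s.records, r)
	s.trim()
}

// Records returns a copy so callers cannot mutate internal state.
func (s *Store) Records() []Record {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]Record(nil), s.records...)
}

// Clear removes all records.
func (s *Store) Clear() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.records = nil
}

func main() {
	store := NewStore()
	if err := store.SetMax(2); err != nil {
		fmt.Println("error:", err)
		return
	}
	for status := 200; status < 203; status++ {
		store.Add(Record{Method: "POST", Path: "/predict", Status: status})
	}
	fmt.Println(len(store.Records())) // prints 2
}
```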
Configuration is saved in config.json:
{
"defaultBackend": "backend1",
"backends": [
{
"name": "backend1",
"url": "http://localhost:3003"
},
{
"name": "backend2",
"url": "http://localhost:3004"
},
{
"name": "backend3",
"url": "http://localhost:3005"
}
],
"taskRouting": {
"clip": "backend2",
"facial-recognition": "backend1"
},
"modelTypeRouting": {
"textual": "backend2",
"visual": "backend3"
},
"taskRoutingPolicy": {
"clip": "strict",
"facial-recognition": "strict"
},
"modelTypeRoutingPolicy": {
"textual": "fallback",
"visual": "strict"
}
}
Configuration Fields:
- defaultBackend: Name of the backend that handles tasks without explicit routing
- backends: List of backend servers with name and URL
- taskRouting: Maps task names to backend names (e.g., facial-recognition → backend1)
- modelTypeRouting: Maps CLIP model types to backend names (e.g., textual → backend2)
- taskRoutingPolicy: Optional per-task policy: strict or fallback
- modelTypeRoutingPolicy: Optional per-modelType policy: strict or fallback
Dispatch Rules:
- facial-recognition and ocr stay grouped and are routed using taskRouting
- clip can arrive with textual, visual, or both, and each present model type is routed independently via modelTypeRouting, with fallback to taskRouting["clip"] and then defaultBackend
- strict route policy uses the routed backend even if unhealthy; request fails if the backend is down
- fallback route policy uses the routed backend when healthy; if unhealthy or the request fails, skips to the next fallback level
- All other tasks are routed to the defaultBackend
- Health checks verify the default backend, routed tasks, and configured modelType routes
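The configuration fields map naturally onto Go structs with JSON tags. The sketch below decodes such a document and adds some basic consistency checks; the type names, `parseConfig`, and the validation rules themselves are assumptions for illustration, not a description of what the service enforces.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Backend is one entry of the "backends" list.
type Backend struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

// Config mirrors the documented config.json fields.
type Config struct {
	DefaultBackend         string            `json:"defaultBackend"`
	Backends               []Backend         `json:"backends"`
	TaskRouting            map[string]string `json:"taskRouting"`
	ModelTypeRouting       map[string]string `json:"modelTypeRouting"`
	TaskRoutingPolicy      map[string]string `json:"taskRoutingPolicy,omitempty"`
	ModelTypeRoutingPolicy map[string]string `json:"modelTypeRoutingPolicy,omitempty"`
}

// parseConfig decodes the JSON and applies hypothetical sanity checks:
// every referenced backend must exist and every policy must be strict or fallback.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	known := make(map[string]bool, len(cfg.Backends))
	for _, b := range cfg.Backends {
		known[b.Name] = true
	}
	if !known[cfg.DefaultBackend] {
		return Config{}, fmt.Errorf("defaultBackend %q is not in backends", cfg.DefaultBackend)
	}
	for _, routes := range []map[string]string{cfg.TaskRouting, cfg.ModelTypeRouting} {
		for key, name := range routes {
			if !known[name] {
				return Config{}, fmt.Errorf("route %q targets unknown backend %q", key, name)
			}
		}
	}
	for _, policies := range []map[string]string{cfg.TaskRoutingPolicy, cfg.ModelTypeRoutingPolicy} {
		for key, policy := range policies {
			if policy != "strict" && policy != "fallback" {
				return Config{}, fmt.Errorf("route %q has invalid policy %q", key, policy)
			}
		}
	}
	return cfg, nil
}

func main() {
	sample := []byte(`{
		"defaultBackend": "backend1",
		"backends": [
			{"name": "backend1", "url": "http://localhost:3003"},
			{"name": "backend2", "url": "http://localhost:3004"}
		],
		"taskRouting": {"clip": "backend2"},
		"taskRoutingPolicy": {"clip": "strict"}
	}`)
	cfg, err := parseConfig(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.DefaultBackend, len(cfg.Backends), cfg.TaskRouting["clip"])
}
```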
# Install dependencies
go mod download
# Run the service (production mode)
go run main.go
# Run the service with debug mode enabled
go run main.go --debug
# Run on a custom port
go run main.go --port 8080
The service listens on port :3004 by default. Use --port to change it.
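For readers curious how these two flags can be handled, here is a minimal sketch using the standard flag package, which accepts both the single-dash and double-dash forms. The `parseFlags` helper is hypothetical and not taken from main.go.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseFlags reads --debug and --port, defaulting the port to 3004.
func parseFlags(args []string) (debug bool, port int, err error) {
	fs := flag.NewFlagSet("immich_ml_proxy", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the returned error path
	debugFlag := fs.Bool("debug", false, "enable debug mode")
	portFlag := fs.Int("port", 3004, "port to listen on")
	if err := fs.Parse(args); err != nil {
		return false, 0, err
	}
	return *debugFlag, *portFlag, nil
}

func main() {
	debug, port, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(2)
	}
	fmt.Printf("listen address :%d, debug=%v\n", port, debug)
}
```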
- Start the service:
  go run main.go
- Visit http://localhost:3004/config to configure backends
- Add backend servers and configure task routing
- Save configuration
Send a POST request to http://localhost:3004/predict with multipart form data:
# Request for grouped facial-recognition (routed to backend1)
curl -X POST http://localhost:3004/predict \
-F "entries={\"facial-recognition\": {\"detection\": {}, \"recognition\": {}}}" \
-F "image=@photo.jpg"
# Request for split CLIP with both model types present
curl -X POST http://localhost:3004/predict \
-F "entries={\"clip\": {\"textual\": {}, \"visual\": {}}}" \
-F "image=@photo.jpg"
# Request for CLIP textual only
curl -X POST http://localhost:3004/predict \
-F "entries={\"clip\": {\"textual\": {}}}" \
-F "text=cat on a sofa"
# Request for grouped OCR (types stay together on one backend)
curl -X POST http://localhost:3004/predict \
-F "entries={\"ocr\": {\"detection\": {}, \"recognition\": {}}}" \
-F "image=@document.jpg"
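The same requests can be built from Go. The following sketch mirrors the curl calls using the entries, text, and image form fields; `buildPredictRequest` is a hypothetical helper and the file name "image.jpg" is a placeholder.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// buildPredictRequest assembles a multipart POST to /predict.
// text and image are optional, matching the documented parameters.
func buildPredictRequest(baseURL, entries, text string, image []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("entries", entries); err != nil {
		return nil, err
	}
	if text != "" {
		if err := w.WriteField("text", text); err != nil {
			return nil, err
		}
	}
	if image != nil {
		part, err := w.CreateFormFile("image", "image.jpg") // placeholder file name
		if err != nil {
			return nil, err
		}
		if _, err := part.Write(image); err != nil {
			return nil, err
		}
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/predict", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := buildPredictRequest("http://localhost:3004",
		`{"clip": {"textual": {}}}`, "cat on a sofa", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it against a running proxy: resp, err := http.DefaultClient.Do(req)
}
```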
- Check overall health:
  curl http://localhost:3004/ping
  - Returns "pong" if the default backend and every strict task/model type route have a healthy backend
  - Returns 503 if the default backend or any strict route lacks a healthy backend
- View individual backend health status:
  curl http://localhost:3004/api/health
  - Returns health status for all backends with timestamps and error details
- Monitor health in real-time:
  - Visit http://localhost:3004/config to see live health status for each backend
  - Health status refreshes every 5 seconds automatically
- Enable debug mode:
  - Visit http://localhost:3004/debug and click "Enable Debug"
  - Or use the API: POST /api/debug/toggle with {"enabled": true}
- Make some requests
- View recorded requests/responses at http://localhost:3004/debug
- Clear records when needed: DELETE /api/debug/records
immich_ml_proxy/
├── main.go # Main entry point
├── config/
│ └── config.go # Configuration management (singleton pattern)
├── proxy/
│ └── proxy.go # Proxy logic and request forwarding
├── handlers/
│ ├── handlers.go # Main HTTP handlers
│ └── debug.go # Debug-related handlers
├── debug/
│ └── debug.go # Debug manager for request/response recording
└── static/
├── config.html # Web configuration interface
├── debug.html # Debug monitoring interface
├── shared.css # Shared styles
└── shared.js # Shared header and notice manager
- Configuration: Thread-safe singleton configuration manager with file persistence and health status tracking
- Proxy: Handles request parsing, task-aware grouping/splitting, round-robin load balancing, and concurrent forwarding to backends
- Health Monitoring: Continuous health checking with automatic status updates and failover logic
- Handlers: HTTP endpoint handlers for configuration, prediction, health monitoring, and debugging
- Debug: Comprehensive request/response recording with configurable retention
- Middleware: Debug middleware that captures all HTTP traffic when enabled
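To make the middleware component more concrete, here is a minimal sketch of a recording wrapper around an http.Handler. `DebugLog`, `statusRecorder`, and `demo` are hypothetical names, and the sketch records only method, path, and status rather than full requests and responses.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// DebugLog collects one line per request while Enabled is true.
type DebugLog struct {
	mu         sync.Mutex
	Enabled    bool
	FilterPing bool
	lines      []string
}

// Middleware records traffic when enabled, optionally skipping /ping.
func (d *DebugLog) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		d.mu.Lock()
		skip := !d.Enabled || (d.FilterPing && r.URL.Path == "/ping")
		d.mu.Unlock()
		if skip {
			next.ServeHTTP(w, r)
			return
		}
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		d.mu.Lock()
		d.lines = append(d.lines, fmt.Sprintf("%s %s -> %d", r.Method, r.URL.Path, rec.status))
		d.mu.Unlock()
	})
}

// Lines returns a copy of the recorded lines.
func (d *DebugLog) Lines() []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	return append([]string(nil), d.lines...)
}

// demo sends three in-memory requests through the middleware.
func demo() []string {
	d := &DebugLog{Enabled: true, FilterPing: true}
	h := d.Middleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/missing" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "ok")
	}))
	requests := [][2]string{{"GET", "/ping"}, {"POST", "/predict"}, {"GET", "/missing"}}
	for _, rq := range requests {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(rq[0], rq[1], nil))
	}
	return d.Lines()
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```

In this sketch the /ping request is filtered out, so only the /predict and /missing requests are recorded.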
Routing Logic:
- Parse request entries and normalize task names such as facial_recognition → facial-recognition
- Keep facial-recognition and ocr grouped by task so dependent sub-types stay together
- Split clip into textual/visual dispatch groups for whichever model types are present
- Route split CLIP groups with modelTypeRouting, otherwise use taskRouting
- Apply route policy (strict or fallback) at each routing level
- Fall back to defaultBackend when no explicit route is selected
- Forward each dispatch group and merge split-task responses back together
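The routing levels and policies above can be read as producing an ordered list of backends to try for each dispatch group. The sketch below encodes that reading; `candidates` and `appendUnique` are hypothetical helpers, and treating a route with no policy entry as fallback is an assumption.

```go
package main

import "fmt"

// Config holds only the routing-related fields.
type Config struct {
	DefaultBackend         string
	TaskRouting            map[string]string
	ModelTypeRouting       map[string]string
	TaskRoutingPolicy      map[string]string
	ModelTypeRoutingPolicy map[string]string
}

// appendUnique appends name unless it is already in the list.
func appendUnique(list []string, name string) []string {
	for _, existing := range list {
		if existing == name {
			return list
		}
	}
	return append(list, name)
}

// candidates returns the backends to try, in order, for one dispatch group.
// A strict route ends the chain; a fallback route is used only when healthy.
func candidates(cfg Config, task, modelType string, healthy map[string]bool) []string {
	var out []string
	// Level 1: modelTypeRouting, only for split CLIP groups.
	if task == "clip" && modelType != "" {
		if backend, ok := cfg.ModelTypeRouting[modelType]; ok {
			if cfg.ModelTypeRoutingPolicy[modelType] == "strict" {
				return []string{backend}
			}
			if healthy[backend] {
				out = appendUnique(out, backend)
			}
		}
	}
	// Level 2: taskRouting.
	if backend, ok := cfg.TaskRouting[task]; ok {
		if cfg.TaskRoutingPolicy[task] == "strict" {
			return appendUnique(out, backend)
		}
		if healthy[backend] {
			out = appendUnique(out, backend)
		}
	}
	// Level 3: defaultBackend.
	return appendUnique(out, cfg.DefaultBackend)
}

func main() {
	cfg := Config{
		DefaultBackend:         "backend1",
		TaskRouting:            map[string]string{"clip": "backend2", "facial-recognition": "backend1"},
		ModelTypeRouting:       map[string]string{"textual": "backend2", "visual": "backend3"},
		TaskRoutingPolicy:      map[string]string{"clip": "fallback"},
		ModelTypeRoutingPolicy: map[string]string{"textual": "fallback", "visual": "strict"},
	}
	healthy := map[string]bool{"backend1": true, "backend2": true, "backend3": false}
	fmt.Println(candidates(cfg, "clip", "textual", healthy)) // prints [backend2 backend1]
	fmt.Println(candidates(cfg, "clip", "visual", healthy))  // prints [backend3]
	fmt.Println(candidates(cfg, "ocr", "", healthy))         // prints [backend1]
}
```

A forwarder built on this sketch would try each candidate in order and stop at the first successful response, which is how a failed request on a fallback route moves to the next level.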
Health Check Logic:
- Check all backends in parallel via /ping endpoint
- Verify defaultBackend is healthy (required for non-routed types)
- Verify each strict task route in taskRouting has at least one healthy backend
- Verify each strict modelTypeRouting backend is healthy
- Return healthy only if all conditions are met
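The parallel check in the first step could be sketched with one goroutine per backend. The function names are hypothetical, using GET for /ping is an assumption, and `demo` substitutes local test servers for real backends.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// ping reports whether baseURL/ping answers with HTTP 200.
func ping(ctx context.Context, client *http.Client, baseURL string) bool {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/ping", nil)
	if err != nil {
		return false
	}
	resp, err := client.Do(req)
	if err != nil {
		return false // connection errors count as unhealthy
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// checkAll pings every backend concurrently and returns name -> healthy.
func checkAll(ctx context.Context, client *http.Client, backends map[string]string) map[string]bool {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	result := make(map[string]bool, len(backends))
	for name, url := range backends {
		wg.Add(1)
		go func(name, url string) {
			defer wg.Done()
			ok := ping(ctx, client, url)
			mu.Lock()
			result[name] = ok
			mu.Unlock()
		}(name, url)
	}
	wg.Wait()
	return result
}

// demo checks one healthy server, one returning 503, and one that is shut down.
func demo() map[string]bool {
	up := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "pong")
	}))
	defer up.Close()
	failing := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "overloaded", http.StatusServiceUnavailable)
	}))
	defer failing.Close()
	gone := httptest.NewServer(http.NotFoundHandler())
	goneURL := gone.URL
	gone.Close() // closed immediately so requests to it fail to connect

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return checkAll(ctx, http.DefaultClient, map[string]string{
		"up": up.URL, "failing": failing.URL, "gone": goneURL,
	})
}

func main() {
	fmt.Println(demo())
}
```

The resulting map is the kind of input the remaining verification steps need: the default backend and each strict route can then be looked up by name.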