The gateway should only translate API requests and enqueue backend work. Today FFT makes this awkward because create_model needs a dedicated worker to exist before the normal model queue can be drained. To avoid launching workers directly from the gateway, we added a worker-launch queue that starts the FFT worker and then forwards create_model to the normal queue.
This works, but the boundary between gateway and backend is still fuzzy. We should move toward a backend-owned orchestrator that handles worker lifecycle for both local processes and future Kubernetes pods, so that the gateway only enqueues requests.
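A minimal sketch of that target split, assuming an in-process queue stands in for the real queueing system. Every name here (Request, WorkerBackend, LocalProcessBackend, Orchestrator, gateway_enqueue, drain, and the needs_dedicated_worker flag) is hypothetical and chosen for illustration; none of it is taken from the existing code.

```python
from __future__ import annotations

import queue
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Request:
    kind: str                             # e.g. "create_model"
    model_id: str
    needs_dedicated_worker: bool = False  # hypothetical flag; would be True for FFT


class WorkerBackend(Protocol):
    """What the orchestrator needs from any worker runtime (assumed interface)."""

    def ensure_worker(self, model_id: str) -> None: ...


class LocalProcessBackend:
    """Stand-in backend: records launches instead of spawning real processes."""

    def __init__(self) -> None:
        self.launched: list[str] = []

    def ensure_worker(self, model_id: str) -> None:
        # Idempotent: a second call for the same model does nothing.
        if model_id not in self.launched:
            self.launched.append(model_id)


class Orchestrator:
    """Backend-owned: the only component that knows about worker lifecycle."""

    def __init__(self, backend: WorkerBackend, model_queue: queue.Queue) -> None:
        self.backend = backend
        self.model_queue = model_queue

    def handle(self, req: Request) -> None:
        # Make sure the dedicated worker exists before the request
        # reaches the normal model queue.
        if req.needs_dedicated_worker:
            self.backend.ensure_worker(req.model_id)
        self.model_queue.put(req)


def gateway_enqueue(intake: queue.Queue, req: Request) -> None:
    """The gateway's whole job after translation: put the request on a queue."""
    intake.put(req)


def drain(intake: queue.Queue, orchestrator: Orchestrator) -> int:
    """Hand every queued request to the orchestrator; return how many were handled."""
    handled = 0
    while True:
        try:
            req = intake.get_nowait()
        except queue.Empty:
            return handled
        orchestrator.handle(req)
        handled += 1
```

In this sketch the gateway never touches WorkerBackend, so a Kubernetes-backed implementation of ensure_worker could replace LocalProcessBackend without any gateway change.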