Retry gateway model-name discovery for KAITO llama.cpp by sozercan · Pull Request #167 · kaito-project/airunway

sozercan · 2026-03-18T17:48:58Z

Summary

This draft PR retries gateway model-name discovery for KAITO llama.cpp deployments when the controller has to fall back to spec.model.id because the model server is not ready to answer /v1/models yet.

Concretely, it:

- returns a retry signal from reconcileGateway
- adds a modelNameResolution result so model-name resolution can carry both the resolved name and whether the controller should try again later
- requeues the ModelDeployment reconcile after 1 minute when the fallback path is hit for KAITO llama.cpp
- adds unit coverage for the retry and explicit-override cases

Why

054ba06 changed gateway model-name resolution so KAITO llama.cpp deployments do not trust spec.model.servedName and instead prefer runtime discovery from /v1/models. That makes sense because AIKit / LocalAI can expose a served model name derived from the downloaded GGUF file rather than the original Hugging Face ID.

The remaining gap is timing:

- the ModelDeployment reaches Running
- gateway reconciliation runs
- /v1/models is not ready yet, so discovery fails
- the controller falls back to spec.model.id
- the HTTPRoute gets created or updated with the fallback header match

At that point there is no guaranteed later reconcile to correct the route header once the model server becomes ready.

Relevant current behavior:

- the controller watches ModelDeployment and owned InferencePools, but not HTTPRoute
- auto-created routes are annotated with airunway.ai/httproute-created
- missing routes are intentionally not recreated after user deletion
- existing routes are updated on reconcile, but only if a later reconcile actually happens

So this PR is trying to close a controller timing hole where the route can stay pinned to the fallback model name even though the runtime later exposes the correct model name.

What Changed

Controller behavior

- reconcileGateway now returns (bool, error), where the boolean means "please requeue for model-name discovery"
- the main reconcile loop uses ctrl.Result{RequeueAfter: time.Minute} when gateway reconciliation requests a retry
- the retry signal is limited to:
  - provider = kaito
  - engine = llamacpp
  - no explicit spec.gateway.modelName override

Model-name resolution

- introduced modelNameResolution with:
  - name
  - retry
- explicit spec.gateway.modelName still wins and suppresses retry
- successful /v1/models discovery returns the discovered runtime model name and no retry
- fallback to spec.model.id can now also request a later retry
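The precedence described above could look roughly like the sketch below. The `modelNameResolution` fields follow the description in this PR; the `deployment` struct, `discoverFunc`, `resolveModelName`, and `errNotReady` are hypothetical simplifications, and the sketch leaves out spec.model.servedName handling for other providers.

```go
package main

import "errors"

// modelNameResolution carries the resolved name and whether to try again later.
type modelNameResolution struct {
	name  string
	retry bool
}

// deployment is a hypothetical flattened view of the relevant spec fields.
type deployment struct {
	provider         string // e.g. "kaito"
	engine           string // e.g. "llamacpp"
	modelID          string // spec.model.id
	gatewayModelName string // spec.gateway.modelName (explicit override)
}

// discoverFunc stands in for the /v1/models lookup.
type discoverFunc func() (string, error)

// errNotReady is a sentinel a discoverFunc can return while the server starts.
var errNotReady = errors.New("model server not ready")

func resolveModelName(d deployment, discover discoverFunc) modelNameResolution {
	// 1. An explicit override always wins and never requests a retry.
	if d.gatewayModelName != "" {
		return modelNameResolution{name: d.gatewayModelName}
	}
	// 2. Successful runtime discovery returns the served name, no retry.
	if name, err := discover(); err == nil && name != "" {
		return modelNameResolution{name: name}
	}
	// 3. Fallback to spec.model.id; only KAITO llama.cpp asks for a retry.
	retry := d.provider == "kaito" && d.engine == "llamacpp"
	return modelNameResolution{name: d.modelID, retry: retry}
}
```

For example, calling `resolveModelName` with a KAITO llama.cpp deployment and a `discoverFunc` that returns `errNotReady` yields the spec.model.id value with `retry` set to true, while the same call with a non-empty `gatewayModelName` yields the override with `retry` set to false.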

Tests

Added coverage for:

- KAITO llama.cpp fallback requesting retry
- explicit gateway model-name override suppressing retry
- existing gateway tests updated for the new reconcileGateway signature

Investigation Context

This PR came out of reconstructing some local unstaged changes after the original session history was lost.

While investigating, I also checked the current live cluster state. That cluster does not have this patch deployed yet, and the live failure I reproduced there appears to be a separate problem:

- current test request:
  - curl http://102.133.128.103/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "NVIDIA-Nemotron-3-Nano-4B-UD-IQ2_M", "messages": [{"role": "user", "content": "Hello"}]}'
- gateway response:
  - 503 Service Unavailable
  - upstream connect error or disconnect/reset before headers. reset reason: connection termination
- direct service checks:
  - GET /v1/models from the model service returned 200 with NVIDIA-Nemotron-3-Nano-4B-UD-IQ2_M
  - direct POST /v1/chat/completions also failed
- pod state at the time of investigation:
  - model pod restarted multiple times
  - last termination reason was OOMKilled / exit code 137
  - pod briefly entered CrashLoopBackOff
  - current pod spec showed resources: {}
  - runner log showed Default capability (no GPU detected)

That means the currently observed cluster failure is most likely an upstream model crash / resource issue, not something caused by this PR.

I am including that context here because it is easy to conflate the two:

- this PR addresses a controller retry / route-correction gap for KAITO llama.cpp gateway model-name discovery
- the current live cluster failure looked like a runtime OOM / inference crash path

Validation

Ran locally in controller/:

go build ./...
go test ./...

Open Questions For Review

Because this is a draft, a few follow-ups are worth deciding explicitly:

- Should the retry be capped instead of continuing indefinitely while discovery keeps failing?
- Should retry be skipped when the user provides spec.gateway.httpRouteRef and owns route updates externally?
- Is 1 minute the right retry delay, or should it be shorter / longer for AIKit startup behavior?

Retry gateway model-name discovery for KAITO llama.cpp

728a6d6

sozercan wants to merge 1 commit into main from fix/kaito-llamacpp-gateway-model-name-retry