Fix gateway and router issues #141
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
if (poolCount === 0) {
  return { available: false };
}
This fixes the UI showing "Not Detected" even when the Gateway and InferencePool CRDs are installed.
- list
- patch
- update
- watch
This is required if we want to auto-generate DestinationRules. Otherwise, the cluster operator will need to create them.
Verified with: curl -m 15 http://<public ip>/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello"}]}'
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":" Hello!","reasoning_content":null,"role":"assistant","tool_calls":[]},"stop_reason":32007}],"created":1773280441,"id":"chatcmpl-3a77255c-0ccc-49d5-9368-95f8fc2571ec","kv_transfer_params":null,"model":"microsoft/Phi-3-mini-4k-instruct","object":"chat.completion","prompt_logprobs":null,"usage":{"completion_tokens":3,"prompt_tokens":4,"prompt_tokens_details":null,"total_tokens":7}}
curl -m 15 http://<public ip>/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello"}]}'
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"\u003cthink\u003e\nOkay, the user just said \"Hello\". I need to respond appropriately. Since the user hasn't asked a specific question, I should acknowledge their greeting. Maybe say \"Hello! How can I assist you today?\" to open the conversation. I should keep it friendly and offer help. Let me check if there's any context I'm missing, but since there isn't, just a simple greeting should work. Make sure the tone is cheerful and helpful.\n\u003c/think\u003e\n\nHello! How can I assist you today? 😊","reasoning_content":null,"role":"assistant","tool_calls":[]},"stop_reason":null}],"created":1773281223,"id":"chatcmpl-73ca65c4-3d69-4bb5-80b3-2656793542fc","kv_transfer_params":null,"model":"Qwen/Qwen3-0.6B","object":"chat.completion","prompt_logprobs":null,"usage":{"completion_tokens":107,"prompt_tokens":9,"prompt_tokens_details":null,"total_tokens":116}}
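To make the manual curl check above repeatable, the response can be decoded and the first reply extracted. The following is only a sketch: the type `chatCompletion` and the function `firstReply` are hypothetical names, not part of this PR, and only the fields visible in the responses above are modeled.

```go
package main

import (
	"encoding/json"
	"errors"
)

// chatCompletion models only the response fields shown in the curl output
// above. The type name is hypothetical and not taken from the PR.
type chatCompletion struct {
	Model   string `json:"model"`
	Choices []struct {
		FinishReason string `json:"finish_reason"`
		Message      struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"message"`
	} `json:"choices"`
}

// firstReply decodes a /v1/chat/completions response body and returns the
// serving model name and the content of the first choice.
func firstReply(body []byte) (string, string, error) {
	var resp chatCompletion
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", "", err
	}
	if len(resp.Choices) == 0 {
		return "", "", errors.New("response contains no choices")
	}
	return resp.Model, resp.Choices[0].Message.Content, nil
}
```

Checking that the returned model matches the one requested is a quick way to confirm that the router sent the request to the intended pool.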
keithmattix left a comment
LGTM, just two forward-looking comments.
// but only if Istio is detected (i.e. the DestinationRule CRD is registered in the cluster).
// DestinationRule: tell Istio to use SIMPLE TLS (insecureSkipVerify)
// to skip cert validation.
func (r *ModelDeploymentReconciler) reconcileEPPDestinationRule(ctx context.Context, md *kubeairunwayv1alpha1.ModelDeployment, eppName string) error {
Not a blocker for this PR, but we need to figure out whether KAIR will continue to auto-deploy gateway-provider-specific solutions like this. Should it deploy and configure the BBR to be used by Istio, for example? Definitely an open question.
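For readers unfamiliar with the resource under discussion, here is a rough sketch of the kind of DestinationRule the code comment describes: SIMPLE TLS to the EPP Service with certificate verification skipped. It is built as a generic map using only the standard library. The function names, the `-tls` name suffix, the `networking.istio.io/v1` API version, and the cluster-local host format are assumptions for illustration; the PR's actual `reconcileEPPDestinationRule` may differ.

```go
package main

import "encoding/json"

// buildEPPDestinationRule returns an Istio DestinationRule, as a generic map,
// that asks Istio to originate SIMPLE TLS to the EPP Service without
// validating the server certificate.
// Assumptions (not from the PR): the "-tls" name suffix, the API version,
// and the <name>.<namespace>.svc.cluster.local host format.
func buildEPPDestinationRule(namespace, eppName string) map[string]any {
	host := eppName + "." + namespace + ".svc.cluster.local"
	return map[string]any{
		"apiVersion": "networking.istio.io/v1",
		"kind":       "DestinationRule",
		"metadata": map[string]any{
			"name":      eppName + "-tls",
			"namespace": namespace,
		},
		"spec": map[string]any{
			"host": host,
			"trafficPolicy": map[string]any{
				"tls": map[string]any{
					"mode":               "SIMPLE",
					"insecureSkipVerify": true,
				},
			},
		},
	}
}

// renderJSON serializes the object so it can be inspected or applied by hand.
func renderJSON(obj map[string]any) (string, error) {
	b, err := json.MarshalIndent(obj, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

A cluster operator who prefers not to grant the controller DestinationRule permissions could create an equivalent object manually, which is the alternative mentioned earlier in the review.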
// restartBBRIfPresent triggers a rolling restart of the body-based-router Deployment (if present
// in the given namespace) by updating its restart annotation. This is necessary because BBR builds
// its internal model registry on startup and does not dynamically watch InferencePools.
Can we file an issue in GAIE for this? That feels like the right place to solve this.