Add implementation for llm artifact deployment in gateway #1093
base: main
Conversation
Walkthrough

The control plane client is extended to handle LLM provider and proxy lifecycle events via WebSocket, introducing event-driven deployment/undeployment flows. New event types, handler scaffolding, and API utilities are added to integrate LLM deployment services with YAML-based configuration management.
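Since the walkthrough mentions new event types, here is a plausible shape for the undeployment event, inferred from the handler code quoted later in this review; the names are assumptions, not the PR's actual definitions:

```go
// Hypothetical event structs inferred from the handler code below; the PR's
// actual definitions may carry additional fields.
type LLMProviderUndeployedEvent struct {
	CorrelationID string                       `json:"correlationId"`
	Timestamp     string                       `json:"timestamp"`
	Payload       LLMProviderUndeployedPayload `json:"payload"`
}

type LLMProviderUndeployedPayload struct {
	ProviderID string `json:"providerId"`
}
```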
Sequence Diagram

```mermaid
sequenceDiagram
    participant WS as WebSocket Event Source
    participant Client
    participant Utils as API Utils Service
    participant LLM as LLM Deployment Service
    participant CP as Control Plane API

    WS->>Client: LLM Provider Deployed Event
    activate Client
    Client->>Utils: FetchLLMProviderDefinition(providerID)
    activate Utils
    Utils->>CP: GET /llm-providers/{providerID}
    CP-->>Utils: ZIP file
    Utils-->>Client: []byte (ZIP data)
    deactivate Utils
    Client->>Utils: ExtractYAMLFromZip(zipData)
    activate Utils
    Utils-->>Client: []byte (YAML)
    deactivate Utils
    Client->>Utils: CreateLLMProviderFromYAML(yamlData, providerID, correlationID, service)
    activate Utils
    Utils->>LLM: DeployLLMProviderConfiguration(...)
    activate LLM
    LLM-->>Utils: APIDeploymentResult
    deactivate LLM
    Utils-->>Client: APIDeploymentResult or error
    deactivate Utils
    Client->>Client: Log processing result
    deactivate Client
```
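For context on the `ExtractYAMLFromZip` step above, here is a standalone sketch using only the standard library. It assumes the archive contains a single YAML definition and is not the PR's actual implementation:

```go
package utils

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// ExtractYAMLFromZip returns the contents of the first .yaml/.yml entry in the
// archive. Sketch only; the real code may target a specific file name.
func ExtractYAMLFromZip(zipData []byte) ([]byte, error) {
	reader, err := zip.NewReader(bytes.NewReader(zipData), int64(len(zipData)))
	if err != nil {
		return nil, fmt.Errorf("failed to open zip: %w", err)
	}
	for _, f := range reader.File {
		if strings.HasSuffix(f.Name, ".yaml") || strings.HasSuffix(f.Name, ".yml") {
			rc, err := f.Open()
			if err != nil {
				return nil, fmt.Errorf("failed to open %s: %w", f.Name, err)
			}
			defer rc.Close()
			return io.ReadAll(rc)
		}
	}
	return nil, fmt.Errorf("no YAML file found in zip archive")
}
```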
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (3)
gateway/gateway-controller/pkg/utils/api_utils.go (2)
122-234: Extract a common `fetchDefinitionZip` helper to eliminate duplication across all three Fetch methods.

`FetchLLMProviderDefinition`, `FetchLLMProxyDefinition`, and the existing `FetchAPIDefinition` share identical logic; only the URL path segment and log entity name differ. This is classic copy-paste duplication. Consider something like:
♻️ Sketch of a shared helper
```diff
+// fetchDefinitionZip downloads a zip definition from the given resource path.
+func (s *APIUtilsService) fetchDefinitionZip(resourcePath, entityType, entityID string) ([]byte, error) {
+	url := s.config.BaseURL + resourcePath
+	s.logger.Info("Fetching "+entityType+" definition",
+		slog.String("id", entityID),
+		slog.String("url", url),
+	)
+
+	client := s.newHTTPClient()
+
+	req, err := http.NewRequest("GET", url, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to create request: %w", err)
+	}
+	req.Header.Add("api-key", s.config.Token)
+	req.Header.Add("Accept", "application/zip")
+
+	resp, err := client.Do(req)
+	if err != nil {
+		return nil, fmt.Errorf("failed to fetch %s definition: %w", entityType, err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		bodyBytes, _ := io.ReadAll(resp.Body)
+		return nil, fmt.Errorf("%s request failed with status %d: %s", entityType, resp.StatusCode, string(bodyBytes))
+	}
+
+	bodyBytes, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("failed to read response body: %w", err)
+	}
+
+	s.logger.Info("Successfully fetched "+entityType+" definition",
+		slog.String("id", entityID),
+		slog.Int("size_bytes", len(bodyBytes)),
+	)
+	return bodyBytes, nil
+}
```

Then each public method becomes a one-liner:
```go
func (s *APIUtilsService) FetchLLMProviderDefinition(providerID string) ([]byte, error) {
	return s.fetchDefinitionZip("/llm-providers/"+providerID, "LLM provider", providerID)
}
```
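The other two fetchers would presumably follow the same pattern. The path segments below are assumptions based on the `/llm-providers/{providerID}` route shown in the sequence diagram; they are not verified against the PR:

```go
// Path segments here are assumed; check the actual routes used by
// FetchLLMProxyDefinition and FetchAPIDefinition in the PR.
func (s *APIUtilsService) FetchLLMProxyDefinition(proxyID string) ([]byte, error) {
	return s.fetchDefinitionZip("/llm-proxies/"+proxyID, "LLM proxy", proxyID)
}

func (s *APIUtilsService) FetchAPIDefinition(apiID string) ([]byte, error) {
	return s.fetchDefinitionZip("/apis/"+apiID, "API", apiID)
}
```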
132-140: Missing `MinVersion` in TLS config (flagged by static analysis).

The new `tls.Config` blocks omit `MinVersion`, defaulting to TLS 1.0 when acting as a server and TLS 1.2 as a client. While the client default is generally acceptable, explicitly setting `MinVersion: tls.VersionTLS12` (or `tls.VersionTLS13`) is a defense-in-depth best practice and silences the static analysis warning. Note that the same issue exists in the pre-existing `FetchAPIDefinition` (lines 79-81) and `NotifyAPIDeployment`; this would be a good time to fix all of them together.

🔒 Proposed fix (apply to all tls.Config instances)
```diff
 TLSClientConfig: &tls.Config{
 	InsecureSkipVerify: s.config.InsecureSkipVerify,
+	MinVersion:         tls.VersionTLS12,
 },
```

Also applies to: 189-197
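The dedup sketch above calls a `s.newHTTPClient()` helper that is not shown. One plausible shape, which would also centralize the TLS fix in a single place, is sketched below; this is an assumption, not code from the PR (imports of `crypto/tls` and `net/http` assumed):

```go
// newHTTPClient is a hypothetical helper: one place to configure transport
// security for all control plane requests, including the MinVersion fix.
func (s *APIUtilsService) newHTTPClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				InsecureSkipVerify: s.config.InsecureSkipVerify,
				MinVersion:         tls.VersionTLS12,
			},
		},
	}
}
```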
gateway/gateway-controller/pkg/controlplane/client.go (1)
824-1042: Consider extracting common logic from the four LLM event handlers.

The deployed handlers (`handleLLMProviderDeployedEvent` and `handleLLMProxyDeployedEvent`) share an identical flow: parse event → extract ID → fetch zip → extract YAML → check service nil → create from YAML. Similarly, the two undeployed handlers share: parse event → extract ID → check service nil → delete. A helper like `handleLLMDeployedEvent(event, entityType, fetchFunc, createFunc)` could reduce ~100 lines of near-duplicate code.
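A minimal sketch of that helper, under the assumption of simplified callback signatures (the real handlers would likely need correlation IDs and more context):

```go
// handleLLMDeployedEvent is a hypothetical consolidation of the two deployed
// handlers; the callback signatures here are assumptions for illustration.
func (c *Client) handleLLMDeployedEvent(
	event map[string]interface{},
	entityType string,
	extractID func(map[string]interface{}) string,
	fetchZip func(id string) ([]byte, error),
	extractYAML func(zipData []byte) ([]byte, error),
	createFromYAML func(yamlData []byte, id string) error,
) {
	id := extractID(event)
	if id == "" {
		c.logger.Error("ID is empty in " + entityType + " deployment event")
		return
	}

	if c.llmDeploymentService == nil {
		c.logger.Error("LLM deployment service not available", slog.String("id", id))
		return
	}

	zipData, err := fetchZip(id)
	if err != nil {
		c.logger.Error("Failed to fetch "+entityType+" definition", slog.Any("error", err))
		return
	}

	yamlData, err := extractYAML(zipData)
	if err != nil {
		c.logger.Error("Failed to extract YAML for "+entityType, slog.Any("error", err))
		return
	}

	if err := createFromYAML(yamlData, id); err != nil {
		c.logger.Error("Failed to create "+entityType+" from YAML", slog.Any("error", err))
		return
	}

	c.logger.Info("Successfully processed "+entityType+" deployment event", slog.String("id", id))
}
```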
```go
// handleLLMProviderUndeployedEvent handles LLM provider undeployment events
func (c *Client) handleLLMProviderUndeployedEvent(event map[string]interface{}) {
	c.logger.Info("LLM Provider Undeployment Event",
		slog.Any("payload", event["payload"]),
		slog.Any("timestamp", event["timestamp"]),
		slog.Any("correlationId", event["correlationId"]),
	)

	// Parse the event into structured format
	eventBytes, err := json.Marshal(event)
	if err != nil {
		c.logger.Error("Failed to marshal LLM provider undeployment event for parsing",
			slog.Any("error", err),
		)
		return
	}

	var undeployedEvent LLMProviderUndeployedEvent
	if err := json.Unmarshal(eventBytes, &undeployedEvent); err != nil {
		c.logger.Error("Failed to parse LLM provider undeployment event",
			slog.Any("error", err),
		)
		return
	}

	providerID := undeployedEvent.Payload.ProviderID
	if providerID == "" {
		c.logger.Error("Provider ID is empty in LLM provider undeployment event")
		return
	}

	if c.llmDeploymentService == nil {
		c.logger.Error("LLM deployment service not available",
			slog.String("provider_id", providerID),
			slog.String("correlation_id", undeployedEvent.CorrelationID),
		)
		return
	}

	_, err = c.llmDeploymentService.DeleteLLMProvider(providerID, undeployedEvent.CorrelationID, c.logger)
	if err != nil {
		c.logger.Error("Failed to delete LLM provider configuration",
			slog.String("provider_id", providerID),
			slog.Any("error", err),
		)
		return
	}

	c.logger.Info("Successfully processed LLM provider undeployment event",
		slog.String("provider_id", providerID),
		slog.String("correlation_id", undeployedEvent.CorrelationID),
	)
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Find the LLM deployment service and inspect DeleteLLMProvider and DeleteLLMProxy implementations
find . -type f -name "*.go" | xargs grep -l "DeleteLLMProvider\|DeleteLLMProxy" | head -20
```

Repository: wso2/api-platform
Length of output: 798

🏁 Script executed:

```bash
#!/bin/bash
# Search for DeleteLLMProvider function definition
rg "func.*DeleteLLMProvider" -A 30 --type go
```

Repository: wso2/api-platform
Length of output: 50373

🏁 Script executed:

```bash
#!/bin/bash
# Search for DeleteLLMProxy function definition
rg "func.*DeleteLLMProxy" -A 30 --type go
```

Repository: wso2/api-platform
Length of output: 21443

🏁 Script executed:

```bash
#!/bin/bash
# Look at the complete event handler to understand the intent
sed -n '900,970p' gateway/gateway-controller/pkg/controlplane/client.go
```

Repository: wso2/api-platform
Length of output: 2302

🏁 Script executed:

```bash
#!/bin/bash
# Check the event type/structure and how undeployment events are named
rg "LLMProviderUndeployedEvent|LLMProxyUndeployedEvent" -B 2 -A 10 --type go | head -60
```

Repository: wso2/api-platform
Length of output: 5090

🏁 Script executed:

```bash
#!/bin/bash
# Look at the second handler mentioned (lines 1044-1096)
sed -n '1044,1096p' gateway/gateway-controller/pkg/controlplane/client.go
```

Repository: wso2/api-platform
Length of output: 1638

🏁 Script executed:

```bash
#!/bin/bash
# Look at API undeployment handling to compare (mentioned at line 774)
sed -n '760,800p' gateway/gateway-controller/pkg/controlplane/client.go
```

Repository: wso2/api-platform
Length of output: 1242

🏁 Script executed:

```bash
#!/bin/bash
# Search for StatusUndeployed usage to understand the API soft undeployment pattern
rg "StatusUndeployed" -B 3 -A 3 --type go | head -80
```

Repository: wso2/api-platform
Length of output: 6384

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a soft undeployment mechanism for LLM (like UpdateConfig pattern for APIs)
rg "UpdateLLMProvider|UndeployLLMProvider|StatusUndeployed.*LLM" -B 2 -A 5 --type go
```

Repository: wso2/api-platform
Length of output: 37975

🏁 Script executed:

```bash
#!/bin/bash
# Look at how LLM provider deployments are managed to understand lifecycle
rg "DeployLLMProvider" -B 5 -A 15 --type go | grep -A 20 "func.*DeployLLMProvider"
```

Repository: wso2/api-platform
Length of output: 7577
LLM undeployment events should preserve configuration for potential redeployment, consistent with the API undeployment pattern.

The `handleLLMProviderUndeployedEvent` and `handleLLMProxyUndeployedEvent` handlers call `DeleteLLMProvider` and `DeleteLLMProxy`, which permanently remove configurations from the database and memory store. This hard-deletion approach contradicts the soft-undeployment pattern used for APIs (which sets `StatusUndeployed` and preserves config, keys, and policies for potential redeploy).

The event type names (`llmprovider.undeployed`) and the existence of `UndeployLLMProviderDeployment` in platform-api (which preserves deployment records) indicate that undeployment should be a soft removal. Instead of deleting, implement soft undeployment for LLM entities: mark them as undeployed (similar to `StatusUndeployed` for APIs) and preserve the configuration for redeploy scenarios. Reserve hard deletion for explicit DELETE operations initiated by users.
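A minimal sketch of what the proposed soft undeployment could look like, assuming a hypothetical `Status` field and store methods (none of these names are from the PR):

```go
// Hypothetical soft-undeploy: mark the provider undeployed but keep its
// configuration so a later redeploy can restore it without a refetch.
func (s *LLMDeploymentService) UndeployLLMProvider(providerID string) error {
	provider, ok := s.store.GetLLMProvider(providerID) // assumed store accessor
	if !ok {
		return fmt.Errorf("LLM provider %s not found", providerID)
	}
	provider.Status = StatusUndeployed // assumed constant, mirroring the API pattern
	return s.store.SaveLLMProvider(provider) // assumed persistence call
}
```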
The deployments will be stored in the control plane, and if a previous revision is restored, the gateway can fetch the relevant artifact from the CP.
@Arshardh, thank you for clarifying! I understand now—LLM deployments follow a control-plane-managed lifecycle where the gateway doesn't need to preserve local state for redeployment scenarios. Unlike APIs, which use soft undeployment with StatusUndeployed to maintain configuration locally for quick redeployment, LLM providers and proxies can be safely deleted from the gateway since the control plane maintains the deployment history and can provide the artifact when a previous revision is restored.
This architectural difference makes sense and the hard deletion approach in handleLLMProviderUndeployedEvent and handleLLMProxyUndeployedEvent is appropriate for this model.
✏️ Learnings added
Learnt from: Arshardh
Repo: wso2/api-platform PR: 1093
File: gateway/gateway-controller/pkg/controlplane/client.go:907-959
Timestamp: 2026-02-11T20:09:30.046Z
Learning: In the gateway-controller (file: gateway/gateway-controller/pkg/controlplane/client.go), LLM provider and LLM proxy undeployment events intentionally perform hard deletion (via DeleteLLMProvider/DeleteLLMProxy) rather than soft undeployment. Unlike API undeployment which preserves local state with StatusUndeployed for quick redeployment, LLM deployments follow a control-plane-managed lifecycle where the control plane stores deployment history and provides artifacts when previous revisions are restored. The gateway does not need to preserve local LLM configuration for redeployment scenarios.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: dushaniw
Repo: wso2/api-platform PR: 997
File: gateway/gateway-controller/pkg/controlplane/client.go:688-795
Timestamp: 2026-02-09T11:15:41.017Z
Learning: In gateway-controller (file: gateway/gateway-controller/pkg/controlplane/client.go), during API undeployment events, policies should NOT be removed from the policy engine. Policies (along with API keys and config) are intentionally preserved for potential redeployment. Only DELETE operations should remove policies from the policy engine. Undeployment is a soft removal that maintains state, while deletion is a hard cleanup.
Learnt from: dushaniw
Repo: wso2/api-platform PR: 671
File: platform-api/src/internal/service/deployment.go:0-0
Timestamp: 2026-01-25T17:49:39.364Z
Learning: In the platform-api deployment service (file: platform-api/src/internal/service/deployment.go), the DeployAPI method does not broadcast undeployment events when deploying a new API version to a gateway. The gateway is designed to intelligently handle the replacement of existing deployments when it receives a new deployment event for the same API, eliminating the need for explicit undeployment notifications from the platform-api service during the deployment flow.
Learnt from: Arshardh
Repo: wso2/api-platform PR: 841
File: platform-api/src/internal/database/schema.postgres.sql:345-423
Timestamp: 2026-02-01T19:10:55.797Z
Learning: In the platform-api repository, orphaned LLM policies resulting from deletion of llm_providers or llm_proxies are intentionally handled at the application layer rather than through database-level foreign key cascades. The llm_policies table references only organization_uuid to maintain schema flexibility.
Learnt from: O-sura
Repo: wso2/api-platform PR: 665
File: gateway/policy-engine/internal/analytics/analytics.go:277-277
Timestamp: 2026-01-13T15:42:24.281Z
Learning: In gateway/policy-engine/internal/analytics/analytics.go, when processing AI-related metadata for LLM providers, the aiMetadata.VendorVersion field should be populated from APIVersionKey (the API's version) rather than AIProviderAPIVersionMetadataKey (the AI provider's API version). This is intentional based on how LLM providers are handled in the system.
Learnt from: thivindu
Repo: wso2/api-platform PR: 110
File: platform-api/src/internal/service/api.go:432-476
Timestamp: 2025-11-08T13:06:22.133Z
Learning: In the platform-api Go codebase (file: platform-api/src/internal/service/api.go), the DeployAPIRevision method is designed to automatically create API-gateway associations during deployment if they don't already exist. There is no requirement that associations must pre-exist before deployment; the system checks for existing associations and creates them on-the-fly as needed.
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.
Summary by CodeRabbit
Release Notes