Skip to content

Conversation

@hanzjk
Copy link
Contributor

@hanzjk hanzjk commented Dec 24, 2025

Purpose

This PR is to fix #121 & #120

  1. Add config hash annotation to the k8s deployment resource to rollout restart of the deployment when the configs change.
  2. Initial component creation intermittently fails in quick start guide. #120 occurs when
    • Component is created
    • Component is fetched from Kubernetes
    • OTEL trait is added to the component's spec
    • Update is attempted
    • But between step 2 and 4, the component might have been modified (possibly by a Kubernetes controller or admission webhook), causing the conflict

To fix this, updated that whole read-write-modify operation to execute in a retry block, retrying on such conflicts.

Goals

Describe the solutions that this feature/fix will introduce to resolve the problems described above

Approach

Describe how you are implementing the solutions. Include an animated GIF or screenshot if the change affects the UI (email [email protected] to review all UI text). Include a link to a Markdown file or Google doc if the feature write-up is too long to paste here.

User stories

Summary of user stories addressed by this change>

Release note

Brief description of the new feature or bug fix as it will appear in the release notes

Documentation

Link(s) to product documentation that addresses the changes of this PR. If no doc impact, enter �N/A� plus brief explanation of why there�s no doc impact

Training

Link to the PR for changes to the training content in https://github.com/wso2/WSO2-Training, if applicable

Certification

Type â��Sentâ�� when you have provided new/updated certification questions, plus four answers for each question (correct answer highlighted in bold), based on this change. Certification questions/answers should be sent to [email protected] and NOT pasted in this PR. If there is no impact on certification exams, type â��N/Aâ�� and explain why.

Marketing

Link to drafts of marketing content that will describe and promote this feature, including product page changes, technical articles, blog posts, videos, etc., if applicable

Automation tests

  • Unit tests

    Code coverage information

  • Integration tests

    Details about the test cases and coverage

Security checks

Samples

Provide high-level details about the samples related to this feature

Related PRs

List any other related PRs

Migrations (if applicable)

Describe migration steps and platforms on which migration has been tested

Test environment

List all JDK versions, operating systems, databases, and browser/versions on which this feature/fix was tested

Learning

Describe the research phase and any blog posts, patterns, libraries, or add-ons you used to solve the problem.

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of concurrent resource updates so operations automatically retry on conflicts, reducing update failures.
  • Improvements

    • Enhanced detection of environment configuration changes by adding a computed config-hash annotation to pods, ensuring config updates trigger rolling changes.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 24, 2025

📝 Walkthrough

Walkthrough

Refactors two client methods to perform get-validate-mutate-update inside a retry loop (now treating Kubernetes Conflict as retryable) and adds a conditional pod template openchoreo.dev/config-hash annotation derived from container env vars or "no-env".

Changes

Cohort / File(s) Summary
Kubernetes Concurrency Control
agent-manager-service/clients/openchoreosvc/client.go, agent-manager-service/clients/openchoreosvc/retry.go
AttachComponentTrait and DeployAgentComponent now fetch the latest resource, validate/mutate, and update inside a retry loop to handle optimistic concurrency. isRetryableK8sError treats 409 Conflict as retryable.
Deployment Configuration Tracking
deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml
Added metadata.annotations["openchoreo.dev/config-hash"] on pod template, computed from container env vars or set to "no-env" when none exist.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Retry as Retry Loop
    participant K8s as Kubernetes API

    Client->>Retry: Request update (Attach/Deploy)
    rect rgb(235,245,235)
    Note over Retry,K8s: Retry attempt begins
    Retry->>K8s: GET resource (latest)
    K8s-->>Retry: Resource (resourceVersion)
    Retry->>Retry: Validate & mutate resource
    Retry->>K8s: UPDATE resource (with version)
    end

    alt 409 Conflict
        K8s-->>Retry: 409 Conflict
        Retry->>Retry: Backoff & retry (fetch latest)
        Retry->>K8s: GET resource (latest)
        K8s-->>Retry: Resource (new version)
        Retry->>Retry: Re-validate & mutate
        Retry->>K8s: UPDATE resource
    end

    rect rgb(230,240,255)
    Note over K8s,Client: Success
    K8s-->>Retry: 200 OK
    Retry-->>Client: Success
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop, I fetch, then tweak with care,
When versions clash, I retry the fair.
A hash for env keeps changes in sight,
Conflicts bowed down by patient delight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description partially fills the template, explaining the purpose and linking to issues, but leaves most sections (Goals, Approach, User stories, etc.) as template placeholders without substantive content. Complete the remaining template sections with details about goals, approach, test coverage, documentation impact, and other required metadata to meet the repository standards.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main changes: fixing deployment configuration updates and trait update issues, which aligns with the primary objectives.
Linked Issues check ✅ Passed The PR successfully addresses issue #121 by adding a config-hash annotation to trigger deployment rollouts on configuration changes, fulfilling the requirement to propagate config updates.
Out of Scope Changes check ✅ Passed All changes are directly related to the linked objectives: deployment annotation for config updates, retry logic for trait attachment conflicts, and duplicate trait prevention. No out-of-scope modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e796a9 and 2812af4.

📒 Files selected for processing (1)
  • agent-manager-service/clients/openchoreosvc/client.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-18T16:41:56.601Z
Learnt from: hanzjk
Repo: wso2/ai-agent-management-platform PR: 70
File: agent-manager-service/services/agent_manager.go:978-980
Timestamp: 2025-12-18T16:41:56.601Z
Learning: In the agent-manager-service, enforce that external agents cannot have a SubType; only internal/managed agents may have SubType (e.g., chat-api, custom-api). When converting external agents to AgentResponse, the Type payload should include only the Type field and should not expose SubType. Implement this in the conversion/serialization path (likely in services/agent_manager.go) and update related DTO mappings or serializers. Add unit tests to verify internal vs external agent handling and ensure SubType is omitted for external agents.

Applied to files:

  • agent-manager-service/clients/openchoreosvc/client.go
🧬 Code graph analysis (1)
agent-manager-service/clients/openchoreosvc/client.go (2)
agent-manager-service/utils/errors.go (1)
  • ErrAgentNotFound (24-24)
agent-manager-service/clients/openchoreosvc/types.go (1)
  • TraitTypeOTELInstrumentation (31-31)
🔇 Additional comments (3)
agent-manager-service/clients/openchoreosvc/client.go (3)

246-288: Excellent: Retry pattern properly implements conflict resolution.

The refactored get-modify-update cycle now correctly handles concurrent modifications by fetching the latest Component on each retry attempt and using its resourceVersion for the update. The trait existence check at lines 262-268 addresses the previous critical review comment about trait duplication.


536-557: LGTM: Workload update properly wrapped in retry loop.

The refactored workload update correctly fetches the latest resource on each retry attempt (via getComponentWorkload at line 539) and uses its resourceVersion for the update. This pattern matches the trait attachment approach and properly handles concurrent modifications to the Workload resource.


262-268: The code is correct as written.

The constant TraitTypeOTELInstrumentation is named with "Type" but the createOTELInstrumentationTrait function explicitly sets the Name field (line 247 in utils.go), and the check at line 265 correctly compares against trait.Name. No change needed.

Likely an incorrect or invalid review comment.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 228334b and 8e796a9.

📒 Files selected for processing (3)
  • agent-manager-service/clients/openchoreosvc/client.go
  • agent-manager-service/clients/openchoreosvc/retry.go
  • deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-18T16:41:56.601Z
Learnt from: hanzjk
Repo: wso2/ai-agent-management-platform PR: 70
File: agent-manager-service/services/agent_manager.go:978-980
Timestamp: 2025-12-18T16:41:56.601Z
Learning: In the agent-manager-service, enforce that external agents cannot have a SubType; only internal/managed agents may have SubType (e.g., chat-api, custom-api). When converting external agents to AgentResponse, the Type payload should include only the Type field and should not expose SubType. Implement this in the conversion/serialization path (likely in services/agent_manager.go) and update related DTO mappings or serializers. Add unit tests to verify internal vs external agent handling and ensure SubType is omitted for external agents.

Applied to files:

  • agent-manager-service/clients/openchoreosvc/retry.go
  • agent-manager-service/clients/openchoreosvc/client.go
🔇 Additional comments (2)
agent-manager-service/clients/openchoreosvc/retry.go (1)

142-145: LGTM - Correct handling of optimistic concurrency conflicts.

Adding IsConflict to retryable errors enables the standard Kubernetes pattern for handling concurrent updates. When a resource is modified between fetch and update operations, the API returns a Conflict error (HTTP 409). Retrying with a fresh fetch ensures the update is applied to the latest resource version.

agent-manager-service/clients/openchoreosvc/client.go (1)

528-542: The retry logic is safe: updateWorkloadSpec directly assigns (rather than appends to) the Containers map and builds environment variables from scratch on each call using only request parameters, making it idempotent.

Comment on lines +57 to +59
annotations:
openchoreo.dev/config-hash: |
${has(configurations[parameters.containerName].configs.envs) && configurations[parameters.containerName].configs.envs.size() > 0 ? oc_hash(configurations[parameters.containerName].configs.envs.map(e, e.name + "=" + e.value).join(",")) : oc_hash("no-env")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Config hash should include all configuration sources.

The hash currently only includes environment variables from configs.envs. Changes to the following will NOT trigger a deployment rollout:

  • secrets.envs (line 87-92)
  • configs.files (line 94-107)
  • secrets.files (line 101-106)

This means users updating secrets or file-based configurations won't see their changes reflected in the deployment without a manual restart, which doesn't fully address issue #121.

🔎 Suggested fix to include all configuration sources
               annotations:
-                openchoreo.dev/config-hash: |
-                  ${has(configurations[parameters.containerName].configs.envs) && configurations[parameters.containerName].configs.envs.size() > 0 ? oc_hash(configurations[parameters.containerName].configs.envs.map(e, e.name + "=" + e.value).join(",")) : oc_hash("no-env")}
+                openchoreo.dev/config-hash: |
+                  ${oc_hash(
+                    (has(configurations[parameters.containerName].configs.envs) && configurations[parameters.containerName].configs.envs.size() > 0 ? 
+                      configurations[parameters.containerName].configs.envs.map(e, e.name + "=" + e.value).join(",") : "") + "|" +
+                    (has(configurations[parameters.containerName].secrets.envs) && configurations[parameters.containerName].secrets.envs.size() > 0 ? 
+                      configurations[parameters.containerName].secrets.envs.map(e, e.name).join(",") : "") + "|" +
+                    (has(configurations[parameters.containerName].configs.files) && configurations[parameters.containerName].configs.files.size() > 0 ? 
+                      configurations[parameters.containerName].configs.files.map(f, f.name + "@" + f.mountPath).join(",") : "") + "|" +
+                    (has(configurations[parameters.containerName].secrets.files) && configurations[parameters.containerName].secrets.files.size() > 0 ? 
+                      configurations[parameters.containerName].secrets.files.map(f, f.name + "@" + f.mountPath).join(",") : "")
+                  )}

Note: For secrets, include only the names/keys in the hash, not the values, as the values are sensitive. The goal is to detect when the set of secrets changes, which will trigger a rollout.

🤖 Prompt for AI Agents
In
deployments/helm-charts/wso2-amp-platform-resources-extension/templates/component-types/agent-api.yaml
around lines 57-59, the config-hash only includes configs.envs so changes to
secrets.envs, configs.files and secrets.files won't trigger rollouts; update the
expression to combine and hash all four sources (configs.envs values,
secrets.envs names/keys only, configs.files filenames/keys, and secrets.files
filenames/keys) by mapping each collection to a stable "key=value" or "name"
string and joining them (falling back to a fixed sentinel like "no-config" when
a collection is empty) before calling oc_hash so any change to those sources
updates the annotation and triggers a rollout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Configuration updates aren't reflected in deployment

1 participant