feat: Add API endpoint for RLA task query by kunzhao-nv · Pull Request #252 · NVIDIA/ncx-infra-controller-rest

kunzhao-nv · 2026-03-13T03:09:53Z

Description

Add a new REST endpoint GET /v2/org/{org}/carbide/rack/task/{uuid} to retrieve task status by UUID. This enables callers to poll task progress after submitting async operations.

Type of Change

Feature - New feature or functionality (feat:)
Fix - Bug fixes (fix:)
Chore - Modification or removal of existing functionality (chore:)
Refactor - Refactoring of existing functionality (refactor:)
Docs - Changes in documentation or OpenAPI schema (docs:)
CI - Changes in Github workflows. Requires additional scrutiny (ci:)
Version - Issuing a new release version (version:)

Services Affected

API - API models or endpoints updated
Workflow - Workflow service updated
DB - DB DAOs or migrations updated
Site Manager - Site Manager updated
Cert Manager - Cert Manager updated
Site Agent - Site Agent updated
RLA - RLA service updated
Powershelf Manager - Powershelf Manager updated
NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

This PR contains breaking changes

Testing

Unit tests added/updated
Integration tests added/updated
Manual testing performed
No testing required (docs, internal refactor, etc.)

Additional Notes

Summary by CodeRabbit

New Features
- Introduced a new API endpoint to retrieve rack tasks by UUID, including task status, description, start/end times, and metadata.
- Added SDK support for accessing the new task retrieval capability with comprehensive error handling for missing or unauthorized access scenarios.

copy-pr-bot · 2026-03-13T03:09:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-03-13T03:10:11Z

📝 Walkthrough

Walkthrough

A new API endpoint is introduced to retrieve a Rack Task by UUID. The changes span from the API handler layer through Temporal workflows and site-agent integration, including handler logic with authorization validation, protobuf-to-API model conversion, OpenAPI documentation, SDK bindings, and underlying activity-workflow implementation for asynchronous task retrieval.

Changes

Cohort / File(s)	Summary
API Handler and Routing `api/pkg/api/handler/task.go`, `api/pkg/api/handler/task_test.go`, `api/pkg/api/routes.go`, `api/pkg/api/routes_test.go`	New GetTaskHandler with request validation, authorization checks (user membership, Provider Admin role, RLA enforcement), and Temporal workflow invocation. Comprehensive test coverage for success and failure paths (bad request, forbidden, precondition failed, not found). Route registration and route count test update.
API Models `api/pkg/api/model/task.go`, `api/pkg/api/model/task_test.go`	APITask with protobuf-to-API conversion (FromProto, NewAPITask constructor), ProtoToAPITaskStatusName mapping for status translation. APIGetTaskRequest with SiteID validation. Unit tests for conversions and request validation.
OpenAPI Specification `openapi/spec.yaml`	New RackTask schema with fields (id, status, description, startTime, endTime, message, metadata) and GET endpoint `/v2/org/{org}/carbide/rack/task/{uuid}` with 200, 403, and 404 response definitions.
SDK Support `sdk/standard/api_rack.go`, `sdk/standard/model_rack_task.go`	New ApiGetRackTaskRequest with SiteId setter and Execute method on RackAPIService. RackTask model with getters, setters, serialization (MarshalJSON, ToMap), and nullable variant support.
Workflow and Activity Layer `site-workflow/pkg/workflow/rack.go`, `site-workflow/pkg/workflow/rack_test.go`, `site-workflow/pkg/activity/rack.go`, `site-workflow/pkg/activity/rack_test.go`	New GetTaskByID workflow function with retry policy and activity invocation. Corresponding GetTaskByID activity method that validates request, invokes RLA client, and handles errors. Test suites covering success, empty result, and failure scenarios.
Site Agent Integration `site-agent/pkg/components/managers/rla/subscriber.go`	Registration of GetTaskByID workflow and activity for the Rack side in the subscriber initialization.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Handler as API Handler<br/>(task.go)
    participant Auth as Authorization<br/>Validator
    participant Temporal as Temporal Client
    participant Workflow as GetTaskByID<br/>Workflow
    participant Activity as GetTaskByID<br/>Activity
    participant RLA as RLA API

    Client->>Handler: GET /v2/org/{org}/carbide/rack/task/{uuid}?siteId={siteId}
    Handler->>Handler: Validate request params
    Handler->>Auth: Check user membership<br/>& Provider Admin role
    Auth-->>Handler: Authorization result
    Handler->>Handler: Resolve Infrastructure Provider<br/>& validate Site ownership
    Handler->>Handler: Enforce Rack Level<br/>Administration (RLA)
    Handler->>Temporal: GetRackTask(org, uuid, siteId)
    Temporal->>Workflow: Execute GetTaskByID workflow
    Workflow->>Activity: Invoke GetTaskByID activity<br/>with TaskIds
    Activity->>RLA: Call GetTasksByIDs API
    RLA-->>Activity: Return tasks
    Activity-->>Workflow: Return GetTasksByIDsResponse
    Workflow-->>Temporal: Workflow complete
    Temporal-->>Handler: Task result
    Handler->>Handler: Convert protobuf<br/>to APITask
    Handler-->>Client: HTTP 200 + APITask

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A task retrieves with queried grace,
Through handlers, workflows, cyberspace!
From Temporal's dance to API's song,
The Rack Task endpoint hops along! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 38.89% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding a new API endpoint for querying Rack-Level Administration (RLA) tasks by UUID, which is the primary objective of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/rla/task-api

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

You can disable sequence diagrams in the walkthrough.

Disable the reviews.sequence_diagrams setting to disable sequence diagrams in the walkthrough.

Copilot

Pull request overview

Adds a new Rack-level REST endpoint to fetch/poll an RLA task by UUID, wiring API → Temporal workflow → RLA activity, and updating the OpenAPI + generated SDK accordingly.

Changes:

Add GetTaskByID Temporal workflow/activity (and register them in the site-agent worker).
Add API handler + API model for GET /v2/org/{org}/carbide/rack/task/{uuid} with unit tests.
Update openapi/spec.yaml and generated sdk/standard client to expose the new endpoint/model.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
site-workflow/pkg/workflow/rack.go	Adds `GetTaskByID` workflow calling the new activity.
site-workflow/pkg/workflow/rack_test.go	Adds workflow test suite coverage for `GetTaskByID`.
site-workflow/pkg/activity/rack.go	Adds `ManageRack.GetTaskByID` activity calling RLA `GetTasksByIDs`.
site-workflow/pkg/activity/rack_test.go	Adds unit tests for `ManageRack.GetTaskByID` validation/success paths.
site-agent/pkg/components/managers/rla/subscriber.go	Registers the new workflow/activity with the Temporal worker.
api/pkg/api/routes.go	Registers the new GET route for `/rack/task/:uuid`.
api/pkg/api/routes_test.go	Updates expected rack route count to include the new endpoint.
api/pkg/api/handler/task.go	Implements `GetTaskHandler` (authZ + site validation + workflow execution).
api/pkg/api/handler/task_test.go	Adds handler tests for success and major failure modes.
api/pkg/api/model/task.go	Introduces `APITask` and query model `APIGetTaskRequest`.
api/pkg/api/model/task_test.go	Adds tests for proto→API mapping and request validation.
openapi/spec.yaml	Adds the new path and `RackTask` schema.
sdk/standard/api_rack.go	Adds generated SDK method `GetRackTask` for the new endpoint.
sdk/standard/model_rack_task.go	Adds generated SDK model `RackTask`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

api/pkg/api/model/task.go

+	StartTime   string            `json:"startTime"`
+	EndTime     string            `json:"endTime"`
+	Message     string            `json:"message"`
+	Metadata    map[string]string `json:"metadata"`


api/pkg/api/handler/task.go

+	taskUUID := c.Param("uuid")
+


site-workflow/pkg/workflow/rack.go


+// GetTaskByID is a workflow to get a task by its UUID
+func GetTaskByID(ctx workflow.Context, request *rlav1.GetTasksByIDsRequest) (*rlav1.GetTasksByIDsResponse, error) {
+	logger := log.With().Str("Workflow", "Task").Str("Action", "GetByID").Logger()


coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (2)

api/pkg/api/routes_test.go (1)
83-83: Add an explicit assertion for the new rack task route.

Updating the count is good, but this test can still pass if the specific path/method is wrong while total route count remains unchanged. Consider asserting GET /org/:orgName/{apiName}/rack/task/:uuid exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/routes_test.go` at line 83, The test only updates the total route
count but doesn't verify the specific new route; update the test in
routes_test.go (the test that asserts total routes) to also assert that a GET
route for "GET /org/:orgName/{apiName}/rack/task/:uuid" is registered: locate
where routes or router.Routes() are inspected (e.g., the existing routes
slice/map in the test) and add an assertion that a route with Method "GET" and
Path "/org/:orgName/{apiName}/rack/task/:uuid" exists (use the same
helper/assertion style as other route existence checks in the file).
api/pkg/api/handler/task_test.go (1)
138-146: This case is “site not found”, not “invalid siteId”.

uuid.New().String() is syntactically valid, so this only covers the nonexistent-site branch. The malformed-ID path in api/pkg/api/handler/task.go Lines 125-127 remains untested; please add a separate case such as siteId=not-a-uuid, or split this into two cases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/task_test.go` around lines 138 - 146, Rename the existing
test case in task_test.go (the one using uuid.New().String()) to reflect "site
not found" and add a new separate test case for the malformed-ID branch (e.g.,
name "failure - malformed siteId") that sets queryParams["siteId"] =
"not-a-uuid" so the handler's malformed-ID path (in api/pkg/api/handler/task.go
around the logic that parses siteId at lines ~125-127) is exercised; both cases
should expect http.StatusBadRequest.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/pkg/api/handler/task.go`:
- Around line 159-165: The current StartWorkflowOptions builds a shared workflow
ID (fmt.Sprintf("task-get-%s", taskUUID)) with WorkflowIDConflictPolicy set to
WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING, but the code later unconditionally
terminates that execution on timeout; change the workflow start to use a unique
per-request ID (e.g., include a request-specific nonce/UUID:
fmt.Sprintf("task-get-%s-%s", taskUUID, requestUUID)) so each GET poll starts
its own execution, and remove the unconditional termination call (the
TerminateWorkflow/TerminateWorkflowExecution path) on timeout for shared
executions; ensure StartWorkflowOptions and WorkflowIDConflictPolicy references
are updated accordingly and the timeout branch returns the timeout error without
killing other pollers.
- Around line 121-157: Validate the incoming task UUID string (assigned to
taskUUID via c.Param("uuid")) before forwarding it to RLA/Temporal: use
github.com/google/uuid to parse/validate the value and if parsing fails return a
400 Bad Request via cutil.NewAPIErrorResponse rather than continuing; perform
this check prior to constructing rlav1.GetTasksByIDsRequest and before calling
gth.scp.GetClientByID so malformed UUIDs are rejected at the API boundary, and
add the google/uuid import to the file.

In `@openapi/spec.yaml`:
- Around line 16088-16091: The OpenAPI spec has a mismatch: the endTime
description mentions "canceled" but the related status enum (property name
"status") does not include that value; reconcile them by either adding
"canceled" to the status enum or removing "canceled" from the endTime
description so both reflect the same allowed terminal states—update the "status"
enum definition (symbol: status) or the "endTime" description (symbol: endTime)
accordingly and ensure any schema/enum references are adjusted to stay
consistent.
- Around line 16063-16104: The OpenAPI change added/updated the RackTask schema
but the generated SDK/docs artifacts were not updated; run the repository's
OpenAPI generation flow to regenerate the SDK and docs (updating the
sdk/standard artifacts and the docs index output such as docs/index.html),
verify the generated outputs reflect the RackTask schema (id, status,
description, startTime, endTime, message, metadata), and commit the resulting
generated files alongside the spec change.
- Around line 9595-9644: The path '/v2/org/{org}/carbide/rack/task/{uuid}' is
ambiguous with existing templated rack routes (e.g.
'/v2/org/{org}/carbide/rack/{id}/validation'); rename the route to be
structurally unique (for example '/v2/org/{org}/carbide/rack/tasks/{uuid}' or
'/v2/org/{org}/carbide/rack/{rackId}/task/{taskUuid}'), update the path
parameter names (e.g. change parameter name from uuid to taskUuid and/or add
rackId if using the second form), and update the operationId 'get-rack-task' and
any references to the RackTask schema or siteId query param to match the new
route so routing and OpenAPI linting no longer produce ambiguous matches.

In `@sdk/standard/api_rack.go`:
- Around line 924-1063: The SDK was generated but not fully refreshed for the
new rack task endpoint; run the generator and commit the diffs so client/spec
stay in sync. Re-run the project SDK generation (make generate-sdk or your
repo's generator) and ensure updated artifacts for ApiGetRackTaskRequest,
GetRackTask and GetRackTaskExecute (and any new RackTask/CarbideAPIError models)
are included in sdk/standard, then add/commit the resulting changes so CI no
longer reports drift.

In `@sdk/standard/model_rack_task.go`:
- Around line 260-263: GetMetadataOk currently returns an empty map when
metadata is unset which contradicts its contract; modify RackTask.GetMetadataOk
so that when o == nil or IsNil(o.Metadata) it returns nil, false instead of
map[string]string{}, false; locate the method named GetMetadataOk on type
RackTask and replace the empty-map return with a nil map to preserve the unset
vs empty distinction.

In `@site-workflow/pkg/activity/rack.go`:
- Around line 298-302: Guard against a nil RlaAtomicClient before calling
GetClient: check that mr.RlaAtomicClient is not nil (and optionally that mr
itself is not nil) and return cClient.ErrClientNotConnected if it is; then call
mr.RlaAtomicClient.GetClient(), keep the existing nil check on rlaClient, and
only then call rlaClient.Rla(). This prevents dereferencing a nil
ManageRack.RlaAtomicClient and avoids the panic.

---

Nitpick comments:
In `@api/pkg/api/handler/task_test.go`:
- Around line 138-146: Rename the existing test case in task_test.go (the one
using uuid.New().String()) to reflect "site not found" and add a new separate
test case for the malformed-ID branch (e.g., name "failure - malformed siteId")
that sets queryParams["siteId"] = "not-a-uuid" so the handler's malformed-ID
path (in api/pkg/api/handler/task.go around the logic that parses siteId at
lines ~125-127) is exercised; both cases should expect http.StatusBadRequest.

In `@api/pkg/api/routes_test.go`:
- Line 83: The test only updates the total route count but doesn't verify the
specific new route; update the test in routes_test.go (the test that asserts
total routes) to also assert that a GET route for "GET
/org/:orgName/{apiName}/rack/task/:uuid" is registered: locate where routes or
router.Routes() are inspected (e.g., the existing routes slice/map in the test)
and add an assertion that a route with Method "GET" and Path
"/org/:orgName/{apiName}/rack/task/:uuid" exists (use the same helper/assertion
style as other route existence checks in the file).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d3bb7dfe-8246-4172-b538-a6f67dad97f7

📥 Commits

Reviewing files that changed from the base of the PR and between 0fffc26 and 1307faa.

📒 Files selected for processing (15)

api/pkg/api/handler/task.go
api/pkg/api/handler/task_test.go
api/pkg/api/model/task.go
api/pkg/api/model/task_test.go
api/pkg/api/routes.go
api/pkg/api/routes_test.go
docs/index.html
openapi/spec.yaml
sdk/standard/api_rack.go
sdk/standard/model_rack_task.go
site-agent/pkg/components/managers/rla/subscriber.go
site-workflow/pkg/activity/rack.go
site-workflow/pkg/activity/rack_test.go
site-workflow/pkg/workflow/rack.go
site-workflow/pkg/workflow/rack_test.go

coderabbitai · 2026-03-13T03:18:41Z

api/pkg/api/handler/task.go

+	taskUUID := c.Param("uuid")
+
+	site, err := common.GetSiteFromIDString(ctx, nil, apiRequest.SiteID, gth.dbSession)
+	if err != nil {
+		if errors.Is(err, common.ErrInvalidID) {
+			return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Failed to validate Site specified in request: invalid ID", nil)
+		}
+		if errors.Is(err, cdb.ErrDoesNotExist) {
+			return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Site specified in request does not exist", nil)
+		}
+		logger.Error().Err(err).Msg("error retrieving Site from DB")
+		return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve Site specified in request due to DB error", nil)
+	}
+
+	if site.InfrastructureProviderID != infrastructureProvider.ID {
+		return cutil.NewAPIErrorResponse(c, http.StatusForbidden, "Site specified in request doesn't belong to current org's Provider", nil)
+	}
+
+	siteConfig := &cdbm.SiteConfig{}
+	if site.Config != nil {
+		siteConfig = site.Config
+	}
+
+	if !siteConfig.RackLevelAdministration {
+		logger.Warn().Msg("site does not have Rack Level Administration enabled")
+		return cutil.NewAPIErrorResponse(c, http.StatusPreconditionFailed, "Site does not have Rack Level Administration enabled", nil)
+	}
+
+	stc, err := gth.scp.GetClientByID(site.ID)
+	if err != nil {
+		logger.Error().Err(err).Msg("failed to retrieve Temporal client for Site")
+		return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve client for Site", nil)
+	}
+
+	rlaRequest := &rlav1.GetTasksByIDsRequest{
+		TaskIds: []*rlav1.UUID{{Id: taskUUID}},
+	}


⚠️ Potential issue | 🟠 Major

Reject malformed task UUIDs before starting the workflow.

uuid is documented as a UUID, but Lines 121-157 forward any string into Temporal/RLA. That turns a client-side validation error into unnecessary Site traffic and backend-dependent error behavior instead of a deterministic 400 at the API boundary.

🛠️ Suggested fix

- taskUUID := c.Param("uuid") + taskID, err := uuid.Parse(c.Param("uuid")) + if err != nil { + return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Failed to validate Task specified in request: invalid ID", nil) + } + taskUUID := taskID.String()

Also add the corresponding github.com/google/uuid import.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/pkg/api/handler/task.go` around lines 121 - 157, Validate the incoming task UUID string (assigned to taskUUID via c.Param("uuid")) before forwarding it to RLA/Temporal: use github.com/google/uuid to parse/validate the value and if parsing fails return a 400 Bad Request via cutil.NewAPIErrorResponse rather than continuing; perform this check prior to constructing rlav1.GetTasksByIDsRequest and before calling gth.scp.GetClientByID so malformed UUIDs are rejected at the API boundary, and add the google/uuid import to the file.

coderabbitai · 2026-03-13T03:18:41Z

api/pkg/api/handler/task.go

+	workflowOptions := tClient.StartWorkflowOptions{
+		ID:                       fmt.Sprintf("task-get-%s", taskUUID),
+		WorkflowIDReusePolicy:    temporalEnums.WORKFLOW_ID_REUSE_POLICY_ALLOW_DUPLICATE,
+		WorkflowIDConflictPolicy: temporalEnums.WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING,
+		WorkflowExecutionTimeout: cutil.WorkflowExecutionTimeout,
+		TaskQueue:                queue.SiteTaskQueue,
+	}


⚠️ Potential issue | 🟠 Major

Don't terminate a shared polling workflow on one caller timeout.

Lines 159-165 reuse the same workflow execution per task with WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING, but Lines 179-181 unconditionally terminate that workflow on timeout. One slow poller can therefore cancel another in-flight poll for the same task.

For this read-only GET path, prefer either a unique workflow ID per request or returning the timeout without terminating the shared execution.

Also applies to: 179-181

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@api/pkg/api/handler/task.go` around lines 159 - 165, The current StartWorkflowOptions builds a shared workflow ID (fmt.Sprintf("task-get-%s", taskUUID)) with WorkflowIDConflictPolicy set to WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING, but the code later unconditionally terminates that execution on timeout; change the workflow start to use a unique per-request ID (e.g., include a request-specific nonce/UUID: fmt.Sprintf("task-get-%s-%s", taskUUID, requestUUID)) so each GET poll starts its own execution, and remove the unconditional termination call (the TerminateWorkflow/TerminateWorkflowExecution path) on timeout for shared executions; ensure StartWorkflowOptions and WorkflowIDConflictPolicy references are updated accordingly and the timeout branch returns the timeout error without killing other pollers.

coderabbitai · 2026-03-13T03:18:41Z

openapi/spec.yaml

+  '/v2/org/{org}/carbide/rack/task/{uuid}':
+    parameters:
+      - schema:
+          type: string
+        name: org
+        in: path
+        required: true
+        description: Name of the Org
+      - schema:
+          type: string
+          format: uuid
+        name: uuid
+        in: path
+        required: true
+        description: UUID of the Task
+    get:
+      summary: Retrieve a Task
+      operationId: get-rack-task
+      description: |-
+        Get a Task by UUID.
+
+        Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.
+      parameters:
+        - schema:
+            type: string
+            format: uuid
+          name: siteId
+          in: query
+          required: true
+          description: ID of the Site
+      responses:
+        '200':
+          description: OK
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/RackTask'
+              examples:
+                example-1:
+                  value:
+                    id: 550e8400-e29b-41d4-a716-446655440000
+                    status: running
+                    description: Power on rack components
+                    message: 'Processing 3 of 5 components'
+        '403':
+          $ref: '#/components/responses/ForbiddenError'
+        '404':
+          $ref: '#/components/responses/NotFoundError'
+      tags:
+        - Rack


⚠️ Potential issue | 🟠 Major

Resolve ambiguous route template before merge

Line 9595 introduces a path that conflicts with existing templated rack routes (/v2/org/{org}/carbide/rack/{id}/validation), which makes matching ambiguous for values like task/validation. This is already flagged by OpenAPI lint and should be made structurally unique.

Suggested spec change

- '/v2/org/{org}/carbide/rack/task/{uuid}': + '/v2/org/{org}/carbide/rack/task/by-id/{uuid}':

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

'/v2/org/{org}/carbide/rack/task/{uuid}':

parameters:

- schema:

type: string

name: org

in: path

required: true

description: Name of the Org

- schema:

type: string

format: uuid

name: uuid

in: path

required: true

description: UUID of the Task

get:

summary: Retrieve a Task

operationId: get-rack-task

description: |-

Get a Task by UUID.

Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.

parameters:

- schema:

type: string

format: uuid

name: siteId

in: query

required: true

description: ID of the Site

responses:

'200':

description: OK

content:

application/json:

schema:

$ref: '#/components/schemas/RackTask'

examples:

example-1:

value:

id: 550e8400-e29b-41d4-a716-446655440000

status: running

description: Power on rack components

message: 'Processing 3 of 5 components'

'403':

$ref: '#/components/responses/ForbiddenError'

'404':

$ref: '#/components/responses/NotFoundError'

tags:

- Rack

'/v2/org/{org}/carbide/rack/task/by-id/{uuid}':

parameters:

- schema:

type: string

name: org

in: path

required: true

description: Name of the Org

- schema:

type: string

format: uuid

name: uuid

in: path

required: true

description: UUID of the Task

get:

summary: Retrieve a Task

operationId: get-rack-task

description: |-

Get a Task by UUID.

Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.

parameters:

- schema:

type: string

format: uuid

name: siteId

in: query

required: true

description: ID of the Site

responses:

'200':

description: OK

content:

application/json:

schema:

$ref: '#/components/schemas/RackTask'

examples:

example-1:

value:

id: 550e8400-e29b-41d4-a716-446655440000

status: running

description: Power on rack components

message: 'Processing 3 of 5 components'

'403':

$ref: '#/components/responses/ForbiddenError'

'404':

$ref: '#/components/responses/NotFoundError'

tags:

- Rack

🧰 Tools

🪛 GitHub Check: Lint and Test / Lint OpenAPI

[warning] 9595-9595: no-ambiguous-paths
Paths should resolve unambiguously. Found two ambiguous paths: /v2/org/{org}/carbide/rack/{id}/validation and /v2/org/{org}/carbide/rack/task/{uuid}.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openapi/spec.yaml` around lines 9595 - 9644, The path '/v2/org/{org}/carbide/rack/task/{uuid}' is ambiguous with existing templated rack routes (e.g. '/v2/org/{org}/carbide/rack/{id}/validation'); rename the route to be structurally unique (for example '/v2/org/{org}/carbide/rack/tasks/{uuid}' or '/v2/org/{org}/carbide/rack/{rackId}/task/{taskUuid}'), update the path parameter names (e.g. change parameter name from uuid to taskUuid and/or add rackId if using the second form), and update the operationId 'get-rack-task' and any references to the RackTask schema or siteId query param to match the new route so routing and OpenAPI linting no longer produce ambiguous matches.

coderabbitai · 2026-03-13T03:18:42Z

openapi/spec.yaml

+    RackTask:
+      title: RackTask
+      type: object
+      description: A task representing an asynchronous operation on rack infrastructure.
+      properties:
+        id:
+          type: string
+          format: uuid
+          description: Unique identifier of the task.
+        status:
+          type: string
+          enum:
+            - unknown
+            - pending
+            - running
+            - succeeded
+            - failed
+          description: Current state of the task.
+        description:
+          type: string
+          description: Human-readable description provided when the task was created.
+        startTime:
+          type: string
+          format: date-time
+          description: Timestamp when the task started execution.
+        endTime:
+          type: string
+          format: date-time
+          description: Timestamp when the task finished (succeeded, failed or canceled).
+        message:
+          type: string
+          description: Optional status or error message describing the current state or result.
+        metadata:
+          type: object
+          additionalProperties:
+            type: string
+          description: Extensible key-value data for additional task details.
+      examples:
+        - id: 550e8400-e29b-41d4-a716-446655440000
+          status: running
+          description: Power on rack components
+          message: 'Processing 3 of 5 components'


⚠️ Potential issue | 🟠 Major

Regenerate SDK/docs artifacts required by CI

This OpenAPI change currently fails pipeline checks because generated outputs were not updated (sdk/standard/ and/or docs/index.html). Please run the repo’s OpenAPI generation flow and commit the resulting artifacts.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openapi/spec.yaml` around lines 16063 - 16104, The OpenAPI change added/updated the RackTask schema but the generated SDK/docs artifacts were not updated; run the repository's OpenAPI generation flow to regenerate the SDK and docs (updating the sdk/standard artifacts and the docs index output such as docs/index.html), verify the generated outputs reflect the RackTask schema (id, status, description, startTime, endTime, message, metadata), and commit the resulting generated files alongside the spec change.

coderabbitai · 2026-03-13T03:18:42Z

openapi/spec.yaml

+        endTime:
+          type: string
+          format: date-time
+          description: Timestamp when the task finished (succeeded, failed or canceled).


⚠️ Potential issue | 🟡 Minor

Fix endTime description/status mismatch

Line 16091 mentions canceled, but status (Line 16074 onward) does not include it. Align the enum and description to avoid contract ambiguity for clients.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openapi/spec.yaml` around lines 16088 - 16091, The OpenAPI spec has a mismatch: the endTime description mentions "canceled" but the related status enum (property name "status") does not include that value; reconcile them by either adding "canceled" to the status enum or removing "canceled" from the endTime description so both reflect the same allowed terminal states—update the "status" enum definition (symbol: status) or the "endTime" description (symbol: endTime) accordingly and ensure any schema/enum references are adjusted to stay consistent.

coderabbitai · 2026-03-13T03:18:42Z

sdk/standard/api_rack.go

+type ApiGetRackTaskRequest struct {
+	ctx context.Context
+	ApiService *RackAPIService
+	siteId *string
+	org string
+	uuid string
+}
+
+// ID of the Site
+func (r ApiGetRackTaskRequest) SiteId(siteId string) ApiGetRackTaskRequest {
+	r.siteId = &siteId
+	return r
+}
+
+func (r ApiGetRackTaskRequest) Execute() (*RackTask, *http.Response, error) {
+	return r.ApiService.GetRackTaskExecute(r)
+}
+
+/*
+GetRackTask Retrieve a Task
+
+Get an RLA Task by UUID.
+
+Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.
+
+ @param ctx context.Context - for authentication, logging, cancellation, deadlines, tracing, etc. Passed from http.Request or context.Background().
+ @param org Name of the Org
+ @param uuid UUID of the Task
+ @return ApiGetRackTaskRequest
+*/
+func (a *RackAPIService) GetRackTask(ctx context.Context, org string, uuid string) ApiGetRackTaskRequest {
+	return ApiGetRackTaskRequest{
+		ApiService: a,
+		ctx: ctx,
+		org: org,
+		uuid: uuid,
+	}
+}
+
+// Execute executes the request
+//  @return RackTask
+func (a *RackAPIService) GetRackTaskExecute(r ApiGetRackTaskRequest) (*RackTask, *http.Response, error) {
+	var (
+		localVarHTTPMethod   = http.MethodGet
+		localVarPostBody     interface{}
+		formFiles            []formFile
+		localVarReturnValue  *RackTask
+	)
+
+	localBasePath, err := a.client.cfg.ServerURLWithContext(r.ctx, "RackAPIService.GetRackTask")
+	if err != nil {
+		return localVarReturnValue, nil, &GenericOpenAPIError{error: err.Error()}
+	}
+
+	localVarPath := localBasePath + "/v2/org/{org}/carbide/rack/task/{uuid}"
+	localVarPath = strings.Replace(localVarPath, "{"+"org"+"}", url.PathEscape(parameterValueToString(r.org, "org")), -1)
+	localVarPath = strings.Replace(localVarPath, "{"+"uuid"+"}", url.PathEscape(parameterValueToString(r.uuid, "uuid")), -1)
+
+	localVarHeaderParams := make(map[string]string)
+	localVarQueryParams := url.Values{}
+	localVarFormParams := url.Values{}
+	if r.siteId == nil {
+		return localVarReturnValue, nil, reportError("siteId is required and must be specified")
+	}
+
+	parameterAddToHeaderOrQuery(localVarQueryParams, "siteId", r.siteId, "form", "")
+	// to determine the Content-Type header
+	localVarHTTPContentTypes := []string{}
+
+	// set Content-Type header
+	localVarHTTPContentType := selectHeaderContentType(localVarHTTPContentTypes)
+	if localVarHTTPContentType != "" {
+		localVarHeaderParams["Content-Type"] = localVarHTTPContentType
+	}
+
+	// to determine the Accept header
+	localVarHTTPHeaderAccepts := []string{"application/json"}
+
+	// set Accept header
+	localVarHTTPHeaderAccept := selectHeaderAccept(localVarHTTPHeaderAccepts)
+	if localVarHTTPHeaderAccept != "" {
+		localVarHeaderParams["Accept"] = localVarHTTPHeaderAccept
+	}
+	req, err := a.client.prepareRequest(r.ctx, localVarPath, localVarHTTPMethod, localVarPostBody, localVarHeaderParams, localVarQueryParams, localVarFormParams, formFiles)
+	if err != nil {
+		return localVarReturnValue, nil, err
+	}
+
+	localVarHTTPResponse, err := a.client.callAPI(req)
+	if err != nil || localVarHTTPResponse == nil {
+		return localVarReturnValue, localVarHTTPResponse, err
+	}
+
+	localVarBody, err := io.ReadAll(localVarHTTPResponse.Body)
+	localVarHTTPResponse.Body.Close()
+	localVarHTTPResponse.Body = io.NopCloser(bytes.NewBuffer(localVarBody))
+	if err != nil {
+		return localVarReturnValue, localVarHTTPResponse, err
+	}
+
+	if localVarHTTPResponse.StatusCode >= 300 {
+		newErr := &GenericOpenAPIError{
+			body:  localVarBody,
+			error: localVarHTTPResponse.Status,
+		}
+		if localVarHTTPResponse.StatusCode == 403 {
+			var v CarbideAPIError
+			err = a.client.decode(&v, localVarBody, localVarHTTPResponse.Header.Get("Content-Type"))
+			if err != nil {
+				newErr.error = err.Error()
+				return localVarReturnValue, localVarHTTPResponse, newErr
+			}
+					newErr.error = formatErrorMessage(localVarHTTPResponse.Status, &v)
+					newErr.model = v
+			return localVarReturnValue, localVarHTTPResponse, newErr
+		}
+		if localVarHTTPResponse.StatusCode == 404 {
+			var v CarbideAPIError
+			err = a.client.decode(&v, localVarBody, localVarHTTPResponse.Header.Get("Content-Type"))
+			if err != nil {
+				newErr.error = err.Error()
+				return localVarReturnValue, localVarHTTPResponse, newErr
+			}
+					newErr.error = formatErrorMessage(localVarHTTPResponse.Status, &v)
+					newErr.model = v
+		}
+		return localVarReturnValue, localVarHTTPResponse, newErr
+	}
+
+	err = a.client.decode(&localVarReturnValue, localVarBody, localVarHTTPResponse.Header.Get("Content-Type"))
+	if err != nil {
+		newErr := &GenericOpenAPIError{
+			body:  localVarBody,
+			error: err.Error(),
+		}
+		return localVarReturnValue, localVarHTTPResponse, newErr
+	}
+
+	return localVarReturnValue, localVarHTTPResponse, nil
+}


⚠️ Potential issue | 🟠 Major

Regenerate SDK artifacts before merge to avoid client/spec drift.

CI is warning that sdk/standard/ is not fully updated (make generate-sdk). Please regenerate and commit any additional generated deltas so this new endpoint/model surface is fully synchronized.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@sdk/standard/api_rack.go` around lines 924 - 1063, The SDK was generated but not fully refreshed for the new rack task endpoint; run the generator and commit the diffs so client/spec stay in sync. Re-run the project SDK generation (make generate-sdk or your repo's generator) and ensure updated artifacts for ApiGetRackTaskRequest, GetRackTask and GetRackTaskExecute (and any new RackTask/CarbideAPIError models) are included in sdk/standard, then add/commit the resulting changes so CI no longer reports drift.

coderabbitai · 2026-03-13T03:18:42Z

sdk/standard/model_rack_task.go

+func (o *RackTask) GetMetadataOk() (map[string]string, bool) {
+	if o == nil || IsNil(o.Metadata) {
+		return map[string]string{}, false
+	}


⚠️ Potential issue | 🟡 Minor

Return nil for unset metadata in GetMetadataOk.

GetMetadataOk is documented to return nil when unset, but currently returns an empty map. That mismatches the method contract and makes unset-vs-empty less clear for callers.

🔧 Suggested fix

func (o *RackTask) GetMetadataOk() (map[string]string, bool) { if o == nil || IsNil(o.Metadata) { - return map[string]string{}, false + return nil, false } return o.Metadata, true }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (o *RackTask) GetMetadataOk() (map[string]string, bool) {

if o == nil || IsNil(o.Metadata) {

return map[string]string{}, false

}

func (o *RackTask) GetMetadataOk() (map[string]string, bool) {

if o == nil || IsNil(o.Metadata) {

return nil, false

}

return o.Metadata, true

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@sdk/standard/model_rack_task.go` around lines 260 - 263, GetMetadataOk currently returns an empty map when metadata is unset which contradicts its contract; modify RackTask.GetMetadataOk so that when o == nil or IsNil(o.Metadata) it returns nil, false instead of map[string]string{}, false; locate the method named GetMetadataOk on type RackTask and replace the empty-map return with a nil map to preserve the unset vs empty distinction.

coderabbitai · 2026-03-13T03:18:42Z

site-workflow/pkg/activity/rack.go

+	rlaClient := mr.RlaAtomicClient.GetClient()
+	if rlaClient == nil {
+		return nil, cClient.ErrClientNotConnected
+	}
+	rla := rlaClient.Rla()


⚠️ Potential issue | 🟠 Major

Guard mr.RlaAtomicClient before dereferencing it.

A zero-value ManageRack or a nil-injected client will panic on Line 298, so the worker crashes instead of returning ErrClientNotConnected.

🛠️ Suggested fix

+ if mr.RlaAtomicClient == nil { + return nil, cClient.ErrClientNotConnected + } rlaClient := mr.RlaAtomicClient.GetClient() if rlaClient == nil { return nil, cClient.ErrClientNotConnected }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

rlaClient := mr.RlaAtomicClient.GetClient()

if rlaClient == nil {

return nil, cClient.ErrClientNotConnected

}

rla := rlaClient.Rla()

if mr.RlaAtomicClient == nil {

return nil, cClient.ErrClientNotConnected

}

rlaClient := mr.RlaAtomicClient.GetClient()

if rlaClient == nil {

return nil, cClient.ErrClientNotConnected

}

rla := rlaClient.Rla()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@site-workflow/pkg/activity/rack.go` around lines 298 - 302, Guard against a nil RlaAtomicClient before calling GetClient: check that mr.RlaAtomicClient is not nil (and optionally that mr itself is not nil) and return cClient.ErrClientNotConnected if it is; then call mr.RlaAtomicClient.GetClient(), keep the existing nil check on rlaClient, and only then call rlaClient.Rla(). This prevents dereferencing a nil ManageRack.RlaAtomicClient and avoids the panic.

task api

1307faa

Copilot AI review requested due to automatic review settings March 13, 2026 03:09

kunzhao-nv marked this pull request as draft March 13, 2026 03:09

Copilot started reviewing on behalf of kunzhao-nv March 13, 2026 03:10 View session

Copilot AI reviewed Mar 13, 2026

View reviewed changes

coderabbitai bot reviewed Mar 13, 2026

View reviewed changes

jw-nvidia approved these changes Mar 13, 2026

View reviewed changes

Conversation

kunzhao-nv commented Mar 13, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Services Affected

Related Issues (Optional)

Breaking Changes

Testing

Additional Notes

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Mar 13, 2026

Uh oh!

coderabbitai bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kunzhao-nv commented Mar 13, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 13, 2026 •

edited

Loading