Skip to content

feat: Add API endpoint for RLA task query#252

Draft
kunzhao-nv wants to merge 1 commit intomainfrom
feat/rla/task-api
Draft

feat: Add API endpoint for RLA task query#252
kunzhao-nv wants to merge 1 commit intomainfrom
feat/rla/task-api

Conversation

@kunzhao-nv
Copy link
Contributor

@kunzhao-nv kunzhao-nv commented Mar 13, 2026

Description

Add a new REST endpoint GET /v2/org/{org}/carbide/rack/task/{uuid} to retrieve task status by UUID. This enables callers to poll task progress after submitting async operations.

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in Github workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

Summary by CodeRabbit

  • New Features
    • Introduced a new API endpoint to retrieve rack tasks by UUID, including task status, description, start/end times, and metadata.
    • Added SDK support for accessing the new task retrieval capability with comprehensive error handling for missing or unauthorized access scenarios.

Copilot AI review requested due to automatic review settings March 13, 2026 03:09
@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kunzhao-nv kunzhao-nv marked this pull request as draft March 13, 2026 03:09
@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

A new API endpoint is introduced to retrieve a Rack Task by UUID. The changes span from the API handler layer through Temporal workflows and site-agent integration, including handler logic with authorization validation, protobuf-to-API model conversion, OpenAPI documentation, SDK bindings, and underlying activity-workflow implementation for asynchronous task retrieval.

Changes

Cohort / File(s) Summary
API Handler and Routing
api/pkg/api/handler/task.go, api/pkg/api/handler/task_test.go, api/pkg/api/routes.go, api/pkg/api/routes_test.go
New GetTaskHandler with request validation, authorization checks (user membership, Provider Admin role, RLA enforcement), and Temporal workflow invocation. Comprehensive test coverage for success and failure paths (bad request, forbidden, precondition failed, not found). Route registration and route count test update.
API Models
api/pkg/api/model/task.go, api/pkg/api/model/task_test.go
APITask with protobuf-to-API conversion (FromProto, NewAPITask constructor), ProtoToAPITaskStatusName mapping for status translation. APIGetTaskRequest with SiteID validation. Unit tests for conversions and request validation.
OpenAPI Specification
openapi/spec.yaml
New RackTask schema with fields (id, status, description, startTime, endTime, message, metadata) and GET endpoint /v2/org/{org}/carbide/rack/task/{uuid} with 200, 403, and 404 response definitions.
SDK Support
sdk/standard/api_rack.go, sdk/standard/model_rack_task.go
New ApiGetRackTaskRequest with SiteId setter and Execute method on RackAPIService. RackTask model with getters, setters, serialization (MarshalJSON, ToMap), and nullable variant support.
Workflow and Activity Layer
site-workflow/pkg/workflow/rack.go, site-workflow/pkg/workflow/rack_test.go, site-workflow/pkg/activity/rack.go, site-workflow/pkg/activity/rack_test.go
New GetTaskByID workflow function with retry policy and activity invocation. Corresponding GetTaskByID activity method that validates request, invokes RLA client, and handles errors. Test suites covering success, empty result, and failure scenarios.
Site Agent Integration
site-agent/pkg/components/managers/rla/subscriber.go
Registration of GetTaskByID workflow and activity for the Rack side in the subscriber initialization.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Handler as API Handler<br/>(task.go)
    participant Auth as Authorization<br/>Validator
    participant Temporal as Temporal Client
    participant Workflow as GetTaskByID<br/>Workflow
    participant Activity as GetTaskByID<br/>Activity
    participant RLA as RLA API

    Client->>Handler: GET /v2/org/{org}/carbide/rack/task/{uuid}?siteId={siteId}
    Handler->>Handler: Validate request params
    Handler->>Auth: Check user membership<br/>& Provider Admin role
    Auth-->>Handler: Authorization result
    Handler->>Handler: Resolve Infrastructure Provider<br/>& validate Site ownership
    Handler->>Handler: Enforce Rack Level<br/>Administration (RLA)
    Handler->>Temporal: GetRackTask(org, uuid, siteId)
    Temporal->>Workflow: Execute GetTaskByID workflow
    Workflow->>Activity: Invoke GetTaskByID activity<br/>with TaskIds
    Activity->>RLA: Call GetTasksByIDs API
    RLA-->>Activity: Return tasks
    Activity-->>Workflow: Return GetTasksByIDsResponse
    Workflow-->>Temporal: Workflow complete
    Temporal-->>Handler: Task result
    Handler->>Handler: Convert protobuf<br/>to APITask
    Handler-->>Client: HTTP 200 + APITask
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A task retrieves with queried grace,
Through handlers, workflows, cyberspace!
From Temporal's dance to API's song,
The Rack Task endpoint hops along! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a new API endpoint for querying Rack-Level Administration (RLA) tasks by UUID, which is the primary objective of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/rla/task-api
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can disable sequence diagrams in the walkthrough.

Disable the reviews.sequence_diagrams setting to disable sequence diagrams in the walkthrough.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Rack-level REST endpoint to fetch/poll an RLA task by UUID, wiring API → Temporal workflow → RLA activity, and updating the OpenAPI + generated SDK accordingly.

Changes:

  • Add GetTaskByID Temporal workflow/activity (and register them in the site-agent worker).
  • Add API handler + API model for GET /v2/org/{org}/carbide/rack/task/{uuid} with unit tests.
  • Update openapi/spec.yaml and generated sdk/standard client to expose the new endpoint/model.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
site-workflow/pkg/workflow/rack.go Adds GetTaskByID workflow calling the new activity.
site-workflow/pkg/workflow/rack_test.go Adds workflow test suite coverage for GetTaskByID.
site-workflow/pkg/activity/rack.go Adds ManageRack.GetTaskByID activity calling RLA GetTasksByIDs.
site-workflow/pkg/activity/rack_test.go Adds unit tests for ManageRack.GetTaskByID validation/success paths.
site-agent/pkg/components/managers/rla/subscriber.go Registers the new workflow/activity with the Temporal worker.
api/pkg/api/routes.go Registers the new GET route for /rack/task/:uuid.
api/pkg/api/routes_test.go Updates expected rack route count to include the new endpoint.
api/pkg/api/handler/task.go Implements GetTaskHandler (authZ + site validation + workflow execution).
api/pkg/api/handler/task_test.go Adds handler tests for success and major failure modes.
api/pkg/api/model/task.go Introduces APITask and query model APIGetTaskRequest.
api/pkg/api/model/task_test.go Adds tests for proto→API mapping and request validation.
openapi/spec.yaml Adds the new path and RackTask schema.
sdk/standard/api_rack.go Adds generated SDK method GetRackTask for the new endpoint.
sdk/standard/model_rack_task.go Adds generated SDK model RackTask.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +39 to +42
StartTime string `json:"startTime"`
EndTime string `json:"endTime"`
Message string `json:"message"`
Metadata map[string]string `json:"metadata"`
Comment on lines +121 to +122
taskUUID := c.Param("uuid")


// GetTaskByID is a workflow to get a task by its UUID
func GetTaskByID(ctx workflow.Context, request *rlav1.GetTasksByIDsRequest) (*rlav1.GetTasksByIDsResponse, error) {
logger := log.With().Str("Workflow", "Task").Str("Action", "GetByID").Logger()
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (2)
api/pkg/api/routes_test.go (1)

83-83: Add an explicit assertion for the new rack task route.

Updating the count is good, but this test can still pass if the specific path/method is wrong while total route count remains unchanged. Consider asserting GET /org/:orgName/{apiName}/rack/task/:uuid exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/routes_test.go` at line 83, The test only updates the total route
count but doesn't verify the specific new route; update the test in
routes_test.go (the test that asserts total routes) to also assert that a GET
route for "GET /org/:orgName/{apiName}/rack/task/:uuid" is registered: locate
where routes or router.Routes() are inspected (e.g., the existing routes
slice/map in the test) and add an assertion that a route with Method "GET" and
Path "/org/:orgName/{apiName}/rack/task/:uuid" exists (use the same
helper/assertion style as other route existence checks in the file).
api/pkg/api/handler/task_test.go (1)

138-146: This case is “site not found”, not “invalid siteId”.

uuid.New().String() is syntactically valid, so this only covers the nonexistent-site branch. The malformed-ID path in api/pkg/api/handler/task.go Lines 125-127 remains untested; please add a separate case such as siteId=not-a-uuid, or split this into two cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/task_test.go` around lines 138 - 146, Rename the existing
test case in task_test.go (the one using uuid.New().String()) to reflect "site
not found" and add a new separate test case for the malformed-ID branch (e.g.,
name "failure - malformed siteId") that sets queryParams["siteId"] =
"not-a-uuid" so the handler's malformed-ID path (in api/pkg/api/handler/task.go
around the logic that parses siteId at lines ~125-127) is exercised; both cases
should expect http.StatusBadRequest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/pkg/api/handler/task.go`:
- Around line 159-165: The current StartWorkflowOptions builds a shared workflow
ID (fmt.Sprintf("task-get-%s", taskUUID)) with WorkflowIDConflictPolicy set to
WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING, but the code later unconditionally
terminates that execution on timeout; change the workflow start to use a unique
per-request ID (e.g., include a request-specific nonce/UUID:
fmt.Sprintf("task-get-%s-%s", taskUUID, requestUUID)) so each GET poll starts
its own execution, and remove the unconditional termination call (the
TerminateWorkflow/TerminateWorkflowExecution path) on timeout for shared
executions; ensure StartWorkflowOptions and WorkflowIDConflictPolicy references
are updated accordingly and the timeout branch returns the timeout error without
killing other pollers.
- Around line 121-157: Validate the incoming task UUID string (assigned to
taskUUID via c.Param("uuid")) before forwarding it to RLA/Temporal: use
github.com/google/uuid to parse/validate the value and if parsing fails return a
400 Bad Request via cutil.NewAPIErrorResponse rather than continuing; perform
this check prior to constructing rlav1.GetTasksByIDsRequest and before calling
gth.scp.GetClientByID so malformed UUIDs are rejected at the API boundary, and
add the google/uuid import to the file.

In `@openapi/spec.yaml`:
- Around line 16088-16091: The OpenAPI spec has a mismatch: the endTime
description mentions "canceled" but the related status enum (property name
"status") does not include that value; reconcile them by either adding
"canceled" to the status enum or removing "canceled" from the endTime
description so both reflect the same allowed terminal states—update the "status"
enum definition (symbol: status) or the "endTime" description (symbol: endTime)
accordingly and ensure any schema/enum references are adjusted to stay
consistent.
- Around line 16063-16104: The OpenAPI change added/updated the RackTask schema
but the generated SDK/docs artifacts were not updated; run the repository's
OpenAPI generation flow to regenerate the SDK and docs (updating the
sdk/standard artifacts and the docs index output such as docs/index.html),
verify the generated outputs reflect the RackTask schema (id, status,
description, startTime, endTime, message, metadata), and commit the resulting
generated files alongside the spec change.
- Around line 9595-9644: The path '/v2/org/{org}/carbide/rack/task/{uuid}' is
ambiguous with existing templated rack routes (e.g.
'/v2/org/{org}/carbide/rack/{id}/validation'); rename the route to be
structurally unique (for example '/v2/org/{org}/carbide/rack/tasks/{uuid}' or
'/v2/org/{org}/carbide/rack/{rackId}/task/{taskUuid}'), update the path
parameter names (e.g. change parameter name from uuid to taskUuid and/or add
rackId if using the second form), and update the operationId 'get-rack-task' and
any references to the RackTask schema or siteId query param to match the new
route so routing and OpenAPI linting no longer produce ambiguous matches.

In `@sdk/standard/api_rack.go`:
- Around line 924-1063: The SDK was generated but not fully refreshed for the
new rack task endpoint; run the generator and commit the diffs so client/spec
stay in sync. Re-run the project SDK generation (make generate-sdk or your
repo's generator) and ensure updated artifacts for ApiGetRackTaskRequest,
GetRackTask and GetRackTaskExecute (and any new RackTask/CarbideAPIError models)
are included in sdk/standard, then add/commit the resulting changes so CI no
longer reports drift.

In `@sdk/standard/model_rack_task.go`:
- Around line 260-263: GetMetadataOk currently returns an empty map when
metadata is unset which contradicts its contract; modify RackTask.GetMetadataOk
so that when o == nil or IsNil(o.Metadata) it returns nil, false instead of
map[string]string{}, false; locate the method named GetMetadataOk on type
RackTask and replace the empty-map return with a nil map to preserve the unset
vs empty distinction.

In `@site-workflow/pkg/activity/rack.go`:
- Around line 298-302: Guard against a nil RlaAtomicClient before calling
GetClient: check that mr.RlaAtomicClient is not nil (and optionally that mr
itself is not nil) and return cClient.ErrClientNotConnected if it is; then call
mr.RlaAtomicClient.GetClient(), keep the existing nil check on rlaClient, and
only then call rlaClient.Rla(). This prevents dereferencing a nil
ManageRack.RlaAtomicClient and avoids the panic.

---

Nitpick comments:
In `@api/pkg/api/handler/task_test.go`:
- Around line 138-146: Rename the existing test case in task_test.go (the one
using uuid.New().String()) to reflect "site not found" and add a new separate
test case for the malformed-ID branch (e.g., name "failure - malformed siteId")
that sets queryParams["siteId"] = "not-a-uuid" so the handler's malformed-ID
path (in api/pkg/api/handler/task.go around the logic that parses siteId at
lines ~125-127) is exercised; both cases should expect http.StatusBadRequest.

In `@api/pkg/api/routes_test.go`:
- Line 83: The test only updates the total route count but doesn't verify the
specific new route; update the test in routes_test.go (the test that asserts
total routes) to also assert that a GET route for "GET
/org/:orgName/{apiName}/rack/task/:uuid" is registered: locate where routes or
router.Routes() are inspected (e.g., the existing routes slice/map in the test)
and add an assertion that a route with Method "GET" and Path
"/org/:orgName/{apiName}/rack/task/:uuid" exists (use the same helper/assertion
style as other route existence checks in the file).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d3bb7dfe-8246-4172-b538-a6f67dad97f7

📥 Commits

Reviewing files that changed from the base of the PR and between 0fffc26 and 1307faa.

📒 Files selected for processing (15)
  • api/pkg/api/handler/task.go
  • api/pkg/api/handler/task_test.go
  • api/pkg/api/model/task.go
  • api/pkg/api/model/task_test.go
  • api/pkg/api/routes.go
  • api/pkg/api/routes_test.go
  • docs/index.html
  • openapi/spec.yaml
  • sdk/standard/api_rack.go
  • sdk/standard/model_rack_task.go
  • site-agent/pkg/components/managers/rla/subscriber.go
  • site-workflow/pkg/activity/rack.go
  • site-workflow/pkg/activity/rack_test.go
  • site-workflow/pkg/workflow/rack.go
  • site-workflow/pkg/workflow/rack_test.go

Comment on lines +121 to +157
taskUUID := c.Param("uuid")

site, err := common.GetSiteFromIDString(ctx, nil, apiRequest.SiteID, gth.dbSession)
if err != nil {
if errors.Is(err, common.ErrInvalidID) {
return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Failed to validate Site specified in request: invalid ID", nil)
}
if errors.Is(err, cdb.ErrDoesNotExist) {
return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Site specified in request does not exist", nil)
}
logger.Error().Err(err).Msg("error retrieving Site from DB")
return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve Site specified in request due to DB error", nil)
}

if site.InfrastructureProviderID != infrastructureProvider.ID {
return cutil.NewAPIErrorResponse(c, http.StatusForbidden, "Site specified in request doesn't belong to current org's Provider", nil)
}

siteConfig := &cdbm.SiteConfig{}
if site.Config != nil {
siteConfig = site.Config
}

if !siteConfig.RackLevelAdministration {
logger.Warn().Msg("site does not have Rack Level Administration enabled")
return cutil.NewAPIErrorResponse(c, http.StatusPreconditionFailed, "Site does not have Rack Level Administration enabled", nil)
}

stc, err := gth.scp.GetClientByID(site.ID)
if err != nil {
logger.Error().Err(err).Msg("failed to retrieve Temporal client for Site")
return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve client for Site", nil)
}

rlaRequest := &rlav1.GetTasksByIDsRequest{
TaskIds: []*rlav1.UUID{{Id: taskUUID}},
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject malformed task UUIDs before starting the workflow.

uuid is documented as a UUID, but Lines 121-157 forward any string into Temporal/RLA. That turns a client-side validation error into unnecessary Site traffic and backend-dependent error behavior instead of a deterministic 400 at the API boundary.

🛠️ Suggested fix
-	taskUUID := c.Param("uuid")
+	taskID, err := uuid.Parse(c.Param("uuid"))
+	if err != nil {
+		return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, "Failed to validate Task specified in request: invalid ID", nil)
+	}
+	taskUUID := taskID.String()

Also add the corresponding github.com/google/uuid import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/task.go` around lines 121 - 157, Validate the incoming
task UUID string (assigned to taskUUID via c.Param("uuid")) before forwarding it
to RLA/Temporal: use github.com/google/uuid to parse/validate the value and if
parsing fails return a 400 Bad Request via cutil.NewAPIErrorResponse rather than
continuing; perform this check prior to constructing rlav1.GetTasksByIDsRequest
and before calling gth.scp.GetClientByID so malformed UUIDs are rejected at the
API boundary, and add the google/uuid import to the file.

Comment on lines +159 to +165
workflowOptions := tClient.StartWorkflowOptions{
ID: fmt.Sprintf("task-get-%s", taskUUID),
WorkflowIDReusePolicy: temporalEnums.WORKFLOW_ID_REUSE_POLICY_ALLOW_DUPLICATE,
WorkflowIDConflictPolicy: temporalEnums.WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING,
WorkflowExecutionTimeout: cutil.WorkflowExecutionTimeout,
TaskQueue: queue.SiteTaskQueue,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't terminate a shared polling workflow on one caller timeout.

Lines 159-165 reuse the same workflow execution per task with WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING, but Lines 179-181 unconditionally terminate that workflow on timeout. One slow poller can therefore cancel another in-flight poll for the same task.

For this read-only GET path, prefer either a unique workflow ID per request or returning the timeout without terminating the shared execution.

Also applies to: 179-181

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/task.go` around lines 159 - 165, The current
StartWorkflowOptions builds a shared workflow ID (fmt.Sprintf("task-get-%s",
taskUUID)) with WorkflowIDConflictPolicy set to
WORKFLOW_ID_CONFLICT_POLICY_USE_EXISTING, but the code later unconditionally
terminates that execution on timeout; change the workflow start to use a unique
per-request ID (e.g., include a request-specific nonce/UUID:
fmt.Sprintf("task-get-%s-%s", taskUUID, requestUUID)) so each GET poll starts
its own execution, and remove the unconditional termination call (the
TerminateWorkflow/TerminateWorkflowExecution path) on timeout for shared
executions; ensure StartWorkflowOptions and WorkflowIDConflictPolicy references
are updated accordingly and the timeout branch returns the timeout error without
killing other pollers.

Comment on lines +9595 to +9644
'/v2/org/{org}/carbide/rack/task/{uuid}':
parameters:
- schema:
type: string
name: org
in: path
required: true
description: Name of the Org
- schema:
type: string
format: uuid
name: uuid
in: path
required: true
description: UUID of the Task
get:
summary: Retrieve a Task
operationId: get-rack-task
description: |-
Get a Task by UUID.

Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.
parameters:
- schema:
type: string
format: uuid
name: siteId
in: query
required: true
description: ID of the Site
responses:
'200':
description: OK
content:
application/json:
schema:
$ref: '#/components/schemas/RackTask'
examples:
example-1:
value:
id: 550e8400-e29b-41d4-a716-446655440000
status: running
description: Power on rack components
message: 'Processing 3 of 5 components'
'403':
$ref: '#/components/responses/ForbiddenError'
'404':
$ref: '#/components/responses/NotFoundError'
tags:
- Rack
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Resolve ambiguous route template before merge

Line 9595 introduces a path that conflicts with existing templated rack routes (/v2/org/{org}/carbide/rack/{id}/validation), which makes matching ambiguous for values like task/validation. This is already flagged by OpenAPI lint and should be made structurally unique.

Suggested spec change
-  '/v2/org/{org}/carbide/rack/task/{uuid}':
+  '/v2/org/{org}/carbide/rack/task/by-id/{uuid}':
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
'/v2/org/{org}/carbide/rack/task/{uuid}':
parameters:
- schema:
type: string
name: org
in: path
required: true
description: Name of the Org
- schema:
type: string
format: uuid
name: uuid
in: path
required: true
description: UUID of the Task
get:
summary: Retrieve a Task
operationId: get-rack-task
description: |-
Get a Task by UUID.
Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.
parameters:
- schema:
type: string
format: uuid
name: siteId
in: query
required: true
description: ID of the Site
responses:
'200':
description: OK
content:
application/json:
schema:
$ref: '#/components/schemas/RackTask'
examples:
example-1:
value:
id: 550e8400-e29b-41d4-a716-446655440000
status: running
description: Power on rack components
message: 'Processing 3 of 5 components'
'403':
$ref: '#/components/responses/ForbiddenError'
'404':
$ref: '#/components/responses/NotFoundError'
tags:
- Rack
'/v2/org/{org}/carbide/rack/task/by-id/{uuid}':
parameters:
- schema:
type: string
name: org
in: path
required: true
description: Name of the Org
- schema:
type: string
format: uuid
name: uuid
in: path
required: true
description: UUID of the Task
get:
summary: Retrieve a Task
operationId: get-rack-task
description: |-
Get a Task by UUID.
Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.
parameters:
- schema:
type: string
format: uuid
name: siteId
in: query
required: true
description: ID of the Site
responses:
'200':
description: OK
content:
application/json:
schema:
$ref: '#/components/schemas/RackTask'
examples:
example-1:
value:
id: 550e8400-e29b-41d4-a716-446655440000
status: running
description: Power on rack components
message: 'Processing 3 of 5 components'
'403':
$ref: '#/components/responses/ForbiddenError'
'404':
$ref: '#/components/responses/NotFoundError'
tags:
- Rack
🧰 Tools
🪛 GitHub Check: Lint and Test / Lint OpenAPI

[warning] 9595-9595: no-ambiguous-paths
Paths should resolve unambiguously. Found two ambiguous paths: /v2/org/{org}/carbide/rack/{id}/validation and /v2/org/{org}/carbide/rack/task/{uuid}.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openapi/spec.yaml` around lines 9595 - 9644, The path
'/v2/org/{org}/carbide/rack/task/{uuid}' is ambiguous with existing templated
rack routes (e.g. '/v2/org/{org}/carbide/rack/{id}/validation'); rename the
route to be structurally unique (for example
'/v2/org/{org}/carbide/rack/tasks/{uuid}' or
'/v2/org/{org}/carbide/rack/{rackId}/task/{taskUuid}'), update the path
parameter names (e.g. change parameter name from uuid to taskUuid and/or add
rackId if using the second form), and update the operationId 'get-rack-task' and
any references to the RackTask schema or siteId query param to match the new
route so routing and OpenAPI linting no longer produce ambiguous matches.

Comment on lines +16063 to +16104
RackTask:
title: RackTask
type: object
description: A task representing an asynchronous operation on rack infrastructure.
properties:
id:
type: string
format: uuid
description: Unique identifier of the task.
status:
type: string
enum:
- unknown
- pending
- running
- succeeded
- failed
description: Current state of the task.
description:
type: string
description: Human-readable description provided when the task was created.
startTime:
type: string
format: date-time
description: Timestamp when the task started execution.
endTime:
type: string
format: date-time
description: Timestamp when the task finished (succeeded, failed or canceled).
message:
type: string
description: Optional status or error message describing the current state or result.
metadata:
type: object
additionalProperties:
type: string
description: Extensible key-value data for additional task details.
examples:
- id: 550e8400-e29b-41d4-a716-446655440000
status: running
description: Power on rack components
message: 'Processing 3 of 5 components'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Regenerate SDK/docs artifacts required by CI

This OpenAPI change currently fails pipeline checks because generated outputs were not updated (sdk/standard/ and/or docs/index.html). Please run the repo’s OpenAPI generation flow and commit the resulting artifacts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openapi/spec.yaml` around lines 16063 - 16104, The OpenAPI change
added/updated the RackTask schema but the generated SDK/docs artifacts were not
updated; run the repository's OpenAPI generation flow to regenerate the SDK and
docs (updating the sdk/standard artifacts and the docs index output such as
docs/index.html), verify the generated outputs reflect the RackTask schema (id,
status, description, startTime, endTime, message, metadata), and commit the
resulting generated files alongside the spec change.

Comment on lines +16088 to +16091
endTime:
type: string
format: date-time
description: Timestamp when the task finished (succeeded, failed or canceled).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix endTime description/status mismatch

Line 16091 mentions canceled, but status (Line 16074 onward) does not include it. Align the enum and description to avoid contract ambiguity for clients.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openapi/spec.yaml` around lines 16088 - 16091, The OpenAPI spec has a
mismatch: the endTime description mentions "canceled" but the related status
enum (property name "status") does not include that value; reconcile them by
either adding "canceled" to the status enum or removing "canceled" from the
endTime description so both reflect the same allowed terminal states—update the
"status" enum definition (symbol: status) or the "endTime" description (symbol:
endTime) accordingly and ensure any schema/enum references are adjusted to stay
consistent.

Comment on lines +924 to +1063
type ApiGetRackTaskRequest struct {
ctx context.Context
ApiService *RackAPIService
siteId *string
org string
uuid string
}

// ID of the Site
func (r ApiGetRackTaskRequest) SiteId(siteId string) ApiGetRackTaskRequest {
r.siteId = &siteId
return r
}

func (r ApiGetRackTaskRequest) Execute() (*RackTask, *http.Response, error) {
return r.ApiService.GetRackTaskExecute(r)
}

/*
GetRackTask Retrieve a Task

Get an RLA Task by UUID.

Org must have an Infrastructure Provider entity. User must have `FORGE_PROVIDER_ADMIN` authorization role.

@param ctx context.Context - for authentication, logging, cancellation, deadlines, tracing, etc. Passed from http.Request or context.Background().
@param org Name of the Org
@param uuid UUID of the Task
@return ApiGetRackTaskRequest
*/
func (a *RackAPIService) GetRackTask(ctx context.Context, org string, uuid string) ApiGetRackTaskRequest {
return ApiGetRackTaskRequest{
ApiService: a,
ctx: ctx,
org: org,
uuid: uuid,
}
}

// Execute executes the request
// @return RackTask
func (a *RackAPIService) GetRackTaskExecute(r ApiGetRackTaskRequest) (*RackTask, *http.Response, error) {
var (
localVarHTTPMethod = http.MethodGet
localVarPostBody interface{}
formFiles []formFile
localVarReturnValue *RackTask
)

localBasePath, err := a.client.cfg.ServerURLWithContext(r.ctx, "RackAPIService.GetRackTask")
if err != nil {
return localVarReturnValue, nil, &GenericOpenAPIError{error: err.Error()}
}

localVarPath := localBasePath + "/v2/org/{org}/carbide/rack/task/{uuid}"
localVarPath = strings.Replace(localVarPath, "{"+"org"+"}", url.PathEscape(parameterValueToString(r.org, "org")), -1)
localVarPath = strings.Replace(localVarPath, "{"+"uuid"+"}", url.PathEscape(parameterValueToString(r.uuid, "uuid")), -1)

localVarHeaderParams := make(map[string]string)
localVarQueryParams := url.Values{}
localVarFormParams := url.Values{}
if r.siteId == nil {
return localVarReturnValue, nil, reportError("siteId is required and must be specified")
}

parameterAddToHeaderOrQuery(localVarQueryParams, "siteId", r.siteId, "form", "")
// to determine the Content-Type header
localVarHTTPContentTypes := []string{}

// set Content-Type header
localVarHTTPContentType := selectHeaderContentType(localVarHTTPContentTypes)
if localVarHTTPContentType != "" {
localVarHeaderParams["Content-Type"] = localVarHTTPContentType
}

// to determine the Accept header
localVarHTTPHeaderAccepts := []string{"application/json"}

// set Accept header
localVarHTTPHeaderAccept := selectHeaderAccept(localVarHTTPHeaderAccepts)
if localVarHTTPHeaderAccept != "" {
localVarHeaderParams["Accept"] = localVarHTTPHeaderAccept
}
req, err := a.client.prepareRequest(r.ctx, localVarPath, localVarHTTPMethod, localVarPostBody, localVarHeaderParams, localVarQueryParams, localVarFormParams, formFiles)
if err != nil {
return localVarReturnValue, nil, err
}

localVarHTTPResponse, err := a.client.callAPI(req)
if err != nil || localVarHTTPResponse == nil {
return localVarReturnValue, localVarHTTPResponse, err
}

localVarBody, err := io.ReadAll(localVarHTTPResponse.Body)
localVarHTTPResponse.Body.Close()
localVarHTTPResponse.Body = io.NopCloser(bytes.NewBuffer(localVarBody))
if err != nil {
return localVarReturnValue, localVarHTTPResponse, err
}

if localVarHTTPResponse.StatusCode >= 300 {
newErr := &GenericOpenAPIError{
body: localVarBody,
error: localVarHTTPResponse.Status,
}
if localVarHTTPResponse.StatusCode == 403 {
var v CarbideAPIError
err = a.client.decode(&v, localVarBody, localVarHTTPResponse.Header.Get("Content-Type"))
if err != nil {
newErr.error = err.Error()
return localVarReturnValue, localVarHTTPResponse, newErr
}
newErr.error = formatErrorMessage(localVarHTTPResponse.Status, &v)
newErr.model = v
return localVarReturnValue, localVarHTTPResponse, newErr
}
if localVarHTTPResponse.StatusCode == 404 {
var v CarbideAPIError
err = a.client.decode(&v, localVarBody, localVarHTTPResponse.Header.Get("Content-Type"))
if err != nil {
newErr.error = err.Error()
return localVarReturnValue, localVarHTTPResponse, newErr
}
newErr.error = formatErrorMessage(localVarHTTPResponse.Status, &v)
newErr.model = v
}
return localVarReturnValue, localVarHTTPResponse, newErr
}

err = a.client.decode(&localVarReturnValue, localVarBody, localVarHTTPResponse.Header.Get("Content-Type"))
if err != nil {
newErr := &GenericOpenAPIError{
body: localVarBody,
error: err.Error(),
}
return localVarReturnValue, localVarHTTPResponse, newErr
}

return localVarReturnValue, localVarHTTPResponse, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Regenerate SDK artifacts before merge to avoid client/spec drift.

CI is warning that sdk/standard/ is not fully updated (make generate-sdk). Please regenerate and commit any additional generated deltas so this new endpoint/model surface is fully synchronized.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/standard/api_rack.go` around lines 924 - 1063, The SDK was generated but
not fully refreshed for the new rack task endpoint; run the generator and commit
the diffs so client/spec stay in sync. Re-run the project SDK generation (make
generate-sdk or your repo's generator) and ensure updated artifacts for
ApiGetRackTaskRequest, GetRackTask and GetRackTaskExecute (and any new
RackTask/CarbideAPIError models) are included in sdk/standard, then add/commit
the resulting changes so CI no longer reports drift.

Comment on lines +260 to +263
func (o *RackTask) GetMetadataOk() (map[string]string, bool) {
if o == nil || IsNil(o.Metadata) {
return map[string]string{}, false
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Return nil for unset metadata in GetMetadataOk.

GetMetadataOk is documented to return nil when unset, but currently returns an empty map. That mismatches the method contract and makes unset-vs-empty less clear for callers.

🔧 Suggested fix
 func (o *RackTask) GetMetadataOk() (map[string]string, bool) {
 	if o == nil || IsNil(o.Metadata) {
-		return map[string]string{}, false
+		return nil, false
 	}
 	return o.Metadata, true
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (o *RackTask) GetMetadataOk() (map[string]string, bool) {
if o == nil || IsNil(o.Metadata) {
return map[string]string{}, false
}
func (o *RackTask) GetMetadataOk() (map[string]string, bool) {
if o == nil || IsNil(o.Metadata) {
return nil, false
}
return o.Metadata, true
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/standard/model_rack_task.go` around lines 260 - 263, GetMetadataOk
currently returns an empty map when metadata is unset which contradicts its
contract; modify RackTask.GetMetadataOk so that when o == nil or
IsNil(o.Metadata) it returns nil, false instead of map[string]string{}, false;
locate the method named GetMetadataOk on type RackTask and replace the empty-map
return with a nil map to preserve the unset vs empty distinction.

Comment on lines +298 to +302
rlaClient := mr.RlaAtomicClient.GetClient()
if rlaClient == nil {
return nil, cClient.ErrClientNotConnected
}
rla := rlaClient.Rla()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard mr.RlaAtomicClient before dereferencing it.

A zero-value ManageRack or a nil-injected client will panic on Line 298, so the worker crashes instead of returning ErrClientNotConnected.

🛠️ Suggested fix
+	if mr.RlaAtomicClient == nil {
+		return nil, cClient.ErrClientNotConnected
+	}
 	rlaClient := mr.RlaAtomicClient.GetClient()
 	if rlaClient == nil {
 		return nil, cClient.ErrClientNotConnected
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rlaClient := mr.RlaAtomicClient.GetClient()
if rlaClient == nil {
return nil, cClient.ErrClientNotConnected
}
rla := rlaClient.Rla()
if mr.RlaAtomicClient == nil {
return nil, cClient.ErrClientNotConnected
}
rlaClient := mr.RlaAtomicClient.GetClient()
if rlaClient == nil {
return nil, cClient.ErrClientNotConnected
}
rla := rlaClient.Rla()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site-workflow/pkg/activity/rack.go` around lines 298 - 302, Guard against a
nil RlaAtomicClient before calling GetClient: check that mr.RlaAtomicClient is
not nil (and optionally that mr itself is not nil) and return
cClient.ErrClientNotConnected if it is; then call
mr.RlaAtomicClient.GetClient(), keep the existing nil check on rlaClient, and
only then call rlaClient.Rla(). This prevents dereferencing a nil
ManageRack.RlaAtomicClient and avoids the panic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants