
chore: Enable use of local carbide-api instead of mock-core #197

Open
pbreton wants to merge 1 commit into main from chore/local-carbide

Conversation

@pbreton
Contributor

@pbreton pbreton commented Mar 5, 2026

  • Update the local stack to connect to a locally running carbide-core
  • Add a test script for validating the connection to carbide-api

This depends on PR#449.

Copilot AI review requested due to automatic review settings March 5, 2026 02:20
@copy-pr-bot

copy-pr-bot bot commented Mar 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions bot commented Mar 5, 2026

Test Results

7 623 tests  ±0   7 620 ✅ ±0   7m 46s ⏱️ -5s
  136 suites ±0       3 💤 ±0 
   13 files   ±0       0 ❌ ±0 

Results for commit 18e8f6c. ± Comparison against base commit 849196f.

♻️ This comment has been updated with latest results.


Copilot AI left a comment


Pull request overview

Enables the local kind stack to use a locally running carbide-api (from bare-metal-manager-core) instead of the in-cluster mock-core, and adds a smoke-test script to validate the end-to-end API → core integration path.

Changes:

  • Add scripts/test-local-core.sh to exercise auth, tenant/site discovery, and resource creation through the local core path.
  • Update Makefile kind-reset to optionally skip building/deploying mock-core and patch site-agent to point at the local core with TLS and injected certs.
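Assuming the Makefile variables referenced later in this review (LOCAL_CORE, LOCAL_CORE_CERTS_DIR), a hypothetical invocation of the new mode might look like the following; the certs path is illustrative, not taken from the PR:

```shell
# Rebuild the kind stack against a locally running carbide-core
# instead of the in-cluster mock-core, then run the new smoke test.
# LOCAL_CORE_CERTS_DIR is an assumed path; point it at the output of
# gen-certs.sh in bare-metal-manager-core.
make kind-reset LOCAL_CORE=true \
  LOCAL_CORE_CERTS_DIR="$HOME/src/bare-metal-manager-core/certs"

./scripts/test-local-core.sh
```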

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
scripts/test-local-core.sh New local integration smoke test script for API → core connectivity and basic workflows.
Makefile Adds LOCAL_CORE mode to connect site-agent to a locally running core and manage certs/config patches during kind-reset.


Comment on lines +64 to +72
  local path="$1"
  local out http_code
  out=$(curl -s -w "\n%{http_code}" "$BASE_URL$path" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Accept: application/json")
  http_code=$(echo "$out" | tail -n1)
  body=$(echo "$out" | sed '$d')
  if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
    die "GET $path → HTTP $http_code: $body"

Copilot AI Mar 5, 2026


In api_get, body is assigned without being declared local, which leaks it into the global scope and makes the helper have hidden side effects. Declare local body (and include it in the local ... list) so each call is self-contained and can't accidentally overwrite a global body variable used elsewhere.
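The fix is mechanical, but the failure mode is worth seeing in isolation; a minimal standalone sketch (not from the PR) of how an undeclared assignment inside a bash function clobbers the caller's variable:

```shell
#!/usr/bin/env bash
# Without `local`, an assignment inside a function writes to the
# global scope; with `local`, the caller's variable is untouched.
leaky()  { body="from leaky"; }            # clobbers the global `body`
scoped() { local body; body="from scoped"; }

body="original"
scoped
echo "after scoped: $body"                 # still "original"
leaky
echo "after leaky:  $body"                 # now "from leaky"
```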

UPDATE_TAG=$(kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
  psql -U "$PG_USER" -d "$PG_DB" -c \
  "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';" \
  2>&1 | grep -E '^UPDATE')

Copilot AI Mar 5, 2026


This command substitution relies on grep matching ^UPDATE. With set -euo pipefail, a no-match grep (exit 1) can terminate the script immediately, so the later explicit checks/die message may never run. Make the pipeline non-fatal (e.g., append || true / disable errexit just for this capture) and then validate UPDATE_TAG/exit status explicitly.

Suggested change
-  2>&1 | grep -E '^UPDATE')
+  2>&1 | grep -E '^UPDATE' || true)
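The interaction the comment warns about is easy to reproduce; a standalone sketch (not from the PR) showing why the `|| true` guard is needed under `set -euo pipefail`:

```shell
#!/usr/bin/env bash
set -euo pipefail

# grep exits 1 when nothing matches; under errexit, a failing command
# substitution in an assignment aborts the script before any explicit
# validation can run. The `|| true` guard keeps control in the script
# so the check below is reached.
TAG=$(echo "no rows changed" | grep -E '^UPDATE' || true)

if [[ -z "$TAG" ]]; then
  echo "validation reached: no UPDATE tag captured"
fi
```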

Comment on lines +105 to +111
# Run a SQL statement inside the in-cluster postgres pod.
db_exec() {
  local sql="$1"
  kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
    psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql"
}


Copilot AI Mar 5, 2026


db_exec() is defined but not used anywhere in this script, which adds noise and can mislead readers about how DB access is performed. Remove it, or refactor the later direct kubectl exec ... psql call sites to use this helper consistently.

Suggested change
-# Run a SQL statement inside the in-cluster postgres pod.
-db_exec() {
-  local sql="$1"
-  kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
-    psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql"
-}

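If the second option is taken (keeping the helper), the site-status UPDATE quoted earlier could be routed through it. A hedged sketch, not runnable outside a cluster with the PR's in-cluster postgres pod; note that `psql -tAc` prints no `UPDATE n` command tag, so success is checked with a follow-up SELECT rather than grep:

```shell
# Hypothetical refactor of the UPDATE call site to use db_exec.
# Requires the kind cluster and postgres pod from the PR's local stack;
# SITE_ID is assumed to be set earlier in the script.
db_exec "UPDATE site SET status = 'Registered', updated = NOW() WHERE id = '$SITE_ID';"

# -tAc suppresses the command tag, so verify the write explicitly.
status=$(db_exec "SELECT status FROM site WHERE id = '$SITE_ID';")
[[ "$status" == "Registered" ]] || die "site $SITE_ID not marked Registered (got: $status)"
```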
Comment on lines +468 to +477
	@if [ "$(LOCAL_CORE)" = "true" ]; then \
		echo "Creating core-grpc-client-site-agent-certs secret from $(LOCAL_CORE_CERTS_DIR)..." ; \
		for f in ca.crt client.crt client.key; do \
			[ -f "$(LOCAL_CORE_CERTS_DIR)/$$f" ] || { echo "ERROR: $$f not found in $(LOCAL_CORE_CERTS_DIR). Run gen-certs.sh in bare-metal-manager-core first."; exit 1; } ; \
		done ; \
		kubectl -n carbide-rest create secret generic core-grpc-client-site-agent-certs \
			--from-file=ca.crt="$(LOCAL_CORE_CERTS_DIR)/ca.crt" \
			--from-file=tls.crt="$(LOCAL_CORE_CERTS_DIR)/client.crt" \
			--from-file=tls.key="$(LOCAL_CORE_CERTS_DIR)/client.key" \
			--dry-run=client -o yaml | kubectl apply -f - ; \

Copilot AI Mar 5, 2026


When LOCAL_CORE=true, this creates/updates the core-grpc-client-site-agent-certs Secret, but deploy/kustomize/base/site-agent/certificate.yaml also defines a cert-manager Certificate that continuously reconciles the same secret name. That controller will likely overwrite the manually provided local-core certs, causing flapping/connection failures. Consider deleting/omitting the Certificate resource when LOCAL_CORE=true (e.g., kubectl delete certificate core-grpc-client-site-agent-certs) or using a distinct secret name for local-core certs and patching the site-agent deployment to mount that secret instead.
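Either remedy can be sketched in a few lines; assuming the resource names quoted in the comment (untested against a real cluster), the delete-the-Certificate option might look like:

```shell
# Hypothetical sketch for LOCAL_CORE=true: remove the cert-manager
# Certificate so it stops reconciling (and overwriting) the secret,
# then apply the locally generated certs. Resource and namespace names
# are taken from the review comment and Makefile excerpt; adjust as needed.
kubectl -n carbide-rest delete certificate core-grpc-client-site-agent-certs \
  --ignore-not-found

kubectl -n carbide-rest create secret generic core-grpc-client-site-agent-certs \
  --from-file=ca.crt="$LOCAL_CORE_CERTS_DIR/ca.crt" \
  --from-file=tls.crt="$LOCAL_CORE_CERTS_DIR/client.crt" \
  --from-file=tls.key="$LOCAL_CORE_CERTS_DIR/client.key" \
  --dry-run=client -o yaml | kubectl apply -f -
```

The alternative (a distinct secret name plus a deployment patch) avoids deleting managed resources but requires patching the site-agent volume mounts as well.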

@pbreton pbreton changed the title chore: enable to use local carbide-api instead of mock-core chore: Enable use of local carbide-api instead of mock-core Mar 6, 2026
@pbreton pbreton force-pushed the chore/local-carbide branch from c8a1846 to 240cbc1 Compare March 6, 2026 03:27
@github-actions
Copy link

github-actions bot commented Mar 6, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-03-06 03:28:18 UTC | Commit: 240cbc1

@github-actions
Copy link

github-actions bot commented Mar 6, 2026

🛡️ Vulnerability Scan

🚨 Found 64 vulnerability(ies)

Severity Breakdown:

  • 🔴 Critical/High: 64
  • 🟡 Medium: 0
  • 🔵 Low/Info: 0
📋 Top Vulnerabilities
  • AVD-KSV-0109: Artifact: deploy/kustomize/base/api/configmap.yaml
    Type: kubernetes
    Vulnerability AVD-KSV-0109
    Severity: HIGH
    Message: ConfigMap 'carbide-rest-api-config' in 'default' namespace stores secrets in key(s) or value(s) '{" password"}'
    Link: AVD-KSV-0109 (deploy/kustomize/base/api/configmap.yaml)
  • KSV014: Artifact: deploy/kustomize/base/api/deployment.yaml
    Type: kubernetes
    Vulnerability KSV014
    Severity: HIGH
    Message: Container 'api' of Deployment 'carbide-rest-api' should set 'securityContext.readOnlyRootFilesystem' to true
    Link: KSV014 (deploy/kustomize/base/api/deployment.yaml)
  • KSV118: Artifact: deploy/kustomize/base/api/deployment.yaml
    Type: kubernetes
    Vulnerability KSV118
    Severity: HIGH
    Message: container carbide-rest-api in default namespace is using the default security context
    Link: KSV118 (deploy/kustomize/base/api/deployment.yaml)
  • KSV118: Artifact: deploy/kustomize/base/api/deployment.yaml
    Type: kubernetes
    Vulnerability KSV118
    Severity: HIGH
    Message: deployment carbide-rest-api in default namespace is using the default security context, which allows root privileges
    Link: KSV118 (deploy/kustomize/base/api/deployment.yaml)
  • KSV014: Artifact: deploy/kustomize/base/db/job.yaml
    Type: kubernetes
    Vulnerability KSV014
    Severity: HIGH
    Message: Container 'migrations' of Job 'carbide-rest-db-migration' should set 'securityContext.readOnlyRootFilesystem' to true
    Link: KSV014 (deploy/kustomize/base/db/job.yaml)
  • KSV014: Artifact: deploy/kustomize/base/db/job.yaml
    Type: kubernetes
    Vulnerability KSV014
    Severity: HIGH
    Message: Container 'wait-for-postgres' of Job 'carbide-rest-db-migration' should set 'securityContext.readOnlyRootFilesystem' to true
    Link: KSV014 (deploy/kustomize/base/db/job.yaml)
  • KSV118: Artifact: deploy/kustomize/base/db/job.yaml
    Type: kubernetes
    Vulnerability KSV118
    Severity: HIGH
    Message: container carbide-rest-db-migration in default namespace is using the default security context
    Link: KSV118 (deploy/kustomize/base/db/job.yaml)
  • KSV118: Artifact: deploy/kustomize/base/db/job.yaml
    Type: kubernetes
    Vulnerability KSV118
    Severity: HIGH
    Message: container carbide-rest-db-migration in default namespace is using the default security context
    Link: KSV118 (deploy/kustomize/base/db/job.yaml)
  • KSV118: Artifact: deploy/kustomize/base/db/job.yaml
    Type: kubernetes
    Vulnerability KSV118
    Severity: HIGH
    Message: job carbide-rest-db-migration in default namespace is using the default security context, which allows root privileges
    Link: KSV118 (deploy/kustomize/base/db/job.yaml)
  • KSV014: Artifact: deploy/kustomize/base/keycloak/deployment.yaml
    Type: kubernetes
    Vulnerability KSV014
    Severity: HIGH
    Message: Container 'keycloak' of Deployment 'keycloak' should set 'securityContext.readOnlyRootFilesystem' to true
    Link: KSV014 (deploy/kustomize/base/keycloak/deployment.yaml)

🔗 View full details in Security tab

🕐 Last updated: 2026-03-06 03:28:28 UTC | Commit: 240cbc1

Copilot AI review requested due to automatic review settings March 6, 2026 19:31
@pbreton pbreton force-pushed the chore/local-carbide branch from 240cbc1 to d12e81a Compare March 6, 2026 19:31

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.



Comment on lines +89 to +97
# Resource already exists; the conflict response carries the existing id
# under .data.id — reuse it so the script is idempotent across runs.
existing_id=$(echo "$body" | jq -r '.data.id // empty')
if [[ -n "$existing_id" ]]; then
info "Already exists (HTTP 409) — reusing id: $existing_id"
echo "{\"id\":\"$existing_id\"}"
return
fi
die "POST $path → HTTP 409 (conflict) but no id in response: $body"

Copilot AI Mar 6, 2026


The script claims idempotency on HTTP 409 by extracting an existing resource id from .data.id, but many API conflict responses use validation errors (e.g., { "data": { "name": "..." } }) and do not include an id. In those cases existing_id will be empty and the script exits, so re-running the smoke test fails. Consider handling 409 by performing a follow-up lookup (e.g., GET list filtered by name/site and extracting .id) rather than assuming .data.id is present.

Suggested change
-    # Resource already exists; the conflict response carries the existing id
-    # under .data.id — reuse it so the script is idempotent across runs.
-    existing_id=$(echo "$body" | jq -r '.data.id // empty')
-    if [[ -n "$existing_id" ]]; then
-      info "Already exists (HTTP 409) — reusing id: $existing_id"
-      echo "{\"id\":\"$existing_id\"}"
-      return
-    fi
-    die "POST $path → HTTP 409 (conflict) but no id in response: $body"
+    # Resource already exists; the conflict response is expected to carry
+    # the existing id under .data.id — reuse it so the script is idempotent
+    # across runs.
+    existing_id=$(echo "$body" | jq -r '.data.id // empty')
+    if [[ -n "$existing_id" ]]; then
+      info "Already exists (HTTP 409) — reusing id from conflict response: $existing_id"
+      echo "{\"id\":\"$existing_id\"}"
+      return
+    fi
+    # Some APIs return validation-style conflict responses without an id.
+    # In that case, attempt a follow-up lookup (e.g., by name) to recover
+    # the existing resource id and keep the script idempotent.
+    local name lookup_out lookup_code lookup_body lookup_id
+    name=$(echo "$payload" | jq -r '.name // .data.name // empty')
+    if [[ -n "$name" ]]; then
+      info "HTTP 409 without id; attempting lookup by name: \"$name\""
+      lookup_out=$(curl -s -w "\n%{http_code}" "$BASE_URL$path?name=$name" \
+        -H "Authorization: Bearer $TOKEN" \
+        -H "Accept: application/json")
+      lookup_code=$(echo "$lookup_out" | tail -n1)
+      lookup_body=$(echo "$lookup_out" | sed '$d')
+      if [[ "$lookup_code" -ge 200 && "$lookup_code" -lt 300 ]]; then
+        # Try a few common response shapes to find an id.
+        lookup_id=$(echo "$lookup_body" | jq -r '.data.id // .data[0].id // .items[0].id // .id // empty')
+        if [[ -n "$lookup_id" ]]; then
+          info "Already exists (HTTP 409) — reusing id from lookup: $lookup_id"
+          echo "{\"id\":\"$lookup_id\"}"
+          return
+        fi
+      fi
+    fi
+    die "POST $path → HTTP 409 (conflict) but no id in response or lookup: $body"

Comment on lines +35 to +126
# Override defaults via environment variables:
#   API_URL, KEYCLOAK_URL, ORG, SITE_NAME, PG_NAMESPACE, PG_STATEFULSET, PG_USER, PG_DB
#

set -euo pipefail

API_URL="${API_URL:-http://localhost:8388}"
KEYCLOAK_URL="${KEYCLOAK_URL:-http://localhost:8082}"
ORG="${ORG:-test-org}"
SITE_NAME="${SITE_NAME:-local-dev-site}"

# Postgres access via kubectl exec (no local psql or port-forward needed)
PG_NAMESPACE="${PG_NAMESPACE:-postgres}"
PG_STATEFULSET="${PG_STATEFULSET:-statefulset/postgres}"
PG_USER="${PG_USER:-forge}"
PG_DB="${PG_DB:-forge}"

BASE_URL="$API_URL/v2/org/$ORG/carbide"

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

step() { echo; echo "──── $* ────"; } >&2
ok()   { echo " ✓ $*"; } >&2
info() { echo "   $*"; } >&2
die()  { echo; echo "ERROR: $*"; exit 1; } >&2

api_get() {
  local path="$1"
  local out http_code
  out=$(curl -s -w "\n%{http_code}" "$BASE_URL$path" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Accept: application/json")
  http_code=$(echo "$out" | tail -n1)
  body=$(echo "$out" | sed '$d')
  if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
    die "GET $path → HTTP $http_code: $body"
  fi
  echo "$body"
}

api_post() {
  local path="$1"
  local payload="$2"
  local out http_code body existing_id
  out=$(curl -s -w "\n%{http_code}" -X POST "$BASE_URL$path" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -d "$payload")
  http_code=$(echo "$out" | tail -n1)
  body=$(echo "$out" | sed '$d')
  if [[ "$http_code" == "409" ]]; then
    # Resource already exists; the conflict response carries the existing id
    # under .data.id — reuse it so the script is idempotent across runs.
    existing_id=$(echo "$body" | jq -r '.data.id // empty')
    if [[ -n "$existing_id" ]]; then
      info "Already exists (HTTP 409) — reusing id: $existing_id"
      echo "{\"id\":\"$existing_id\"}"
      return
    fi
    die "POST $path → HTTP 409 (conflict) but no id in response: $body"
  fi
  if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
    die "POST $path → HTTP $http_code: $body"
  fi
  echo "$body"
}

# Run a SQL statement inside the in-cluster postgres pod.
db_exec() {
  local sql="$1"
  kubectl exec -n "$PG_NAMESPACE" "$PG_STATEFULSET" -- \
    psql -U "$PG_USER" -d "$PG_DB" -tAc "$sql"
}

# ---------------------------------------------------------------------------
# 1. Authenticate
# ---------------------------------------------------------------------------

step "Authenticating via Keycloak"

TOKEN=$(curl -sf -X POST \
  "$KEYCLOAK_URL/realms/carbide-dev/protocol/openid-connect/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "client_id=carbide-api" \
  -d "client_secret=carbide-local-secret" \
  -d "grant_type=password" \
  -d "username=admin@example.com" \
  -d "password=adminpassword" \
  | jq -r '.access_token')

Copilot AI Mar 6, 2026


The header says defaults can be overridden via environment variables, but the Keycloak username/password/client_secret are hard-coded below. If different local dev credentials are needed, users have to edit the script. Consider adding env overrides (e.g., KEYCLOAK_USER, KEYCLOAK_PASSWORD, KEYCLOAK_CLIENT_SECRET) and listing them in the "Override defaults" comment block.
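A sketch of the suggested parameterization, reusing the `${VAR:-default}` pattern the script already applies to API_URL. The variable names are the reviewer's suggestion, not part of the PR; defaults mirror the currently hard-coded values:

```shell
#!/usr/bin/env bash
# Each credential keeps its current hard-coded value as the default
# but can now be overridden from the environment.
KEYCLOAK_USER="${KEYCLOAK_USER:-admin@example.com}"
KEYCLOAK_PASSWORD="${KEYCLOAK_PASSWORD:-adminpassword}"
KEYCLOAK_CLIENT_SECRET="${KEYCLOAK_CLIENT_SECRET:-carbide-local-secret}"

# The token request would then use the variables instead of literals:
#   -d "client_secret=$KEYCLOAK_CLIENT_SECRET" \
#   -d "username=$KEYCLOAK_USER" \
#   -d "password=$KEYCLOAK_PASSWORD" \
echo "authenticating as $KEYCLOAK_USER"
```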

…test script for validation

Signed-off-by: Patrice Breton <pbreton@nvidia.com>
@pbreton pbreton force-pushed the chore/local-carbide branch from d12e81a to 18e8f6c Compare March 9, 2026 18:54
@thossain-nv
Contributor

Should we update README to reflect how to use local core?


3 participants