feat: add shard liveness API and UI indicator#6166

Open
krancour wants to merge 1 commit into akuity:main from krancour:krancour/heartbeats-fix

@krancour
Member

Surface per-shard liveness in the API and UI, derived from heartbeat ConfigMaps in the Kargo namespace. A shard is considered alive when its heartbeat ConfigMap carries an observedAt timestamp within AGENT_STATUS_DEADLINE (default 10m), and dead otherwise.

  • Add REST endpoint GET /v1beta1/system/shards returning each known shard's alive/dead status and last-heartbeat timestamp.
  • Add api.defaultShardName chart value plus DEFAULT_SHARD_NAME on the API server, used to resolve Stages with no explicit spec.shard.
  • Render a small shard-status indicator (broadcast tower icon, green for alive, red for dead, hidden when no shard data is available) on each Stage card in the project DAG view.
  • Default the Tilt dev environment to a single shard named 'tilt-shard' so the indicator has something to render.

cc @rpelczar, I let Claude try the UI parts. I don't feel fit to evaluate front-end stuff. I can at least tell you it doesn't look spectacular. Would you mind having a look and amending the PR directly if you would like to change anything?

Screenshot 2026-04-25 at 12 48 19 AM

Surface per-shard liveness in the API and UI, derived from heartbeat
ConfigMaps in the Kargo namespace. A shard is considered alive when
its heartbeat ConfigMap carries an observedAt timestamp within
AGENT_STATUS_DEADLINE (default 10m), and dead otherwise.

- Add REST endpoint GET /v1beta1/system/shards returning each known
  shard's alive/dead status and last-heartbeat timestamp.
- Add api.defaultShardName chart value plus DEFAULT_SHARD_NAME on the
  API server, used to resolve Stages with no explicit spec.shard.
- Render a small shard-status indicator (broadcast tower icon, green
  for alive, red for dead, hidden when no shard data is available) on
  each Stage card in the project DAG view.
- Default the Tilt dev environment to a single shard named
  'tilt-shard' so the indicator has something to render.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
@krancour krancour added this to the v1.10.3 milestone Apr 25, 2026
@krancour krancour self-assigned this Apr 25, 2026
@krancour krancour added the kind/enhancement (An entirely new feature) label Apr 25, 2026
@krancour krancour requested a review from a team as a code owner April 25, 2026 04:49
@krancour krancour added the area/ui (Affects the UI), priority/high (Needs to be addressed sooner rather than later), and area/api-server (Affects Kargo's API server) labels Apr 25, 2026
@netlify

netlify Bot commented Apr 25, 2026

Deploy Preview for docs-kargo-io ready!

Name Link
🔨 Latest commit 4d4247e
🔍 Latest deploy log https://app.netlify.com/projects/docs-kargo-io/deploys/69ec47c7c4da6f00087ff271
😎 Deploy Preview https://deploy-preview-6166.docs.kargo.io

@codecov

codecov Bot commented Apr 25, 2026

Codecov Report

❌ Patch coverage is 76.74419% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.74%. Comparing base (c737ce5) to head (4d4247e).

Files with missing lines               Patch %   Lines
pkg/server/config/config.go            0.00%     5 Missing ⚠️
pkg/server/list_shards_v1alpha1.go     86.48%    3 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6166      +/-   ##
==========================================
+ Coverage   57.72%   57.74%   +0.02%     
==========================================
  Files         475      476       +1     
  Lines       40541    40584      +43     
==========================================
+ Hits        23402    23435      +33     
- Misses      15746    15754       +8     
- Partials     1393     1395       +2     

Contributor

@thomastaylor312 thomastaylor312 left a comment

Good to go from the backend perspective. Someone with more frontend experience needs to look at that side of things.

DefaultShardName string
// AgentStatusDeadline is the maximum age of a shard heartbeat before the
// shard is considered dead.
AgentStatusDeadline time.Duration
Contributor

Nit: Suggest renaming to ShardStatusDeadline
