Release v1.7.0

github-actions released this 26 May 18:46
v1.7.0
c28bf0c

This release fixes a fault-remediation bug where every historical cancellation replayed on every restart (eventually causing OOM kills), adds a Helm gate to disable the external-MongoDB setup job for tenants who provision the database themselves, brings the docs site onto NVIDIA's shared Fern global theme, and ships a large set of GitHub repository automation workflows.

Major New Features

External MongoDB Setup Job Gate (#1311)

The post-install/post-upgrade hook job that provisions collections, indexes, and x509 users on external MongoDB can now be disabled independently of the external-MongoDB configuration. Set global.datastore.setupJob.enabled: false to opt out. This is useful for deployments where the datastore is provisioned out-of-band and the setup job's auth requirements don't match the tenant identity. The setting defaults to true, so existing deployments are unaffected.
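As a minimal sketch of opting out, the gate can be expressed as a values override file. The file name external-mongodb-values.yaml is an assumption made for illustration; only the global.datastore.setupJob.enabled key comes from this release.

```shell
# Write a values override that disables the external-MongoDB setup job.
# The file name is hypothetical; the key path is the one named above.
cat > external-mongodb-values.yaml <<'EOF'
global:
  datastore:
    setupJob:
      enabled: false
EOF

# Show the override that would be passed to Helm.
cat external-mongodb-values.yaml
```

The file would then be passed to the install or upgrade command with Helm's -f flag, for example by appending -f external-mongodb-values.yaml to the helm upgrade command in the Helm Chart section.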

Repository Automation Workflows (#1306)

Adds a suite of GitHub Actions workflows and issue templates for repository hygiene:

  • Merge conflict check — runs on PR creation and main push; adds a needs-rebase label when a PR diverges from main.
  • Dependabot auto-merge — auto-merges Dependabot PRs that contain only semver-patch updates.
  • Issue triage — applies needs-triage and area/* labels to new issues.
  • Labeler — applies area/* labels to PRs based on the paths touched.
  • Welcome — posts a templated message on first-time contributors' issues and PRs.
  • Inactive PR reminder — comments on PRs that have been inactive for 14–30 days.
  • Issue SLAs — labels and comments on issues that have breached priority-tiered SLAs.
  • Lock threads — locks closed issues and PRs after 90 days.

New issue templates for documentation requests and updates are added; the Question template is removed in favor of Discussions; the Bug/Feature templates now require a contributor agreement checkbox and add a component selector.

Bug Fixes & Reliability

  • Fault-Remediation Cancellation Completion Marker (#1335): Fixed a bug where handleCancellationEvent cleared Kubernetes annotations and advanced the change-stream resume token but never wrote faultRemediated back to MongoDB, while the cold-start cancellation query had no faultremediated == nil filter. Together, these meant that every historical cancellation was replayed on every fault-remediation restart, so the number of replayed cancellations grew monotonically and eventually caused OOM kills. The fix:

    • handleCancellationEvent now calls updateNodeRemediatedStatus(true) after clearing annotations, writing the same completion marker the remediation path already writes.
    • The cold-start cancellation query leg now requires faultremediated == nil, so already-processed cancellations are excluded.
    • The call returns an error (rather than just logging) if the marker write fails, preventing the resume token from advancing without a durable terminal state.
  • Slinky Drainer Annotation Prefix (#1318): Corrected the node annotation prefix used by the Slinky Drainer plugin from [J] [NVSentinel] to [T] [NVSentinel] so automated breakfix is detected with the expected T prefix. Demo documentation updated to match.
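The effect of the faultremediated == nil filter in the cancellation fix can be illustrated with a toy model. This is only a sketch over made-up text records, not the project's MongoDB query: the record layout, the cancellation IDs, and the use of "-" to stand in for a missing marker are all assumptions.

```shell
# Toy records, one per line: "<cancellation-id> <faultremediated>".
# "-" stands in for a missing (nil) marker. IDs and layout are hypothetical.
events='cancel-1 true
cancel-2 true
cancel-3 -'

# Unfiltered query: every historical record is returned, marker or not.
unfiltered=$(printf '%s\n' "$events" | awk '{ print $1 }')

# Filtered query: only records without a completion marker are returned.
filtered=$(printf '%s\n' "$events" | awk '$2 == "-" { print $1 }')

echo "unfiltered: $(echo $unfiltered)"
echo "filtered:   $filtered"
```

With the marker written on completion and the filter applied, only cancel-3 is picked up again on a cold start, while the unfiltered query returns all three records every time.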

Docs Site

  • NVIDIA Global Theme (#1320, #1321): Migrated the Fern docs site from per-repo theme assets to the shared global-theme: nvidia, deleting ~1,126 lines of custom theme code (footer/badge components, NVIDIA SVGs, main.css, and the footer/layout/colors/theme/logo/favicon/js/css blocks in docs.yml). Added multi-source: true to the Fern instance config so the global theme's JS bundle (OneTrust cookie consent SDK) loads alongside the CSS portion. Fern CLI was bumped to 5.30.2 (required for global-theme support).

  • Frozen-Only Versioning (#1319, #1315): All versions in the docs dropdown now serve frozen content from their git tag; the "live docs" entry served from main has been removed. The newest version is stamped "Latest · vX.Y.Z" transiently at publish time. This eliminates duplicate dropdown entries, off-by-one pruning, and the dependency on the GitHub releases API for stamping. Version entries are sorted by semver descending (sort -rV) after insertion, so backport patches like v1.5.1 don't end up above newer releases. Registration is now skipped when the publishing tag equals the latest release, because the "Latest" stamp already covers it.

  • CI Runner Migration (#1324): Standardized CI runners onto a dedicated linux-amd64-cpu4 flavor to unblock Dependabot PR merging.
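The semver-descending sort mentioned under Frozen-Only Versioning can be tried in isolation. The sketch below assumes GNU sort (for the -V version-sort flag) and uses an illustrative tag list rather than the project's real tags; it shows a late-inserted backport tag landing below newer releases.

```shell
# Tags in insertion order: the v1.5.1 backport was added last.
# The tag list is illustrative only.
sorted=$(printf '%s\n' v1.5.0 v1.6.0 v1.7.0 v1.5.1 | sort -rV)

# Print the dropdown order, newest first.
echo "$sorted"
```

After sorting, v1.5.1 sits between v1.6.0 and v1.5.0 instead of at the position where it was inserted.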

Acknowledgments

This release includes contributions from:

Thanks also to @rohansav for diagnosing and authoring the cancellation completion marker fix that was cherry-picked into #1335.

Container Images

See versions.txt for the full list of container images and versions.

Helm Chart

Install with:

helm install nvsentinel oci://ghcr.io/nvidia/nvsentinel \
  --version v1.7.0 \
  --namespace nvsentinel \
  --create-namespace

To upgrade from v1.6.0:

helm upgrade nvsentinel oci://ghcr.io/nvidia/nvsentinel \
  --version v1.7.0 \
  --namespace nvsentinel \
  --reuse-values