
slurm and dcgm docs #2564

Open
revmag wants to merge 5 commits into main from dcgm-slurm

@revmag
Collaborator

@revmag revmag commented Jan 21, 2026

  1. Added SLURM docs
  2. Added DCGM docs

@vercel

vercel bot commented Jan 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| signoz-web | Ready | Preview, Comment | Jan 22, 2026 5:27am |


@claude
Contributor

claude bot commented Jan 21, 2026

Claude finished @revmag's task in 3m 1s.


Review Complete: SLURM and DCGM Documentation

I've reviewed the PR that adds documentation for SLURM and NVIDIA DCGM metrics monitoring. Both documents provide solid foundations, but there are several areas that need attention to meet the repository's documentation standards.

Summary of Changes

  • New files: 2 new MDX documentation files added
    • nvidia-dcgm-metrics.mdx - NVIDIA GPU monitoring via DCGM Exporter
    • slurm-metrics.mdx - SLURM cluster monitoring via Prometheus exporter
  • Navigation: Both docs properly added to docsSideNav.ts
  • UI Integration: Links added to MetricsQuickStartOverview.tsx component
  • Technical accuracy: Verified against official sources

Key Issues to Address

Both Documents:

  1. Missing Overview sections explaining what DCGM/SLURM are
  2. Missing Next Steps sections to guide users after setup
  3. Pipeline configuration should clarify appending vs replacing
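
To make item 3 concrete, here is a minimal sketch of the difference between appending to a pipeline and replacing it. It assumes the docs target an OpenTelemetry Collector style configuration with a `service.pipelines` section; the helper `append_receiver` and the receiver name `prometheus/dcgm` are hypothetical and not taken from the PR.

```python
import copy


def append_receiver(config: dict, pipeline: str, receiver: str) -> dict:
    """Return a copy of `config` with `receiver` appended to the pipeline.

    Receivers already listed in the pipeline are preserved; the input
    dict is left unmodified.
    """
    merged = copy.deepcopy(config)
    pipelines = merged.setdefault("service", {}).setdefault("pipelines", {})
    receivers = pipelines.setdefault(pipeline, {}).setdefault("receivers", [])
    if receiver not in receivers:
        receivers.append(receiver)
    return merged


# Hypothetical existing collector config that already has a metrics pipeline.
existing = {
    "service": {
        "pipelines": {
            "metrics": {"receivers": ["otlp"], "exporters": ["otlp"]},
        }
    }
}

updated = append_receiver(existing, "metrics", "prometheus/dcgm")
print(updated["service"]["pipelines"]["metrics"]["receivers"])
# prints ['otlp', 'prometheus/dcgm']
```

Pasting a fresh `receivers` list over the existing one would silently drop `otlp` in this example, which is the failure mode the docs should warn about.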

SLURM Documentation:

  4. Port correction needed: Default port is 8080, not :8080 in placeholder
  5. Prerequisites: Should list specific CLI commands used (sinfo, squeue, sdiag, sacct, sshare)
  6. Installation steps missing: No instructions on how to install/deploy the exporter
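
As an illustration of item 5, a prerequisites check could look roughly like the sketch below. It only tests whether the commands named in the review are on `PATH`; the helper name `missing_commands` is hypothetical, and the sketch does not verify that the commands can actually reach a SLURM controller.

```python
import shutil

# CLI commands named in the review as exporter prerequisites.
SLURM_COMMANDS = ["sinfo", "squeue", "sdiag", "sacct", "sshare"]


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


missing = missing_commands(SLURM_COMMANDS)
if missing:
    print("Missing SLURM commands:", ", ".join(missing))
else:
    print("All SLURM commands found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is sufficient.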

DCGM Documentation:

  7. Prerequisites: Should mention driver/toolkit versions or link to NVIDIA docs
  8. Localhost caveat: Needs clarification for containerized environments
  9. Command context: Docker command needs explanation before showing it
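
For item 8, one way to show readers why `localhost` can fail is a small reachability probe run from wherever the collector runs. This is a sketch under assumptions: the helper `can_reach` is hypothetical, and port 9400 is used because it is the usual DCGM Exporter default, not because the PR confirms it.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed exporter port; adjust to match the actual deployment.
    print("localhost:9400 reachable:", can_reach("localhost", 9400))
```

Inside a container, `localhost` refers to the container itself, so this probe can return `False` even when the exporter is healthy on the host. In that case the scrape target needs the host's address or the exporter's service name instead.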

Docs PR Checklist Status

Based on CONTRIBUTING.md checklist (lines 562-581):

  • ✅ Frontmatter complete (date, id, title, description, doc_type)
  • ✅ SEO keywords present in title and description
  • ⚠️ Commands need more context/explanation (see inline comments)
  • ⚠️ Missing Overview sections (optional per guidelines, but recommended for clarity)
  • ⚠️ Missing Next Steps sections (required per line 573)
  • ✅ Both properly added to sidebar navigation
  • ✅ Dashboard integration included
  • ✅ Troubleshooting sections present
  • ✅ Images would use WebP format (dashboard previews via DashboardActions component)
  • ✅ External links properly formatted with target="_blank" and security attributes

Technical Verification

I verified the technical accuracy against official sources:

Label Added

✅ Added add-to-onboarding label since these are new documentation files for sending metrics to SigNoz

Recommendation

The inline comments provide specific suggestions with code blocks. Please address these to align with the repository's documentation standards before merging.


@github-actions
Contributor

📚 Onboarding issue created… see #2565

revmag and others added 3 commits January 21, 2026 12:45
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Removed redundant service configuration in YAML example.