Thank you for your interest in contributing to NVIDIA AICR! We welcome contributions from developers of all backgrounds and experience levels.
- Code of Conduct
- Getting Started
- How to Contribute
- Design Principles
- Pull Request Process
- Developer Certificate of Origin
- Tips for Contributors
This project follows NVIDIA's commitment to fostering an open and welcoming environment. Please be respectful and professional in all interactions. See CODE_OF_CONDUCT.md for details.
Before contributing:
- Read the README.md to understand the project
- Check existing issues to avoid duplicates
- Review the security policy for security-related contributions
- Set up your development environment following DEVELOPMENT.md
- If using coding assistants, review AGENTS.md for project rules and workflows
- Use the bug report template
- Describe the issue clearly with steps to reproduce
- Include system information (OS, Go version, Kubernetes version)
- Attach logs or screenshots if applicable
- Check if the issue already exists before creating a new one
- Use the feature request template
- Clearly describe the proposed feature and its use case
- Explain how it benefits the project and users
- Provide examples or mockups if applicable
- Fix typos, clarify instructions, or add examples
- Update README.md for user-facing changes
- Update API documentation when endpoints change
- Ensure code comments are accurate and helpful
- Fix bugs, add features, or improve performance
- Follow the development workflow in DEVELOPMENT.md
- Ensure all tests pass and code meets quality standards
- Write tests for new functionality
This project vendors Go dependencies. After changing go.mod or go.sum, run make tidy (which runs go mod vendor) and commit go.mod, go.sum, and the vendor/ directory. CI will fail if vendor/ is out of sync.
AICR uses a validator framework to check cluster state against requirements. To add new validation constraints:
Quick Start:
# Generate all necessary files
make generate-validator ARGS="--constraint Deployment.my-app.version --phase deployment --description 'Validates my-app version'"
This creates three files with TODOs guiding implementation:
- Helper functions with validation logic
- Unit tests with table-driven test cases
- Integration test with automatic registration
Next Steps:
- Implement the TODOs in generated files
- Add comprehensive test cases
- Run make test (registration validation ensures completeness)
- Submit a PR (CI enforces all requirements)
See docs/contributor/validator.md for the complete guide, including examples, an architecture overview, and troubleshooting.
These principles guide all design decisions in AICR. When faced with trade-offs, these principles take precedence.
The same tools, same versions, and same validation run locally and in CI.
What: Tool versions are centralized in .settings.yaml. Both make tools-setup (local) and GitHub Actions use this single source of truth. make qualify runs the exact same checks as CI.
Why: "Works on my machine" is not acceptable. If a contributor can run make qualify locally and it passes, CI will pass. This eliminates surprise failures and reduces feedback loops.
The system integrates into how users already work. We provide validated configuration, not a new operational model.
What: AICR outputs standard formats (Helm values, Kubernetes manifests) that work with existing tools (kubectl, ArgoCD, Flux). Users don't need to learn "the AICR way" of deploying.
Why: If adoption requires retraining users on a new workflow, our design has failed. Value comes from correctness, not from lock-in.
Given the same inputs, the same system version must always produce the same result (e.g. recipe, bundle artifacts).
What: No hidden state, no implicit defaults, no non-deterministic behavior. A recipe/bundle/image digest generated using the same version of aicr today must be identical to one generated tomorrow.
Why: Reproducibility is a prerequisite for debugging, validation, and trust. If users can't reproduce a result, they can't trust it.
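As an illustration of this principle, the sketch below shows one common source of non-determinism in Go (randomized map iteration order) and how sorting keys before rendering removes it. The names renderValues and digest and the sample values are hypothetical and are not taken from the AICR codebase.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// renderValues is a hypothetical helper, not part of AICR. It serializes a
// map in sorted key order so the output never depends on Go's randomized
// map iteration order.
func renderValues(values map[string]string) string {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k)
		b.WriteString("=")
		b.WriteString(values[k])
		b.WriteString("\n")
	}
	return b.String()
}

// digest returns the hex-encoded SHA-256 of the rendered output.
func digest(rendered string) string {
	sum := sha256.Sum256([]byte(rendered))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Sample values are made up for the example.
	values := map[string]string{"mode": "training", "arch": "amd64"}
	first := digest(renderValues(values))
	second := digest(renderValues(values))
	fmt.Println(first == second) // prints "true"
}
```

Because the rendering step fixes the key order, two runs over the same input produce byte-identical output and therefore the same digest.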
Validated configuration exists independent of how it is rendered, packaged, or deployed.
What: Recipes define what is correct. Bundlers and deployers determine how to deliver it (Helm, ArgoCD, raw manifests). The recipe doesn't change based on the deployment mechanism.
Why: This prevents tight coupling of correctness to a specific tool, workflow, or delivery mechanism. Users can adopt new deployment tools without re-validating their configurations.
More specific recipes are never matched unless explicitly requested. Generic intent cannot silently resolve to specialized configurations.
What: If a user requests a "training" recipe, they get the training configuration. The system never silently upgrades to a more specific variant (e.g., "training-distributed-horovod") without explicit opt-in.
Why: This prevents accidental misconfiguration and preserves user control. Surprises in infrastructure configuration are dangerous.
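A minimal sketch of what exact-match resolution could look like, assuming recipes are looked up by name. The Recipe type and resolve function are hypothetical and only illustrate the rule; the real matching logic in AICR may be structured differently.

```go
package main

import "fmt"

// Recipe is a hypothetical stand-in for a validated configuration.
type Recipe struct {
	Name string
}

// resolve returns the recipe whose name equals the request exactly.
// It deliberately does no prefix or "best match" lookup, so a generic
// request cannot resolve to a more specific variant.
func resolve(catalog map[string]Recipe, requested string) (Recipe, bool) {
	r, ok := catalog[requested]
	return r, ok
}

func main() {
	catalog := map[string]Recipe{
		"training":                     {Name: "training"},
		"training-distributed-horovod": {Name: "training-distributed-horovod"},
	}

	r, ok := resolve(catalog, "training")
	fmt.Println(r.Name, ok) // prints "training true"

	// A partial name does not fall through to the specialized recipe.
	_, ok = resolve(catalog, "training-distributed")
	fmt.Println(ok) // prints "false"
}
```

The specialized recipe is returned only when its full name is requested, which is the explicit opt-in described above.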
Trust is established through evidence, not assertions. Every released artifact carries verifiable proof of origin and build process.
What: All releases include SLSA Build Level 3 provenance, SBOM attestations, and Sigstore signatures. Users can verify exactly which commit, workflow, and build produced any artifact.
Why: This underpins supply-chain security, compliance, and confidence. "Trust us" is not a security model.
- Ensure all checks pass:
make qualify
- Update documentation if needed:
  - README.md for user-facing changes
  - DEVELOPMENT.md for developer workflow changes
  - Code comments and godoc for API changes
- Commit with required provenance:
# External contributors (DCO sign-off required)
git commit -s -m "feat: add network collector

- Implement NetworkCollector interface
- Add unit tests with 80% coverage
- Update factory registration

Fixes #123"

# NVIDIA org members / automation (DCO sign-off exempt)
git commit -S -m "feat: add network collector"
External contributors must use -s. NVIDIA organization members are exempt from DCO bot sign-off checks and should use cryptographic signing (-S).
- Push your branch and open a PR against main
- Fill out the PR template completely:
  - Summary: Brief description of changes
  - Type of Change: Bug fix, feature, breaking change, etc.
  - Testing: What testing was performed
  - Checklist: Verify all items
- Automated Checks run via GitHub Actions:
  - Go tests with race detector
  - golangci-lint
  - YAML linting
  - Security scans (Trivy in CI, Grype in make scan)
  - Coverage tracking
  - E2E tests
- Maintainer Review covers:
  - Correctness and functionality
  - Code style and Go idioms
  - Test coverage and quality
  - Documentation completeness
- Address Feedback by pushing new commits:
git commit -s -m "address review: improve error handling"  # external contributors
# or
git commit -S -m "address review: improve error handling"  # NVIDIA org members / automation
git push origin your-branch
- Merge: Once the PR is approved and CI passes, a maintainer will merge it.
Automated bots manage the lifecycle of issues and pull requests:
| Day | Action |
|---|---|
| 0 | Issue/PR opened, needs-triage label added to issues |
| 14 | Inactive PRs receive a reminder comment |
| 30 | Inactive PRs marked lifecycle/stale |
| 44 | Stale PRs auto-closed |
| 60 | Inactive issues marked lifecycle/stale |
| 74 | Stale issues auto-closed |
| 90+ | Closed issues/PRs locked |
To prevent auto-close: Add the lifecycle/frozen label. PRs with do-not-merge are also exempt.
# Update your local repository
git checkout main
git pull upstream main
# Delete your feature branch
git branch -d your-branch
git push origin --delete your-branch
Contributions must satisfy the Developer Certificate of Origin (DCO) policy. External contributors (non-NVIDIA organization members) must include a DCO sign-off on each commit. NVIDIA organization members are exempt from DCO bot sign-off checks and should use cryptographic signing (-S).
Add the -s flag to your commit:
git commit -s -m "Your commit message"
This adds a "Signed-off-by" line:
Signed-off-by: Jane Developer <jane@example.com>
The sign-off uses the name and email from your Git configuration:
git config user.name "Your Name"
git config user.email "your.email@example.com"
If you forget to sign off:
git commit --amend --signoff
git push --force-with-lease origin your-branch
NVIDIA organization members are exempt from DCO bot sign-off checks (.github/dco.yml). Use cryptographic commit signing:
git commit -S -m "Your commit message"
By signing off, you certify the Developer Certificate of Origin 1.1:
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
Recommended starting points:
- Start with issues labeled good first issue
- Read existing code in the package you're modifying before writing new code
- Run make tools-check to verify your environment
- Study the Design Principles section
Good first contributions:
- Documentation improvements (typos, clarifications)
- Adding test cases to existing tests
- Improving error messages with better context
Short summary (50 chars or less)
More detailed explanation if needed. Wrap at 72 characters.
Explain the problem being solved and why this approach was chosen.
- Bullet points are fine
- Use present tense ("Add feature" not "Added feature")
- Reference issues: "Fixes #123" or "Related to #456"
Signed-off-by: Your Name <your@email.com>
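The conventions in the template above can be checked mechanically. The following is a rough sketch, assuming only the limits stated here (50-character summary, 72-character body lines, optional Signed-off-by trailer). checkCommitMessage is a hypothetical helper and is not part of the project's tooling or CI.

```go
package main

import (
	"fmt"
	"strings"
)

// checkCommitMessage is a hypothetical helper, not project tooling. It
// reports which of the commit message conventions a message violates.
// requireSignoff should be true for contributors who need a DCO sign-off.
func checkCommitMessage(msg string, requireSignoff bool) []string {
	var problems []string
	lines := strings.Split(strings.TrimRight(msg, "\n"), "\n")

	// The first line is the short summary.
	if len([]rune(lines[0])) > 50 {
		problems = append(problems, "summary longer than 50 characters")
	}

	signedOff := false
	for _, line := range lines[1:] {
		if strings.HasPrefix(line, "Signed-off-by: ") {
			signedOff = true
			continue // trailers are not subject to the wrap limit here
		}
		if len([]rune(line)) > 72 {
			problems = append(problems, "body line longer than 72 characters")
			break // report the wrap problem once
		}
	}

	if requireSignoff && !signedOff {
		problems = append(problems, "missing Signed-off-by trailer")
	}
	return problems
}

func main() {
	msg := "Add network collector\n\nFixes #123\n\nSigned-off-by: Jane Developer <jane@example.com>\n"
	fmt.Println(len(checkCommitMessage(msg, true))) // prints "0"
}
```

Passing requireSignoff as false models the exemption for NVIDIA organization members, who sign commits cryptographically instead.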
- Follow existing patterns in the codebase
- Use pkg/errors for error handling (not fmt.Errorf)
- Always check ctx.Done() in loops and long operations
- Write table-driven tests for multiple test cases
- Use functional options for configuration
- GitHub Issues: Create an issue with the "question" label
- Existing Issues: Search for similar questions first
- Recent PRs: Look at merged PRs for examples
- DEVELOPMENT.md - Development setup, architecture, and tooling
- README.md - Project overview and quick start
- docs/README.md - System overview and glossary
- docs/contributor/README.md - Architecture documentation
Thank you for contributing to NVIDIA AICR! Your efforts help improve GPU-accelerated infrastructure for everyone.