docs: end-to-end installation guide and prerequisite gap fixes #512

Open
vigviswa wants to merge 3 commits into NVIDIA:main from vigviswa:docs/end-to-end-installation-guide

@vigviswa

Description

Add an end-to-end installation guide and fix documentation gaps that are blocking external customers from deploying Carbide.

Changes:

  • New: book/src/manuals/installation-guide.md -- 10-step deployment guide following the validated sequence used by SA/engineering teams, linking to existing docs and filling gaps (REST component deploy order, Vault setup, troubleshooting)
  • Updated: helm/PREREQUISITES.md -- Added Vault PKI configuration (forgeca mount, forge-cluster role, K8s auth, policies), AppRole/token generation steps, PostgreSQL user guidance, and Temporal section
  • Updated: book/src/manuals/building_bmm_containers.md -- Added container image summary table, registry tagging/push instructions, and REST container build section
  • Updated: book/src/manuals/site-setup.md -- Replaced nvcr.io/nvidian references with build-from-source instructions
  • Updated: book/src/SUMMARY.md and README.md -- Added links to the new guide
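
For readers unfamiliar with the Vault items listed for helm/PREREQUISITES.md, the general shape of such a PKI setup can be sketched with the `vault` CLI. This is an illustration under assumptions, not the commands added by the PR: only the `forgeca` mount path and the `forge-cluster` role name come from the description above, while the TTLs, the common name, and the `allow_any_name` setting are placeholders. The script prints each command by default and executes it only when `RUN=1` is set.

```sh
# Dry-run helper: print the command unless RUN=1 is exported.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

vault_pki_setup() {
  # Mount a PKI secrets engine at the path named in the PR description.
  run vault secrets enable -path=forgeca pki
  # Raise the mount's maximum lease TTL (placeholder value).
  run vault secrets tune -max-lease-ttl=8760h forgeca
  # Generate an internal root CA (placeholder common name and TTL).
  run vault write forgeca/root/generate/internal common_name=example.internal ttl=8760h
  # Create the issuing role named in the PR description (placeholder parameters).
  run vault write forgeca/roles/forge-cluster allow_any_name=true max_ttl=720h
  # Enable Kubernetes auth so in-cluster workloads can log in.
  run vault auth enable kubernetes
}

vault_pki_setup
```

Policies and the Kubernetes auth configuration depend on the target cluster, so they are left out here; the PR's PREREQUISITES.md changes are the place to look for the actual values.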

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • [x] Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Fixes #476

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • [x] No testing required (docs, internal refactor, etc.)

Additional Notes

Overlaps with #479 on site-setup.md. This PR goes further by adding build-from-source links alongside the registry placeholder format. The changes are based on real customer questions from partner deployments (Vault AppRole/token generation, PostgreSQL user setup, Temporal requirement, nvcr.io access).

vigviswa and others added 3 commits March 10, 2026 15:50
- Add book/src/manuals/installation-guide.md: 10-step deployment guide
  stitching together existing docs and filling gaps (Vault commands,
  Temporal setup, admin-cli build, Elektra OTP bootstrap, verification)
- Update building_bmm_containers.md: add image summary table, tagging/
  pushing section (auth before tag/push), REST image build steps, fix
  typo "perfrom" and stray backtick in tar command
- Update site-setup.md: replace nvcr.io/nvidian internal image refs with
  <YOUR_REGISTRY> placeholders and build-from-source links (fixes NVIDIA#476)
- Update helm/PREREQUISITES.md: add Vault PKI engine/role/auth/policy
  commands, explicit carbide DB/user requirements, pg extensions, and
  new Temporal section (optional for core, required for REST)
- Update book/src/SUMMARY.md: add installation guide entry, fix broken
  faqs.md link (file is faq.md)
- Update README.md: add installation guide link in Getting Started

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add step-by-step instructions for obtaining VAULT_ROLE_ID,
VAULT_SECRET_ID, and VAULT_TOKEN from Vault. These values were
previously listed as required but with no explanation of how to
generate them -- customers were blocked at this step.

Made-with: Cursor
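
As a rough illustration of what obtaining these three values typically involves, the sketch below uses standard `vault` CLI calls for the AppRole auth method. It is an assumption about the general flow, not the steps written in the PR: the role name `carbide-approle` and policy name `carbide-policy` are hypothetical, and the policy is assumed to exist already. Commands are printed by default and executed only when `RUN=1` is set.

```sh
# Dry-run helper: print the command unless RUN=1 is exported.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical names; substitute the ones used in your deployment.
APPROLE_NAME="${APPROLE_NAME:-carbide-approle}"
POLICY_NAME="${POLICY_NAME:-carbide-policy}"

vault_credentials() {
  # Enable the AppRole auth method and create a role bound to the policy.
  run vault auth enable approle
  run vault write "auth/approle/role/$APPROLE_NAME" "token_policies=$POLICY_NAME"
  # Prints the value to use as VAULT_ROLE_ID.
  run vault read -field=role_id "auth/approle/role/$APPROLE_NAME/role-id"
  # Generates and prints a new value to use as VAULT_SECRET_ID.
  run vault write -f -field=secret_id "auth/approle/role/$APPROLE_NAME/secret-id"
  # Creates and prints a token to use as VAULT_TOKEN.
  run vault token create -field=token "-policy=$POLICY_NAME"
}

vault_credentials
```

With `RUN=1`, the last three commands each print a single secret value, so their output should be captured directly into a secret store rather than a shell history or log.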
@vigviswa vigviswa requested a review from a team as a code owner March 10, 2026 20:58
@copy-pr-bot

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


**NOTE**: The `CONTAINER_RUNTIME_AARCH64=alpine:latest` build argument must be included. The aarch64 binaries are bundled into an x86 container.

## Tagging and Pushing to a Private Registry

Let's move the tagging and pushing of containers to a registry into its own document, independent of container building.
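
For context, the flow that such a standalone document would cover (authenticate first, then tag, then push, as the commit message notes) might look like the sketch below. The image name and the default registry and tag values are hypothetical placeholders; `REGISTRY` and `TAG` mirror the variables used in the PR's build command. Commands are printed by default and executed only when `RUN=1` is set.

```sh
# Dry-run helper: print the command unless RUN=1 is exported.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Placeholder defaults; point these at your own registry and release tag.
REGISTRY="${REGISTRY:-registry.example.com/bmm}"
TAG="${TAG:-v0.0.0-local}"

tag_and_push() {
  # $1: a locally built image reference that includes a tag, e.g. "example-image:latest".
  local_image=$1
  name=${local_image%:*}              # strip the local tag
  remote="$REGISTRY/$name:$TAG"
  # Authenticate against the registry host before tagging or pushing.
  run docker login "${REGISTRY%%/*}"
  run docker tag "$local_image" "$remote"
  run docker push "$remote"
}

tag_and_push "example-image:latest"
```

Logging in once per session is enough in practice; the login call sits inside the function here only to keep the ordering explicit.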

Comment on lines +207 to +210
### Prerequisites

* Go 1.25.4 or later
* Docker 20.10+ with BuildKit enabled

Can this be moved up to the top of the doc so it appears with the other prerequisites? https://github.com/NVIDIA/bare-metal-manager-core/blob/main/book/src/manuals/building_bmm_containers.md?plain=1#L14
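
Wherever the prerequisites end up, a reader could verify them with a quick check along these lines. This is a minimal sketch, not part of the PR: the helper names are hypothetical, the version thresholds are the ones quoted in the diff above, and the comparison assumes a `sort` that supports `-V` (GNU coreutils and most recent alternatives).

```sh
# version_ge A B: succeeds when version A is greater than or equal to version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_prereqs() {
  if command -v go >/dev/null 2>&1; then
    # "go version" prints e.g. "go version go1.25.4 linux/amd64".
    go_ver=$(go version | awk '{print $3}' | sed 's/^go//')
    version_ge "$go_ver" 1.25.4 || echo "Go $go_ver is older than 1.25.4"
  else
    echo "go not found"
  fi
  if command -v docker >/dev/null 2>&1; then
    docker_ver=$(docker version --format '{{.Client.Version}}' 2>/dev/null)
    version_ge "$docker_ver" 20.10 || echo "Docker $docker_ver is older than 20.10"
  else
    echo "docker not found"
  fi
}

check_prereqs
```

The function prints nothing when both tools meet the thresholds. On Docker releases before 23.0, BuildKit is typically enabled by setting `DOCKER_BUILDKIT=1` in the environment; newer releases use it by default.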

Comment on lines +212 to +216
### Build all REST images

```sh
cd bare-metal-manager-rest
make docker-build IMAGE_REGISTRY=$REGISTRY IMAGE_TAG=$TAG
```

Comment on lines +201 to +205
## Building BMM REST Containers

The BMM REST components (cloud-api, cloud-workflow, site-manager, site-agent,
db migrations, cert-manager) are built from the
[bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest) repository.

Development

Successfully merging this pull request may close these issues.

[DOC]: Site Setup Guide refers to a number of BMM components from nvcr.io. How to get access to these locally