Fix NVMe raw_instance_storage device enumeration for all instance families #196

Draft
neddp wants to merge 8 commits into master from fix-nvme-raw-instance-storage

Conversation

@neddp
Member

@neddp neddp commented Jan 14, 2026

The raw_instance_storage feature was causing VM boot failures and timeouts on NVMe-based instances (i3, i3en, i4i, c6id, m6id, r6id, etc.) when attempting to use AWS instance storage as raw ephemeral disks.

Root Cause

NVMe device enumeration order is non-deterministic on AWS Nitro instances. The kernel discovers NVMe devices based on PCIe enumeration order, which varies between boots and instance types. This means:

  • /dev/nvme0n1 might be the root EBS volume OR instance storage
  • /dev/nvme1n1 might be instance storage OR the root EBS volume
  • The order is not guaranteed and can change

The previous implementation made a critical incorrect assumption:

  • Assumed /dev/nvme0n1 and /dev/nvme1n1 were always instance storage on i3/i3en instances
  • Hardcoded device paths without runtime discovery
  • When the agent attempted to partition what it thought was instance storage, it could corrupt the root disk → boot failure

Additionally, the CPI only handled i3/i3en instance families correctly, causing issues with newer NVMe instance types (i4i, c6id, m6id, r6id, etc.).

Solution

Implemented agent-side runtime discovery using the EBS volume symlinks under /dev/disk/by-id:

How It Works

  1. CPI side (simplified):

    • Generates sequential device hints: /dev/nvme0n1, /dev/nvme1n1, etc.
    • Hints are informational only for NVMe instances
    • Sends the correct count of expected instance storage devices
  2. Agent side (new discovery logic):

    • Globs all NVMe devices: /dev/nvme*n1
    • Identifies EBS volumes via symlinks: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol*
    • Resolves symlinks to find which NVMe devices are EBS
    • Subtracts EBS volumes from all NVMe devices = instance storage
    • Partitions only the discovered instance storage devices
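
The agent-side steps above amount to a set difference: all NVMe devices minus those that an EBS symlink resolves to. The following is a minimal sketch of that idea in Ruby, chosen to match the CPI code in this PR; the actual agent change lives in cloudfoundry/bosh-agent#396 and is not reproduced here. The helper name `discover_instance_storage` and its `dev_root` parameter are illustrative assumptions, not part of either codebase.

```ruby
require 'set'

# Hypothetical helper (not the agent's real code): returns the NVMe devices
# that are NOT the target of any EBS by-id symlink, i.e. instance storage.
# dev_root is an assumed parameter so the logic can be tried on a fake tree.
def discover_instance_storage(dev_root: '/dev')
  # Step 1: glob all NVMe namespace devices, e.g. /dev/nvme0n1, /dev/nvme1n1.
  all_nvme = Dir.glob(File.join(dev_root, 'nvme*n1')).sort

  # Step 2: find the symlinks that identify EBS volumes.
  ebs_links = Dir.glob(
    File.join(dev_root, 'disk', 'by-id', 'nvme-Amazon_Elastic_Block_Store_vol*')
  )

  # Step 3: resolve each symlink to the device node it points at.
  ebs_devices = ebs_links.map { |link| File.realpath(link) }.to_set

  # Step 4: everything that is not EBS is treated as instance storage.
  all_nvme.reject { |dev| ebs_devices.include?(File.realpath(dev)) }
end
```

Because the result depends only on which devices the symlinks point at, it should not matter whether the root EBS volume happens to enumerate as nvme0n1 or nvme1n1 on a given boot.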

Potentially fixes #155

This must be merged together with the Agent changes - cloudfoundry/bosh-agent#396

neddp added 5 commits January 14, 2026 14:07
Update test to expect /dev/nvme2n1 (instance storage) instead of
/dev/nvme0n1 (root device). This aligns with the corrected NVMe
device enumeration logic:
- nvme0n1: root EBS
- nvme1n1: ephemeral EBS (when configured)
- nvme2n1+: instance storage devices
rkoster previously approved these changes Jan 22, 2026
@github-project-automation github-project-automation Bot moved this from Inbox to Pending Merge | Prioritized in Foundational Infrastructure Working Group Jan 22, 2026
@rkoster rkoster requested review from a team, KauzClay and anshrupani and removed request for a team January 22, 2026 15:57
@rkoster rkoster moved this from Pending Merge | Prioritized to Pending Review | Discussion in Foundational Infrastructure Working Group Jan 22, 2026
@neddp neddp marked this pull request as draft February 2, 2026 07:26
neddp added 2 commits February 2, 2026 14:36
Device paths for NVMe raw ephemeral disks now start at nvme0n1 instead
of nvme2n1, as the agent performs runtime discovery and the hints are
informational only.
@neddp
Member Author

neddp commented Feb 2, 2026

After discussing this at the community meeting, some people noted that hardcoding the NVMe numbering is dangerous, since EBS volumes are not guaranteed to appear under the same device names on every boot.
Since there is no way to check this in the CPI code, I have added this logic to the agent code. The PR description has been updated with the new implementation.

@neddp neddp marked this pull request as ready for review February 2, 2026 13:08
@fmoehler
Contributor

I did not entirely understand what the issue is, so maybe you can just ignore my comment.

I am just wondering if this might have been fixed already by cloudfoundry/bosh-linux-stemcell-builder#462?

At least from the issue description it looks like the exact same issue that I investigated some time ago.

@a-hassanin
Contributor

> I did not entirely understand what the issue is, so maybe you can just ignore my comment.
>
> I am just wondering if this might have been fixed already by cloudfoundry/bosh-linux-stemcell-builder#462?
>
> At least from the issue description it looks like the exact same issue that I investigated some time ago.

You might be right here. We have not had a bosh director release since you reverted PR cloudfoundry/bosh-agent#391. I guess that is still the root cause of this issue? @neddp @fmoehler

@neddp
Member Author

neddp commented Feb 16, 2026

Hi @fmoehler,

Thank you for pointing out PR cloudfoundry/bosh-linux-stemcell-builder#462!

That PR fixes EBS volume identification (volumes with AWS metadata). This PR addresses instance storage discovery (see #155 for reference), which cannot use the same approach because instance storage volumes have no AWS metadata. The two PRs are complementary.

@beyhan beyhan requested a review from rkoster March 5, 2026 15:55
@rkoster
Contributor

rkoster commented Mar 27, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Mar 27, 2026

📝 Walkthrough

Refactored NVMe instance storage device mapping to use virtualization-type-aware device name generation instead of hardcoded instance family checks. Introduced new raw_ephemeral_device_name method handling NVMe, paravirtual, and HVM instances separately; removed legacy sequential device chaining methods.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Block Device Manager Implementation / src/bosh_aws_cpi/lib/cloud/aws/block_device_manager.rb | Updated NVME_INSTANCE_FAMILIES constant; replaced explicit i3/i3en checks with requires_nvme_device() call in mappings(). Refactored raw ephemeral device generation: removed first_raw_ephemeral_device and next_raw_ephemeral_disk methods; replaced with new raw_ephemeral_device_name(index, requires_nvme) method that generates /dev/nvme#{index}n1, /dev/sd#{...}, or /dev/xvdb#{...} paths based on virtualization type. |
| Block Device Manager Tests / src/bosh_aws_cpi/spec/unit/block_device_manager_spec.rb | Replaced hardcoded NVMe instance type test case with parameterized loop over NVME_INSTANCE_FAMILIES. Updated raw_ephemeral device hint expectations from fixed values to dynamically computed based on disk mapping. Preserved EBS and ephemeral agent entry validations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through storage lanes,
Where NVMe devices dance in chains,
No more hardcoded families to bind,
Dynamic paths of every kind!
Ephemeral hints now flow so free,
For every instance type, you see! 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: fixing NVMe raw instance storage device enumeration to support all NVMe instance families, not just i3/i3en. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the root cause of NVMe device enumeration failures and the solution implemented via agent-side runtime discovery. |
| Linked Issues check | ✅ Passed | The PR addresses issue #155 by implementing support for NVMe instance storage discovery across multiple instance families (i3, i3en, i4i, c6id, m6id, r6id) and enabling agent-side runtime discovery to avoid hardcoded device paths. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: refactoring NVMe instance family detection, implementing sequential device hint generation, and updating tests to validate behavior across all NVMe instance families. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bosh_aws_cpi/lib/cloud/aws/block_device_manager.rb`:
- Around line 177-188: In raw_ephemeral_device_name, replace the magic numbers
99 and 97 used to compute disk letters with character literals (e.g., 'c'.ord
and 'a'.ord) so the intent is explicit; update the branches that build
"/dev/sd#{(99 + index).chr}" and "/dev/xvdb#{(97 + index).chr}" to compute the
base ordinal from 'c' and 'a' respectively using `@virtualization_type` and index,
leaving the nvme and error branches unchanged.
- Around line 6-9: Update the NVME_INSTANCE_FAMILIES constant in
block_device_manager.rb to include the missing Nitro-based families by adding
the following identifiers to the array: c7g c8a c8gb c8gn c8i c8id c8i-flex m7g
m8a m8azn m8gb m8gn m8i m8id m8i-flex r7g r8a r8gb r8gn r8i r8id r8i-flex i4g
i7i i7ie i8g i8ge g7e p6-b200 p6-b300 trn2 trn2u; alternatively, implement a
runtime NVMe detection fallback in the code paths that use
NVME_INSTANCE_FAMILIES (e.g., methods referencing NVME_INSTANCE_FAMILIES in
block_device_manager.rb) so unknown families on Nitro are detected by checking
/dev/nvme* presence instead of relying solely on the static list.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e16af91d-7bbd-4b84-b8e1-62d180f1d166

📥 Commits

Reviewing files that changed from the base of the PR and between d790df6 and 5d94f59.

📒 Files selected for processing (2)
  • src/bosh_aws_cpi/lib/cloud/aws/block_device_manager.rb
  • src/bosh_aws_cpi/spec/unit/block_device_manager_spec.rb

Comment on lines +6 to +9
# Instance families that use NVMe device naming (/dev/nvme*).
# This includes Nitro-based instances and some Xen-based instances with NVMe storage (e.g., i3 family).
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html#ec2-nitro-instances
-NVME_INSTANCE_FAMILIES = %w[a1 c5 c5a c5ad c5d c5n c6a c6g c6gd c6gn c6i c6id c6in c7i c7a d3 d3en g4dn g4ad g5 g6 g6e i3en i4i inf1 m5 m5a m5ad m5d m5dn m5n m5zn m6a m6g m6gd m6i m6id m6idn m6in m7i m7a m7i-flex p3dn p4d p5 r5 r5a r5ad r5b r5d r5dn r5n r6a r6g r6gd r6i r6in r6id r6idn r7i r7a r7iz t3 t3a t4g z1d x2iezn].freeze
+NVME_INSTANCE_FAMILIES = %w[a1 c5 c5a c5ad c5d c5n c6a c6g c6gd c6gn c6i c6id c6in c7i c7a d3 d3en g4dn g4ad g5 g6 g6e i3 i3en i4i inf1 m5 m5a m5ad m5d m5dn m5n m5zn m6a m6g m6gd m6i m6id m6idn m6in m7i m7a m7i-flex p3dn p4d p5 r5 r5a r5ad r5b r5d r5dn r5n r6a r6g r6gd r6i r6in r6id r6idn r7i r7a r7iz t3 t3a t4g z1d x2iezn].freeze

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

AWS EC2 Nitro instance types 2025

💡 Result:

AWS EC2 Nitro instance types encompass all current-generation instances built on the AWS Nitro System, which powers the majority of modern EC2 offerings as of 2026. Xen-based instances are legacy and not Nitro-based. The official AWS documentation lists Nitro-based instances across categories. Nitro versions range from v2 to v6, with newer 2025 announcements on v6 (e.g., M8gn, M8gb, C8a, R8i, I7i, I8ge).

Key lists from docs: Virtualized Nitro instances (Nitro v6 highlighted as latest):

  • General Purpose: M8a, M8azn, M8gb, M8gn, M8i, M8id, M8i-flex; earlier like M5*, M6*, M7*, T3/T3a/T4g
  • Compute Optimized: C8a, C8gb, C8gn, C8i, C8id, C8i-flex; earlier C5*, C6*, C7*
  • Memory Optimized: R8a, R8gb, R8gn, R8i, R8id, R8i-flex, X8aedz, X8i; earlier R5*, R6*, R7*, U*, X*
  • Storage Optimized: I8ge; earlier I3en, I4*, I7*
  • Accelerated: G7e, P6-B200, P6-B300; many others like G4*, G5*, G6*, Inf*, P4*, P5*, Trn*
  • HPC: Hpc8a; earlier Hpc6*, Hpc7*

Bare metal Nitro instances mirror virtualized but dedicated hardware, e.g., M8i.metal, C8i.metal, etc.

2025 Nitro highlights: M8gn/M8gb (Dec 2025, Graviton4, up to 600Gbps net), C8a (Dec 2025, AMD EPYC), R8i/R8i-flex (Sep 2025, Intel Xeon6), I7i (Apr 2025, storage opt), I8ge (Aug 2025).

Full current Nitro-based from UserGuide:

  • General: M5/M5a/M5ad/M5d/M5dn/M5n/M5zn, M6a/M6g/M6gd/M6i/M6id/M6idn/M6in, M7a/M7g/M7gd/M7i/M7i-flex/M8*, T3/T3a/T4g; similar for other categories up to *8 series
  • Storage: D3/D3en/I3en/I4g/I4i/I7i/I7ie/I8g/I8ge
  • Accelerated: DL1/DL2q/G4ad/G4dn/G5/G5g/G6/G6e/G6f/Gr6/Gr6f/G7e/Inf1/Inf2/P4d/P4de/P5/P5e/P5en/P6-B200/P6-B300/P6e-GB200/Trn1/Trn1n/Trn2/Trn2u/VT1
  • HPC: Hpc6a/Hpc6id/Hpc7a/Hpc7g/Hpc8a

All post-2017 launches are Nitro-based. For complete specs/sizes/regions, see AWS docs.


Update NVME_INSTANCE_FAMILIES to include current Nitro instance families.

The list is missing multiple instance families released in 2024-2025. According to AWS documentation, missing Nitro-based families include:

  • Compute: c7g, c8a, c8gb, c8gn, c8i, c8id, c8i-flex
  • General Purpose: m7g, m8a, m8azn, m8gb, m8gn, m8i, m8id, m8i-flex
  • Memory: r7g, r8a, r8gb, r8gn, r8i, r8id, r8i-flex
  • Storage: i4g, i7i, i7ie, i8g, i8ge
  • Accelerated: g7e, p6-b200, p6-b300
  • Training: trn2, trn2u

Without these families, the CPI will generate incorrect device paths on new instance types. Consider either adding these families to the list, or implementing a safer fallback mechanism for unknown instance families on Nitro (e.g., detect NVMe device naming at runtime).
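
To illustrate why a static list goes stale, here is a minimal sketch of a family lookup. The helper `nvme_family?` and the abbreviated `KNOWN_NVME_FAMILIES` subset are hypothetical; the CPI's actual `requires_nvme_device` implementation is not shown in this review and may differ.

```ruby
# Abbreviated, illustrative subset of the families named in this PR.
# It is NOT the real NVME_INSTANCE_FAMILIES constant.
KNOWN_NVME_FAMILIES = %w[c6id i3 i3en i4i m6id r6id].freeze

# Hypothetical helper: the family is assumed to be the part of the
# instance type before the first dot, e.g. "i4i" in "i4i.large".
def nvme_family?(instance_type, families = KNOWN_NVME_FAMILIES)
  family = instance_type.to_s.split('.').first
  families.include?(family)
end

nvme_family?('i3.xlarge')  # => true
nvme_family?('i7i.large')  # => false: a family absent from the list falls through
```

With a lookup of this shape, any family that is not yet listed is treated as non-NVMe, which is the failure mode the comment above describes.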

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bosh_aws_cpi/lib/cloud/aws/block_device_manager.rb` around lines 6 - 9,
Update the NVME_INSTANCE_FAMILIES constant in block_device_manager.rb to include
the missing Nitro-based families by adding the following identifiers to the
array: c7g c8a c8gb c8gn c8i c8id c8i-flex m7g m8a m8azn m8gb m8gn m8i m8id
m8i-flex r7g r8a r8gb r8gn r8i r8id r8i-flex i4g i7i i7ie i8g i8ge g7e p6-b200
p6-b300 trn2 trn2u; alternatively, implement a runtime NVMe detection fallback
in the code paths that use NVME_INSTANCE_FAMILIES (e.g., methods referencing
NVME_INSTANCE_FAMILIES in block_device_manager.rb) so unknown families on Nitro
are detected by checking /dev/nvme* presence instead of relying solely on the
static list.

Comment on lines +177 to 188
def raw_ephemeral_device_name(index, requires_nvme)
  if requires_nvme
    # Simple sequential hints - agent will discover actual devices via EBS symlink exclusion
    "/dev/nvme#{index}n1"
  elsif @virtualization_type == 'paravirtual'
    "/dev/sd#{(99 + index).chr}" # 99 is 'c'.ord - starts at sdc, sdd, sde...
  elsif @virtualization_type == 'hvm'
    "/dev/xvdb#{(97 + index).chr}" # 97 is 'a'.ord - starts at xvdba, xvdbb...
  else
    raise Bosh::Clouds::CloudError, "unknown virtualization type #{@virtualization_type}"
  end
end

🧹 Nitpick | 🔵 Trivial

Consider using character literals instead of magic numbers for clarity.

While the comments explain the values, using 'c'.ord and 'a'.ord directly would be more self-documenting and reduce the risk of comment drift.

♻️ Proposed refactor
 def raw_ephemeral_device_name(index, requires_nvme)
   if requires_nvme
     # Simple sequential hints - agent will discover actual devices via EBS symlink exclusion
     "/dev/nvme#{index}n1"
   elsif @virtualization_type == 'paravirtual'
-    "/dev/sd#{(99 + index).chr}" # 99 is 'c'.ord - starts at sdc, sdd, sde...
+    "/dev/sd#{('c'.ord + index).chr}" # starts at sdc, sdd, sde...
   elsif @virtualization_type == 'hvm'
-    "/dev/xvdb#{(97 + index).chr}" # 97 is 'a'.ord - starts at xvdba, xvdbb...
+    "/dev/xvdb#{('a'.ord + index).chr}" # starts at xvdba, xvdbb...
   else
     raise Bosh::Clouds::CloudError, "unknown virtualization type #{@virtualization_type}"
   end
 end
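
To make the resulting hints concrete, here is a standalone sketch of the same branching. It is an illustration only: the function name `raw_ephemeral_device_hint` is hypothetical, the virtualization type is passed as a parameter instead of being read from `@virtualization_type`, and `ArgumentError` stands in for `Bosh::Clouds::CloudError` so the snippet runs without the CPI loaded.

```ruby
# Standalone illustration of the hint generation; not the CPI method itself.
def raw_ephemeral_device_hint(index, requires_nvme, virtualization_type)
  if requires_nvme
    "/dev/nvme#{index}n1"
  elsif virtualization_type == 'paravirtual'
    "/dev/sd#{('c'.ord + index).chr}"    # index 0 maps to sdc
  elsif virtualization_type == 'hvm'
    "/dev/xvdb#{('a'.ord + index).chr}"  # index 0 maps to xvdba
  else
    # ArgumentError is a stand-in for Bosh::Clouds::CloudError.
    raise ArgumentError, "unknown virtualization type #{virtualization_type}"
  end
end

(0..2).map { |i| raw_ephemeral_device_hint(i, true, 'hvm') }
# => ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1"]
(0..2).map { |i| raw_ephemeral_device_hint(i, false, 'hvm') }
# => ["/dev/xvdba", "/dev/xvdbb", "/dev/xvdbc"]
(0..1).map { |i| raw_ephemeral_device_hint(i, false, 'paravirtual') }
# => ["/dev/sdc", "/dev/sdd"]
```

The `'c'.ord` and `'a'.ord` forms yield the same paths as the literal 99 and 97, so the refactor is behavior-preserving.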
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bosh_aws_cpi/lib/cloud/aws/block_device_manager.rb` around lines 177 -
188, In raw_ephemeral_device_name, replace the magic numbers 99 and 97 used to
compute disk letters with character literals (e.g., 'c'.ord and 'a'.ord) so the
intent is explicit; update the branches that build "/dev/sd#{(99 + index).chr}"
and "/dev/xvdb#{(97 + index).chr}" to compute the base ordinal from 'c' and 'a'
respectively using `@virtualization_type` and index, leaving the nvme and error
branches unchanged.

@github-project-automation github-project-automation Bot moved this from Pending Review | Discussion to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Mar 27, 2026
@rkoster rkoster marked this pull request as draft April 9, 2026 14:54

Labels

None yet

Projects

Status: Waiting for Changes | Open for Contribution

Development

Successfully merging this pull request may close these issues.

Direct attached storage disk behavior.

6 participants