@gmorpheme
Member

Summary

Fixes intermittent CI/CD release build failures caused by a panic during garbage collection with the error "index out of bounds: the len is 2 but the index is 2".

Root Cause

Objects allocated with alloc_bytes() were storing the padded allocation size (including 16-byte alignment) in their headers, but the garbage collector was using this inflated size for marking memory regions. This caused attempts to mark memory beyond block boundaries during GC, leading to index out of bounds errors in the bitmaps crate.

Example failure case:

  • Object at offset 30608 with reported size 2176 bytes
  • Total: 30608 + 2176 = 32784 > 32768 (block size) ❌
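
The arithmetic in this failure case can be checked with a small sketch. The block size of 32768 bytes comes from the example above; the names BLOCK_SIZE, fits_in_block and overrun are illustrative assumptions and are not identifiers from heap.rs.

```rust
/// Block size in bytes, taken from the failure case above.
const BLOCK_SIZE: usize = 32_768;

/// Returns true if an object of `size` bytes starting at `offset`
/// lies entirely inside a single block.
/// Hypothetical helper, for illustration only.
fn fits_in_block(offset: usize, size: usize) -> bool {
    match offset.checked_add(size) {
        Some(end) => end <= BLOCK_SIZE,
        None => false, // arithmetic overflow can never fit
    }
}

/// Number of bytes by which the object runs past the end of the block
/// (0 if it fits).
fn overrun(offset: usize, size: usize) -> usize {
    offset.saturating_add(size).saturating_sub(BLOCK_SIZE)
}
```

With the values above, fits_in_block(30608, 2176) is false and overrun(30608, 2176) is 16, i.e. the reported size extends 16 bytes past the block boundary.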

Technical Details

The bug was in heap.rs line 1720:

// BEFORE (incorrect):
let header = AllocHeader::new_with_mark_state(alloc_size as u32, self.mark_state());

// AFTER (fixed):
let header = AllocHeader::new_with_mark_state(size_bytes as u32, self.mark_state());

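alloc_size includes the padding that alloc_size_of() adds for 16-byte alignment, whereas size_bytes is the actual requested object size. Only size_bytes belongs in the header, because the garbage collector uses the header size to decide how much memory to mark.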
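
A minimal sketch of how a padded size can differ from the requested size, assuming alignment is done by rounding up to the next multiple of 16. The function padded_size is a stand-in for illustration; it is not the real alloc_size_of(), whose exact behavior is not shown in this PR.

```rust
/// Alignment in bytes, as described in this PR.
const ALIGNMENT: usize = 16;

/// Rounds `size_bytes` up to the next multiple of ALIGNMENT.
/// Hypothetical stand-in for the padding step; assumes ALIGNMENT is a
/// power of two so the bit mask trick is valid.
fn padded_size(size_bytes: usize) -> usize {
    (size_bytes + ALIGNMENT - 1) & !(ALIGNMENT - 1)
}

/// Bytes of padding that would be wrongly counted as object data if the
/// padded size were stored in the header instead of the requested size.
fn padding(size_bytes: usize) -> usize {
    padded_size(size_bytes) - size_bytes
}
```

Under this assumption a request of 2161 bytes is padded to 2176, so a header holding the padded value would overstate the object by 15 bytes, while a request that is already a multiple of 16 is unchanged.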

Why Only in Release Builds?

  • Debug assertions in mark() would catch bounds violations in debug builds
  • Release builds strip debug assertions, allowing invalid indices to reach the bitmaps crate
  • Different optimization characteristics made the failure more likely to surface in release builds
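
The debug/release difference can be illustrated with a sketch built on the standard debug_assert! macro, which is only checked when debug assertions are enabled (the default for debug builds, off by default for release builds). The function mark_bit and the use of a bool slice as a bitmap are assumptions for illustration; they are not the PR's mark() or the bitmaps crate API.

```rust
/// Sets the mark bit at `index`.
/// Hypothetical model of a mark operation over a tiny bitmap.
fn mark_bit(bits: &mut [bool], index: usize) {
    // Checked only when debug assertions are enabled: fails early with a
    // message that names the GC invariant that was violated.
    debug_assert!(
        index < bits.len(),
        "mark index {} out of bounds for bitmap of length {}",
        index,
        bits.len()
    );
    // With debug assertions disabled, an invalid index reaches this line
    // and the slice bounds check panics with the generic message
    // "index out of bounds: the len is N but the index is M".
    bits[index] = true;
}
```

Either way an out-of-range index panics; the difference is where the panic is raised and how much context the message carries.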

Changes Made

  • Fix core bug: Store actual requested size in headers, not padded allocation size
  • Add debug assertions: Better bounds checking with clear error messages in mark_region()
  • Remove debug output: Clean up "DEBUG MASTER:" prints that appeared in release builds
  • Add regression test: Comprehensive unit test test_alloc_bytes_header_size_correctness()
  • Improve documentation: Clarify header field purpose

Test Plan

  • Local testing shows harness tests now complete without panics
  • Unit test verifies headers store correct sizes
  • Debug assertions provide clear diagnostics if issues recur
  • All existing tests continue to pass

🤖 Generated with Claude Code

Root cause: Objects allocated with alloc_bytes() were storing the padded
allocation size (including alignment) in their headers, but GC was using
this inflated size for marking, causing attempts to mark memory beyond
block boundaries.

Changes:
- Fix alloc_bytes() to store actual requested size, not padded alloc_size
- Add debug assertions in mark_region() with clear error messages
- Remove debug output from release builds in STG compiler
- Add comprehensive unit test to prevent regression
- Update header field documentation for clarity

The fix resolves intermittent panics during garbage collection that only
occurred in release builds due to optimization differences.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@gmorpheme gmorpheme merged commit 6428cf3 into master Jul 5, 2025
16 checks passed
@gmorpheme gmorpheme deleted the fix/cicd-release-build-failures branch July 5, 2025 20:53