Skip to content

Conversation

@robobun
Copy link

@robobun robobun commented Jan 17, 2026

Summary

  • Split into two separate jobs: bootstrap and prebuilt
    • Bootstrap job checks if image exists and skips rebuild if cached (completes in ~20s when cached)
    • Prebuilt job depends on bootstrap and builds remaining stages
  • Pre-install node_modules in base stage BEFORE cmake runs
    • Fixes "Cannot find module '@lezer/lr'" race condition
    • The codegen scripts (cppbind.ts) now have all dependencies available when they run
  • Add Windows Defender exclusions and disable real-time monitoring
    • Exclusions for C:\ProgramData\docker and D:\a
    • Exclusions for docker.exe and dockerd.exe processes
  • Increase bootstrap timeout to 120 minutes
  • Use environment variables for registry and image name

Test Results

✅ Workflow run: https://github.com/oven-sh/bun-development-docker-image/actions/runs/21090665080

  • build-bootstrap: 20s (cached)
  • build-prebuilt: 1h37m25s
  • Total: ~1h38m (down from >6h timeout failures)

Key Fixes

Race Condition Fix

The main issue was that CMake's parallel build was running cppbind.ts (step 15) before bun install completed (step 17), causing:

error: Cannot find module '@lezer/lr' from 'C:\workspace\bun\node_modules\.bun\@[email protected]\node_modules\@lezer\cpp\dist\index.js'

Fixed by adding an explicit bun install step in the Dockerfile's base stage, ensuring dependencies are installed before the build runs.

Performance Optimizations

  • Disabling Windows Defender real-time monitoring
  • Adding exclusion paths for Docker directories
  • Docker system prune before builds

🤖 Generated with Claude Code

- Split into two separate jobs: bootstrap and prebuilt
  - Bootstrap job checks if image exists and skips rebuild if cached
  - Prebuilt job depends on bootstrap and builds remaining stages
- Pre-install node_modules in base stage BEFORE cmake runs
  - Fixes "Cannot find module '@lezer/lr'" race condition
  - cppbind.ts codegen now has dependencies available
- Add Windows Defender exclusions and disable real-time monitoring
  - Exclusions for C:\ProgramData\docker and D:\a
  - Exclusions for docker.exe and dockerd.exe processes
- Increase bootstrap timeout to 120 minutes
- Use environment variables for registry and image name

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@coderabbitai
Copy link

coderabbitai bot commented Jan 17, 2026

Walkthrough

Restructures the Windows build workflow to split jobs with explicit dependencies, introduces hash and date-driven image caching, implements bootstrap image caching logic, and improves space management and resilience. Updates Dockerfile.windows to pre-install node_modules in the base stage before the build step.

Changes

Cohort / File(s) Summary
Workflow job orchestration and caching
.github/workflows/daily-build-windows.yml
Splits build into build-bootstrap and build-prebuilt jobs with explicit dependencies; adds job outputs for bootstrap_hash and date. Introduces REGISTRY and IMAGE_NAME environment variables. Replaces hard-coded repository references with IMAGE_NAME-based tags across all build stages (bootstrap, base, prebuilt, artifacts, run). Adds bootstrap image caching logic with conditional skip branches. Implements dynamic hash/date-based tagging strategy. Extends timeouts and retry policies for increased resilience. Adds space pruning, Defender monitoring toggles, and cache cleanup steps. Introduces new test section validating prebuilt image with dynamically-built test Dockerfile.
Dockerfile dependencies
Dockerfile.windows
Adds pre-installation step in base stage to run bun install with frozen lockfile for node_modules before cmake build, ensuring codegen dependencies are available. Updates build comment to reflect pre-installed node_modules.

Possibly related PRs

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: Windows Docker build improvements' directly relates to the main changes—improving Docker build reliability and performance on Windows by splitting jobs, fixing race conditions, and optimizing Defender settings.
Description check ✅ Passed The description thoroughly explains the changeset: job splitting, node_modules pre-installation to fix race condition, Defender exclusions, timeout increases, and environment variables. It includes test results demonstrating significant performance improvements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/daily-build-windows.yml (1)

424-433: Good smoke test, consider cleanup of test artifacts.

The inline test Dockerfile and version check is a solid smoke test. Consider adding cleanup to remove the test image and Dockerfile.test after the test:

          docker build --isolation=process -f Dockerfile.test -t test-bun .
          docker run --isolation=process --rm test-bun
+          Remove-Item -Force Dockerfile.test -ErrorAction SilentlyContinue
+          docker rmi test-bun -f 2>$null
        shell: pwsh
🤖 Fix all issues with AI agents
In @.github/workflows/daily-build-windows.yml:
- Around line 220-263: The "Clear disk space" step is duplicated between the
build-bootstrap and build-prebuilt jobs; extract it into a reusable composite
action (e.g., .github/actions/cleanup-windows) that runs PowerShell (shell:
pwsh) and accepts an input list of paths and a flag for running docker prune so
each job can pass its specific paths; update the build-bootstrap and
build-prebuilt jobs to call this composite action (preserving behavior such as
Get-WmiObject queries, Directory::Delete with try/catch, Java* deletion, and
docker system prune) so the logic is centralized but still configurable per-job.
- Around line 50-59: The PowerShell step with id "check-bootstrap" incorrectly
assumes docker manifest inspect non-zero exits throw into the try/catch; update
the logic to run docker manifest inspect and then check $LASTEXITCODE (or $?) to
determine success instead of relying on try/catch, set $exists accordingly, and
then write EXISTS to GITHUB_OUTPUT; reference the step id "check-bootstrap" and
the command "docker manifest inspect" when making the change.
- Around line 203-211: The step "Get Bun repo commit SHA" (id: bun-sha)
currently assumes the gh api call returns a value; add error handling to
validate $sha is non-empty and that the gh command succeeded before computing
$shortSha and writing to $GITHUB_OUTPUT. Wrap the gh api call in a try/catch or
check its exit status, emit a clear error and exit non-zero if the API call
fails or $sha is empty, or optionally implement a retry/backoff and/or fallback
tag, and only echo "SHA=$shortSha" when $shortSha is a valid 7-character commit
SHA to avoid producing empty image tags.

Comment on lines +50 to +59
- name: Check if bootstrap image exists
id: check-bootstrap
run: |
$exists = $false
try {
docker manifest inspect ghcr.io/${{ env.IMAGE_NAME }}:bootstrap-amd64-${{ steps.bootstrap-hash.outputs.HASH }} 2>$null
$exists = $true
} catch { }
echo "EXISTS=$exists" >> $env:GITHUB_OUTPUT
shell: pwsh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The docker manifest inspect error handling may not work as expected.

In PowerShell, docker manifest inspect returning a non-zero exit code doesn't automatically throw an exception. The try/catch block will only catch .NET exceptions, not process exit codes.

🔧 Proposed fix using $LASTEXITCODE
       - name: Check if bootstrap image exists
         id: check-bootstrap
         run: |
           $exists = $false
-          try {
-            docker manifest inspect ghcr.io/${{ env.IMAGE_NAME }}:bootstrap-amd64-${{ steps.bootstrap-hash.outputs.HASH }} 2>$null
-            $exists = $true
-          } catch { }
+          docker manifest inspect ghcr.io/${{ env.IMAGE_NAME }}:bootstrap-amd64-${{ steps.bootstrap-hash.outputs.HASH }} 2>$null
+          if ($LASTEXITCODE -eq 0) {
+            $exists = $true
+          }
           echo "EXISTS=$exists" >> $env:GITHUB_OUTPUT
         shell: pwsh
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Check if bootstrap image exists
id: check-bootstrap
run: |
$exists = $false
try {
docker manifest inspect ghcr.io/${{ env.IMAGE_NAME }}:bootstrap-amd64-${{ steps.bootstrap-hash.outputs.HASH }} 2>$null
$exists = $true
} catch { }
echo "EXISTS=$exists" >> $env:GITHUB_OUTPUT
shell: pwsh
- name: Check if bootstrap image exists
id: check-bootstrap
run: |
$exists = $false
docker manifest inspect ghcr.io/${{ env.IMAGE_NAME }}:bootstrap-amd64-${{ steps.bootstrap-hash.outputs.HASH }} 2>$null
if ($LASTEXITCODE -eq 0) {
$exists = $true
}
echo "EXISTS=$exists" >> $env:GITHUB_OUTPUT
shell: pwsh
🤖 Prompt for AI Agents
In @.github/workflows/daily-build-windows.yml around lines 50 - 59, The
PowerShell step with id "check-bootstrap" incorrectly assumes docker manifest
inspect non-zero exits throw into the try/catch; update the logic to run docker
manifest inspect and then check $LASTEXITCODE (or $?) to determine success
instead of relying on try/catch, set $exists accordingly, and then write EXISTS
to GITHUB_OUTPUT; reference the step id "check-bootstrap" and the command
"docker manifest inspect" when making the change.

Comment on lines +203 to +211
- name: Get Bun repo commit SHA
id: bun-sha
run: |
$sha = gh api repos/oven-sh/bun/commits/main --jq .sha
$shortSha = $sha.Substring(0, 7)
echo "SHA=$shortSha" >> $env:GITHUB_OUTPUT
shell: pwsh
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add error handling for the GitHub API call.

If the gh api call fails (rate limiting, network issues), $sha will be empty, causing downstream issues with image tags like base-amd64- (no SHA suffix).

🔧 Proposed fix with validation
       - name: Get Bun repo commit SHA
         id: bun-sha
         run: |
           $sha = gh api repos/oven-sh/bun/commits/main --jq .sha
+          if (-not $sha) {
+            Write-Error "Failed to fetch Bun commit SHA"
+            exit 1
+          }
           $shortSha = $sha.Substring(0, 7)
           echo "SHA=$shortSha" >> $env:GITHUB_OUTPUT
         shell: pwsh
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
🤖 Prompt for AI Agents
In @.github/workflows/daily-build-windows.yml around lines 203 - 211, The step
"Get Bun repo commit SHA" (id: bun-sha) currently assumes the gh api call
returns a value; add error handling to validate $sha is non-empty and that the
gh command succeeded before computing $shortSha and writing to $GITHUB_OUTPUT.
Wrap the gh api call in a try/catch or check its exit status, emit a clear error
and exit non-zero if the API call fails or $sha is empty, or optionally
implement a retry/backoff and/or fallback tag, and only echo "SHA=$shortSha"
when $shortSha is a valid 7-character commit SHA to avoid producing empty image
tags.

Comment on lines +220 to +263
- name: Clear disk space
run: |
$disk = Get-WmiObject -Class Win32_LogicalDisk -Filter "DeviceID='C:'"
$totalGB = [math]::Round($disk.Size / 1GB, 2)
$usedGB = [math]::Round(($disk.Size - $disk.FreeSpace) / 1GB, 2)
Write-Host "Disk space before cleanup: $usedGB GB used / $totalGB GB total" -ForegroundColor Yellow

$paths = @(
"C:\Program Files\dotnet",
"C:\Program Files (x86)\dotnet",
"C:\Android",
"C:\Program Files (x86)\Android",
"$Env:AGENT_TOOLSDIRECTORY\CodeQL",
"$Env:AGENT_TOOLSDIRECTORY\Ruby",
"$Env:AGENT_TOOLSDIRECTORY\go",
"$Env:AGENT_TOOLSDIRECTORY\Python",
"$Env:AGENT_TOOLSDIRECTORY\PyPy",
"C:\Program Files\MySQL",
"C:\Program Files\PostgreSQL",
"C:\Rust",
"C:\vcpkg",
"C:\msys64",
"C:\Strawberry",
"C:\Program Files\Mozilla Firefox",
"C:\Julia",
"C:\Miniconda",
"C:\Program Files\R",
"C:\Program Files\Microsoft SQL Server"
)
foreach ($path in $paths) {
if (Test-Path $path) {
try { [System.IO.Directory]::Delete($path, $true) } catch { }
}
}
Get-ChildItem "$Env:AGENT_TOOLSDIRECTORY" -Filter "Java*" -Directory -ErrorAction SilentlyContinue | ForEach-Object {
try { [System.IO.Directory]::Delete($_.FullName, $true) } catch { }
}

docker system prune -af 2>$null

$diskAfter = Get-WmiObject -Class Win32_LogicalDisk -Filter "DeviceID='C:'"
$usedGBAfter = [math]::Round(($diskAfter.Size - $diskAfter.FreeSpace) / 1GB, 2)
Write-Host "Reclaimed $([math]::Round(($usedGB - $usedGBAfter), 2)) GB" -ForegroundColor Green
shell: pwsh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider extracting disk cleanup into a reusable composite action.

The cleanup logic is duplicated between build-bootstrap (lines 62-139) and build-prebuilt (lines 220-263). While the lists differ slightly, extracting this into a composite action would improve maintainability.

This is low priority since:

  1. Each job runs on a separate ephemeral runner (duplication is necessary)
  2. The lists are intentionally different (bootstrap needs more aggressive cleanup for VS install)
🤖 Prompt for AI Agents
In @.github/workflows/daily-build-windows.yml around lines 220 - 263, The "Clear
disk space" step is duplicated between the build-bootstrap and build-prebuilt
jobs; extract it into a reusable composite action (e.g.,
.github/actions/cleanup-windows) that runs PowerShell (shell: pwsh) and accepts
an input list of paths and a flag for running docker prune so each job can pass
its specific paths; update the build-bootstrap and build-prebuilt jobs to call
this composite action (preserving behavior such as Get-WmiObject queries,
Directory::Delete with try/catch, Java* deletion, and docker system prune) so
the logic is centralized but still configurable per-job.


steps:
- name: Checkout repository
uses: actions/checkout@v4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uses: actions/checkout@v4
uses: actions/checkout@v6

https://github.com/actions/checkout?tab=readme-ov-file#checkout-v6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants