DeskRun Agent Documentation

Project Mission & Architecture

Core Purpose

DeskRun is a highly optimized GitHub Actions runner deployment solution designed specifically for bare metal infrastructure with NVMe caching. The primary use case is deploying rubionic-workspace workers that require:

  • Privileged container access for Nix daemon socket communication
  • Direct host resource mounting (NVMe drives, cache directories, sockets)
  • Single-maintainer operation (not multi-tenant environments)
  • Bare metal performance with local NVMe cache optimization

Target Architecture: Cached-Privileged-Kubernetes

The cached-privileged-kubernetes container mode provides:

securityContext:
  privileged: true                    # Full privileged access
  allowPrivilegeEscalation: true      # Can escalate privileges  
  readOnlyRootFilesystem: false       # Can write to filesystem
  runAsNonRoot: false                 # Can run as root

volumeMounts:
  - name: nix-store                   # Direct Nix store access  
    mountPath: /nix/store-host
  - name: nix-daemon-socket           # Nix daemon socket access
    mountPath: /nix/var/nix/daemon-socket-host
  - name: docker-cache                # Docker image cache persistence
    mountPath: /var/lib/docker
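
The named mounts above map to host resources; one possible volumes stanza (host paths here are illustrative, not fixed by DeskRun) is:

volumes:
  - name: nix-store
    hostPath:
      path: /nvme/nix-store              # example host path; backed by NVMe
      type: Directory
  - name: nix-daemon-socket
    hostPath:
      path: /nix/var/nix/daemon-socket   # host Nix daemon socket directory
      type: Directory
  - name: docker-cache
    hostPath:
      path: /nvme/docker-cache           # persistent Docker image/layer cache
      type: DirectoryOrCreate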

Critical Architecture: Workflow-Managed Docker

Key Understanding: DeskRun does NOT provide a Docker daemon. Instead:

  1. Workflows install Docker: The GitHub Actions workflow installs and starts the Docker daemon inside the privileged container
  2. Cache volume persists images: /var/lib/docker is mounted to cache Docker images/layers between runs
  3. Privileged mode enables the daemon: privileged: true allows the workflow to start the Docker daemon with full system access
  4. Performance optimization: Cached Docker images dramatically speed up subsequent builds

Example Workflow Pattern:

jobs:
  build:
    runs-on: rubionic-workspace-1
    steps:
      - uses: actions/checkout@v4

      - name: Install Docker
        run: |
          # Install the Docker daemon inside the privileged container
          curl -fsSL https://get.docker.com | sh
          # Start the daemon; if the image has no systemd, run "dockerd &" instead
          systemctl start docker

      - name: Build with cached images
        run: |
          # Docker images/layers are cached in the /var/lib/docker volume
          docker build .   # fast on repeat runs thanks to cached layers

Why This Architecture:

  • Maximum flexibility: Workflow controls Docker version and configuration
  • Cache optimization: Docker images persist between workflow runs
  • No resource overhead: No always-running Docker daemon in runner
  • Security isolation: Docker daemon only exists during workflow execution

Design Philosophy

1. Bare Metal Optimization

  • NVMe Cache Integration: Configurable cache paths for mounting high-speed NVMe storage
  • Direct Hardware Access: Privileged containers can access host hardware optimally
  • Minimal Overhead: No nested containerization (DinD) - direct execution

2. Rubionic-Workspace Compatibility

  • Nix Daemon Access: Solves "Host daemon socket not found" errors (see the sketch after this list)
  • Store Mounting: Direct /nix/store access for package management
  • Development Workflow Support: Full filesystem access for build artifacts
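
One possible way for a workflow step to talk to the host daemon through the mounted socket (assuming the socket file inside the mounted directory is named socket) is to point the Nix client at it explicitly:

# Use the host Nix daemon via the mounted socket path
export NIX_REMOTE="unix:///nix/var/nix/daemon-socket-host/socket"
nix build .#default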

3. Configuration Agnostic

  • Flexible Cache Paths: Not hardcoded - configurable via CachePaths array
  • Adaptable Mounting: Support for hostPath, emptyDir, and custom volume types
  • Multi-Purpose: While optimized for rubionic-workspace, supports other privileged workloads

4. Single-Maintainer Focus

  • Simplified Security Model: Privileged access acceptable for trusted single-maintainer projects
  • Performance Over Isolation: Prioritizes speed and functionality over multi-tenant security
  • Direct Control: Full host access when needed for development workflows

Architecture Components

Container Mode: cached-privileged-kubernetes

Runner Container Features:

  • privileged: true - Full system access
  • ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER: false - Direct execution
  • /home/runner/k8s-novolume/index.js - Optimized hook system
  • Direct volume mounts for cache, sockets, and storage

Host Integration:

  • Nix Infrastructure: /nix/store and daemon socket mounting
  • Docker Image Cache: /var/lib/docker persistence for workflow-installed Docker
  • Cache Optimization: Configurable NVMe cache paths for builds
  • Development Tools: Full filesystem access for builds

Cache Configuration

type CachePath struct {
    Source string  // Host path (empty = emptyDir)
    Target string  // Container mount point
}

// Example: NVMe cache optimization
cachePaths := []CachePath{
    {Source: "/nvme/nix-store", Target: "/nix/store"},
    {Source: "/nvme/cache", Target: "/cache"}, 
    {Source: "", Target: "/tmp/build"},  // emptyDir for temporary builds
}
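
For orientation, a minimal sketch of how such a CachePath could be translated into a Kubernetes volume and mount (using k8s.io/api/core/v1; the actual generator in this repository may differ):

import corev1 "k8s.io/api/core/v1"

// volumeFor converts a CachePath into a pod volume and matching mount:
// an empty Source yields an ephemeral emptyDir, otherwise a hostPath on the node.
func volumeFor(name string, cp CachePath) (corev1.Volume, corev1.VolumeMount) {
    vol := corev1.Volume{Name: name}
    if cp.Source == "" {
        vol.EmptyDir = &corev1.EmptyDirVolumeSource{}
    } else {
        hostPathType := corev1.HostPathDirectoryOrCreate
        vol.HostPath = &corev1.HostPathVolumeSource{Path: cp.Source, Type: &hostPathType}
    }
    mount := corev1.VolumeMount{Name: name, MountPath: cp.Target}
    return vol, mount
}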

Security Model

Single-Maintainer Trust Assumption:

  • All projects maintained by same trusted maintainer
  • Privileged access acceptable for development workflows
  • Performance and functionality prioritized over isolation
  • No untrusted code execution expected

Host Protection:

  • Privileged containers operate within Kubernetes security boundaries
  • Resource limits and namespace isolation still apply
  • Network policies and RBAC provide additional controls

Use Cases

Primary: Rubionic-Workspace Development

Workflow Requirements:

  • Nix package manager with daemon access
  • Docker daemon installation and image caching
  • Large build artifacts requiring NVMe cache
  • Development tool chains needing filesystem access
  • Container operations with workflow-managed Docker

Performance Optimizations:

  • NVMe-backed Nix store for fast package access
  • Persistent Docker image cache for faster builds
  • Local cache mounting for build artifacts
  • Direct hardware access for optimal I/O
  • Minimal containerization overhead

Secondary: High-Performance CI/CD

Suitable for:

  • Single-maintainer projects requiring privileged access
  • Build workflows needing host resource access
  • Development environments requiring full system access
  • Performance-critical CI/CD pipelines

Not Suitable for:

  • Multi-tenant environments with untrusted code
  • Security-focused deployments requiring isolation
  • Shared infrastructure with multiple maintainers
  • Compliance environments requiring privilege restrictions

Deployment Considerations

Bare Metal Infrastructure

Recommended Setup:

  • Kubernetes cluster on bare metal nodes
  • NVMe drives mounted at consistent paths
  • Docker runtime with privileged container support
  • Network storage for shared artifacts (optional)

NVMe Cache Strategy:

# Node preparation
/nvme/cache/nix-store     # Nix package cache
/nvme/cache/builds        # Build artifact cache  
/nvme/cache/docker        # Docker image/layer cache (workflow-installed)
/nvme/cache/cargo         # Rust/Cargo cache
/nvme/cache/npm           # Node.js cache
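
Preparing these directories is a one-time, out-of-band step on each node; a possible setup (device name and filesystem are assumptions, adjust to your hardware):

# Format and mount the NVMe drive (assumed device: /dev/nvme0n1)
mkfs.ext4 /dev/nvme0n1
mkdir -p /nvme/cache
mount /dev/nvme0n1 /nvme/cache
echo '/dev/nvme0n1 /nvme/cache ext4 defaults,noatime 0 2' >> /etc/fstab

# Create the cache directories referenced by the runner configuration
mkdir -p /nvme/cache/{nix-store,builds,docker,cargo,npm}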

Configuration Examples

Basic Rubionic-Workspace Setup:

containerMode: cached-privileged-kubernetes
cachePaths:
  - source: "/nvme/nix-store"
    target: "/nix/store-host"
  - source: "/nvme/nix-daemon"  
    target: "/nix/var/nix/daemon-socket-host"
  - source: "/nvme/docker-cache"
    target: "/var/lib/docker"    # Docker image cache for workflow-installed Docker
  - source: "/nvme/cargo-cache"
    target: "/home/runner/.cargo"
  - source: ""
    target: "/tmp/builds"

Security & Trust Model

Assumptions

  1. Single Maintainer: All code and workflows controlled by trusted maintainer
  2. Development Focus: Primarily development/build workloads, not production services
  3. Bare Metal Control: Full control over underlying infrastructure
  4. Performance Priority: Speed and functionality over isolation

Mitigations

  1. Kubernetes Boundaries: Privileged containers still within K8s security model
  2. Resource Limits: CPU, memory, and storage quotas prevent resource exhaustion (see the example after this list)
  3. Network Policies: Control network access even for privileged containers
  4. Monitoring: Comprehensive logging and monitoring of privileged operations
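
As an illustration, a generic container resources stanza (values are placeholders; set them wherever your deployment defines the runner pod spec) bounds what a runaway build can consume:

resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:
    cpu: "16"
    memory: 32Gi
    ephemeral-storage: 100Gi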

Risk Acceptance

  • Privileged Escalation: Accepted for development workflow requirements
  • Host Access: Required for Nix daemon and cache optimization
  • Container Breakout: Risk mitigated by single-maintainer trust model
  • Resource Abuse: Controlled through Kubernetes resource management

Performance Characteristics

NVMe Cache Benefits

  • Nix Store Access: 10-100x faster package resolution vs network
  • Build Artifacts: Persistent cache across workflow runs
  • Docker Images: Local registry cache for base images
  • Development Tools: Cached toolchains and dependencies

Benchmark Targets

  • Cold Build: < 2 minutes for a typical rubionic-workspace setup
  • Warm Build: < 30 seconds with full NVMe cache
  • Package Resolution: < 5 seconds for cached Nix packages
  • Container Startup: < 10 seconds including cache mounts

Future Considerations

Potential Enhancements

  1. Cache Warming: Pre-populate NVMe caches during node setup
  2. Multi-Cache Strategy: Support for tiered cache (NVMe + network)
  3. Monitoring Integration: Performance metrics for cache hit rates
  4. Auto-Scaling: Dynamic cache allocation based on usage patterns

Architectural Evolution

  • Hybrid Security: Optional non-privileged mode for less trusted workloads
  • Multi-Tenant Support: Enhanced isolation for shared infrastructure
  • Cloud Integration: Adapt optimizations for cloud-based NVMe storage
  • Container Registry: Built-in registry for cached container images

Development Workflows

ACCEPT_DIFF: Regenerating Expected Test Files

When templates change (overlays, schema, or base templates), the expected test output files need to be regenerated. Use the ACCEPT_DIFF=true environment variable to automatically update expected files from actual test output.

When to use:

  • After modifying templates in pkg/templates/templates/
  • When adding new container modes or overlay configurations
  • When test output intentionally changes due to feature updates

Usage:

# Regenerate expected files for pkg/templates tests
ACCEPT_DIFF=true go test ./pkg/templates/...

# Regenerate expected files for internal/runner tests  
ACCEPT_DIFF=true go test ./internal/runner/...

# Regenerate all expected files
ACCEPT_DIFF=true go test ./...

How it works:

  1. Tests render templates with various configurations
  2. When ACCEPT_DIFF=true is set, actual output overwrites expected files (see the sketch after this list)
  3. Expected files are stored in testdata/expected/ directories:
    • pkg/templates/testdata/expected/ - processor tests
    • internal/runner/template_spec/testdata/expected/ - runner tests
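
The mechanism behind this is a small comparison helper in the tests; a hypothetical sketch (the actual helper name and signature in this repository may differ):

import (
    "bytes"
    "os"
    "testing"
)

// assertMatchesExpected compares rendered output against an expected file,
// overwriting the expected file instead when ACCEPT_DIFF=true is set.
func assertMatchesExpected(t *testing.T, expectedPath string, actual []byte) {
    t.Helper()
    if os.Getenv("ACCEPT_DIFF") == "true" {
        if err := os.WriteFile(expectedPath, actual, 0o644); err != nil {
            t.Fatalf("updating %s: %v", expectedPath, err)
        }
        return
    }
    expected, err := os.ReadFile(expectedPath)
    if err != nil {
        t.Fatalf("reading %s: %v", expectedPath, err)
    }
    if !bytes.Equal(expected, actual) {
        t.Errorf("output differs from %s; rerun with ACCEPT_DIFF=true to regenerate", expectedPath)
    }
}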

After regenerating:

  1. Review the diffs carefully with git diff
  2. Verify changes are intentional and correct
  3. Run tests again without ACCEPT_DIFF to confirm they pass
  4. Commit the updated expected files with the template changes

Example workflow:

# 1. Make template changes
vim pkg/templates/templates/overlays/container-mode-privileged.yaml

# 2. Regenerate expected files
ACCEPT_DIFF=true go test ./...

# 3. Review changes
git diff

# 4. Verify tests pass
go test ./...

# 5. Commit together
git add -A && git commit -m "feat: update privileged mode template"

Test-Driven Overlay Development

When developing or fixing ytt overlays, use a test-driven approach:

  1. First, modify the expected test files to reflect the desired output
  2. Run tests to confirm they fail - this validates your expected change is correct
  3. Modify the overlay to make the tests pass
  4. Iterate until all tests pass

Example: Removing finalizers from manager ServiceAccount

# 1. Edit expected file to remove the finalizer you want gone
vim pkg/templates/testdata/expected/kubernetes_basic.yaml
# Remove the finalizers section from the manager ServiceAccount

# 2. Run test to see it fail (confirms expected file is correct)
go test ./pkg/templates/... -run TestProcessTemplate/kubernetes-basic -v
# Should show: "one map entry added: finalizers: ..."

# 3. Fix the overlay to match expected output
vim pkg/templates/templates/overlay.yaml
# Add overlay directive to remove finalizers

# 4. Run test again - should pass now
go test ./pkg/templates/... -run TestProcessTemplate/kubernetes-basic -v

# 5. Update ALL expected files and run full test suite
ACCEPT_DIFF=true go test ./pkg/templates/... ./internal/runner/...

# 6. Verify all tests pass
go test ./pkg/templates/... ./internal/runner/...

Key insight: The overlay matching must account for template transformations. The Go processor transforms names like arc-runner-gha-rs-manager into ytt expressions before overlays are applied. Match by labels (e.g., app.kubernetes.io/component) instead of by name when the name is dynamically generated.
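
For instance, a label-based overlay that strips finalizers from the manager ServiceAccount might look like the following (the exact label key/value is an assumption; verify against the rendered templates):

#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind": "ServiceAccount", "metadata": {"labels": {"app.kubernetes.io/component": "manager"}}})
---
metadata:
  #@overlay/remove
  finalizers: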


Summary

DeskRun prioritizes performance and functionality over isolation for single-maintainer development environments. The cached-privileged-kubernetes mode provides optimal bare metal performance with NVMe cache integration, specifically designed for rubionic-workspace workflows requiring Nix daemon access and high-speed build caching.

The architecture accepts security trade-offs in exchange for development velocity and infrastructure efficiency, making it ideal for trusted single-maintainer projects but unsuitable for multi-tenant or security-focused deployments.