Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
2a73b07
docs: add BEP-1049 Kata Containers Agent Backend proposal
kyujin-cho Feb 27, 2026
1bc7b4a
update(BEP-1049): comprehensive intrinsic mount evaluation for Kata
kyujin-cho Mar 3, 2026
44cfe8f
update(BEP-1049): detailed virtio-fs performance analysis with benchm…
kyujin-cho Mar 3, 2026
dcf4521
update(BEP-1049): address PR review comments
kyujin-cho Mar 3, 2026
9b84040
docs(BEP-1049): add per-VM cloned disk hybrid storage model analysis
kyujin-cho Mar 3, 2026
c67393a
update(BEP-1049): rewrite with accurate kernel runner architecture
kyujin-cho Mar 3, 2026
bc1615a
fix(BEP-1049): correct technical inaccuracies from proofing review
kyujin-cho Mar 3, 2026
30bfa9d
update(BEP-1049): resolve all open questions with CoCo-by-default pre…
kyujin-cho Mar 26, 2026
883a10c
refactor(BEP): rename BEP-1049 to BEP-1051
kyujin-cho Mar 26, 2026
70529ce
Merge branch 'main' into docs/bep-1051-kata-containers-agent
kyujin-cho Mar 26, 2026
2e9a0d0
this will be merged from main later on
kyujin-cho Mar 26, 2026
9007917
Merge branch 'main' into docs/bep-1051-kata-containers-agent
kyujin-cho Mar 26, 2026
1e241f1
fix(BEP-1051): align sub-documents with CoCo-by-default, direct guest…
kyujin-cho Mar 26, 2026
05c454a
fix(BEP-1051): fix mermaid sequence diagram syntax for GitHub rendering
kyujin-cho Mar 26, 2026
39db6ed
update(BEP-1051): supplement technical implementation details across …
kyujin-cho Mar 26, 2026
7424a68
refactor(BEP-1051): replace ASCII art diagrams with mermaid across al…
kyujin-cho Mar 26, 2026
88c67d7
fix(BEP-1051): clarify /home/work is guest-disk directory, not virtio…
kyujin-cho Mar 26, 2026
4210a5c
fix(BEP-1051): trim verbose agent socket sections; consolidate skippe…
kyujin-cho Mar 26, 2026
830351a
fix(BEP-1051): fix docker-creds.json and gather_container_measures st…
kyujin-cho Mar 26, 2026
f3fa149
add missing files
kyujin-cho Mar 26, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1,066 changes: 1,066 additions & 0 deletions docs/reports/kata-containers-feature-parity-analysis.md

Large diffs are not rendered by default.

346 changes: 346 additions & 0 deletions proposals/BEP-1051-kata-containers-agent.md

Large diffs are not rendered by default.

346 changes: 346 additions & 0 deletions proposals/BEP-1051/configuration-deployment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,346 @@
<!-- context-for-ai
type: detail-doc
parent: BEP-1051 (Kata Containers Agent Backend)
scope: Agent configuration schema for Kata backend and host deployment requirements
depends-on: []
key-decisions:
- New [kata] section in agent unified config
- Cloud Hypervisor as default hypervisor, QEMU as fallback
- Config validated at agent startup with fail-fast on missing paths
-->

# BEP-1051: Configuration and Deployment

## Summary

This document defines the configuration schema additions for the Kata backend, including hypervisor selection, VFIO settings, VM resource defaults, and host deployment prerequisites. The `[kata]` section is added to the agent's unified config (`AgentUnifiedConfig` at `src/ai/backend/agent/config/unified.py`).

## Current Design

The agent backend is selected via `local_config.agent_common.backend`, which reads the `AgentBackend` enum (`src/ai/backend/agent/types.py:36`). The `get_agent_discovery()` function (`types.py:84`) dynamically imports the backend package.

Docker-specific configuration lives in the `[container]` section:

```toml
[container]
stats-type = "docker"
sandbox-type = "jail"
scratch-root = "/tmp/scratches"
```

There is no VM or hypervisor configuration in the current schema.

## Proposed Design

### Backend Selection

```toml
[agent]
backend = "kata" # "docker" | "kubernetes" | "kata"
```

When `backend = "kata"`, the agent imports `ai.backend.agent.kata` and instantiates `KataAgentDiscovery`.

### Kata Configuration Section

```toml
[kata]
# --- Hypervisor ---
hypervisor = "cloud-hypervisor" # "cloud-hypervisor" | "qemu" | "dragonball"

# --- VM Defaults ---
default-vcpus = 2 # Initial vCPUs per VM (hot-plugged as needed)
default-memory-mb = 2048 # Initial memory per VM in MB
vm-overhead-mb = 64 # Per-VM VMM process + guest kernel + kata-agent overhead (MB), deducted from host capacity

# --- Guest VM Image (NOT the container image — see note below) ---
kernel-path = "/opt/kata/share/kata-containers/vmlinux.container"
initrd-path = "" # Empty = use rootfs image instead of initrd
rootfs-path = "/opt/kata/share/kata-containers/kata-containers.img"

# --- Storage ---
shared-fs = "virtio-fs" # "virtio-fs" | "virtio-9p" (9p deprecated)
virtiofsd-path = "/opt/kata/libexec/virtiofsd"
virtio-fs-cache-size = 0 # DAX window in MB; 0 = disabled (recommended default)

# --- Networking ---
network-model = "tcfilter" # "tcfilter" | "macvtap"

# --- VFIO ---
enable-iommu = true
hotplug-vfio = "root-port" # "root-port" | "bridge-port" | "no-port"

# --- Containerd ---
containerd-socket = "/run/containerd/containerd.sock"
kata-runtime-class = "kata" # RuntimeClass name registered in containerd

# --- Confidential Computing (CoCo-by-default) ---
confidential-guest = true # CoCo is always enabled; no non-CoCo mode
guest-attestation = "tdx" # "tdx" | "sev-snp" (required)
kbs-endpoint = "http://kbs.internal:8080" # Key Broker Service for sealed secrets

# --- Storage (guest-side direct mount) ---
kernel-modules = ["nfs", "nfsv4"] # Modules loaded during create_sandbox()
guest-hook-path = "/usr/share/oci/hooks" # OCI prestart hook path in guest rootfs

[[kata.storage-mounts]]
fs-type = "nfs4"
source = "storage-server:/exports/vfstore"
mountpoint = "/mnt/vfstore"
options = "vers=4.1,rsize=1048576,wsize=1048576"

[[kata.storage-mounts]]
fs-type = "lustre"
source = "lustre-mgs@tcp:/scratch"
mountpoint = "/mnt/lustre"
options = ""

[kata.storage-nic]
device = "ib0"
address = "10.0.100.5/24"
gateway = "10.0.100.1"
```

### Guest VM Image vs Container Image

Kata Containers uses **two separate filesystem layers** that must not be confused:

1. **Guest VM rootfs** (`rootfs-path` above): A minimal mini-OS image containing only the kata-agent, systemd, and essential utilities. This is the VM's boot disk — shared across all VMs, read-only, and mounted via DAX on a `/dev/pmem*` device inside the guest. It is **not** the user's container image. This is an infrastructure-level asset analogous to a VM template.

2. **Container image** (e.g., `cr.backend.ai/stable/python-tensorflow:2.15-py312-cuda12.3`): The user-selected OCI image that Backend.AI's image management system resolves. Under CoCo-by-default, images are pulled and decrypted **inside the guest** using `image-rs` — the host is untrusted and must not see image contents. Registry credentials are delivered via KBS (Key Broker Service) after remote attestation.

```mermaid
flowchart TB
boot["Guest VM boots from kata-containers.img (attested rootfs)"]
boot --> attest["TEE attestation completes"]
attest --> pull["image-rs pulls encrypted OCI image from registry"]
pull --> decrypt["Decrypts image with keys from KBS (attestation-gated)"]
decrypt --> mount["kata-agent mounts container rootfs inside guest"]
mount --> run["Container process runs on OCI image filesystem"]
```

**Image management changes for CoCo:** The image registry and image selection flow are unchanged (manager resolves image references identically). The pull mechanism differs — instead of host-side containerd pull, `image-rs` inside the guest TEE handles pull + integrity verification + decryption. The agent cannot verify image presence on the host; it passes the image reference to the Kata shim and waits for the kernel runner ZMQ connection.

Comment on lines +44 to +123
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc is written as if non-CoCo Kata is the baseline (host-side image pull, confidential-guest=false, CoCo as “Phase 4”), but the master BEP states CoCo-by-default and guest-side image pull (host must not see image contents). Please reconcile the configuration defaults and the “Guest VM Image vs Container Image” flow so the baseline matches the CoCo trust model (or clearly split the doc into CoCo vs non-CoCo modes).

Copilot uses AI. Check for mistakes.
### Pydantic Config Model

```python
class KataConfig(BaseConfigSchema):
hypervisor: Literal["cloud-hypervisor", "qemu", "dragonball"] = Field(
default="cloud-hypervisor",
validation_alias=AliasChoices("hypervisor"),
)

# VM defaults
default_vcpus: int = Field(
default=2,
validation_alias=AliasChoices("default_vcpus", "default-vcpus"),
)
default_memory_mb: int = Field(
default=2048,
validation_alias=AliasChoices("default_memory_mb", "default-memory-mb"),
)
vm_overhead_mb: int = Field(
default=64,
description="VMM process + guest kernel + kata-agent overhead (MB)",
validation_alias=AliasChoices("vm_overhead_mb", "vm-overhead-mb"),
)

# Guest image
kernel_path: Path = Field(
default=Path("/opt/kata/share/kata-containers/vmlinux.container"),
validation_alias=AliasChoices("kernel_path", "kernel-path"),
)
initrd_path: Path | None = Field(
default=None,
validation_alias=AliasChoices("initrd_path", "initrd-path"),
)
rootfs_path: Path = Field(
default=Path("/opt/kata/share/kata-containers/kata-containers.img"),
validation_alias=AliasChoices("rootfs_path", "rootfs-path"),
)

# Storage
shared_fs: Literal["virtio-fs", "virtio-9p"] = Field(
default="virtio-fs",
validation_alias=AliasChoices("shared_fs", "shared-fs"),
)
virtiofsd_path: Path = Field(
default=Path("/opt/kata/libexec/virtiofsd"),
validation_alias=AliasChoices("virtiofsd_path", "virtiofsd-path"),
)
virtio_fs_cache_size: int = Field(
default=0,
validation_alias=AliasChoices("virtio_fs_cache_size", "virtio-fs-cache-size"),
)

# Networking
network_model: Literal["tcfilter", "macvtap"] = Field(
default="tcfilter",
validation_alias=AliasChoices("network_model", "network-model"),
)

# VFIO
enable_iommu: bool = Field(
default=True,
validation_alias=AliasChoices("enable_iommu", "enable-iommu"),
)
hotplug_vfio: Literal["root-port", "bridge-port", "no-port"] = Field(
default="root-port",
validation_alias=AliasChoices("hotplug_vfio", "hotplug-vfio"),
)

# Containerd
containerd_socket: Path = Field(
default=Path("/run/containerd/containerd.sock"),
validation_alias=AliasChoices("containerd_socket", "containerd-socket"),
)
kata_runtime_class: str = Field(
default="kata",
validation_alias=AliasChoices("kata_runtime_class", "kata-runtime-class"),
)

# Confidential computing (CoCo-by-default)
confidential_guest: bool = Field(
default=True,
description="CoCo is always enabled; no non-CoCo Kata mode",
validation_alias=AliasChoices("confidential_guest", "confidential-guest"),
)
guest_attestation: Literal["tdx", "sev-snp"] = Field(
default="tdx",
description="TEE attestation type (required)",
validation_alias=AliasChoices("guest_attestation", "guest-attestation"),
)
kbs_endpoint: str = Field(
default="",
description="Key Broker Service endpoint for sealed secrets and image decryption keys",
validation_alias=AliasChoices("kbs_endpoint", "kbs-endpoint"),
)

# Storage (guest-side direct mount)
storage_mounts: list[StorageMountSpec] = Field(
default_factory=list,
description="Storage volumes to mount inside the guest VM via native NFS/Lustre client",
validation_alias=AliasChoices("storage_mounts", "storage-mounts"),
)
storage_nic: StorageNicSpec | None = Field(
default=None,
description="Storage network interface (VFIO passthrough IPoIB/Ethernet) IP configuration",
validation_alias=AliasChoices("storage_nic", "storage-nic"),
)

# Guest hooks
guest_hook_path: str = Field(
default="/usr/share/oci/hooks",
description="Path inside guest rootfs where OCI prestart hooks are scanned",
validation_alias=AliasChoices("guest_hook_path", "guest-hook-path"),
)
kernel_modules: list[str] = Field(
default_factory=lambda: ["nfs", "nfsv4"],
description="Kernel modules to load inside guest during sandbox creation",
validation_alias=AliasChoices("kernel_modules", "kernel-modules"),
)


class StorageMountSpec(BaseConfigSchema):
"""A storage volume to mount directly inside the guest VM."""
fs_type: str = Field(description="Filesystem type: nfs, nfs4, lustre, wekafs")
source: str = Field(description="Mount source (e.g., 'storage-server:/exports/vfstore')")
mountpoint: str = Field(description="Guest-side mount point (e.g., '/mnt/vfstore')")
options: str = Field(default="", description="Mount options (e.g., 'vers=4.1,rdma')")


class StorageNicSpec(BaseConfigSchema):
"""Storage network interface IP configuration for guest VM."""
device: str = Field(description="Guest-side network device name (e.g., 'ib0', 'eth1')")
address: str = Field(description="IP address with prefix (e.g., '10.0.100.5/24')")
gateway: str = Field(default="", description="Gateway address (optional)")
```
Comment on lines +124 to +257
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TOML examples use hyphenated keys (e.g., default-vcpus, vm-overhead-mb), but the Pydantic model snippet doesn’t show the AliasChoices/serialization_alias pattern used elsewhere in AgentUnifiedConfig to support those hyphenated names. Please update the snippet to include the appropriate Field validation_alias/serialization_alias so the example keys actually parse and sample generation emits consistent key names.

Copilot uses AI. Check for mistakes.

Integration in `AgentUnifiedConfig`:

```python
class AgentUnifiedConfig(BaseConfigSchema):
# ... existing fields ...
kata: KataConfig | None = None # Only present when backend = "kata"
```

### Startup Validation

At `KataAgent.__ainit__()`, validate:

1. `kernel_path` exists and is readable
2. `rootfs_path` (or `initrd_path`) exists and is readable
3. `virtiofsd_path` exists and is executable
4. `containerd_socket` exists and is connectable
5. Kata runtime class is registered in containerd config
6. If `enable_iommu`: IOMMU is enabled in the kernel (`/sys/class/iommu` is non-empty)
7. `vhost_vsock` kernel module is loaded

Fail fast with descriptive error messages on any validation failure.

### Hypervisor Comparison

| Feature | Cloud Hypervisor | QEMU | Dragonball |
|---------|-----------------|------|------------|
| Boot time | ~125-200ms | ~300-500ms | ~100ms (est., no published benchmark) |
| Memory overhead | ~15MB | ~60MB | ~10MB |
| VFIO passthrough | Yes | Yes | Yes |
| virtio-fs | Yes | Yes | Yes |
| CPU/mem hotplug | Yes | Yes | Yes |
| Confidential (TDX/SEV) | Yes | Yes | No |
| Code size | ~50K LOC (Rust) | ~2M LOC (C) | Rust (in-process) |
| Maturity | Production | Production | Production (Alibaba) |

**Default: Cloud Hypervisor** — best balance of feature completeness, security (small Rust codebase), and performance. QEMU recommended only for confidential computing on older hardware or non-x86 architectures.

## Host Deployment Requirements

### Kernel and Hardware

- Linux kernel >= 5.10 with KVM enabled
- Intel VT-x/VT-d or AMD-V/AMD-Vi (IOMMU)
- Kernel parameters: `intel_iommu=on iommu=pt` (Intel) or `amd_iommu=on iommu=pt` (AMD)
- Kernel modules: `kvm`, `kvm_intel`/`kvm_amd`, `vfio`, `vfio_pci`, `vfio_iommu_type1`, `vhost_vsock`

### Software

- Kata Containers 3.x (`kata-runtime`, guest kernel, guest rootfs/initrd)
- containerd >= 1.7 with Kata shim (`io.containerd.kata.v2`)
- virtiofsd (included in Kata installation)

### containerd Configuration

```toml
# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
privileged_without_host_devices = true

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
```

### VFIO GPU Binding

GPUs intended for VFIO passthrough must be unbound from the `nvidia` driver and bound to `vfio-pci`:

```bash
# Identify GPU PCI addresses
lspci -nn | grep -i nvidia

# Bind to vfio-pci (example for 0000:41:00.0)
echo "0000:41:00.0" > /sys/bus/pci/devices/0000:41:00.0/driver/unbind
echo "10de 2684" > /sys/bus/pci/drivers/vfio-pci/new_id

# Verify
ls -la /dev/vfio/
```

For persistent binding, use `driverctl` or modprobe configuration.

## Implementation Notes

- `KataConfig` is validated at agent startup; the agent refuses to start if prerequisites are not met
- The `generate-sample` CLI command should include the `[kata]` section with comments
- Hypervisor binary paths can be auto-detected from `kata-runtime env` output
- The `[container]` section's `scratch-root` and `port-range` settings are reused by KataAgent
Loading
Loading