
dailykim149656-source/codex-hermes-baremetal-doom


Codex × Hermes BareMetal Doom Experiment

Codex × Hermes BareMetal Doom Experiment, formerly Hermes OS-Doom Harness, is a Google I/O-inspired multi-agent software development experiment: use Codex and Hermes-style agent workflows to build a tiny QEMU-bootable OS, add just enough kernel/runtime surface to host doomgeneric/Freedoom, and prove the result with executable gates instead of screenshots or claims.

QEMU/VNC live Doom/Freedoom capture produced by the live-display evaluator

This GIF is generated from QEMU monitor screendump frames captured from the live VNC/display surface during ./evaluate.sh live-display; it is not hand-authored or AI-generated. A still gameplay frame from ./evaluate.sh playable is also kept at docs/assets/readme-doom.png.

The point is not that an AI agent magically wrote an operating system in one shot. The point is that complex agent work needs contracts, small scopes, failure logs, repeatable evaluation, and durable handoffs. Doom/Freedoom is used as the final integration pressure test because it forces display, input, timing, memory, and game-data loading to work together.

Korean summary (translated): This repository is a multi-agent development experiment that uses Hermes Agent as the root orchestrator and runs a Doom-compatible demo on a toy OS through narrow milestones and a QEMU-based evaluator.

Current status

Status: complete through milestone M15f, reaching the evaluator-backed Google-demo parity threshold.

The full regression gate passes:

./evaluate.sh all

Expected final marker:

ALL_PASS

Generated logs under artifacts/logs/ are intentionally not committed. The committed parity JSON/report are evidence snapshots; reproduce them from a clean checkout with:

./evaluate.sh all > artifacts/logs/all.log 2>&1
./evaluate.sh google-parity-audit

The M11b IRQ-backed Doom smoke is a dedicated gate, ./evaluate.sh irq-doom, and is now included in ./evaluate.sh all. Later milestones added:

  • M12a: ./evaluate.sh memory-map, a boot-memory discovery gate that parses QEMU's Multiboot memory map and reports usable/reserved regions over serial.
  • M12b: ./evaluate.sh memory-heap, a narrow heap smoke that selects a 4KiB-aligned heap region after the kernel image from that Multiboot map and initializes the existing bump allocator there.
  • M12c: ./evaluate.sh file-layer, a sidecar kernel gate that serves an embedded config file and a volatile save-file test double through the kernel compatibility stdio/stat/access APIs.
  • M13: ./evaluate.sh swarm-evidence, an actual-records-only evidence gate for durable handoffs, generated run logs, and metrics that are explicitly recorded as unavailable.
  • M14: ./evaluate.sh public-demo, a one-command public demo package that regenerates playable/live-display evidence and a conservative public report.
  • M15: ./evaluate.sh google-parity-audit, ./evaluate.sh paging, ./evaluate.sh physical-allocator, ./evaluate.sh persistent-save, ./evaluate.sh video-capture-demo, and ./evaluate.sh real-agent-swarm-run.

With M15, the project reaches its evaluator-backed Google-demo parity threshold at 94/100 while preserving conservative claim boundaries. It still does not claim Google-scale 93 agents, production OS status, real-hardware video proof, or a polished live stage recording.
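To illustrate the M12a/M12b idea, here is a minimal C sketch that walks a Multiboot v1 memory map and picks a 4 KiB-aligned heap base after the kernel image. The entry layout (a `size` field that excludes itself, 64-bit base address and length, type 1 meaning usable RAM) follows the Multiboot v1 specification; the function name `select_heap_base`, the `min_bytes` parameter, and the first-fit policy are assumptions, not the repository's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB_MMAP_AVAILABLE 1u /* Multiboot v1: type 1 = usable RAM */
#define HEAP_ALIGN 4096u     /* 4 KiB heap alignment */

/* Multiboot v1 memory-map entry. `size` counts the bytes AFTER itself,
 * so the next entry starts at (entry + size + 4). Packed: GCC/Clang only. */
typedef struct {
    uint32_t size;
    uint64_t addr;
    uint64_t len;
    uint32_t type;
} __attribute__((packed)) mb_mmap_entry;

static uint64_t align_up(uint64_t v, uint64_t a) {
    return (v + a - 1) & ~(a - 1);
}

/* Hypothetical helper: return a 4 KiB-aligned heap base placed after
 * kernel_end inside the first usable region with at least min_bytes of
 * room. Returns 0 when no region qualifies. */
uint64_t select_heap_base(const uint8_t *mmap, size_t mmap_len,
                          uint64_t kernel_end, uint64_t min_bytes) {
    size_t off = 0;
    while (off + sizeof(mb_mmap_entry) <= mmap_len) {
        mb_mmap_entry e;
        memcpy(&e, mmap + off, sizeof e); /* avoid unaligned reads */
        if (e.type == MB_MMAP_AVAILABLE) {
            uint64_t start = e.addr > kernel_end ? e.addr : kernel_end;
            uint64_t base = align_up(start, HEAP_ALIGN);
            uint64_t end = e.addr + e.len;
            if (base < end && end - base >= min_bytes)
                return base;
        }
        off += (size_t)e.size + sizeof e.size; /* step to the next entry */
    }
    return 0;
}
```

With a typical low-memory region (0 to 0x9FC00) and a high region starting at 1 MiB, a kernel ending at 0x123456 gets its heap at 0x124000, because the low region lies entirely below the kernel image.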

Google I/O demo comparison

This repository is best described as an evaluator-backed, small-scale public reproduction of the Google I/O Antigravity-style OS + Doom demo. It reaches the repo-defined parity threshold because it boots a custom QEMU OS, runs Doom/Freedoom, proves OS-depth sidecars, emits public artifacts, and records actual Codex-native subagent evidence.

It is not a full Google-scale recreation. The project does not claim the reported 93-agent scale, exact per-agent token/cost export, production OS status, real hardware execution, sound/networking, or a polished live-stage recording.

| Dimension | Current project | Compared with the Google I/O demo |
|---|---|---|
| Bootable OS | QEMU Multiboot toy OS with serial-verifiable gates | Near demo-level for the toy OS reproduction goal |
| Doom/Freedoom execution | Sustained QEMU Doom/Freedoom loop with input and frame artifacts | Near demo-level for reproducible Doom execution |
| OS depth | Multiboot memory map, heap selection, paging sidecar, physical allocator sidecar, raw-disk save sidecar | Strong for a toy OS, but still not a production OS |
| Reproducibility | ./evaluate.sh all, reports, screenshots, GIF, JSON score | More transparent and repeatable than a stage-only claim |
| Agent evidence | Three real Codex-native subagent audits plus durable metrics artifact | Much smaller than Google-scale multi-agent orchestration |
| Demo polish | QEMU/VNC GIF, static screenshots, and HTML previews | Below a polished live stage demo |

Safe public wording: "This project reaches its evaluator-backed Google-demo parity threshold for a QEMU/Freedoom OS-Doom reproduction." Avoid saying it is a Google-scale 93-agent recreation, a production OS, or a real-hardware stage demo.

The final playable gate boots the kernel in QEMU, loads embedded Freedoom data through the kernel-side WAD provider, initializes upstream doomgeneric, advances a sustained loop, injects movement/action keys through the QEMU monitor, verifies frame/key progress over serial, and generates a full 320x200 preview from the serial frame dump.
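The preview step can be pictured as turning a frame buffer into a PPM image (the artifact list includes artifacts/screenshots/demo.ppm). The sketch below assumes 32-bit 0x00RRGGBB pixels, which is assumed here to match doomgeneric's usual `DG_ScreenBuffer` layout, and uses a hypothetical `encode_ppm` helper; how the repository actually decodes the serial frame dump is not described in this README.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: encode width x height pixels in 0x00RRGGBB form
 * as a binary PPM (P6) into out. Returns the number of bytes written,
 * or 0 if out is too small. */
size_t encode_ppm(const uint32_t *px, int width, int height,
                  uint8_t *out, size_t out_cap) {
    int hdr = snprintf((char *)out, out_cap, "P6\n%d %d\n255\n", width, height);
    if (hdr < 0 || (size_t)hdr >= out_cap)
        return 0;
    size_t count = (size_t)width * (size_t)height;
    size_t need = (size_t)hdr + count * 3;
    if (need > out_cap)
        return 0;
    uint8_t *p = out + hdr; /* pixel data follows the text header */
    for (size_t i = 0; i < count; i++) {
        *p++ = (uint8_t)(px[i] >> 16); /* red */
        *p++ = (uint8_t)(px[i] >> 8);  /* green */
        *p++ = (uint8_t)px[i];         /* blue */
    }
    return need;
}
```

For a full 320x200 frame the output is 192,015 bytes: a 15-byte header plus 320 × 200 × 3 RGB bytes.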

The M9 live-display gate proves a separate visible-surface path: DG_DrawFrame() copies real Doom/Freedoom frames to a QEMU display surface, and the gate captures headless VNC-backed screendump frames. Because QEMU's -kernel loader does not provide Multiboot framebuffer metadata in this environment, the automated gate programs a VGA 320x200x256 graphics fallback and blits Doom frames into the QEMU-visible VGA surface, while preserving the framebuffer blit path for bootloaders that do provide one.
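A minimal sketch of such a VGA fallback, assuming 32-bit 0x00RRGGBB source pixels and a fixed RRRGGGBB (3-3-2) palette: mode 13h is 320x200 with one palette index per byte in a linear buffer at physical address 0xA0000, and the VGA DAC takes 6-bit color components through ports 0x3C8/0x3C9. The helper names and the 3-3-2 palette are assumptions; the repository's blit may map colors differently.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA13_W 320
#define VGA13_H 200

/* Quantize 0x00RRGGBB to an 8-bit RRRGGGBB palette index (assumed layout). */
static uint8_t rgb332(uint32_t px) {
    uint8_t r = (uint8_t)(px >> 16), g = (uint8_t)(px >> 8), b = (uint8_t)px;
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

/* 6-bit DAC components (0..63) for palette entry i under the RRRGGGBB
 * layout; a kernel would write these to ports 0x3C8/0x3C9. */
void rgb332_dac_entry(uint8_t i, uint8_t out[3]) {
    out[0] = (uint8_t)(((i >> 5) & 7) * 63 / 7);
    out[1] = (uint8_t)(((i >> 2) & 7) * 63 / 7);
    out[2] = (uint8_t)((i & 3) * 63 / 3);
}

/* Copy a 320x200 frame into a mode-13h style linear buffer. On real
 * hardware dst would point at 0xA0000; here it is any 64000-byte buffer. */
void blit_mode13(const uint32_t *frame, uint8_t *dst) {
    for (size_t i = 0; i < (size_t)VGA13_W * VGA13_H; i++)
        dst[i] = rgb332(frame[i]);
}
```

Because Doom renders through its own 256-color palette, loading that palette into the DAC directly would avoid quantization; the fixed 3-3-2 palette is used here only to keep the example self-contained.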

The M10 interactive-input gate proves the current polling PS/2 path can deliver deterministic Doom key press/release events for arrows, space, control, enter, and escape. M11 adds a separate interrupt-smoke kernel that loads an IDT, remaps the PIC, receives PIT IRQ0 ticks, and receives QEMU-injected PS/2 keyboard scancodes through IRQ1. M11b adds a dedicated compile-time Doom smoke path where DG_GetTicksMs/DG_SleepMs use the IRQ tick counter and DG_GetKey consumes the IRQ keyboard queue. The broader playable/live-display gates remain on the established polling path until a later migration gate expands the claim.
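The PS/2 Set 1 press/release convention behind M10 can be sketched as a small decoder: a break (release) code is the make code with bit 7 set, and the arrow keys arrive behind an 0xE0 prefix. The scancode values below are standard Set 1 codes; the `key_id` enum and `set1_feed` function are hypothetical and not the repository's driver API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key identifiers for the keys the M10 gate exercises. */
typedef enum {
    KEY_NONE, KEY_UP, KEY_DOWN, KEY_LEFT, KEY_RIGHT,
    KEY_SPACE, KEY_CTRL, KEY_ENTER, KEY_ESCAPE
} key_id;

typedef struct {
    bool extended; /* previous byte was the 0xE0 prefix */
} set1_decoder;

/* Feed one scancode byte. Returns true and fills *key / *pressed when a
 * complete event for one of the tracked keys has been decoded. */
bool set1_feed(set1_decoder *d, uint8_t byte, key_id *key, bool *pressed) {
    if (byte == 0xE0) {
        d->extended = true;
        return false;
    }
    bool ext = d->extended;
    d->extended = false;
    *pressed = (byte & 0x80) == 0; /* bit 7 set = break (release) */
    uint8_t make = byte & 0x7F;
    *key = KEY_NONE;
    if (ext) {
        switch (make) {
        case 0x48: *key = KEY_UP; break;
        case 0x50: *key = KEY_DOWN; break;
        case 0x4B: *key = KEY_LEFT; break;
        case 0x4D: *key = KEY_RIGHT; break;
        case 0x1D: *key = KEY_CTRL; break;  /* right control */
        case 0x1C: *key = KEY_ENTER; break; /* keypad enter */
        }
    } else {
        switch (make) {
        case 0x39: *key = KEY_SPACE; break;
        case 0x1D: *key = KEY_CTRL; break;  /* left control */
        case 0x1C: *key = KEY_ENTER; break;
        case 0x01: *key = KEY_ESCAPE; break;
        }
    }
    return *key != KEY_NONE;
}
```

For example, the byte sequence E0 48 E0 C8 decodes to an up-arrow press followed by an up-arrow release; untracked codes (such as the fake-shift E0 2A some keyboards emit) are ignored.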

Primary proof artifacts after running the gate:

  • artifacts/logs/all.log or artifacts/logs/all-post-commit.log
  • artifacts/logs/playable.log
  • artifacts/logs/memory-map.log
  • artifacts/logs/memory-heap.log
  • artifacts/logs/paging.log
  • artifacts/logs/physical-allocator.log
  • artifacts/logs/persistent-save.log
  • artifacts/logs/file-layer.log
  • artifacts/logs/swarm-evidence.log
  • artifacts/logs/public-demo.log
  • artifacts/logs/video-capture-demo.log
  • artifacts/logs/real-agent-swarm-run.log
  • artifacts/agent_run_log.md
  • artifacts/agent_metrics.json
  • artifacts/real_agent_swarm_run.md
  • artifacts/real_agent_swarm_metrics.json
  • artifacts/public_demo_report.md
  • artifacts/video_capture_report.md
  • artifacts/google_parity_report.md
  • artifacts/google_parity_score.json
  • artifacts/logs/interactive-input.log
  • artifacts/logs/irq.log
  • artifacts/logs/irq-doom.log
  • artifacts/logs/live-display.log
  • artifacts/screenshots/demo.ppm
  • artifacts/screenshots/demo.png
  • artifacts/screenshots/demo.html
  • artifacts/screenshots/live-display.ppm
  • artifacts/screenshots/live-display.png
  • artifacts/screenshots/live-display.html
  • artifacts/videos/demo.gif
  • artifacts/videos/live-display-frames/live-display-*.ppm
  • artifacts/final_report.md
  • artifacts/test_report.md
  • artifacts/demo_commands.md

Generated screenshots/logs are intentionally not treated as source; regenerate them locally with the commands below.

Implementation telemetry

As of the latest local Codex accounting pass on 2026-05-24, the project has the following measurable implementation footprint.

Codex session-log accounting for F:\codex_hermes_os_doom:

  • Time window: 2026-05-23 13:55:25 KST to 2026-05-24 13:41:36 KST.
  • Wall-clock span: about 23 hours, 46 minutes, 11 seconds.
  • Sessions: 24.
  • Model calls: 1,212.
  • Logged total tokens: 153,428,346.
  • Input tokens: 152,655,141.
  • Cached input tokens: 147,149,312.
  • Output tokens: 773,205.
  • Reasoning output tokens reported separately: 288,826.
  • Non-cached input plus output tokens: about 6,279,034.

Codex Goal accounting is tracked separately and should not be added directly to the session-log total because it can overlap with the same underlying work:

  • M12-M14 goal: 430,769 tokens over 1 hour, 19 minutes, 37 seconds.
  • M15 goal: 622,814 tokens over 1 hour, 15 minutes, 27 seconds.
  • Combined explicit Goal accounting for those two goals: 1,053,583 tokens over 2 hours, 35 minutes, 4 seconds.
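The accounting figures above are internally consistent, and the arithmetic can be checked mechanically. The following C sketch copies the numbers from the two lists; the struct and helper names are illustrative only.

```c
#include <stdint.h>

/* Session-log totals as reported above. */
typedef struct {
    uint64_t input, cached_input, output, total;
} token_totals;

/* Input tokens not served from cache, plus all output tokens. */
uint64_t non_cached_plus_output(token_totals t) {
    return (t.input - t.cached_input) + t.output;
}

/* Nonzero when the logged total equals input + output, i.e. the
 * separately reported reasoning tokens are not added on top of it. */
int total_matches_input_plus_output(token_totals t) {
    return t.input + t.output == t.total;
}

/* Convert hours/minutes/seconds to seconds for duration sums. */
uint32_t hms(uint32_t h, uint32_t m, uint32_t s) {
    return h * 3600u + m * 60u + s;
}

/* Span from a start time to an end time on the following day. */
uint32_t span_next_day(uint32_t start_s, uint32_t end_s) {
    return 86400u - start_s + end_s;
}
```

With the reported figures, non_cached_plus_output gives 6,279,034, span_next_day(hms(13, 55, 25), hms(13, 41, 36)) equals hms(23, 46, 11), and the two Goal durations sum to hms(2, 35, 4).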

Hermes/agent caveat: Codex-native subagents that wrote Codex session logs under this project cwd are included in the session-log total. The confirmed M15 real subagent slice accounts for 1,459,434 logged tokens across three Codex-native subagents, with an overlapping wall-clock window of about 2 minutes, 29 seconds. External Hermes-only token and cost telemetry is not exported into the repository artifacts; artifacts/agent_metrics.json and artifacts/real_agent_swarm_metrics.json explicitly record token/cost metrics as unavailable. Therefore the honest project total is the Codex session-log total above, plus an explicit caveat that any external Hermes usage outside those logs is currently unmeasured.

Quick start

From Linux or WSL:

./scripts/check_deps.sh
./evaluate.sh all

For the visible preview only:

make visible-demo

Then open:

artifacts/screenshots/demo.html

The preview is derived from QEMU serial output; it is not a hand-authored image.

For the M9 live-display artifact:

./evaluate.sh live-display

Then open:

artifacts/screenshots/live-display.html

For the M10 interactive-input gate:

./evaluate.sh interactive-input

For the M11 IRQ groundwork gate:

./evaluate.sh irq

For the M11b IRQ-backed Doom input/timer smoke:

./evaluate.sh irq-doom

For the M12a Multiboot memory-map smoke:

./evaluate.sh memory-map

For the M12b memory-map-backed heap smoke:

./evaluate.sh memory-heap

For the M15b paging sidecar smoke:

./evaluate.sh paging

For the M15c physical allocator sidecar smoke:

./evaluate.sh physical-allocator

For the M15d persistent save sidecar smoke:

./evaluate.sh persistent-save

For the M12c embedded file/config test-double smoke:

./evaluate.sh file-layer

For the M13 actual agent-run evidence gate:

./evaluate.sh swarm-evidence

For the M14 public demo package:

./evaluate.sh public-demo

For the M15e animated QEMU/VNC capture package:

./evaluate.sh video-capture-demo

For the M15f real Codex-native subagent evidence package:

./evaluate.sh real-agent-swarm-run

For the M15a Google-demo parity audit:

./evaluate.sh google-parity-audit

For a manual windowed smoke session after the gate has built the dedicated kernel:

qemu-system-x86_64 \
  -kernel build/kernel_doom_interactive_input.elf \
  -serial stdio \
  -display gtk \
  -no-reboot \
  -no-shutdown

Focus the QEMU window and use arrows, space, control, enter, and escape. The serial output should show DOOM_KEY_EVENT_FULL lines. On systems without GTK display support, use an available QEMU display backend such as SDL or VNC.

Milestone gates

Every milestone is executable through evaluate.sh:

| Gate | Purpose | Success marker |
|---|---|---|
| sanity | Repository/control-file structure | SANITY_OK |
| build | Freestanding kernel ELF/bin build | BUILD_OK |
| boot | QEMU Multiboot kernel boot and serial marker | BOOT_OK |
| display | Deterministic VGA text-mode smoke | DISPLAY_OK |
| input | QEMU monitor key injection into PS/2 path | INPUT_OK |
| timer | PIT-backed tick/sleep smoke | TIMER_OK |
| runtime | Freestanding string/memory/heap smoke | RUNTIME_OK |
| memory-map | Multiboot memory-map inspection in QEMU | MEMORY_MAP_OK |
| memory-heap | Memory-map-backed heap initialization in QEMU | MEMORY_HEAP_OK |
| paging | 4MiB identity-mapped paging sidecar smoke | PAGING_OK |
| physical-allocator | Memory-map-backed physical page free stack smoke | PHYS_ALLOC_OK |
| persistent-save | QEMU raw-disk save record persists across two boots | PERSIST_SAVE_OK |
| file-layer | Embedded config and volatile save-file test double in QEMU | FILE_LAYER_OK |
| swarm-evidence | Actual-records-only agent handoff/metrics evidence | SWARM_EVIDENCE_OK |
| public-demo | One-command playable/live-display public demo package | PUBLIC_DEMO_OK |
| video-capture-demo | Animated GIF capture from QEMU/VNC screendump frames | VIDEO_CAPTURE_DEMO_OK |
| real-agent-swarm-run | Actual Codex-native subagent verification evidence | REAL_AGENT_SWARM_RUN_OK |
| google-parity-audit | Google-demo parity score/report audit | GOOGLE_PARITY_AUDIT_OK |
| doom-init | Host compile/init of local doomgeneric bridge | DOOM_INIT_OK |
| doom-frame | Deterministic host DG_DrawFrame bridge | DOOM_FRAME_1_OK |
| doom-os-bridge | Kernel-linked local Doom platform bridge | DOOM_OS_BRIDGE_OK |
| freedoom-host | Full upstream doomgeneric + Freedoom host frame | FREEDOOM_HOST_OK |
| freedoom-qemu | Embedded Freedoom WAD metadata/lump provider in QEMU | FREEDOOM_QEMU_WAD_OK |
| doom-kernel-compile | Upstream doomgeneric freestanding compile | DOOM_KERNEL_COMPILE_OK |
| doom-kernel-link | Upstream doomgeneric objects linked into kernel | DOOM_KERNEL_LINK_OK |
| doom-kernel-demo | Real upstream Doom/Freedoom frame inside QEMU | DOOM_KERNEL_DEMO_OK |
| doom-kernel-input | QEMU keyboard input reaches Doom key events | DOOM_KERNEL_INPUT_OK |
| interactive-input | Multi-key Doom input press/release ordering through QEMU injection | INTERACTIVE_INPUT_OK |
| irq | IDT/PIC/PIT IRQ0 and keyboard IRQ1 smoke | IRQ_OK |
| irq-doom | Dedicated Doom smoke path using IRQ-backed timer ticks and keyboard queue | IRQ_DOOM_OK |
| playable | Sustained QEMU Doom/Freedoom loop with injected input | PLAYABLE_OK |
| live-display | Doom/Freedoom frames copied to a QEMU-visible live display surface | LIVE_DISPLAY_OK |
| hardening | Host-side WAD bounds and libc shim error-handling regression checks | HARDENING_OK |
| assert-panic | Freestanding assert diagnostics and NDEBUG no-evaluation behavior | ASSERT_PANIC_OK |
| all | Full regression | ALL_PASS |
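The persistent-save gate stores one record on a QEMU raw disk and reads it back on the next boot. The README does not describe the on-disk format, so the sketch below assumes a hypothetical single-sector layout (magic, boot counter, payload length, FNV-1a checksum, payload) and leaves the actual sector I/O out of scope. The checksum is what lets the second boot tell a valid record apart from a blank or corrupted sector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define SAVE_MAGIC 0x53415645u /* hypothetical magic value */
#define SAVE_PAYLOAD_MAX (SECTOR_SIZE - 16)

/* Hypothetical on-disk layout: four 32-bit header fields, then payload. */
typedef struct {
    uint32_t magic;
    uint32_t boot_count;
    uint32_t length;
    uint32_t checksum;
    uint8_t payload[SAVE_PAYLOAD_MAX];
} save_sector;

_Static_assert(sizeof(save_sector) == SECTOR_SIZE, "record must fill one sector");

/* 32-bit FNV-1a hash over the payload bytes. */
static uint32_t fnv1a(const uint8_t *p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Build the sector image that boot 1 would write to disk. */
bool save_encode(uint8_t sector[SECTOR_SIZE], uint32_t boot_count,
                 const void *data, uint32_t len) {
    if (len > SAVE_PAYLOAD_MAX)
        return false;
    save_sector s;
    memset(&s, 0, sizeof s);
    s.magic = SAVE_MAGIC;
    s.boot_count = boot_count;
    s.length = len;
    memcpy(s.payload, data, len);
    s.checksum = fnv1a(s.payload, len);
    memcpy(sector, &s, sizeof s);
    return true;
}

/* Validate the sector image that boot 2 reads back. */
bool save_decode(const uint8_t sector[SECTOR_SIZE], save_sector *out) {
    memcpy(out, sector, sizeof *out);
    return out->magic == SAVE_MAGIC &&
           out->length <= SAVE_PAYLOAD_MAX &&
           out->checksum == fnv1a(out->payload, out->length);
}
```

Copying the struct with memcpy relies on the little-endian x86 layout, which is acceptable for a single-architecture toy OS but would need explicit byte-order handling elsewhere.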

Project layout

PRD.md                  Product requirements and experiment framing
architecture.md         Current technical architecture
contracts/              Module/subagent contracts and pass criteria
progress.md             Milestone completion log
failure_log.md          Reproducible failures and fixes
evaluate.sh             Single validation entrypoint
Makefile                Build and convenience targets
kernel/                 Multiboot entry, kernel main, serial
linker.ld               32-bit freestanding kernel layout
drivers/                VGA display, PS/2 keyboard, PIT timer
runtime/                Minimal freestanding libc/heap primitives
ports/doomgeneric/      Local Doom bridge, Freedoom WAD provider, upstream vendor
tests/                  Shell-based evaluator gates
scripts/                Dependency check, Freedoom fetch, preview renderer
docs/agent-runs/        Hermes-style run records and handoff logs
artifacts/              Generated reports plus ignored runtime logs/assets/screenshots

Architecture summary

The kernel is intentionally small:

  1. QEMU loads a Multiboot v1 ELF through -kernel.
  2. kernel/entry.S enters 32-bit C code.
  3. kernel/kernel.c emits serial proof markers and runs smoke gates.
  4. Drivers provide the minimum display/input/timer surface.
  5. runtime/ supplies enough freestanding C behavior for the kernel and Doom bridge.
  6. ports/doomgeneric/ adapts upstream doomgeneric to host and kernel smoke environments.
  7. The final demo links verified Freedoom data into the kernel as an ELF binary object and serves it through a kernel-side WAD provider.
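The kernel-side WAD provider in step 7 has to find lumps inside an in-memory WAD. A WAD starts with a 12-byte header (`IWAD` or `PWAD`, then a lump count and a directory offset, both little-endian 32-bit) and contains a directory of 16-byte entries (file offset, size, 8-byte NUL-padded name). The sketch below looks up a lump with the kind of bounds checks the hardening gate targets. `wad_find_lump` is a hypothetical name, the name match is case-sensitive for simplicity, and in a kernel the `wad`/`wad_len` inputs could come from the `_binary_..._start`/`_end` symbols that GNU objcopy generates for an embedded binary file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Find lump `name` in an in-memory WAD. On success sets *data / *size
 * and returns 0; returns -1 for a malformed WAD or a missing lump. */
int wad_find_lump(const uint8_t *wad, size_t wad_len, const char *name,
                  const uint8_t **data, uint32_t *size) {
    if (wad_len < 12)
        return -1;
    if (memcmp(wad, "IWAD", 4) != 0 && memcmp(wad, "PWAD", 4) != 0)
        return -1;
    uint32_t count = rd_le32(wad + 4);
    uint32_t dir = rd_le32(wad + 8);
    if (dir > wad_len || count > (wad_len - dir) / 16)
        return -1; /* directory would run past the buffer */

    char key[8] = {0}; /* lump names are NUL-padded to 8 bytes */
    for (size_t i = 0; i < sizeof key && name[i] != '\0'; i++)
        key[i] = name[i];

    for (uint32_t i = count; i-- > 0;) { /* last match wins, as in Doom */
        const uint8_t *e = wad + dir + (size_t)i * 16;
        if (memcmp(e + 8, key, 8) != 0)
            continue;
        uint32_t pos = rd_le32(e), len = rd_le32(e + 4);
        if (pos > wad_len || len > wad_len - pos)
            return -1; /* lump data would run past the buffer */
        *data = wad + pos;
        *size = len;
        return 0;
    }
    return -1;
}
```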

The evaluator is serial-log-first: a gate passes only when the expected marker appears in QEMU/host output.
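The serial-log-first rule can be sketched as a scan of captured output for a success marker such as PLAYABLE_OK or ALL_PASS. This hypothetical C helper requires the marker to occupy a whole line and tolerates the CRLF line endings serial consoles often emit; whether evaluate.sh uses a substring search, grep, or something stricter is not stated in this README.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `marker` appears as a complete line in `text`.
 * A trailing '\r' (serial CRLF output) is ignored on each line. */
bool log_has_marker_line(const char *text, const char *marker) {
    size_t mlen = strlen(marker);
    const char *line = text;
    while (*line) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        if (len > 0 && line[len - 1] == '\r')
            len--;
        if (len == mlen && memcmp(line, marker, mlen) == 0)
            return true;
        if (!nl)
            break;
        line = nl + 1; /* advance to the next line */
    }
    return false;
}
```

Whole-line matching avoids accepting a longer line that merely contains the marker, at the cost of rejecting markers printed with extra text on the same line.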

Legal/game data notes

  • This repository must not contain commercial Doom WADs.
  • The harness downloads and verifies Freedoom for legal/free content testing.
  • artifacts/assets/freedoom/ is local generated state and is ignored by git.
  • Vendored doomgeneric sources are kept under ports/doomgeneric/upstream/ with upstream metadata and license files.

Known limits

  • This is a toy OS and demo harness, not a production operating system.
  • There is no POSIX environment, multitasking, networking, sound, or production filesystem.
  • M12a parses and reports the Multiboot memory map.
  • M12b initializes the existing bump allocator from a selected Multiboot usable region in a dedicated smoke gate.
  • M12c adds an embedded config and volatile save-file test double, not persistent disk storage.
  • M13 verifies actual evidence records only; it does not fabricate a new swarm run.
  • M14 packages a reproducible public demo.
  • M15b proves bounded identity-mapped paging in a sidecar kernel, not virtual memory isolation or Doom-path paging migration.
  • M15c proves a bounded physical page free stack in a sidecar kernel, not Doom-path memory migration or a production memory manager.
  • M15d proves one raw-disk save record across two QEMU boots, not Doom save migration or a filesystem.
  • M15e/M15f add GIF capture and actual Codex-native subagent evidence while explicitly avoiding Google-scale 93-agent and production claims.
  • Sound is intentionally no-op in the current smoke/demo path.
  • The automated final proof is still serial-log/checksum anchored; M9 adds QEMU display-surface screendumps and a VNC-derived GIF, using a VGA 320x200x256 graphics fallback because QEMU -kernel does not provide Multiboot framebuffer metadata here.
  • M10 proves polling PS/2 Set 1 press/release ordering through QEMU input. M11 proves IRQ delivery in a dedicated sidecar kernel. M11b proves a dedicated Doom smoke path can consume IRQ-backed timer ticks and keyboard events, but the broader playable/live-display loops have not yet migrated to IRQ-backed input/timer.
  • QEMU is required for meaningful OS validation; make build alone is not enough proof.
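A physical page free stack like the one M15c proves can be sketched in a few lines: every whole 4 KiB page from the usable memory-map regions is pushed onto a stack, allocation pops, and freeing pushes. This is a hypothetical host-testable version (fixed-capacity array, page 0 withheld so that 0 can signal exhaustion, no double-free detection), not the repository's allocator.

```c
#include <stddef.h>
#include <stdint.h>

#define PHYS_PAGE_SIZE 4096u

/* Fixed-capacity stack of free physical page addresses. */
typedef struct {
    uint64_t *slots; /* storage for free page addresses */
    size_t cap;      /* number of slots */
    size_t top;      /* number of pages currently free */
} page_stack;

/* Push every whole 4 KiB page in [start, end); returns pages added. */
size_t pages_add_region(page_stack *s, uint64_t start, uint64_t end) {
    uint64_t p = (start + PHYS_PAGE_SIZE - 1) & ~(uint64_t)(PHYS_PAGE_SIZE - 1);
    if (p == 0)
        p = PHYS_PAGE_SIZE; /* keep 0 reserved as the "out of pages" value */
    size_t added = 0;
    while (p < end && end - p >= PHYS_PAGE_SIZE && s->top < s->cap) {
        s->slots[s->top++] = p;
        p += PHYS_PAGE_SIZE;
        added++;
    }
    return added;
}

/* Pop one free page, or return 0 when the stack is empty. */
uint64_t page_alloc(page_stack *s) {
    return s->top ? s->slots[--s->top] : 0;
}

/* Return a page; rejects misaligned addresses and a full stack. */
int page_free(page_stack *s, uint64_t addr) {
    if (addr % PHYS_PAGE_SIZE != 0 || s->top >= s->cap)
        return -1;
    s->slots[s->top++] = addr;
    return 0;
}
```

A stack gives O(1) allocation and free with no fragmentation bookkeeping, which suits single-page requests; contiguous multi-page allocations would need a different structure such as a bitmap or buddy allocator.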

Reproduction evidence

  • Agent-run evidence and handoffs: docs/agent-runs/
  • Final report: artifacts/final_report.md
  • Demo command reference: artifacts/demo_commands.md
  • Test report: artifacts/test_report.md
  • Detailed milestone and failure history: progress.md, failure_log.md

Next roadmap

The current project is a serial-verified reproduction, not a full Google I/O demo equivalent. The scale-up plan is documented in docs/plans/2026-05-23-google-demo-level-roadmap.md.

Setup

See docs/setup.md for package installation notes. In short, Ubuntu/WSL needs build tools, 32-bit GCC support, binutils, file, curl, and QEMU:

sudo apt-get update
sudo apt-get install -y build-essential gcc-multilib binutils file curl qemu-system-x86

Then:

./scripts/check_deps.sh
./evaluate.sh all

About

Google I/O-inspired Codex x Hermes experiment: a tiny QEMU-bootable OS that runs Doom/Freedoom with evaluator proof.
