Test what the CODE does. Internal behavior, edge cases, precise failure injection. These are fast and run everywhere.
- Use `t.TempDir()` for filesystem tests
- Use `require` for preconditions (fail immediately), `assert` for checks
- Construct exact broken states in Go: corrupt files, concurrent writes, duplicate IDs, missing directories
- No env vars for controlling behavior — pass dependencies directly
- Same package as the code under test (access to unexported functions)

```go
func TestBeadStore_CorruptLine(t *testing.T) {
	dir := t.TempDir()
	// One valid record followed by a line that is not JSON.
	data := []byte("{\"id\":\"gc-1\"}\nthis is not json\n")
	require.NoError(t, os.WriteFile(filepath.Join(dir, "beads.jsonl"), data, 0o644))

	store := beads.NewStore(dir)
	items, err := store.List()
	require.NoError(t, err)
	assert.Len(t, items, 1) // skips bad line, doesn't crash
}
```

When to use: corrupted data, concurrent writes, specific error types, double-claim conflicts, rollback behavior, boundary conditions.
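
The behavior that test pins down (skip the bad line, keep the good ones) could look roughly like the sketch below. `parseJSONL` and the `bead` struct are hypothetical names used only for illustration; they are not taken from the real `beads` package, whose implementation is not shown here.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// bead is a hypothetical record type; the real beads schema is not shown here.
type bead struct {
	ID string `json:"id"`
}

// parseJSONL decodes one JSON object per line. Lines that fail to decode are
// counted and skipped instead of aborting the whole read.
func parseJSONL(input string) (items []bead, skipped int, err error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var b bead
		if json.Unmarshal([]byte(line), &b) != nil {
			skipped++
			continue
		}
		items = append(items, b)
	}
	return items, skipped, sc.Err()
}

func main() {
	items, skipped, err := parseJSONL("{\"id\":\"gc-1\"}\nthis is not json\n")
	fmt.Println(len(items), skipped, err)
}
```

Returning the skipped count is one possible design choice: it lets a unit test assert that a corrupt line was noticed rather than silently lost.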

`make test` and `make test-cover` now follow this boundary strictly: they
run the fast unit loop only, with `GC_FAST_UNIT=1` gating slow `cmd/gc`
process scenarios. Slow process-backed cases
such as managed Dolt recovery, real `bd` lifecycle, tutorial regression
scripts, and the large `gc-beads-bd` provider suite are routed out of the
default path so local `make check` and CI `Check` stay focused on quick
feedback. If you need that full `cmd/gc` scenario coverage locally, run
`make test-cmd-gc-process`. In CI, the required non-short path is the
`test-integration-packages` shard. If you need the heavier package
coverage sweep locally, use `make test-integration-packages-cover` or
`make test-integration-shards-cover`. As a result, `coverage.txt` is the
fast unit-only baseline; the integration contribution comes from the
shard-specific `coverage.integration-*.txt` profiles and their matching
Codecov flags.
Test what the USER sees. Run the real gc binary, assert on stdout/stderr.
These are the tutorial regression tests — each .txtar corresponds to a
tutorial's shell interactions.
- Uses `github.com/rogpeppe/go-internal/testscript`
- Testscript defaults missing backend env vars to local fakes: `GC_SESSION=fake`, `GC_BEADS=file`, `GC_DOLT=skip`
- Fakes have at most three modes per dependency:
  - `GC_SESSION=fake`: works, but in-memory
  - `GC_SESSION=fail`: all operations return errors
  - `GC_SESSION=tmux`: use real tmux explicitly
- `!` prefix means command should fail
- `stdout`/`stderr` assert on output
- `-- filename --` blocks create test fixtures

```
env GC_SESSION=fake
exec gc init $WORK/bright-lights
stdout 'City initialized'
exec gc rig add $WORK/tower-of-hanoi
stdout 'Adding rig'
exec bd create 'Build a Tower of Hanoi app'
stdout 'status: open'
-- $WORK/tower-of-hanoi/.git/HEAD --
ref: refs/heads/main
```

When to use: CLI output format, command success/failure, user-facing error messages, tutorial flows end to end.
The env var rule: if you need more than two env vars to set up a failure scenario, it's a unit test, not a testscript. In testscript, omitting the session/beads env vars now means "use the fake defaults," not "use real tmux."
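
The "missing means fake" default can be pictured as a small fill-in step that runs before each script. This is a sketch under assumptions: `applyFakeDefaults` and `fakeDefaults` are hypothetical names, and a plain map stands in for the script environment. Only the three variable names and values come from the text above.

```go
package main

import "fmt"

// fakeDefaults mirrors the defaults named above: fake session, file-backed
// beads, and Dolt skipped.
var fakeDefaults = map[string]string{
	"GC_SESSION": "fake",
	"GC_BEADS":   "file",
	"GC_DOLT":    "skip",
}

// applyFakeDefaults fills in only the variables the script left unset, so an
// explicit value such as GC_SESSION=tmux still wins.
func applyFakeDefaults(env map[string]string) {
	for k, v := range fakeDefaults {
		if _, ok := env[k]; !ok {
			env[k] = v
		}
	}
}

func main() {
	env := map[string]string{"GC_SESSION": "tmux"} // script opted into real tmux
	applyFakeDefaults(env)
	fmt.Println(env["GC_SESSION"], env["GC_BEADS"], env["GC_DOLT"])
}
```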
Test that real pieces fit together. Need real tmux, real filesystem, real agent sessions. Run separately — not in CI by default.

```go
//go:build integration

func TestRealTmuxSession(t *testing.T) {
	// actually creates and kills tmux sessions
}
```

When to use: proving the fakes are honest, smoke testing the real infra, testing tmux session lifecycle with real processes.

Run with: `go test -tags integration ./test/...`

Supervisor binary smoke test (`test/integration/huma_binary_test.go`):
builds `gc`, boots the supervisor against an isolated `GC_HOME`, waits
for `/health`, fetches `/openapi.json`, and runs `gc cities` as a
subprocess. Proves that the whole stack (build tags, Huma registration,
listener bootstrap, socket paths) wires end-to-end through a real
binary. Run with `make test-integration-huma` or
`go test -tags integration -run TestHumaBinary ./test/integration/`.
These tests keep the public docs surface honest.
They currently verify:
- tutorial command coverage against the corresponding txtar tests
- local Markdown link targets across the repo docs
- Mintlify navigation page references in `docs/docs.json`

Run them directly with:

```
go test ./test/docsync
```

Gas City's own tests for this code live in `gascity_test.go` (adapter
unit tests) and `test/integration/bdstore_test.go` (conformance).

Low-level (`internal/runtime/tmux/tmux_test.go`): test raw tmux
operations (`NewSession`, `HasSession`, `KillSession`) directly against the
tmux library. Session names use the `gt-test-` prefix.

End-to-end (`test/integration/`): build the real `gc` binary and
run it against real tmux. Validates the tutorial experience: `gc init`,
`gc start`, `gc stop`, bead CRUD.

BdStore conformance (`test/integration/bdstore_test.go`): runs the
beads conformance suite against BdStore backed by a real dolt server.
Proves the full stack: dolt server → bd CLI → BdStore → beads.Store.
Requires `dolt` and `bd` installed; skips otherwise.

Test cities use a `gctest-<8hex>` naming prefix so sessions are
visually distinct from real gascity sessions (`gc-<cityname>-<agent>`).
Three layers prevent orphan sessions:
- Pre-sweep (TestMain): `KillAllTestSessions()` kills all `gc-gctest-*` sessions from prior crashed runs.
- Per-test (`t.Cleanup`): the `tmuxtest.Guard` kills sessions matching its specific city prefix.
- Post-sweep (TestMain defer): final sweep after all tests.

```go
guard := tmuxtest.NewGuard(t) // generates "gctest-a1b2c3d4", registers cleanup
cityDir := setupRunningCity(t, guard)
session := guard.SessionName("mayor") // "gc-gctest-a1b2c3d4-mayor"
if !guard.HasSession(session) { ... }
```

- `test/tmuxtest/guard.go`: reusable session guard helper
- `RequireTmux(t)`: skips test if tmux not installed
- `KillAllTestSessions(t)`: package-level sweep for TestMain
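
The naming scheme behind the guard can be sketched without tmux. The helpers below (`newCityName`, `sessionName`, `isTestSession`) are hypothetical stand-ins, not the actual `tmuxtest` API; only the `gctest-<8hex>` and `gc-<cityname>-<agent>` shapes come from the text above, and the choice of `crypto/rand` is an assumption.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// newCityName returns "gctest-" followed by 8 random hex characters.
func newCityName() (string, error) {
	buf := make([]byte, 4) // 4 bytes encode to 8 hex characters
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "gctest-" + hex.EncodeToString(buf), nil
}

// sessionName follows the gc-<cityname>-<agent> form.
func sessionName(city, agent string) string {
	return "gc-" + city + "-" + agent
}

// isTestSession reports whether a sweep should consider the session a
// leftover from a test run.
func isTestSession(name string) bool {
	return strings.HasPrefix(name, "gc-gctest-")
}

func main() {
	city, err := newCityName()
	if err != nil {
		panic(err)
	}
	name := sessionName(city, "mayor")
	fmt.Println(name, isTestSession(name))
}
```

Because the test prefix is fixed and the suffix is random, a sweep can match on the prefix alone while concurrent test runs still get distinct cities.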
Test that components are called in the right order. Conformance tests verify each component's contract in isolation; coordination tests verify the wiring between components.
What coordination tests prove:
- Lifecycle ordering (ensure-ready before init, shutdown after agents stop)
- Hook survival (hooks reinstalled after init wipes them)
- Qualification consistency (all effective methods use the same name form)
What they don't prove:
- Component correctness — that's what conformance tests cover
- Full E2E behavior — that's integration tests
The `exec:<spy>` pattern:

```go
t.Setenv("GC_BEADS", "exec:"+spyScript)
```

The spy script logs every operation (`ensure-ready`, `init <dir> <prefix>`,
`shutdown`) to a file. Tests read the log and assert on ordering and
arguments. This exercises the real lifecycle code paths in
`beads_provider_lifecycle.go` without needing Dolt.

```go
// Verify ensure-ready precedes init.
ops := readOpLog(t, logFile)
if len(ops) == 0 {
	t.Fatal("spy log is empty")
}
if !strings.HasPrefix(ops[0], "ensure-ready") {
	t.Fatalf("first op should be ensure-ready, got: %s", ops[0])
}
```

When to write a coordination test vs conformance test:

| Question | Test type |
|---|---|
| Does the beads store handle corrupt JSONL? | Conformance |
| Does `gc start` call ensure-ready before init? | Coordination |
| Does the mail provider deliver to the right inbox? | Conformance |
| Do all three `Effective*` methods use the qualified name? | Coordination |
| Does the session provider start a session correctly? | Conformance |
| Does `gc stop` shut down beads after agents? | Coordination |

The overtesting line: don't re-verify contracts that conformance tests already cover. Coordination tests check call ordering and argument plumbing, not that individual operations produce correct results.
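
An ordering assertion over the spy log can be generalized beyond checking `ops[0]`. The following is a minimal sketch: `opIndex` and `checkOrder` are hypothetical helpers, and the log lines are made up in the shape the spy script is described as writing.

```go
package main

import (
	"fmt"
	"strings"
)

// opIndex returns the index of the first logged operation that starts with
// prefix, or -1 if none does.
func opIndex(ops []string, prefix string) int {
	for i, op := range ops {
		if strings.HasPrefix(op, prefix) {
			return i
		}
	}
	return -1
}

// checkOrder returns an error unless both operations were logged and the
// first one does not appear after the second.
func checkOrder(ops []string, first, second string) error {
	i, j := opIndex(ops, first), opIndex(ops, second)
	switch {
	case i < 0:
		return fmt.Errorf("%q never logged", first)
	case j < 0:
		return fmt.Errorf("%q never logged", second)
	case i > j:
		return fmt.Errorf("%q logged after %q", first, second)
	}
	return nil
}

func main() {
	// Illustrative log content; a real test would read this from the spy's file.
	log := "ensure-ready\ninit /tmp/city gc\nshutdown\n"
	ops := strings.Split(strings.TrimSpace(log), "\n")
	fmt.Println(checkOrder(ops, "ensure-ready", "init"))
	fmt.Println(checkOrder(ops, "shutdown", "init"))
}
```

Note that the helper checks only ordering and presence, which matches the scope described above: it says nothing about whether each operation produced a correct result.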
Every provider interface has a conformance test suite that validates the
contract against all implementations. The suites live in `conformance.go`
files inside `*test` packages and are imported by each implementation's test file:

| Interface | Conformance suite | Implementations tested |
|---|---|---|
| `beads.Store` | `internal/beads/beadstest/conformance.go` | MemStore, FileStore, BdStore |
| `runtime.Provider` | `internal/runtime/runtimetest/conformance.go` | Fake, tmux, subprocess, exec, k8s |
| `mail.Provider` | `internal/mail/mailtest/conformance.go` | beadmail, exec |
| `events.Recorder` | `internal/events/eventstest/conformance.go` | FileRecorder, exec |

Conformance tests verify the behavioral contract (create/read/update/delete, error handling, concurrency). They deliberately don't test lifecycle ordering or cross-provider coordination — that's what coordination tests are for.
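
The shape of a conformance suite (one set of checks, many implementations) can be sketched as a function that takes a factory. Everything here is hypothetical: the `Store` interface is far smaller than `beads.Store`, and `CheckStore` returns an error instead of taking a `*testing.T` so that the sketch runs as a plain program.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical two-method contract used only for this sketch.
type Store interface {
	Create(id string) error
	Get(id string) (string, error)
}

var (
	ErrNotFound  = errors.New("not found")
	ErrDuplicate = errors.New("duplicate id")
)

// CheckStore runs the same contract checks against any implementation the
// factory produces.
func CheckStore(newStore func() Store) error {
	s := newStore()
	if _, err := s.Get("a"); !errors.Is(err, ErrNotFound) {
		return fmt.Errorf("Get on empty store: want ErrNotFound, got %v", err)
	}
	if err := s.Create("a"); err != nil {
		return fmt.Errorf("Create: %v", err)
	}
	if got, err := s.Get("a"); err != nil || got != "a" {
		return fmt.Errorf("Get after Create: got %q, %v", got, err)
	}
	if err := s.Create("a"); !errors.Is(err, ErrDuplicate) {
		return fmt.Errorf("duplicate Create: want ErrDuplicate, got %v", err)
	}
	return nil
}

// memStore is an implementation that honors the contract.
type memStore struct{ ids map[string]bool }

func newMemStore() Store { return &memStore{ids: map[string]bool{}} }

func (m *memStore) Create(id string) error {
	if m.ids[id] {
		return ErrDuplicate
	}
	m.ids[id] = true
	return nil
}

func (m *memStore) Get(id string) (string, error) {
	if !m.ids[id] {
		return "", ErrNotFound
	}
	return id, nil
}

// lenientStore breaks the contract by accepting duplicate IDs.
type lenientStore struct{ memStore }

func (l *lenientStore) Create(id string) error {
	l.ids[id] = true
	return nil
}

func main() {
	fmt.Println("memStore:", CheckStore(newMemStore))
	fmt.Println("lenientStore:", CheckStore(func() Store {
		return &lenientStore{memStore{ids: map[string]bool{}}}
	}))
}
```

Passing a factory rather than an instance gives each check a fresh store, so one implementation's leftover state cannot mask a contract violation.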
For the new 0.15 config surface, use
docs/packv2/doc-conformance-matrix.md as the release-gating ledger for
what should block CI now, what should start blocking once warning plumbing
lands, and what remains tracked but non-gating.
All five provider seams, their lifecycle dependencies, and coordination test coverage. This table is the checklist for new provider implementations.
| Seam | Implementations | Lifecycle deps | Coordination tested? |
|---|---|---|---|
| Runtime (`runtime.Provider`) | tmux, exec, k8s, fake | None (stateless start/stop) | Via lifecycle start order test |
| Beads (`beads.Store`) | MemStore, FileStore, BdStore | ensure-ready → init → hooks | `TestLifecycleCoordination_*` |
| Mail (`mail.Provider`) | beadmail, exec | Depends on beads store | No: not a lifecycle seam; conformance sufficient |
| Events (`events.Recorder`) | FileRecorder, exec | None (append-only) | No: stateless append, conformance sufficient |
| Dolt (internal) | `dolt.EnsureRunning`, `dolt.StopCity` | ensure → init, stop after agents | Covered by beads lifecycle (exec spy) |

Adding a new provider: When adding a new implementation of any seam:
- Run the conformance suite against it (mandatory)
- If the provider has lifecycle dependencies (startup ordering, shutdown sequencing), add a coordination test using the `exec:<spy>` pattern
- Update this table

| Question you're testing | Tier |
|---|---|
| Does `bd create` print the right output? | Testscript |
| Does `gc start` fail gracefully without tmux? | Testscript (`GC_SESSION=fail`) |
| Does `gc rig add` fail for a missing path? | Testscript (real missing path) |
| Does the beads store skip corrupted JSONL lines? | Unit test |
| Does claim return `ErrAlreadyClaimed` on double-claim? | Unit test |
| Does concurrent bead creation avoid corruption? | Unit test |
| Does startup roll back if step 3 of 5 fails? | Unit test |
| Does a real tmux session start and respond to send-keys? | Integration |

| Package | Purpose |
|---|---|
| `testing` (stdlib) | `t.TempDir()`, `t.Run()`, subtests, build tags |
| `github.com/stretchr/testify` | `assert` and `require`: cleaner assertions |
| `github.com/rogpeppe/go-internal/testscript` | Tutorial regression from `.txtar` files |

No mock libraries. No gomock. No mockgen. Every test double is a
hand-written concrete type that lives in the same package as the
interface it implements.
| Double | Interface | Package | Strategy |
|---|---|---|---|
| `runtime.Fake` | `runtime.Provider` | `internal/runtime` | In-memory state + spy + broken mode |
| `fsys.Fake` | `fsys.FS` | `internal/fsys` | In-memory maps + spy + per-path error injection |
| `beads.MemStore` | `beads.Store` | `internal/beads` | Real logic, in-memory backing (also used by FileStore internally) |

Every fake records calls as `[]Call` structs. Tests verify both the
result AND the call sequence:

```go
sp := runtime.NewFake()
_ = sp.Start(context.Background(), "mayor", runtime.Config{})
_ = sp.Attach("mayor")
// Verify call sequence recorded by the fake runtime.
want := []string{"Start", "Attach"}
for i, c := range sp.Calls {
	if c.Method != want[i] { ... }
}
```

Three patterns, used where they fit:

Per-path errors (`fsys.Fake`), fine-grained, fail specific operations:

```go
f := fsys.NewFake()
f.Errors["/city/rigs"] = fmt.Errorf("disk full")
```

Modal errors (`runtime.Fake`), all-or-nothing broken mode:

```go
f := runtime.NewFake()
f.Broken = true // Start/Stop/Attach and related operations return errors
```

Every fake has a compile-time assertion in its test file:

```go
var _ Provider = (*Fake)(nil)
```

Fakes are exported types in the same package as their interface. This
makes them importable by cross-package unit tests (e.g., `cmd/gc`
imports `runtime.NewFake()`).
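
Put together, a hand-written fake with call recording, a broken mode, and the compile-time assertion might look like the sketch below. The two-method `Provider` interface is a hypothetical stand-in and much smaller than `runtime.Provider`; the field names `Broken` and `Calls` follow the snippets above, while everything else is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical interface used only for this sketch.
type Provider interface {
	Start(name string) error
	Stop(name string) error
}

// Call records one method invocation on the fake.
type Call struct {
	Method string
	Name   string
}

// Fake keeps state in memory, records every call, and fails every operation
// while Broken is set.
type Fake struct {
	Broken  bool
	Calls   []Call
	running map[string]bool
}

// Compile-time check that *Fake satisfies Provider.
var _ Provider = (*Fake)(nil)

func NewFake() *Fake { return &Fake{running: map[string]bool{}} }

func (f *Fake) Start(name string) error {
	f.Calls = append(f.Calls, Call{"Start", name}) // record before failing
	if f.Broken {
		return errors.New("fake: broken")
	}
	f.running[name] = true
	return nil
}

func (f *Fake) Stop(name string) error {
	f.Calls = append(f.Calls, Call{"Stop", name})
	if f.Broken {
		return errors.New("fake: broken")
	}
	delete(f.running, name)
	return nil
}

func main() {
	f := NewFake()
	_ = f.Start("mayor")
	f.Broken = true
	err := f.Stop("mayor")
	fmt.Println(f.Calls, err)
}
```

Recording the call before checking `Broken` is a deliberate choice in this sketch: tests can then assert that a failing operation was still attempted.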
Every CLI command splits into two functions:
- `cmdFoo()`: wires up real dependencies (reads cwd, loads config, calls `newSessionProvider()`), then calls `doFoo()`.
- `doFoo()`: pure logic. Accepts all dependencies as arguments. Returns an exit code.

Unit tests call `doFoo()` directly with fakes:

```go
sp := runtime.NewFake()
code := doSessionAttach(sp, "mayor", &stdout, &stderr)
```

Testscript tests call `gc foo` which routes through `cmdFoo()` →
`doFoo()`.

| I want to test... | Call |
|---|---|
| Pure logic with injected failures | doFoo() with a fake |
| CLI output format, exit codes | exec gc foo in txtar |
| That the factory wiring is correct | exec gc foo in txtar with GC_SESSION=fake |
When a function's argument construction is the behavior under test (flag injection, command building), extract the subprocess call behind an executor interface. This separates "what arguments are built" from "running a real binary."
When to use: Code that constructs `exec.Command` arguments
conditionally (socket flags, env vars, flag lists). The test verifies
the args array, not the subprocess outcome.

When NOT to use: When the logic under test is the orchestration
sequence (which methods are called in what order). Use the `startOps`
interface pattern instead.

Example: `tmux.executor`. `fakeExecutor` captures the `[]string`
args passed to each tmux command. Tests verify socket flags, UTF-8
flags, and argument ordering without a tmux binary.
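
A minimal sketch of the executor seam, assuming a one-method interface: the `executor`, `fakeExecutor`, and `client` definitions below are illustrative and may differ from the real `tmux` package. The only tmux facts relied on are the `-L <socket-name>` option and the `new-session -d -s <name>` subcommand form.

```go
package main

import "fmt"

// executor is a hypothetical seam placed in front of the subprocess call.
type executor interface {
	Run(args ...string) error
}

// fakeExecutor captures argument lists instead of starting a process.
type fakeExecutor struct {
	calls [][]string
}

func (f *fakeExecutor) Run(args ...string) error {
	// Copy the slice so later mutation by the caller cannot alter the record.
	f.calls = append(f.calls, append([]string(nil), args...))
	return nil
}

// client builds tmux-style argument lists. The -L socket flag is added only
// when a socket name is configured.
type client struct {
	exec   executor
	socket string
}

func (c *client) newSession(name string) error {
	var args []string
	if c.socket != "" {
		args = append(args, "-L", c.socket)
	}
	args = append(args, "new-session", "-d", "-s", name)
	return c.exec.Run(args...)
}

func main() {
	fx := &fakeExecutor{}
	c := &client{exec: fx, socket: "gctest"}
	if err := c.newSession("mayor"); err != nil {
		panic(err)
	}
	fmt.Println(fx.calls[0])
}
```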
Testscript needs fakes too, but can't inject Go objects. The CLI has factory functions that check env vars and return the appropriate implementation.
Current env vars:
| Env var | Values | Factory | Used by |
|---|---|---|---|
| `GC_SESSION` | `fake`, `fail`, (absent) | `newSessionProvider()` in `cmd/gc/providers.go` | `cmd_start.go`, `cmd_stop.go`, `cmd_agent.go` |
| `GC_BEADS` | `file`, `bd`, (absent) | `beadsProvider()` in `cmd/gc/providers.go` | bead commands, `cmd_init.go`, `cmd_start.go` |
| `GC_DOLT` | `skip`, (absent) | N/A (checked inline) | dolt lifecycle in `cmd_init.go`, `cmd_start.go`, `cmd_stop.go` |

Design rules for env var fakes:
- The fake never reads env vars itself — the factory function does
- At most three modes per dependency: works, fails, real
- If you need more than two env vars to set up a test scenario, it belongs in a unit test, not testscript
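
The first two rules can be illustrated with a factory that is the only code aware of the mode string. This is a sketch, not the contents of `cmd/gc/providers.go`: `newProvider` and the three provider types are hypothetical, and mapping any unrecognized or empty value to the real backend is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Provider is a hypothetical one-method interface for this sketch.
type Provider interface {
	Start(name string) error
}

// fakeProvider works; note that it never looks at the environment.
type fakeProvider struct{}

func (fakeProvider) Start(string) error { return nil }

// failProvider fails every operation.
type failProvider struct{}

func (failProvider) Start(string) error { return errors.New("session provider unavailable") }

// realProvider stands in for the real backend, which this sketch omits.
type realProvider struct{}

func (realProvider) Start(string) error { return errors.New("real backend not part of this sketch") }

// newProvider is the single place that interprets the mode: works, fails, real.
func newProvider(mode string) Provider {
	switch mode {
	case "fake":
		return fakeProvider{}
	case "fail":
		return failProvider{}
	default:
		return realProvider{}
	}
}

func main() {
	// Only the caller of the factory reads the environment.
	p := newProvider(os.Getenv("GC_SESSION"))
	fmt.Printf("%T\n", p)
}
```

Passing the mode in as a string keeps the factory itself testable from a unit test without touching process environment.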

`beads.MemStore` is not a test-only fake; it is a real `Store`
implementation backed by a slice. `FileStore` composes `MemStore`
internally for its in-memory state and adds persistence on top. This
makes `MemStore` usable both as a production building block and as a
test double for code that needs a `Store` without disk I/O.