
Testing

Ankur Nair edited this page Apr 19, 2026 · 1 revision


TitanX uses Vitest 4 as the test runner. The coverage target is ≥ 80%. CI enforces it; local runs don't block on it.


Quick start

bun run test           # watch mode
bun run test:coverage  # one-shot with coverage report

Watch mode is fast — Vitest picks up TypeScript changes and re-runs affected tests. Average feedback loop: 1-3 seconds.


Test file conventions

Co-located tests

For simple unit tests, co-locate .test.ts next to the module under test:

src/process/services/reasoningBank/
├── index.ts
└── index.test.ts

__tests__/ for cross-module tests

When a test spans multiple modules or exercises an integration path, put it in a __tests__/ directory:

src/process/services/__tests__/
└── fleet-integration.test.ts

Naming

  • <module>.test.ts — unit tests for <module>.ts
  • <module>.integration.test.ts — integration tests
  • <feature>.e2e.test.ts — end-to-end tests (rare; mostly run via bun run test:e2e)
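The naming convention above can be expressed as a small classifier, which is handy when scripting over test files. This is only a sketch: classifyTestFile and TestKind are hypothetical names, not part of the TitanX codebase.

```typescript
// Hypothetical helper: maps a file path to the test kind implied by its suffix.
type TestKind = 'unit' | 'integration' | 'e2e';

function classifyTestFile(path: string): TestKind | null {
  // Check the more specific suffixes first, since all three end in .test.ts.
  if (/\.e2e\.test\.ts$/.test(path)) return 'e2e';
  if (/\.integration\.test\.ts$/.test(path)) return 'integration';
  if (/\.test\.ts$/.test(path)) return 'unit';
  return null; // not a test file under this convention
}
```

Order matters here: testing for the plain .test.ts suffix first would classify every file as a unit test.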

What a good test looks like

import { describe, it, expect, beforeEach } from 'vitest';
import { SqliteTestHarness } from '@process/services/database/__tests__/testHarness';
import { storeTrajectory, findSimilarTrajectories } from '../index';

describe('reasoningBank', () => {
  let db: SqliteTestHarness;

  beforeEach(async () => {
    db = new SqliteTestHarness();
    await db.init();
  });

  describe('storeTrajectory', () => {
    it('stamps failure_pattern=1 when input.failurePattern is true', () => {
      const id = storeTrajectory(db.driver, {
        taskDescription: 'test task',
        steps: [{ toolName: 'git.diff', args: {}, result: '...', durationMs: 0 }],
        successScore: 0.3,
        failurePattern: true,
      });
      const row = db.driver
        .prepare('SELECT failure_pattern FROM reasoning_bank WHERE id = ?')
        .get(id) as { failure_pattern: number };
      expect(row.failure_pattern).toBe(1);
    });

    it('honors workspace_id isolation in retrieval', () => {
      storeTrajectory(db.driver, {
        taskDescription: 'project-a task',
        steps: [{ toolName: 'x', args: {}, result: 'r', durationMs: 0 }],
        successScore: 0.9,
        workspaceId: 'ws-a',
      });
      storeTrajectory(db.driver, {
        taskDescription: 'project-b task',
        steps: [{ toolName: 'x', args: {}, result: 'r', durationMs: 0 }],
        successScore: 0.9,
        workspaceId: 'ws-b',
      });

      const results = findSimilarTrajectories(db.driver, 'project', 10, 'ws-a');
      expect(results.map((r) => r.taskDescription)).toEqual([expect.stringContaining('project-a')]);
    });
  });
});

Principles on display:

  • Clear describe hierarchy
  • Each it tests one behavior
  • Setup in beforeEach
  • Arrange → Act → Assert structure
  • Assertions are specific (.toBe(1) not .toBeTruthy())

Test types

Unit tests (~70% of coverage)

Pure logic, no side effects.

it('formats cost correctly', () => {
  expect(formatCostCents(12345)).toBe('$123.45');
});

Fast, deterministic, no mocks needed (usually).
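For context, a function like the one under test might look as follows. The real formatCostCents is not shown on this page, so treat this as an assumed implementation that happens to satisfy the assertion above.

```typescript
// Assumed implementation for illustration; the project's version may differ.
function formatCostCents(cents: number): string {
  const sign = cents < 0 ? '-' : '';
  const abs = Math.abs(Math.round(cents));
  const dollars = Math.floor(abs / 100);
  const remainder = abs % 100;
  // Pad single-digit cents so 5 renders as "05".
  return `${sign}$${dollars}.${remainder < 10 ? '0' : ''}${remainder}`;
}
```

Because the function takes a number and returns a string with no I/O, the test needs no setup, harness, or mocks.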

Integration tests (~20% of coverage)

Multi-module flows with a real SQLite test harness + mocked external services.

it('routes trajectory through push → ingest → dream → broadcast → apply', async () => {
  const slaveDb = new SqliteTestHarness();
  const masterDb = new SqliteTestHarness();
  // ... set up slave + master, simulate enrollment ...
  storeTrajectory(slaveDb.driver, { /* ... */ });
  const envelope = buildLearningEnvelope(slaveDb.driver, 0, Date.now());
  ingestLearningEnvelope(masterDb.driver, 'slave-a', envelope);
  await runDreamPass(masterDb.driver);
  // ... verify consolidated_learnings has the expected row ...
});

Slower but high-value. Exercise realistic paths.

E2E tests (~5%, run separately)

Playwright-driven. Launch the Electron app, click through UI, verify rendered state.

bun run test:e2e

Reserved for critical flows (first launch, team creation, agent hire). Most features don't need E2E.


Mocking

Mock boundaries, not internals

Mock the edges of your unit (filesystem, network, IPC) and test the real internal logic.

// ✅ Mock the IPC bridge
vi.mock('@/common', () => ({
  ipcBridge: { fleet: { getMode: { invoke: vi.fn(() => Promise.resolve('slave')) } } },
}));

// ❌ Don't mock the module you're testing
vi.mock('../index', () => ({ storeTrajectory: vi.fn() }));  // defeats the point

Mock sparingly

If you're mocking 5+ things, you're probably testing the mocks. Consider:

  • Restructuring the code for testability (dependency injection)
  • Using a test harness (real SQLite, fake filesystem) instead of mocks
  • Testing at a higher level (integration, not unit)
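As one illustration of the dependency-injection option, the sketch below passes the clock in as a parameter instead of mocking Date.now. All names (Clock, Heartbeat, findStaleDevices) are hypothetical and not taken from the TitanX codebase.

```typescript
// Hypothetical example: the time source is injected rather than mocked.
type Clock = () => number;

interface Heartbeat {
  deviceId: string;
  lastSeenMs: number;
}

// Production callers rely on the default (Date.now); tests pass a fixed clock.
function findStaleDevices(
  heartbeats: Heartbeat[],
  maxAgeMs: number,
  now: Clock = Date.now,
): string[] {
  const cutoff = now() - maxAgeMs;
  return heartbeats.filter((h) => h.lastSeenMs < cutoff).map((h) => h.deviceId);
}

// In a test, a fixed clock makes the result deterministic without vi.mock.
const fixedClock: Clock = () => 10000;
const stale = findStaleDevices(
  [
    { deviceId: 'slave-a', lastSeenMs: 9500 },
    { deviceId: 'slave-b', lastSeenMs: 2000 },
  ],
  5000,
  fixedClock,
);
// stale contains only 'slave-b' (last seen before the 5000 ms cutoff)
```

The same shape works for other edges such as a fetch function or a file reader: accept it as an argument, default it to the real thing.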

Test harnesses

SqliteTestHarness

Real in-memory SQLite database for tests that need persistence:

import { SqliteTestHarness } from '@process/services/database/__tests__/testHarness';

beforeEach(async () => {
  db = new SqliteTestHarness();
  await db.init();  // runs all migrations
});

MockFileSystem

Used in filesystem-touching tests:

const fs = new MockFileSystem({ '/tmp/test.txt': 'hello' });
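The MockFileSystem API is not documented on this page. To show the general idea of an in-memory fake, here is a minimal sketch; InMemoryFileSystem and its methods are assumptions and may not match the real harness.

```typescript
// Hypothetical in-memory file system; not the project's MockFileSystem.
class InMemoryFileSystem {
  private files: Map<string, string>;

  constructor(initial: Record<string, string> = {}) {
    // Seed the store from a path -> content object, as in the call above.
    this.files = new Map(Object.entries(initial));
  }

  exists(path: string): boolean {
    return this.files.has(path);
  }

  readFile(path: string): string {
    const content = this.files.get(path);
    if (content === undefined) {
      throw new Error(`ENOENT: no such file: ${path}`);
    }
    return content;
  }

  writeFile(path: string, content: string): void {
    this.files.set(path, content);
  }
}

const memFs = new InMemoryFileSystem({ '/tmp/test.txt': 'hello' });
```

A fake like this keeps real read/write semantics (including the missing-file error) while never touching disk, so tests stay fast and leave nothing behind.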

FakeIpcBridge

For renderer tests that need IPC responses:

const ipc = new FakeIpcBridge();
ipc.register('fleet.getMode', () => 'master');
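A fake bridge of this kind can be very small. The sketch below is an assumption about how such a class might work, using the hypothetical name MinimalFakeIpcBridge; the real FakeIpcBridge may differ in method names and behavior.

```typescript
// Hypothetical sketch; not the project's FakeIpcBridge.
type IpcHandler = (...args: unknown[]) => unknown;

class MinimalFakeIpcBridge {
  private handlers = new Map<string, IpcHandler>();

  // Register a canned response for a channel name such as 'fleet.getMode'.
  register(channel: string, handler: IpcHandler): void {
    this.handlers.set(channel, handler);
  }

  // Returns a promise, like a real IPC round trip, so callers can await it.
  async invoke(channel: string, ...args: unknown[]): Promise<unknown> {
    const handler = this.handlers.get(channel);
    if (!handler) {
      throw new Error(`No handler registered for ${channel}`);
    }
    return handler(...args);
  }
}
```

Rejecting on unregistered channels is a deliberate choice in this sketch: a test that forgets to stub a channel fails loudly instead of silently receiving undefined.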

Coverage

Target

≥ 80% across src/process/services/, src/common/, src/renderer/hooks/.

Pages and components have looser targets (UI surface is mostly tested via E2E, plus visual inspection).
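A threshold like this is typically declared in the Vitest config. The project's actual config is not shown on this page, so the fragment below is a sketch under that assumption; the provider and reporter choices in particular are guesses.

```typescript
// vitest.config.ts (illustrative; values assumed from the target stated above)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8', // assumption: the project may use istanbul instead
      reporter: ['text', 'html'],
      // Only the directories that carry the 80% target.
      include: [
        'src/process/services/**',
        'src/common/**',
        'src/renderer/hooks/**',
      ],
      // The coverage run fails when any metric falls below its threshold.
      thresholds: { lines: 80, functions: 80, branches: 80, statements: 80 },
    },
  },
});
```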

Viewing coverage

bun run test:coverage
# Opens coverage/index.html

Drill into any file to see which lines are hit or missed. Missed lines are usually:

  • Edge-case branches — add a test
  • Error paths — add a test that triggers the error
  • Dead code — remove it

Excluding from coverage

Rarely needed. When it is, use an inline ignore comment:

/* c8 ignore next 5 */
if (process.env.NODE_ENV === 'production') {
  // production-only path, not testable in unit tests
  preloadMetrics();
}

Running specific tests

# By file
bun run test reasoningBank

# By describe/it name (pattern)
bun run test -t 'honors workspace_id'

# Watch a specific file only
bun run test --watch reasoningBank/index.test.ts

CI integration

GitHub Actions runs on every PR:

  1. bun install
  2. bunx tsc --noEmit — fails on type errors
  3. bun run lint — fails on errors (warnings OK)
  4. bun run test:coverage — fails if coverage drops below 80%
  5. bun run i18n:types && node scripts/check-i18n.js — fails on locale drift
  6. prek run — CI equivalent of local prek check

All 6 must pass to merge. Skipping tests in a PR requires a human reviewer's explicit approval in the PR thread.


Test hygiene

Don't test implementation details

// ❌ Tests the inner mechanism
expect(spy).toHaveBeenCalledWith({ step1: ..., step2: ... });

// ✅ Tests the outcome
expect(result.finalState).toEqual(expected);

Refactors shouldn't break tests. If they do, you tested the implementation, not the behavior.
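To make that concrete, the sketch below shows one outcome-based check accepting two different implementations of the same behavior. The function names are hypothetical and chosen only for illustration.

```typescript
interface Step {
  toolName: string;
  durationMs: number;
}

// Version 1: accumulates with a loop.
function totalDurationLoop(steps: Step[]): number {
  let total = 0;
  for (const step of steps) total += step.durationMs;
  return total;
}

// Version 2: the same behavior after a refactor to reduce().
function totalDurationReduce(steps: Step[]): number {
  return steps.reduce((sum, step) => sum + step.durationMs, 0);
}

// An outcome-based check looks only at the returned value, so it
// passes for either implementation without modification.
function checkTotalDuration(impl: (steps: Step[]) => number): boolean {
  const steps: Step[] = [
    { toolName: 'git.diff', durationMs: 120 },
    { toolName: 'web.fetch', durationMs: 80 },
  ];
  return impl(steps) === 200 && impl([]) === 0;
}
```

A spy-based assertion on how the loop iterates would have failed after the refactor even though nothing observable changed.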

Cleanup

Tests own their cleanup. beforeEach for setup; afterEach for teardown if needed. No test should leave state that affects the next.

Flaky tests

Zero tolerance. If a test is flaky, it's broken. Fix it or delete it — don't retry in CI to mask.

Common causes: time-based logic (use fake timers via vi.useFakeTimers()), network calls (mock them), and race conditions from un-awaited async work (await every promise).


Specific test recipes

Testing a migration

it('migration v73 adds source_tag to agent_memory', async () => {
  const db = new SqliteTestHarness({ migrateToVersion: 72 });
  await db.init();
  // confirm column doesn't exist yet
  expect(() => db.driver.prepare('SELECT source_tag FROM agent_memory').get()).toThrow();

  await db.migrateTo(73);
  // column exists, rows readable
  db.driver.prepare(
    'INSERT INTO agent_memory (id, agent_slot_id, ..., source_tag) VALUES (?, ?, ..., ?)'
  ).run('1', 'slot-x', ..., 'fleet_consolidated');
  const row = db.driver
    .prepare('SELECT source_tag FROM agent_memory WHERE id = ?')
    .get('1') as { source_tag: string };
  expect(row.source_tag).toBe('fleet_consolidated');
});

Testing a fleet command

it('rejects agent.execute from workforce-role slave with not_farm_role', async () => {
  const slaveDb = await setupSlaveInRole('workforce');
  const envelope = buildSignedEnvelope({
    commandType: 'agent.execute',
    targetDeviceId: 'slave-a',
    params: { jobId: 'j1', slotId: 's1', messages: [], model: 'claude', timeoutMs: 10000 },
  });
  const result = await executeFleetCommand(slaveDb.driver, envelope);
  expect(result).toEqual({
    ok: false,
    reason: 'not_farm_role',
  });
});

Testing redaction

it('drops trajectory containing sk-ant- API key', async () => {
  const slaveDb = new SqliteTestHarness();
  await slaveDb.init();
  storeTrajectory(slaveDb.driver, {
    taskDescription: 'Call API with key sk-ant-api03-some-secret-value',
    steps: [{ toolName: 'web.fetch', args: {}, result: 'ok', durationMs: 0 }],
    successScore: 0.9,
  });
  const envelope = buildLearningEnvelope(slaveDb.driver, 0, Date.now());
  // trajectory should have been dropped by entropy audit
  expect(envelope?.trajectories).toEqual([]);
});
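The audit itself is not shown on this page. As a rough illustration of the behavior the test expects, the sketch below drops trajectories whose description matches a key prefix. This is a simpler pattern check, not the entropy audit mentioned in the test comment, and every name in it is hypothetical.

```typescript
// Hypothetical sketch; the real redaction logic may work quite differently.
interface TrajectoryLike {
  taskDescription: string;
}

// Assumed pattern: keys that start with "sk-ant-" followed by 8+ key characters.
const API_KEY_PATTERN = /sk-ant-[A-Za-z0-9_-]{8,}/;

function containsApiKey(text: string): boolean {
  return API_KEY_PATTERN.test(text);
}

// Drop whole trajectories rather than masking the key in place,
// mirroring the "dropped" behavior the test asserts.
function dropSensitive<T extends TrajectoryLike>(trajectories: T[]): T[] {
  return trajectories.filter((t) => !containsApiKey(t.taskDescription));
}
```

A real audit would likely also scan step arguments and results, not just the task description.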

