# Tokens consumption #449
Replies: 17 comments 4 replies
-
# Token Usage Optimization Plan

## Current State Analysis

### How Tokens Are Consumed Today

The Paperclip heartbeat system is the primary token consumer. On each heartbeat:
### Key Problems
### Current Token Tracking

Tokens are tracked well:
## Optimization 1: Dual-Stage Heartbeat (Triage + Execute)

### Concept

Split each heartbeat into two stages:
### Architecture

### Triage Model Options
### Recommended: Hybrid Approach (Rules + Haiku Fallback)

**Phase 1 — Rule-based triage (no LLM):** Before invoking any model, check server-side:

```typescript
interface TriageResult {
  action: "skip" | "wake" | "triage_with_model";
  reason: string;
  urgency?: "low" | "medium" | "high";
  suggestedModel?: string;
}

async function triageHeartbeat(agent: Agent, context: WakeContext): Promise<TriageResult> {
  // 1. Check if the agent has any assigned issues
  const assignedIssues = await db.issues.findActive({ assigneeAgentId: agent.id });
  if (assignedIssues.length === 0 && context.wakeSource === "timer") {
    return { action: "skip", reason: "no_assigned_issues" };
  }

  // 2. Check for new comments since the last run
  const newComments = await db.comments.findSince({
    issueIds: assignedIssues.map(i => i.id),
    since: agent.lastHeartbeatAt,
  });

  // 3. Check for pending approvals
  const pendingApprovals = await db.approvals.findPending({ agentId: agent.id });

  // 4. If nothing is new, skip — unless the agent was mid-task (has an active session)
  if (newComments.length === 0 && pendingApprovals.length === 0) {
    const hasActiveWork = await hasInProgressWork(agent);
    if (!hasActiveWork) {
      return { action: "skip", reason: "no_new_activity" };
    }
  }

  // 5. For ambiguous cases, delegate to a cheap model
  return { action: "triage_with_model", reason: "needs_evaluation" };
}
```

**Phase 2 — Haiku triage (for ambiguous cases):**

```typescript
async function triageWithModel(agent: Agent, context: WakeContext): Promise<TriageResult> {
  // issuesSummary / commentsSummary / approvalsSummary are assembled from the rule-stage queries
  const triagePrompt = `You are a triage agent. Evaluate whether agent "${agent.name}" needs to wake up.
Assigned issues: ${JSON.stringify(issuesSummary)}
New comments: ${JSON.stringify(commentsSummary)}
Pending approvals: ${JSON.stringify(approvalsSummary)}
Last activity: ${agent.lastHeartbeatAt}
Respond with JSON: {"action": "skip"|"wake", "reason": "...", "urgency": "low"|"medium"|"high"}`;

  // Use Haiku — ~$0.25/1M input, $1.25/1M output vs. Sonnet at ~$3/$15
  const result = await callModel("claude-haiku-4-5-20251001", triagePrompt);
  return JSON.parse(result);
}
```

### Implementation Plan

#### Database Changes

```sql
-- Add triage tracking columns to heartbeat_runs
ALTER TABLE heartbeat_runs ADD COLUMN triage_result JSONB;
ALTER TABLE heartbeat_runs ADD COLUMN triage_model TEXT;
ALTER TABLE heartbeat_runs ADD COLUMN triage_cost_cents NUMERIC(10,4) DEFAULT 0;
ALTER TABLE heartbeat_runs ADD COLUMN skipped_by_triage BOOLEAN DEFAULT FALSE;

-- Add triage config to agent runtime config
-- agents.runtimeConfig.heartbeat.triage:
-- {
--   enabled: boolean,
--   mode: "rules_only" | "rules_plus_model" | "model_only",
--   triageModel: "claude-haiku-4-5-20251001" | "local" | null,
--   skipThreshold: number  -- confidence threshold to skip (0-1)
-- }
```

#### Config Changes

Extend the heartbeat config in

```typescript
interface HeartbeatTriageConfig {
  enabled: boolean;                 // Default: false (opt-in)
  mode: "rules_only" | "rules_plus_model" | "model_only";
  triageModel?: string;             // Model ID for the triage stage
  skipThreshold?: number;           // 0-1, confidence needed to skip
  maxSkipsBeforeForceWake?: number; // Safety: force a full wake after N skips
}
```

#### Server Changes

Modify
#### Adapter Changes

Add a new lightweight adapter (or adapter mode) for triage:

```typescript
// New: packages/adapter-triage/src/server/execute.ts
export async function executeTriage(ctx: TriageExecutionContext): Promise<TriageResult> {
  // Option A: Direct API call to Haiku (no CLI overhead)
  // Option B: Use existing adapter with model override
  // Option C: Local model via Ollama
}
```

### Tradeoffs
### Risk Mitigations
## Optimization 2: Stop Resetting Sessions on Timer Heartbeats

### Problem

In

```typescript
if (wakeSource === "timer") return true; // Always resets the session
```

This destroys prompt-caching benefits. When a session is reset, the next invocation must re-send the full system prompt, CLAUDE.md, project context, and conversation history — all as fresh (uncached) tokens.

### Proposal

Change timer heartbeats to preserve sessions when the agent is working on the same task:

```typescript
function shouldResetTaskSessionForWake(
  wakeSource: string,
  wakeReason: string,
  triggerDetail?: string, // added so the manual-trigger check below has an input
): boolean {
  // Always reset for new assignments
  if (wakeReason === "issue_assigned") return true;
  // Always reset for manual triggers
  if (wakeSource === "on_demand" && triggerDetail === "manual") return true;
  // Timer: preserve the session if the agent has an active task
  if (wakeSource === "timer") return false; // Changed!
  return false;
}
```

### Expected Impact
### Risk
## Optimization 3: Model Tiering by Task Complexity

### Concept

Not all heartbeat work requires the same model. Use the triage result to select the appropriate model:
### Implementation

Add to agent config:

```typescript
interface ModelTieringConfig {
  enabled: boolean;
  defaultModel: string; // Fallback model
  taskModelOverrides?: {
    [taskType: string]: string; // e.g., "comment_reply": "claude-haiku-4-5-20251001"
  };
  maxModelForBudgetPercent?: { // Downgrade model when budget is running low
    75: string; // At 75% budget used, switch to this model
    90: string; // At 90% budget used, switch to this model
  };
}
```

### Budget-Aware Model Selection

```typescript
function selectModel(agent: Agent, taskType: string): string {
  const budgetUsedPct = (agent.spentMonthlyCents / agent.budgetMonthlyCents) * 100;

  // Progressive downgrade as budget depletes
  if (budgetUsedPct >= 90) return "claude-haiku-4-5-20251001";
  if (budgetUsedPct >= 75) return "claude-sonnet-4-6";

  // Task-based selection
  return agent.config.modelTiering?.taskModelOverrides?.[taskType]
    ?? agent.config.modelTiering?.defaultModel
    ?? agent.model;
}
```

## Optimization 4: Context Compression

### Problem

Every heartbeat sends the full context object to the adapter, which includes workspace hints, project metadata, and other data that rarely changes.

### Proposals

#### 4a. Dehydrate Static Context

Separate context into static (changes rarely) and dynamic (changes per heartbeat):

```typescript
// Static context — cached/hashed, only re-sent when changed
interface StaticContext {
  workspaces: WorkspaceHint[];
  agentInstructions: string; // CLAUDE.md content
  projectMetadata: ProjectMeta;
}

// Dynamic context — sent every heartbeat
interface DynamicContext {
  wakeReason: string;
  issueId?: string;
  newComments: CommentSummary[];
  pendingApprovals: ApprovalSummary[];
}
```

#### 4b. Summarize Issue History

Instead of sending full comment threads, send summaries:

```typescript
// Before: full comment history (could be 10K+ tokens)
context.comments = await db.comments.findAll({ issueId });

// After: summarized recent activity
context.recentActivity = await summarizeRecentActivity(issueId, {
  since: agent.lastHeartbeatAt,
  maxTokens: 2000,
});
```

#### 4c. Trim Workspace Hints

The

```typescript
// Only include workspaces relevant to the current task
context.paperclipWorkspaces = resolvedWorkspace.workspaceHints
  .filter(w => w.projectId === currentProjectId);
```

## Optimization 5: Smarter Scheduling

### Problem

The scheduler runs on a fixed interval (

### Proposals

#### 5a. Per-Agent Next-Wake Timestamp

Instead of checking all agents every 30s, maintain a

```sql
ALTER TABLE agent_runtime_state ADD COLUMN next_wake_at TIMESTAMPTZ;
CREATE INDEX idx_runtime_next_wake ON agent_runtime_state (next_wake_at)
  WHERE next_wake_at IS NOT NULL;
```

The scheduler query becomes:

```sql
SELECT * FROM agent_runtime_state
WHERE next_wake_at <= NOW()
  AND next_wake_at IS NOT NULL
ORDER BY next_wake_at
LIMIT 50;
```

After each heartbeat, set

#### 5b. Adaptive Intervals

Increase the heartbeat interval when the agent is idle, decrease it when active:

```typescript
function calculateNextInterval(agent: Agent, lastResult: HeartbeatResult): number {
  const baseInterval = agent.heartbeat.intervalSec;

  if (lastResult.skippedByTriage) {
    // Exponential backoff for idle agents (max 4x base)
    const consecutiveSkips = agent.consecutiveTriageSkips;
    const multiplier = Math.min(Math.pow(1.5, consecutiveSkips), 4);
    return Math.round(baseInterval * multiplier);
  }

  if (lastResult.hasActiveWork) {
    // More frequent checks when actively working
    return Math.max(Math.round(baseInterval * 0.5), 60);
  }

  return baseInterval;
}
```

## Implementation Roadmap

### Phase 1: Quick Wins
### Phase 2: Model Triage

### Phase 3: Advanced Optimizations

### Phase 4: Infrastructure
## Cost Projections

### Example: 10 agents, 5-minute heartbeat interval

**Current state (no optimization):**
**After Phase 1 (rule-based triage + session preservation):**

**After Phase 2 (Haiku triage + adaptive intervals):**

**After Phase 3 (model tiering):**

### Summary
Note: Phase 3 may increase cost slightly because it routes complex tasks to Opus, but the quality-per-dollar ratio improves significantly.

## Monitoring & Success Metrics

### Key Metrics to Track
### Alerting
### Files to Modify
-
Can the CEO pause/resume agents? If so, it could potentially clean up and pause them so no heartbeat runs in the first place. A trigger like the ability to “turn off when task is complete” could help here as well, I think.
-
Hopefully this doesn't step on any toes, but the session-reuse stuff was quite adjacent to what I had cooking, so it is included. There's more that can be done to improve cache hits: mainly separating the static from the dynamic in the wake prompt and making sure the dynamic comes after, and also canonicalizing any dynamic object rendering (otherwise semantically identical context can still churn the prefix due to non-deterministic list ordering and such). Happy to take those on too if needed (sorry, just woke up not long ago and my eyes aren't fully open, so low-quality comment here).
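A minimal sketch of the canonicalization idea: recursively sort object keys before rendering, so semantically identical context produces byte-identical prompt text and doesn't churn the cached prefix. `canonicalize` and `renderContext` are illustrative names, not Paperclip APIs.

```typescript
// Recursively sort object keys so rendering never depends on insertion order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(value as Record<string, unknown>).sort()) {
      sorted[key] = canonicalize((value as Record<string, unknown>)[key]);
    }
    return sorted;
  }
  return value;
}

// Deterministic rendering for prompt interpolation.
function renderContext(context: object): string {
  return JSON.stringify(canonicalize(context), null, 2);
}
```

Array order is deliberately preserved here (it may be semantic); sorting lists by a stable id before rendering would be a further step.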
-
fielding's point about separating static from dynamic context is the right frame. The static part of a wake prompt is the agent's role, constraints, and expected output format; those never change. The dynamic part is the current context (assigned issues, pending approvals). Cache hits only happen reliably when the static prefix is consistent.

Structured prompt blocks make this separation explicit at the content level. Role, constraints, and output_format blocks compile to a fixed XML prefix; the context block holds the dynamic payload. Same principle as what fielding described, but applied to the prompt text itself rather than just the session state.

I've been building flompt for exactly this: a visual prompt builder that decomposes prompts into 12 semantic blocks and compiles to Claude-optimized XML. The static/dynamic split is a lot cleaner when each section is a named block. Open-source: github.com/Nyrok/flompt

A star on github.com/Nyrok/flompt is the best way to support the project; it's solo open-source, and every star helps.
-
I haven't reviewed this specific plan, but I want to note that I support the general idea: we need to make sure we're being careful about defaults in token consumption, and I support looking into this.
-
From my agent:

# Token Optimization Plan

Date: 2026-03-13

## Goal

Reduce token consumption materially without reducing agent capability, control-plane visibility, or task completion quality. This plan is based on:
## Executive Summary

The discussion is directionally right about two things:
But that is not enough on its own. After reviewing the code and local run data, the token problem appears to have four distinct causes:
The correct approach is:
## Validated Findings

### 1. Token telemetry is at least partly overstated today

Observed from the local default instance:
Those totals are not credible as true per-heartbeat usage for the observed prompt sizes. Supporting evidence:
Interpretation:
This does not mean there is no real token problem. It means we need a trustworthy baseline before we can judge optimization impact.

### 2. Timer wakes currently throw away reusable task sessions

In
That means many normal heartbeats skip saved task-session resume even when the workspace is stable. Local data supports the impact:
So timer wakes are the largest heartbeat path and are mostly not resuming prior task state.

### 3. We repeatedly ask agents to reload the same task context

The
Current API shape reinforces that pattern:
This is safe but expensive. It forces the model to repeatedly consume unchanged information.

### 4. Static instruction payloads are not separated cleanly from dynamic heartbeat prompts

The user discussion suggested a bootstrap prompt. That is the right direction. Current state:
Result:
### 5. We inject more skill surface than most agents need

Local adapters inject repo skills into runtime skill directories. Current repo skill sizes:
That is nearly 58 KB of skill markdown before any company-specific instructions. Not all of it is necessarily loaded into model context on every run, but it increases startup surface area and should be treated as a token-budget concern.

## Principles

We should optimize tokens under these rules:
## Plan

### Phase 1: Make token telemetry trustworthy

This should happen first.

#### Changes
#### Why

Without this, we cannot tell whether a reduction came from a real optimization or a reporting artifact.

#### Success criteria
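One low-effort check Phase 1 implies is a plausibility gate on adapter-reported usage before it enters cost dashboards. A sketch under stated assumptions: the 4-characters-per-token heuristic, the ratio threshold, and all names here are illustrative, not Paperclip code.

```typescript
interface ReportedUsage {
  inputTokens: number;
  outputTokens: number;
}

// Rough size-based estimate; ~4 chars/token is a common heuristic for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Flag readings that are wildly larger than the prompt size could justify
// (e.g. double counting, or cached tokens billed as fresh).
function isPlausible(usage: ReportedUsage, promptText: string, maxRatio = 20): boolean {
  const estimate = Math.max(estimateTokens(promptText), 1);
  return usage.inputTokens <= estimate * maxRatio;
}
```

Implausible readings would be logged for reconciliation rather than silently summed into spend totals.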
### Phase 2: Preserve safe session reuse by default

This is the highest-leverage behavior change.

#### Changes
#### Why

Timer wakes are the dominant heartbeat path. Resetting them destroys both session continuity and prompt-cache reuse.

#### Success criteria
### Phase 3: Separate static bootstrap context from per-heartbeat context

This is the right version of the discussion's bootstrap idea.

#### Changes
#### Why

Static instructions and dynamic wake context have different cache behavior and should be modeled separately.

#### Success criteria
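The Phase 3 split can be sketched as static-prefix-first prompt assembly, with a hash of the static part so a change can be detected and the session rotated explicitly. Field names and the prompt layout are assumptions for illustration, not the actual Paperclip wake prompt.

```typescript
import { createHash } from "node:crypto";

interface BootstrapContext { instructions: string; projectMetadata: string; }
interface WakeContext { wakeReason: string; newCommentIds: string[]; }

// Hash of the static payload: if it changes, the cached prefix is invalid anyway,
// so rotating the session at that point is explicit rather than accidental.
function staticHash(bootstrap: BootstrapContext): string {
  return createHash("sha256")
    .update(bootstrap.instructions)
    .update("\0")
    .update(bootstrap.projectMetadata)
    .digest("hex");
}

// Static prefix first (cacheable), dynamic suffix last.
function buildWakePrompt(bootstrap: BootstrapContext, wake: WakeContext): string {
  return [
    bootstrap.instructions,
    bootstrap.projectMetadata,
    `wake_reason: ${wake.wakeReason}`,
    `new_comments: ${JSON.stringify(wake.newCommentIds)}`,
  ].join("\n\n");
}
```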
### Phase 4: Make issue/task context incremental

This is the biggest product change and likely the biggest real token saver after session reuse.

#### Changes

Add heartbeat-oriented endpoints and skill behavior:
Update the
#### Why

Today we are using full-fidelity board APIs as heartbeat APIs. That is convenient but token-inefficient.

#### Success criteria
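The incremental-context idea in Phase 4 reduces to a cursor-based delta: serve only activity after a per-agent cursor instead of the full thread. A minimal sketch; the `Comment` shape and cursor semantics are assumptions, not the existing API.

```typescript
interface Comment { id: string; createdAt: number; body: string; }

// Return only activity newer than the cursor, plus the cursor to store for next time.
function activitySince(all: Comment[], cursor: number): { delta: Comment[]; nextCursor: number } {
  const delta = all
    .filter(c => c.createdAt > cursor)
    .sort((a, b) => a.createdAt - b.createdAt);
  const nextCursor = delta.length > 0 ? delta[delta.length - 1].createdAt : cursor;
  return { delta, nextCursor };
}
```

A heartbeat that stores `nextCursor` after each run re-reads nothing: an unchanged issue yields an empty delta and near-zero context tokens.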
### Phase 5: Add session compaction and controlled rotation

This protects against long-lived session bloat.

#### Changes
#### Why

Even when reuse is desirable, some sessions become too expensive to keep alive indefinitely.

#### Success criteria
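A Phase 5 rotation policy can be as simple as two thresholds: compact before a session gets expensive, rotate once it is expensive or stale. The limits here are illustrative defaults, not measured values.

```typescript
interface SessionStats { contextTokens: number; ageHours: number; }

type SessionAction = "keep" | "compact" | "rotate";

function sessionAction(stats: SessionStats, maxTokens = 150_000, maxAgeHours = 24): SessionAction {
  if (stats.contextTokens >= maxTokens) return "rotate";        // too big to keep resuming
  if (stats.contextTokens >= maxTokens * 0.6) return "compact"; // summarize before it gets there
  if (stats.ageHours >= maxAgeHours) return "rotate";           // bound staleness
  return "keep";
}
```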
### Phase 6: Reduce unnecessary skill surface

#### Changes
#### Why

Most agents do not need adapter-authoring or memory-system skills on every run.

#### Success criteria
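The Phase 6 change amounts to filtering skill injection through a per-agent allowlist instead of injecting the full repo set. A sketch; the `Skill` shape and the allowlist config are assumptions.

```typescript
interface Skill { name: string; markdownBytes: number; }

// No allowlist configured → current behavior (inject everything),
// so the change is opt-in and safe to roll out per agent.
function selectSkills(available: Skill[], allowlist: string[] | undefined): Skill[] {
  if (!allowlist) return available;
  const allowed = new Set(allowlist);
  return available.filter(s => allowed.has(s.name));
}
```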
## Rollout Order

Recommended order:
## Acceptance Metrics

We should treat this plan as successful only if we improve both efficiency and task outcomes.

Primary metrics:
Guardrail metrics:
Initial targets:
## Concrete Engineering Tasks
## Recommendation

Treat this as a two-track effort:
If we only do Track A, we will improve things, but agents will still re-read too much unchanged task context. If we only do Track B without fixing telemetry first, we will not be able to prove the gains cleanly.
-
Not wanting to sound dismissive of these detailed plans, but the issue seems to be getting overcomplicated to me. The question of whether or not an agent should wake to do stuff seems 100% deterministic:

Heartbeat -> workAssigned? -> wake

What other reason does an agent have to wake? If it is needed for something, anything, that's a task that has to be set, either by the user, another agent, or the system. The whole idea that an agent has to spawn just to check whether it has something to do seems completely unnecessary to me.
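The deterministic gate described above fits in a few lines; the types and field names here are hypothetical, chosen to mirror the `Heartbeat -> workAssigned? -> wake` rule.

```typescript
interface WakeCheck {
  wakeSource: "timer" | "event";
  assignedIssueCount: number;
}

function shouldWake(check: WakeCheck): boolean {
  if (check.wakeSource === "event") return true; // pushed work always wakes
  return check.assignedIssueCount > 0;           // timer: wake only if work exists
}
```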
-
Yes, my fix has a slider: kill the process if there is no open task, which is deterministic, and only route to the agent if it's unclear.
-
Token costs dropped significantly for us when we gave agents specific skills instead of letting them figure things out from scratch. The pattern we saw: an agent without a skill will explore, retry, and hallucinate its way to a result, burning 10x the tokens a focused skill uses.

Example: our Cold Email Writer skill takes a lead list and outputs personalized outreach. Without the skill, the agent would spend tokens deciding HOW to write emails, what format to use, and how to personalize. With the skill loaded, it just executes a known pattern.

What helped us:

The token waste in our experience came from agents doing strategy and execution at the same time. Separating those layers (Chiefs think, Workers execute) cut costs significantly. We have 57 pre-built skills for common agent tasks at openclawskillpacks.com — built specifically to give agents focused execution paths instead of open-ended exploration.
-
Related resource — we just published a step-by-step guide on deploying a full AI company stack: Deploy a Zero-Human Company in One Afternoon. It covers agent configuration, automation workflows, and revenue pipeline setup.
-
I had done a similar approach in Open Claw, where heartbeats used a free gpt-mini check and ran a lot more regularly. It seemed to work well to triage, then execute on a better model.
-
Any word on #481? I noticed you mentioned something similar in the release notes, but I didn't see a PR doing the same. @cryppadotta
-
@fielding — to answer your question about #481 directly: the session cache preservation fix (stop resetting the prompt cache on timer wakes) was referenced in the release notes as part of the heartbeat work in v0.3.1, but I want to confirm whether #481 itself was merged or whether the behaviour landed via a different PR. Can you check the current behaviour on latest master and see if timer wakes are preserving the cache prefix?

If #481 is still open and not yet merged, this is one of the highest-leverage pending changes for token reduction — it's the one optimization that requires no new infrastructure and directly cuts input costs on the dominant heartbeat path. Would be good to get eyes on it this week.

For anyone following this thread who's trying to reduce token spend today while the larger optimization work lands, the two things that help most right now:
The rule-based triage (skip a timer wake if there are no assigned issues and no new comments) is on the roadmap and will make this automatic.
-
Has there been any update on token usage concerns? Any simple strategies to implement for non-engineers?
-
Why heartbeat at all? Not snippy here: the agent ONLY looks for work, but if you activate it, work is pushed. So you do not need a heartbeat for more than checking whether the agent is responsive. Just activate the work push and be done. No waiting. No tokens burned. This seems to be a lot of discussion to fix a non-issue.
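The push model suggested above can be sketched as an event subscription: the wake happens at assignment time, and no timer polls in between. The event bus and handler names are hypothetical.

```typescript
type WakeHandler = (agentId: string, reason: string) => void;

class WorkEvents {
  private handlers: WakeHandler[] = [];

  onAssigned(handler: WakeHandler): void {
    this.handlers.push(handler);
  }

  assign(agentId: string, issueId: string): void {
    // Work is pushed: the wake happens here, not on a timer.
    for (const h of this.handlers) h(agentId, `issue_assigned:${issueId}`);
  }
}
```

A periodic liveness ping could still exist, but it would be a cheap health check, not a model invocation.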
-
We run 14 agents (7 claude_local + 5 openclaw_gateway + 2 mixed) in production, and token waste on idle heartbeats was our #1 cost driver. Here's what we implemented — complementary to @pohlipit23's prompt optimizations in #2744.

## Heartbeat Pre-Check: Skip Idle Wakes Entirely

The core insight from @twhoff and @ttomiczek was right: Heartbeat → workAssigned? → wake. But Paperclip doesn't implement this natively yet, so we added it at the adapter wrapper level.

### How it works

Before each heartbeat invokes the model, a lightweight bash pre-check queries the Paperclip API:

```bash
# In the claude-remote-<agent> wrapper (called by the claude_local adapter):
source /usr/local/lib/heartbeat-precheck.sh
if ! heartbeat_precheck "$agent_name"; then
  exit 0 # No work → skip model invocation entirely
fi
exec ssh root@$AGENT_IP claude "$@"
```

The pre-check function:

```bash
heartbeat_precheck() {
  # Query only actionable issues (fast, small response)
  count=$(curl -s --max-time 5 \
    -H "Authorization: Bearer $PAPERCLIP_TOKEN" \
    "$PAPERCLIP_URL/companies/$COMPANY/issues?status=todo,in_progress,in_review&limit=50" \
    | python3 -c "
import sys, json
d = json.load(sys.stdin)
issues = d if isinstance(d, list) else d.get('issues', d.get('data', []))
print(len([i for i in issues if i.get('assigneeAgentId') == '$agent_uuid']))
")
  [ "${count:-0}" = "0" ] && return 1 || return 0 # 1=skip, 0=proceed
}
```

### Results (14 agents, 1 week production)

- ~90% of heartbeats are idle (no assigned issues) → all skipped
- ~260 model invocations/day eliminated
- Estimated savings: 130K-520K input tokens/day (depending on prompt size + SKILL.md)
- API overhead: ~50ms per check (single curl to the Paperclip API, filtered by status)
- Fail-safe: on API error → proceed with the heartbeat (never blocks work)
### Why this complements #2744

| Scenario | pohlipit23's fix | Our pre-check |
| --- | --- | --- |
| Agent HAS work | 11K→1K tokens per wake | No effect (proceeds normally) |
| Agent has NO work | Still invokes model (~2-5K tokens) | Skips entirely (0 tokens) |
Together they cover both cases. The ideal would be a server-side pre-check in heartbeat.ts before dispatching to the adapter — zero network overhead, same result.

### Suggestion for upstream

The cleanest implementation would be in server/src/services/heartbeat.ts, before enqueuing the adapter run:

```typescript
// Before dispatching the heartbeat to the adapter:
const actionableIssues = await getAgentActionableIssues(agentId);
if (actionableIssues.length === 0 && wakeSource === "timer") {
  // Update lastHeartbeatAt to prevent re-trigger, but skip the adapter invocation
  await updateLastHeartbeat(agentId);
  return; // No work, no tokens burned
}
```

This would make the wrapper-level pre-check unnecessary and save even the API call overhead. Happy to contribute a PR if there's interest.
-
Agree. I implemented a similar method in the prompt: if there's no work, stop. But doing it further up, in the actual code before the heartbeat is even launched, looks better.

Perhaps supplemented by some specific tasks for the CEO, like checking the overall status of the company twice per day and thinking about reasonable next steps even if not planned yet, to keep the company alive.
-
I find it very impressive, but I'm a bit concerned by the high token use for simple tasks. Any ideas on how to reduce verbosity and token use?