
Commit 97060ca

wow-miley and claude authored
AMPR-184 #531: PlaybackRelay — deterministic model replay (#540)
Adds PlaybackRelay : CognitiveRelay to the ampere-eval module: it replays a Trace's recorded model routing in call order, charges zero Watts for replays, and applies a configurable miss policy plus a branch index for rewind-and-correct.

- RecordedModelCall + Trace.modelCalls(): pairs the trace's model-call events into ordered (request, response) pairs, replicating the RECON §3.3 pairing (6-part key, first-match-consumed, chronological); non-model events ignored.
- PlaybackRelay (link.socket.ampere.eval.relay): ordered replay with reason="playback", a Result-boundary replay() core, MissPolicy.Error (strict, typed PlaybackMiss) and MissPolicy.Delegate (live handoff), and branchIndex wiring (replay < branchIndex, then delegate).
- Resolves the ticket's OPEN DECISION as miss-as-failure (divergence is a red build, not a graded Reading).
- Per RECON Findings A/B, replays the recorded routing decision only, not response content; content-faithful replay stays at the UpstreamLlmClient seam (ampere-eval ticket 4). Zero Watts holds by construction (no live call).
- 10 commonTest cases; ktlintFormat clean; common (iOS) metadata compiles.

I (Claude) wrote this commit on Miley's behalf.

Closes #531

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 13905c9 commit 97060ca
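
The commit message describes three outcomes per model call: replay, branch to the live delegate, or miss. A minimal sketch of that decision order follows. It assumes hypothetical names (`Outcome`, `classify`, `delegateOnMiss`) that do not appear in the commit, and it reduces the relay's inputs to plain integers, so it illustrates the ordering only and is not the module's implementation.

```kotlin
// Hypothetical stand-ins for illustration; none of these names come from the commit.
enum class Outcome { REPLAY_HIT, BRANCH_TO_DELEGATE, MISS_DELEGATED, MISS_FAILED }

/**
 * Classifies the call at [index]. The branch check comes first, so a
 * branchIndex inside the recorded range still hands the call off live.
 */
fun classify(
    index: Int,
    branchIndex: Int,
    recordedCount: Int,
    delegateOnMiss: Boolean,
): Outcome = when {
    index >= branchIndex -> Outcome.BRANCH_TO_DELEGATE
    index < recordedCount -> Outcome.REPLAY_HIT
    delegateOnMiss -> Outcome.MISS_DELEGATED
    else -> Outcome.MISS_FAILED
}
```

With three recorded calls and a branch index of 2, calls 0 and 1 replay and call 2 goes live even though a recording exists for it. A miss only arises when the index is below the branch index but past the last recording.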

3 files changed

Lines changed: 535 additions & 0 deletions


Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
package link.socket.ampere.eval.relay

import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.serialization.json.Json
import link.socket.ampere.agents.domain.routing.CognitiveRelay
import link.socket.ampere.agents.domain.routing.RelayConfig
import link.socket.ampere.agents.domain.routing.RoutingContext
import link.socket.ampere.agents.domain.routing.RoutingResolution
import link.socket.ampere.data.DEFAULT_JSON
import link.socket.ampere.domain.ai.configuration.AIConfiguration
import link.socket.ampere.eval.trace.Trace
import link.socket.ampere.trace.WattCost

/**
 * What a [PlaybackRelay] does when an Arc makes a model call for which the
 * [Trace] holds no recording (the Arc has *diverged* from the recording).
 *
 * Resolves the AMPR-184 OPEN DECISION (`miss-as-failure` vs. `miss-as-low-Reading`)
 * in favour of **miss-as-failure**: a strict miss is a hard, typed [PlaybackMiss]
 * that turns divergence into a red build. Grading divergence as a low score is a
 * reward-function concern (RFT/GRPO) that belongs to a later layer, not to the
 * relay; this keeps the seam a clean boolean (replayed exactly, or diverged).
 */
sealed interface MissPolicy {
    /** Strict, eval mode: a miss yields `Result.failure(`[PlaybackMiss]`)`. */
    data object Error : MissPolicy

    /** Rewind handoff: a miss is routed to the live delegate (failure if none). */
    data object Delegate : MissPolicy
}

/**
 * Typed failure signalling that an Arc diverged from its recording: the
 * [callIndex]-th model call had no recorded counterpart under [MissPolicy.Error].
 *
 * Carried as a `Result.failure` value by [PlaybackRelay.replay] (the Result
 * boundary) and thrown by the `CognitiveRelay` methods so the divergence
 * propagates up the call path as a red build.
 */
class PlaybackMiss(
    val callIndex: Int,
    val recordedCallCount: Int,
) : Exception(
    "PlaybackRelay diverged: model call #$callIndex has no recorded response " +
        "(trace recorded $recordedCallCount model call(s)).",
)

/**
 * A relay that replays a [Trace]'s recorded model routing in order, behind the
 * exact [CognitiveRelay] interface (AMPR-184; interface per RECON-relay §1).
 *
 * The same class expresses both eval and rewind-and-correct:
 * - **Eval** (defaults): no delegate, [MissPolicy.Error], `branchIndex` past the
 *   end. Every model call replays its recording, and a missing recording is a
 *   [PlaybackMiss] (divergence is itself a finding).
 * - **Rewind**: a [liveDelegate] plus a [branchIndex] of `k`. The first `k`
 *   calls replay, then call `k` onward is handed to the live delegate.
 *
 * ### What is (and isn't) replayed
 * `CognitiveRelay` is a **routing-only** seam: it selects an [AIConfiguration];
 * it never sees the prompt or the completion (RECON-relay Finding A), and the
 * recorded events carry no content (Finding B). This relay therefore replays the
 * recorded **routing decision in call order** with `reason = "playback"` and a
 * zero-Watt guarantee; it does **not** substitute response *content*. Because the
 * recorded provider/model are plain ids and reconstructing a live [AIConfiguration]
 * from them needs a provider registry the eval module deliberately does not depend
 * on, a replay hit returns the supplied `fallbackConfiguration` (the recorded
 * selection is available for inspection via [recordedCallAt]). Content-faithful
 * replay belongs to the `UpstreamLlmClient` seam in a later ticket.
 *
 * ### Watts
 * A replayed call performs **no live provider invocation**, so it consumes no
 * tokens and therefore zero Watts (RECON-relay §2.4). See [replayedWattCost].
 *
 * @param trace the recorded run to replay.
 * @param missPolicy what to do when the Arc makes more calls than were recorded.
 *   Replay is positional, so a call that merely differs from its recording is
 *   not detected. Defaults to strict [MissPolicy.Error].
 * @param liveDelegate the relay used for branched/delegated calls. Required for
 *   [MissPolicy.Delegate] and for any call at/after [branchIndex]; a `null`
 *   delegate in those cases yields a `Result.failure`.
 * @param branchIndex the **call** index at which replay stops and the delegate
 *   takes over: calls `0 until branchIndex` replay, `branchIndex` onward delegate.
 *   Defaults to `trace.size`, an event count that is always ≥ the model-call
 *   count, i.e. "never branch" (the degenerate eval case, mirroring
 *   `TraceCursor.branchAfter(size - 1)`).
 */
class PlaybackRelay(
    private val trace: Trace,
    private val missPolicy: MissPolicy = MissPolicy.Error,
    private val liveDelegate: CognitiveRelay? = null,
    private val branchIndex: Int = trace.size,
    private val json: Json = DEFAULT_JSON,
) : CognitiveRelay {

    private val recordedCalls: List<RecordedModelCall> = trace.modelCalls(json)
    private val mutex = Mutex()
    private var nextCallIndex: Int = 0

    /** The number of recorded model calls this relay can replay. */
    val recordedCallCount: Int get() = recordedCalls.size

    /**
     * The Watt cost charged for a replayed call: **zero**. Replayed calls make no
     * live provider invocation, so they consume no tokens (RECON-relay §2.4).
     * Exposed so callers/tests can assert the zero-Watt contract explicitly.
     */
    val replayedWattCost: WattCost = WattCost()

    override val config: RelayConfig = RelayConfig()

    override suspend fun resolve(
        context: RoutingContext,
        fallbackConfiguration: AIConfiguration,
    ): AIConfiguration = resolveWithMetadata(context, fallbackConfiguration).configuration

    override suspend fun resolveWithMetadata(
        context: RoutingContext,
        fallbackConfiguration: AIConfiguration,
    ): RoutingResolution = replay(context, fallbackConfiguration).getOrThrow()

    /**
     * The Result-boundary core of the relay (RECON-relay: "Result boundaries").
     * For the next call index `i`:
     * - `i >= branchIndex` → **branch**: route to [liveDelegate] (failure if none).
     * - `i < recordedCallCount` → **replay hit**: the recorded selection, zero Watts.
     * - otherwise (`i` within the replay window but past the recordings) → **miss**:
     *   [MissPolicy.Error] → `failure(`[PlaybackMiss]`)`; [MissPolicy.Delegate] →
     *   route to [liveDelegate] (failure if none).
     *
     * The branch check precedes the replay check so a `branchIndex` inside the
     * recorded range still hands off live (rewind-and-correct, AMPR-184 task 2.4).
     */
    suspend fun replay(
        context: RoutingContext,
        fallbackConfiguration: AIConfiguration,
    ): Result<RoutingResolution> {
        val index = mutex.withLock { nextCallIndex++ }
        return when {
            index >= branchIndex -> delegate(context, fallbackConfiguration, index)
            index < recordedCalls.size -> Result.success(
                RoutingResolution(configuration = fallbackConfiguration, reason = PLAYBACK_REASON),
            )
            missPolicy == MissPolicy.Delegate -> delegate(context, fallbackConfiguration, index)
            else -> Result.failure(PlaybackMiss(index, recordedCalls.size))
        }
    }

    /** Recorded model call at [index] in replay order, or `null` if out of range. */
    fun recordedCallAt(index: Int): RecordedModelCall? = recordedCalls.getOrNull(index)

    override suspend fun updateConfig(newConfig: RelayConfig) {
        // No-op: playback selection is driven by the recorded Trace, not routing rules.
    }

    private suspend fun delegate(
        context: RoutingContext,
        fallbackConfiguration: AIConfiguration,
        index: Int,
    ): Result<RoutingResolution> {
        val delegate = liveDelegate ?: return Result.failure(
            IllegalStateException(
                "PlaybackRelay call #$index must be served live, but no liveDelegate was provided.",
            ),
        )
        return Result.success(delegate.resolveWithMetadata(context, fallbackConfiguration))
    }

    companion object {
        /** The `RoutingResolution.reason` stamped on every replayed selection. */
        const val PLAYBACK_REASON: String = "playback"
    }
}
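
The file above exposes the same core through two boundaries: `replay()` returns a `Result`, while `resolveWithMetadata()` unwraps it with `getOrThrow()`. The sketch below shows that pattern in isolation using only the Kotlin standard library. `Miss` and `SketchRelay` are hypothetical simplifications (a list of strings stands in for the recorded calls, and there is no mutex, so the sketch is not safe for concurrent use).

```kotlin
// Hypothetical stand-in for the commit's PlaybackMiss, simplified for illustration.
class Miss(val callIndex: Int) : Exception("call #$callIndex has no recording")

// Hypothetical relay core; a list of strings stands in for the recorded calls.
class SketchRelay(private val recorded: List<String>) {
    private var next = 0

    /** Result boundary: a miss is returned as a value the caller can inspect. */
    fun replay(): Result<String> {
        val index = next++
        return if (index < recorded.size) {
            Result.success(recorded[index])
        } else {
            Result.failure(Miss(index))
        }
    }

    /** Throwing boundary: same core, but a miss propagates as an exception. */
    fun resolve(): String = replay().getOrThrow()
}
```

A test can call `replay()` and assert on `exceptionOrNull()`, while production call paths use the throwing method and let the failure surface. Note that in the diff above, `delegate()` wraps only a successful return in `Result.success`, so an exception thrown by the live delegate would escape `replay()` rather than arrive as a `Result.failure`; the commit does not say whether that is intended.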
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
package link.socket.ampere.eval.relay

import kotlinx.serialization.json.Json
import link.socket.ampere.agents.domain.event.Event
import link.socket.ampere.agents.domain.event.ProviderCallCompletedEvent
import link.socket.ampere.agents.domain.event.ProviderCallStartedEvent
import link.socket.ampere.api.model.TokenUsage
import link.socket.ampere.data.DEFAULT_JSON
import link.socket.ampere.domain.ai.provider.ProviderId
import link.socket.ampere.eval.trace.Trace

/**
 * One recorded model call, paired from a [Trace]'s telemetry events.
 *
 * Per RECON-relay §3.4 (AMPR-189), the model-call events are **metadata only**:
 * they carry provider, model, routing reason, token usage, latency and success,
 * but **not** the prompt or the completion text. A `RecordedModelCall` therefore
 * pins the recorded *routing decision* and its *cost metadata*; deterministic
 * replay of response *content* is out of scope for the `CognitiveRelay` seam
 * (RECON-relay Findings A & B) and is left to the `UpstreamLlmClient` seam in a
 * later ticket.
 *
 * @property started the `ProviderCallStartedEvent` for this call, or `null` when
 *   the trace recorded a completion without a matching start (RECON-relay §3.3
 *   tolerates a missing start; only [ProviderCallStartedEvent.routingReason] is
 *   then unavailable).
 * @property completed the `ProviderCallCompletedEvent` for this call (always
 *   present, since pairing is keyed off completions).
 */
data class RecordedModelCall(
    val started: ProviderCallStartedEvent?,
    val completed: ProviderCallCompletedEvent,
) {
    /** Provider that served the recorded call. */
    val providerId: ProviderId get() = completed.providerId

    /** Model that served the recorded call. */
    val modelId: String get() = completed.modelId

    /** Recorded token accounting (the sole input to the Watt formula, RECON-relay §2.3). */
    val usage: TokenUsage get() = completed.usage

    /** Whether the recorded call succeeded. */
    val success: Boolean get() = completed.success

    /** The recorded routing reason, or `null` if the start event was not recorded. */
    val routingReason: String? get() = started?.routingReason
}

/**
 * Maps this trace's ordered model-call events into an ordered list of
 * `(request, response)` pairs (AMPR-184 task 2.1).
 *
 * The pairing replicates `ArcTraceProjection.buildModelInvocations` (RECON-relay
 * §3.3) **verbatim**: each `ProviderCallCompletedEvent` is matched to the first
 * still-unconsumed `ProviderCallStartedEvent` satisfying the 6-part correlation
 * key (`timestamp <=`, `workflowId`, `agentId`, `providerId`, `modelId`,
 * `cognitivePhase`), and that start is then removed so it pairs at most once.
 * There is no correlation id, so ordering is load-bearing (RECON-relay Guideline 5).
 *
 * Non-model events are ignored. Calls are enumerated in completion order, which
 * is call order for the sequential, deterministic runs evals replay.
 */
fun Trace.modelCalls(json: Json = DEFAULT_JSON): List<RecordedModelCall> {
    val decoded = events.map { json.decodeFromJsonElement(Event.serializer(), it.payload) }
    val starts = decoded.filterIsInstance<ProviderCallStartedEvent>().toMutableList()

    return decoded.filterIsInstance<ProviderCallCompletedEvent>().map { completed ->
        val start = starts.firstOrNull { candidate ->
            candidate.timestamp <= completed.timestamp &&
                candidate.workflowId == completed.workflowId &&
                candidate.agentId == completed.agentId &&
                candidate.providerId == completed.providerId &&
                candidate.modelId == completed.modelId &&
                candidate.cognitivePhase == completed.cognitivePhase
        }
        if (start != null) starts.remove(start)
        RecordedModelCall(started = start, completed = completed)
    }
}
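
The first-match-consumed pairing in `Trace.modelCalls` can be hard to picture when several calls share the same key. The sketch below reproduces the idea with hypothetical, reduced event shapes: `Start`, `Completion` and `pairCalls` are illustrative names, and the key is cut from six parts to three (timestamp ordering, agent, model) for brevity. It is a simplified model of the technique, not the module's code.

```kotlin
// Hypothetical, reduced event shapes: the real key has six parts, this sketch keeps three.
data class Start(val timestamp: Long, val agentId: String, val modelId: String)
data class Completion(val timestamp: Long, val agentId: String, val modelId: String)

/** Pairs each completion with the first still-unconsumed start that matches it. */
fun pairCalls(
    starts: List<Start>,
    completions: List<Completion>,
): List<Pair<Start?, Completion>> {
    val unconsumed = starts.toMutableList()
    return completions.map { done ->
        val index = unconsumed.indexOfFirst { s ->
            s.timestamp <= done.timestamp &&
                s.agentId == done.agentId &&
                s.modelId == done.modelId
        }
        // removeAt consumes exactly the matched start, so it pairs at most once.
        val start = if (index >= 0) unconsumed.removeAt(index) else null
        start to done
    }
}
```

Given two starts and three completions that all share the same agent and model, the first two completions take the starts in order and the third is paired with `null`, which corresponds to the missing-start case that `RecordedModelCall.started` tolerates.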
