|
| 1 | +# Plan: Side-effect replay tests |
| 2 | + |
| 3 | +## Scope |
| 4 | + |
| 5 | +The existing `WorkflowDeterminismTest` runs a workflow, fetches its history, |
| 6 | +and calls `WorkflowReplayer.replayWorkflowExecution` — this catches |
| 7 | +non-determinism but does **not** verify that plugin-managed side effects |
| 8 | +(ChatModel calls, tool methods, MCP calls, `@SideEffectTool` bodies) are not |
| 9 | +re-executed on replay. The Temporal AI partner review standards specifically |
| 10 | +call out "tests that make sure you're avoiding repeated side effects on |
| 11 | +replays." |
| 12 | + |
| 13 | +This branch adds counting/assertion tests for each replayable surface the |
| 14 | +plugin introduces. |
| 15 | + |
| 16 | +## Files to change / add |
| 17 | + |
| 18 | +- `src/test/java/io/temporal/springai/replay/ChatModelSideEffectTest.java` |
| 19 | + - Register a `ChatModel` that increments an `AtomicInteger` on every call. |
| 20 | + - Drive a workflow that calls it once, capture the history, then replay |
| 21 | + with `WorkflowReplayer`. Assert the counter is still 1 after replay. |
| 22 | + |
| 23 | +- `src/test/java/io/temporal/springai/replay/ActivityToolSideEffectTest.java` |
| 24 | + - Build an `@ActivityInterface` whose `@Tool`-annotated method increments |
| 25 | + a counter on the activity implementation. |
| 26 | + - Use `ToolCallingStubChatModel`-style stub to produce a tool call, let |
| 27 | + the workflow drive the tool via `ActivityToolCallback`. Replay and |
| 28 | + assert the tool method counter is still 1. |
| 29 | + |
| 30 | +- `src/test/java/io/temporal/springai/replay/SideEffectToolReplayTest.java` |
| 31 | + - Register a `@SideEffectTool` whose `@Tool` body increments a counter. |
| 32 | + - Drive the workflow to call it via the stubbed model. Capture history. |
| 33 | + - Replay and assert the inner body's counter did not advance |
| 34 | + (Workflow.sideEffect is memoized, so this is the behavior we're |
| 35 | + asserting). |
| 36 | + |
| 37 | +- `src/test/java/io/temporal/springai/replay/McpToolSideEffectTest.java` |
| 38 | + - Register a fake `McpSyncClient` (or mock) that increments a counter on |
| 39 | + `callTool`. Replay and assert it stays at 1. |
| 40 | + - If mocking `McpSyncClient` is too heavy, skip this and cover MCP at the |
| 41 | + activity level (counter on `McpClientActivityImpl.callTool`). |
| 42 | + |
| 43 | +All four tests share the pattern from `WorkflowDeterminismTest`: start |
| 44 | +`TestWorkflowEnvironment`, register the counting impl, run, fetch history, |
| 45 | +call `WorkflowReplayer.replayWorkflowExecution`, then |
| 46 | +`assertEquals(1, counter.get())`. |
| 47 | + |
| 48 | +## Test plan |
| 49 | + |
| 50 | +- `./gradlew :temporal-spring-ai:test` locally and in CI. |
| 51 | +- Verify the new tests actually fail if we regress (remove the activity |
| 52 | + memoization intentionally in a scratch branch to confirm the counter |
| 53 | + asserts catch it). |
| 54 | + |
| 55 | +## PR |
| 56 | + |
| 57 | +**Title:** `temporal-spring-ai: add side-effect replay tests for chat, tools, SideEffectTool, and MCP` |
| 58 | + |
| 59 | +**Body:** |
| 60 | + |
| 61 | +``` |
| 62 | +## What was changed |
| 63 | +Four new replay tests under `src/test/.../replay/`: |
| 64 | +- ChatModelSideEffectTest |
| 65 | +- ActivityToolSideEffectTest |
| 66 | +- SideEffectToolReplayTest |
| 67 | +- McpToolSideEffectTest |
| 68 | +
|
| 69 | +Each one runs a workflow that drives the corresponding surface, captures |
| 70 | +history, calls `WorkflowReplayer.replayWorkflowExecution`, and asserts that |
| 71 | +the underlying side effect (counter increment) did not run a second time |
| 72 | +during replay. |
| 73 | +
|
| 74 | +## Why? |
| 75 | +The existing determinism test catches history-vs-command mismatches but |
| 76 | +doesn't prove the plugin isn't re-invoking user side effects on replay. |
| 77 | +Temporal's AI partner review standards require side-effect safety tests, |
| 78 | +and these regressions would be easy to introduce accidentally if someone |
| 79 | +later added an in-workflow fallback or a cache path. Counter-based tests |
| 80 | +make any such regression show up immediately. |
| 81 | +``` |
0 commit comments