Status: Production-ready guidance with reproducible steps and guardrails. If you have real traces, please share anonymized logs — they help harden thresholds & adapters.
- Related patterns → Memory Desync (pattern_memory_desync), SCU (pattern_symbolic_constraint_unlock), Vectorstore Fragmentation (pattern_vectorstore_fragmentation)
- Examples → Example 04 · Multi-Agent Coordination, Example 03 · Pipeline Patch
- Eval → Cross-Agent Consistency (κ)
Two or more agents write to a shared memory (vector store, KV store, doc DB). Without versioning & conflict control, a later write silently overwrites a more recent or semantically different state (“last-writer-wins”). Downstream agents read stale or missing facts → contradictions, hallucinations, or wrong tool calls.
Typical symptoms
- “We agreed on Plan v3 yesterday… why are we back to v0?”
- Auditor validates deleted or older evidence.
- Turn logs show non-monotonic version jumps: … 7 → 3 → 8.
Root causes
- Stale write: Agent B writes with an old mem_rev it fetched minutes ago.
- Concurrent write: Agents A & B write simultaneously; store picks one arbitrarily.
- Namespace collision: Different flows use the same entity_id or index namespace.
- Schema drift: A writes {plan,deadline}, B writes {deadline,notes} and drops plan.
- Fragmented store: Partitions disagree on latest revision (see vectorstore fragmentation).
Every write envelope must include:
{
  "entity_id": "project:alpha",
  "agent_id": "planner",
  "role_id": "planner@v3",
  "role_hash": "sha256:78c2…",        // persona digest (see role-drift.md)
  "op_id": "op-2025-08-13T12:34:56Z#1234",
  "timestamp": "2025-08-13T12:34:56Z",
  "mem_rev": 8,                        // intended new revision (monotonic int)
  "prev_rev": 7,                       // what writer claims to extend
  "mem_hash": "sha256:abcd1234",       // hash(content)
  "parents": [7],                      // for merges, can be [7,7a] (three-way)
  "content": {
    "plan": "Deliverable X by EOD",
    "dependencies": ["doc-123"]
  }
}
Store invariants
- Monotonicity: mem_rev strictly increases per entity_id.
- CAS on prev_rev: write only applies if store’s head_rev == prev_rev.
- Audit trail: every write stored append-only in mem_log.
- Branch-safe (optional): allow branches on conflict; reconcile later.
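The invariants above can be sketched as a toy in-memory store. This is a minimal illustration, not the project's actual store: the `MemStore` class, its method names, and the convention that an unknown entity starts at revision 0 are all assumptions made for the example. It also enforces "advance by exactly 1", which is stricter than plain monotonicity.

```python
import threading


class MemStore:
    """Toy in-memory store; class and method names are illustrative only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._head = {}    # entity_id -> latest accepted envelope
        self.mem_log = []  # append-only audit trail (accepted and rejected writes)

    def head_rev(self, entity_id):
        env = self._head.get(entity_id)
        return env["mem_rev"] if env else 0  # assumption: unknown entities start at rev 0

    def write(self, env):
        with self._lock:  # the check and the swap happen atomically
            head = self.head_rev(env["entity_id"])
            if env["prev_rev"] != head:
                outcome = "conflict"  # CAS on prev_rev failed
            elif env["mem_rev"] != head + 1:
                outcome = "conflict"  # would break monotonicity
            else:
                self._head[env["entity_id"]] = env
                outcome = "ok"
            self.mem_log.append({**env, "status": outcome})  # never rewritten, only appended
            return outcome


store = MemStore()
a = {"entity_id": "project:alpha", "agent_id": "planner",
     "prev_rev": 0, "mem_rev": 1, "content": {"plan": "v1"}}
b = {"entity_id": "project:alpha", "agent_id": "executor",
     "prev_rev": 0, "mem_rev": 1, "content": {"plan": "v0 (stale)"}}
r1 = store.write(a)  # accepted: extends the current head
r2 = store.write(b)  # rejected: prev_rev no longer matches head
print(r1, r2)
```

The rejected write still lands in `mem_log`, so the audit trail records conflicts as well as accepted revisions.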
Goal: make Agent B overwrite Agent A with a stale revision.
# 1) A reads, sees head_rev=7
curl -s 'http://localhost:8080/mem/head?entity_id=project:alpha' | jq
# 2) A writes rev=8 (ok)
curl -s -X POST http://localhost:8080/mem/write -H 'Content-Type: application/json' -d '{
"entity_id":"project:alpha","agent_id":"planner","role_id":"planner@v3",
"role_hash":"sha256:78c2","op_id":"op-A","timestamp":"2025-08-13T01:00:00Z",
"mem_rev":8,"prev_rev":7,"mem_hash":"sha256:aa","content":{"plan":"v8"}
}' | jq
# 3) B (stale) still thinks head=7 and tries to write another “rev=8”
curl -s -X POST http://localhost:8080/mem/write -H 'Content-Type: application/json' -d '{
"entity_id":"project:alpha","agent_id":"executor","role_id":"executor@v1",
"role_hash":"sha256:91ff","op_id":"op-B","timestamp":"2025-08-13T01:00:02Z",
"mem_rev":8,"prev_rev":7,"mem_hash":"sha256:bb","content":{"plan":"v0 (stale)"}
}' | jq
Expected (correct): second call gets 409 Conflict (CAS failed).
Buggy (overwrite): second call 200 OK, head becomes stale content.
- Simulate two concurrent writes; assert the second is rejected or creates a branch.
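A concurrent-write check along these lines can be sketched with two threads released at the same moment. This sketch runs against a toy in-process compare-and-swap rather than the HTTP endpoint; `cas_write`, the status codes it records, and the agent names are assumptions for illustration, and in a real test the function body would call the store's write API instead.

```python
import threading

head = {"rev": 7, "content": {"plan": "v7"}}
lock = threading.Lock()
results = {}


def cas_write(agent, prev_rev, content):
    # Compare-and-swap under a lock: only a writer whose prev_rev matches head wins.
    with lock:
        if head["rev"] != prev_rev:
            results[agent] = 409
            return
        head["rev"] = prev_rev + 1
        head["content"] = content
        results[agent] = 200


barrier = threading.Barrier(2)


def agent(name, content):
    barrier.wait()  # release both writers at the same moment
    cas_write(name, 7, content)


writers = [("planner", {"plan": "v8"}), ("executor", {"plan": "v8 (alt)"})]
threads = [threading.Thread(target=agent, args=w) for w in writers]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Which agent wins is not deterministic; exactly one must win.
assert sorted(results.values()) == [200, 409]
assert head["rev"] == 8
```

Both writers legitimately read head_rev=7 here, so either may win; the property under test is that the loser is rejected rather than silently applied.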
Reject on arrival if any of:
- prev_rev < head_rev at write time (stale write).
- prev_rev == head_rev but mem_hash differs (concurrent write, collision).
- role_hash mismatches bound persona for the writer (possible role-drift).
- entity_id not in writer’s allowed scope (tool/ACL violation).
Emit metrics/logs for each rejection and keep an append-only record.
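One way to express these arrival checks is a function that returns every reason that applies, so each can be counted separately. This is a sketch under stated assumptions: `bound_role_hash` and `allowed_entities` stand in for a persona registry and an ACL that the text does not specify, the reason strings other than `stale_prev` and `hash_collision` are made up, and the hash rule is read here as "the proposed mem_rev already exists at head with a different mem_hash", which is one possible interpretation.

```python
def rejection_reasons(env, head, bound_role_hash, allowed_entities):
    """Return all rejection reasons for a write envelope; an empty list means accept.

    head: {"rev": int, "hash": str} for the entity's current head.
    bound_role_hash / allowed_entities: hypothetical inputs (persona registry, ACL).
    """
    reasons = []
    if env["prev_rev"] < head["rev"]:
        reasons.append("stale_prev")
    # Assumed reading of the collision rule: same target revision, different content.
    if env["mem_rev"] == head["rev"] and env["mem_hash"] != head["hash"]:
        reasons.append("hash_collision")
    if env["role_hash"] != bound_role_hash:
        reasons.append("role_hash_mismatch")
    if env["entity_id"] not in allowed_entities:
        reasons.append("acl_violation")
    return reasons


head = {"rev": 8, "hash": "sha256:aa"}
stale = {"entity_id": "project:alpha", "role_hash": "sha256:91ff",
         "prev_rev": 7, "mem_rev": 8, "mem_hash": "sha256:bb"}
fresh = {**stale, "prev_rev": 8, "mem_rev": 9}

stale_reasons = rejection_reasons(stale, head, "sha256:91ff", {"project:alpha"})
fresh_reasons = rejection_reasons(fresh, head, "sha256:91ff", {"project:alpha"})
print(stale_reasons)  # ['stale_prev', 'hash_collision']
print(fresh_reasons)  # []
```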
- Require prev_rev == head_rev at write.
- On mismatch → reject or auto-branch.
Python-like pseudo (stdlib-only)
def safe_write(store, w):  # w: envelope dict (see schema)
    """Apply w only if it extends the current head; never overwrite silently."""
    head = store.head_meta(w["entity_id"])  # {"rev": int, "hash": str}
    if head["rev"] != w["prev_rev"]:
        return {"status": "conflict", "reason": "stale_prev", "head": head}
    # Atomically swap (rev must advance by 1). compare_and_swap is assumed to
    # re-check expected_rev itself, so a racing writer makes it return False.
    ok = store.compare_and_swap(
        entity_id=w["entity_id"],
        expected_rev=head["rev"],
        new_rev=w["mem_rev"],
        new_hash=w["mem_hash"],
        content=w["content"],
        op_meta={k: w[k] for k in ("agent_id", "role_id", "role_hash", "op_id", "timestamp")},
    )
    return {"status": "ok"} if ok else {"status": "retry", "reason": "cas_failed"}
Node/TS HMAC signature (optional)
import crypto from "crypto";
function signWrite(agentId: string, roleHash: string, prevRev: number, memRev: number, key: Buffer): string {
  const payload = `${agentId}|${roleHash}|${prevRev}|${memRev}`;
  return crypto.createHmac("sha256", key).update(payload).digest("hex");
}
- On conflict, create a branch (mem_rev=8a) instead of rejecting; later run a three-way merge.
Three-way merge outline
base = rev 7
A = rev 8 (agent A)
B = rev 8a (agent B)
ΔA = diff(base, A); ΔB = diff(base, B)
if ΔA ∩ ΔB == Ø → merge = base ⊕ ΔA ⊕ ΔB
else → manual decision or rule-based precedence (e.g., auditor > planner)
No external libs needed: represent content as JSON and define a minimal diff (added/removed keys; for strings, use normalized edit distance ≤ threshold to auto-merge).
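The outline above can be sketched for flat JSON objects as follows. This is a minimal version that only handles top-level keys and treats any key touched by both sides as a conflict; the function names are illustrative, and the string edit-distance auto-merge mentioned above is not implemented here.

```python
def diff(base, new):
    """Return (changed, removed): keys added or changed in new, and keys dropped from base."""
    changed = {k: v for k, v in new.items() if k not in base or base[k] != v}
    removed = {k for k in base if k not in new}
    return changed, removed


def three_way_merge(base, a, b):
    """Return (merged, conflicts); merged is None when both sides touched the same key."""
    a_changed, a_removed = diff(base, a)
    b_changed, b_removed = diff(base, b)
    touched_a = set(a_changed) | a_removed
    touched_b = set(b_changed) | b_removed
    conflicts = sorted(touched_a & touched_b)
    if conflicts:
        return None, conflicts  # hand off to manual decision or rule-based precedence
    dropped = a_removed | b_removed
    merged = {k: v for k, v in base.items() if k not in dropped}
    merged.update(a_changed)
    merged.update(b_changed)
    return merged, []


base = {"plan": "Deliverable X", "deadline": "Fri"}
a = {"plan": "Deliverable X by EOD", "deadline": "Fri"}                  # A edits plan
b = {"plan": "Deliverable X", "deadline": "Fri", "notes": "see doc-123"}  # B adds notes
merged, conflicts = three_way_merge(base, a, b)
print(merged)

# Schema drift: B drops plan while A edits it, so the merge is refused.
drifted = {"deadline": "Fri", "notes": "see doc-123"}
refused, drift_conflicts = three_way_merge(base, a, drifted)
print(drift_conflicts)  # ['plan']
```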
- Compute a cheap semantic distance between the new content and head content:
  - Normalize (lowercase, strip punctuation), tokenize, Jaccard overlap on tokens.
  - If overlap < 0.6 and same prev_rev → raise collision alert, require manual confirm.
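A stdlib-only sketch of this check is shown below. The helper names are illustrative, and two choices are assumptions of the example rather than part of the spec: content is serialized with `json.dumps` (so keys count as tokens too), and punctuation is replaced by spaces, which splits an identifier such as doc-123 into two tokens.

```python
import json
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokens(content):
    """Lowercase the serialized content, strip punctuation, and split into a token set."""
    text = json.dumps(content, sort_keys=True, ensure_ascii=False).lower()
    return set(text.translate(_PUNCT_TO_SPACE).split())


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty contents are treated as identical
    return len(ta & tb) / len(ta | tb)


def needs_confirmation(new_env, head_env, threshold=0.6):
    """True when both writes extend the same prev_rev but their content barely overlaps."""
    same_parent = new_env["prev_rev"] == head_env["prev_rev"]
    return same_parent and jaccard(new_env["content"], head_env["content"]) < threshold


head_env = {"prev_rev": 7, "content": {"plan": "Deliverable X by EOD"}}
divergent = {"prev_rev": 7, "content": {"plan": "v0 (stale)"}}
cosmetic = {"prev_rev": 7, "content": {"plan": "deliverable X, by EOD."}}

print(needs_confirmation(divergent, head_env))  # True
print(needs_confirmation(cosmetic, head_env))   # False
```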
Metrics
- mem_write_total{entity,agent,outcome="ok|conflict|retry"}
- mem_conflict_total{entity,reason="stale_prev|hash_collision"}
- mem_branch_total{entity} (if branch mode)
- mem_head_rev{entity} (gauge)
- mem_write_latency_seconds (histogram)
Alert rules (examples)
# Frequent conflicts (possible hot entity or stale readers)
alert: MemConflictsSpike
expr: increase(mem_conflict_total[5m]) > 3
for: 5m
labels: { severity: ticket }
# Head revision oscillation (rollback/flip-flop)
alert: MemHeadOscillation
expr: changes(mem_head_rev[10m]) > 5
for: 10m
labels: { severity: ticket }
Also track cross-agent κ; sudden drops often co-occur with memory corruption. See cross-agent eval.
Unit
- Stale write rejected (prev_rev < head_rev).
- Concurrent write: either reject or create branch; never silent overwrite.
- Schema merge: non-overlapping keys merge automatically.
E2E
- Two agents race on the same entity_id: final head must be v8 or (v8 + v8a if branching), never v0.
- κ remains ≥ baseline after enabling guards.
Acceptance (ship)
- mem_conflict_total steady and low; no silent overwrites in 1k writes.
- No data loss in replay tests (log → rebuild yields identical head).
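The replay criterion can be checked with a small rebuild function. This is a sketch that assumes log entries carry `entity_id`, `prev_rev`, `mem_rev`, `status`, and a `content` field, and that a log may start mid-history; `rebuild_heads` is an illustrative name. An accepted entry that does not extend the replayed head is reported as a silent overwrite.

```python
def rebuild_heads(mem_log):
    """Replay an append-only log and return {entity_id: (mem_rev, content)}."""
    heads = {}
    for entry in mem_log:
        if entry["status"] != "ok":
            continue  # rejected writes never moved the head
        entity = entry["entity_id"]
        # The first accepted entry for an entity defines the starting point.
        current_rev = heads.get(entity, (entry["prev_rev"], None))[0]
        if entry["prev_rev"] != current_rev:
            raise ValueError(
                f"silent overwrite on {entity}: prev_rev={entry['prev_rev']}, head={current_rev}"
            )
        heads[entity] = (entry["mem_rev"], entry.get("content"))
    return heads


log = [
    {"entity_id": "project:alpha", "prev_rev": 7, "mem_rev": 8, "status": "ok",
     "content": {"plan": "v8"}},
    {"entity_id": "project:alpha", "prev_rev": 7, "mem_rev": 8, "status": "conflict",
     "content": {"plan": "v0 (stale)"}},
    {"entity_id": "project:alpha", "prev_rev": 8, "mem_rev": 9, "status": "ok",
     "content": {"plan": "v9"}},
]
heads = rebuild_heads(log)
print(heads)
```

A replay test would compare `heads` with the live store's head for each entity and fail on any difference or on a raised `ValueError`.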
- Shadow mode: turn on CAS checks, warn-only; measure conflict rate.
- Canary: reject stale writes for 10% entities; branch on collision (optional).
- Full: enforce CAS for all; keep feature flag for emergency bypass.
- Post-rollout: schedule merge jobs for any branches; add dashboards.
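The staged rollout can be driven by a single decision function. The sketch below is one possible shape, not a prescribed mechanism: the stage names mirror the steps above, `should_enforce` and its parameters are hypothetical, and entities are assigned to the canary with a CRC32 bucket so the same entity always gets the same answer.

```python
import zlib


def should_enforce(entity_id, stage, canary_percent=10, bypass=False):
    """Decide whether a failed CAS check rejects the write or only logs a warning.

    stage: "shadow" | "canary" | "full". bypass models the emergency feature flag.
    """
    if bypass or stage == "shadow":
        return False  # warn-only: record the conflict, still apply the write
    if stage == "canary":
        bucket = zlib.crc32(entity_id.encode("utf-8")) % 100
        return bucket < canary_percent  # stable per-entity assignment
    if stage == "full":
        return True
    raise ValueError(f"unknown stage: {stage}")


print(should_enforce("project:alpha", "shadow"))             # False
print(should_enforce("project:alpha", "full"))               # True
print(should_enforce("project:alpha", "full", bypass=True))  # False
```

Hashing the entity rather than sampling per request keeps each entity consistently inside or outside the canary, which makes conflict rates comparable between the two groups.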
{"ts":"2025-08-13T01:00:00Z","entity_id":"project:alpha","op":"write","agent":"planner","prev_rev":7,"mem_rev":8,"status":"ok"}
{"ts":"2025-08-13T01:00:02Z","entity_id":"project:alpha","op":"write","agent":"executor","prev_rev":7,"mem_rev":8,"status":"conflict","reason":"stale_prev","head_rev":8}
{"ts":"2025-08-13T01:00:03Z","entity_id":"project:alpha","op":"write","agent":"executor","prev_rev":8,"mem_rev":9,"status":"ok"}
Have a reproducible overwrite trace? Please share; even 5–10 turns help us tune adapters and defaults.