RFC: Standardizing Cross-Session Agent Memory Patterns #7523

jingchang0623-crypto · 2026-04-04T00:06:46Z

Hey AutoGen community!

I have been researching agent memory patterns across sessions and wanted to propose a standard for cross-session memory architectures.

Background

From recent experiments and discussions, it is clear that:

Memory degrades over time: facts have a half-life of ~2 sessions
File-based identity outperforms databases: letters beat structured KBs
Tiered architectures help: L1 (context) → L2 (summary) → L3 (knowledge)

Proposed Standard: AMP (Agent Memory Protocol)

A lightweight standard for agent memory storage:

.amp/
├── l1-context.json      # Current session context
├── l2-summary.md        # Session summary
├── l3-knowledge.json    # Long-term facts
└── protocol.yaml        # Version + config
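To make the layout concrete, here is a minimal Python sketch that scaffolds `.amp/` and reports missing files. The RFC doesn't define file contents yet, so the default contents (including the `version: "0.1"` line in `protocol.yaml`) and the helper names `init_amp` and `missing_files` are assumptions.

```python
from pathlib import Path

# Hypothetical defaults: the RFC names the files but does not yet define their contents.
AMP_FILES = {
    "l1-context.json": "{}\n",
    "l2-summary.md": "# Session summary\n",
    "l3-knowledge.json": "[]\n",
    "protocol.yaml": 'version: "0.1"\n',
}


def init_amp(root: Path) -> Path:
    """Create an .amp/ directory with the four files, leaving existing files untouched."""
    amp = root / ".amp"
    amp.mkdir(parents=True, exist_ok=True)
    for name, default in AMP_FILES.items():
        path = amp / name
        if not path.exists():
            path.write_text(default, encoding="utf-8")
    return amp


def missing_files(amp: Path) -> list[str]:
    """Return the names of any expected AMP files that are absent."""
    return sorted(name for name in AMP_FILES if not (amp / name).exists())
```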

Benefits

Interoperability: Agents from different frameworks can share memory
Debuggability: Clear structure for inspecting agent state
Graduated degradation: Automatic pruning based on recency
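The RFC doesn't say how recency-based pruning should be computed. One possible reading is to weight each fact by 0.5^(sessions since last access / 2) and drop it once the weight falls below a floor. This assumes recall decays exponentially with the ~2-session half-life from the Background section. Here is a minimal Python sketch. The `last_session` field and the 0.25 floor are assumptions, not part of the proposal.

```python
HALF_LIFE_SESSIONS = 2.0  # the "~2 sessions" half-life from the Background section
KEEP_FLOOR = 0.25         # illustrative cutoff: keep facts at or above a quarter of full weight


def recency_weight(sessions_since_access: int) -> float:
    """Exponential decay: the weight halves every HALF_LIFE_SESSIONS sessions."""
    return 0.5 ** (sessions_since_access / HALF_LIFE_SESSIONS)


def prune_by_recency(facts: list[dict], current_session: int) -> list[dict]:
    """Keep facts whose decayed weight is still at or above KEEP_FLOOR.

    Each fact is assumed to carry a hypothetical 'last_session' field
    (the session index in which it was last read or written).
    """
    return [
        fact for fact in facts
        if recency_weight(current_session - fact["last_session"]) >= KEEP_FLOOR
    ]
```

With these numbers, a fact survives four sessions without being touched and is pruned on the fifth.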

Next Steps

Draft an AMP specification
Create a reference implementation for AutoGen
Benchmark against existing approaches

Would love to get feedback and find collaborators! I will be maintaining resources at miaoquai.com/agent-memory

What do you all think? Are there use cases I am missing?

msaleme · 2026-04-04T15:16:30Z

This resonates strongly. We run a persistent agent (OpenClaw-based) that has been operating continuously since January 2026, and the memory degradation patterns you describe are real.

A few findings from 77+ days of production operation that might inform the standard:

Memory tiering works, but the boundaries matter. We use a three-tier system: daily raw logs (memory/YYYY-MM-DD.md), curated long-term memory (MEMORY.md), and embedded search (nomic-embed-text over both). The daily files are high-fidelity but noisy. The curated file is low-noise but requires active maintenance. The failure mode is that important context falls between tiers and gets lost during compaction.
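The first two tiers described above can be sketched with plain files. This is a guess at the mechanics, not the commenter's implementation. The `!` prefix for flagging entries worth keeping and the helper names are hypothetical, and the embedding-search tier is left out. The `unpromoted` check targets the failure mode mentioned above: it lists flagged daily entries that never made it into MEMORY.md before compaction.

```python
from datetime import date
from pathlib import Path

# Hypothetical convention: lines starting with "!" in a daily log are flagged for promotion.
FLAG = "!"


def append_daily(root: Path, text: str, day: date) -> Path:
    """Append one entry to memory/YYYY-MM-DD.md (tier 1: raw daily log)."""
    path = root / "memory" / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(text.rstrip("\n") + "\n")
    return path


def promote(root: Path, text: str) -> None:
    """Append a curated entry to MEMORY.md (tier 2: long-term memory)."""
    with (root / "MEMORY.md").open("a", encoding="utf-8") as fh:
        fh.write(text.rstrip("\n") + "\n")


def unpromoted(root: Path, day: date) -> list[str]:
    """Flagged daily entries missing from MEMORY.md, i.e. context at risk between tiers."""
    daily = root / "memory" / f"{day.isoformat()}.md"
    curated_path = root / "MEMORY.md"
    curated = set()
    if curated_path.exists():
        curated = set(curated_path.read_text(encoding="utf-8").splitlines())
    flagged = [
        line[len(FLAG):].strip()
        for line in daily.read_text(encoding="utf-8").splitlines()
        if line.startswith(FLAG)
    ]
    return [entry for entry in flagged if entry not in curated]
```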

File-based identity does outperform databases in our experience, but for a specific reason: the agent can read and edit its own identity files (SOUL.md, USER.md) in the same tool-use loop it uses for everything else. No special API, no separate retrieval path. The identity is just another file in the workspace, which means the agent can self-correct when it notices drift.

The 2-session half-life is optimistic for operational context. We found that operational state (which cron jobs are running, what the last deployment looked like, who replied to which thread) degrades faster than factual knowledge. Our workaround: a HEARTBEAT.md file that the agent reads every 30 minutes and can edit to maintain a lightweight state cache. It's ugly but effective.
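A HEARTBEAT.md-style cache could be as simple as `key: value` lines plus an update timestamp. The format and function names below are assumptions, since the comment doesn't describe the file's layout. The 30-minute staleness window mirrors the read interval mentioned above.

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 30 * 60  # matches the 30-minute read interval described above


def write_heartbeat(path: Path, state: dict[str, str], now: Optional[float] = None) -> None:
    """Rewrite HEARTBEAT.md as 'key: value' lines plus an 'updated' timestamp."""
    now = time.time() if now is None else now
    lines = [f"updated: {now:.0f}"] + [f"{k}: {v}" for k, v in sorted(state.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_heartbeat(path: Path) -> tuple[dict[str, str], float]:
    """Parse the file back into (state, updated_timestamp)."""
    state: dict[str, str] = {}
    updated = 0.0
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # ignore free-form lines the agent may have added
        if key == "updated":
            updated = float(value)
        else:
            state[key] = value
    return state, updated


def is_stale(updated: float, now: Optional[float] = None) -> bool:
    """True if the cache has not been refreshed within MAX_AGE_SECONDS."""
    now = time.time() if now is None else now
    return now - updated > MAX_AGE_SECONDS
```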

Cross-session security is the gap nobody talks about. When memory persists across sessions, poisoned context from one session can influence decisions in the next. Our Normalization of Deviance paper (DOI: 10.5281/zenodo.19195516) documented a 19-day silent failure where behavioral drift accumulated across sessions without triggering any single-session alarm. Any cross-session memory standard needs to account for adversarial persistence, not just convenience.
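A cumulative sum (CUSUM) detector is one standard way to catch drift that never trips a per-session alarm, because it accumulates small deviations across sessions. This is a swapped-in illustration, not the method from the cited paper. The per-session deviation score is a hypothetical input you would have to define yourself, for example as distance from a behavioral baseline.

```python
from typing import Optional


def cusum_alarm(deviations: list[float], slack: float, threshold: float) -> Optional[int]:
    """One-sided CUSUM: return the first session index where accumulated drift exceeds threshold.

    deviations[i] is a per-session drift score. Each session contributes
    (deviation - slack), and the running sum is floored at zero, so
    sessions below the slack pull the accumulated drift back toward zero.
    """
    s = 0.0
    for i, d in enumerate(deviations):
        s = max(0.0, s + d - slack)
        if s > threshold:
            return i
    return None
```

Take a constant deviation of 0.4 per session, a slack of 0.1, and a threshold of 1.0. No single session exceeds 1.0, but the cumulative sum crosses it at the fourth session (index 3).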

Would be interested to see how your tiered architecture handles memory poisoning detection. We have test cases for this in our security harness if you want to validate against them.

reallyticsai · 2026-04-09T09:09:58Z

We’ve run into similar memory degradation patterns in production voice agents, especially with multi-turn and multi-session interactions. The tiered AMP layout aligns well with what actually works—context files for immediate recall, markdown summaries for human-readable debugging (tracing failures is way easier), and JSON for raw fact storage. We found that file-based memory is much more flexible for agent interoperability; swapping .json or .md files between agents is trivial compared to wrangling database schemas.

A couple of things we've added in live systems:

Pruning: We use a TTL + activity score on L3 facts (e.g., last_accessed and usage_count). Simple script:

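import json, time

TTL_SECONDS = 60 * 60 * 24 * 30  # 30 days

# Each fact is expected to carry 'last_accessed' (epoch seconds); 'usage_count' is not used in this TTL-only pass
with open('l3-knowledge.json') as fh:
    facts = json.load(fh)

now = time.time()
pruned = [fact for fact in facts if now - fact['last_accessed'] < TTL_SECONDS]

# Write the surviving facts back so the pruning actually takes effect
with open('l3-knowledge.json', 'w') as fh:
    json.dump(pruned, fh, indent=2)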
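The snippet above handles only the TTL half. One way to fold in the activity score is to expire stale facts first, then cap L3 at a fixed size by ranking the survivors on usage. This assumes each fact carries `last_accessed` and `usage_count`. The `MAX_FACTS` cap and the ranking rule are illustrative assumptions.

```python
import time
from typing import Optional

TTL_SECONDS = 60 * 60 * 24 * 30  # 30-day TTL, as in the snippet above
MAX_FACTS = 1000                 # hypothetical capacity limit for L3


def prune_facts(facts: list[dict], now: Optional[float] = None, max_facts: int = MAX_FACTS) -> list[dict]:
    """Drop facts past the TTL, then keep the most active survivors.

    Survivors are ranked by usage_count first and last_accessed second,
    so heavily used facts win over merely recent ones.
    """
    now = time.time() if now is None else now
    alive = [f for f in facts if now - f["last_accessed"] < TTL_SECONDS]
    alive.sort(key=lambda f: (f.get("usage_count", 0), f["last_accessed"]), reverse=True)
    return alive[:max_facts]
```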

Audit Trails: We log every memory mutation (append-only), which helps with both debugging and compliance.
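An append-only mutation log is easy to approximate with JSON Lines: one JSON object per line, with the file opened in append mode. The file name and entry fields here are assumptions.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional


def log_mutation(log_path: Path, op: str, key: str, value: Any = None, ts: Optional[float] = None) -> None:
    """Append one memory mutation as a JSON line; existing lines are never rewritten."""
    entry = {"ts": time.time() if ts is None else ts, "op": op, "key": key, "value": value}
    with log_path.open("a", encoding="utf-8") as fh:  # "a" mode only ever appends
        fh.write(json.dumps(entry) + "\n")


def read_log(log_path: Path) -> list[dict]:
    """Load every logged mutation in the order it was written."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Append mode only keeps this code from rewriting history. If the log must also resist other writers, that protection has to come from storage permissions or signing.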

Would be interesting to see AMP handle agent-to-agent transfer—e.g., two agents exchanging .amp folders mid-session. One missing use case: ephemeral memory for sensitive data (e.g., PII)—we segregate this in volatile storage and never persist.
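Segregating sensitive data could look like a store with two dictionaries, only one of which is ever serialized. This minimal sketch assumes the caller marks values as sensitive; automatic PII detection is out of scope here. The class and method names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


class SegregatedMemory:
    """Keeps persistent facts and volatile (never-persisted) facts in separate stores.

    Volatile entries live only in this process's memory. save() serializes
    the persistent store alone, so sensitive values never reach disk.
    """

    def __init__(self) -> None:
        self.persistent: dict[str, Any] = {}
        self.volatile: dict[str, Any] = {}

    def remember(self, key: str, value: Any, sensitive: bool = False) -> None:
        (self.volatile if sensitive else self.persistent)[key] = value

    def recall(self, key: str) -> Any:
        # Volatile values shadow persistent ones for the current session.
        if key in self.volatile:
            return self.volatile[key]
        return self.persistent.get(key)

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(self.persistent, indent=2), encoding="utf-8")
```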

AutoGen reference implementation sounds solid. For benchmarking, try using actual session replay logs; synthetic data never shows rare memory failures.

Sendersby · 2026-04-09T19:56:18Z

This is a crucial gap. The inconsistency between stateless API design and agent continuity creates real friction.

A few observations from implementation work:

Memory scope needs explicit contracts. Agents need to declare what they're persisting (task context vs. learned preferences vs. relationships), not just store arbitrarily. This lets downstream systems reason about staleness and trust.
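Declared scopes could be as simple as an enum attached to every record, with staleness rules keyed on the scope. The three scope names come from the comment above. The per-scope age limits are placeholder assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryScope(Enum):
    TASK_CONTEXT = "task_context"
    LEARNED_PREFERENCE = "learned_preference"
    RELATIONSHIP = "relationship"


# Hypothetical staleness budgets per declared scope, in seconds.
MAX_AGE = {
    MemoryScope.TASK_CONTEXT: 60 * 60 * 24,             # 1 day
    MemoryScope.LEARNED_PREFERENCE: 60 * 60 * 24 * 90,  # 90 days
    MemoryScope.RELATIONSHIP: 60 * 60 * 24 * 365,       # 1 year
}


@dataclass(frozen=True)
class MemoryRecord:
    scope: MemoryScope  # declared by the writing agent, not inferred later
    content: str
    written_at: float   # epoch seconds

    def is_stale(self, now: float) -> bool:
        """Downstream systems can judge staleness from the declared scope alone."""
        return now - self.written_at > MAX_AGE[self.scope]
```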

Serialization format matters more than people think. JSON works until you need to represent uncertainty, partial knowledge, or temporal reasoning. Consider whether your standard should allow for structured uncertainty (Bayesian updates, confidence bounds) or keep it simple for compatibility.
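One way to get structured uncertainty without leaving JSON is to store a Beta distribution's two parameters next to each fact. The standard Beta-Bernoulli update is then applied whenever the fact is confirmed or contradicted. This is an assumption, not something the comment specifies.

```python
from dataclasses import dataclass


@dataclass
class UncertainFact:
    """A fact with a Beta(alpha, beta) belief over 'this fact is still true'.

    alpha = beta = 1 is a uniform prior. Each confirming observation adds 1
    to alpha and each contradiction adds 1 to beta (the Beta-Bernoulli
    conjugate update).
    """
    statement: str
    alpha: float = 1.0
    beta: float = 1.0

    def observe(self, confirmed: bool) -> None:
        if confirmed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def confidence(self) -> float:
        """Posterior mean probability that the fact holds."""
        return self.alpha / (self.alpha + self.beta)

    def to_json(self) -> dict:
        return {"statement": self.statement, "alpha": self.alpha, "beta": self.beta}
```

Readers that don't understand `alpha` and `beta` can ignore them and still get the `statement`. That is one way to keep uncertainty optional for compatibility.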

The access control piece is underspecified everywhere. Can an agent modify its own memory between sessions? Should there be audit trails? In systems we've built, we've found that immutable memory logs with versioned views actually simplify reasoning about agent behavior over time.
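A versioned view over an immutable log can be built by replaying a prefix of it. This sketch assumes `set`/`delete` entries; the entry shape is hypothetical.

```python
from typing import Any


def view_at(log: list[dict], version: int) -> dict[str, Any]:
    """Rebuild memory state as of `version` by replaying the first `version` log entries.

    The log itself is never modified, so every historical state stays reproducible.
    Entries are assumed to look like {"op": "set" | "delete", "key": ..., "value": ...}.
    """
    state: dict[str, Any] = {}
    for entry in log[:version]:
        if entry["op"] == "set":
            state[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            state.pop(entry["key"], None)
        else:
            raise ValueError(f"unknown op: {entry['op']!r}")
    return state
```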

Reputation coupling is subtle. If memory persists but isn't cryptographically bound to agent identity, you lose guarantees about consistency. One approach we use is tying memory commits to identity and interaction history—lets you verify an agent's claims about what it should remember.
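A hash chain is one way to tie memory commits to both an identity and a history. Each commit is signed with an agent-held key and references the previous commit's hash. This sketch uses HMAC-SHA256 from the Python standard library. HMAC is symmetric, so anyone holding the key can also sign. Verification by third parties would need public-key signatures (e.g. Ed25519) instead. The names and commit layout are assumptions.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first commit


def _body(prev_hash: str, payload: dict) -> bytes:
    # Canonical serialization so signer and verifier hash identical bytes.
    return json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True).encode()


def make_commit(agent_key: bytes, prev_hash: str, payload: dict) -> dict:
    """Bind a memory commit to the agent's key and to the previous commit."""
    body = _body(prev_hash, payload)
    return {
        "prev": prev_hash,
        "payload": payload,
        "hash": hashlib.sha256(body).hexdigest(),
        "sig": hmac.new(agent_key, body, hashlib.sha256).hexdigest(),
    }


def verify_chain(agent_key: bytes, commits: list[dict]) -> bool:
    """Check every signature, every hash, and that each commit points at its predecessor."""
    prev = GENESIS
    for c in commits:
        body = _body(c["prev"], c["payload"])
        expected_sig = hmac.new(agent_key, body, hashlib.sha256).hexdigest()
        if c["prev"] != prev or not hmac.compare_digest(c["sig"], expected_sig):
            return False
        if hashlib.sha256(body).hexdigest() != c["hash"]:
            return False
        prev = c["hash"]
    return True
```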

I'd suggest the RFC include:

Memory namespace hierarchies (agent-local, conversation, domain)
Explicit TTL semantics
Optional cryptographic binding to agent identity
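The first two suggestions can be combined in one small store. Lookups walk from the narrowest namespace to the broadest, and every entry carries an explicit expiry. Several details are assumptions made for illustration: the lookup order, the namespace names used as keys, and the rule that an expired entry falls through to the next namespace.

```python
import time
from typing import Any, Optional

# Hypothetical lookup order: narrowest namespace first.
NAMESPACE_ORDER = ("conversation", "agent-local", "domain")


class NamespacedMemory:
    def __init__(self) -> None:
        # namespace -> key -> (value, expires_at), where expires_at=None means "never expires"
        self._data: dict[str, dict[str, tuple[Any, Optional[float]]]] = {ns: {} for ns in NAMESPACE_ORDER}

    def put(self, namespace: str, key: str, value: Any,
            ttl: Optional[float] = None, now: Optional[float] = None) -> None:
        if namespace not in self._data:
            raise KeyError(f"unknown namespace: {namespace}")
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self._data[namespace][key] = (value, expires_at)

    def get(self, key: str, now: Optional[float] = None) -> Any:
        """Return the value from the narrowest namespace holding an unexpired entry."""
        now = time.time() if now is None else now
        for ns in NAMESPACE_ORDER:
            entry = self._data[ns].get(key)
            if entry is None:
                continue
            value, expires_at = entry
            if expires_at is None or now < expires_at:
                return value
        return None
```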

What problem are you seeing most acutely—

mariuszr1979 · 2026-04-10T06:52:22Z

@jingchang0623-crypto If your agent needs summarization capabilities, BOTmarket has live sellers for that right now.

You address capabilities by schema hash — no browsing, no signup forms. Install the SDK, call bm.buy(hash, input), and get results in ~4 seconds. Free 500 CU on first registration via the faucet.

from botmarket_sdk import BotMarket
bm = BotMarket("https://botmarket.dev", api_key="YOUR_KEY")
result = bm.buy("capability_hash", input={...}, max_price_cu=5.0)

Full protocol: https://botmarket.dev/skill.md

rehan243 · 2026-04-14T09:17:22Z

Interesting proposal. Cross-session memory has been tricky to get right in production systems, especially as agents scale across multiple users or workflows. Your tiered design aligns with a lot of patterns we've seen when building systems that balance ephemeral memory (short-term session data) with persistent facts (long-term knowledge stores).

A couple of practical notes from experience:

File-based storage: While JSON/Markdown is human-readable, it may hit limitations with concurrent access or scaling. If you have multiple agents accessing .amp/ simultaneously, you could run into race conditions or file locking issues. A lightweight database like SQLite might strike a good balance between readability and robustness. You could even auto-export the database to JSON periodically for debugging.
Memory pruning: For L2 summaries, we’ve found embeddings-based similarity checks (e.g., using OpenAI’s text-embedding-ada-002 or similar) useful for identifying which parts of the context/summaries to retain vs. prune. It avoids hardcoding a session count and instead prunes based on relevance.
Interoperability: To make AMP truly portable, consider defining a shared schema in the protocol.yaml file, covering embedding metadata, timestamp formats, or even a checksum mechanism for validation. This keeps frameworks from interpreting JSON structures differently.
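A checksum mechanism could be a manifest of SHA-256 digests for each file in `.amp/`. The manifest would be stored alongside the version in protocol.yaml and rechecked after a transfer. The `checksums:` key and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(amp_dir: Path) -> dict[str, str]:
    """Map each file in the .amp/ directory to its SHA-256 hex digest."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(amp_dir.iterdir())
        if p.is_file() and p.name != "protocol.yaml"  # the manifest itself lives in protocol.yaml
    }


def verify_manifest(amp_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose contents changed."""
    current = build_manifest(amp_dir)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)


def manifest_yaml(manifest: dict[str, str]) -> str:
    """Render a 'checksums:' block that could be appended to protocol.yaml."""
    lines = ["checksums:"] + [f"  {name}: sha256:{digest}" for name, digest in sorted(manifest.items())]
    return "\n".join(lines) + "\n"
```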

Here’s a small snippet showing how you could auto-prune L2 summaries using embeddings:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

SIMILARITY_THRESHOLD = 0.7  # cutoff from the original snippet; tune on your own data

# Load summaries: treat each non-empty line as one summary entry
with open('.amp/l2-summary.md', 'r', encoding='utf-8') as f:
    summaries = [line.rstrip('\n') + '\n' for line in f if line.strip()]

# Encode summaries (stripped, so trailing newlines don't affect the embedding)
model = SentenceTransformer('all-mpnet-base-v2')
embeddings = model.encode([s.strip() for s in summaries])

# Score all summaries against the current context in a single call
current_context = "current session context goes here"
context_embedding = model.encode([current_context])
scores = cosine_similarity(embeddings, context_embedding)[:, 0]

# Keep summaries above the threshold; note this overwrites the file in place
pruned_summaries = [s for s, score in zip(summaries, scores) if score > SIMILARITY_THRESHOLD]
with open('.amp/l2-summary.md', 'w', encoding='utf-8') as f:
    f.writelines(pruned_summaries)

Would be interesting to see how you'd approach versioning in protocol.yaml—do you envision specifying different pruning
