Hi @thedotmack — disclosure first: I'm the author of Bella, an open-source belief hypergraph memory layer for Claude Code. I've been following claude-mem with interest — our projects are different architectures aimed at the same problem, and there's enough convergence in the memory-for-agents community now that I wanted to share some notes and ask some questions. Posting in Discussions because the RFC issue I wanted to engage with (#2014) is locked.
I read the Thompson Sampling for observation-quality RFC. The architecture — observation type × model as bandit arms with retrieval feedback as the reward signal — is genuinely clever, and "accessed and referenced within 7 days" is a sharper reward metric than what I've seen in most memory-for-agent projects.
Something that might compose well with your design:
The same reward signal that drives your bandit arms could also drive per-observation confidence updates, not just per-(type × model) arm updates.
In Bella I use Jaynes log-odds accumulation for belief mass: a claim gains mass when retrieved and used downstream and loses mass when unused over time, with voice-independence attenuation, so three independent sessions confirming a belief weigh ~10x more than one session repeating it three times. The math is closely related to the Beta distribution update you'd use for your bandit, just applied at a different level of the memory stack.
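To make the log-odds idea concrete, here is a minimal sketch, not Bella's actual code. The class name `Belief`, the unit evidence weight, the decay penalty, and the rule "a repeat from an already-seen session counts 0.1x" are all assumptions for illustration. The post gives the ~0.1x factor but not the exact weighting scheme.

```python
import math

# Assumed attenuation for repeats from a session that already confirmed the belief.
SAME_SOURCE_ATTENUATION = 0.1


class Belief:
    """Hypothetical belief whose mass is tracked as log-odds."""

    def __init__(self, prior_probability: float = 0.5) -> None:
        self.log_odds = math.log(prior_probability / (1.0 - prior_probability))
        self._seen_sessions: set[str] = set()

    def confirm(self, session_id: str, evidence_weight: float = 1.0) -> None:
        """Add evidence; repeats from the same session are attenuated."""
        weight = evidence_weight
        if session_id in self._seen_sessions:
            weight *= SAME_SOURCE_ATTENUATION
        self._seen_sessions.add(session_id)
        self.log_odds += weight

    def decay(self, penalty: float = 0.5) -> None:
        """Subtract mass when the belief has gone unused (penalty is an assumption)."""
        self.log_odds -= penalty

    @property
    def probability(self) -> float:
        """Convert log-odds back to a probability with the logistic function."""
        return 1.0 / (1.0 + math.exp(-self.log_odds))


independent = Belief()
for session in ("s1", "s2", "s3"):
    independent.confirm(session)

repeated = Belief()
for _ in range(3):
    repeated.confirm("s1")
```

Under these assumptions the three-session belief ends with more mass than the single-session one, because only the first confirmation from a session counts at full weight.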
Two specific things this might offer:
1. Voice-independence on the reward signal. Your current design treats every observation access identically. But one user retrieving the same observation five times in a month is a weaker signal than five different users retrieving it once each: the first is correlated evidence, the second is independent confirmation. Easy to layer on: cluster reward events by session/user and attenuate same-source repeats (Bella uses ~0.1x; the right number for claude-mem probably depends on how you define user and session boundaries).
2. Observation-level decay, not just model-level optimization. Thompson Sampling on (obs_type, model) arms is great for routing future observations, but it doesn't change the confidence of already-stored ones. A complementary Bayesian update over observations themselves — alpha += 1 when accessed, beta += 1 when unused past 30 days, sampled at retrieval time — would give get_observations a sharper posterior-weighted ranking. The math is the same as your bandit (Beta distribution update); just apply it at the observation level in addition to the arm level.
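The two suggestions above can be sketched together as one per-observation posterior. This is an illustration under stated assumptions, not claude-mem's or Bella's implementation: the class `ObservationPosterior`, the Beta(1, 1) starting prior, the session-based attenuation rule, and the choice to restart the 30-day window after each penalty are all hypothetical.

```python
import random
from dataclasses import dataclass, field

UNUSED_WINDOW_SECONDS = 30 * 24 * 3600   # "unused past 30 days"
SAME_SOURCE_ATTENUATION = 0.1            # assumed weight for same-session repeats


@dataclass
class ObservationPosterior:
    observation_id: str
    alpha: float = 1.0                   # assumed uniform Beta(1, 1) prior
    beta: float = 1.0
    updated_at_epoch: float = 0.0
    seen_sessions: set = field(default_factory=set)

    def record_access(self, session_id: str, now_epoch: float) -> None:
        """alpha += 1 on access, attenuated when the session has been seen before."""
        weight = SAME_SOURCE_ATTENUATION if session_id in self.seen_sessions else 1.0
        self.seen_sessions.add(session_id)
        self.alpha += weight
        self.updated_at_epoch = now_epoch

    def apply_decay(self, now_epoch: float) -> None:
        """beta += 1 once the observation has gone a full window without access."""
        if now_epoch - self.updated_at_epoch > UNUSED_WINDOW_SECONDS:
            self.beta += 1.0
            self.updated_at_epoch = now_epoch  # restart the window after penalising

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        """Thompson-style draw from the Beta posterior, taken at retrieval time."""
        return rng.betavariate(self.alpha, self.beta)


def rank_by_posterior(observations, rng):
    """Order candidates by one posterior draw each, highest first."""
    return sorted(observations, key=lambda obs: obs.sample(rng), reverse=True)
```

In practice the sampled value would probably be blended with the similarity score rather than replace it. How to weight the two is a design choice this sketch leaves open.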
Both additions would be compatible with your current schema — add alpha/beta/updated_at_epoch columns to the observations table and run the same update semantics as bandit_arms. The result is that your most-referenced observations float to the top of get_observations by posterior confidence rather than by recency/similarity alone.
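A rough sketch of that schema change, using Python's built-in `sqlite3` against a stand-in table. The real claude-mem `observations` schema is not shown in this post, so the `id` and `text` columns, the column types and defaults, and both helper functions are assumptions for illustration.

```python
import sqlite3
import time


def add_posterior_columns(conn: sqlite3.Connection) -> None:
    """Add alpha/beta/updated_at_epoch if they are not already present."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(observations)")}
    columns = (
        ("alpha", "REAL NOT NULL DEFAULT 1.0"),
        ("beta", "REAL NOT NULL DEFAULT 1.0"),
        ("updated_at_epoch", "INTEGER NOT NULL DEFAULT 0"),
    )
    for name, ddl in columns:
        if name not in existing:
            conn.execute(f"ALTER TABLE observations ADD COLUMN {name} {ddl}")


def record_access(conn: sqlite3.Connection, observation_id: int, now_epoch: int) -> None:
    """Reward update: alpha += 1 and refresh the timestamp."""
    conn.execute(
        "UPDATE observations SET alpha = alpha + 1, updated_at_epoch = ? WHERE id = ?",
        (now_epoch, observation_id),
    )


# Stand-in table; the real observations table will have different columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO observations (text) VALUES ('example observation')")

add_posterior_columns(conn)
add_posterior_columns(conn)  # safe to run twice
record_access(conn, 1, int(time.time()))
```

Existing rows pick up the default Beta(1, 1) prior through the column defaults, so no backfill pass is needed in this sketch.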
The nice symmetry: your bandit approach and Jaynes log-odds accumulation are solving sibling problems with nearly-isomorphic math — both are Bayesian belief updates, just at different levels of the memory stack. Thompson on arms + Jaynes on observations composes surprisingly cleanly if you want to unify them.
Three questions I'm genuinely curious about:
1. In your tier routing + bandit design, do you have a plan for how observation confidence propagates to retrieval ranking, or is retrieval still purely similarity-based via Chroma? I can imagine either answer working, but I haven't seen it specified in the RFC.
2. How do you handle the cold-start problem for new observation types with no feedback history? Bella uses prior mass seeded from initial-extraction confidence (the LLM's own plausibility signal), but that's specific to extraction-time LLM usage.
3. Do you distinguish between "observation was accessed" and "observation was actually used in the downstream response"? The RFC mentions both as reward signals; I'm curious whether your signal collection has the granularity to weight them differently.
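On the cold-start question, one way to picture seeding a prior from extraction confidence is to convert the confidence into Beta pseudo-counts. This is a sketch of the general idea in Beta terms, not Bella's log-odds implementation. The function name `seed_prior`, the `strength` parameter, and the clamping bounds are all hypothetical.

```python
def seed_prior(extraction_confidence: float, strength: float = 2.0) -> tuple[float, float]:
    """Turn a confidence in (0, 1) into Beta(alpha, beta) pseudo-counts.

    The prior mean equals the (clamped) confidence. `strength` is the total
    pseudo-count, so larger values make the prior harder for feedback to move.
    """
    # Clamp so neither parameter collapses to zero (bounds are an assumption).
    confidence = min(max(extraction_confidence, 0.05), 0.95)
    alpha = strength * confidence
    beta = strength * (1.0 - confidence)
    return alpha, beta


alpha, beta = seed_prior(0.8)
prior_mean = alpha / (alpha + beta)
```

A small `strength` keeps the seeded prior weak, so a few real access events quickly dominate the LLM's initial plausibility guess.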
On a related note:
Both of our projects would benefit enormously from Claude Code shipping a PreCompact lifecycle hook. Right now claude-mem uses PostToolUse/Stop/SessionEnd workarounds, and Bella intercepts transcripts post-hoc — both of us are trying to capture state before /compact runs its lossy summarization, without a clean upstream primitive.
I filed a consolidating proposal at anthropics/claude-code#47023 covering four hooks: PreCompact, PostCompact, SessionStart, SessionEnd. If claude-mem added a comment there, even just confirming you'd use PreCompact along with a description of your production workload, it would carry much more weight with Anthropic than my solo proposal does. The goal isn't Bella-specific; it's for the whole external-memory ecosystem to get cleaner integration hooks.
Happy to go deeper on any of this. The memory-for-Claude-Code space is getting interesting — there's more to gain from comparing notes across projects than from competing. Nice work on the architecture here.