Releases: mnemos-os/mnemos
- v5.0.1
- v5.0.0 — closes v3.6/v4.x charters; full DAG + dream-state pip…
- v4.2.0a1 — NATS JetStream substrate (alpha)
- v4.1.3 — edge profile through persistence backend + corpus hardening
MNEMOS v4.0.0 — A memory operating system for serious agentic work
MNEMOS v4.0.0
A memory operating system for serious agentic work. Apache 2.0. In daily production use since December 2025.
This is not a memory storage provider. It is the operating system layer that owns the full lifecycle of agent memory — write, embed, compress, version, reason-over, audit, federate, archive. Each one is a real subsystem with its own database tables, background workers, and failure modes. You can use as much or as little of that as you want; if all you need is POST /v1/memories, you're done.
What it is
A FastAPI service backed by PostgreSQL + pgvector (or SQLite + sqlite-vec, your choice). Run it next to your applications the way you'd run Redis or a message bus: deploy once, every agent in your stack shares the same memory substrate.
Python 3.11+, Apache-2.0, single-licensed open source. Talks to your agents three ways — pick whichever you already have:
- MCP (Model Context Protocol) — stdio + HTTP/SSE servers expose memory as first-class tool calls. Drop into Claude Code, OpenClaw, ZeroClaw, Hermes, anything MCP-aware.
- OpenAI-compatible gateway — POST /v1/chat/completions and GET /v1/models are drop-in for the OpenAI SDK. Point OPENAI_BASE_URL at MNEMOS and any client that speaks OpenAI gets memory injection plus multi-provider routing for free.
- Native /v1/* REST — for applications that want to talk to MNEMOS directly: memories, consultations, providers, sessions, webhooks, federation, knowledge graph triples, MPF import/export, admin.
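To make the gateway path concrete, here's the stock OpenAI Python SDK pointed at a MNEMOS instance. The base URL, API key, and model name are illustrative placeholders, not values these notes specify.

```python
from openai import OpenAI

# Point the stock OpenAI SDK at MNEMOS instead of api.openai.com.
# Base URL, key, and model name are placeholders for illustration.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # your MNEMOS instance
    api_key="mnemos-api-key",             # MNEMOS bearer key
)

resp = client.chat.completions.create(
    model="gpt-4o",  # any model the gateway routes; GET /v1/models lists them
    messages=[{"role": "user", "content": "What did we decide about caching?"}],
)
print(resp.choices[0].message.content)
```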
Core capabilities
Memory store with real lifecycle
Memories carry (owner_id, namespace) as a two-dimensional tenancy key. Read paths share a single visibility predicate covering owner-private, world-readable, group-readable, and federation-pulled rows. Mutation paths stay strictly owner-scoped. Cross-namespace access is rejected by default; only root callers can cross over.
Every memory mutation snapshots into a versioned history with content-addressed commits. You get log, branch, merge, revert, checkout, and diff — git-shaped DAG operations on memory itself. History reads filter by the snapshot's own tenancy at the time of write, not the live row's, so a memory that was created private and later relaxed to public doesn't leak its old private content to new readers.
Deletes leave an explicit tombstone in a deletion log so the audit chain stays continuous and federation peers converge.
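For orientation, a minimal write against the native REST surface could look like the sketch below. Beyond the POST /v1/memories path, the field names are assumptions drawn from the tenancy model above, not a documented schema.

```python
import requests

BASE = "http://localhost:8000"  # assumed local MNEMOS instance
HEADERS = {"Authorization": "Bearer mnemos-api-key"}

# Create a memory. Field names other than the endpoint path are
# illustrative guesses based on the (owner_id, namespace) tenancy model.
resp = requests.post(
    f"{BASE}/v1/memories",
    headers=HEADERS,
    json={
        "content": "Deploy window moved to Thursdays.",
        "namespace": "ops",
    },
)
resp.raise_for_status()
print(resp.json())
```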
Knowledge graph
First-class subject-predicate-object triples with temporal validity windows. Search across the graph, walk timelines, attach triples to memories. Owner + namespace scoping mirrors the memory tenancy model.
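A hypothetical triple write, assuming the /v1/kg/triples path listed under the REST surface and guessing the payload fields from the subject-predicate-object model:

```python
import requests

# Attach a temporally-scoped fact to the graph. The payload shape is a
# sketch: subject/predicate/object plus a validity window, per the model
# described above; the actual field names may differ.
requests.post(
    "http://localhost:8000/v1/kg/triples",
    headers={"Authorization": "Bearer mnemos-api-key"},
    json={
        "subject": "service:billing",
        "predicate": "owned_by",
        "object": "team:payments",
        "valid_from": "2026-01-01T00:00:00Z",
        "valid_to": None,
    },
).raise_for_status()
```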
Multi-LLM consensus reasoning
/v1/consultations distributes one prompt across multiple providers (OpenAI, Anthropic, Google Gemini, Together, Groq, Perplexity, plus local models) and writes a SHA-256 hash-chained audit log on every decision. Seven modes:
- auto — engine picks based on task type
- local — force local-only providers (no commercial APIs)
- external — force external commercial providers
- all — fan out to every available provider
- single — pick exactly one provider, no fan-out (cheapest)
- debate — multi-turn cross-provider argument (high-stakes calls)
- majority — N≥3 providers, return only on quorum agreement
Cost and latency are visible per-consultation. Audit-log endpoints scope per-caller for non-root users.
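As a sketch of the call shape, assuming only the endpoint and mode names from these notes (the rest of the request body is invented for illustration):

```python
import requests

# Fan one prompt out in majority mode (N>=3 providers, quorum required).
# Only the /v1/consultations path and the mode names come from the
# release notes; the body fields are assumptions.
resp = requests.post(
    "http://localhost:8000/v1/consultations",
    headers={"Authorization": "Bearer mnemos-api-key"},
    json={
        "prompt": "Should we roll back the schema migration?",
        "mode": "majority",
    },
)
resp.raise_for_status()
result = resp.json()  # per-consultation cost and latency surface here
```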
Tiered compression
Memory content can be compressed using a competitive contest pipeline — multiple engines produce candidates, a judge selects winners, and a complete contest audit trail is persisted. Compressed variants surface real ratios via memory rehydration and stats endpoints.
Compression is operator-batched, not automatic. The control plane is /v1/admin/compression/enqueue and /v1/admin/compression/enqueue-all. This is deliberate: compression is expensive (LLM-judged contests, often GPU-backed). Run it when you choose to.
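Enqueueing is a single admin call. This sketch assumes bearer auth and an empty request body, neither of which is specified here:

```python
import requests

# Operator-batched compression: you enqueue explicitly, nothing runs
# automatically. Endpoint path is from the release notes; auth header
# and empty body are assumptions.
requests.post(
    "http://localhost:8000/v1/admin/compression/enqueue-all",
    headers={"Authorization": "Bearer mnemos-admin-key"},
).raise_for_status()
```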
Dream-state generator
A background worker periodically reviews accumulated memories during quiet windows and produces synthesized higher-level summaries — the "dream" pattern. Generated content is owned by the originating user and namespace and is searchable like any other memory.
Recall tracking
Every search hit increments recall_count and updates last_recalled_at on the memory row. You can see which memories are pulling weight and which ones have gone stale.
Federation between MNEMOS instances
Pull-based memory exchange between peers. Each peer pulls a federated feed cursored on (updated, id) so same-timestamp rows don't drop across pages. Per-peer ACL on what gets shared. A schema-compat preflight check prevents cross-version corruption when peers run different minor versions.
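The compound cursor is the load-bearing detail: paging on the timestamp alone drops rows that share a timestamp across a page boundary. A generic keyset-pagination sketch of the idea, not MNEMOS's actual implementation:

```python
def pull_pages(fetch, page_size=100):
    """Keyset-paginate on a compound (updated, id) cursor.

    `fetch(after_updated, after_id, limit)` is assumed to return rows
    ordered by (updated, id) strictly greater than the cursor, so rows
    sharing one timestamp can never be skipped at a page boundary.
    """
    cursor = (None, None)  # None/None means "from the beginning"
    while True:
        rows = fetch(*cursor, page_size)
        if not rows:
            return
        yield from rows
        last = rows[-1]
        cursor = (last["updated"], last["id"])
```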
Memory portability
A complete MNEMOS instance round-trips through a portable export/import envelope (MPF) covering memories, knowledge graph triples, sessions, the consultation audit chain, and federation cursors. Use it to migrate between hosts, clone a dev instance from prod, or back up to cold storage.
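A round-trip sketch using the /v1/export and /v1/import paths listed under the REST surface; whether export is a GET and whether import accepts the raw envelope body are assumptions:

```python
import requests

H = {"Authorization": "Bearer mnemos-api-key"}

# Export the full MPF envelope from one instance, replay it into another.
# Only the endpoint paths come from the REST surface list; the HTTP verbs
# and raw-body handling are guesses.
envelope = requests.get("http://source-host:8000/v1/export", headers=H).content

requests.post(
    "http://target-host:8000/v1/import",
    headers=H,
    data=envelope,
).raise_for_status()
```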
Webhooks with delivery guarantees
HMAC-signed, SSRF-protected event delivery for memory.created, memory.updated, memory.deleted, and other lifecycle events. The dispatch is transactionally coupled to the source mutation — the webhook event row is inserted in the same database transaction as the memory write, so a process crash between commit and dispatch can't lose events. Retries use a leased state machine with structural enforcement of "one terminal succeeded row per chain" — no duplicates.
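On the receiving side, HMAC signing implies constant-time verification before trusting a payload. A generic sketch that assumes a hex-encoded HMAC-SHA256 signature header; the actual header name and encoding aren't specified here:

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body and compare in
    constant time. Hex encoding is an assumption, not MNEMOS spec."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```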
Auth and isolation
Bearer API keys for service-to-service. OAuth 2.0 / OIDC for browser-driven access. PostgreSQL Row Level Security policies enforce per-row tenancy at the database layer for team and enterprise installs, mirroring the application-layer predicates one-to-one.
Three deployment profiles
Pick one. Pivot when your needs change.
| Profile | Backend | Workers | Use case |
|---|---|---|---|
| server | Postgres + pgvector + Redis | 1 → N | Production multi-tenant cluster; full feature set including federation, RLS, advisory locks, LISTEN/NOTIFY |
| edge | SQLite + sqlite-vec | 1 | Pi, laptop, Termux on Android — single-user appliance, single-file install |
| dev | SQLite + sqlite-vec | 1 | Local development with DEBUG logging |
Set with MNEMOS_PROFILE=edge or mnemos serve --profile edge. Profile sets defaults for every Settings group; explicit env vars still override.
Multi-worker scaling (server profile)
For production throughput, configure RATE_LIMIT_STORAGE_URI=redis://... and set MNEMOS_WORKERS=N. MNEMOS uses a Redis-backed circuit breaker, rate limiter, and concurrency limiter so per-process state coordinates across workers. An in-process fallback is preserved, with a startup WARNING, for single-worker installs that don't run Redis.
See docs/SCALING.md for the Kubernetes + Redis playbook.
High availability
For single-site HA, use PostgreSQL streaming replication: one writable primary, read-only standbys, WAL shipping, stable writer endpoint. Federation stays first-class but is reserved for genuinely-remote scenarios — multi-site deployments, multi-org curated feeds, developer-laptop replicas with intermittent connectivity. Replication and federation solve different problems.
See docs/STREAMING_REPLICATION.md.
Distribution
```bash
# pip
pip install mnemos-os==4.0.0

# Docker (or Podman, or any OCI runtime)
docker pull ghcr.io/mnemos-os/mnemos:4.0.0

# Single binary, no Python install required
curl -L https://github.com/mnemos-os/mnemos/releases/download/v4.0.0/mnemos-linux-x86_64 -o mnemos
chmod +x mnemos
./mnemos install --profile edge
./mnemos serve --profile edge
```

Single-binary builds ship for linux-x86_64, linux-aarch64 (Pi 5, 64-bit Pi 4, aarch64 phones), and macos-aarch64 (Apple Silicon). Each binary bundles the Python interpreter + sqlite-vec extension + the migration chain. Drop on a target host and run.
Designed to interoperate
MNEMOS is the memory layer for the agentic tooling you already use, not a replacement for it. Today's integrations:
- Claude Code — drop-in hooks plus MCP server
- OpenClaw — AGENTS.md skill plus MCP registration
- ZeroClaw — memory skill over MCP, no Python dependency added to the Rust runtime
- Hermes Agent — optional persistence backend for team / multi-tenant deployments
- MemPalace — graduation path: portability schema lets a MemPalace user who grows into a team preserve their drawers and palaces
- Mem0 / Letta / Zep — bulk consolidation via POST /v1/memories/bulk
- LangChain / LlamaIndex / CrewAI / AutoGen — works today via the OpenAI-compatible gateway
Operational surface
Native Prometheus metrics, OpenTelemetry tracing, structured logging, request ID correlation. Health and readiness endpoints. Migration chain captured in versioned SQL files plus a postgres-upgrade compose service for existing-volume installs. SQLite installs run their own migration chain at first start.
Architectural enforcement at CI: seven import-linter contracts gate the build (layered architecture, no route → db direct imports, domain modules independent siblings, MCP doesn't call route handlers, webhooks self-contained vs domain, core has no upward deps, persistence has no upward deps). Pydantic Settings singleton replaces ad-hoc environment-variable reading; CI bans os.environ outside core.config and installer/.
Known limitations in v4.0.0
- **Edge...
MNEMOS v3.5.1 — v3.5 GA
MNEMOS v3.5.1
A memory operating system for serious agentic work. Apache 2.0. In daily production use since December 2025.
This is the v3.5 GA release. If you've never run MNEMOS — or you tried it once and moved on — these are the capabilities you get out of the box today.
What it is
A FastAPI service backed by PostgreSQL + pgvector. You run it next to your applications the way you'd run Redis or a message bus: deploy once, every agent in your stack shares the same memory substrate.
The pitch is in the name. MNEMOS is not a vector database with helpers bolted on; it is the operating system layer that owns the full lifecycle of agent memory — write, embed, compress, version, reason-over, audit, federate, archive. Each of those is a real subsystem with its own tables, workers, and failure modes, not a marketing word.
You can use as much or as little of that as you want. If all you need is POST /v1/memories, you're done. The OS-shape is there if and when your application grows into it.
Three ways to talk to it
Pick one. Mix as needed.
MCP (Model Context Protocol)
A stdio + HTTP/SSE MCP server exposes memory as first-class tool calls — search_memories, create_memory, update_memory, delete_memory, bulk_create_memories, kg_create_triple, kg_search, kg_timeline, log_memory, branch_memory, diff_memory_commits, checkout_memory, recommend_model, and more.
Drop into Claude Code, OpenClaw, ZeroClaw, Hermes, or anything else MCP-aware. Per-user bearer auth via MNEMOS_MCP_TOKENS=user:api_key so multi-tenant clients aren't collapsed onto a single backend identity.
OpenAI-compatible gateway
POST /v1/chat/completions and GET /v1/models are drop-in for the OpenAI SDK. Point OPENAI_BASE_URL at your MNEMOS instance and any client that already speaks OpenAI gets memory injection plus multi-provider routing for free.
LangChain, LlamaIndex, CrewAI, AutoGen, and anything else written against the OpenAI wire protocol works without modification. Generation parameters (temperature, max_tokens, top_p) propagate through to providers. Server-Sent Events streaming is native. Tools, tool_choice, response_format, multimodal content blocks, stop sequences, frequency/presence penalties — passed through to the selected provider where supported, with explicit 4xx errors where not. Unknown models return 404 with the real OpenAI error envelope.
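Since streaming is native, the SDK's streaming mode needs no changes either. As before, URL, key, and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="mnemos-api-key")

# Server-Sent Events streaming passes straight through the gateway.
stream = client.chat.completions.create(
    model="gpt-4o",  # placeholder; GET /v1/models lists what's routable
    messages=[{"role": "user", "content": "Summarize this week's memories."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```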
Native REST
For applications that want to talk to MNEMOS directly: /v1/memories, /v1/consultations, /v1/providers, /v1/sessions, /v1/webhooks, /v1/federation, /v1/kg/triples, /v1/import, /v1/export, /v1/admin/*. Language-agnostic; pick your HTTP client and go.
What you get
A memory store with real lifecycle
Memories carry (owner_id, namespace) as a two-dimensional tenancy key. Owner is the API-key principal; namespace is a sub-tenant within that owner. Read paths share a single visibility predicate covering owner-private, world-readable, group-readable, and federation-pulled rows. Mutation paths stay strictly owner-scoped. Cross-namespace access is rejected by default; only root callers can cross over.
Permission modes use Unix-style read bits — owner / group / world — so the same row can be private to one team but readable by an audit role.
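A toy model of that read-bit scheme, purely to illustrate the analogy (this is not MNEMOS's actual encoding):

```python
# Owner/group/world read bits, mirroring the Unix analogy in the text.
# Illustrative only; MNEMOS's wire format is not specified here.
OWNER_READ, GROUP_READ, WORLD_READ = 0b100, 0b010, 0b001

def can_read(mode: int, is_owner: bool, in_group: bool) -> bool:
    if is_owner and mode & OWNER_READ:
        return True
    if in_group and mode & GROUP_READ:
        return True
    return bool(mode & WORLD_READ)
```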
Every memory mutation snapshots into a versioned history with content-addressed commits. You get log, branch, merge, revert, checkout, and diff — git-shaped DAG operations on memory itself. History reads filter by the snapshot's own tenancy at the time of write, not the live row's, so a memory that was created private and later relaxed to public doesn't leak its old private content to new readers.
Deletes leave an explicit tombstone in a deletion log so the audit chain stays continuous and federation peers converge.
A knowledge graph
First-class subject-predicate-object triples with temporal validity windows. Search across the graph, walk timelines, attach triples to memories. Owner + namespace scoping mirrors the memory tenancy model.
Multi-LLM consensus reasoning
/v1/consultations distributes one prompt across multiple providers (OpenAI, Anthropic, Google Gemini, Together, Groq, Perplexity, plus local models) and writes a SHA-256 hash-chained audit log on every decision. Modes: single, consensus, debate, auto. Cost and latency are visible per-consultation. Audit-log endpoints scope per-caller for non-root users.
The consensus subsystem is called GRAEAE. It's a separate code path from the OpenAI-compatible gateway, but the two share the same provider registry, so a model added once is available to both surfaces.
Tiered compression
Memory content can be compressed using a competitive contest pipeline — multiple engines produce candidates, a judge selects winners, and a complete contest audit trail is persisted. Compressed variants surface real ratios via memory rehydration and stats endpoints.
Compression is operator-batched, not automatic. The control plane is /v1/admin/compression/enqueue and /v1/admin/compression/enqueue-all. This is deliberate: compression is expensive (LLM-judged contests, often GPU-backed), and pervasive auto-compression on every write would saturate the GPU pool and drive cost-per-memory up without bound. Run it when you choose to.
A dream-state generator
A background worker periodically reviews accumulated memories during quiet windows and produces synthesized higher-level summaries — the "dream" pattern. Generated content is owned by the originating user and namespace and is searchable like any other memory.
Recall tracking
Every search hit increments recall_count and updates last_recalled_at on the memory row. You can see which memories are pulling weight and which ones have gone stale.
Memory portability
A complete MNEMOS instance round-trips through a portable export/import envelope covering memories, knowledge graph triples, sessions, the consultation audit chain, and federation cursors. Use it to migrate between hosts, clone a dev instance from prod, or back up to cold storage.
Webhooks with delivery guarantees
HMAC-signed, SSRF-protected event delivery for memory.created, memory.updated, memory.deleted, and other lifecycle events. The dispatch is transactionally coupled to the source mutation — the webhook event row is inserted in the same database transaction as the memory write, so a process crash between commit and dispatch can't lose events. Retries use a leased state machine with structural enforcement of "one terminal succeeded row per chain" — no duplicates.
Auth and isolation
Bearer API keys for service-to-service. OAuth 2.0 / OIDC for browser-driven access. PostgreSQL Row Level Security policies enforce per-row tenancy at the database layer for team and enterprise installs, mirroring the application-layer predicates one-to-one.
Operational surface
Native Prometheus metrics, OpenTelemetry tracing, structured logging, request ID correlation. Health and readiness endpoints. The migration chain is captured in versioned SQL files plus a postgres-upgrade compose service that handles existing-volume installs.
Deployment topologies
MNEMOS supports several deployment shapes. Pick the one that matches your reliability and reach requirements; you can always grow into a richer topology later.
Single instance
One MNEMOS process, one PostgreSQL. The simplest shape and the right default for evaluation, single-team use, and personal installs. Backed up via the portable export/import envelope or standard pg_dump.
Single-site high availability — PostgreSQL streaming replication
One writable primary, one or more read-only standbys, WAL shipping, a stable writer endpoint. This is the canonical HA story for v3.5: when the primary dies, an existing standby is promoted; clients reconnect to the same logical writer endpoint. The MNEMOS process itself stays the only writer at any given moment — this is by design and matches the single-worker doctrine.
Use streaming replication when you need durability and read-side fanout at one site and want PostgreSQL's well-understood failover discipline. See docs/STREAMING_REPLICATION.md for the runbook.
Multi-site federation between MNEMOS instances
Each site runs its own independent MNEMOS instance. Peers pull each other's curated memory feeds over HTTPS — pull-based, cursored on (updated, id) so same-timestamp rows don't drop, with a schema-compat preflight check that prevents cross-version corruption when peers run different minor versions. Per-peer ACL on what gets shared.
Use federation when you have genuinely separate sites or organizations that should each own their write path: different geographic regions, different legal entities, different security boundaries with curated cross-flow. Federation is not a replacement for HA; replication is for one site, federation is for many.
Hybrid
Run streaming replication at each site for local HA and federation between sites for cross-site memory exchange. The two layers are independent and compose cleanly.
Local replica (planned, v4)
A SQLite-based local-replica profile for developer laptops and offline-capable edge devices: pull from an upstream MNEMOS instance, work offline, sync back when reconnected. Same federation envelope, smaller footprint, no Postgres requirement. Targeted for v4.
Distribution forms
- pip install mnemos-os==3.5.1 — Python package, drops in next to your application.
- ghcr.io/mnemos-os/mnemos:3.5.1 — OCI container (Docker, Podman, or any OCI runtime).
- Bundled docker-compose.yml — MNEMOS + Postgres + ollama + the migration upgrade service in one stack. Ships in the repo.
- Production fleet installs commonly run MNEMOS under Podman with systemd unit supervision; the container shape is identical.
What it doesn't do (yet)
- *...
MNEMOS v3.4.1 — CHARON federation schema-compat preflight
MNEMOS v3.4.1
A memory operating system for serious agentic work. Apache 2.0.
This is a federation-ergonomics + portability-discipline release. The CHARON memory-portability format gained a schema-compat preflight check, and the dev↔prod MPF restore drill validated the round-trip.
Highlights
- CHARON federation schema-compat preflight. A peer pulling a federated memory now validates the source schema before accepting the row, preventing schema-drift silent corruption between minor versions.
- MPF dev↔prod restore drill. Verified that the MNEMOS Portability Format (MPF) export from a v3.4 dev instance round-trips into a v3.4 prod instance with no data loss across memories, KG triples, sessions, audit log, and federation cursors.
- Operational documentation. DEPLOYMENT.md now documents the dev→prod MPF restore drill as a recurring backup-validation procedure.
Install
```bash
pip install mnemos-os==3.4.1
```

See DEPLOYMENT.md for full install, migration, and configuration reference.
License
Apache License 2.0.
Release notes backfilled in April 2026. The v3.4.1 tag itself is at SHA 88cd1e2.
MNEMOS v3.1.0 — Apache 2.0 memory operating system
MNEMOS v3.1.0
MNEMOS is an Apache 2.0 memory operating system with GRAEAE reasoning, DAG versioning, compression, and cost-aware routing. This release consolidates the v3.x line as a single-licensed open-source project.
What's in this release
- Compression platform — plugin CompressionEngineABC with three built-in engines (LETHE, ANAMNESIS, optional ALETHEIA), competitive contest selection with persisted audit log, GPU circuit breaker for graceful degradation
- DAG versioning — content-addressed commits, branches, and merge support on every memory
- Knowledge graph — first-class triples with temporal validity
- Federation — pull-based, content-addressed memory exchange between peers
- OAuth + bearer auth — team/enterprise install profiles with RLS-enforced multi-tenancy
- Webhook dispatcher — HMAC-signed, SSRF-protected event delivery
- Model registry — self-maintaining provider-model catalog with Arena Elo scores
License
Apache License 2.0 (single license). Contributions accepted under the Developer Certificate of Origin (DCO). See LICENSE and CONTRIBUTING.md.
Install
```bash
pip install mnemos-os
# or
docker pull ghcr.io/perlowja/mnemos:3.1.0  # coming soon
```

See DEPLOYMENT.md for full install, migration, and configuration reference.