A MemPalace fork focused on three practical upgrades for local AI memory:
- flexible self-hosted embedding models
- transcript-aware long-chat mining
- stronger Chinese and mixed-language support
- stdio MCP integration for local AI clients such as Chatbox, Claude Code, Codex-style agents, and similar tools
- service-style MCP deployment over streamable HTTP for clients that prefer one long-lived local service
Official MemPalace has already become much stronger in multilingual support. This fork is not trying to replace upstream. Its goal is to push harder on the parts that matter in real local memory workflows:
- swapping in stronger local embedding models
- handling exported chat transcripts more reliably
- improving retrieval on Chinese and mixed-language conversation data
- making MCP usage smoother in local stdio clients
- supporting service-style MCP deployment over streamable HTTP, not just stdio
- recovering short memory queries more reliably over MCP without requiring the caller to know a lot of hidden context
Flexible embedding model selection is the biggest feature. Instead of being
locked into one default embedding setup, you can point the system at your own
local model, including large self-hosted models such as Qwen3-Embedding-8B.
Example:
export MEMPALACE_EMBED_MODEL=$HOME/.mempalace-zh/models/Qwen3-Embedding-8B
export MEMPALACE_EMBED_DEVICE=mps
export MEMPALACE_EMBED_BATCH_SIZE=2

Typical device values:
- `mps` for Apple Silicon
- `cuda` for NVIDIA GPUs
- `cpu` for fallback
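As a rough sketch of how these variables could be consumed, here is a hypothetical loader (the `MEMPALACE_EMBED_*` names come from this README; the function itself and its defaults are illustrative, not the fork's actual code). It assumes sentence-transformers, which the local-embeddings extra installs:

```python
import os

# Hypothetical helper: read the MEMPALACE_EMBED_* variables with safe
# fallbacks. Default values here are placeholders, not the project's.
def load_embed_config():
    return {
        "model": os.environ.get("MEMPALACE_EMBED_MODEL", "all-MiniLM-L6-v2"),
        "device": os.environ.get("MEMPALACE_EMBED_DEVICE", "cpu"),
        "batch_size": int(os.environ.get("MEMPALACE_EMBED_BATCH_SIZE", "8")),
    }

# With the local-embeddings extra installed, the config would be applied
# roughly like this:
#   from sentence_transformers import SentenceTransformer
#   cfg = load_embed_config()
#   model = SentenceTransformer(cfg["model"], device=cfg["device"])
#   vectors = model.encode(texts, batch_size=cfg["batch_size"])
```

The commented-out lines show the standard sentence-transformers calling convention; swap in your own model path and device.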
This fork improves transcript normalization and chunking for long-form personal conversation histories, especially markdown exports from chat tools.
That matters when memory is not just about storing short facts, but recovering:
- what was said on a specific day
- what happened right before an important event
- what gift, location, or coincidence tied an event together
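The project's actual chunker is internal, but the idea can be sketched with a hypothetical splitter for markdown exports where each day starts with a date heading (the `## YYYY-MM-DD` pattern here is an assumption for illustration, not the fork's real parsing rule):

```python
import re

# Hypothetical: split a markdown chat export into per-day chunks keyed by
# headings like "## 2024-05-01", so day-level questions stay recoverable.
def split_by_day(transcript: str) -> dict[str, str]:
    chunks: dict[str, str] = {}
    day = None
    for line in transcript.splitlines():
        m = re.match(r"^##\s+(\d{4}-\d{2}-\d{2})\s*$", line)
        if m:
            day = m.group(1)
            chunks[day] = ""
        elif day is not None:
            chunks[day] += line + "\n"
    return chunks
```

Keeping day boundaries intact at chunking time is what makes "what was said on a specific day" answerable later, rather than smearing one day's events across unrelated chunks.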
This fork adds extra handling for Chinese and mixed Chinese-English material in:
- transcript normalization
- conversation mining
- general extraction
- query sanitization
- lightweight search reranking
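One building block such handling needs is simply detecting CJK text, since Chinese has no whitespace word boundaries and must be routed differently during sanitization and reranking. A minimal illustrative helper (not the fork's actual code) could look like:

```python
import re

# Hypothetical helper: match common CJK Unified Ideograph ranges so
# mixed Chinese-English queries can skip whitespace-based tokenization.
CJK_RE = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf]")

def has_cjk(text: str) -> bool:
    return bool(CJK_RE.search(text))
```

A query like "steak 事件" would test positive and can then be handled by character-level or substring matching instead of word-level matching.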
The MCP workflow is not specific to one app.
If a host can launch a local stdio MCP server, this project can usually be integrated into it.
Typical categories include:
- Chatbox
- Claude Code
- Codex-style local agent shells
- other desktop or terminal tools with stdio MCP support
This fork also fixes UTF-8 MCP output, so Chinese appears directly instead of
being escaped as \uXXXX.
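The underlying issue is standard Python `json` behavior: `json.dumps` escapes non-ASCII characters by default, and passing `ensure_ascii=False` emits the UTF-8 text directly:

```python
import json

memory = {"event": "牛排事件"}
print(json.dumps(memory))                      # {"event": "\u725b\u6392\u4e8b\u4ef6"}
print(json.dumps(memory, ensure_ascii=False))  # {"event": "牛排事件"}
```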
This fork supports both:
- `stdio` MCP, where the client launches the server process for you
- streamable HTTP, where you run one long-lived local MCP service and let multiple clients connect to it
That second mode is especially useful when a desktop client tends to leave
duplicate stdio Python processes behind after repeated reconnects.
Real MCP clients often search with very short labels because the model does not yet know the surrounding context.
This fork now improves that path by:
- using a looser default distance for short queries
- retrying enriched query variants automatically
- falling back to lexical matching when semantic recall is too weak
That makes terse lookups like `name origin`, `steak incident`, or `allergy`
less likely to come back empty when the underlying memory is actually present.
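The recovery strategy above can be sketched as follows. The function, the helper names, and the threshold values are illustrative assumptions, not the fork's actual API:

```python
# Rough sketch of short-query recovery. `semantic_search` and
# `lexical_search` stand in for the real retrieval backends; the
# distance values 1.2 / 0.8 are made-up placeholders.
def recall(query, semantic_search, lexical_search, max_distance=None):
    # 1. Short queries get a looser default distance when none is given.
    if max_distance is None:
        max_distance = 1.2 if len(query.split()) <= 3 else 0.8
    hits = semantic_search(query, max_distance)
    # 2. Retry an enriched variant of the query.
    if not hits:
        hits = semantic_search(f"memory about {query}", max_distance)
    # 3. Fall back to lexical matching when semantic recall stays empty.
    if not hits:
        hits = lexical_search(query)
    return hits
```

The key design point is ordering: cheap semantic retries run first, and the lexical pass is a last resort so exact-substring matches can still surface a memory the embedding model placed too far away.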
For this project, the recommended pattern is editable install:
pip install -e ".[dev,local-embeddings]"

Why editable mode:
- the project is best used from a local working tree
- MCP servers often point directly at the current repo environment
- local memory workflows often involve iterative tuning and retesting
Cloning the repo is not enough. The repository does not include:
- model weights
- optional local embedding runtime dependencies unless you install extras
The local embedding extra currently pulls in:
- sentence-transformers>=2.7.0
- transformers>=4.51.0
- torch>=2.4
git clone https://github.com/SunflowerNocturne/MemPalace-ZH-FlexEmbed.git
cd MemPalace-ZH-FlexEmbed
conda env create -f environment.yml
conda activate mempalace-zh-flexembed
pip install -e ".[dev,local-embeddings]"
pip install "huggingface_hub[cli]>=0.23"
mkdir -p ~/.mempalace-zh/models

Model page: https://huggingface.co/Qwen/Qwen3-Embedding-8B
Example command:
hf download Qwen/Qwen3-Embedding-8B \
  --local-dir ~/.mempalace-zh/models/Qwen3-Embedding-8B

export MEMPALACE_EMBED_MODEL=$HOME/.mempalace-zh/models/Qwen3-Embedding-8B
export MEMPALACE_EMBED_DEVICE=mps
export MEMPALACE_EMBED_BATCH_SIZE=2

mkdir -p ~/.mempalace-zh/palace

Project files:
mempalace init /path/to/project
mempalace mine /path/to/project --wing "MyProject"

Conversations:

mempalace mine /path/to/chatlogs --mode convos --wing "MyChats"

mempalace --palace ~/.mempalace-zh/palace search "what you're looking for" --wing "MyChats"

Recommended for desktop clients:
- Use streamable HTTP when your client supports it.
- Keep `stdio` as a fallback for older hosts.

Streamable HTTP avoids the common problem where repeated reconnects can leave multiple heavy Python MCP processes running at once.
Core launch pattern:
/absolute/path/to/conda/env/bin/python -m mempalace.mcp_server --palace /absolute/path/to/palace
Environment variables:
MEMPALACE_EMBED_MODEL=/absolute/path/to/your/embedding-model
MEMPALACE_EMBED_DEVICE=mps
MEMPALACE_EMBED_BATCH_SIZE=2
Example stdio MCP command:
/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server --palace /Users/your_name/.mempalace-zh/palace
Recommended streamable HTTP launch:
/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server \
--transport streamable-http \
--host 127.0.0.1 \
--port 8765 \
--mount-path /mcp \
  --palace /Users/your_name/.mempalace-zh/palace-fiction

Then configure your MCP client with:
URL=http://127.0.0.1:8765/mcp
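A quick way to confirm the service is answering is to send a JSON-RPC `initialize` request to that URL. The sketch below uses only the standard library; the `protocolVersion` string follows a published MCP spec revision and may need adjusting to match your client, and the function names are hypothetical:

```python
import json
import urllib.request

# Build a minimal MCP `initialize` JSON-RPC payload. The protocolVersion
# value is an assumption; newer spec revisions use newer date strings.
def initialize_payload() -> bytes:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.0.1"},
        },
    }).encode("utf-8")

# Hypothetical smoke test: POST the payload to the streamable HTTP
# endpoint and return the HTTP status code.
def ping_mcp(url: str = "http://127.0.0.1:8765/mcp") -> int:
    req = urllib.request.Request(
        url,
        data=initialize_payload(),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or SSE.
            "Accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

If `ping_mcp()` raises a connection error, the server is not running on that host/port; if it returns a 2xx status, the endpoint is reachable and speaking HTTP.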
Field-by-field examples:
- Chatbox
  - Type: 远程 (Remote, http/sse)
  - 名称 (Name): mempalace-zh-fiction
  - URL: http://127.0.0.1:8765/mcp
  - HTTP Header: leave blank
- Codex
  - Type: Streamable HTTP
  - Name: mempalace-zh-fiction
  - URL: http://127.0.0.1:8765/mcp
  - Bearer token env var: leave blank
  - Headers: leave blank
  - Headers from environment variables: leave blank
Short-query recovery over MCP:
- Recent builds automatically recover from short memory queries such as `name origin`, `steak incident`, or other terse event labels.
- If the caller omits a strict threshold, the server now uses a looser default for short queries, retries enriched variants, and can fall back to lexical matching when semantic recall is too weak.
- In practice, MCP clients should usually omit `max_distance` for short/event-style lookups instead of forcing a strict value like `0.5`.
If you want a second always-on palace, start it on another port:
/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server \
--transport streamable-http \
--host 127.0.0.1 \
--port 8766 \
--mount-path /mcp \
  --palace /Users/your_name/.mempalace-zh/palace-personal

If you change:
- embedding model path
- batch size
- palace path
- MCP server command
restart the MCP server in your client.
mempalace/
tests/
benchmarks/
examples/
hooks/
docs/