A MemPalace fork focused on three practical upgrades for local AI memory:
- flexible self-hosted embedding models
- transcript-aware long-chat mining
- stronger Chinese and mixed-language support
- stdio MCP integration for local AI clients such as Chatbox, Claude Code, Codex-style agents, and similar tools
- service-style MCP deployment over streamable HTTP for clients that prefer one long-lived local service
Official MemPalace has already become much stronger in multilingual support. This fork is not trying to replace upstream. Its goal is to push harder on the parts that matter in real local memory workflows:
- swapping in stronger local embedding models
- handling exported chat transcripts more reliably
- improving retrieval on Chinese and mixed-language conversation data
- making MCP usage smoother in local stdio clients
- supporting service-style MCP deployment over streamable HTTP, not just stdio
- recovering short memory queries more reliably over MCP without requiring the caller to know a lot of hidden context
Flexible embedding model selection is the biggest feature. Instead of being
locked into one default embedding setup, you can point the system at your own
local model, including large self-hosted models such as Qwen3-Embedding-8B.
Example:
export MEMPALACE_EMBED_MODEL=$HOME/.mempalace-zh/models/Qwen3-Embedding-8B
export MEMPALACE_EMBED_DEVICE=mps
export MEMPALACE_EMBED_BATCH_SIZE=2

Typical device values:
- `mps` for Apple Silicon
- `cuda` for NVIDIA GPUs
- `cpu` for fallback
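As a rough sketch of how these variables could be consumed, here is a hypothetical loader (the `MEMPALACE_EMBED_*` names come from this README; the function itself and its defaults are illustrative, not the fork's actual code). It assumes sentence-transformers, which the local-embeddings extra installs:

```python
import os

# Hypothetical helper: read the MEMPALACE_EMBED_* variables with safe
# fallbacks. Default values here are placeholders, not the project's.
def load_embed_config():
    return {
        "model": os.environ.get("MEMPALACE_EMBED_MODEL", "all-MiniLM-L6-v2"),
        "device": os.environ.get("MEMPALACE_EMBED_DEVICE", "cpu"),
        "batch_size": int(os.environ.get("MEMPALACE_EMBED_BATCH_SIZE", "8")),
    }

# With the local-embeddings extra installed, the config would be applied
# roughly like this:
#   from sentence_transformers import SentenceTransformer
#   cfg = load_embed_config()
#   model = SentenceTransformer(cfg["model"], device=cfg["device"])
#   vectors = model.encode(texts, batch_size=cfg["batch_size"])
```

The commented-out lines show the standard sentence-transformers calling convention; swap in your own model path and device.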
This fork improves transcript normalization and chunking for long-form personal conversation histories, especially markdown exports from chat tools.
That matters when memory is not just about storing short facts, but recovering:
- what was said on a specific day
- what happened right before an important event
- what gift, location, or coincidence tied an event together
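The project's actual chunker is internal, but the idea can be sketched with a hypothetical splitter for markdown exports where each day starts with a date heading (the `## YYYY-MM-DD` pattern here is an assumption for illustration, not the fork's real parsing rule):

```python
import re

# Hypothetical: split a markdown chat export into per-day chunks keyed by
# headings like "## 2024-05-01", so day-level questions stay recoverable.
def split_by_day(transcript: str) -> dict[str, str]:
    chunks: dict[str, str] = {}
    day = None
    for line in transcript.splitlines():
        m = re.match(r"^##\s+(\d{4}-\d{2}-\d{2})\s*$", line)
        if m:
            day = m.group(1)
            chunks[day] = ""
        elif day is not None:
            chunks[day] += line + "\n"
    return chunks
```

Keeping day boundaries intact at chunking time is what makes "what was said on a specific day" answerable later, rather than smearing one day's events across unrelated chunks.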
This fork adds extra handling for Chinese and mixed Chinese-English material in:
- transcript normalization
- conversation mining
- general extraction
- query sanitization
- lightweight search reranking
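One building block such handling needs is simply detecting CJK text, since Chinese has no whitespace word boundaries and must be routed differently during sanitization and reranking. A minimal illustrative helper (not the fork's actual code) could look like:

```python
import re

# Hypothetical helper: match common CJK Unified Ideograph ranges so
# mixed Chinese-English queries can skip whitespace-based tokenization.
CJK_RE = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf]")

def has_cjk(text: str) -> bool:
    return bool(CJK_RE.search(text))
```

A query like "steak 事件" would test positive and can then be handled by character-level or substring matching instead of word-level matching.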
The MCP workflow is not specific to one app.
If a host can launch a local stdio MCP server, this project can usually be integrated into it.
Typical categories include:
- Chatbox
- Claude Code
- Codex-style local agent shells
- other desktop or terminal tools with stdio MCP support
This fork also fixes UTF-8 MCP output, so Chinese appears directly instead of
being escaped as \uXXXX.
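The underlying issue is standard Python `json` behavior: `json.dumps` escapes non-ASCII characters by default, and passing `ensure_ascii=False` emits the UTF-8 text directly:

```python
import json

memory = {"event": "牛排事件"}
print(json.dumps(memory))                      # {"event": "\u725b\u6392\u4e8b\u4ef6"}
print(json.dumps(memory, ensure_ascii=False))  # {"event": "牛排事件"}
```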
This fork supports both:
- `stdio` MCP, where the client launches the server process for you
- streamable HTTP, where you run one long-lived local MCP service and let multiple clients connect to it
That second mode is especially useful when a desktop client tends to leave
duplicate stdio Python processes behind after repeated reconnects.
Real MCP clients often search with very short labels because the model does not yet know the surrounding context.
This fork now improves that path by:
- using a looser default distance for short queries
- retrying enriched query variants automatically
- falling back to lexical matching when semantic recall is too weak
That makes terse lookups like `name origin`, `steak incident`, or `allergy`
less likely to come back empty when the underlying memory is actually present.
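The recovery strategy above can be sketched as follows. The function, the helper names, and the threshold values are illustrative assumptions, not the fork's actual API:

```python
# Rough sketch of short-query recovery. `semantic_search` and
# `lexical_search` stand in for the real retrieval backends; the
# distance values 1.2 / 0.8 are made-up placeholders.
def recall(query, semantic_search, lexical_search, max_distance=None):
    # 1. Short queries get a looser default distance when none is given.
    if max_distance is None:
        max_distance = 1.2 if len(query.split()) <= 3 else 0.8
    hits = semantic_search(query, max_distance)
    # 2. Retry an enriched variant of the query.
    if not hits:
        hits = semantic_search(f"memory about {query}", max_distance)
    # 3. Fall back to lexical matching when semantic recall stays empty.
    if not hits:
        hits = lexical_search(query)
    return hits
```

The key design point is ordering: cheap semantic retries run first, and the lexical pass is a last resort so exact-substring matches can still surface a memory the embedding model placed too far away.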
For this project, the recommended pattern is editable install:
pip install -e ".[dev,local-embeddings]"

Why editable mode:
- the project is best used from a local working tree
- MCP servers often point directly at the current repo environment
- local memory workflows often involve iterative tuning and retesting
Cloning the repo is not enough. The repository does not include:
- model weights
- optional local embedding runtime dependencies unless you install extras
The local embedding extra currently pulls in:
- sentence-transformers>=2.7.0
- transformers>=4.51.0
- torch>=2.4
git clone https://github.com/SunflowerNocturne/MemPalace-ZH-FlexEmbed.git
cd MemPalace-ZH-FlexEmbed
conda env create -f environment.yml
conda activate mempalace-zh-flexembed
pip install -e ".[dev,local-embeddings]"
pip install "huggingface_hub[cli]>=0.23"
mkdir -p ~/.mempalace-zh/models

Model page: https://huggingface.co/Qwen/Qwen3-Embedding-8B
Example command:
hf download Qwen/Qwen3-Embedding-8B \
  --local-dir ~/.mempalace-zh/models/Qwen3-Embedding-8B

export MEMPALACE_EMBED_MODEL=$HOME/.mempalace-zh/models/Qwen3-Embedding-8B
export MEMPALACE_EMBED_DEVICE=mps
export MEMPALACE_EMBED_BATCH_SIZE=2

mkdir -p ~/.mempalace-zh/palace

Project files:
mempalace init /path/to/project
mempalace mine /path/to/project --wing "MyProject"

Conversations:

mempalace mine /path/to/chatlogs --mode convos --wing "MyChats"

mempalace --palace ~/.mempalace-zh/palace search "what you're looking for" --wing "MyChats"

Recommended for desktop clients:
- Use streamable HTTP when your client supports it.
- Keep `stdio` as a fallback for older hosts.

Streamable HTTP avoids the common problem where repeated reconnects can leave multiple heavy Python MCP processes running at once.
Core launch pattern:
/absolute/path/to/conda/env/bin/python -m mempalace.mcp_server --palace /absolute/path/to/palace
Environment variables:
MEMPALACE_EMBED_MODEL=/absolute/path/to/your/embedding-model
MEMPALACE_EMBED_DEVICE=mps
MEMPALACE_EMBED_BATCH_SIZE=2
Example stdio MCP command:
/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server --palace /Users/your_name/.mempalace-zh/palace
Recommended streamable HTTP launch:
/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server \
--transport streamable-http \
--host 127.0.0.1 \
--port 8765 \
--mount-path /mcp \
  --palace /Users/your_name/.mempalace-zh/palace-fiction

Then configure your MCP client with:
URL=http://127.0.0.1:8765/mcp
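A quick way to confirm the service is answering is to send a JSON-RPC `initialize` request to that URL. The sketch below uses only the standard library; the `protocolVersion` string follows a published MCP spec revision and may need adjusting to match your client, and the function names are hypothetical:

```python
import json
import urllib.request

# Build a minimal MCP `initialize` JSON-RPC payload. The protocolVersion
# value is an assumption; newer spec revisions use newer date strings.
def initialize_payload() -> bytes:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.0.1"},
        },
    }).encode("utf-8")

# Hypothetical smoke test: POST the payload to the streamable HTTP
# endpoint and return the HTTP status code.
def ping_mcp(url: str = "http://127.0.0.1:8765/mcp") -> int:
    req = urllib.request.Request(
        url,
        data=initialize_payload(),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or SSE.
            "Accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

If `ping_mcp()` raises a connection error, the server is not running on that host/port; if it returns a 2xx status, the endpoint is reachable and speaking HTTP.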
Field-by-field examples:
- Chatbox
  - Type: 远程 (Remote, http/sse)
  - 名称 (Name): mempalace-zh-fiction
  - URL: http://127.0.0.1:8765/mcp
  - HTTP Header: leave blank
- Codex
  - Type: Streamable HTTP
  - Name: mempalace-zh-fiction
  - URL: http://127.0.0.1:8765/mcp
  - Bearer token env var: leave blank
  - Headers: leave blank
  - Headers from environment variables: leave blank
Short-query recovery over MCP:
- Recent builds automatically recover from short memory queries such as `name origin`, `steak incident`, or other terse event labels.
- If the caller omits a strict threshold, the server now uses a looser default for short queries, retries enriched variants, and can fall back to lexical matching when semantic recall is too weak.
- In practice, MCP clients should usually omit `max_distance` for short/event-style lookups instead of forcing a strict value like `0.5`.
If you want a second always-on palace, start it on another port:
/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server \
--transport streamable-http \
--host 127.0.0.1 \
--port 8766 \
--mount-path /mcp \
  --palace /Users/your_name/.mempalace-zh/palace-personal

If you change:
- embedding model path
- batch size
- palace path
- MCP server command
restart the MCP server in your client.
mempalace/
tests/
benchmarks/
examples/
hooks/
docs/