
MemPalace-ZH-FlexEmbed


A MemPalace fork focused on five practical upgrades for local AI memory:

  • flexible self-hosted embedding models
  • transcript-aware long-chat mining
  • stronger Chinese and mixed-language support
  • stdio MCP integration for local AI clients such as Chatbox, Claude Code, Codex-style agents, and similar tools
  • service-style MCP deployment over streamable HTTP for clients that prefer one long-lived local service

Why this fork exists

Official MemPalace has already become much stronger in multilingual support. This fork is not trying to replace upstream. Its goal is to push harder on the parts that matter in real local memory workflows:

  • swapping in stronger local embedding models
  • handling exported chat transcripts more reliably
  • improving retrieval on Chinese and mixed-language conversation data
  • making MCP usage smoother in local stdio clients
  • supporting service-style MCP deployment over streamable HTTP, not just stdio
  • recovering short memory queries more reliably over MCP without requiring the caller to know a lot of hidden context

Key features

1. Flexible self-hosted embedding models

This is the biggest feature.

Instead of being stuck with one default embedding setup, you can point the system at your own local model, including large self-hosted models such as Qwen3-Embedding-8B.

Example:

export MEMPALACE_EMBED_MODEL=$HOME/.mempalace-zh/models/Qwen3-Embedding-8B
export MEMPALACE_EMBED_DEVICE=mps
export MEMPALACE_EMBED_BATCH_SIZE=2

Typical device values:

  • mps for Apple Silicon
  • cuda for NVIDIA GPUs
  • cpu for fallback
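To make the fallback order concrete, here is a sketch of how device selection could be automated when MEMPALACE_EMBED_DEVICE is unset. The `resolve_embed_device` helper is hypothetical, not part of this repo:

```python
import os

def resolve_embed_device() -> str:
    """Pick an embedding device: explicit env var first, then by availability."""
    explicit = os.environ.get("MEMPALACE_EMBED_DEVICE")
    if explicit:
        return explicit
    try:
        import torch
        if torch.backends.mps.is_available():
            return "mps"   # Apple Silicon
        if torch.cuda.is_available():
            return "cuda"  # NVIDIA GPU
    except ImportError:
        pass               # torch not installed: embeddings will run elsewhere
    return "cpu"           # universal fallback
```

An explicitly exported MEMPALACE_EMBED_DEVICE always wins, so the auto-detection only matters when you have not set the variable.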

2. Transcript-aware long-chat mining

This fork improves transcript normalization and chunking for long-form personal conversation histories, especially markdown exports from chat tools.

That matters when memory is not just about storing short facts, but recovering:

  • what was said on a specific day
  • what happened right before an important event
  • what gift, location, or coincidence tied an event together
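As an illustration of date-keyed transcript chunking, the toy sketch below splits a markdown chat export into per-day chunks. This is not the fork's actual mining code, and the `## YYYY-MM-DD` heading convention is an assumed export format:

```python
import re

def chunk_by_date(markdown: str) -> dict[str, str]:
    """Split a transcript into {date: text} chunks on '## YYYY-MM-DD' headings."""
    chunks: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        m = re.match(r"^##\s+(\d{4}-\d{2}-\d{2})\s*$", line)
        if m:
            current = m.group(1)       # start a new per-day chunk
            chunks[current] = ""
        elif current is not None:
            chunks[current] += line + "\n"
    return chunks
```

Chunks keyed by date make "what was said on a specific day" a direct lookup instead of a fuzzy semantic search.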

3. Stronger Chinese and mixed-language support

This fork adds extra handling for Chinese and mixed Chinese-English material in:

  • transcript normalization
  • conversation mining
  • general extraction
  • query sanitization
  • lightweight search reranking
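One cheap building block for this kind of handling is a CJK-ratio check that routing logic can use to treat Chinese, English, and mixed queries differently. A minimal sketch (hypothetical helper, not taken from this repo):

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-space characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)
```

A sanitizer or reranker can branch on this ratio, for example choosing character-level matching for mostly-Chinese queries and word-level matching for mostly-English ones.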

4. MCP-ready local memory

The MCP workflow is not specific to one app.

If a host can launch a local stdio MCP server, this project can usually be integrated into it.

Typical categories include:

  • Chatbox
  • Claude Code
  • Codex-style local agent shells
  • other desktop or terminal tools with stdio MCP support

This fork also fixes UTF-8 MCP output, so Chinese appears directly instead of being escaped as \uXXXX.

5. stdio and service-style MCP transports

This fork supports both:

  • stdio MCP, where the client launches the server process for you
  • streamable HTTP, where you run one long-lived local MCP service and let multiple clients connect to it

That second mode is especially useful when a desktop client tends to leave duplicate stdio Python processes behind after repeated reconnects.

6. Better short-query recovery over MCP

Real MCP clients often search with very short labels because the model does not yet know the surrounding context.

This fork now improves that path by:

  • using a looser default distance for short queries
  • retrying enriched query variants automatically
  • falling back to lexical matching when semantic recall is too weak

That makes terse lookups like "name origin", "steak incident", or "allergy" less likely to come back empty when the underlying memory is actually present.
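The recovery path can be pictured with a small sketch. All names and threshold values below are illustrative, not the fork's actual defaults:

```python
def search_with_fallback(query, semantic_search, lexical_search,
                         strict=0.5, loose=0.8, short_len=12):
    """Illustrative short-query recovery: loosen the distance threshold for
    short queries, then fall back to lexical matching on empty semantic recall."""
    threshold = loose if len(query) < short_len else strict
    hits = semantic_search(query, max_distance=threshold)
    if not hits:
        hits = lexical_search(query)   # last resort: exact/substring matching
    return hits
```

The point of the looser threshold is that a terse label like "allergy" embeds far from the long transcript passage that actually contains the memory, so a strict distance cutoff rejects genuinely relevant hits.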

Recommended installation

For this project, the recommended pattern is an editable install:

pip install -e ".[dev,local-embeddings]"

Why editable mode:

  • the project is designed to be run from a local working tree
  • MCP servers often point directly at the current repo environment
  • local memory workflows involve iterative tuning and retesting

Cloning the repo is not enough. The repository does not include:

  • model weights
  • optional local embedding runtime dependencies unless you install extras

The local embedding extra currently pulls in:

  • sentence-transformers>=2.7.0
  • transformers>=4.51.0
  • torch>=2.4
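If you want to verify the optional runtime is importable before pointing a client at the server, a small check like the one below works. The `local_embeddings_available` helper is hypothetical, not shipped with the project:

```python
import importlib.util

def local_embeddings_available() -> bool:
    """True if the local-embeddings extra's runtime packages are importable."""
    return all(
        importlib.util.find_spec(mod) is not None
        for mod in ("sentence_transformers", "transformers", "torch")
    )
```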

Quick start

1. Clone

git clone https://github.com/SunflowerNocturne/MemPalace-ZH-FlexEmbed.git
cd MemPalace-ZH-FlexEmbed

2. Create the environment

conda env create -f environment.yml
conda activate mempalace-zh-flexembed

3. Install

pip install -e ".[dev,local-embeddings]"

4. Install Hugging Face CLI helper

pip install "huggingface_hub[cli]>=0.23"

5. Prepare the model directory

mkdir -p ~/.mempalace-zh/models

6. Download Qwen3-Embedding-8B

Model page: Qwen/Qwen3-Embedding-8B on Hugging Face

Example command:

hf download Qwen/Qwen3-Embedding-8B \
  --local-dir ~/.mempalace-zh/models/Qwen3-Embedding-8B

7. Export embedding environment variables

export MEMPALACE_EMBED_MODEL=$HOME/.mempalace-zh/models/Qwen3-Embedding-8B
export MEMPALACE_EMBED_DEVICE=mps
export MEMPALACE_EMBED_BATCH_SIZE=2

8. Create a palace

mkdir -p ~/.mempalace-zh/palace

9. Mine data

Project files:

mempalace init /path/to/project
mempalace mine /path/to/project --wing "MyProject"

Conversations:

mempalace mine /path/to/chatlogs --mode convos --wing "MyChats"

10. Search

mempalace --palace ~/.mempalace-zh/palace search "what you're looking for" --wing "MyChats"

Generic MCP setup

Recommended for desktop clients:

  • Use Streamable HTTP when your client supports it.
  • Keep stdio as a fallback for older hosts.
  • Streamable HTTP avoids the common problem where repeated reconnects can leave multiple heavy Python MCP processes running at once.

Core launch pattern:

/absolute/path/to/conda/env/bin/python -m mempalace.mcp_server --palace /absolute/path/to/palace

Environment variables:

MEMPALACE_EMBED_MODEL=/absolute/path/to/your/embedding-model
MEMPALACE_EMBED_DEVICE=mps
MEMPALACE_EMBED_BATCH_SIZE=2

Example stdio MCP command:

/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server --palace /Users/your_name/.mempalace-zh/palace

Recommended streamable HTTP launch:

/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8765 \
  --mount-path /mcp \
  --palace /Users/your_name/.mempalace-zh/palace-fiction

Then configure your MCP client with:

URL=http://127.0.0.1:8765/mcp

Field-by-field examples:

  • Chatbox remote (http/sse):
    • Name: mempalace-zh-fiction
    • URL: http://127.0.0.1:8765/mcp
    • HTTP Header: leave blank
  • Codex Streamable HTTP:
    • Name: mempalace-zh-fiction
    • URL: http://127.0.0.1:8765/mcp
    • Bearer token env var: leave blank
    • Headers: leave blank
    • Headers from environment variables: leave blank

Short-query recovery over MCP:

  • Recent builds recover terse memory queries such as "name origin", "steak incident", or other short event labels more reliably.
  • If the caller omits a strict threshold, the server now uses a looser default for short queries, retries enriched variants, and can fall back to lexical matching when semantic recall is too weak.
  • In practice, MCP clients should usually omit max_distance for short/event-style lookups instead of forcing a strict value like 0.5.

If you want a second always-on palace, start it on another port:

/Users/your_name/miniconda3/envs/mempalace-zh-flexembed/bin/python -m mempalace.mcp_server \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8766 \
  --mount-path /mcp \
  --palace /Users/your_name/.mempalace-zh/palace-personal

If you change:

  • embedding model path
  • batch size
  • palace path
  • MCP server command

restart the MCP server in your client.

Repository layout

  • mempalace/
  • tests/
  • benchmarks/
  • examples/
  • hooks/
  • docs/
