Replies: 5 comments
🎯 Revised proposal: KB Plugin v1 — Keep it dead simple

After the feedback in Discord (and general agreement in the thread): the full RAG spec above is the right end-state, but completely the wrong v1. I got carried away designing the dream instead of the thing that actually ships.

What v1 is

A structured content store that agents can read from. Nothing more.

Data model:
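To make the "dead simple" claim concrete, here is one possible shape for a v1 record (a sketch only; every field name here is an assumption, not a spec):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KbArticle:
    """One knowledge-base entry in the v1 content store (illustrative only)."""
    title: str
    body: str                      # plain markdown; no embeddings in v1
    category: str = "general"      # e.g. "conventions", "runbooks"
    always_inject: bool = False    # prepend to agent context at task start
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Example: an "always inject" article carrying company-wide context
article = KbArticle(
    title="Release process",
    body="Staging first; never deploy on Fridays.",
    category="processes",
    always_inject=True,
)
```

A flat record like this is enough for v1: no chunking, no vectors, just rows an agent can read.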
Agent access:
UI (plugin settings page):
What v1 is NOT
All of that belongs in v2+ once the basic structure proves useful.

Why this still delivers real value

Even without RAG, a KB plugin immediately solves the core problem: agents start tasks with zero company context. With v1, you can write "always inject" articles like:
That's already a massive improvement over the current blank slate, and it's trivially simple to build. Think Notion without the AI, as a plugin — flexible enough for every use case, not just dev-heavy setups.

Prior art worth studying: Hindsight for OpenClaw

@mingfang flagged Hindsight in Discord and it's relevant. How it works:
Their key insight: "memory that works automatically is qualitatively different from memory that depends on model behavior." The critique of OpenClaw's native memory — agent has to decide what to save and when to search — maps exactly to the problem we're trying to solve here. But Hindsight and the KB plugin solve different problems:
They're complementary, not competing. Hindsight = agent's episodic memory. KB plugin = company's institutional memory. That said, the Hindsight architecture (local daemon reusing existing Postgres, auto-inject before each turn, feedback loop prevention) is a solid reference for when we eventually build the v2 RAG layer. Worth studying before reinventing.

Suggested build order
RAG/pgvector + Hindsight-style auto-extraction as v2 once we see what people actually put in there.
Good writeup. The "agent starts blind" problem has two layers: missing retrieval context (what the KB solves) and ambiguous task instructions (what the prompt itself needs to fix). Even with solid RAG, if the task description is flat prose the agent still has to infer role, constraints, and output expectations. Separating those into named blocks lets the KB context slot cleanly into a context block while the task, constraints, and format stay separate and stable across runs. I built flompt (https://flompt.dev) for exactly this: a visual prompt builder that decomposes prompts into semantic XML blocks. Pairs well with a KB layer. Open-source: github.com/Nyrok/flompt
Given how badly vector databases for RAG work in the normal OpenClaw implementation, maybe a knowledge-base-based approach is better? Something that decomposes documents first, then combines both normal search and vector search. There are multiple better memory implementations for OpenClaw agents, and none of them use the pure RAG approach. Vector search would, for instance, get totally confused by a normal large code-formatting document; it would need that decomposed into small files that each get their own vectors. I'm not against an agent having access to a KB, just saying the traditional approach may be as unusable as anything. I also object to the "Always inject" flag per category (auto-prepend to agent context at task start): this would only work with multiple separate KBs, or if a KB can be marked as secondary. I.e., something that must be injected for a programmer / implementation-near agent may not need to be injected at all (or even be accessible) for a market researcher.
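The decomposition idea is simple enough to sketch: split a document at headings so each section gets its own embedding, instead of one muddled vector for the whole file (naive illustrative code, not a real chunker):

```python
def split_by_heading(text: str, max_chars: int = 1500) -> list[str]:
    """Split a markdown document into chunks at headings, so each chunk
    can be embedded separately instead of the whole file at once."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("#") and current:  # a new section starts
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    # Hard-split any oversized section so no chunk exceeds max_chars
    out: list[str] = []
    for c in chunks:
        while len(c) > max_chars:
            out.append(c[:max_chars])
            c = c[max_chars:]
        out.append(c)
    return out

doc = "# Formatting\nTabs vs spaces rules...\n# Naming\nVariable naming rules..."
```

Real chunkers also handle overlap and token counts; this only shows why a formatting guide becomes several small vectors instead of one.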
Maybe this will be useful? https://www.youtube.com/watch?v=sboNwYmH3AY&t=14s
🧠 Context & origin
Started as a simple UX ask (document uploads in issues), escalated fast in Discord. Given the plugin system is already on the roadmap — and that a KB is literally cited as the canonical plugin example — this felt worth formalizing properly.
This is a full proposal, not just a feature request. The goal is to think through the whole thing end-to-end so whoever picks it up (or we build it together) has a solid foundation.
🔥 The core problem
Right now, every agent in Paperclip starts each task essentially blind. It has access to the codebase and the task description, but nothing else. It doesn't know:
This forces you to either:
A knowledge base solves this at the platform level, once, for all agents.
📦 What the KB stores — full breakdown
1. 📄 Technical documentation
.env structure explanations (not secrets, but what each var is for)
2. 🧑💻 Code conventions & standards
3. 🏗️ Architectural decisions (ADRs)
4. 🧩 Product & domain knowledge
5. 📋 Processes & runbooks
6. 🔐 Constraints & compliance rules
7. 🕰️ Historical context & task memory
8. 🔗 External references
9. 📁 Files & attachments
🏗️ Proposed architecture
🔌 Why pgvector is the right call here
Paperclip already runs an embedded Postgres instance per deployment. Adding pgvector as an extension is a single line — no new infrastructure, no new service to manage, no external SaaS dependency.

Hybrid search query (vector + keyword):
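One way to picture the hybrid step: run the keyword ranking and the vector ranking separately, then merge them with reciprocal rank fusion. This is a rough Python sketch of the merge only; in Postgres it could be a single statement combining a pgvector distance with ts_rank, and the document ids below are made up:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists (e.g. vector search + keyword search)
    into one ordering. Documents ranked highly by either retriever rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Standard RRF: contribution decays with rank; k damps the top ranks
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers
vector_hits = ["adr-12", "env-vars", "style-guide"]
keyword_hits = ["style-guide", "adr-12", "runbook-3"]
```

"adr-12" appears near the top of both lists, so it wins the fused ranking; documents found by only one retriever still survive, which is the point of going hybrid.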
No Pinecone, no Qdrant, no Weaviate. Just the Postgres you already have.
🤖 How agents interact with the KB
Auto-injection at task start
When a task is created and an agent picks it up, the runtime automatically:
<knowledge_base> block

Mid-task explicit lookup
The agent can call the KB as a tool at any point during execution:
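As a rough sketch of what that tool surface could look like (the JSON-schema shape and the toy handler below are assumptions, not the actual runtime API):

```python
import json

# Hypothetical tool schema the runtime could expose to agents
KB_SEARCH_TOOL = {
    "name": "kb.search",
    "description": "Search the company knowledge base.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Natural-language query"},
            "category": {"type": "string", "description": "Optional category filter"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

def handle_kb_search(args: dict, store: list[dict]) -> str:
    """Toy handler: naive substring match over an in-memory store.
    The real plugin would run the hybrid Postgres query instead."""
    query = args["query"].lower()
    hits = [a for a in store if query in a["body"].lower()]
    return json.dumps([{"title": a["title"]} for a in hits[: args.get("limit", 5)]])
```

The schema is the contract; the handler is swappable, which is what lets v1 ship with dumb search and v2 upgrade to hybrid retrieval behind the same tool.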
Writing back to the KB
Agents can propose KB additions — decisions they made that should be remembered:
These go into a pending review queue visible in the UI — a human approves before they're added to the main KB. No agent writes to the KB without human confirmation.
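A toy version of that review-queue flow, just to pin down the invariant that agent writes never reach the KB directly (all names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class PendingQueue:
    """Agent proposals wait in `pending` until a human approves them."""
    pending: list[dict] = field(default_factory=list)
    approved: list[dict] = field(default_factory=list)

    def propose(self, agent: str, title: str, body: str) -> None:
        # kb.save() from an agent only ever lands in the pending queue
        self.pending.append({"agent": agent, "title": title, "body": body})

    def approve(self, index: int) -> None:
        # A human action in the UI moves the entry into the real KB
        self.approved.append(self.pending.pop(index))
```

Whatever the real storage looks like, the key property is that `approve` is the only path into the main KB.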
🎨 UI considerations
KB management page (/settings/knowledge-base)

In-issue attachment support (the original ask)
❓ Open questions
Embedding model — local (nomic-embed-text, all-MiniLM-L6 via Ollama) for full privacy, or API-based (OpenAI text-embedding-3-small, Cohere) for better quality? Should be configurable.

Context budget — how many tokens of KB context should be auto-injected per task? Should agents be able to request more?
Scoping — company-wide KB only, or also per-project KBs? Per-agent private KB?
Sync strategy — for Git repo sync, should we watch the /docs folder only, or let users configure which paths to watch?

Agent write access — should agents be able to mark a document as "outdated" or "superseded" in addition to proposing new entries?
Version history — should KB entries be versioned so you can see how a document evolved over time?
Access control — should certain KB entries be restricted to specific agents or roles?
🛣️ Suggested build order
Phase 1 — Basic storage + manual upload
Phase 2 — Hybrid search + auto-injection
Phase 3 — Sync sources
/docs folder watcher

Phase 4 — Agent write-back
kb.search() and kb.save() tools exposed to agents

@aaaaron already in 🙋 — happy to co-build. Who else?