
Commit c866cf5

docs: update project descriptions and features in multiple language READMEs

1 parent 301d75e

4 files changed: +231 −258 lines


README.md

Lines changed: 58 additions & 68 deletions
````diff
@@ -42,9 +42,11 @@
 
 ## 📌 Overview
 
-[**WeKnora**](https://weknora.weixin.qq.com) is an LLM-powered framework designed for deep document understanding and semantic retrieval, especially for handling complex, heterogeneous documents.
+[**WeKnora**](https://weknora.weixin.qq.com) is an LLM-powered intelligent knowledge management and Q&A framework built for enterprise-grade document understanding and semantic retrieval.
 
-It adopts a modular architecture that combines multimodal preprocessing, semantic vector indexing, intelligent retrieval, and large language model inference. At its core, WeKnora follows the **RAG (Retrieval-Augmented Generation)** paradigm, enabling high-quality, context-aware answers by combining relevant document chunks with model reasoning.
+WeKnora offers two Q&A modes — **Quick Q&A** and **Intelligent Reasoning**. Quick Q&A uses a **RAG (Retrieval-Augmented Generation)** pipeline to rapidly retrieve relevant chunks and generate answers, ideal for everyday knowledge queries. Intelligent Reasoning is powered by a **ReACT Agent** engine that employs a **progressive strategy** to autonomously orchestrate knowledge retrieval, MCP tools, and web search, iteratively reasoning and reflecting to arrive at a final conclusion — suited for multi-source synthesis and complex tasks. Custom agents are also supported, allowing flexible configuration of dedicated knowledge bases, tool sets, and system prompts. Choose the right mode for the task, balancing response speed with reasoning depth.
+
+The framework supports auto-syncing knowledge from Feishu (more data sources coming soon), handles 10+ document formats including PDF, Word, images, and Excel, and can serve Q&A directly through IM channels like WeCom, Feishu, Slack, and Telegram. It is compatible with major LLM providers including OpenAI, DeepSeek, Qwen (Alibaba Cloud), Zhipu, Hunyuan, Gemini, MiniMax, NVIDIA, and Ollama. Its fully modular design allows swapping LLMs, vector databases, and storage backends, with support for local and private cloud deployment ensuring complete data sovereignty.
 
 **Website:** https://weknora.weixin.qq.com
````
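As a rough illustration of the Quick Q&A flow described in the new overview (retrieve relevant chunks, then generate an answer grounded in them), here is a minimal retrieve-then-prompt sketch. The toy bag-of-words "embedding" and all function names are illustrative assumptions, not WeKnora's actual API; a real deployment would call an embedding model and a vector database.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system would call an embedding
    # model (e.g. BGE / GTE) and store vectors in pgvector / Milvus.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Step 1 of "Quick Q&A": rank chunks by similarity, keep top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, context):
    # Step 2: stuff the retrieved chunks into the LLM prompt so the
    # generated answer is grounded in knowledge-base content.
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\nQuestion: {query}"
    )

chunks = [
    "WeKnora supports PDF, Word, and Excel document formats.",
    "Deployment options include Docker and Kubernetes via Helm.",
    "Feishu knowledge can be auto-synced into a knowledge base.",
]
top = retrieve("which document formats are supported", chunks)
print(build_prompt("which document formats are supported", top))
```

The Intelligent Reasoning mode differs in that the model loops over such retrievals (plus MCP tools and web search) and reflects between steps instead of answering in a single pass.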

````diff
@@ -129,69 +131,54 @@ It adopts a modular architecture that combines multimodal preprocessing, semanti
 
 </details>
 
-## 🔒 Security Notice
-
-**Important:** Starting from v0.1.3, WeKnora includes login authentication functionality to enhance system security. For production deployments, we strongly recommend:
-
-- Deploy WeKnora services in internal/private network environments rather than public internet
-- Avoid exposing the service directly to public networks to prevent potential information leakage
-- Configure proper firewall rules and access controls for your deployment environment
-- Regularly update to the latest version for security patches and improvements
 
 ## 🏗️ Architecture
 
 ![weknora-architecture.png](./docs/images/architecture.png)
 
-WeKnora employs a modern modular design to build a complete document understanding and retrieval pipeline. The system primarily includes document parsing, vector processing, retrieval engine, and large model inference as core modules, with each component being flexibly configurable and extendable.
-
-## 🎯 Key Features
-
-- **🤖 Agent Mode**: Support for ReACT Agent mode that can use built-in tools to retrieve knowledge bases, MCP tools, and web search tools to access external services, providing comprehensive summary reports through multiple iterations and reflection
-- **🔍 Precise Understanding**: Structured content extraction from PDFs, Word documents, images and more into unified semantic views
-- **🧠 Intelligent Reasoning**: Leverages LLMs to understand document context and user intent for accurate Q&A and multi-turn conversations
-- **📚 Multi-Type Knowledge Bases**: Support for FAQ and document knowledge base types, with folder import, URL import, tag management, and online entry capabilities
-- **🔧 Flexible Extension**: All components from parsing and embedding to retrieval and generation are decoupled for easy customization
-- **⚡ Efficient Retrieval**: Hybrid retrieval strategies combining keywords, vectors, and knowledge graphs, with cross-knowledge base retrieval support
-- **🌐 Web Search**: Support for extensible web search engines with built-in DuckDuckGo search engine
-- **🔌 MCP Tool Integration**: Support for extending Agent capabilities through MCP, with built-in uvx and npx launchers, supporting multiple transport methods
-- **⚙️ Conversation Strategy**: Support for configuring Agent models, normal mode models, retrieval thresholds, and Prompts, with precise control over multi-turn conversation behavior
-- **🎯 User-Friendly**: Intuitive web interface and standardized APIs for zero technical barriers
-- **🔒 Secure & Controlled**: Support for local deployment and private cloud, ensuring complete data sovereignty
-
-## 📊 Application Scenarios
-
-| Scenario | Applications | Core Value |
-|---------|----------|----------|
-| **Enterprise Knowledge Management** | Internal document retrieval, policy Q&A, operation manual search | Improve knowledge discovery efficiency, reduce training costs |
-| **Academic Research Analysis** | Paper retrieval, research report analysis, scholarly material organization | Accelerate literature review, assist research decisions |
-| **Product Technical Support** | Product manual Q&A, technical documentation search, troubleshooting | Enhance customer service quality, reduce support burden |
-| **Legal & Compliance Review** | Contract clause retrieval, regulatory policy search, case analysis | Improve compliance efficiency, reduce legal risks |
-| **Medical Knowledge Assistance** | Medical literature retrieval, treatment guideline search, case analysis | Support clinical decisions, improve diagnosis quality |
-
-## 🧩 Feature Matrix
-
-| Module | Support | Description |
-|---------|---------|-------------|
-| Agent Mode | ✅ ReACT Agent Mode | Built-in tools for knowledge base retrieval, MCP tool calls, and web search; cross-knowledge base retrieval with multi-step iteration |
-| Knowledge Base Types | ✅ FAQ / Document | FAQ and document knowledge bases with folder import, URL import, tag management, online entry, and knowledge move |
-| Document Formats | ✅ PDF / Word / Txt / Markdown / HTML / Images (OCR + Caption) | Structured and unstructured document parsing; image text extraction via OCR; image caption generation via VLM |
-| IM Channel Integration | ✅ WeCom / Feishu / Slack / Telegram / DingTalk / Mattermost | WebSocket and Webhook modes; streaming replies; slash commands (/help, /info, /search, /stop, /clear); per-user rate limiting; Redis-based multi-instance coordination |
-| Model Management | ✅ Centralized configuration, built-in model sharing | Centralized model config with per-knowledge-base model selection; multi-tenant shared built-in model support |
-| Embedding Models | ✅ Local models (Ollama), BGE / GTE / OpenAI-compatible APIs | Customizable embedding models compatible with local deployment and cloud vector generation APIs |
-| Vector DB Integration | ✅ PostgreSQL (pgvector) / Elasticsearch / Milvus / Weaviate / Qdrant | Five vector index backends with flexible switching to match retrieval scenario requirements |
-| Object Storage | ✅ Local / MinIO / AWS S3 / Volcengine TOS | Pluggable storage adapters for file and image assets; bucket auto-creation on startup |
-| Retrieval Strategies | ✅ BM25 / Dense Retrieval / GraphRAG | Sparse/dense recall and knowledge graph-enhanced retrieval; customizable retrieve-rerank-generate pipeline |
-| LLM Integration | ✅ Qwen / DeepSeek / MiniMax / NVIDIA / Novita AI / OpenAI-compatible | Local models via Ollama or external API services; thinking/non-thinking mode switching; vLLM streaming reasoning content support |
-| Conversation Strategy | ✅ Agent model, normal model, retrieval threshold, Prompt configuration | Online Prompt editing; retrieval threshold tuning; precise multi-turn conversation behavior control |
-| Web Search | ✅ DuckDuckGo / Bing / Google (extensible) | Pluggable search engine providers; web search toggle per conversation |
-| MCP Tools | ✅ uvx / npx launchers, Stdio / HTTP Streamable / SSE | Extend agent capabilities via MCP; stable tool naming with collision protection; VLM auto-description for tool-returned images |
-| Suggested Questions | ✅ Knowledge-base-driven question suggestions | Agent surfaces context-aware suggested questions in chat interface; image knowledge auto-generates questions |
-| QA Capabilities | ✅ Context-aware, multi-turn dialogue, prompt templates | Complex semantic modeling, instruction control, chain-of-thought Q&A with configurable prompts and context windows |
-| Security | ✅ AES-256-GCM at-rest encryption, SSRF protection | API keys encrypted at rest; SSRF-safe HTTP client for remote API calls; sandbox execution for agent skills |
-| E2E Testing | ✅ Retrieval + generation visualization and metric evaluation | End-to-end test tools for evaluating recall hit rates, answer coverage, BLEU/ROUGE metrics |
-| Deployment Modes | ✅ Local / Docker / Kubernetes (Helm) | Private and offline deployment; fast development mode with hot-reload; Helm chart for Kubernetes |
-| User Interfaces | ✅ Web UI + RESTful API | Interactive web interface and standard API; Agent/normal mode switching; tool call process display |
-| Task Management | ✅ MQ async tasks, automatic database migration | MQ-based async task state; automatic schema and data migration on version upgrade |
+Fully modular pipeline from document parsing, vectorization, and retrieval to LLM inference — every component is swappable and extensible. Supports local / private cloud deployment with full data sovereignty and a zero-barrier Web UI for quick onboarding.
+
+## 🧩 Feature Overview
+
+**🤖 Intelligent Conversation**
+
+| Capability | Details |
+|------------|---------|
+| Intelligent Reasoning | ReACT progressive multi-step reasoning, autonomously orchestrating knowledge retrieval, MCP tools, and web search; custom agent support |
+| Quick Q&A | RAG-based Q&A over knowledge bases for fast and accurate answers |
+| Tool Calling | Built-in tools, MCP tools, web search |
+| Conversation Strategy | Online Prompt editing, retrieval threshold tuning, multi-turn context awareness |
+| Suggested Questions | Auto-generated question suggestions based on knowledge base content |
+
+**📚 Knowledge Management**
+
+| Capability | Details |
+|------------|---------|
+| Knowledge Base Types | FAQ / Document with folder import, URL import, tag management, and online entry |
+| Data Source Import | Auto-sync from Feishu (more data sources coming soon); incremental and full sync |
+| Document Formats | PDF / Word / Txt / Markdown / HTML / Images / CSV / Excel / PPT / JSON |
+| Retrieval Strategies | BM25 sparse / Dense retrieval / GraphRAG / parent-child chunking / multi-dimensional indexing |
+| E2E Testing | Full-pipeline visualization with recall hit rate, BLEU / ROUGE metric evaluation |
+
+**🔌 Integrations & Extensions**
+
+| Capability | Details |
+|------------|---------|
+| LLMs | OpenAI / DeepSeek / Qwen (Alibaba Cloud) / Zhipu / Hunyuan / Doubao (Volcengine) / Gemini / MiniMax / NVIDIA / Novita AI / SiliconFlow / OpenRouter / Ollama |
+| Embeddings | Ollama / BGE / GTE / OpenAI-compatible APIs |
+| Vector DBs | PostgreSQL (pgvector) / Elasticsearch / Milvus / Weaviate / Qdrant |
+| Object Storage | Local / MinIO / AWS S3 / Volcengine TOS |
+| IM Channels | WeCom / Feishu / Slack / Telegram / DingTalk / Mattermost |
+| Web Search | DuckDuckGo / Bing / Google / Tavily |
+
+**🛡️ Platform**
+
+| Capability | Details |
+|------------|---------|
+| Deployment | Local / Docker / Kubernetes (Helm) with private and offline support |
+| UI | Web UI / RESTful API / Chrome Extension |
+| Task Management | MQ async tasks, automatic database migration on version upgrade |
+| Model Management | Centralized config, per-knowledge-base model selection, multi-tenant built-in model sharing |
 
 ## 🚀 Getting Started
````
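The new feature table keeps BM25 as the sparse leg of hybrid retrieval alongside dense vectors. As a general sketch of how BM25 scoring works (illustrative only, not WeKnora's implementation), using the standard Okapi formula with parameters k1 and b:

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Score each document against the query with Okapi BM25:
    # rarer terms (higher idf) and term frequency matter, while
    # long documents are penalized via the length normalization b.
    toks = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    n = len(docs)
    df = {}
    for t in toks:
        for term in set(t):
            df[term] = df.get(term, 0) + 1
    scores = []
    for t in toks:
        s = 0.0
        for term in tokenize(query):
            f = t.count(term)
            if f == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

docs = [
    "hybrid retrieval combines bm25 keyword matching with dense vectors",
    "object storage backends include minio and s3",
]
scores = bm25_scores("bm25 keyword retrieval", docs)
```

In a hybrid setup such sparse scores are typically fused with dense-vector similarities (and optionally GraphRAG results) before reranking.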

````diff
@@ -225,21 +212,15 @@ cp .env.example .env
 
 #### ③ Start the core services
 
-Check which images need to be started in the `.env` file, then start the WeKnora core services with Docker Compose.
-
-```bash
-docker compose up -d
-```
-
-#### ③.0 Start Ollama separately (Optional)
+#### Start Ollama separately (Optional)
 
 If you configured a local Ollama model in `.env`, start the Ollama service separately:
 
 ```bash
 ollama serve > /dev/null 2>&1 &
 ```
 
-#### ③.1 Activate different combinations of features
+#### Activate different combinations of features
 
 - Minimum core services
 ```bash
````
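The "activate different combinations of features" step amounts to deciding which optional services to start based on flags in `.env`. A hypothetical helper sketch; the key names `ENABLE_OLLAMA` / `ENABLE_MINIO` and the use of Docker Compose `--profile` are assumptions for illustration, not WeKnora's actual configuration mechanism:

```python
# Sketch: derive a docker compose command from feature flags in .env.
# Flag names and compose profiles are hypothetical examples.
def compose_command(env_text: str) -> str:
    flags = dict(
        line.split("=", 1)
        for line in env_text.splitlines()
        if "=" in line and not line.startswith("#")
    )
    profiles = []
    if flags.get("ENABLE_OLLAMA") == "true":
        profiles.append("--profile ollama")
    if flags.get("ENABLE_MINIO") == "true":
        profiles.append("--profile minio")
    return " ".join(["docker compose", *profiles, "up", "-d"])

cmd = compose_command("ENABLE_OLLAMA=true\nENABLE_MINIO=false")
print(cmd)  # docker compose --profile ollama up -d
```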
````diff
@@ -423,6 +404,15 @@ test: Add retrieval engine test cases
 refactor: Restructure document parsing module
 ```
 
+## 🔒 Security Notice
+
+**Important:** Starting from v0.1.3, WeKnora includes login authentication functionality to enhance system security. For production deployments, we strongly recommend:
+
+- Deploy WeKnora services in internal/private network environments rather than public internet
+- Avoid exposing the service directly to public networks to prevent potential information leakage
+- Configure proper firewall rules and access controls for your deployment environment
+- Regularly update to the latest version for security patches and improvements
+
 ## 👥 Contributors
 
 Thanks to these excellent contributors:
````
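One concrete way to follow the relocated Security Notice's advice to keep the service off the public internet is a compose override that binds published ports to the loopback interface only. This is a sketch under assumptions: the service name `app` and port `8080` are placeholders, not WeKnora's actual compose file.

```yaml
# docker-compose.override.yml (illustrative): expose the web UI only on
# 127.0.0.1 so it is reachable just from the host or an internal proxy,
# never directly from the public internet.
services:
  app:
    ports:
      - "127.0.0.1:8080:8080"
```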
