The Co-Writer module provides AI-powered text editing and narration capabilities, including text rewriting, automatic annotation, and text-to-speech (TTS) generation.
The Co-Writer module consists of two main agents:
- EditAgent: AI-powered text editing (rewrite, shorten, expand) with optional RAG/web search context
- NarratorAgent: Converts text content into narration scripts and generates TTS audio
co_writer/
├── __init__.py
├── edit_agent.py # Text editing agent
├── narrator_agent.py # TTS narration agent
├── prompts/ # Bilingual prompts (YAML)
│ ├── zh/ # Chinese prompts
│ │ ├── edit_agent.yaml
│ │ └── narrator_agent.yaml
│ └── en/ # English prompts
│ ├── edit_agent.yaml
│ └── narrator_agent.yaml
└── README.md
EditAgent
Purpose: AI-powered text editing with optional context enhancement
Features:
- Rewrite: Rewrite text based on instructions
- Shorten: Compress text while preserving key information
- Expand: Expand text with additional details
- Context Enhancement: Optional RAG or web search for additional context
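These features correspond to the action and source parameters of process(). The sketch below shows argument checks a caller could run before invoking the agent. It assumes that kb_name is required whenever source is "rag" and is ignored otherwise. validate_edit_args is a hypothetical helper and is not part of the module.

```python
from typing import Literal, Optional, get_args

# Mirrors the action/source choices accepted by EditAgent.process.
EditAction = Literal["rewrite", "shorten", "expand"]
ContextSource = Literal["rag", "web"]


def validate_edit_args(
    action: str,
    source: Optional[str] = None,
    kb_name: Optional[str] = None,
) -> None:
    """Raise ValueError for argument combinations EditAgent is not expected to accept.

    Assumption: a knowledge base name is needed whenever source == "rag".
    """
    if action not in get_args(EditAction):
        raise ValueError(f"unknown action {action!r}; expected one of {get_args(EditAction)}")
    if source is not None and source not in get_args(ContextSource):
        raise ValueError(f"unknown source {source!r}; expected 'rag', 'web', or None")
    if source == "rag" and not kb_name:
        raise ValueError("kb_name is required when source='rag'")
```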
Methods:
async def process(
    self,
    text: str,
    instruction: str,
    action: Literal["rewrite", "shorten", "expand"] = "rewrite",
    source: Optional[Literal["rag", "web"]] = None,
    kb_name: Optional[str] = None,
) -> Dict[str, Any]: ...
Returns:
{
"edited_text": str, # Edited text content
"operation_id": str, # Unique operation ID
"tool_call_file": str # Path to tool call history (if source used)
}
Usage Example:
import asyncio

from src.agents.co_writer.edit_agent import EditAgent


async def main() -> None:
    agent = EditAgent()
    # Rewrite with RAG context
    result = await agent.process(
        text="Original text...",
        instruction="Make it more formal",
        action="rewrite",
        source="rag",
        kb_name="ai_textbook",
    )
    print(result["edited_text"])


asyncio.run(main())
NarratorAgent
Purpose: Convert text content into narration scripts and generate TTS audio
Features:
- Script Generation: Converts text into natural narration scripts
- TTS Generation: Generates audio files using the DashScope TTS API
- Voice Selection: Supports multiple voices (Cherry, Stella, Annie, Cally, Eva, Bella)
- Language Support: Supports Chinese and English
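Both voice and language are optional. It is assumed here that, when they are omitted, generate_narration falls back to the tts defaults in config/main.yaml (default_voice and default_language, described later in this section). The following is a minimal sketch of that resolution order. resolve_tts_options and BUILTIN_DEFAULTS are hypothetical names.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical fallbacks mirroring the tts block in config/main.yaml.
BUILTIN_DEFAULTS = {"default_voice": "Cherry", "default_language": "English"}


def resolve_tts_options(
    voice: Optional[str],
    language: Optional[str],
    tts_config: Mapping[str, str],
) -> Tuple[str, str]:
    """Return (voice, language), preferring explicit arguments over config defaults."""
    resolved_voice = voice or tts_config.get(
        "default_voice", BUILTIN_DEFAULTS["default_voice"]
    )
    resolved_language = language or tts_config.get(
        "default_language", BUILTIN_DEFAULTS["default_language"]
    )
    return resolved_voice, resolved_language
```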
Methods:
async def generate_narration(
    self,
    content: str,
    voice: Optional[str] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]: ...
Returns:
{
"audio_url": str, # URL to generated audio file
"audio_path": str, # Local path to audio file
"script": str, # Generated narration script
"operation_id": str # Unique operation ID
}
Usage Example:
import asyncio
from src.agents.co_writer.narrator_agent import NarratorAgent

async def main() -> None:
    agent = NarratorAgent()
    result = await agent.generate_narration(
        content="Your text content here...",
        voice="Cherry",
        language="English",
    )
    print(f"Audio URL: {result['audio_url']}")

asyncio.run(main())
All Co-Writer outputs are stored in data/user/co-writer/:
data/user/co-writer/
├── audio/ # TTS audio files
│ └── {operation_id}.mp3
├── tool_calls/ # Tool call history
│ └── {operation_id}_{tool_type}.json
└── history.json # Edit history
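Because every output is keyed by operation_id, you can locate a given operation's files with pathlib. The sketch below assumes that paths are relative to the project root and that tool_type is "rag" or "web". The helper names are hypothetical.

```python
from pathlib import Path

# Root of the Co-Writer output tree shown above (relative to the project root).
CO_WRITER_ROOT = Path("data/user/co-writer")


def audio_file_for(operation_id: str, root: Path = CO_WRITER_ROOT) -> Path:
    """TTS audio written for one narration operation."""
    return root / "audio" / f"{operation_id}.mp3"


def tool_call_file_for(operation_id: str, tool_type: str, root: Path = CO_WRITER_ROOT) -> Path:
    """Logged RAG/web tool call for one edit operation (tool_type assumed 'rag' or 'web')."""
    return root / "tool_calls" / f"{operation_id}_{tool_type}.json"


def history_file(root: Path = CO_WRITER_ROOT) -> Path:
    """Edit history shared by all operations."""
    return root / "history.json"
```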
TTS settings are configured in config/main.yaml:
tts:
  default_voice: "Cherry"  # Default voice
  default_language: "English"  # Default language
Required for TTS (in .env or DeepTutor.env):
# DashScope TTS API (for NarratorAgent)
DASHSCOPE_API_KEY=your_dashscope_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/api/v1
DASHSCOPE_TTS_MODEL=sambert-zhichu-v1  # or another TTS model
Required for EditAgent (same as other modules):
LLM_BINDING_API_KEY=your_api_key
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_MODEL=gpt-4o
The Co-Writer module is exposed via FastAPI routes in src/api/routers/co_writer.py:
- POST /api/v1/co_writer/edit - Text editing
- POST /api/v1/co_writer/automark - Automatic annotation
- POST /api/v1/co_writer/narrate - Narration script and TTS generation
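Any HTTP client can call these endpoints. The standard-library sketch below builds the JSON POST without sending it. It assumes the API is served at http://localhost:8000, which is a hypothetical address, and that the payload follows the Edit Request or Narrate Request body shown in this section. build_co_writer_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: local development server; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_co_writer_request(
    endpoint: str, payload: Dict[str, Any], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to one of the Co-Writer endpoints."""
    if endpoint not in {"edit", "automark", "narrate"}:
        raise ValueError(f"unknown Co-Writer endpoint: {endpoint!r}")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/co_writer/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a running server:
# with urllib.request.urlopen(build_co_writer_request("edit", {...})) as resp:
#     result = json.load(resp)
```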
Edit Request:
{
"text": "Original text...",
"instruction": "Make it more formal",
"action": "rewrite",
"source": "rag",
"kb_name": "ai_textbook"
}
Narrate Request:
{
"content": "Text to narrate...",
"voice": "Cherry",
"language": "English"
}
Rewrite text with specific instructions:
result = await edit_agent.process(
text="The quick brown fox jumps over the lazy dog.",
instruction="Make it more academic and formal",
action="rewrite"
)
Shorten text while preserving key information:
result = await edit_agent.process(
text="Long text content...",
instruction="Summarize to 50 words",
action="shorten"
)
Expand text with additional details:
result = await edit_agent.process(
text="Brief description...",
instruction="Add more technical details",
action="expand",
source="rag",
kb_name="ai_textbook"
)
Convert text to audio:
result = await narrator_agent.generate_narration(
content="Your educational content...",
voice="Cherry",
language="English"
)
Both agents track LLM usage statistics:
from src.agents.co_writer.edit_agent import get_stats, print_stats

# Print statistics
print_stats()
Related modules:
- API Routes: src/api/routers/co_writer.py - REST API endpoints
- RAG Tool: src/tools/rag_tool.py - Knowledge base retrieval
- Web Search: src/tools/web_search.py - Web search for context
- Core Config: src/core/core.py - Configuration management
To add a new editing action:
- Add the action type to the Literal type hint in edit_agent.py
- Add the corresponding prompts to the YAML files in prompts/en/ and prompts/zh/
- Test with various inputs
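The following sketch illustrates these steps using a hypothetical "simplify" action. In the real module the prompts live in the YAML files. Here they are inlined as dicts, and the prompt wording is illustrative only. missing_prompts is a hypothetical check that flags any action with no prompt in one of the languages, which helps when testing the new action.

```python
from typing import Dict, List, Literal, Tuple, get_args

# Step 1: extend the Literal ("simplify" is a hypothetical new action).
EditAction = Literal["rewrite", "shorten", "expand", "simplify"]

# Step 2: one prompt per action and language (illustrative stand-ins for
# prompts/en/edit_agent.yaml and prompts/zh/edit_agent.yaml).
ACTION_PROMPTS: Dict[str, Dict[str, str]] = {
    "en": {
        "rewrite": "Rewrite the text according to the instruction.",
        "shorten": "Shorten the text while preserving key information.",
        "expand": "Expand the text with additional details.",
        "simplify": "Rewrite the text in plain, simple language.",
    },
    "zh": {
        "rewrite": "根据指令改写文本。",
        "shorten": "在保留关键信息的前提下精简文本。",
        "expand": "为文本补充更多细节。",
        # "simplify" not added yet, so the check below reports it.
    },
}


def missing_prompts(prompts: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Step 3 helper: return (language, action) pairs that still lack a prompt."""
    return [
        (lang, action)
        for lang, table in prompts.items()
        for action in get_args(EditAction)
        if action not in table
    ]
```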
Prompts are stored in YAML files under the prompts/ directory, with one subdirectory per language:
# prompts/en/edit_agent.yaml
system: |
  You are an expert editor and writing assistant.
auto_mark_system: |
  You are a professional academic reading annotation assistant...
The language parameter (default: "en") determines which prompts directory (prompts/en/ or prompts/zh/) is used.
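The directory choice can be expressed as a small path resolver. The sketch below assumes that prompts/ sits next to the agent modules in src/agents/co_writer/, which is inferred from the import paths, and that unsupported languages fall back to English. prompt_file is a hypothetical helper. The resulting file could then be read with yaml.safe_load from PyYAML to obtain keys such as system and auto_mark_system.

```python
from pathlib import Path

SUPPORTED_PROMPT_LANGUAGES = ("en", "zh")
# Assumption: prompts/ lives alongside the agent modules.
PROMPTS_DIR = Path("src/agents/co_writer/prompts")


def prompt_file(agent_name: str, language: str = "en", prompts_dir: Path = PROMPTS_DIR) -> Path:
    """Pick prompts/<language>/<agent_name>.yaml, falling back to English (assumed)."""
    lang = language.lower()
    if lang not in SUPPORTED_PROMPT_LANGUAGES:
        lang = "en"
    return prompts_dir / lang / f"{agent_name}.yaml"
```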
To add support for new TTS voices:
- Check the DashScope TTS API documentation for available voices
- Update the voice validation in narrator_agent.py
- Update config/main.yaml if needed
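The following sketch shows the kind of voice validation the second step refers to, assuming the agent checks requested voices against a fixed set. The set below is the voice list given earlier for NarratorAgent. SUPPORTED_VOICES, validate_voice, and the "NewVoice" name used in the test are hypothetical.

```python
from typing import FrozenSet

# Voices listed for NarratorAgent; append new DashScope voices here.
SUPPORTED_VOICES: FrozenSet[str] = frozenset(
    {"Cherry", "Stella", "Annie", "Cally", "Eva", "Bella"}
)


def validate_voice(voice: str, supported: FrozenSet[str] = SUPPORTED_VOICES) -> str:
    """Return the voice unchanged if supported, else raise listing the valid options."""
    if voice not in supported:
        options = ", ".join(sorted(supported))
        raise ValueError(f"unsupported voice {voice!r}; choose one of: {options}")
    return voice
```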
Notes:
- TTS API Key: NarratorAgent requires a DashScope API key, which is separate from the LLM API key
- Audio Storage: Audio files are stored in data/user/co-writer/audio/ and served via /api/outputs/
- Tool Call History: All RAG/web search calls are logged in the tool_calls/ directory
- History Management: Edit history is automatically saved to history.json
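The link between a stored audio file and its served URL can be sketched as a path-to-URL mapping. The sketch assumes two things: files under data/user/ are exposed at /api/outputs/ with their relative path preserved, and audio_path is relative to the project root. Neither the exact prefix mapping nor the helper audio_url_for is confirmed by the module.

```python
from pathlib import PurePosixPath

# Assumptions: data/user/ is the served root and keeps relative paths.
DATA_ROOT = PurePosixPath("data/user")
OUTPUTS_PREFIX = "/api/outputs"


def audio_url_for(audio_path: str) -> str:
    """Map a local audio path under data/user/ to its served URL.

    Raises ValueError if the path is not under DATA_ROOT.
    """
    relative = PurePosixPath(audio_path).relative_to(DATA_ROOT)
    return f"{OUTPUTS_PREFIX}/{relative.as_posix()}"
```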
Complete example (edit with RAG context, then narrate the result):
import asyncio

from src.agents.co_writer.edit_agent import EditAgent
from src.agents.co_writer.narrator_agent import NarratorAgent


async def main() -> None:
    # 1. Edit text with RAG context
    edit_agent = EditAgent()
    edited = await edit_agent.process(
        text="Original content...",
        instruction="Make it clearer and more detailed",
        action="rewrite",
        source="rag",
        kb_name="ai_textbook",
    )

    # 2. Generate narration from the edited text
    narrator = NarratorAgent()
    audio = await narrator.generate_narration(
        content=edited["edited_text"],
        voice="Cherry",
        language="English",
    )

    print(f"Edited text: {edited['edited_text']}")
    print(f"Audio URL: {audio['audio_url']}")


asyncio.run(main())