Co-Writer Module

The Co-Writer module provides AI-powered text editing and narration capabilities, including text rewriting, automatic annotation, and text-to-speech (TTS) generation.

📋 Overview

The Co-Writer module consists of two main agents:

EditAgent: AI-powered text editing (rewrite, shorten, expand) with optional RAG/web search context
NarratorAgent: Converts text content into narration scripts and generates TTS audio

🏗️ Architecture

co_writer/
├── __init__.py
├── edit_agent.py          # Text editing agent
├── narrator_agent.py      # TTS narration agent
├── prompts/               # Bilingual prompts (YAML)
│   ├── zh/               # Chinese prompts
│   │   ├── edit_agent.yaml
│   │   └── narrator_agent.yaml
│   └── en/               # English prompts
│       ├── edit_agent.yaml
│       └── narrator_agent.yaml
└── README.md

🔧 Components

EditAgent

Purpose: AI-powered text editing with context enhancement

Features:

Rewrite: Rewrite text based on instructions
Shorten: Compress text while preserving key information
Expand: Expand text with additional details
Context Enhancement: Optional RAG or web search for additional context

Methods:

async def process(
    text: str,
    instruction: str,
    action: Literal["rewrite", "shorten", "expand"] = "rewrite",
    source: Optional[Literal["rag", "web"]] = None,
    kb_name: Optional[str] = None
) -> Dict[str, Any]

Returns:

{
    "edited_text": str,        # Edited text content
    "operation_id": str,       # Unique operation ID
    "tool_call_file": str      # Path to tool call history (if source used)
}

Usage Example:

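import asyncio

from src.agents.co_writer.edit_agent import EditAgent


async def main() -> None:
    agent = EditAgent()

    # Rewrite with RAG context from the "ai_textbook" knowledge base
    result = await agent.process(
        text="Original text...",
        instruction="Make it more formal",
        action="rewrite",
        source="rag",
        kb_name="ai_textbook",
    )

    print(result["edited_text"])


# `await` is only valid inside a coroutine, so run the example through asyncio
asyncio.run(main())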

NarratorAgent

Purpose: Convert text content into narration scripts and generate TTS audio

Features:

Script Generation: Converts text into natural narration scripts
TTS Generation: Generates audio files using DashScope TTS API
Voice Selection: Supports multiple voices (Cherry, Stella, Annie, Cally, Eva, Bella)
Language Support: Supports Chinese and English

Methods:

async def generate_narration(
    content: str,
    voice: Optional[str] = None,
    language: Optional[str] = None
) -> Dict[str, Any]

Returns:

{
    "audio_url": str,          # URL to generated audio file
    "audio_path": str,         # Local path to audio file
    "script": str,             # Generated narration script
    "operation_id": str        # Unique operation ID
}

Usage Example:

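import asyncio

from src.agents.co_writer.narrator_agent import NarratorAgent


async def main() -> None:
    agent = NarratorAgent()

    # Generate a narration script and the matching TTS audio
    result = await agent.generate_narration(
        content="Your text content here...",
        voice="Cherry",
        language="English",
    )

    print(f"Audio URL: {result['audio_url']}")


# `await` is only valid inside a coroutine, so run the example through asyncio
asyncio.run(main())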

📁 Data Storage

All Co-Writer outputs are stored in data/user/co-writer/:

data/user/co-writer/
├── audio/                    # TTS audio files
│   └── {operation_id}.mp3
├── tool_calls/               # Tool call history
│   └── {operation_id}_{tool_type}.json
└── history.json              # Edit history
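
To make the naming scheme above concrete, here is a minimal sketch that builds the three kinds of paths with pathlib. The helper names are hypothetical and not part of the module, and the example tool_type value "rag" is an assumption based on the source options of EditAgent.

```python
from pathlib import Path

# Root directory taken from the layout above.
CO_WRITER_ROOT = Path("data/user/co-writer")


def audio_path(operation_id: str, root: Path = CO_WRITER_ROOT) -> Path:
    """Hypothetical helper: location of the TTS audio for one operation."""
    return root / "audio" / f"{operation_id}.mp3"


def tool_call_path(operation_id: str, tool_type: str, root: Path = CO_WRITER_ROOT) -> Path:
    """Hypothetical helper: location of the tool call log for one operation."""
    return root / "tool_calls" / f"{operation_id}_{tool_type}.json"


def history_path(root: Path = CO_WRITER_ROOT) -> Path:
    """Hypothetical helper: location of the shared edit history file."""
    return root / "history.json"
```

For example, audio_path("op_123") points at audio/op_123.mp3 under the Co-Writer root, and tool_call_path("op_123", "rag") points at tool_calls/op_123_rag.json.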

⚙️ Configuration

TTS Configuration

TTS settings are configured in config/main.yaml:

tts:
  default_voice: "Cherry"      # Default voice
  default_language: "English"   # Default language
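
The sketch below shows one way these defaults could interact with the optional voice and language arguments of generate_narration. It assumes that an argument left as None falls back to the configured default; the README does not spell this out, and resolve_tts_options is a hypothetical helper, not a function of the module.

```python
from typing import Optional, Tuple

# Mirrors the `tts` section of config/main.yaml shown above.
TTS_CONFIG = {"default_voice": "Cherry", "default_language": "English"}


def resolve_tts_options(
    voice: Optional[str] = None,
    language: Optional[str] = None,
    config: dict = TTS_CONFIG,
) -> Tuple[str, str]:
    """Hypothetical helper: explicit arguments win, otherwise use config defaults."""
    resolved_voice = voice if voice is not None else config["default_voice"]
    resolved_language = language if language is not None else config["default_language"]
    return resolved_voice, resolved_language
```

Under this assumption, calling the helper with only voice="Stella" keeps the configured language and overrides just the voice.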

Environment Variables

Required for TTS (in .env or DeepTutor.env):

# DashScope TTS API (for NarratorAgent)
DASHSCOPE_API_KEY=your_dashscope_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/api/v1
DASHSCOPE_TTS_MODEL=sambert-zhichu-v1  # or other TTS model
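
A missing API key is easier to diagnose at startup than in the middle of a TTS call. The following sketch reads the three variables above and fails early when the key is absent. load_dashscope_settings is a hypothetical helper, and falling back to the example base URL and model when those variables are unset is an assumption, not documented behavior.

```python
import os
from typing import Dict, Mapping


def load_dashscope_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: collect the DashScope settings or fail early."""
    api_key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; NarratorAgent cannot call the TTS API"
        )
    return {
        "api_key": api_key,
        # Assumption: reuse the example values from this README as fallbacks.
        "base_url": env.get("DASHSCOPE_BASE_URL", "https://dashscope.aliyuncs.com/api/v1"),
        "model": env.get("DASHSCOPE_TTS_MODEL", "sambert-zhichu-v1"),
    }
```

Passing a plain dict instead of os.environ makes the helper easy to exercise in tests without touching the real environment.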

LLM Configuration

Required for EditAgent (same as other modules):

LLM_BINDING_API_KEY=your_api_key
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_MODEL=gpt-4o

🔌 API Integration

The Co-Writer module is exposed via FastAPI routes in src/api/routers/co_writer.py:

Endpoints

POST /api/v1/co_writer/edit - Text editing
POST /api/v1/co_writer/automark - Automatic annotation
POST /api/v1/co_writer/narrate - Generate narration and TTS

Request Format

Edit Request:

{
  "text": "Original text...",
  "instruction": "Make it more formal",
  "action": "rewrite",
  "source": "rag",
  "kb_name": "ai_textbook"
}

Narrate Request:

{
  "content": "Text to narrate...",
  "voice": "Cherry",
  "language": "English"
}
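
As a rough client-side illustration, the sketch below wraps the two request bodies above in JSON POST requests using only the standard library. The server address is an assumption (this README does not state a host or port), build_post is a hypothetical helper, and the requests are built but not sent.

```python
import json
import urllib.request

# Assumption: the API server address is not given in this README.
BASE_URL = "http://localhost:8000"


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a Co-Writer endpoint."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


edit_request = build_post(
    "/api/v1/co_writer/edit",
    {
        "text": "Original text...",
        "instruction": "Make it more formal",
        "action": "rewrite",
        "source": "rag",
        "kb_name": "ai_textbook",
    },
)

narrate_request = build_post(
    "/api/v1/co_writer/narrate",
    {"content": "Text to narrate...", "voice": "Cherry", "language": "English"},
)
```

Sending is left to the caller, for example with urllib.request.urlopen(edit_request) once the server is running.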

🎯 Use Cases

1. Text Rewriting

Rewrite text with specific instructions:

result = await edit_agent.process(
    text="The quick brown fox jumps over the lazy dog.",
    instruction="Make it more academic and formal",
    action="rewrite"
)

2. Text Compression

Shorten text while preserving key information:

result = await edit_agent.process(
    text="Long text content...",
    instruction="Summarize to 50 words",
    action="shorten"
)

3. Text Expansion

Expand text with additional details:

result = await edit_agent.process(
    text="Brief description...",
    instruction="Add more technical details",
    action="expand",
    source="rag",
    kb_name="ai_textbook"
)

4. Audio Narration

Convert text to audio:

result = await narrator_agent.generate_narration(
    content="Your educational content...",
    voice="Cherry",
    language="English"
)

📊 Statistics Tracking

Both agents track LLM usage statistics:

from src.agents.co_writer.edit_agent import get_stats, print_stats

# Print statistics
print_stats()

🔗 Related Modules

API Routes: src/api/routers/co_writer.py - REST API endpoints
RAG Tool: src/tools/rag_tool.py - Knowledge base retrieval
Web Search: src/tools/web_search.py - Web search for context
Core Config: src/core/core.py - Configuration management

🛠️ Development

Adding New Actions

To add a new editing action:

Add the action type to the Literal type hint in edit_agent.py
Add the corresponding prompts to YAML files in prompts/en/ and prompts/zh/
Test with various inputs

Prompts Configuration

Prompts are stored as YAML files under the prompts/ directory, with one subdirectory per language (en/ and zh/):

# prompts/en/edit_agent.yaml
system: |
  You are an expert editor and writing assistant.

auto_mark_system: |
  You are a professional academic reading annotation assistant...

The language parameter (default: "en") determines which prompts directory to use.
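
The lookup described here can be sketched as a small path resolver. prompt_file is a hypothetical helper; the directory and file names come from the Architecture tree, while raising on an unsupported language is an assumption (the module might fall back to a default instead).

```python
from pathlib import Path

PROMPTS_ROOT = Path("prompts")
SUPPORTED_LANGUAGES = ("en", "zh")


def prompt_file(agent_name: str, language: str = "en", root: Path = PROMPTS_ROOT) -> Path:
    """Hypothetical helper: path of the YAML prompt file for an agent and language."""
    if language not in SUPPORTED_LANGUAGES:
        # Assumption: unknown languages are an error rather than a silent fallback.
        raise ValueError(
            f"Unsupported language {language!r}; expected one of {SUPPORTED_LANGUAGES}"
        )
    return root / language / f"{agent_name}.yaml"
```

For instance, prompt_file("narrator_agent", "zh") resolves to prompts/zh/narrator_agent.yaml.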

Adding New Voices

To add support for new TTS voices:

Check DashScope TTS API documentation for available voices
Update voice validation in narrator_agent.py
Update config/main.yaml if needed
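
The voice validation mentioned in the second step might look roughly like the sketch below. The voice names are the ones listed under NarratorAgent features; validate_voice is a hypothetical name, and exact, case-sensitive matching is an assumption about how narrator_agent.py compares voices.

```python
# Voices listed in the NarratorAgent feature list.
SUPPORTED_VOICES = frozenset({"Cherry", "Stella", "Annie", "Cally", "Eva", "Bella"})


def validate_voice(voice: str) -> str:
    """Hypothetical helper: accept only voices from the supported set."""
    if voice not in SUPPORTED_VOICES:
        raise ValueError(
            f"Unsupported voice {voice!r}; expected one of {sorted(SUPPORTED_VOICES)}"
        )
    return voice
```

Adding a voice then means extending SUPPORTED_VOICES after confirming the name against the DashScope documentation.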

⚠️ Notes

TTS API Key: NarratorAgent requires a DashScope API key, which is separate from the LLM API key
Audio Storage: Audio files are stored in data/user/co-writer/audio/ and served via /api/outputs/
Tool Call History: All RAG and web search calls are logged in the tool_calls/ directory
History Management: Edit history is automatically saved to history.json

📝 Example Workflow

import asyncio

from src.agents.co_writer.edit_agent import EditAgent
from src.agents.co_writer.narrator_agent import NarratorAgent


async def main() -> None:
    # 1. Edit text with RAG context
    edit_agent = EditAgent()
    edited = await edit_agent.process(
        text="Original content...",
        instruction="Make it clearer and more detailed",
        action="rewrite",
        source="rag",
        kb_name="ai_textbook",
    )

    # 2. Generate narration from the edited text
    narrator = NarratorAgent()
    audio = await narrator.generate_narration(
        content=edited["edited_text"],
        voice="Cherry",
        language="English",
    )

    print(f"Edited text: {edited['edited_text']}")
    print(f"Audio URL: {audio['audio_url']}")


# `await` is only valid inside a coroutine, so run the workflow through asyncio
asyncio.run(main())
