Thoth Usage Guide

Complete guide to using Thoth Research Assistant for daily research workflows.

Quick Reference
Using the Agent
Document Processing
Paper Discovery
Research Questions
Citation Management
RAG & Semantic Search
Skills System
Settings Management
Best Practices
Advanced Usage
Troubleshooting
Next Steps

Quick Reference

Daily Commands

# Start services
thoth start              # or: make dev

# Check status
thoth status            # or: make health

# View logs
thoth logs              # or: make dev-logs

# Stop services
thoth stop              # or: make dev-stop

CLI Subcommands

Command	Purpose
`thoth setup`	Interactive setup wizard
`thoth server start`	Start API server
`thoth mcp start`	Start MCP server
`thoth letta auth`	Manage Letta authentication
`thoth discovery ...`	Paper discovery operations
`thoth pdf ...`	PDF processing
`thoth research ...`	Research operations
`thoth rag ...`	RAG operations
`thoth notes ...`	Note generation
`thoth schema ...`	Schema management
`thoth service ...`	Service management
`thoth system ...`	System utilities
`thoth database ...`	Database operations
`thoth performance ...`	Performance analysis

Using the Agent

Via Obsidian Plugin (Primary Method)

Open chat:
- Click Thoth icon in left sidebar
- Or: Command Palette (Ctrl/Cmd+P) → "Open Thoth Chat"

Start conversation:

You: "Find papers on transformer attention mechanisms"
Agent: [Loads paper-discovery skill, searches sources, returns results]

Multi-session support:
- Click "New Chat" for new session
- Switch between sessions with tabs
- All conversations persist across restarts

Via Letta REST API

# List agents
curl http://localhost:8283/v1/agents

# Send message
curl -X POST http://localhost:8283/v1/agents/{agent_id}/messages \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Find papers on deep learning"}]}'

Agent Capabilities

Research Orchestrator (thoth_main_orchestrator):

User-facing coordinator
Loads skills dynamically based on task
Delegates complex analysis to Analyst
Memory: persona, human, research_context, loaded_skills, planning, scratchpad

Research Analyst (thoth_research_analyst):

Deep analysis specialist
Literature reviews and synthesis
Paper comparisons and evaluations
Citation network exploration

Common Agent Workflows

Discovery:

You: "Find recent papers on reinforcement learning"
Agent: Loads paper-discovery skill → searches ArXiv, Semantic Scholar →
       returns ranked results with relevance scores

Q&A:

You: "What are the main approaches to attention mechanisms in transformers?"
Agent: Loads knowledge-base-qa skill → searches processed papers →
       answers with citations from your collection

Analysis:

You: "Compare these two papers on attention mechanisms"
Agent: Delegates to research_analyst → loads both papers →
       compares methodology, results, conclusions → provides structured comparison

Document Processing

Automatic Processing (Recommended)

Setup:

# PDF Monitor runs automatically in dev mode
make dev

Usage:

Drop PDF into vault/thoth/papers/pdfs/
Monitor processes it automatically
Note appears in vault/thoth/notes/
Takes 30-60 seconds per paper

What Gets Extracted:

Title, authors, abstract
Full text with sections
Citations (with 6-stage enrichment)
Topic tags (AI-generated)
Metadata (DOI, journal, year)

Manual Processing

# Process single PDF
python -m thoth pdf process paper.pdf

# Process with options
python -m thoth pdf process paper.pdf \
    --output-dir ./notes \
    --generate-tags \
    --build-index

# Batch processing
python -m thoth pdf process ./papers/ --parallel --max-workers 4
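
As a rough illustration of what --parallel with --max-workers means, the sketch below fans a list of PDFs out over a bounded worker pool. It is not Thoth's implementation: process_pdf is a hypothetical stand-in for the real per-paper pipeline, and the choice of a thread pool is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_pdf(path: Path) -> str:
    """Hypothetical stand-in for the real per-paper pipeline."""
    return f"processed {path.name}"


def process_batch(paths: list[Path], max_workers: int = 4) -> dict[str, str]:
    """Run process_pdf over all paths with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with paths.
        results = pool.map(process_pdf, paths)
        return {p.name: r for p, r in zip(paths, results)}


batch = process_batch([Path("a.pdf"), Path("b.pdf"), Path("c.pdf")], max_workers=2)
print(batch)
```

A higher worker count only helps while the bottleneck is waiting on I/O or remote LLM calls; past that point it mostly adds contention.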

Custom Extraction

Edit vault/thoth/_thoth/analysis_schema.json to control what gets extracted:

{
  "presets": {
    "custom": {
      "fields": {
        "title": true,
        "abstract": true,
        "methodology": true,
        "results": true,
        "limitations": true,
        "future_work": true,
        "custom_field": {
          "extract": true,
          "prompt": "Extract the computational complexity analysis"
        }
      }
    }
  }
}
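
Before switching presets it can help to check which fields a preset actually turns on. The sketch below reads a schema shaped like the example above and lists the enabled fields with any custom prompt. The helper name enabled_fields is hypothetical, and treating false as "skip this field" is an assumption about how the schema is interpreted.

```python
import json
from typing import Optional

# Shortened copy of the schema above, with one field switched off for illustration.
SCHEMA_TEXT = """
{
  "presets": {
    "custom": {
      "fields": {
        "title": true,
        "methodology": true,
        "limitations": false,
        "custom_field": {
          "extract": true,
          "prompt": "Extract the computational complexity analysis"
        }
      }
    }
  }
}
"""


def enabled_fields(schema: dict, preset: str) -> dict[str, Optional[str]]:
    """Map each enabled field to its custom prompt, or None if it has none."""
    fields = schema["presets"][preset]["fields"]
    enabled: dict[str, Optional[str]] = {}
    for name, spec in fields.items():
        if isinstance(spec, bool):
            if spec:  # plain true/false toggle
                enabled[name] = None
        elif spec.get("extract", False):  # object form with its own prompt
            enabled[name] = spec.get("prompt")
    return enabled


schema = json.loads(SCHEMA_TEXT)
fields = enabled_fields(schema, "custom")
print(fields)
```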

Then use the preset:

# Via settings
python -m thoth schema set-preset custom

# Or in settings.json
{"processing": {"schema_preset": "custom"}}

Paper Discovery

Using Discovery Sources

7 built-in sources:

ArXiv (RSS + API)
Semantic Scholar
NeurIPS
ICML
OpenReview (ICLR, etc.)
ACL Anthology (NLP conferences)
Papers with Code

Via Agent (Easiest)

You: "Find papers on deep learning published in 2024"
Agent: [Loads paper-discovery skill, queries sources, returns results]

Via CLI

# List available sources
python -m thoth discovery list-sources

# Search specific source
python -m thoth discovery search "transformers" --source arxiv --max-results 50

# Search all sources
python -m thoth discovery search "neural networks" --max-results 100

Creating Custom Sources

Automated scraper builder (LLM-powered):

You: "I want to add papers from https://example.com/papers"
Agent: [Loads custom-source-setup skill]
       [Analyzes page structure with LLM + Playwright]
       [Proposes CSS selectors]
       [Tests selectors and shows sample articles]
       [Iteratively refines based on your feedback]
       [Saves confirmed workflow]

How it works:

Playwright loads URL and extracts simplified DOM
LLM analyzes structure and proposes CSS selectors
Selectors tested on live page → sample articles extracted
You review samples and provide feedback
LLM refines selectors based on feedback
Repeat until accurate
Workflow saved for future use
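
The steps above form a propose, test, review loop. The sketch below shows only that control flow; the LLM, Playwright, and human review steps are passed in as plain callables, and all names (refine_selectors, the fake_* stubs, the selectors themselves) are hypothetical rather than taken from Thoth.

```python
from typing import Callable


def refine_selectors(
    propose: Callable[[str], str],
    extract: Callable[[str], list[str]],
    review: Callable[[list[str]], str],
    max_rounds: int = 5,
) -> str:
    """Loop propose -> test -> review until the reviewer has no feedback."""
    feedback = ""
    for _ in range(max_rounds):
        selector = propose(feedback)   # LLM step (stubbed below)
        samples = extract(selector)    # live-page test (stubbed below)
        feedback = review(samples)     # human review; "" means accepted
        if not feedback:
            return selector
    raise RuntimeError("selectors not confirmed within max_rounds")


# Stubs that simulate one round of rejection followed by acceptance.
def fake_propose(feedback: str) -> str:
    return "article h2 a" if feedback else "a"


def fake_extract(selector: str) -> list[str]:
    pages = {
        "a": ["Login", "Paper One"],
        "article h2 a": ["Paper One", "Paper Two"],
    }
    return pages[selector]


def fake_review(samples: list[str]) -> str:
    return "navigation links included" if "Login" in samples else ""


confirmed = refine_selectors(fake_propose, fake_extract, fake_review)
print(confirmed)
```

Capping the number of rounds keeps a page that cannot be scraped cleanly from looping forever.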

Research Questions

Research questions enable persistent, structured research with automated discovery.

Creating Research Questions

Via Agent:

You: "Create a research question about attention mechanisms in transformers"
Agent: [Loads research-query-management skill]
       [Creates question with discovery settings]
       [Sets up automated discovery]

Via CLI:

# Create question
python -m thoth research create \
    --question "How do attention mechanisms work in transformers?" \
    --sources arxiv semantic_scholar \
    --schedule "0 9 * * *"  # Daily at 9 AM

# List questions
python -m thoth research list

# Run discovery for question
python -m thoth research discover <question_id>
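
The --schedule value is a cron expression. Assuming the standard five-field layout (minute, hour, day of month, month, day of week), this small sketch splits an expression into named fields so it is easier to sanity-check before saving. Whether Thoth accepts cron extensions such as a seconds field is not covered here.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expr: str) -> dict[str, str]:
    """Split a five-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


daily = split_cron("0 9 * * *")
print(daily)  # minute 0 of hour 9, every day
```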

Research Question Features

Automated discovery: Scheduled searches for new papers
Source configuration: Which sources to query
Relevance filtering: Automatic filtering based on your collection
Progress tracking: Track papers found, processed, relevant
Synthesis: Generate literature reviews from findings

Citation Management

Citation Extraction & Enrichment

Automatic (during PDF processing):

Citations extracted from bibliography section
6-stage enrichment chain automatically runs
DOIs, metadata, and citation counts added
~90% enrichment success rate

Manual enrichment:

# Via agent
You: "Enrich citations in paper_xyz"
Agent: [Runs citation enrichment service]

# Via CLI
python -m thoth citations enrich paper.pdf

Citation Resolution Chain

Crossref: DOI lookup, metadata
OpenAlex: Citation counts, authors
ArXiv: ArXiv paper metadata
Fuzzy Matcher: Handle malformed citations
Validator: Confidence scoring
Decision Engine: Best match selection
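
One way to picture the chain is as a list of lookups whose candidates are scored and then reduced to a single best match. The sketch below illustrates that pattern only: the Candidate type, the stub resolvers, their confidence values, and the 0.8 threshold are all hypothetical, and none of it reflects Thoth's actual internals or the real Crossref/OpenAlex/ArXiv APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    source: str
    doi: Optional[str]
    confidence: float  # 0.0-1.0, as a validation step might assign


Resolver = Callable[[str], Optional[Candidate]]


def resolve(
    raw_citation: str, resolvers: list[Resolver], threshold: float = 0.8
) -> Optional[Candidate]:
    """Query each resolver in order, then pick the most confident viable match."""
    candidates = []
    for resolver in resolvers:
        candidate = resolver(raw_citation)
        if candidate is not None:
            candidates.append(candidate)
    viable = [c for c in candidates if c.confidence >= threshold]
    return max(viable, key=lambda c: c.confidence, default=None)


# Stub resolvers standing in for the external lookups.
def crossref_stub(raw: str) -> Optional[Candidate]:
    return None  # pretend no DOI was found


def openalex_stub(raw: str) -> Optional[Candidate]:
    return Candidate(source="openalex", doi="10.0000/example", confidence=0.85)


def arxiv_stub(raw: str) -> Optional[Candidate]:
    return Candidate(source="arxiv", doi=None, confidence=0.60)


best = resolve("Smith et al. 2020, Example Title", [crossref_stub, openalex_stub, arxiv_stub])
print(best)
```

Returning None when nothing clears the threshold leaves the citation unenriched instead of attaching a doubtful match.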

Citation Formats

Via Agent:

You: "Format citations from paper_xyz in APA style"
Agent: [Uses format_citations tool with APA formatter]

Via MCP Tool (from code/API):

# Format citations
# (mcp_client is assumed to be an already-connected MCP client object)
result = mcp_client.call_tool(
    "format_citations",
    {
        "article_id": "abc123",
        "style": "apa"  # or: bibtex, mla, chicago
    }
)

# Export bibliography
result = mcp_client.call_tool(
    "export_bibliography",
    {
        "article_ids": ["abc123", "def456"],
        "style": "bibtex",
        "output_file": "references.bib"
    }
)

RAG & Semantic Search

Building the Index

# Build index from all processed papers
python -m thoth rag build

# Rebuild index (if papers changed)
python -m thoth rag rebuild

# Add specific paper
python -m thoth rag add paper.pdf

Searching

Via Agent (recommended):

You: "What papers discuss attention mechanisms?"
Agent: [Loads knowledge-base-qa skill]
       [Searches vector index]
       [Returns relevant papers with citations]

Via CLI:

# Semantic search
python -m thoth rag search "attention mechanisms in transformers"

# With filters
python -m thoth rag search "neural networks" \
    --top-k 10 \
    --min-score 0.7 \
    --year 2024
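
To make --top-k and --min-score concrete, here is a toy version of score-filtered ranking using cosine similarity over hand-written vectors. It is a sketch of the general technique, not Thoth's vector store: the three-dimensional "embeddings" and paper labels are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float],
    index: dict[str, list[float]],
    top_k: int = 10,
    min_score: float = 0.7,
) -> list[tuple[str, float]]:
    """Drop results below min_score, then return the top_k best by score."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in index.items()]
    kept = [(label, score) for label, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


toy_index = {
    "attention-paper": [0.9, 0.1, 0.0],
    "sparse-attention-paper": [0.7, 0.7, 0.0],
    "protein-folding-paper": [0.0, 0.1, 0.9],
}
hits = search([1.0, 0.0, 0.0], toy_index, top_k=2, min_score=0.7)
print([label for label, _ in hits])
```

Raising min_score trades recall for precision; top_k only caps how many of the surviving results are returned.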

Agentic Retrieval

For complex research questions that span multiple papers or need deeper reasoning, Thoth has an agentic retrieval mode. It runs a multi-step pipeline that expands your query, grades documents for relevance, and verifies the answer is grounded in actual sources.

You don't need to think about which mode to use—the agent picks the right tool based on your question. Simple lookups go through standard RAG. Multi-hop synthesis questions go through the agentic pipeline.

Enable it in settings (disabled by default since it uses more LLM calls):

You: "Enable agentic retrieval"
Agent: [Updates agenticRetrieval.enabled to true in settings]

Or edit settings.json directly:

{
  "rag": {
    "agenticRetrieval": {
      "enabled": true
    }
  }
}

What it looks like in practice:

You: "How has the understanding of scaling laws in LLMs evolved over the past two years?"

Agent: [Uses agentic_research_question tool]
       UI shows: "Analyzing your question..."
       UI shows: "Expanding search terms..."
       UI shows: "Searching your knowledge base..."
       UI shows: "Evaluating relevance..."
       UI shows: "Ranking best results..."
       UI shows: "Composing answer..."
       UI shows: "Verifying accuracy..."

Agent: "Based on 8 papers in your collection, the understanding of scaling laws has
        shifted in several ways: [detailed synthesis with citations]"

The Obsidian UI shows what step the pipeline is on in real time, so you're not staring at a blank screen wondering if something broke.
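
The stages shown in the UI can be read as a simple pipeline. The sketch below wires hypothetical stage functions together in that order and reports progress through a callback. It illustrates the flow only: every callable, the 0.5 grading cutoff, and the toy corpus are assumptions, not Thoth's actual pipeline.

```python
from typing import Callable


def agentic_answer(
    question: str,
    expand: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, str], float],
    compose: Callable[[str, list[str]], str],
    is_grounded: Callable[[str, list[str]], bool],
    on_step: Callable[[str], None] = print,
    min_grade: float = 0.5,
) -> str:
    on_step("Expanding search terms...")
    queries = expand(question)

    on_step("Searching your knowledge base...")
    docs: list[str] = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in docs:  # de-duplicate across expanded queries
                docs.append(doc)

    on_step("Evaluating relevance...")
    graded = [(doc, grade(question, doc)) for doc in docs]
    relevant = [(doc, score) for doc, score in graded if score >= min_grade]

    on_step("Ranking best results...")
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    ranked = [doc for doc, _ in relevant]

    on_step("Composing answer...")
    answer = compose(question, ranked)

    on_step("Verifying accuracy...")
    if not is_grounded(answer, ranked):
        raise ValueError("answer is not supported by the retrieved sources")
    return answer


# Toy stand-ins for the LLM and retrieval steps.
corpus = {
    "scaling laws": ["doc-1", "doc-2"],
    "compute-optimal training": ["doc-2", "doc-3"],
}
grades = {"doc-1": 0.6, "doc-2": 0.9, "doc-3": 0.2}
steps: list[str] = []

answer = agentic_answer(
    "How have scaling laws evolved?",
    expand=lambda q: list(corpus),
    retrieve=lambda q: corpus.get(q, []),
    grade=lambda q, d: grades[d],
    compose=lambda q, docs: "Synthesis citing " + ", ".join(docs),
    is_grounded=lambda ans, docs: all(d in ans for d in docs),
    on_step=steps.append,
)
print(answer)
```

Each stage is an extra LLM or retrieval call in a real system, which is consistent with the mode being off by default.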

When to use it:

Comparison questions ("how does X compare to Y?")
Synthesis across many papers ("what are the main approaches to Z?")
Questions where you know the answer requires multiple sources
Anything where a quick single-pass search gives shallow results

When not to bother:

"What dataset did paper X use?" — standard RAG handles this fine
Quick factual lookups with obvious keywords
Questions about a single specific paper

See RAG System for the full technical details.

Custom Indexes

Create domain-specific search indexes:

You: "Create a custom index for reinforcement learning papers"
Agent: [Loads rag-administration skill]
       [Uses create_custom_index tool]
       [Filters papers by topic]
       [Builds specialized index]

Skills System

Discovering Skills

Via Agent:

You: "What skills do you have?"
Agent: [Calls list_skills tool, shows available skills with descriptions]

Via MCP:

# List all skills
curl -X POST http://localhost:8082/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "list_skills"}, "id": 1}'

Loading Skills

Automatic (agent loads as needed):

You: "Find papers on deep learning"
Agent: "Loading paper-discovery skill..."
       [Skill attaches required tools dynamically]
       [Agent uses tools to search]

Manual:

You: "Load the deep-research skill"
Agent: [Loads skill, attaches tools, shows capabilities]

Creating Custom Skills

Create skill directory: vault/thoth/_thoth/skills/my-skill/

Create SKILL.md:

---
name: My Custom Skill
description: What this skill does
tools:
  - tool_name_1
  - tool_name_2
---

# Skill guidance

When this skill is loaded, follow these steps:
1. First do X
2. Then do Y
3. Finally do Z

Agent auto-discovers a new skill on next restart
Load with: load_skill(skill_ids=["my-skill"])

Hot-reload: Edits to an existing skill's files reload automatically (no restart needed)
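
A quick way to catch typos in a new SKILL.md is to check that the front matter parses into the fields you expect. The sketch below is a deliberately minimal parser for the flat key/value and simple list layout shown above. It is not a YAML parser and not how Thoth reads skills; the function name parse_skill is hypothetical.

```python
SKILL_MD = """---
name: My Custom Skill
description: What this skill does
tools:
  - tool_name_1
  - tool_name_2
---

# Skill guidance

When this skill is loaded, follow these steps:
1. First do X
"""


def parse_skill(text: str) -> tuple[dict, str]:
    """Split SKILL.md text into (front matter dict, body string)."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not lines or lines[0] != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    end = lines.index("---", 1)  # closing fence; ValueError if missing

    meta: dict = {}
    current_list = None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())  # item of the open list
            continue
        key, _, value = stripped.partition(":")
        value = value.strip()
        if value:
            meta[key.strip()] = value
            current_list = None
        else:
            current_list = meta.setdefault(key.strip(), [])  # start a list

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = parse_skill(SKILL_MD)
print(meta)
```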

Settings Management

Via Agent (Easiest)

You: "Change the default model to Claude 3.5 Sonnet"
Agent: [Loads settings-management skill]
       [Updates settings.json]
       [Confirms change applied]

You: "Show current LLM configuration"
Agent: [Displays current LLM settings]

Via Settings File

Edit vault/thoth/_thoth/settings.json:

{
  "llm_config": {
    "default": {
      "model": "openrouter/anthropic/claude-3.5-sonnet",
      "temperature": 0.7,
      "max_tokens": 4096
    }
  }
}

Changes apply in ~2 seconds (dev mode with hot-reload)

Via MCP Tools

# View settings
view_settings()

# Update settings
update_settings({"llm_config.default.temperature": 0.5})

# Validate settings
validate_settings()

# Reset to defaults
reset_settings()
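
The update call above addresses a nested setting with a dotted key. As a sketch of how such a key can map onto the nested JSON in settings.json, the helper below walks the path and sets the leaf. This shows one plausible interpretation; how the real update_settings tool resolves keys is an assumption, and apply_dotted_updates is a hypothetical name.

```python
import copy


def apply_dotted_updates(settings: dict, updates: dict) -> dict:
    """Return a copy of settings with each dotted-path key set to its value."""
    result = copy.deepcopy(settings)  # leave the caller's dict untouched
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})  # create missing levels
        node[leaf] = value
    return result


settings = {
    "llm_config": {
        "default": {
            "model": "openrouter/anthropic/claude-3.5-sonnet",
            "temperature": 0.7,
            "max_tokens": 4096,
        }
    }
}
updated = apply_dotted_updates(settings, {"llm_config.default.temperature": 0.5})
print(updated["llm_config"]["default"])
```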

Best Practices

Organizing Your Research

Vault Structure:

vault/
├── thoth/
│   ├── _thoth/                       # Thoth config & workspace
│   │   ├── settings.json             # Main configuration (hot-reloadable)
│   │   ├── analysis_schema.json      # Customizable analysis schema
│   │   ├── mcps.json                 # External MCP server config
│   │   ├── templates/                # Note & schema templates
│   │   │   ├── obsidian_note.md      # Note rendering template
│   │   │   └── analysis_schema.json  # Default schema seed
│   │   ├── prompts/                  # LLM prompt templates
│   │   │   └── google/               # Provider-specific prompts (*.j2)
│   │   ├── skills/                   # User-created skills
│   │   ├── data/                     # Runtime data
│   │   │   ├── output/              # Processing output & tracking
│   │   │   ├── knowledge/           # Knowledge base
│   │   │   ├── queries/             # Research queries
│   │   │   └── agent/               # Agent storage
│   │   ├── logs/                     # Application logs
│   │   └── cache/                    # Temporary cache files
│   ├── papers/
│   │   ├── pdfs/                     # Drop PDFs here
│   │   └── markdown/                 # Converted markdown
│   └── notes/                        # Generated notes appear here
├── Research/                         # Your research (manual)
│   ├── Projects/
│   │   ├── Project A/
│   │   └── Project B/
│   └── Literature Reviews/
└── Papers/                           # Link to generated notes

Research Workflow Tips

Start with discovery: Use agent to find papers first
Let auto-processing work: Drop PDFs in folder, wait for notes
Ask questions: Use knowledge-base-qa skill for Q&A
Track progress: Use research questions for ongoing projects
Build knowledge: Citation networks auto-build as you process papers

Agent Interaction

Be specific:

Vague: "Find some papers"
Specific: "Find papers on transformer attention mechanisms published in 2024"

Use skills explicitly when needed:

You: "Load the deep-research skill and analyze the paper on attention mechanisms"

Use memory:

You: "Remember that I'm interested in computational efficiency"
Agent: [Updates human memory block]

[Later]
You: "Find papers on transformers"
Agent: [Remembers your interest, prioritizes efficiency-focused papers]

Performance Optimization

Batch operations:

# Process multiple PDFs at once
python -m thoth pdf process ./papers/ --parallel

Scheduled discovery (runs during off-hours; the cron expression below fires daily at 2 AM):

{
  "discovery": {
    "auto_start_scheduler": true,
    "schedules": [
      {
        "cron": "0 2 * * *",
        "query": "machine learning",
        "max_articles": 50
      }
    ]
  }
}

Cache management:

# Clear cache if memory usage is high
rm -rf vault/thoth/_thoth/cache/*

# Rebuild indexes
python -m thoth rag rebuild

Advanced Usage

Custom Prompts

Override default prompts by creating files in vault/thoth/_thoth/prompts/:

_thoth/prompts/
├── custom_analysis.j2        # Custom analysis prompt
├── custom_summary.j2          # Custom summary prompt
└── custom_citation.j2         # Custom citation extraction

Reference in settings.json:

{
  "processing": {
    "custom_prompts": {
      "analysis": "_thoth/prompts/custom_analysis.j2"
    }
  }
}

Direct MCP Tool Access

For programmatic access:

import httpx

# Call MCP tool
response = httpx.post(
    "http://localhost:8082/mcp",
    json={
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "search_articles",
            "arguments": {
                "query": "transformer attention",
                "max_results": 10
            }
        },
        "id": 1
    }
)

response.raise_for_status()  # surface HTTP-level failures early
result = response.json()["result"]
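
If you call several tools, it may be worth wrapping the request and response handling. The sketch below builds the same JSON-RPC 2.0 payload and unwraps the reply, raising on an error member instead of failing with a KeyError. The helper names build_tool_call and unwrap are hypothetical, and the sample responses are made up for illustration.

```python
from typing import Any, Optional


def build_tool_call(
    name: str, arguments: Optional[dict] = None, request_id: int = 1
) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request body."""
    params: dict[str, Any] = {"name": name}
    if arguments is not None:
        params["arguments"] = arguments
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": params,
        "id": request_id,
    }


def unwrap(response: dict) -> Any:
    """Return the 'result' member, or raise if the reply carries an 'error'."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"MCP error {err.get('code')}: {err.get('message')}")
    return response["result"]


request = build_tool_call(
    "search_articles", {"query": "transformer attention", "max_results": 10}
)
print(request)

# Made-up success reply, shaped like a JSON-RPC 2.0 response.
ok = unwrap({"jsonrpc": "2.0", "id": 1, "result": {"articles": []}})
print(ok)
```

The request dict can be passed as the json= argument of the httpx.post call shown above.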

Multi-User Setup

For teams:

Shared Letta instance: One Letta server, multiple Thoth instances
Separate vaults: Each user has their own Obsidian vault
Shared database: Optional shared PostgreSQL for team papers
Access control: Configure per-user API keys

Troubleshooting

Agent Not Responding

# Check Letta is running
curl http://localhost:8283/v1/health

# Check agents exist
curl http://localhost:8283/v1/agents

# View logs
docker logs letta-server
tail -f vault/thoth/_thoth/logs/thoth.log

PDFs Not Processing

# Check PDF Monitor logs
docker logs thoth-dev-pdf-monitor  # dev mode
docker logs thoth-all-in-one       # prod mode

# Check file permissions
ls -la vault/thoth/papers/pdfs/

# Manual processing
python -m thoth pdf process paper.pdf --verbose

Skills Not Loading

# List available skills
curl -X POST http://localhost:8082/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "list_skills"}, "id": 1}'

# Check skill directories exist
ls src/thoth/.skills/
ls vault/thoth/_thoth/skills/

Discovery Not Finding Papers

Check API keys: Verify Semantic Scholar key is set
Check sources: List available sources
Adjust query: Be more specific
Check logs: Look for API errors

Next Steps

Quick Reference: Command cheat sheet
Architecture: Understand system design
MCP Architecture: Learn about tools
Letta Architecture: Learn about agents

Last Updated: February 2026
