This project contains an autonomous SEO agent built with Python using the Claude Agent SDK.
- AI Backend: Claude Agent SDK (uses Claude Code CLI via SDK)
- Purpose: Autonomous SEO tasks (audit, content strategy, copywriting, etc.)
Key files and directories:
- agent/seo_agent.py - Main SEOAgent class using Claude Agent SDK; returns AgentExecutionResult
- agent/config.py - Configuration dataclass (AgentConfig); exports PROJECT_ROOT; supports SEO_AGENT_CWD env var and max_thinking_tokens
- agent/memory_service.py - Layered memory composition: builds ShortTermContext, EpisodicContext, SemanticContext, ProceduralContext into a ComposedPromptContext for prompt injection
- agent/runtime_profiles.py - ExecutionProfile registry; maps each execution type to tool policy, budget, timeout, turn limit, validator, semantic char limits, and requires_approval flag
- agent/orchestrator.py - Multi-agent campaign dispatch loop; DAG tier resolution (_resolve_execution_tiers), structured inter-agent handoffs (_extract_summary_block), retry with backoff (_run_with_retry), parallel execution via asyncio.gather
- main.py - CLI entry point; uses PROJECT_ROOT for portable working directory
- agent/api/main.py - FastAPI Kanban server; includes AgentRunModel, RunEventModel, TaskSessionModel, OrchestrationStateModel DB tables for full run tracking
- skills/ - SEO skills (.skill files are ZIP archives containing SKILL.md)
- memory/ - Session memory (CLAUDE.md, seo-strategy.md, seo-context.md, seo-tasks.md)
- tests/ - Test suite with pytest
Trigger: create a Kanban task with execution_type = "orchestrate_seo_campaign" and click Execute.
Flow:
- Orchestrator agent reads memory/ files and outputs a {"campaign_goal": ..., "phases": [...]} JSON plan
- Python (agent/orchestrator.py) parses the plan and resolves execution tiers via Kahn's topological sort
- Child TaskModel rows are created (one per phase, with parent_task_id set)
- Tiers execute sequentially; phases within a tier run concurrently via asyncio.gather
- Approval gate: before dispatching any phase whose ExecutionProfile.requires_approval=True, the orchestrator checks parent_task.approved_at. If unset, it sets state.status='awaiting_approval' and halts. The campaign resumes when approved_at is set via PATCH /tasks/{id}.
- Each phase agent receives prior outputs via structured ## Summary for Next Phase blocks (falls back to 1500-char truncation)
- OrchestrationStateModel persists state: plan JSON, current phase, all phase outputs, child run IDs, and final status (running | awaiting_approval | completed | error)
- All child AgentRunModel rows carry parent_run_id pointing to the orchestrator run
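The sequential-tiers, concurrent-phases dispatch in the flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in agent/orchestrator.py: run_tiers, PhaseRunner, and _demo_phase are hypothetical names, "editor" is a made-up phase used only to show two phases sharing a tier, and the real loop also handles retries, the approval gate, and persistence.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: a phase runner takes a phase name and returns its output text.
PhaseRunner = Callable[[str], Awaitable[str]]


async def run_tiers(tiers: list[list[str]], run_phase: PhaseRunner) -> dict[str, str]:
    """Run tiers one after another; phases inside a tier run concurrently."""
    outputs: dict[str, str] = {}
    for tier in tiers:
        # asyncio.gather returns results in argument order, so they line up with the tier.
        results = await asyncio.gather(*(run_phase(name) for name in tier))
        outputs.update(zip(tier, results))
    return outputs


async def _demo_phase(name: str) -> str:
    # Stand-in for a real child agent run.
    await asyncio.sleep(0)
    return f"{name} done"


demo_outputs = asyncio.run(
    run_tiers([["researcher"], ["draft_writer", "editor"]], _demo_phase)
)
```

Because each tier is awaited before the next one starts, a phase never begins before everything it depends on has produced output.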
Phase plan schema (what the orchestrator agent must output):
{
"campaign_goal": "string",
"phases": [
{
"phase": "researcher",
"task_title": "Research: <specific angle>",
"task_description": "...",
"execution_type": "campaign_researcher",
"depends_on": []
},
{ "phase": "draft_writer", "execution_type": "campaign_draft_writer", "depends_on": ["researcher"], ... },
{ "phase": "publisher", "execution_type": "campaign_publisher", "depends_on": ["draft_writer"], ... }
]
}
Inter-agent handoff protocol: Each child agent must end its output with:
## Summary for Next Phase
<concise handoff notes for the next agent>
## End Summary
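A minimal sketch of how a handoff block in this format could be extracted, with the 1500-character fallback mentioned earlier. The function name extract_handoff and the regular expression are assumptions for illustration; the project's own _extract_summary_block may differ, and whether its fallback keeps the first or last 1500 characters is not stated here (this sketch keeps the first).

```python
import re

# Matches the body between the two marker lines of the handoff protocol.
SUMMARY_PATTERN = re.compile(
    r"^## Summary for Next Phase[ \t]*\n(.*?)^## End Summary[ \t]*$",
    re.DOTALL | re.MULTILINE,
)
FALLBACK_CHARS = 1500  # fallback truncation length stated in this document


def extract_handoff(output: str) -> str:
    """Return the handoff block body, or a truncated copy of the output if absent."""
    match = SUMMARY_PATTERN.search(output)
    if match:
        return match.group(1).strip()
    # Assumption: keep the first FALLBACK_CHARS characters when no block is found.
    return output[:FALLBACK_CHARS]
```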
Retry policy: transient errors (timeouts, 503s, rate limits) are retried up to 2 times with exponential backoff, subject to an optional max_total_seconds wall-clock cap. Non-retryable: budget exceeded, malformed plan, circular dependency.
Approval gate: campaign_publisher has requires_approval=True. The orchestrator halts before that tier and sets state.status='awaiting_approval'. Set approved_at on the parent task via the Kanban UI (PATCH) to resume.
Grounding requirement: research and campaign_researcher profiles carry the grounding-required procedural tag. The injected system prompt requires every factual claim to cite a source URL; the validator also checks for at least one https:// URL in the output.
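The URL check in the validator might look roughly like the sketch below. The function name has_source_url and the exact pattern are assumptions; the only behavior taken from this document is "at least one https:// URL in the output".

```python
import re

# An https:// prefix followed by at least one character that is not
# whitespace or a closing bracket.
URL_PATTERN = re.compile(r"https://[^\s)>\]]+")


def has_source_url(output: str) -> bool:
    """Return True when the output contains at least one https:// URL."""
    return URL_PATTERN.search(output) is not None
```

A check like this only confirms that a link is present; it does not verify that the link supports the claim next to it.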
Audit log: every tool call made during a run is written to RunEventModel via a PostToolUse SDK hook wired in _build_runtime_config.
Scalability boundaries (annotated in source with # Scalability note):
- #1 - move the run_campaign_orchestration call to a task queue (Celery/arq) for production; return 202 + polling
- #2 - swap SQLite for Postgres via DATABASE_URL; schema is unchanged
- #6 - child agent tool scopes are enforced in runtime_profiles.py; never expand via EXECUTABLE_TYPES
- #8 - _atomic_json_write uses os.replace() (POSIX-atomic); add fcntl.flock() for multi-worker
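For scalability note #8, the write-then-swap pattern behind os.replace() can be sketched like this. It is an illustration of the general technique, not the project's _atomic_json_write; the function name atomic_json_write and the use of tempfile.mkstemp are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_json_write(path: Path, payload: Any) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Readers see either the old file or the new one, never a partial write. With several worker processes writing the same file, a lock is still needed to avoid lost updates, which is why the note suggests fcntl.flock().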
- Python: 3.11+
- SDK: claude-agent-sdk>=0.1.44
- HTTP Client: aiohttp>=3.9.0 (for Webflow API)
- Web Server: FastAPI, Uvicorn (for Kanban UI)
- Database: SQLite with SQLAlchemy ORM
- Testing: pytest>=9.0.0, pytest-asyncio>=1.3.0
The agent uses file-based memory for persistent context:
- memory/CLAUDE.md - Site overview, target keywords, content gaps, what NOT to do. Loaded at session start.
- memory/seo-strategy.md - Detailed strategy that evolves over time.
- memory/seo-context.md - Current sprint state: active tickets, completed work, pending actions.
- memory/seo-tasks.md - Generated task lists from audits with priorities and subtasks.
Session workflow:
- Agent reads memory/CLAUDE.md, memory/seo-strategy.md, and memory/seo-context.md for SEO context
- memory_service composes a layered prompt context (short-term, episodic, semantic, procedural)
- runtime_profiles selects the ExecutionProfile for the execution type (tools, budget, turns, validator)
- Agent executes the task via SDK; result stored in AgentRunModel
- Next session can resume via session ID and draws from EpisodicContext (prior run summaries)
Orchestration workflow (when execution_type = "orchestrate_seo_campaign"):
- Orchestrator agent produces JSON plan → stored in OrchestrationStateModel.plan_json
- _resolve_execution_tiers builds DAG tiers from depends_on fields
- Each tier dispatches concurrently; outputs accumulate in OrchestrationStateModel.phase_outputs_json
- Child runs record parent_run_id; child tasks record parent_task_id
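Building tiers from depends_on fields is a level-by-level variant of Kahn's topological sort. The sketch below shows the idea on plan dictionaries shaped like the phase plan schema; resolve_tiers is a hypothetical name, and the real _resolve_execution_tiers may differ in signature and error handling.

```python
from typing import Any


def resolve_tiers(phases: list[dict[str, Any]]) -> list[list[str]]:
    """Group phases into tiers; each tier depends only on earlier tiers."""
    names = [p["phase"] for p in phases]
    # Number of unmet dependencies per phase, plus reverse edges to dependents.
    pending = {p["phase"]: len(p.get("depends_on", [])) for p in phases}
    dependents: dict[str, list[str]] = {name: [] for name in names}
    for p in phases:
        for dep in p.get("depends_on", []):
            if dep not in dependents:
                raise ValueError(f"unknown dependency: {dep}")
            dependents[dep].append(p["phase"])

    tiers: list[list[str]] = []
    ready = [name for name in names if pending[name] == 0]
    while ready:
        tiers.append(ready)
        next_ready: list[str] = []
        for name in ready:
            for child in dependents[name]:
                pending[child] -= 1
                if pending[child] == 0:
                    next_ready.append(child)
        ready = next_ready

    # Any phase never placed in a tier is part of a dependency cycle.
    if sum(len(tier) for tier in tiers) != len(names):
        raise ValueError("circular dependency in phase plan")
    return tiers
```

For the three-phase example plan (researcher, draft_writer, publisher), this yields three single-phase tiers; phases that share the same dependencies land in the same tier and can run concurrently.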
Per the documentation-guide.md, always maintain:
- CLAUDE.md (this file) - Architecture decisions, coding conventions, tools/stack used
- README.md - Project overview, setup instructions, how to run locally
- CHANGELOG.md - What changed and when (create if significant changes)
- Inline comments - For non-obvious logic, especially complex queries
- Docstrings - For all functions (parameters, return types, purpose)
- .env.example - If environment variables are added
- DECISIONS.md - Record important technical decisions and rationale
- All public methods should have docstrings with parameters and return types
- Use type hints where beneficial
- Async methods for SDK interactions
- Proper cleanup with context managers or disconnect()
The agent has access to these SEO skills (.skill files are ZIP archives containing SKILL.md + references):
- SEO Audit - Comprehensive website SEO analysis (automatically triggers Task Breakdown)
- Content Strategy - Content planning and optimization
- Copywriting - Writing SEO-optimized content
- Copy Editing - Editing existing content (includes plain-english-alternatives reference)
- Brand Voice - Maintaining consistent brand tone
- Competitor Alternatives - Finding competitor weaknesses
- Programmatic SEO - Automated SEO at scale (includes playbooks reference)
- Schema Markup - Adding structured data (includes JSON-LD examples reference)
- Analytics Tracking - Setting up tracking (includes GA4, GTM, event library references)
- Task Breakdown - Break audit findings into actionable tasks (one-output-per-task)
- Page CRO - Conversion rate optimization for marketing pages (includes experiments reference)
- Marketing Psychology - Apply psychological principles and behavioral science to marketing
- Webflow CMS - Manage Webflow CMS content (create, edit, publish posts)
- Google Docs - Create and manage Google Docs for reports and content
- SEO Feedback Loop - Track impact of implemented SEO changes, diagnose regressions, extract learnings, propagate winning patterns (includes sample log, sample learnings, sample review report, and templates). Use execution_type: seo_impact_review in Kanban to trigger a batch review of all pending changes.
# Run a task
python3.11 main.py "Perform SEO audit on example.com"
# Interactive mode
python3.11 main.py
# Run tests
python -m pytest tests/test_seo_agent.py -v
Edit agent/config.py to customize:
- model: Claude model (default, sonnet, opus, haiku)
- permission_mode: Permission mode (acceptEdits, etc.)
- allowed_tools: Tools the agent can use (Read, Write, Edit, Glob, Grep, Skill, WebSearch, WebFetch; Bash excluded by default)
- hooks: Optional SDK hook dict (e.g. PostToolUse audit log); set by _build_runtime_config automatically
- setting_sources: Sources for settings (user, project)
The agent has these tools available by default (Bash intentionally excluded — runtime profiles stamp the exact allowed list):
- Read, Write, Edit, Glob, Grep - File operations
- Skill - Execute SEO skills
- WebSearch, WebFetch - Web browsing
When WEBFLOW_ACCESS_TOKEN, WEBFLOW_SITE_ID, and WEBFLOW_COLLECTION_ID are all set, these tools are automatically added to allowed_tools:
- mcp__webflow__list_cms_items - List items in collection (supports limit/offset pagination)
- mcp__webflow__get_cms_item - Get single item by ID
- mcp__webflow__create_cms_item - Create new post (name, slug, content)
- mcp__webflow__update_cms_item - Update existing post
- mcp__webflow__publish_cms_item - Publish to live site
- mcp__webflow__get_collection_info - Get collection schema
The agent can use these tools to manage Webflow CMS content when Webflow is configured.
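One way the "added automatically when configured" behavior could work is sketched below. The helper with_webflow_tools is hypothetical, and treating an empty value as "not set" is an assumption; the variable names and tool names are the ones listed in this document.

```python
from typing import Mapping

WEBFLOW_ENV_VARS = ("WEBFLOW_ACCESS_TOKEN", "WEBFLOW_SITE_ID", "WEBFLOW_COLLECTION_ID")
WEBFLOW_TOOLS = [
    "mcp__webflow__list_cms_items",
    "mcp__webflow__get_cms_item",
    "mcp__webflow__create_cms_item",
    "mcp__webflow__update_cms_item",
    "mcp__webflow__publish_cms_item",
    "mcp__webflow__get_collection_info",
]


def with_webflow_tools(allowed_tools: list[str], env: Mapping[str, str]) -> list[str]:
    """Return allowed_tools plus the Webflow tools when all three variables are set."""
    # Assumption: an empty string counts as not configured.
    if all(env.get(name) for name in WEBFLOW_ENV_VARS):
        return allowed_tools + [t for t in WEBFLOW_TOOLS if t not in allowed_tools]
    return list(allowed_tools)
```

Passing os.environ as env gives the runtime behavior, while a plain dict keeps the function easy to test.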
The agent can manage Webflow CMS collections (create, edit, publish posts). Uses Webflow Data API v2.
WEBFLOW_ACCESS_TOKEN=your_api_token
WEBFLOW_SITE_ID=your_site_id
WEBFLOW_COLLECTION_ID=your_collection_id
from agent import AgentConfig, WebflowConfig
config = AgentConfig(
    webflow_config=WebflowConfig(
        access_token="your_token",
        site_id="your_site_id",
        collection_id="your_collection_id"
    )
)
- Base URL: https://api.webflow.com/v2
- Pagination: Uses limit/offset (max 100 items per request)
- Live Items: Uses /items/live endpoint to fetch published items
agent/webflow/
├── __init__.py # Exports
├── config.py # WebflowConfig dataclass
├── client.py # WebflowAPIClient (raw API)
├── tools.py # @tool decorated functions
└── server.py # MCP server factory
To get more than 100 items, loop with incremental offsets:
offset = 0
while True:
    result = await client.list_items(limit=100, offset=offset)
    items = result.get('items', [])
    if not items:
        break
    # process items
    offset += 100
The agent can create and manage Google Docs for SEO audit reports, blog content, and other documents.
GOOGLE_DOCS_CREDENTIALS_PATH=google-sa-credentials/service-account.json
# OR use the standard Google application credentials path:
GOOGLE_APPLICATION_CREDENTIALS=google-sa-credentials/service-account.json
from agent import AgentConfig, GoogleDocsConfig
config = AgentConfig(
    google_docs_config=GoogleDocsConfig(
        credentials_path="google-sa-credentials/service-account.json"
    )
)
Automatically available when credentials path is set. The following tools are added to allowed_tools:
- mcp__google_docs__create_google_doc - Create a new Google Doc
- mcp__google_docs__get_google_doc - Get document by ID
- mcp__google_docs__append_to_google_doc - Append content to document
- mcp__google_docs__update_google_doc_title - Update document title
IMPORTANT: By design, Google Docs cannot be deleted through this integration. The agent can only:
- ✅ Create new documents
- ✅ Read documents
- ✅ Append content to documents
- ✅ Update document titles
- ❌ Delete documents (intentionally disabled)
This ensures audit reports and blog content are preserved and cannot be accidentally removed.
agent/google_docs/
├── __init__.py # Exports
├── config.py # GoogleDocsConfig dataclass
├── client.py # GoogleDocsAPIClient
├── tools.py # @tool decorated functions
└── server.py # MCP server factory
The agent can query live Search Console data: clicks, impressions, CTR, position, URL indexing status, and sitemaps. Read-only — no write operations are exposed.
Uses the same Google Service Account as Google Docs. Grant the SA access to your GSC property in Search Console → Settings → Users and permissions.
GSC_SITE_URL=sc-domain:example.com # domain property
# OR
GSC_SITE_URL=https://www.example.com/ # URL-prefix property
# Credentials (falls back to GOOGLE_DOCS_CREDENTIALS_PATH / GOOGLE_APPLICATION_CREDENTIALS):
GSC_CREDENTIALS_PATH=google-sa-credentials/service-account.json
from agent import AgentConfig, GscConfig
config = AgentConfig(
    gsc_config=GscConfig(
        site_url="sc-domain:example.com",
        credentials_path="google-sa-credentials/service-account.json"
    )
)
Automatically available when GSC_SITE_URL is set. Added to allowed_tools:
- mcp__gsc__gsc_query_search_analytics - Query clicks/impressions/CTR/position by query, page, device, or date
- mcp__gsc__gsc_inspect_url - Get indexing status for a specific URL
- mcp__gsc__gsc_list_sitemaps - List sitemaps submitted to GSC
| Execution type | GSC tools | Reason |
|---|---|---|
| seo_impact_review | ✅ | Compares ranking before/after CMS changes |
| research | ✅ | Pulls live query data during keyword and competitor research |
| all others | ❌ | Publish/rewrite profiles don't need ranking data at write time |
agent/gsc/
├── __init__.py # Exports
├── config.py # GscConfig dataclass
├── client.py # GscAPIClient (read-only)
├── tools.py # @tool decorated functions
└── server.py # MCP server factory
The project includes a visual Kanban board for task management.
# Start the Kanban server
uvicorn agent.api.main:app --reload --port 8000
Then open http://localhost:8000/kanban
API endpoints:
- GET /kanban - Serve Kanban HTML UI
- GET /health - Health check
- GET /tasks - List all tasks with counts
- POST /tasks - Create new task
- GET /tasks/{id} - Get task by ID
- PATCH /tasks/{id} - Update task
- DELETE /tasks/{id} - Delete task
- POST /tasks/{id}/execute - Execute task via SEOAgent
- GET /tasks/{id}/comments - Get task comments
- POST /tasks/{id}/comments - Add comment
- POST /automation/comments/process-one - Process one eligible @agent comment action
- POST /runs/{run_id}/seo-audit - Run SEO audit
Task statuses:
- pending - Tasks waiting to be worked on
- in_progress - Tasks currently being executed
- completed - Tasks finished successfully
- blocked - Tasks that encountered errors
Uses SQLite with SQLAlchemy ORM. The database is created automatically on first run.
Environment-based defaults for Kanban API:
- APP_ENV=production (default) -> sqlite:///./kanban.db
- APP_ENV=staging -> sqlite:///./kanban.staging.db
- DATABASE_URL (if set) overrides APP_ENV
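The precedence between DATABASE_URL and APP_ENV can be sketched as below. The function resolve_database_url is a hypothetical name, and falling back to the production default for an unrecognized APP_ENV value is an assumption, since that case is not documented here.

```python
from typing import Mapping

DEFAULT_URLS = {
    "production": "sqlite:///./kanban.db",
    "staging": "sqlite:///./kanban.staging.db",
}


def resolve_database_url(env: Mapping[str, str]) -> str:
    """Pick the database URL: an explicit DATABASE_URL wins, else the APP_ENV default."""
    explicit = env.get("DATABASE_URL")
    if explicit:
        return explicit
    app_env = env.get("APP_ENV", "production")
    # Assumption: unknown APP_ENV values fall back to the production default.
    return DEFAULT_URLS.get(app_env, DEFAULT_URLS["production"])
```

Calling resolve_database_url(os.environ) at startup would reproduce the rules above; setting DATABASE_URL to a Postgres URL overrides both defaults.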
Comment revision automation:
- comment_actions table tracks comment-triggered execution attempts and status.
- Trigger format: user comment body must begin with @agent.
- Background worker runs only while server is running.
- Autopilot skips @agent comments if the task was already executed after the comment was posted (task.updated_at > comment.created_at).
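The trigger and skip rules above amount to a small eligibility check. In this sketch, is_eligible_comment is a hypothetical function operating on plain values rather than ORM rows, and the prefix match is strict (no leading whitespace tolerated), which is an assumption.

```python
from datetime import datetime


def is_eligible_comment(
    body: str, comment_created_at: datetime, task_updated_at: datetime
) -> bool:
    """Return True when an @agent comment should still trigger an execution."""
    if not body.startswith("@agent"):
        return False
    # Skip when the task was already updated (executed) after the comment was posted.
    return not task_updated_at > comment_created_at
```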
User comments in task execution:
- When a task is executed via the Execute button, all user comments on that task are included in the agent prompt under a ## User Notes section.
- This means notes like "keep the tone casual" or "focus on mobile users" are automatically factored in, with no @agent prefix needed.
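A minimal sketch of how comments could be folded into the prompt, assuming a hypothetical build_prompt helper and a bulleted layout under the heading; only the ## User Notes heading itself comes from this document.

```python
def build_prompt(task_description: str, user_comments: list[str]) -> str:
    """Append user comments to the task prompt under a '## User Notes' section."""
    if not user_comments:
        return task_description
    # Assumption: one bullet per comment, in the order the comments were posted.
    notes = "\n".join(f"- {comment}" for comment in user_comments)
    return f"{task_description}\n\n## User Notes\n{notes}"
```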
Comment automation environment variables:
- COMMENT_AUTOPILOT_ENABLED (default true)
- COMMENT_AUTOPILOT_INTERVAL_SECONDS (default 300, i.e. 5 minutes)
- AGENT_EXECUTION_TIMEOUT_SECONDS (default 900)
Testing isolation:
- tests/conftest.py forces API tests to use in-memory SQLite with StaticPool so tests never write into production/staging DB files.
agent/api/
├── __init__.py # Module init
└── main.py # FastAPI app with all endpoints and embedded Kanban HTML
Run the test suite:
# All tests
python -m pytest tests/test_seo_agent.py -v
# Specific test class
python -m pytest tests/test_seo_agent.py::TestMemorySystem -v
# Integration tests only
python -m pytest tests/test_seo_agent.py -m integration -v
- Never push, merge, or commit to GitHub without my express approval
- Always update docs per @documentation-guide.md
- Follow red/green TDD