Problem
The MCP server is the backbone of spellbook's runtime functionality. There are no benchmarks measuring tool response times, startup latency, or throughput under load.
Proposed Benchmarks
Startup Time
- Time from `spellbook start` to first tool availability
- Measure with different numbers of skills loaded
Tool Response Time
- Simple tools (e.g., `spellbook_health_check`): target <50ms
- Complex tools (e.g., `skill_instructions_get` with large skills): target <200ms
- Tools with file I/O (e.g., `workflow_state_save`): target <500ms
Throughput
- Concurrent tool calls
- Memory usage under sustained operation
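A throughput probe can combine a thread pool with `tracemalloc` to capture both calls/sec and peak memory in one run. The tool-call callable here is a hypothetical stand-in.

```python
# Sketch: run `call()` many times concurrently, report calls/sec and
# peak traced memory. `call` stands in for a real MCP tool invocation.
import concurrent.futures
import time
import tracemalloc

def throughput(call, n_calls: int = 200, workers: int = 16):
    tracemalloc.start()
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: call(), range(n_calls)))
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return n_calls / elapsed, peak  # (calls per second, peak bytes)
```

Note `tracemalloc` only sees Python-level allocations; RSS-based tooling would be needed for a full memory picture.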
Regression Detection
- Store benchmark results in CI
- Alert on >20% regression
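If pytest-benchmark is chosen, its built-in save/compare flags cover this; a CI job might look roughly like the following (flag values are a sketch, and the saved-results directory would need to be cached between CI runs):

```shell
# Save this run's results, then fail if mean time regressed >20%
# versus the most recent saved run.
pytest tests/benchmarks/ --benchmark-autosave
pytest tests/benchmarks/ --benchmark-compare --benchmark-compare-fail=mean:20%
```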
Implementation
Use `pytest-benchmark` or a custom benchmark harness. Results can be visualized in the docs site (similar to how Ruff and uv display benchmark charts).