This document provides a comprehensive technical reference for the plotnine MCP server's current implementation. It covers architecture, design decisions, code organization, and implementation details for future developers.
Version: 0.2.0 Last Updated: 2025-11-06
- Architecture Overview
- Module Structure
- Data Flow
- Implemented Features
- Design Decisions
- Testing Strategy
- Known Limitations
┌─────────────────┐
│ MCP Client │ (Claude Desktop, Cursor, VSCode)
│ (AI Assistant) │
└────────┬────────┘
│ MCP Protocol (stdio)
↓
┌─────────────────┐
│ server.py │ Main MCP Server
│ - Tool defs │ - Handles tool registration
│ - Handlers │ - Processes requests
└────────┬────────┘
│
┌────┴─────────────────┬──────────────┐
↓ ↓ ↓
┌──────────┐ ┌─────────────┐ ┌──────────────┐
│data_ │ │plot_ │ │schemas.py │
│loader.py │ │builder.py │ │- Pydantic │
│ │ │ │ │ models │
└──────────┘ └─────────────┘ └──────────────┘
│ │
↓ ↓
┌──────────┐ ┌─────────────┐
│ pandas │ │ plotnine │
│DataFrame │ │ (ggplot) │
└──────────┘ └─────────────┘
- Python: 3.10+
- MCP SDK: 1.0.0+ (Model Context Protocol)
- Core Libraries:
plotnine0.13.0+ (visualization)pandas2.0.0+ (data handling)pydantic2.0.0+ (validation)requests2.31.0+ (HTTP data sources)pillow10.0.0+ (image handling)
- Grammar of Graphics: Follow ggplot2's layered approach
- Modular: Each component is independent and testable
- Type-Safe: Use Pydantic for validation
- Backward Compatible: New features don't break existing usage
- AI-Friendly: Clear tool descriptions and error messages
Purpose: Main MCP server implementation
Key Components:
server = Server("plotnine-mcp")
@server.list_tools()
async def list_tools() -> list[Tool]:
# Returns tool definitions
# Tools: create_plot, list_geom_types
@server.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
# Routes to specific handlers
# Handles: create_plot_handler, list_geom_types_handler
async def create_plot_handler(arguments: dict) -> list[TextContent]:
# Main plot creation logic
# 1. Parse and validate arguments
# 2. Load data
# 3. Build plot
# 4. Save plot
# 5. Return resultTool Definitions:
-
create_plot
- Inputs: data_source, aes, geom/geoms, scales, theme, facets, labels, coords, stats, output
- Output: TextContent with file path and summary
- Validation: Via Pydantic schemas
-
list_geom_types
- Inputs: None
- Output: List of available geometries with descriptions
Error Handling:
DataLoadError→ User-friendly data loading errorsPlotBuildError→ Plot construction errors with suggestions- Generic exceptions → Caught and formatted
Purpose: Load data from various sources into pandas DataFrames
Key Functions:
def load_data(data_source: DataSource) -> pd.DataFrame:
"""Main entry point for data loading"""
# Dispatches to specific loaders based on type
def _load_inline_data(data_source: DataSource) -> pd.DataFrame:
"""Load from JSON array"""
return pd.DataFrame(data_source.data)
def _load_file_data(data_source: DataSource) -> pd.DataFrame:
"""Load from local file"""
# - Resolves path
# - Auto-detects format
# - Calls appropriate pandas reader
def _load_url_data(data_source: DataSource) -> pd.DataFrame:
"""Load from URL"""
# - Fetches with requests
# - Auto-detects format
# - Reads from BytesIO
def _detect_format_from_path(path: Path) -> str:
"""Detect file format from extension"""
# Maps: .csv → "csv", .json → "json", etc.
def _read_file_by_format(source: Any, format_type: str) -> pd.DataFrame:
"""Read file based on detected format"""
# Calls: pd.read_csv, pd.read_json, pd.read_parquet, pd.read_excelSupported Formats:
- CSV: via
pandas.read_csv() - JSON: via
pandas.read_json() - Parquet: via
pandas.read_parquet()(requires pyarrow) - Excel: via
pandas.read_excel()(requires openpyxl)
Auto-Detection:
- File: Based on extension (.csv, .json, .parquet, .xlsx)
- URL: Based on URL path or Content-Type header
- Fallback: Defaults to CSV
Error Cases:
- File not found
- Unsupported format
- Network errors (URLs)
- Malformed data
Purpose: Build plotnine plots from configuration
Key Components:
GEOM_MAP = {
"point": geom_point,
"line": geom_line,
"bar": geom_bar,
# ... 20+ geometries
}
def build_plot(
data: pd.DataFrame,
aes_config: Aesthetics,
geom_config: Optional[GeomConfig] = None,
geom_configs: Optional[list[GeomConfig]] = None,
# ... other optional configs
) -> ggplot:
"""
Main plot building function
Process:
1. Handle backward compatibility (geom → geom_configs)
2. Build aesthetics from config
3. Create base ggplot object
4. Add geometry layers (single or multiple)
5. Add statistical transformations
6. Apply scales
7. Add facets
8. Add labels
9. Apply coordinate system
10. Apply theme
"""
def _build_aesthetics(aes_config: Aesthetics) -> aes:
"""Convert Aesthetics schema to plotnine aes object"""
# Maps: x, y, color, fill, size, alpha, shape, linetype, group
def _build_geom(geom_config: GeomConfig):
"""Convert GeomConfig to plotnine geom"""
# Looks up in GEOM_MAP
# Applies params
def _build_scale(scale_config: ScaleConfig):
"""Build scale from configuration"""
# Maps to: scale_x_continuous, scale_y_log10, etc.
def _build_theme(theme_config: ThemeConfig):
"""Build theme from configuration"""
# Base themes: gray, bw, minimal, classic, dark, light, void
# Applies customizations: figure_size, legend_position, etc.
def _build_facet(facet_config: FacetConfig):
"""Build facet from configuration"""
# Types: wrap, grid
# Generates formula for facet_grid
def _build_labels(labels_config: LabelsConfig):
"""Add plot labels"""
# title, x, y, caption, subtitle
def _build_coord(coord_config: CoordConfig):
"""Set coordinate system"""
# Types: cartesian, flip, fixed, trans
def save_plot(plot: ggplot, output_config: OutputConfig) -> dict:
"""
Save plot to file
Process:
1. Create output directory
2. Generate filename (or use provided)
3. Call plot.save() with dimensions and DPI
4. Return metadata
"""Geometry Mapping: All plotnine geoms are imported and mapped:
- Basic: point, line, bar, col, histogram
- Distribution: boxplot, violin, density
- Specialized: tile, text, errorbar, ribbon, path, polygon
- Reference: hline, vline, abline
- Smoothing: smooth, area, jitter
Multi-Layer Support (v0.2.0):
# Backward compatible handling
if geom_config and not geom_configs:
geom_configs = [geom_config] # Convert single to list
# Layer multiple geoms
for geom_cfg in geom_configs:
plot = plot + _build_geom(geom_cfg)Theme Customization:
Supports nested customizations via theme() function:
figure_size: tuplelegend_position: stringpanel_background,plot_background: element_recttext,axis_text,axis_title: element_text
Purpose: Pydantic models for type validation
Key Schemas:
class DataSource(BaseModel):
"""Data source configuration"""
type: Literal["file", "url", "inline"]
path: Optional[str] = None # For file/url
data: Optional[list[dict]] = None # For inline
format: Optional[Literal["csv", "json", "parquet", "excel"]] = "csv"
class Aesthetics(BaseModel):
"""Aesthetic mappings"""
x: Optional[str] = None
y: Optional[str] = None
color: Optional[str] = None
fill: Optional[str] = None
size: Optional[str] = None
alpha: Optional[str] = None
shape: Optional[str] = None
linetype: Optional[str] = None
group: Optional[str] = None
class GeomConfig(BaseModel):
"""Geometry configuration"""
type: str # Geometry name
params: dict[str, Any] = Field(default_factory=dict)
class ScaleConfig(BaseModel):
"""Scale configuration"""
aesthetic: str # x, y, color, fill, etc.
type: str # continuous, discrete, log10, etc.
params: dict[str, Any] = Field(default_factory=dict)
class ThemeConfig(BaseModel):
"""Theme configuration"""
base: str = "gray" # Base theme name
customizations: dict[str, Any] = Field(default_factory=dict)
class FacetConfig(BaseModel):
"""Faceting configuration"""
type: Literal["wrap", "grid"] = "wrap"
facets: Optional[str] = None # Formula
cols: Optional[str] = None # For grid
rows: Optional[str] = None # For grid
params: dict[str, Any] = Field(default_factory=dict)
class LabelsConfig(BaseModel):
"""Plot labels"""
title: Optional[str] = None
x: Optional[str] = None
y: Optional[str] = None
caption: Optional[str] = None
subtitle: Optional[str] = None
class CoordConfig(BaseModel):
"""Coordinate system"""
type: str = "cartesian"
params: dict[str, Any] = Field(default_factory=dict)
class StatConfig(BaseModel):
"""Statistical transformation"""
type: str # smooth, bin, density, summary
params: dict[str, Any] = Field(default_factory=dict)
class OutputConfig(BaseModel):
"""Output configuration"""
format: Literal["png", "pdf", "svg"] = "png"
filename: Optional[str] = None
width: float = 8 # inches
height: float = 6 # inches
dpi: int = 300
directory: str = "./output"Validation Benefits:
- Type checking at runtime
- Clear error messages
- Auto-generated JSON schema for MCP
- IDE autocomplete support
Simple package initialization with version number.
1. User Request (via AI Assistant)
↓
2. MCP Client → Server (stdio transport)
↓
3. server.call_tool() receives request
↓
4. create_plot_handler() processes arguments
↓
5. Pydantic Validation
- DataSource → data_loader.py
- Aesthetics, GeomConfig(s), etc. → plot_builder.py
↓
6. Data Loading
load_data() → pandas DataFrame
↓
7. Plot Building
build_plot() → ggplot object
- Build aesthetics
- Add geom layer(s) ← Multi-layer support!
- Apply scales
- Add facets
- Set labels
- Apply theme
- Set coordinates
↓
8. Plot Saving
save_plot() → File on disk
↓
9. Response
TextContent with metadata
↓
10. User sees result
Try:
Load data
↓ DataLoadError
→ User-friendly message with suggestions
Try:
Build plot
↓ PlotBuildError
→ Column/geom/config error with hints
Try:
Save plot
↓ PlotBuildError / IOError
→ File system error message
Catch all:
→ Generic error with full traceback context
File-based:
- CSV files (
pd.read_csv) - JSON files (
pd.read_json) - Parquet files (
pd.read_parquet) - optional dependency - Excel files (
pd.read_excel) - optional dependency
Network-based:
- HTTP/HTTPS URLs
- Auto-detection from Content-Type header
Inline:
- JSON arrays passed directly in MCP call
Features:
- Format auto-detection
- Path resolution (relative and absolute)
- User home directory expansion (
~) - Comprehensive error messages
Basic:
- point - Scatter points
- line - Connected lines
- bar - Bar chart (count stat)
- col - Column chart (identity stat)
- path - Path in data order
Distribution:
- histogram - Binned continuous data
- density - Kernel density estimation
- boxplot - Box and whisker
- violin - Violin plots
Smoothing:
- smooth - Conditional means (supports lm, loess)
- area - Filled area under curve
Specialized:
- tile - Heatmap tiles
- text - Text annotations
- jitter - Jittered points
- errorbar - Error bars
- ribbon - Confidence ribbons
- polygon - Filled polygons
Reference:
- hline - Horizontal line
- vline - Vertical line
- abline - Diagonal line
All ggplot2 aesthetics supported:
- x, y - Position
- color - Point/line color
- fill - Fill color
- size - Point/line size
- alpha - Transparency
- shape - Point shape
- linetype - Line style
- group - Grouping for lines/paths
Position scales:
- continuous
- discrete
- log10
- sqrt
- datetime
Color scales:
- gradient (continuous)
- discrete
- brewer (ColorBrewer palettes)
Implementation:
- Dynamic scale name construction:
scale_{aesthetic}_{type} - Parameter passing to plotnine scale functions
- Full support for limits, breaks, labels
Base themes:
- gray (default)
- bw (black and white)
- minimal
- classic
- dark
- light
- void
Customizations:
- figure_size
- legend_position
- legend_direction
- panel_background (via element_rect)
- plot_background (via element_rect)
- text (via element_text)
- axis_text (via element_text)
- axis_title (via element_text)
Types:
- facet_wrap - Single variable wrapping
- facet_grid - Two-variable grid
Features:
- Formula support (
~ variable,row ~ col) - Custom parameters (ncol, scales, etc.)
- cartesian (default)
- flip (swap x and y)
- fixed (fixed aspect ratio)
- trans (coordinate transformation)
Note: coord_polar not available in plotnine 0.13+
All label types supported:
- title
- x-axis label
- y-axis label
- caption
- subtitle
- smooth (various methods: lm, loess, etc.)
- bin (for histograms)
- density (kernel density)
- summary (aggregations)
Note: Most stats are implicit in geoms
- PNG (default, 300 DPI)
- PDF (vector)
- SVG (vector)
Configuration:
- Custom dimensions (width, height in inches)
- Custom DPI (raster formats)
- Custom filename
- Custom output directory
- Auto-generated filenames (UUID-based)
Implementation:
# Accept both single geom and array of geoms
geom_config: Optional[GeomConfig] = None
geom_configs: Optional[list[GeomConfig]] = None
# Backward compatibility conversion
if geom_config and not geom_configs:
geom_configs = [geom_config]
# Layer all geoms
for geom_cfg in geom_configs:
plot = plot + _build_geom(geom_cfg)Use cases:
- Scatter + smooth trend lines
- Boxplot + jittered points
- Line + points
- Histogram + density curve
- Area + line border
- Any combination!
Backward compatibility:
- Old code using
geomstill works - New code can use
geomsarray - No breaking changes
Pros:
- Type safety at runtime
- Clear error messages
- Auto-generates JSON schemas
- IDE support
- Nested validation
Alternative considered: Manual dict validation Decision: Pydantic provides better UX and maintainability
Reason: MCP specification standard Benefit: Works with all MCP clients (Claude Desktop, Cursor, VSCode) Alternative: HTTP (not standard for MCP)
Rationale:
- data_loader: Independent data loading logic, reusable
- plot_builder: Pure plotnine logic, no MCP coupling
- schemas: Validation layer, clear contracts
- server: MCP-specific logic, thin orchestration layer
Benefit: Testable, maintainable, clear separation of concerns
Reasoning:
- Predictable location
- Keeps project organized
- Easy to .gitignore
- User can override
Alternative: System temp directory Decision: Local output for user control
Reasoning:
- Backward compatibility (v0.1.0 used
geom) - Intuitive for single-layer plots
- Explicit for multi-layer plots
Implementation:
# Simple internal conversion
if geom_config and not geom_configs:
geom_configs = [geom_config]Reasoning:
- Users don't always want to name files
- UUID prevents collisions
- Still allows custom names
Format: plot_{8-char-uuid}.{format}
Example: plot_a3f2b9c1.png
Reason: Removed in plotnine 0.13+
Alternative: Use coord_trans for transformations
Note: Documented in README to avoid confusion
Test Coverage:
-
test_inline_data_scatter_plot()
- Tests: Inline data source, scatter plot, basic theming
- Validates: Data loading, plot building, file saving
-
test_file_data_line_plot()
- Tests: CSV file loading, line plot, custom theme
- Validates: File reading, theme customization
-
test_bar_plot()
- Tests: Bar chart with fill aesthetic
- Validates: Column chart (geom_col)
-
test_multi_layer_plot()
- Tests: Multi-layer (point + smooth)
- Validates: geom_configs array, layering
-
test_boxplot_with_jitter()
- Tests: Boxplot + jitter overlay
- Validates: Multiple geom types, transparency
Test Approach:
- Integration tests (end-to-end)
- Direct module imports (not via MCP)
- File output verification
- Visual inspection supported
Coverage Gaps:
- Unit tests for individual functions
- Error case testing
- Scale/facet/coord testing
- URL data source testing
- All geometry types
python test_basic.pyExpected output:
============================================================
Running Plotnine MCP Basic Tests
============================================================
Test 1: Simple scatter plot with inline data...
✓ Data loaded: 5 rows
✓ Plot built successfully
✓ Plot saved to: /path/to/output/test_scatter.png
...
All tests passed! ✓
Sample file: examples/sample_data.csv
x,y,category,size
1,2.3,A,10
2,4.1,A,15
...- Unit tests for each module
- Mock MCP client tests
- Error handling tests
- Performance benchmarks
- Visual regression tests (compare images)
- CI/CD pipeline (GitHub Actions)
Limitation: Cannot create multiple independent plots in one MCP call
Workaround: Make multiple MCP calls
Future: Batch processing feature planned
Limitation: Static images only (PNG, PDF, SVG)
Reason: plotnine is not interactive
Future: Consider plotly backend option
Limitation: No built-in filtering, grouping, aggregation
Workaround: Pre-process data before plotting
Future: Data transformation module planned
Limitation: Entire dataset loaded into memory
Impact: Large files (>1GB) may fail
Future: Streaming/chunking support planned
Limitation: Cannot query databases directly
Workaround: Export to CSV/JSON first
Future: SQLAlchemy integration planned
Limitation: Not all plotnine theme elements exposed
Reason: Simplified API for AI interactions
Workaround: Limited set covers most use cases
Limitation: Cannot combine multiple plots into subplots
Reason: plotnine doesn't support subplots natively
Future: External composition tool or matplotlib backend
Limitation: Some plotnine errors are cryptic
Status: Partially mitigated with try-catch
Future: Smart error handler with suggestions
Limitation: Static plots only
Future: Frame-by-frame generation planned
Limitation: No auto-generated correlations, p-values, etc.
Workaround: Manual text annotations
Future: Statistical annotation module planned
Status: Partial coverage
- Pydantic models: Full
- Public functions: Full
- Private functions: Partial
Goal: 100% coverage for public API
Docstrings:
- Modules: Yes
- Public functions: Yes
- Private functions: Partial
Format: Google style
Formatter: Black (line length 100) Linter: Ruff Import sorting: Not enforced yet
Philosophy: Minimal, well-maintained Lock file: No (pip-tools recommended) Virtual env: Recommended
Test environment:
- MacBook M1
- Python 3.12
- Dataset: 10 rows
Results:
- Simple scatter: ~1-2 seconds
- Multi-layer: ~2-3 seconds
- Faceted plot: ~2-4 seconds
Bottlenecks:
- plotnine rendering (most time)
- File I/O (negligible)
- Data loading (depends on source)
- Caching: Reuse loaded data for multiple plots
- Lazy evaluation: Defer rendering until needed
- Parallel processing: For batch operations
- Data sampling: For large datasets
plotnine-mcp/
├── pyproject.toml # Package metadata & dependencies
├── README.md # User documentation
├── LICENSE # MIT License
├── IMPLEMENTATION_REFERENCE.md # This document
├── FUTURE_ENHANCEMENTS.md # Roadmap
├── src/
│ └── plotnine_mcp/
│ ├── __init__.py
│ ├── server.py
│ ├── data_loader.py
│ ├── plot_builder.py
│ └── schemas.py
├── examples/
│ ├── sample_data.csv
│ └── usage_examples.md
├── output/ # Generated plots (gitignored)
│ └── .gitkeep
└── test_basic.py # Test suite
Development install:
pip install -e .Regular install:
pip install .With optional dependencies:
pip install -e ".[full]" # Adds pyarrow, openpyxlDefined in pyproject.toml:
[project.scripts]
plotnine-mcp = "plotnine_mcp.server:main"Can be run as:
plotnine-mcp # If installed
python -m plotnine_mcp.server # Module executionClaude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"plotnine": {
"command": "python",
"args": ["-m", "plotnine_mcp.server"]
}
}
}Cursor (.cursor/mcp.json):
{
"mcpServers": {
"plotnine": {
"command": "python",
"args": ["-m", "plotnine_mcp.server"]
}
}
}Major Features:
- ✨ Multi-layer plot support
- 📝 Enhanced documentation
Changes:
- Added
geomsarray parameter - Backward compatible with
geomparameter - Two new test cases
- Updated README with multi-layer examples
Files changed:
server.py: Added geoms schema and handler logicplot_builder.py: Modified build_plot() for multi-layertest_basic.py: Added multi-layer testsREADME.md: Added examples and feature highlight
Initial Release:
- ✅ MCP server implementation
- ✅ 20+ geometry types
- ✅ Multiple data sources
- ✅ Full ggplot2 feature parity
- ✅ Documentation and examples
- ✅ Basic test suite
- Plan: Document in FUTURE_ENHANCEMENTS.md
- Design: Update schemas if needed
- Implement: Add to appropriate module
- Test: Add test case
- Document: Update README and this file
- Commit: Descriptive commit message
- Push: To GitHub
# Install in dev mode
pip install -e .
# Run tests
python test_basic.py
# Test with MCP client
# (Configure in Claude Desktop/Cursor)MCP server logs:
- stdout/stderr captured by MCP client
- Use
print()for debugging (visible in client logs) - Consider adding
--verboseflag in future
Common issues:
- Import errors: Check installation
- Schema validation errors: Check Pydantic models
- plotnine errors: Check geom/scale names
- Follow PEP 8
- Use Black formatter (line length 100)
- Add type hints
- Write docstrings (Google style)
- Fork repository
- Create feature branch
- Implement changes with tests
- Update documentation
- Submit PR with description
Include:
- plotnine-mcp version
- Python version
- MCP client (Claude/Cursor/etc.)
- Minimal reproduction case
- Expected vs actual behavior
README.md- User guideFUTURE_ENHANCEMENTS.md- Roadmapexamples/usage_examples.md- Example prompts
- ✨ Added multi-layer plot support (v0.2.0)
- 📝 Created IMPLEMENTATION_REFERENCE.md
- 📝 Created FUTURE_ENHANCEMENTS.md
- 🎉 Initial release (v0.1.0)
- ✅ Core functionality complete
- 📚 Documentation complete
Maintained by: Fervoyush Repository: https://github.com/Fervoyush/plotnine-mcp License: MIT