The Reflection System is GDAL MCP's core innovation: a middleware-based epistemic governance layer that enforces methodological justification before executing geospatial operations with significant consequences.
Key principle: AI agents should not blindly execute operations that require domain expertise. They must demonstrate understanding through structured reasoning.
User Request
↓
AI Agent (Claude)
↓
MCP Tool Call (e.g., raster_reproject)
↓
FastMCP Server
↓
ReflectionMiddleware.on_call_tool()
├─→ Check cache (.preflight/justifications/{domain}/)
├─→ If cached: proceed immediately
└─→ If missing: raise ToolError → trigger prompt
↓
AI calls prompt (justify_crs_selection)
↓
AI provides structured justification
↓
AI calls store_justification
↓
Justification cached with SHA256 hash
↓
AI retries original tool call
↓
Cache hit → execution proceeds
↓
Tool executes with validated methodology
src/
├── middleware/
│ ├── reflection_middleware.py # FastMCP middleware (tool interception)
│ ├── reflection_transform.py # Cache checking & hash computation
│ ├── reflection_config.py # Declarative tool→prompt mapping
│ └── reflection_store.py # Persistent storage (Pydantic models)
├── prompts/
│ ├── crs.py # CRS selection reasoning prompt
│ ├── resampling.py # Resampling method reasoning prompt
│ └── justification.py # Shared schema definitions
└── tools/
└── reflection/
└── store_justification.py # Explicit storage tool for AI
.preflight/
└── justifications/
├── crs_datum/
│ └── sha256:abc123...def.json
└── resampling/
└── sha256:456789...xyz.json
Triggered by: raster_reproject(dst_crs=...)
Questions enforced:
- What spatial property must be preserved? (distance, area, shape, angle)
- Why is this CRS appropriate for the analysis intent?
- What alternative projections were considered?
- What distortion tradeoffs are acceptable?
Example justification:
{
"intent": "Preserve area accuracy for land cover classification statistics",
"alternatives": [
{
"crs": "EPSG:4326",
"why_not": "Geographic CRS distorts area significantly at high latitudes"
},
{
"crs": "EPSG:3857",
"why_not": "Web Mercator has extreme area distortion away from equator"
}
],
"choice": {
"crs": "EPSG:6933",
"rationale": "Cylindrical Equal Area preserves area globally, critical for accurate hectare calculations in classification stats",
"tradeoffs": "Shape distortion acceptable since only measuring total area per class"
},
"confidence": "high"
}Cache key includes: dst_crs (source CRS not included to allow flexibility)
Triggered by: raster_reproject(resampling=...)
Questions enforced:
- What signal property must be preserved? (exact values, smooth gradients, class boundaries)
- Is the data categorical (discrete) or continuous (measurements)?
- What artifacts might this method introduce?
- Why not other resampling methods?
Example justification:
{
"intent": "Preserve exact class values for land cover classification",
"alternatives": [
{
"method": "bilinear",
"why_not": "Introduces new pixel values between classes, creating invalid categories"
},
{
"method": "cubic",
"why_not": "Even worse smoothing artifacts, can create negative values for positive-only data"
}
],
"choice": {
"method": "nearest",
"rationale": "Only method that preserves exact input values, essential for categorical data where each integer represents a distinct class",
"tradeoffs": "Creates blocky appearance at class boundaries, but classification accuracy more important than visual smoothness"
},
"confidence": "high"
}Cache key includes: method parameter only
Triggered by: raster_query(...) and vector_query(...)
Prompt: justify_query_extent(geometry, purpose)
Questions enforced:
- What spatial intent must this extent satisfy?
- Which broader/narrower alternatives were considered?
- What coverage vs precision tradeoff is acceptable?
Cache key includes: normalized geometry + purpose
Will trigger on:
- DEM preprocessing operations (fill sinks, breach depressions)
- Flow direction algorithm selection (D8, D-infinity, MFD)
- Flow accumulation thresholds for stream network delineation
Questions to enforce:
- What hydrologic assumption is being made? (D8 vs multi-flow)
- How should depressions be handled? (fill, breach, or preserve)
- What drainage area threshold defines a stream?
Will trigger on:
- Zonal statistics from different resolution sources
- Temporal compositing (max NDVI, median reflectance)
- Data fusion from multiple sensors
Questions to enforce:
- How should resolution differences be reconciled?
- What aggregation function preserves the signal?
- What happens to outliers and nodata?
Will trigger on:
- Format conversion with compression choices
- Tiling strategy decisions
- Overview level selection
Questions to enforce:
- What access pattern will be used? (sequential, random, cloud)
- Lossless or lossy compression acceptable?
- What zoom levels are needed?
Cache keys are computed using SHA256 of:
- Domain (e.g.,
crs_datum) - Prompt source hash (e.g.,
justify_crs_selection) - Normalized prompt arguments (sorted dict, stable JSON)
Example:
# Input parameters
prompt_args = {"dst_crs": "EPSG:3857"}
domain = "crs_datum"
prompt_hash = "<hash of prompt source>"
# Normalized for hashing
normalized = json.dumps(prompt_args, sort_keys=True)
# → '{"dst_crs": "EPSG:3857"}'
# Combined hash input
hash_input = json.dumps({
"domain": domain,
"prompt_hash": prompt_hash,
**prompt_args
}, sort_keys=True, separators=(",", ":"))
# Final cache key
cache_key = hashlib.sha256(hash_input.encode()).hexdigest()
# → "sha256:abc123def456..."Important: cache keys are domain-based, not tool-based. This enables cross-tool reuse for the same methodological reasoning (for example, CRS justifications across raster/vector reprojection).
Location: .preflight/justifications/{domain}/sha256:{hash}.json
Structure:
{
"tool_name": "raster_reproject",
"domain": "crs_datum",
"prompt_name": "justify_crs_selection",
"prompt_args": {
"dst_crs": "EPSG:3857"
},
"justification": {
"intent": "...",
"alternatives": [...],
"choice": {...},
"confidence": "high"
},
"timestamp": "2025-10-24T16:30:00Z",
"cache_key": "sha256:abc123def456..."
}Cache hit conditions:
- Same tool + same parameters + same domain → instant execution
- Example: Second
raster_reproject(..., dst_crs="EPSG:3857", resampling="cubic")with identical parameters
Cache miss conditions:
- Different parameter value → new justification required
- Example: Change
dst_crs="EPSG:4326"→ triggers new CRS prompt
Partial cache:
- Multiple reflection domains per tool
- Each domain cached independently
- Example:
raster_reprojectrequires both CRS and resampling justifications- If CRS cached but resampling new → only resampling prompt triggered
All reflection prompts follow a consistent structure:
@mcp.prompt(
name="justify_*",
description="Pre-execution micro-guidance for * reasoning.",
)
def justify_something(parameter: str) -> list[Message]:
content = f"""
Before executing operation with {parameter}:
**Reason through:**
• What property must be preserved?
• Why is this choice appropriate?
• What alternatives were considered?
• What tradeoffs are acceptable?
**Return strict JSON:**
{{
"intent": "...",
"alternatives": [
{{"method": "...", "why_not": "..."}}
],
"choice": {{
"method": "{parameter}",
"rationale": "...",
"tradeoffs": "..."
}},
"confidence": "low|medium|high"
}}
"""
return [UserMessage(content=content)]class Alternative(BaseModel):
"""Alternative method considered but rejected."""
method: str = Field(description="Alternative approach name")
why_not: str = Field(description="Reason for rejection")
class Choice(BaseModel):
"""Selected method with justification."""
method: str = Field(description="Chosen approach")
rationale: str = Field(description="Why this fits the intent")
tradeoffs: str = Field(description="Known limitations or distortions")
class Justification(BaseModel):
"""Complete epistemic justification."""
intent: str = Field(description="What property/goal to preserve")
alternatives: list[Alternative] = Field(description="Rejected options")
choice: Choice = Field(description="Selected method with reasoning")
confidence: Literal["low", "medium", "high"] = Field(
description="Confidence in this methodological choice"
)Step 1: Create the prompt
# src/prompts/my_domain.py
from fastmcp import FastMCP
from fastmcp.prompts import UserMessage
def register(mcp: FastMCP) -> None:
@mcp.prompt(
name="justify_my_operation",
description="Reasoning for my operation parameters.",
tags={"reasoning", "my_domain"},
)
def justify_my_operation(param: str) -> list[Message]:
content = f"""
Before executing with {param}:
**Reason through:**
• Why this parameter value?
• What alternatives exist?
• What are the tradeoffs?
**Return JSON:**
{{
"intent": "...",
"alternatives": [{{"method": "...", "why_not": "..."}}],
"choice": {{"method": "{param}", "rationale": "...", "tradeoffs": "..."}},
"confidence": "high"
}}
"""
return [UserMessage(content=content)]Step 2: Register prompt
# src/prompts/__init__.py
import src.prompts.my_domain
def register_prompts(mcp: FastMCP) -> None:
# ... existing prompts ...
src.prompts.my_domain.register(mcp)Step 3: Add reflection config
# src/middleware/reflection_config.py
TOOL_REFLECTIONS: dict[str, list[ReflectionSpec]] = {
# ... existing tools ...
"my_new_tool": [
ReflectionSpec(
domain="my_domain",
prompt_name="justify_my_operation",
args_fn=lambda kwargs: {"param": kwargs["param"]},
),
],
}Step 4: Tool executes automatically with reflection
No decorator needed! Middleware intercepts all tool calls and checks TOOL_REFLECTIONS config.
# Clear cache to force new justifications
rm -rf .preflight/justifications/my_domain/
# Trigger the tool via Claude
# Should see prompt, provide justification, then execute
# Verify cache created
ls .preflight/justifications/my_domain/
# Second invocation should cache hit (no prompt)# In tool or middleware
logger.setLevel(logging.DEBUG)Outputs:
DEBUG: Reflection middleware intercepting tool call: raster_reproject
DEBUG: Checking cache for domain: crs_datum (hash=sha256:abc123...)
INFO: Reflection cache miss for raster_reproject: crs_datum
DEBUG: Checking cache for domain: resampling (hash=sha256:def456...)
INFO: Reflection cache hit for raster_reproject: resampling
# List all cached justifications
find .preflight/justifications -type f -name "*.json"
# View specific justification
cat .preflight/justifications/crs_datum/sha256:abc123*.json | jq .
# Count cache entries by domain
find .preflight/justifications -type f | awk -F'/' '{print $(NF-1)}' | sort | uniq -c# Temporary: rename cache directory
mv .preflight/justifications .preflight/justifications.backup
# All operations will now trigger prompts
# Restore
mv .preflight/justifications.backup .preflight/justifications-
Provide specific reasoning — Don't just restate the parameter
- ❌ "Using EPSG:3857 because user requested it"
- ✅ "Using EPSG:3857 to preserve angular relationships for web tile rendering, accepting area distortion as visualization-only use case"
-
Consider real alternatives — Show domain understanding
- ❌ Single alternative that's obviously wrong
- ✅ Multiple plausible alternatives with specific technical reasons for rejection
-
Acknowledge tradeoffs — Demonstrate awareness of limitations
- ❌ "No tradeoffs, perfect choice"
- ✅ "Distance/area distortion increases with latitude, acceptable within ±45° bounds of this dataset"
-
Match confidence to certainty
high— Standard methodology, well-understood consequencesmedium— Reasonable choice but alternatives viablelow— Uncertain about best approach, would benefit from expert review
-
Keep prompts focused — One methodological choice per prompt
- Each reflection domain should address a single decision axis
- Avoid combining CRS + resampling in one prompt
-
Make cache keys precise — Include only decision-relevant parameters
- CRS justification doesn't need source CRS (only destination matters)
- Resampling justification doesn't need CRS at all
-
Design for composition — Reflection prompts should chain well
- Hydrology workflow: condition → flow direction → accumulation
- Each step can reference previous justifications
-
Test cache behavior — Verify hit/miss logic
- See
test/test_reflection_*.pyfor the existing coverage - Test same parameters, different parameters, partial changes
- See
First invocation (cache miss):
- Prompt generation: ~50ms
- LLM reasoning: ~10-30 seconds (depends on model)
- Storage: ~10ms
- Total: ~10-30 seconds
Subsequent invocations (cache hit):
- Hash computation: ~1ms
- Cache lookup: ~5ms
- Total: ~6ms (negligible)
Optimization: Cache hit rate > 80% in typical workflows
Cache size:
- ~1-2KB per justification (JSON text)
- 100 unique operations = ~200KB
- 1000 unique operations = ~2MB
Cleanup: Justifications persist across sessions. Manual cleanup:
# Remove old justifications (older than 30 days)
find .preflight/justifications -type f -mtime +30 -delete
# Clear all cache
rm -rf .preflight/justifications/*Justification cache is workspace-specific (.preflight/ in project root). Different projects maintain separate caches.
Cache files are:
- Content-addressed — SHA256 hash prevents tampering
- Structured — Pydantic validation on read
- Auditable — Plain JSON for inspection
Middleware prevents execution without justification. AI cannot bypass by:
- Calling tool directly (middleware intercepts)
- Providing empty justification (Pydantic validation)
- Reusing wrong justification (hash mismatch)
- Justification chaining (link related operations)
- Provenance export (markdown reports)
- Confidence thresholds (require review if < medium)
- Alternative workflow suggestions
- Quality assessment prompts
- Shared team justification libraries
- Uncertainty propagation through chains
- Multi-agent collaboration on justifications
- Automated workflow discovery
See also:
- docs/PHILOSOPHY.md — design rationale
- README.md — user-facing overview