Skip to content

Latest commit

 

History

History
568 lines (443 loc) · 16.3 KB

File metadata and controls

568 lines (443 loc) · 16.3 KB

Reflection System — Technical Documentation

Overview

The Reflection System is GDAL MCP's core innovation: a middleware-based epistemic governance layer that enforces methodological justification before executing geospatial operations with significant consequences.

Key principle: AI agents should not blindly execute operations that require domain expertise. They must demonstrate understanding through structured reasoning.

Architecture

Components

User Request
    ↓
AI Agent (Claude)
    ↓
MCP Tool Call (e.g., raster_reproject)
    ↓
FastMCP Server
    ↓
ReflectionMiddleware.on_call_tool()
    ├─→ Check cache (.preflight/justifications/{domain}/)
    ├─→ If cached: proceed immediately
    └─→ If missing: raise ToolError → trigger prompt
         ↓
    AI calls prompt (justify_crs_selection)
         ↓
    AI provides structured justification
         ↓
    AI calls store_justification
         ↓
    Justification cached with SHA256 hash
         ↓
    AI retries original tool call
         ↓
    Cache hit → execution proceeds
         ↓
    Tool executes with validated methodology

File Structure

src/
├── middleware/
│   ├── reflection_middleware.py      # FastMCP middleware (tool interception)
│   ├── reflection_transform.py       # Cache checking & hash computation
│   ├── reflection_config.py          # Declarative tool→prompt mapping
│   └── reflection_store.py           # Persistent storage (Pydantic models)
├── prompts/
│   ├── crs.py                        # CRS selection reasoning prompt
│   ├── resampling.py                 # Resampling method reasoning prompt
│   └── justification.py              # Shared schema definitions
└── tools/
    └── reflection/
        └── store_justification.py    # Explicit storage tool for AI

.preflight/
└── justifications/
    ├── crs_datum/
    │   └── sha256:abc123...def.json
    └── resampling/
        └── sha256:456789...xyz.json

Reflection Domains

Current (v1.2.0 phase scope)

crs_datum — Coordinate Reference System Selection

Triggered by: raster_reproject(dst_crs=...)

Questions enforced:

  • What spatial property must be preserved? (distance, area, shape, angle)
  • Why is this CRS appropriate for the analysis intent?
  • What alternative projections were considered?
  • What distortion tradeoffs are acceptable?

Example justification:

{
  "intent": "Preserve area accuracy for land cover classification statistics",
  "alternatives": [
    {
      "crs": "EPSG:4326",
      "why_not": "Geographic CRS distorts area significantly at high latitudes"
    },
    {
      "crs": "EPSG:3857",
      "why_not": "Web Mercator has extreme area distortion away from equator"
    }
  ],
  "choice": {
    "crs": "EPSG:6933",
    "rationale": "Cylindrical Equal Area preserves area globally, critical for accurate hectare calculations in classification stats",
    "tradeoffs": "Shape distortion acceptable since only measuring total area per class"
  },
  "confidence": "high"
}

Cache key includes: dst_crs (source CRS not included to allow flexibility)

resampling — Interpolation Method Selection

Triggered by: raster_reproject(resampling=...)

Questions enforced:

  • What signal property must be preserved? (exact values, smooth gradients, class boundaries)
  • Is the data categorical (discrete) or continuous (measurements)?
  • What artifacts might this method introduce?
  • Why not other resampling methods?

Example justification:

{
  "intent": "Preserve exact class values for land cover classification",
  "alternatives": [
    {
      "method": "bilinear",
      "why_not": "Introduces new pixel values between classes, creating invalid categories"
    },
    {
      "method": "cubic",
      "why_not": "Even worse smoothing artifacts, can create negative values for positive-only data"
    }
  ],
  "choice": {
    "method": "nearest",
    "rationale": "Only method that preserves exact input values, essential for categorical data where each integer represents a distinct class",
    "tradeoffs": "Creates blocky appearance at class boundaries, but classification accuracy more important than visual smoothness"
  },
  "confidence": "high"
}

Cache key includes: method parameter only

spatial_query — Spatial Extent Selection

Triggered by: raster_query(...) and vector_query(...)

Prompt: justify_query_extent(geometry, purpose)

Questions enforced:

  • What spatial intent must this extent satisfy?
  • Which broader/narrower alternatives were considered?
  • What coverage vs precision tradeoff is acceptable?

Cache key includes: normalized geometry + purpose

Planned (v1.1.0+)

hydrology — DEM Conditioning & Flow Analysis

Will trigger on:

  • DEM preprocessing operations (fill sinks, breach depressions)
  • Flow direction algorithm selection (D8, D-infinity, MFD)
  • Flow accumulation thresholds for stream network delineation

Questions to enforce:

  • What hydrologic assumption is being made? (D8 vs multi-flow)
  • How should depressions be handled? (fill, breach, or preserve)
  • What drainage area threshold defines a stream?

aggregation — Multi-Resolution & Temporal Composition

Will trigger on:

  • Zonal statistics from different resolution sources
  • Temporal compositing (max NDVI, median reflectance)
  • Data fusion from multiple sensors

Questions to enforce:

  • How should resolution differences be reconciled?
  • What aggregation function preserves the signal?
  • What happens to outliers and nodata?

format_selection — Output Format Optimization

Will trigger on:

  • Format conversion with compression choices
  • Tiling strategy decisions
  • Overview level selection

Questions to enforce:

  • What access pattern will be used? (sequential, random, cloud)
  • Lossless or lossy compression acceptable?
  • What zoom levels are needed?

Cache Mechanism

Hash Computation

Cache keys are computed using SHA256 of:

  1. Domain (e.g., crs_datum)
  2. Prompt source hash (e.g., justify_crs_selection)
  3. Normalized prompt arguments (sorted dict, stable JSON)

Example:

# Input parameters
prompt_args = {"dst_crs": "EPSG:3857"}
domain = "crs_datum"
prompt_hash = "<hash of prompt source>"

# Normalized for hashing
normalized = json.dumps(prompt_args, sort_keys=True)
# → '{"dst_crs": "EPSG:3857"}'

# Combined hash input
hash_input = json.dumps({
    "domain": domain,
    "prompt_hash": prompt_hash,
    **prompt_args
}, sort_keys=True, separators=(",", ":"))

# Final cache key
cache_key = hashlib.sha256(hash_input.encode()).hexdigest()
# → "sha256:abc123def456..."

Important: cache keys are domain-based, not tool-based. This enables cross-tool reuse for the same methodological reasoning (for example, CRS justifications across raster/vector reprojection).

Cache Storage

Location: .preflight/justifications/{domain}/sha256:{hash}.json

Structure:

{
  "tool_name": "raster_reproject",
  "domain": "crs_datum",
  "prompt_name": "justify_crs_selection",
  "prompt_args": {
    "dst_crs": "EPSG:3857"
  },
  "justification": {
    "intent": "...",
    "alternatives": [...],
    "choice": {...},
    "confidence": "high"
  },
  "timestamp": "2025-10-24T16:30:00Z",
  "cache_key": "sha256:abc123def456..."
}

Cache Behavior

Cache hit conditions:

  • Same tool + same parameters + same domain → instant execution
  • Example: Second raster_reproject(..., dst_crs="EPSG:3857", resampling="cubic") with identical parameters

Cache miss conditions:

  • Different parameter value → new justification required
  • Example: Change dst_crs="EPSG:4326" → triggers new CRS prompt

Partial cache:

  • Multiple reflection domains per tool
  • Each domain cached independently
  • Example: raster_reproject requires both CRS and resampling justifications
    • If CRS cached but resampling new → only resampling prompt triggered

Prompt Structure

All reflection prompts follow a consistent structure:

Input Template

@mcp.prompt(
    name="justify_*",
    description="Pre-execution micro-guidance for * reasoning.",
)
def justify_something(parameter: str) -> list[Message]:
    content = f"""
Before executing operation with {parameter}:

**Reason through:**
• What property must be preserved?
• Why is this choice appropriate?
• What alternatives were considered?
• What tradeoffs are acceptable?

**Return strict JSON:**
{{
  "intent": "...",
  "alternatives": [
    {{"method": "...", "why_not": "..."}}
  ],
  "choice": {{
    "method": "{parameter}",
    "rationale": "...",
    "tradeoffs": "..."
  }},
  "confidence": "low|medium|high"
}}
"""
    return [UserMessage(content=content)]

Output Schema (Pydantic)

class Alternative(BaseModel):
    """Alternative method considered but rejected."""
    method: str = Field(description="Alternative approach name")
    why_not: str = Field(description="Reason for rejection")

class Choice(BaseModel):
    """Selected method with justification."""
    method: str = Field(description="Chosen approach")
    rationale: str = Field(description="Why this fits the intent")
    tradeoffs: str = Field(description="Known limitations or distortions")

class Justification(BaseModel):
    """Complete epistemic justification."""
    intent: str = Field(description="What property/goal to preserve")
    alternatives: list[Alternative] = Field(description="Rejected options")
    choice: Choice = Field(description="Selected method with reasoning")
    confidence: Literal["low", "medium", "high"] = Field(
        description="Confidence in this methodological choice"
    )

Integration Guide

Adding Reflection to a New Tool

Step 1: Create the prompt

# src/prompts/my_domain.py
from fastmcp import FastMCP
from fastmcp.prompts import UserMessage

def register(mcp: FastMCP) -> None:
    @mcp.prompt(
        name="justify_my_operation",
        description="Reasoning for my operation parameters.",
        tags={"reasoning", "my_domain"},
    )
    def justify_my_operation(param: str) -> list[Message]:
        content = f"""
Before executing with {param}:

**Reason through:**
• Why this parameter value?
• What alternatives exist?
• What are the tradeoffs?

**Return JSON:**
{{
  "intent": "...",
  "alternatives": [{{"method": "...", "why_not": "..."}}],
  "choice": {{"method": "{param}", "rationale": "...", "tradeoffs": "..."}},
  "confidence": "high"
}}
"""
        return [UserMessage(content=content)]

Step 2: Register prompt

# src/prompts/__init__.py
import src.prompts.my_domain

def register_prompts(mcp: FastMCP) -> None:
    # ... existing prompts ...
    src.prompts.my_domain.register(mcp)

Step 3: Add reflection config

# src/middleware/reflection_config.py
TOOL_REFLECTIONS: dict[str, list[ReflectionSpec]] = {
    # ... existing tools ...
    "my_new_tool": [
        ReflectionSpec(
            domain="my_domain",
            prompt_name="justify_my_operation",
            args_fn=lambda kwargs: {"param": kwargs["param"]},
        ),
    ],
}

Step 4: Tool executes automatically with reflection

No decorator needed! Middleware intercepts all tool calls and checks TOOL_REFLECTIONS config.

Testing Reflection Integration

# Clear cache to force new justifications
rm -rf .preflight/justifications/my_domain/

# Trigger the tool via Claude
# Should see prompt, provide justification, then execute

# Verify cache created
ls .preflight/justifications/my_domain/

# Second invocation should cache hit (no prompt)

Debugging

Enable debug logging

# In tool or middleware
logger.setLevel(logging.DEBUG)

Outputs:

DEBUG: Reflection middleware intercepting tool call: raster_reproject
DEBUG: Checking cache for domain: crs_datum (hash=sha256:abc123...)
INFO: Reflection cache miss for raster_reproject: crs_datum
DEBUG: Checking cache for domain: resampling (hash=sha256:def456...)
INFO: Reflection cache hit for raster_reproject: resampling

Inspect cache

# List all cached justifications
find .preflight/justifications -type f -name "*.json"

# View specific justification
cat .preflight/justifications/crs_datum/sha256:abc123*.json | jq .

# Count cache entries by domain
find .preflight/justifications -type f | awk -F'/' '{print $(NF-1)}' | sort | uniq -c

Bypass cache (for testing)

# Temporary: rename cache directory
mv .preflight/justifications .preflight/justifications.backup

# All operations will now trigger prompts

# Restore
mv .preflight/justifications.backup .preflight/justifications

Best Practices

For AI Agents

  1. Provide specific reasoning — Don't just restate the parameter

    • ❌ "Using EPSG:3857 because user requested it"
    • ✅ "Using EPSG:3857 to preserve angular relationships for web tile rendering, accepting area distortion as visualization-only use case"
  2. Consider real alternatives — Show domain understanding

    • ❌ Single alternative that's obviously wrong
    • ✅ Multiple plausible alternatives with specific technical reasons for rejection
  3. Acknowledge tradeoffs — Demonstrate awareness of limitations

    • ❌ "No tradeoffs, perfect choice"
    • ✅ "Distance/area distortion increases with latitude, acceptable within ±45° bounds of this dataset"
  4. Match confidence to certainty

    • high — Standard methodology, well-understood consequences
    • medium — Reasonable choice but alternatives viable
    • low — Uncertain about best approach, would benefit from expert review

For Developers

  1. Keep prompts focused — One methodological choice per prompt

    • Each reflection domain should address a single decision axis
    • Avoid combining CRS + resampling in one prompt
  2. Make cache keys precise — Include only decision-relevant parameters

    • CRS justification doesn't need source CRS (only destination matters)
    • Resampling justification doesn't need CRS at all
  3. Design for composition — Reflection prompts should chain well

    • Hydrology workflow: condition → flow direction → accumulation
    • Each step can reference previous justifications
  4. Test cache behavior — Verify hit/miss logic

    • See test/test_reflection_*.py for the existing coverage
    • Test same parameters, different parameters, partial changes

Performance Considerations

Latency

First invocation (cache miss):

  • Prompt generation: ~50ms
  • LLM reasoning: ~10-30 seconds (depends on model)
  • Storage: ~10ms
  • Total: ~10-30 seconds

Subsequent invocations (cache hit):

  • Hash computation: ~1ms
  • Cache lookup: ~5ms
  • Total: ~6ms (negligible)

Optimization: Cache hit rate > 80% in typical workflows

Storage

Cache size:

  • ~1-2KB per justification (JSON text)
  • 100 unique operations = ~200KB
  • 1000 unique operations = ~2MB

Cleanup: Justifications persist across sessions. Manual cleanup:

# Remove old justifications (older than 30 days)
find .preflight/justifications -type f -mtime +30 -delete

# Clear all cache
rm -rf .preflight/justifications/*

Security Considerations

Workspace Isolation

Justification cache is workspace-specific (.preflight/ in project root). Different projects maintain separate caches.

Justification Integrity

Cache files are:

  • Content-addressed — SHA256 hash prevents tampering
  • Structured — Pydantic validation on read
  • Auditable — Plain JSON for inspection

No Automatic Execution

Middleware prevents execution without justification. AI cannot bypass by:

  • Calling tool directly (middleware intercepts)
  • Providing empty justification (Pydantic validation)
  • Reusing wrong justification (hash mismatch)

Future Enhancements

v1.1.0

  • Justification chaining (link related operations)
  • Provenance export (markdown reports)
  • Confidence thresholds (require review if < medium)

v1.2.0

  • Alternative workflow suggestions
  • Quality assessment prompts
  • Shared team justification libraries

v2.0.0

  • Uncertainty propagation through chains
  • Multi-agent collaboration on justifications
  • Automated workflow discovery

See also: