Ideas to Dig

Research topics for future guide improvements. Curated and validated.

Done

MCP Security Hardening ✅

Unified security research covering MCP vulnerabilities, prompt injection, and secret detection.

Completed: Security Hardening Guide covers:

CVE-2025-53109/53110, 54135, 54136 with mitigations
MCP vetting workflow with 5-minute audit checklist
MCP Safe List (community vetted)
Prompt injection evasion techniques (Unicode, ANSI, null bytes)
Secret detection tool comparison (Gitleaks, TruffleHog, GitGuardian)
Incident response procedures (secret exposed, MCP compromised)
3 new hooks: unicode-injection-scanner.sh, repo-integrity-scanner.sh, mcp-config-integrity.sh

High Priority

gstack-inspired commands (deferred from 2026-03-26 audit)

Audit of gstack (Garry Tan's Claude Code sprint toolkit) identified these gaps. HIGH-priority items were already implemented (/investigate, /qa, /canary, /land-and-deploy, /review-pr updates). Remaining items deferred here.

/autoplan — single command that chains CEO → Design → Eng review automatically with encoded decision principles. Classify decisions as Mechanical (auto-decide) vs Taste (surface to user). Prerequisite: our /plan-ceo-review and /plan-eng-review commands must be stable first. Source: gstack/autoplan/SKILL.md.

/office-hours — YC-style product diagnostic with 6 forcing questions before any planning. Upstream of /plan-ceo-review. Writes a design brief to ~/.claude/projects/. Source: gstack/office-hours/SKILL.md.

/design-review — live-site visual audit + fix loop. Screenshots a URL, audits spacing/typography/color/contrast, detects AI slop patterns (purple gradients, 3-column grids, centered everything). Requires browser tooling standardization first. Source: gstack/design-review/SKILL.md + design-checklist.md.

/benchmark — Core Web Vitals regression detection (TTFB, FCP, LCP, DOM load, bundle sizes). Baseline comparison with regression thresholds (>50% OR >500ms = regression). Source: gstack/benchmark/SKILL.md.

/retro — weekly retrospective with git log analysis per contributor. Current vs prior window. Source: gstack/retro/SKILL.md.

/freeze + /unfreeze — runtime directory edit lock via PreToolUse hook (hard deny, not warning). State file at ~/.claude/freeze-dir.txt. Composable with /careful into a /guard command. Source: gstack/freeze/SKILL.md.

/document-release — post-ship documentation updates. Reads diff since last release, updates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md, polishes CHANGELOG, cleans TODOS. Source: gstack/document-release/SKILL.md.

Effort compression tables — add to skill templates: human time vs AI time per task type. Architecture decision for skill format — discuss before implementing across all skills.

Trigger to implement: reader requests, or when we do a "workflow completeness" sprint.

Medium Priority

CI/CD Workflows Gallery ✅

Completed: GitHub Actions Workflows — 5 patterns using anthropics/claude-code-action (PR review, auto-review, issue triage, security, scheduled maintenance). Includes cost control, fork safety, Bedrock/Vertex auth alternatives. Cross-linked from section 9.3 of the main guide.

MCP Server Catalog

Exhaustive list of MCP servers with real-world use cases.

Topics:

Available servers by category (dev tools, databases, APIs)
Performance benchmarks vs native tools
Security trust levels per server
Custom server development patterns

Perplexity Query:

MCP Model Context Protocol servers catalog 2024-2025:
- Most useful servers for developers
- Performance comparison MCP vs native tools
- How to build custom MCP servers

Lower Priority

CLAUDE.md Patterns Library

Stack-specific templates for common project types.

Topics:

React/Next.js optimized configurations
Python/FastAPI patterns
Go project conventions
Monorepo configurations

Perplexity Query:

CLAUDE.md configuration examples by framework:
- React, Next.js, Vue patterns
- Python, FastAPI, Django patterns
- Best practices from GitHub repositories

Watching (Waiting for Demand)

a### prompt-caching MCP Plugin

MCP plugin that automates cache_control placement for developers building apps on the Anthropic SDK. Installed locally at /Users/florianbruniaux/Sites/prompt-caching and connected to Claude Code via ~/.claude.json.

Status: Testing in progress. Real usage data required before any documentation decision.

What we know:

29 stars, v1.3.0, solo maintainer — maintenance risk
Author-reported benchmarks (80-92% savings) — unverified, cannot cite
Fills a real gap: no other MCP tool does this; Spring AI / LiteLLM / Pydantic AI serve different audiences
Blog post (Mathieu Grenier) independently documents the same pain point + 5 antipatterns — score 3/5, worth integrating in "Strategy 6" regardless

Open questions:

Do real sessions on this project actually hit the cache? (run get_cache_stats after 10+ turns)
Is the plugin stable enough to recommend? Any errors, memory leaks, session issues?
What are the real savings on a CLAUDE.md-heavy project like this guide?

If test results are positive (cache hits confirmed, no stability issues):

Add to guide/ecosystem/third-party-tools.md with verified stats (not README claims)
Add to landing third-party tools section
Score upgrade: 3/5 → 4/5

If test results are inconclusive or plugin is unstable:

Move to Discarded Ideas
Keep the Mathieu Grenier blog post integration (independent value)

Check again: After 1 week of real usage

Multi-LLM Consultation Patterns

Using external LLMs (Gemini, GPT-4) as "second opinion" from Claude Code.

Status: No proven demand. Add if 3+ reader requests.

Research done (Jan 2026):

Simple approach: Bash script calling Gemini API
Production approach: Plano (overkill for solo devs)
Community adoption: Near zero in Claude Code users

If implementing:

examples/scripts/gemini-second-opinion.sh

Type-Driven API Design for AI Agent Efficiency

Schema-first development impact on Claude Code token consumption.

Status: Anecdotal only (no empirical data). Reevaluate if benchmarks emerge.

Resource evaluated (Feb 2026):

ShipTypes by Boris Tane (Cloudflare)
Score: 2/5 (Marginal) — Claims "types → fewer tokens" unverified
Full evaluation: docs/resource-evaluations/shiptypes-evaluation.md

What's missing:

Benchmark comparing token consumption: typed APIs (tRPC/Zod) vs untyped (REST/docs)
A/B test showing AI agent iterations with/sans types
Case study with reproducible metrics

Reevaluation triggers:

Academic paper/blog with empirical data (token consumption metrics)
Anthropic official recommendation on schema-first for Claude Code
5+ community discussions/issues requesting this topic

If validated (score upgrade to 4/5):

Add subsection in guide/core/methodologies.md (after CDD, line 172)
Use micro-integration template: docs/resource-evaluations/shiptypes-evaluation.md (section "Integration Plan")

Check again: August 2026

3-line mention in "See Also" section
No full guide (maintenance burden, scope creep)

Source: daily.dev article

Vibe Coding Discourse

Evolution of the "developer as architect" narrative in AI-assisted development.

Reference: Craig Adam - "Agile is Out, Architecture is Back"

Status: Watching. Term "vibe coding" now mainstream (Collins Word of the Year 2025).

Discarded Ideas

Idea	Reason Discarded
LLM Fine-tuning guide	Out of scope - users don't control model training
Model architecture internals	Too theoretical, not actionable
Token pricing optimization	Changes too frequently, use official docs
A2A Protocol (Agent-to-Agent)	Claude Code is single-agent with sub-agents, not true multi-agent
AgentOps Enterprise Dashboard	Infrastructure doesn't exist for CLI tool
LLM-as-a-Judge evaluation	Overkill for CLI, adds latency without proportional value
Decision Trajectory logging	No access to internal traces (black box)
4 Pillars formal framework	Too academic - guide already covers symptoms pragmatically
Canary/Blue-Green deployments	Infrastructure patterns, not relevant for CLI
Memory Poisoning defenses	Theoretical risk requiring prior system compromise
Prompt Engineering for Code Gen	Already well covered (xml_prompting, prompt_formula)
Context Window Optimization	Already well covered (context_management, context_triage)
Task Decomposition Patterns	Covered via plan_mode, interaction_loop
Agent Architecture Comparisons	Out of scope - not multi-agent theory
Real-World Case Studies	Non-verifiable metrics, marketing-prone
Comparison with Other Tools	Out of scope, rapid obsolescence

Contributing

Found something interesting? Add it with:

Topic name and why it matters
Specific research questions
Perplexity query to start

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ideas to Dig

Done

MCP Security Hardening ✅

High Priority

gstack-inspired commands (deferred from 2026-03-26 audit)

Medium Priority

CI/CD Workflows Gallery ✅

MCP Server Catalog

Lower Priority

CLAUDE.md Patterns Library

Watching (Waiting for Demand)

Multi-LLM Consultation Patterns

Type-Driven API Design for AI Agent Efficiency

Vibe Coding Discourse

Discarded Ideas

Contributing

FilesExpand file tree

IDEAS.md

Latest commit

History

IDEAS.md

File metadata and controls

Ideas to Dig

Done

MCP Security Hardening ✅

High Priority

gstack-inspired commands (deferred from 2026-03-26 audit)

Medium Priority

CI/CD Workflows Gallery ✅

MCP Server Catalog

Lower Priority

CLAUDE.md Patterns Library

Watching (Waiting for Demand)

Multi-LLM Consultation Patterns

Type-Driven API Design for AI Agent Efficiency

Vibe Coding Discourse

Discarded Ideas

Contributing