Skip to content

Add guardrails hard rules recipe#135

Open
webchick wants to merge 6 commits into
mainfrom
recipe/guardrails-hard-rules
Open

Add guardrails hard rules recipe#135
webchick wants to merge 6 commits into
mainfrom
recipe/guardrails-hard-rules

Conversation

@webchick

Copy link
Copy Markdown
Collaborator

Summary

  • Adds agents/guardrails_hard_rules_python/ — a new recipe demonstrating the post-LLM guardrail pattern
  • The LLM classifies user-submitted content as safe / review / block; deterministic hard rules can override to block regardless of what the LLM said
  • When a hard rule fires, the LLM's original reasoning is preserved in the returned verdict so every decision is auditable

What it teaches

The core insight: for any decision with real consequences, you shouldn't rely on the LLM alone. Hard rules catch unambiguous violations deterministically; the LLM handles the grey zone. This recipe shows how to layer the two cleanly — and how to preserve auditability when the deterministic layer wins.

Fills the Guardrails gap from the cookbook wishlist. Sourced from the classifiers/_helpers.py pattern in dependency-scout, simplified to a generic content moderation scenario.

Recipe structure

agents/guardrails_hard_rules_python/
├── models/signals.py          # ContentSignals input model
├── models/verdict.py          # LLMVerdict + Verdict (with overridden_by_hard_rule)
├── guardrails/hard_rules.py   # _hard_block() + apply_hard_rules() — pure functions
├── activities/classify.py     # Calls Claude via tool_choice, then apply_hard_rules
├── workflows/classify_workflow.py
├── worker.py
├── start_workflow.py          # Two examples: override fires, then LLM verdict stands
└── tests/test_guardrails.py   # 10 tests: unit, activity (mocked Anthropic), workflow

Test plan

  • uv run pytest tests/ --timeout=30 → 10/10 passing, no API key needed
  • ANTHROPIC_API_KEY=... uv run python -m worker + uv run python -m start_workflow — verify output matches README

🤖 Generated with Claude Code

webchick and others added 6 commits May 29, 2026 00:11
Adds a Claude Code slash command that analyzes an external GitHub repo,
identifies AI/Temporal patterns worth extracting as cookbook recipes,
and generates complete recipe scaffolds in the standard format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cookbook targets AI engineers who know LLMs but are new to Temporal.
Recipes should teach agent patterns (loops, tool use, structured output,
multi-agent, human-in-loop) with Temporal as the invisible durability
layer — not Temporal infrastructure patterns like workflow deduplication
or heartbeats.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds patterns from the AI Cookbook wishlist (RAG, streaming, memory,
guardrails, chain-of-thought, cost tracking, etc.) and a coverage-gap
check so recipe proposals are ranked higher when they fill a known hole.
Also adds a 'well-engineered, not a demo' quality filter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the thin name+description proposals with structured proposal
cards: problem statement, source code excerpt, recipe structure sketch,
diff from nearest existing recipe, wishlist gap, and size estimate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
recipe-scout: analyzes a GitHub repo and produces reviewer-ready proposal
cards — no files written, just structured recommendations.

recipe-ify: takes a pattern description (or proposal card) and generates
the complete recipe — all files, runnable, PR-ready. Can be used
standalone without recipe-scout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Demonstrates a post-LLM guardrail layer that uses deterministic hard
rules to override an LLM's content moderation verdict, ensuring
policy-critical decisions cannot be bypassed by hallucination or
prompt injection. The LLM's original reasoning is preserved inside
any override so decisions remain auditable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant