temporalio · webchick · May 29, 2026
diff --git a/.claude/commands/recipe-ify.md b/.claude/commands/recipe-ify.md
@@ -0,0 +1,132 @@
+# Recipe-ify
+
+Generate a complete, PR-ready recipe for the AI Cookbook from a pattern description.
+
+Usage: `/project:recipe-ify <pattern description or proposal card>`
+
+The input can be:
+- A proposal card from `/project:recipe-scout`
+- A freeform description ("generate a RAG pipeline recipe using OpenAI")
+- Anything in between
+
+---
+
+## What you do
+
+You are an expert at writing reference-quality AI Cookbook recipes. Generate ALL files for the recipe described in `$ARGUMENTS`, following the conventions below exactly. Produce complete, runnable files — not stubs or placeholders.
+
+**Audience reminder:** Recipes target AI Engineers who are comfortable with LLMs and agents but are new to Temporal. The AI pattern is the hero; Temporal is the invisible durability layer underneath. Don't over-explain Temporal mechanics — focus on making the AI concept clear.
+
+---
+
+## Cookbook conventions
+
+**Directory:** `{category}/{recipe-name}_python/`
+Categories: `foundations` (single LLM call or simple pattern), `agents` (agentic loops, tool use), `deep_research` (multi-agent), `mcp` (MCP servers)
+
+**Naming:**
+- Task queue: `{recipe-name}-task-queue`
+- Workflow class: `PascalCaseWorkflow`
+- Activity functions: `snake_case`
+- Request/response models: `ActivityNameRequest`, `ActivityNameResponse`
+
+**Always:**
+- LLM clients: `max_retries=0` — Temporal handles retries, not the client
+- Data converter: `pydantic_data_converter` everywhere — in `Client.connect()`, in `WorkflowEnvironment.start_time_skipping()`
+- Activity timeouts: always specify `start_to_close_timeout` (30s default; increase for research/LLM tasks)
+- Non-retryable errors: catch them and raise `ApplicationError(..., non_retryable=True)`
+- Python: `>=3.10,<3.14`
+- Temporalio: `>=1.15.0,<2`
+
+---
+
+## Files to generate
+
+### `README.md`
+
+Must open with this exact front matter block:
+```
+<!--
+description: One-sentence description of what the recipe demonstrates.
+tags: [category, python, provider]
+priority: 500
+-->
+```
+Then: title, 1–2 paragraph overview of what the recipe teaches, prerequisites, how to run:
+```
+uv sync
+uv run python -m worker        # terminal 1
+uv run python -m start_workflow  # terminal 2
+```
+End with what to expect in the output.
+
+### `pyproject.toml`
+```toml
+[project]
+name = "cookbook-{recipe-name}-python"
+version = "0.1"
+description = "..."
+authors = [{ name = "Temporal Technologies Inc", email = "sdk@temporal.io" }]
+requires-python = ">=3.10,<3.14"
+readme = "README.md"
+license = "MIT"
+dependencies = [
+    "temporalio>=1.15.0,<2",
+    # LLM provider SDK
+]
+
+[dependency-groups]
+dev = [
+    "pytest>=9.0.3",
+    "pytest-timeout>=2.4.0",
+    "pytest-asyncio>=0.26.0",
+]
+
+[tool.pytest.ini_options]
+pythonpath = ["."]
+```
+
+### `worker.py`
+```python
+import asyncio
+from temporalio.client import Client
+from temporalio.worker import Worker
+from temporalio.contrib.pydantic import pydantic_data_converter
+
+async def main():
+    client = await Client.connect("localhost:7233", data_converter=pydantic_data_converter)
+    worker = Worker(client, task_queue="...-task-queue", workflows=[...], activities=[...])
+    await worker.run()
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+### `start_workflow.py`
+Connect with `pydantic_data_converter`, call `execute_workflow`, print result.
+
+### `workflows/{name}.py`
+- `@workflow.defn` class with `@workflow.run` method
+- Calls activities via `workflow.execute_activity(fn, request, start_to_close_timeout=...)`
+- Pure orchestration — no LLM calls, no I/O
+
+### `activities/{name}.py`
+- `@activity.defn` functions
+- LLM client initialized with `max_retries=0`
+- Request model defined at top of file (dataclass or Pydantic `BaseModel`)
+- Catch non-retryable API errors → `ApplicationError(..., non_retryable=True)`
+
+### `tests/test_{name}.py`
+- `@pytest.mark.asyncio` and `@pytest.mark.timeout(30)` on every test
+- Use `WorkflowEnvironment.start_time_skipping(data_converter=pydantic_data_converter)`
+- Register mock activities in the Worker to avoid real API calls
+- Cover at minimum the happy path and the key edge case the recipe is about
+
+---
+
+## After generating files
+
+1. Report what was created and the directory path
+2. Show how to run: `cd {dir} && uv sync && uv run pytest tests/`
+3. List any env vars needed (API keys, etc.) and where to set them
+4. Note any deliberate simplifications made to keep the recipe bite-sized
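The naming conventions listed in `recipe-ify.md` can be applied mechanically. The following is a minimal sketch of that, not part of the PR: `recipe_names` and `RecipeNames` are hypothetical names, and deriving the workflow class from the recipe name is an assumption (the conventions only say the class is PascalCase and ends in `Workflow`).

```python
import re
from dataclasses import dataclass

# Categories as listed under "Cookbook conventions".
CATEGORIES = {"foundations", "agents", "deep_research", "mcp"}


@dataclass(frozen=True)
class RecipeNames:
    directory: str
    task_queue: str
    workflow_class: str


def recipe_names(category: str, recipe_name: str) -> RecipeNames:
    """Apply the naming templates literally (hypothetical helper)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    # Split on hyphens or underscores so either spelling yields PascalCase.
    words = [w for w in re.split(r"[-_]+", recipe_name) if w]
    if not words:
        raise ValueError("recipe_name must contain at least one word")
    return RecipeNames(
        directory=f"{category}/{recipe_name}_python/",
        task_queue=f"{recipe_name}-task-queue",
        # Assumption: the workflow class is named after the recipe.
        workflow_class="".join(w.capitalize() for w in words) + "Workflow",
    )


names = recipe_names("agents", "guardrails_hard_rules")
print(names.directory)       # agents/guardrails_hard_rules_python/
print(names.task_queue)      # guardrails_hard_rules-task-queue
print(names.workflow_class)  # GuardrailsHardRulesWorkflow
```

Applying the templates literally means an underscore-style recipe name produces a mixed-separator task queue name; whether that is acceptable is a question for the cookbook maintainers.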
diff --git a/.claude/commands/recipe-scout.md b/.claude/commands/recipe-scout.md
@@ -0,0 +1,90 @@
+# Recipe Scout
+
+Analyze an external project and identify which parts would make good AI Cookbook recipes.
+
+Usage: `/project:recipe-scout <github-url>`
+
+## What you do
+
+You are an expert at spotting teachable, self-contained AI patterns in real-world projects. Your job is to produce proposal cards that a reviewer — who may never have seen the source project — can use to decide what's worth building into a recipe.
+
+**Audience reminder:** The AI Cookbook targets AI Engineers who are comfortable with LLMs and agents but are new to Temporal. Recipes should teach *AI building blocks* — patterns for how agents think, decide, call tools, and coordinate — with Temporal providing durability underneath. Do NOT propose patterns that are primarily about Temporal orchestration, distributed systems, or infrastructure; those belong in Temporal's own documentation, not here.
+
+---
+
+### Step 1 — Fetch and analyze the project
+
+Fetch the repository at `$ARGUMENTS`. Collect:
+- The README (for intent and architecture overview)
+- The full file tree (via GitHub API: `https://api.github.com/repos/{owner}/{repo}/git/trees/main?recursive=1`; substitute the repository's default branch if it is not `main`)
+- Key source files: LLM integration code, agent/tool patterns, prompt construction, workflow definitions
+
+Look specifically for these **AI building block** patterns, which make strong recipes:
+- **Agentic loop** — LLM called in a loop until a stop condition (tool use, stop sequence, empty tool calls)
+- **Forced completion** — On the final loop iteration, `tool_choice` is constrained to a specific tool so the agent must commit to a decision rather than looping forever
+- **Tool calling** — LLM invokes structured tools; results fed back into the conversation
+- **Parallel tool calls** — LLM requests multiple tools simultaneously; all results must be collected before the next turn
+- **Multi-agent coordination / agent supervisor** — One agent spawns or delegates to sub-agents; results are aggregated
+- **Structured output** — LLM output is parsed and validated against a Pydantic schema
+- **Human-in-the-loop** — Workflow pauses and waits for a human decision before continuing
+- **Streaming output** — Activity emits incremental tokens/chunks rather than waiting for full completion
+- **RAG (retrieval-augmented generation)** — Retrieved context injected into the prompt before calling the LLM
+- **Short-term memory** — Conversation history carried across turns within a single workflow run
+- **Long-term memory** — Facts or summaries persisted across workflow runs and retrieved on demand
+- **Context summarization** — Long conversation history compressed (e.g., via `continue_as_new`) to stay within context limits
+- **Guardrails** — LLM output checked against a policy before being acted on; rejected outputs are blocked or re-requested
+- **Chain-of-thought / tree-of-thought** — LLM explicitly reasons through steps before producing a final answer
+- **Prompt injection prevention** — Untrusted external data is isolated from control instructions (e.g., XML tags, separate message turns)
+- **Dynamic system prompts** — System instructions constructed at runtime from context (user prefs, retrieved docs, current state)
+- **Cost/token tracking** — Token usage recorded per workflow run for budgeting or rate-limiting
+- **Multi-provider LLM abstraction** — Single interface that dispatches to Anthropic, OpenAI, LiteLLM, or local models
+
+Ignore patterns that are primarily about Temporal internals (workflow ID policies, heartbeats, signal/query handlers, replay determinism) unless they are a natural, invisible part of an AI pattern above.
+
+---
+
+### Step 2 — Produce proposal cards
+
+The cookbook has a wishlist of use cases not yet covered. Patterns that fill one of these gaps should be ranked higher:
+- RAG pipeline
+- Streaming output
+- Short-term or long-term memory
+- Context summarization (ContinueAsNew)
+- Agent supervisor / multi-agent swarm
+- Guardrails
+- Chain-of-thought / tree-of-thought
+- Cost/token tracking
+- Trigger-based AI (event-driven or timer-based)
+- Web crawler
+
+For each candidate pattern you find, evaluate:
+1. **Is it an AI building block?** Would an AI engineer recognize this as a useful pattern for their LLM/agent work, independent of what orchestrator they use?
+2. **Is it well-engineered, not a demo?** The cookbook publishes reference-quality code, not flashy one-offs.
+3. **Is it self-contained?** Can it stand alone as a 200–400 line recipe without pulling in the entire project?
+4. **Is it teachable?** Does it demonstrate a single clear concept a developer can learn from?
+5. **Is it novel vs. existing recipes?** Check existing recipes in this repo (foundations/, agents/, deep_research/, mcp/).
+6. **Does it fill a wishlist gap?** Cross-reference against the coverage wishlist above.
+
+Rank the top 2–4 patterns. For each, write a proposal card with the following sections — written so a reviewer who has never seen the source project can evaluate it:
+
+**Proposed recipe:** `{category}/{recipe-name}_python`
+
+**One-line description:** _(the README front matter `description` field)_
+
+**The problem it solves:** In 2–3 sentences: what goes wrong if a developer doesn't know this pattern? What mistake do they typically make, and what does that cost them?
+
+**The pattern in the source:** A short code excerpt (10–25 lines) from the source project that shows the pattern at its clearest. If the source isn't Python or doesn't translate directly, show equivalent pseudocode. This is the "exhibit A" that justifies the recipe.
+
+**How the recipe would be structured:** A brief outline — what the workflow does, what the key activity does, what tool or API is involved. Not full code, 5–10 bullet points.
+
+**Closest existing recipe and what's different:** Name the most similar recipe already in the cookbook and state specifically what this adds or changes. If there's no close match, say so.
+
+**Wishlist gap filled:** Which item from the coverage wishlist does this address, if any? If none, say so.
+
+**Estimated size:** Rough line count for the finished recipe (all files combined). Flag anything over 400 lines as potentially too complex for a single recipe.
+
+---
+
+After the proposal cards, add an **Excluded patterns** section listing any patterns that were interesting but filtered out, with a one-line reason for each.
+
+To generate a recipe from one of these proposals, use `/project:recipe-ify` and paste in the proposal card.
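Step 1 of `recipe-scout.md` turns a GitHub URL into a Git Trees API URL. A minimal sketch of that mapping follows, assuming a plain `https://github.com/{owner}/{repo}` input. `tree_api_url` is a hypothetical helper, and `example-org/example-repo` is a placeholder rather than a real repository.

```python
from urllib.parse import urlparse


def tree_api_url(github_url: str, branch: str = "main") -> str:
    """Build the recursive tree URL named in Step 1 (hypothetical helper)."""
    parsed = urlparse(github_url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a GitHub URL: {github_url!r}")
    # The first two path segments are the owner and the repository.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected /owner/repo in: {github_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/git/trees/{branch}?recursive=1"
    )


print(tree_api_url("https://github.com/example-org/example-repo"))
```

The `branch` parameter defaults to `main` because that is what the command file hardcodes; a repository whose default branch has another name would need that name passed in.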
diff --git a/agents/guardrails_hard_rules_python/README.md b/agents/guardrails_hard_rules_python/README.md
@@ -0,0 +1,52 @@
+<!--
+description: Demonstrates a post-LLM guardrail layer that uses deterministic hard rules to override an LLM's verdict, ensuring policy-critical decisions can never be bypassed by hallucination or prompt injection.
+tags: [agents, python, anthropic]
+priority: 500
+-->
+
+# Guardrails: Hard Rules
+
+This recipe shows how to combine an LLM classifier with a deterministic guardrail layer. The LLM provides nuanced judgment for ambiguous cases; hard rules act as a safety net for unambiguous policy violations, overriding the LLM's verdict regardless of what it concluded.
+
+The pattern answers a real problem: LLMs can be manipulated via prompt injection or simply hallucinate. For any decision with real consequences — content moderation, access control, transaction approval — you shouldn't rely on the LLM alone. Hard rules catch clear-cut cases deterministically; the LLM handles everything in the grey zone. Critically, when a hard rule fires, the LLM's original reasoning is preserved inside the override so every decision remains auditable.
+
+The recipe uses a content moderation scenario: user-submitted text is classified as `safe`, `review`, or `block`. Hard rules override to `block` when contact information or banned keywords are detected, regardless of what the LLM concluded.
+
+## Prerequisites
+
+- Python 3.10 or newer (the cookbook convention pins `>=3.10,<3.14`)
+- [uv](https://docs.astral.sh/uv/)
+- A running Temporal server: `temporal server start-dev`
+- `ANTHROPIC_API_KEY` environment variable set
+
+## Run it
+
+```bash
+uv sync
+
+# Terminal 1 — start the worker
+uv run python -m worker
+
+# Terminal 2 — submit two example workflows
+uv run python -m start_workflow
+```
+
+## Expected output
+
+```
+--- Example 1: Hard rule override ---
+Input: 'Great product! Contact me at john.doe@example.com for a special deal.'
+Classification: block
+Overridden by hard rule: True
+Reasoning: Hard rule: contains email address (privacy policy violation).
+
+[LLM classified as 'safe' — reasoning: The message is promotional but does not appear harmful.]
+
+--- Example 2: LLM verdict stands ---
+Input: 'I really enjoyed the hiking trail last weekend. The views were amazing!'
+Classification: safe
+Overridden by hard rule: False
+Reasoning: Positive personal experience with no policy concerns.
+```
+
+In Example 1, the LLM's classification and reasoning are preserved inside brackets — the override is fully auditable.
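This README opens with the front matter comment that `recipe-ify.md` requires. A minimal sketch of a checker for that block is shown here, assuming the three fields always appear in the order `description`, `tags`, `priority`. `parse_front_matter` is a hypothetical helper, and the sample description is shortened for the example.

```python
import re

# Matches the HTML-comment front matter block at the very start of a README.
FRONT_MATTER = re.compile(
    r"\A<!--\s*\n"
    r"description:\s*(?P<description>.+)\n"
    r"tags:\s*\[(?P<tags>[^\]]*)\]\n"
    r"priority:\s*(?P<priority>\d+)\n"
    r"-->"
)


def parse_front_matter(readme_text: str) -> dict:
    """Return the front matter fields, or raise if the block is missing."""
    match = FRONT_MATTER.match(readme_text)
    if match is None:
        raise ValueError("README does not open with the expected front matter")
    tags = [t.strip() for t in match["tags"].split(",") if t.strip()]
    return {
        "description": match["description"].strip(),
        "tags": tags,
        "priority": int(match["priority"]),
    }


SAMPLE = """<!--
description: Demonstrates a post-LLM guardrail layer.
tags: [agents, python, anthropic]
priority: 500
-->

# Guardrails: Hard Rules
"""

print(parse_front_matter(SAMPLE))
```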
diff --git a/agents/guardrails_hard_rules_python/activities/classify.py b/agents/guardrails_hard_rules_python/activities/classify.py
@@ -0,0 +1,52 @@
+import anthropic
+from temporalio import activity
+from temporalio.exceptions import ApplicationError
+from pydantic import BaseModel
+
+from models.signals import ContentSignals
+from models.verdict import LLMVerdict, Verdict
+from guardrails.hard_rules import apply_hard_rules
+
+_SYSTEM = """You are a content moderation assistant. Classify the submitted text as:
+- safe: acceptable content with no policy concerns
+- review: borderline content that a human should check
+- block: clear policy violation (hate speech, harassment, explicit content, obvious spam)
+
+When uncertain, use 'review' — it's better to flag for human review than to miss a violation."""
+
+_SUBMIT_VERDICT_TOOL = {
+    "name": "submit_verdict",
+    "description": "Submit your content moderation classification.",
+    "input_schema": LLMVerdict.model_json_schema(),
+}
+
+
+class ClassifyRequest(BaseModel):
+    signals: ContentSignals
+    model: str = "claude-sonnet-4-6"
+
+
+@activity.defn
+async def classify(request: ClassifyRequest) -> Verdict:
+    client = anthropic.AsyncAnthropic(max_retries=0)  # Temporal handles retries, not the SDK client
+
+    try:
+        response = await client.messages.create(
+            model=request.model,
+            max_tokens=512,
+            system=_SYSTEM,
+            messages=[
+                {"role": "user", "content": f"Classify this content:\n\n{request.signals.text}"}
+            ],
+            tools=[_SUBMIT_VERDICT_TOOL],
+            tool_choice={"type": "tool", "name": "submit_verdict"},
+        )
+    except anthropic.AuthenticationError as exc:
+        raise ApplicationError(str(exc), type="AuthenticationError", non_retryable=True) from exc
+    except anthropic.BadRequestError as exc:
+        raise ApplicationError(str(exc), type="BadRequestError", non_retryable=True) from exc
+
+    tool_block = next(b for b in response.content if b.type == "tool_use")
+    llm_verdict = Verdict.model_validate(tool_block.input)
+
+    return apply_hard_rules(request.signals, llm_verdict)
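The activity selects the tool call with a bare `next(...)`, which gives an unhelpful iterator error if the response contains no `tool_use` block. A minimal sketch of a more defensive lookup follows. `extract_tool_input` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the SDK's content blocks, assuming only the `type`, `name`, and `input` attributes.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def extract_tool_input(content: Iterable[Any], tool_name: str) -> dict:
    """Return the input of the first matching tool_use block, or raise."""
    for block in content:
        is_tool_use = getattr(block, "type", None) == "tool_use"
        if is_tool_use and getattr(block, "name", None) == tool_name:
            return block.input
    raise ValueError(f"response contained no '{tool_name}' tool_use block")


# Stand-in content blocks (not real SDK objects).
blocks = [
    SimpleNamespace(type="text", text="Classifying the content."),
    SimpleNamespace(
        type="tool_use",
        name="submit_verdict",
        input={"classification": "safe", "confidence": 0.9, "reasoning": "ok"},
    ),
]

print(extract_tool_input(blocks, "submit_verdict")["classification"])  # safe
```

Inside the activity, the `ValueError` could be converted to an `ApplicationError` in the same way as the API errors above; whether it should be retryable is a design choice the recipe does not state.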
diff --git a/agents/guardrails_hard_rules_python/guardrails/hard_rules.py b/agents/guardrails_hard_rules_python/guardrails/hard_rules.py
@@ -0,0 +1,62 @@
+import re
+from models.signals import ContentSignals
+from models.verdict import Verdict
+
+_BANNED_KEYWORDS = ["buy now", "click here", "free money", "guaranteed winner"]
+
+
+def _hard_block(signals: ContentSignals) -> Verdict | None:
+    """Return a block Verdict if any hard rule matches, otherwise None."""
+    text_lower = signals.text.lower()
+
+    for keyword in _BANNED_KEYWORDS:
+        if keyword in text_lower:
+            return Verdict(
+                classification="block",
+                confidence=1.0,
+                reasoning=f"Hard rule: contains banned keyword '{keyword}'.",
+                overridden_by_hard_rule=True,
+            )
+
+    if re.search(r"\b\d{3}[-.()]?\d{3}[-.]?\d{4}\b", signals.text):
+        return Verdict(
+            classification="block",
+            confidence=1.0,
+            reasoning="Hard rule: contains phone number (privacy policy violation).",
+            overridden_by_hard_rule=True,
+        )
+
+    if re.search(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", signals.text):
+        return Verdict(
+            classification="block",
+            confidence=1.0,
+            reasoning="Hard rule: contains email address (privacy policy violation).",
+            overridden_by_hard_rule=True,
+        )
+
+    return None
+
+
+def apply_hard_rules(signals: ContentSignals, llm_verdict: Verdict) -> Verdict:
+    """Post-filter: override the LLM verdict if a hard rule matches.
+
+    When a rule fires, the LLM's original reasoning is embedded in the
+    returned verdict so the override is auditable.
+    """
+    if llm_verdict.classification == "block":  # already the strictest verdict; nothing to override
+        return llm_verdict
+
+    hard = _hard_block(signals)
+    if hard is None:
+        return llm_verdict
+
+    return Verdict(
+        classification=hard.classification,
+        confidence=hard.confidence,
+        overridden_by_hard_rule=True,
+        reasoning=(
+            f"{hard.reasoning}\n\n"
+            f"[LLM classified as '{llm_verdict.classification}' — "
+            f"reasoning: {llm_verdict.reasoning}]"
+        ),
+    )
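To see what the hard rules in `hard_rules.py` catch, the patterns can be probed in isolation. The sketch below copies the two regular expressions and the keyword list from the module. `matched_rules` and the sample strings are hypothetical and exist only for this illustration.

```python
import re

# Patterns and keywords copied from guardrails/hard_rules.py.
PHONE = re.compile(r"\b\d{3}[-.()]?\d{3}[-.]?\d{4}\b")
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
BANNED_KEYWORDS = ["buy now", "click here", "free money", "guaranteed winner"]


def matched_rules(text: str) -> list:
    """List every rule that matches, in the order the module checks them."""
    hits = []
    if any(keyword in text.lower() for keyword in BANNED_KEYWORDS):
        hits.append("keyword")
    if PHONE.search(text):
        hits.append("phone")
    if EMAIL.search(text):
        hits.append("email")
    return hits


samples = [
    "Great product! Contact me at john.doe@example.com for a special deal.",
    "I really enjoyed the hiking trail last weekend. The views were amazing!",
    "Call 555-123-4567 today",
    "Call (555) 123-4567 today",
    "Order 1234567890 has shipped",
    "BUY NOW while stocks last",
]

for sample in samples:
    print(matched_rules(sample), sample)
```

Two behaviours stand out: a number written as `(555) 123-4567` is not matched, because the pattern allows no space between groups, while a bare ten-digit order number is treated as a phone number. Both are reasonable simplifications for a bite-sized recipe, but a production rule set would likely need a stricter pattern.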