feat(claude_agent_sdk): add vulnerability detection agent cookbook by PedramNavid · Pull Request #595 · anthropics/claude-cookbooks

PedramNavid · 2026-05-04T20:49:38Z

Ports the vulnerability detection agent cookbook from the private repo (anthropics/claude-cookbooks-private#59, authored by @eugeneyan-ant) to the public repo.

What's new

claude_agent_sdk/06_The_vulnerability_detection_agent.ipynb walks through threat-model → find → triage → report on a self-contained 45-line canary C target using claude-agent-sdk:

ClaudeSDKClient for the multi-turn bootstrap-then-interview threat model
Stateless query() for find/triage/report
Claude Code's built-in Read/Grep/Glob in place of a hand-rolled tool loop

Renumbered from 04_ (private) to 06_ (public) — 04_ and 05_ are already taken in public by the OpenAI migration and session browser cookbooks.

Supporting changes

vulnerability_detection_agent/canary/canary.c: three unlabeled memory-safety bugs (heap OOB, stack OOB, UAF)
registry.yaml: new entry under Claude Agent SDK + Cybersecurity; threat_intel_enrichment_agent also tagged Cybersecurity
authors.yaml: add eugeneyan-ant
pyproject.toml: +claude-agent-sdk so root uv sync covers the series
.gitignore: generated THREAT_MODEL.md
.claude/commands/add-registry.md: add Cybersecurity to category list

github-actions · 2026-05-04T20:50:06Z

Notebook Changes

This PR modifies the following notebooks:

📓 `claude_agent_sdk/06_The_vulnerability_detection_agent.ipynb`

nbdiff /dev/null claude_agent_sdk/06_The_vulnerability_detection_agent.ipynb (ecd439de08c6962b4371460d93033caf5fdd496d)
--- /dev/null  2026-05-04 20:48:17.183633
+++ claude_agent_sdk/06_The_vulnerability_detection_agent.ipynb (ecd439de08c6962b4371460d93033caf5fdd496d)  (no timestamp)
## added /cells:
+  markdown cell:
+    source:
+      # The Vulnerability Detection Agent
+      
+      Security teams want to find memory-safety bugs before attackers do, but the existing tooling makes it hard: static analyzers produce so many false positives that reviewers stop reading, and fuzzers need a hand-written harness per entry point before they find anything. This cookbook shows how to use the **Claude Agent SDK** to build a vulnerability-discovery agent that reads source code with Claude Code's built-in `Read`, `Grep`, and `Glob` tools, reasons about which inputs could corrupt memory, and writes findings a reviewer can act on.
+      
+      **By the end of this cookbook, you'll be able to:**
+      
+      - Run a bootstrap-then-interview threat model as a multi-turn `ClaudeSDKClient` session that writes `THREAT_MODEL.md`
+      - Drive an agentic find loop with built-in `Read`/`Grep`/`Glob` tools instead of hand-rolled file access
+      - Chain find, triage, and report as separate `query()` calls that emit schema-conformant JSON
+      
+      ## Prerequisites
+      
+      **Required knowledge:**
+      - Python fundamentals, including `async`/`await`
+      - Enough C to read a 45-line file and recognize a `memcpy`
+      
+      **Required tools:**
+      - Python 3.11+
+      - Node.js 18+ and the Claude Code CLI: `npm install -g @anthropic-ai/claude-code`
+      - An Anthropic API key ([get one here](https://console.anthropic.com))
+      
+      **Required for any real target:** authorization to assess the code you point this at. This notebook ships a tiny self-contained `canary.c` with planted bugs so you can run everything end-to-end without touching production code.
+  markdown cell:
+    source:
+      ## Step 1: Set up the environment and engagement context
+      
+      We define an `ENGAGEMENT_CONTEXT` block that we pass as `system_prompt` on every agent. It records the scope of this assessment (authorized by the code owner, isolated read-only sandbox, findings headed for responsible disclosure) so every step in the pipeline operates against the same documented ground rules. Keep those three claims true for any real target.
+      
+      > **A note on cyber safeguards:** Claude applies [real-time cyber safeguards](https://support.claude.com/en/articles/14604842-real-time-cyber-safeguards-on-claude) at the API layer. If your work on a real codebase triggers these safeguards, apply through that page to the Cyber Verification Program (CVP), a free, application-based program that lets professionals continue legitimate dual-use security work with minimal interruption.
+  code cell:
+    source:
+      %%capture
+      %pip install -U claude-agent-sdk python-dotenv
+  code cell:
+    source:
+      import json
+      from collections.abc import AsyncIterator
+      from pathlib import Path
+      
+      from dotenv import load_dotenv
+      
+      from claude_agent_sdk import (
+          AssistantMessage,
+          ClaudeAgentOptions,
+          ClaudeSDKClient,
+          Message,
+          ResultMessage,
+          TextBlock,
+          ToolUseBlock,
+          query,
+      )
+      
+      load_dotenv()
+      
+      MODEL_NAME = "claude-opus-4-7"
+      # This notebook expects to be run from the claude_agent_sdk/ directory
+      # (Jupyter's default when you open the file from there). The assert makes
+      # the failure explicit if the kernel was started elsewhere.
+      TARGET_DIR = Path("vulnerability_detection_agent/canary").resolve()
+      assert TARGET_DIR.is_dir(), f"run this notebook from claude_agent_sdk/ (got cwd={Path.cwd()})"
+      
+      ENGAGEMENT_CONTEXT = """\
+      ## Engagement context
+      
+      This is authorized security research conducted as a defensive security
+      assessment on a self-contained canary target vendored in this notebook. The
+      target is read-only source (no execution). Findings are collected for
+      demonstration and responsible-disclosure workflow testing.
+      """
+      
+      
+      async def collect(stream: AsyncIterator[Message]) -> str:
+          """Consume an Agent SDK message stream; print tool calls; return final text.
+      
+          Both ``query()`` and ``ClaudeSDKClient.receive_response()`` return an
+          ``AsyncIterator[Message]`` that terminates after a ``ResultMessage``.
+          This is the same ``async for msg in ...`` loop the other notebooks in this
+          series write inline; it is factored out here because this notebook runs
+          the loop four times (TM bootstrap, TM interview, find, triage) and the
+          ``isinstance`` ladder would otherwise repeat verbatim.
+          """
+          final = ""
+          async for msg in stream:
+              if isinstance(msg, AssistantMessage):
+                  for block in msg.content:
+                      if isinstance(block, ToolUseBlock):
+                          args = str(block.input)
+                          args = args if len(args) <= 120 else args[:120] + "...}"
+                          print(f"  [tool] {block.name} {args}")
+                      elif isinstance(block, TextBlock) and block.text.strip():
+                          final += block.text
+              elif isinstance(msg, ResultMessage) and msg.is_error:
+                  raise RuntimeError(msg.result)
+          return final
+      
+      
+      print(f"Model: {MODEL_NAME}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Model: claude-opus-4-7
+  markdown cell:
+    source:
+      ## Step 2: Load the canary target
+      
+      `vulnerability_detection_agent/canary/canary.c` is a ~45-line C program with three deliberately planted memory-safety bugs (a heap buffer overflow, a stack buffer overflow, and a use-after-free), each reachable through a different "magic byte" at the start of the input. The bugs aren't labeled; the find agent in Step 4 has to locate them by reading the logic the same way it would in real code. When you're ready to try your own code, point `TARGET_DIR` at your checkout.
+  code cell:
+    source:
+      print((TARGET_DIR / "canary.c").read_text())
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          // canary.c
+          // Entry: ./canary <input_file>
+          #include <stdio.h>
+          #include <stdlib.h>
+          #include <string.h>
+          
+          static void parse_alpha(const unsigned char *data, size_t len) {
+              unsigned char *buf = malloc(32);
+              memcpy(buf, data, len);
+              printf("alpha: %02x\n", buf[0]);
+              free(buf);
+          }
+          
+          static void parse_bravo(const unsigned char *data, size_t len) {
+              char name[16];
+              memcpy(name, data, len);
+              name[15] = 0;
+              printf("bravo: %s\n", name);
+          }
+          
+          static void parse_charlie(const unsigned char *data, size_t len) {
+              char *p = malloc(64);
+              if (len > 0 && data[0] == 0xff) {
+                  free(p);
+              }
+              memcpy(p, data, len < 64 ? len : 64);
+              printf("charlie: %p\n", (void *)p);
+          }
+          
+          int main(int argc, char **argv) {
+              if (argc < 2) return 1;
+              FILE *f = fopen(argv[1], "rb");
+              if (!f) return 1;
+              unsigned char buf[4096];
+              size_t n = fread(buf, 1, sizeof buf, f);
+              fclose(f);
+              if (n < 1) return 1;
+              switch (buf[0]) {
+                  case 'A': parse_alpha(buf + 1, n - 1); break;
+                  case 'B': parse_bravo(buf + 1, n - 1); break;
+                  case 'C': parse_charlie(buf + 1, n - 1); break;
+                  default: printf("unknown format\n");
+              }
+              return 0;
+          }
+          
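The dispatch in the printed source can be modeled in a few lines of Python to see which inputs reach which bug. This is a rough sketch and not part of the notebook: `classify_input` and its return labels are hypothetical names, the sizes are read off the listing above, and the model only compares copy length against buffer size and checks the `0xff` free condition. It ignores other issues such as the unchecked `malloc` return.

```python
# Hypothetical model of canary.c's dispatch; sizes come from the printed source.
READ_LIMIT = 4096   # main() reads at most sizeof buf bytes
ALPHA_BUF = 32      # malloc(32) in parse_alpha
BRAVO_BUF = 16      # char name[16] in parse_bravo


def classify_input(data: bytes) -> str:
    """Return the memory-safety outcome this model predicts for `data`."""
    data = data[:READ_LIMIT]
    if len(data) < 1:
        return "rejected"                  # n < 1 -> return 1
    tag, payload = data[:1], data[1:]      # parsers receive buf + 1, n - 1
    if tag == b"A":
        return "heap-buffer-overflow" if len(payload) > ALPHA_BUF else "ok"
    if tag == b"B":
        return "stack-buffer-overflow" if len(payload) > BRAVO_BUF else "ok"
    if tag == b"C":
        # free(p) runs when the first payload byte is 0xff; the memcpy that
        # follows then writes into the freed chunk.
        return "use-after-free" if payload[:1] == b"\xff" else "ok"
    return "unknown format"


for candidate in (b"A" + b"x" * 33, b"B" + b"x" * 17, b"C\xff", b"Z"):
    print(candidate[:4], "->", classify_input(candidate))
```

Here "ok" only means the model sees no memory corruption on that path; it is not a claim that the C code is otherwise correct.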
+  markdown cell:
+    source:
+      ## Step 3: Threat-model the target (bootstrap, then interview)
+      
+      A threat model answers "what could go wrong with this system, who would do it, and which outcomes matter?" independently of any specific bug. A threat ("attacker achieves memory corruption via untrusted file parsing") survives a patch; a vulnerability ("line 31 doesn't bounds-check `len`") does not. The find loop hunts vulnerabilities; the threat model tells it where to hunt and tells triage how to score.
+      
+      We build it in two turns of one `ClaudeSDKClient` session:
+      
+      1. **Bootstrap.** Claude reads the code with the built-in `Read` tool and drafts the model (context, assets, entry points & trust boundaries, threats, and **open questions** the code can't answer).
+      2. **Interview.** The application owner answers the open questions, and Claude refines likelihood and impact, then writes `THREAT_MODEL.md` next to the target.
+      
+      Keeping both turns in one client session means the interview turn can see the bootstrap's tool results without us re-sending the source. On a 45-line canary both turns are thin; the point here is the **output shape**: the entry-points table (what you'd partition across parallel find-agents on a real repo) and the open-questions list (the bootstrap-to-interview handoff).
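The entry-points table is plain Markdown, so turning it into per-agent work items needs only string handling. A minimal sketch, assuming the model keeps the `## 3. Entry points & trust boundaries` heading and the column names from the notebook's schema, and that no cell contains a literal `|`. `parse_entry_points` is a hypothetical helper, not part of the notebook or the SDK, and the sample text is hand-written for illustration.

```python
def parse_entry_points(threat_model: str) -> list[dict[str, str]]:
    """Extract rows of the section-3 table as dicts keyed by column name."""
    rows: list[list[str]] = []
    in_section = False
    for line in threat_model.splitlines():
        if line.startswith("## "):
            # Only collect rows while inside the entry-points section.
            in_section = line.startswith("## 3.")
            continue
        if in_section and line.strip().startswith("|"):
            cells = [c.strip() for c in line.strip().strip("|").split("|")]
            # Skip the | --- | --- | separator row.
            if all(set(c) <= set("-: ") for c in cells):
                continue
            rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


sample = """\
## 3. Entry points & trust boundaries
| entry_point | description | trust_boundary | reachable_assets |
| --- | --- | --- | --- |
| `argv[1]` path | caller-supplied path | local user → process | any file readable by the user |
| file contents | `fread` into `buf` | file author → parser | process memory |
## 4. Threats
| id | threat |
"""

for entry in parse_entry_points(sample):
    print(entry["entry_point"], "->", entry["trust_boundary"])
```

On a real repository, each returned dict could seed one find-agent prompt, which is one way to read the "partition across parallel find-agents" remark above.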
+  code cell:
+    source:
+      (TARGET_DIR / "THREAT_MODEL.md").unlink(missing_ok=True)
+      
+      TM_SCHEMA = """\
+      # Threat Model: <system name>
+      ## 1. System context
+      ## 2. Assets
+      | asset | description | sensitivity |
+      ## 3. Entry points & trust boundaries
+      | entry_point | description | trust_boundary | reachable_assets |
+      ## 4. Threats
+      | id | threat | surface | asset | impact | likelihood |
+      ## 5. Open questions
+      - (things the code alone cannot answer: deployment context, which inputs are
+        attacker-controlled in practice, blast radius)
+      """
+      
+      BOOTSTRAP_PROMPT = f"""\
+      You are bootstrapping a threat model from source code alone; no application
+      owner is available yet. Read `canary.c` in this directory and emit a draft
+      threat model in the schema below. Be explicit in section 5 about what you could
+      NOT determine from the code: those open questions are the agenda for the owner
+      interview. Do not write any files yet.
+      
+      ## Schema
+      
+      {TM_SCHEMA}
+      """
+      
+      OWNER_ANSWERS = """\
+      - Deployment: `canary` is a local CLI that reads a file path from argv; it is
+        not network-facing.
+      - Attacker control: the input file is fully attacker-controlled (think email
+        attachment or downloaded file opened by the user).
+      - Blast radius: the process runs as the invoking user with no sandboxing;
+        memory corruption is code execution as that user.
+      """
+      
+      INTERVIEW_PROMPT = f"""\
+      The application owner has now answered your open questions:
+      
+      {OWNER_ANSWERS}
+      
+      Refine the threat model: update likelihood and impact in section 4 using the
+      owner's answers, resolve every item in section 5 that the answers cover, and
+      add any new threats the deployment context implies. Keep the same schema, then
+      write the refined model to `THREAT_MODEL.md` in this directory.
+      """
+      
+      tm_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          cwd=str(TARGET_DIR),
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=["Read", "Write", "Edit"],
+          disallowed_tools=["Bash"],
+          permission_mode="acceptEdits",
+      )
+      
+      async with ClaudeSDKClient(options=tm_options) as tm_agent:
+          # collect() fully drains receive_response() through its terminating
+          # ResultMessage, so the second query() sees a clean stream.
+          await tm_agent.query(BOOTSTRAP_PROMPT)
+          draft_tm = await collect(tm_agent.receive_response())
+          print("--- bootstrap draft ---\n" + draft_tm + "\n")
+      
+          await tm_agent.query(INTERVIEW_PROMPT)
+          await collect(tm_agent.receive_response())
+      
+      tm_path = TARGET_DIR / "THREAT_MODEL.md"
+      if not tm_path.exists():
+          raise RuntimeError("interview agent did not write THREAT_MODEL.md; check the trace above")
+      threat_model = tm_path.read_text()
+      print("--- refined THREAT_MODEL.md ---\n" + threat_model)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+            [tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}
+          --- bootstrap draft ---
+          `★ Insight ─────────────────────────────────────`
+          - The file dispatches on `buf[0]` into three parsers, each with a distinct memory-safety bug class: heap overflow, stack overflow, and use-after-free. This is a canonical "one bug per parser" canary.
+          - The `len` passed to each parser is `n - 1` (up to 4095), but buffers are sized 32/16/64, so every path is reachable with attacker-controlled overflow length from a single input file.
+          - Threat modeling from source alone can enumerate the *bug classes* and *entry points*, but it cannot tell you who supplies `argv[1]` in production — that determines whether these are local footguns or remote RCE primitives.
+          `─────────────────────────────────────────────────`
+          
+          # Threat Model: canary (file-format parser CLI)
+          
+          ## 1. System context
+          A small C command-line utility invoked as `./canary <input_file>`. It opens the supplied path, reads up to 4096 bytes, and dispatches on the first byte (`A`/`B`/`C`) to one of three parsers. No network, IPC, or privilege-management code is present in the source. Deployment context (who runs it, who supplies the file, whether it is wrapped by a service) is unknown from the code alone.
+          
+          ## 2. Assets
+          | asset | description | sensitivity |
+          | --- | --- | --- |
+          | process memory / control flow | heap and stack of the `canary` process, including return addresses and heap metadata | high — corruption yields arbitrary code execution in the process's security context |
+          | input file contents | bytes read from `argv[1]`; format selector plus parser payload | low–medium (content itself); high as an attack vector |
+          | host execution context | whatever uid/role/container the binary runs as; file-system reach of that context | unknown — depends on deployment |
+          
+          ## 3. Entry points & trust boundaries
+          | entry_point | description | trust_boundary | reachable_assets |
+          | --- | --- | --- | --- |
+          | `argv[1]` path | caller-supplied filesystem path opened with `fopen(..., "rb")` | caller → process (path-traversal / symlink exposure depends on caller privilege) | any file readable by the process |
+          | file contents (first 4096 bytes) | `fread` into `buf`, dispatched by `buf[0]` | file producer → parser | process memory via all three parsers |
+          | `parse_alpha` payload (`A` prefix) | bytes 1..n copied via `memcpy` into a 32-byte heap buffer | untrusted file → heap | heap adjacent to 32-byte allocation |
+          | `parse_bravo` payload (`B` prefix) | bytes 1..n copied into a 16-byte stack buffer | untrusted file → stack | return address, saved frame pointer, stack canary (if enabled) |
+          | `parse_charlie` payload (`C` prefix) | bytes 1..n copied into 64-byte heap buffer; first byte may trigger early `free` | untrusted file → heap | freed chunk metadata, tcache/fastbin state |
+          
+          ## 4. Threats
+          | id | threat | surface | asset | impact | likelihood |
+          | --- | --- | --- | --- | --- | --- |
+          | T1 | Heap buffer overflow: `parse_alpha` copies up to `n-1` (≤4095) bytes into a 32-byte `malloc` without bounds check (canary.c:8-9) | `A`-prefixed input | process memory | heap corruption → potential RCE | high given a malicious file |
+          | T2 | Stack buffer overflow: `parse_bravo` copies up to `n-1` bytes into a 16-byte stack array (canary.c:15-16); the trailing `name[15]=0` does not prevent the overflow, only truncates the printed string | `B`-prefixed input | saved return address / stack | classic stack smash → RCE if no/bypassed stack protector | high |
+          | T3 | Use-after-free / double-free vector: `parse_charlie` frees `p` when `data[0]==0xff` then immediately `memcpy`s into the freed chunk (canary.c:23-26); the subsequent `printf` also leaks the freed pointer | `C`-prefixed input with first payload byte `0xff` | heap allocator metadata | UAF write → heap grooming, info leak of freed address | medium–high |
+          | T4 | Unchecked `malloc` return in `parse_alpha` and `parse_charlie` (canary.c:8, 22) | any `A` or `C` input under memory pressure | process stability | NULL-deref DoS | low in normal ops |
+          | T5 | Pointer disclosure via `printf("%p", p)` in `parse_charlie` (canary.c:27) | any `C` input | ASLR secret | leaks heap address, aids exploitation of T1/T3 | high when combined with T1/T3 |
+          | T6 | Path handling of `argv[1]`: no canonicalization or allow-list; process will open any readable path including symlinks and device nodes | caller-controlled argv | host files | info disclosure or hang (e.g., `/dev/zero`) depending on who runs it | unknown — depends on invoker privilege |
+          | T7 | Silent truncation: only the first 4096 bytes are read; parsers operate on truncated data, which can mask malformed-file detection upstream | any input | integrity of downstream decisions | logic bug, not memory-safety | low |
+          
+          ## 5. Open questions
+          - **Who supplies `argv[1]`?** Local user only, a setuid wrapper, a web upload handler, an MTA, a sandbox runner? This determines whether T1–T3 are local-only footguns or remote code-execution primitives.
+          - **What uid / capabilities / container does the binary run as?** Blast radius of successful RCE hinges on this (root vs. nobody vs. seccomp-confined).
+          - **Compiler and link flags in production builds:** is `-fstack-protector-strong`, `-D_FORTIFY_SOURCE=2`, PIE, RELRO, ASLR, or CFI enabled? These materially change exploitability of T2 and the usefulness of T5's leak.
+          - **Allocator in use** (glibc ptmalloc, musl, jemalloc, hardened_malloc). T3's UAF exploit primitives differ by allocator.
+          - **Is `canary` invoked directly or via a wrapper that validates the file first?** A front-end parser / size cap / magic-byte allow-list upstream could neutralize T1–T3.
+          - **Are crashes monitored?** Unchecked `malloc` (T4) and T6 DoS matter only if uptime is a requirement; source alone does not say.
+          - **Intended threat model for the format itself:** are `A`/`B`/`C` formats specified anywhere (docs, RFC, sibling files) so we can tell "malformed" from "adversarial"?
+          - **Expected lifetime and distribution** of the binary: is this a demo/test fixture (the name "canary" suggests so) or shipped to customers? Affects remediation priority and disclosure path.
+          
+            [tool] Write {'file_path': 'vulnerability_detection_agent/canary/THRE...}
+          --- refined THREAT_MODEL.md ---
+          # Threat Model: canary (local file-format parser CLI)
+          
+          ## 1. System context
+          `canary` is a local command-line utility invoked as `./canary <input_file>`. It
+          opens the supplied path, reads up to 4096 bytes, and dispatches on the first
+          byte (`A`/`B`/`C`) into one of three parsers. It is **not network-facing**. The
+          input file is **fully attacker-controlled** in the intended threat model —
+          typical delivery is a file received by email or downloaded from the web and
+          then opened by the local user. The process runs as the **invoking user with no
+          sandboxing**, so any memory-safety bug that yields control flow yields code
+          execution in that user's security context (home directory, SSH keys, browser
+          profile, cloud credentials, any writable path the user has).
+          
+          ## 2. Assets
+          | asset | description | sensitivity |
+          | --- | --- | --- |
+          | invoking user's account | uid, home directory, shell history, SSH keys, browser profile, cloud tokens, dotfiles, writable mounts | high — compromise equals full user takeover and a foothold for lateral movement |
+          | process memory / control flow | heap and stack of the `canary` process, return addresses, heap metadata | high — corruption is the exploitation primitive for the user-account asset |
+          | input file contents | bytes read from `argv[1]`; format selector plus parser payload | low as data; the primary *attack vector* |
+          | host integrity | persistence locations writable as the user (crontab, `~/.bashrc`, `~/.config/systemd/user`, login items) | high — trivially reachable post-exploitation; no sandbox to contain it |
+          
+          ## 3. Entry points & trust boundaries
+          | entry_point | description | trust_boundary | reachable_assets |
+          | --- | --- | --- | --- |
+          | `argv[1]` path | caller-supplied filesystem path opened with `fopen(..., "rb")` | local user → process (same uid; the boundary is between the untrusted *file content* and the parser, not between the caller and process) | any file readable by the user |
+          | file contents (first 4096 bytes) | `fread` into `buf`; dispatch on `buf[0]` | untrusted attacker (file author) → parser | process memory via all three parsers |
+          | `parse_alpha` payload (`A` prefix) | bytes 1..n copied via `memcpy` into a 32-byte heap buffer | untrusted file → heap | heap adjacent to 32-byte allocation |
+          | `parse_bravo` payload (`B` prefix) | bytes 1..n copied into a 16-byte stack buffer | untrusted file → stack | return address, saved frame pointer, stack canary (if enabled) |
+          | `parse_charlie` payload (`C` prefix) | bytes 1..n copied into a 64-byte heap buffer; first byte may trigger early `free` | untrusted file → heap | freed-chunk metadata, tcache/fastbin state |
+          
+          ## 4. Threats
+          | id | threat | surface | asset | impact | likelihood |
+          | --- | --- | --- | --- | --- | --- |
+          | T1 | Heap buffer overflow: `parse_alpha` copies up to `n-1` (≤4095) bytes into a 32-byte `malloc` without bounds check (canary.c:8-9) | `A`-prefixed attacker file | process memory → user account | critical — RCE as the invoking user; full account takeover, no sandbox containment | high — trivially reachable with a single crafted file |
+          | T2 | Stack buffer overflow: `parse_bravo` copies up to `n-1` bytes into a 16-byte stack array (canary.c:15-16); trailing `name[15]=0` only truncates the print, does not prevent the overflow | `B`-prefixed attacker file | saved return address → user account | critical — classic stack smash to RCE as the user; exploitability depends on stack-protector / PIE / ASLR in the shipped build | high |
+          | T3 | Use-after-free / double-free: `parse_charlie` frees `p` when `data[0]==0xff` then immediately `memcpy`s into the freed chunk (canary.c:23-26) | `C`-prefixed attacker file with first payload byte `0xff` | heap allocator metadata → user account | critical — heap-grooming primitive for RCE as the user | medium–high — reliability varies by allocator but the primitive is clean |
+          | T4 | Unchecked `malloc` return in `parse_alpha` and `parse_charlie` (canary.c:8, 22) | any `A` or `C` input under memory pressure | process stability | low — NULL-deref crash of a user-invoked CLI; annoyance, not compromise | low |
+          | T5 | Pointer disclosure via `printf("%p", p)` in `parse_charlie` (canary.c:27) | any `C` input | ASLR secret | high — directly hands the attacker a heap address, making T1/T3 reliable even with ASLR | high — unconditional on the `C` path |
+          | T6 | Path handling of `argv[1]`: no canonicalization or allow-list; `canary` opens any path the user can read, including symlinks and device nodes (e.g., `/dev/zero` hang) | caller-supplied argv | process stability / caller expectations | low — the caller is already the user, so there is no privilege boundary to cross via path tricks | low |
+          | T7 | Silent truncation: only the first 4096 bytes are read; malformed-file detection upstream can be bypassed by padding the exploit into the first 4 KB | any input | integrity of any upstream "scan then open" pipeline | low on its own; relevant if a scanner is put in front | low |
+          | T8 | **(new)** Social-engineering delivery: the attack surface is "user opens a file." Typical vectors are email attachments, messenger drops, and browser downloads. Any of T1/T2/T3 becomes a one-click RCE given a convincing lure | attacker-authored file delivered to the user | user account | critical — same as T1–T3; this threat names the delivery mechanism | high — the dominant real-world path to triggering T1–T3 |
+          | T9 | **(new)** File-association / handler registration: if `canary` is (or becomes) the registered handler for a file extension or MIME type, double-clicking a download auto-invokes it, removing the need to coach the user into a shell command | OS file-association layer | user account | critical — converts T8 from "run this binary on this file" to "open the attachment" | unknown without deployment config; flagged for owner |
+          | T10 | **(new)** Post-exploitation blast radius: no sandbox, so RCE in `canary` immediately has the user's full ambient authority — SSH keys, browser cookies/session tokens, cloud CLI credentials (`~/.aws`, `~/.config/gcloud`), persistence via `~/.bashrc`, user-level cron, user systemd units, login items | any of T1/T2/T3 succeeding | user account, connected systems | critical — lateral movement into email, cloud, source control is trivial from here | high conditional on T1/T2/T3 |
+          | T11 | **(new)** Corpus / fuzzing exposure: with three obvious memory bugs and a one-byte dispatch, the first hour of `afl-fuzz` or `libFuzzer` against `canary` will produce crashing inputs. If the binary is distributed, researchers and attackers will find these quickly | any attacker with the binary | disclosure timeline | high — forces a short remediation window | high |
+          
+          ## 5. Open questions
+          All prior open questions are resolved by the owner's answers, except the
+          following residual items, which are narrower and still code- or
+          build-dependent:
+          
+          - **Shipped compiler / linker hardening:** is the distributed build compiled
+            with `-fstack-protector-strong`, `-D_FORTIFY_SOURCE=2`, `-fPIE`/`-pie`, full
+            RELRO, and with ASLR enabled on the target OS? Changes exploit reliability of
+            T2 and the value of T5's leak, but not the severity class.
+          - **Allocator in the shipped build** (glibc ptmalloc vs. musl vs. hardened
+            allocator): determines which T3 exploitation primitives are practical.
+          - **File-association registration (T9):** does any installer or desktop entry
+            register `canary` as a handler for an extension or MIME type? If yes, T8
+            collapses into a pure double-click RCE.
+          - **Signed / notarized distribution:** is the binary shipped in a way that OS
+            gatekeepers (macOS Gatekeeper, Windows SmartScreen, Linux desktop "executable
+            bit" prompts) would warn the user before first run? Affects the friction on
+            T8 but not the final impact.
+          - **Telemetry / crash reporting:** are crashes from T1–T4 reported anywhere the
+            defender would see them, or does a failed exploit attempt go unnoticed?
+          
+  markdown cell:
+    source:
+      ## Step 4: Run the agentic find loop
+      
+      With the raw Messages API this step would be a hand-written `while stop_reason == "tool_use"` loop with custom file tools. The Agent SDK handles all of that: we call `query()` once with `allowed_tools=["Read", "Grep", "Glob"]` and `disallowed_tools=["Bash"]`, and Claude Code runs the explore-read-reason loop on its own. `cwd=str(TARGET_DIR)` points the agent at the canary, and `system_prompt={"type": "preset", "preset": "claude_code", "append": ...}` keeps Claude Code's default system prompt (which already tells the agent its working directory) while appending our engagement context, so the agent never has to guess its own location. With `Bash`/`Write`/`Edit` withheld the agent stays read-only.
+      
+      **The most important part of the prompt is still the quality-tier rubric.** Without it, LLM vuln hunters report every null-pointer dereference and failed assertion they can find, which are real crashes but almost never exploitable. The rubric tells the agent which crash classes to submit (heap/stack overflow, use-after-free, controlled-address write) and which are signposts to keep reading past. This one block is most of the difference between a report a security engineer acts on and one they ignore.
+      
+      A production version would add `"Bash"` to `allowed_tools` so the agent can compile with `-fsanitize=address` and confirm each crash; that belongs inside a locked-down container, so this notebook stays read-only.
+  code cell:
+    source:
+      FIND_PROMPT = f"""\
+      Find memory-safety bugs in the target source tree using the file tools
+      available to you.
+      
+      ## Threat model
+      
+      Focus on the entry points and threats identified here; you do not need to
+      re-derive them.
+      
+      {threat_model}
+      
+      ## Quality tiers: what to report
+      
+      **HIGH VALUE (report these):**
+      - heap-buffer-overflow (especially WRITE)
+      - heap-use-after-free / double-free
+      - stack-buffer-overflow
+      - global-buffer-overflow
+      
+      **LOW VALUE (note but keep looking):**
+      - assertion failures (clean abort, no corruption)
+      - stack exhaustion from recursion (DoS only)
+      - null-pointer deref at fixed small offsets
+      
+      ## Output
+      
+      For each HIGH VALUE finding emit a block:
+      
+      <finding>
+      <id>F-NN</id>
+      <file>path:line</file>
+      <category>heap-buffer-overflow | stack-buffer-overflow | use-after-free | ...</category>
+      <description>one paragraph: root cause, attacker control, trigger condition</description>
+      </finding>
+      """
+      
+      find_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          cwd=str(TARGET_DIR),
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=["Read", "Grep", "Glob"],
+          disallowed_tools=["Bash"],
+      )
+      
+      findings_text = await collect(query(prompt=FIND_PROMPT, options=find_options))
+      print("\n" + findings_text)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+            [tool] Glob {'pattern': '**/*'}
+            [tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}
+          
+          `★ Insight ─────────────────────────────────────`
+          - The one-byte dispatch at line 38 makes each parser independently reachable with a trivial file prefix (`A`/`B`/`C`), so each bug is a standalone attack surface.
+          - `parse_bravo`'s `name[15]=0` is a common false-safety pattern: it null-terminates for the `printf`, but the `memcpy` on line 16 has already written past the 16-byte frame before that line runs.
+          - `parse_charlie` composes two primitives (conditional free then unconditional write) into a clean UAF whose trigger byte is attacker-chosen, which is rarer than an accidental UAF and nastier to fix.
+          `─────────────────────────────────────────────────`
+          
+          <finding>
+          <id>F-01</id>
+          <file>canary.c:8-9</file>
+          <category>heap-buffer-overflow</category>
+          <description>`parse_alpha` allocates a fixed 32-byte heap buffer and then `memcpy`s `len` attacker-controlled bytes into it with no bounds check. `len` is `n-1` where `n` is up to 4096, so an `A`-prefixed file delivers up to 4095 bytes into the 32-byte chunk, overwriting adjacent heap metadata and neighboring allocations. Trigger: any file whose first byte is `A` and whose total size exceeds 33 bytes.</description>
+          </finding>
+          
+          <finding>
+          <id>F-02</id>
+          <file>canary.c:15-16</file>
+          <category>stack-buffer-overflow</category>
+          <description>`parse_bravo` copies up to `n-1` attacker-controlled bytes into a 16-byte stack array `name[16]` without bounds-checking `len`. The subsequent `name[15]=0` only truncates the `printf` output; the out-of-bounds write in `memcpy` has already clobbered the saved frame pointer, return address, and any stack canary. Trigger: any file whose first byte is `B` and whose payload exceeds 16 bytes.</description>
+          </finding>
+          
+          <finding>
+          <id>F-03</id>
+          <file>canary.c:22-26</file>
+          <category>use-after-free</category>
+          <description>`parse_charlie` allocates 64 bytes, conditionally frees `p` when the first payload byte equals `0xff`, and then unconditionally `memcpy`s into `p`. When the trigger byte is present, the `memcpy` writes into a freed chunk, corrupting allocator freelist metadata (tcache/fastbin on glibc) and yielding a standard heap-grooming primitive toward arbitrary write / control-flow hijack. The subsequent `printf("%p", p)` also leaks the heap pointer, making exploitation reliable under ASLR. Trigger: a file starting with `C` followed by byte `0xff`.</description>
+          </finding>
+          
+          Three high-value findings (F-01 heap overflow, F-02 stack overflow, F-03 UAF with pointer disclosure) match threats T1/T2/T3+T5 in the threat model. No additional high-value memory-safety bugs in the 45-line source beyond these.
+  markdown cell:
+    source:
+      ## Step 5: Triage the raw findings
+      
+      The find agent is tuned for recall, so its output usually contains duplicates (one root cause reached from two paths) and occasionally a finding that doesn't hold up. Triage is the filter: a fresh `query()` re-reads the code, **verifies** each finding against the actual lines, **collapses duplicates** by root cause, and **derives severity** from reachability across the trust boundaries in the threat model. The find prompt deliberately asks for no severity field, so triage cannot inherit the find agent's confidence; scoring independently is a cheap way to catch overconfidence.
+  code cell:
+    source:
+      TRIAGE_PROMPT = f"""\
+      Triage these findings against the source in this directory and the threat model
+      below. For each: verify it is real (cite the line), derive severity from
+      reachability across the trust boundaries in the threat model, and collapse
+      duplicates by root cause.
+      
+      ## Threat model
+      
+      {threat_model}
+      
+      ## Raw findings
+      
+      {findings_text}
+      """
+      
+      triage_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          cwd=str(TARGET_DIR),
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=["Read", "Grep"],
+          disallowed_tools=["Bash"],
+      )
+      
+      triaged_text = await collect(query(prompt=TRIAGE_PROMPT, options=triage_options))
+      print("\n" + triaged_text)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+            [tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}
+          
+          `★ Insight ─────────────────────────────────────`
+          - All three bugs sit behind the same trust boundary (attacker-authored file → parser), reached by a single-byte dispatch in `main`. No auth, no sandbox, same-uid blast radius, so reachability is identical for all three and severity is driven by the memory-corruption primitive itself.
+          - F-03 is really two bugs fused at one call site: the UAF (T3) and the `printf("%p")` heap-pointer disclosure (T5). Collapsing them under F-03 is correct by root cause (both are in `parse_charlie`'s 6 lines), but the leak is what upgrades the UAF from "unreliable under ASLR" to "one-shot".
+          `─────────────────────────────────────────────────`
+          
+          ## Triage
+          
+          | id | verdict | line(s) | threat | severity | notes |
+          | --- | --- | --- | --- | --- | --- |
+          | F-01 | real | canary.c:8-9 | T1 | **Critical** | `malloc(32)` then `memcpy(buf, data, len)` with `len` up to 4095 (from `main` `n-1`, line 39). Attacker file `A` + ≥33 bytes overflows heap; no sandbox → RCE as invoking user. |
+          | F-02 | real | canary.c:15-16 | T2 | **Critical** | `char name[16]` then `memcpy(name, data, len)`; `name[15]=0` on line 17 runs *after* the OOB write and only truncates the `printf`. Classic stack smash; exploit reliability depends on shipped hardening (open question in threat model) but severity class unchanged. |
+          | F-03 | real | canary.c:22-27 | T3 + T5 | **Critical** | `malloc(64)` on line 22, conditional `free(p)` on line 24 when `data[0]==0xff`, unconditional `memcpy` on line 26 → UAF into freelist metadata. Line 27 `printf("charlie: %p", p)` leaks the heap pointer unconditionally, collapsing ASLR for both T1 and T3. Keep as one finding (root cause = `parse_charlie`), but call out the leak explicitly. |
+          
+          **Duplicates:** none to collapse. Three distinct root causes in three distinct parsers.
+          
+          **Coverage check against threat model:** F-01/F-02/F-03 cover T1/T2/T3/T5. T4 (NULL-`malloc`) and T6–T7 (path handling, 4 KB truncation) are intentionally out of scope as memory-safety findings and the threat model already rates them low. T8–T11 are delivery/deployment/program concerns, not source bugs — not expected in this pass.
+  markdown cell:
+    source:
+      ## Step 6: Emit a structured report
+      
+      Downstream systems (issue trackers, dashboards, SIEMs) need structured data. A final toolless `query()` converts the triaged findings into JSON that conforms to an explicit schema. We mark every field required and use `null` for "not applicable" so the model doesn't have to guess about optionality; in production you'd validate with `jsonschema` and retry on failure.
+  code cell:
+    source:
+      # Every key is in `required` so the model never silently drops a field; values
+      # may be null when a field is not applicable (e.g., no recommendation yet).
+      REPORT_SCHEMA = {
+          "type": "object",
+          "properties": {
+              "findings": {
+                  "type": "array",
+                  "items": {
+                      "type": "object",
+                      "properties": {
+                          "id": {"type": ["string", "null"]},
+                          "category": {"type": ["string", "null"]},
+                          "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
+                          "file": {"type": ["string", "null"]},
+                          "description": {"type": ["string", "null"]},
+                          "recommendation": {"type": ["string", "null"]},
+                      },
+                      "required": ["id", "category", "severity", "file", "description", "recommendation"],
+                  },
+              }
+          },
+          "required": ["findings"],
+      }
+      
+      REPORT_PROMPT = f"""\
+      Convert the triaged findings below into strict JSON conforming to this schema.
+      Every field is required; use null for not-applicable. Respond with JSON only,
+      no surrounding prose or code fences.
+      
+      ## Schema
+      
+      {json.dumps(REPORT_SCHEMA, indent=2)}
+      
+      ## Triaged findings
+      
+      {triaged_text}
+      """
+      
+      report_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=[],
+      )
+      
+      report_json = await collect(query(prompt=REPORT_PROMPT, options=report_options))
+      raw = report_json.strip()
+      if raw.startswith("```"):
+          raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
+      try:
+          report = json.loads(raw)
+      except json.JSONDecodeError as e:
+          print(f"[report agent did not return clean JSON: {e}]\n")
+          print(raw)
+      else:
+          print(json.dumps(report, indent=2))
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          {
+            "findings": [
+              {
+                "id": "F-01",
+                "category": "Heap buffer overflow",
+                "severity": "critical",
+                "file": "canary.c",
+                "description": "parse_alpha allocates malloc(32) then memcpys attacker-controlled data of length up to 4095 bytes (from main's n-1 at line 39). An attacker-authored file beginning with 'A' followed by >=33 bytes overflows the heap allocation. With no sandbox, this yields RCE as the invoking user.",
+                "recommendation": "Validate len against the allocation size before memcpy (e.g., require len <= 32) or size the allocation from len. Reject oversized inputs at the parser boundary and add bounds-checked copy helpers."
+              },
+              {
+                "id": "F-02",
+                "category": "Stack buffer overflow",
+                "severity": "critical",
+                "file": "canary.c",
+                "description": "parse_bravo declares char name[16] then memcpys attacker-controlled data of length up to 4095 bytes into it. The name[15]=0 null-termination on line 17 executes after the out-of-bounds write and only truncates the subsequent printf; it does not prevent the stack smash. Exploit reliability depends on compile-time hardening but the vulnerability class is unchanged.",
+                "recommendation": "Bound len to sizeof(name)-1 before memcpy, or use strncpy/snprintf with explicit size. Ensure stack protectors (-fstack-protector-strong), FORTIFY_SOURCE, and PIE are enabled in the build."
+              },
+              {
+                "id": "F-03",
+                "category": "Use-after-free with heap address disclosure",
+                "severity": "critical",
+                "file": "canary.c",
+                "description": "parse_charlie mallocs 64 bytes at line 22, conditionally frees p at line 24 when data[0]==0xff, then unconditionally memcpys into p at line 26, corrupting freelist metadata. Line 27's printf(\"charlie: %p\", p) unconditionally leaks the heap pointer, defeating ASLR and upgrading the UAF (and the F-01 heap overflow) from unreliable to one-shot exploitable.",
+                "recommendation": "Set p=NULL after free and guard subsequent uses; restructure so the free and the write cannot both execute on the same pointer. Remove the %p disclosure (or gate it behind a debug build) to preserve ASLR."
+              }
+            ]
+          }
+  markdown cell:
+    source:
+      ## Summary and next steps
+      
+      You've built the full threat-model, find, triage, report pipeline with the Agent SDK: one multi-turn `ClaudeSDKClient` session for the threat model and three one-shot `query()` calls for the rest, with Claude Code's built-in file tools doing the exploration. The key patterns to take with you:
+      
+      - **`ClaudeSDKClient` for conversations, `query()` for one-shots.** The threat-model interview needs the bootstrap turn in context; find, triage, and report don't depend on each other's tool transcripts, so stateless calls are simpler.
+      - **`cwd` + `allowed_tools` replace hand-rolled tools.** `Read`/`Grep`/`Glob` scoped to the target directory is the whole find-agent scaffold.
+      - **Triage is a separate pass.** Re-verifying and re-scoring independently of the find agent catches overconfidence cheaply.
+      
+      ### Going further
+      
+      - **Use the hosted version.** [Claude Code Security](https://claude.com/claude-code-security) runs this same find-and-triage capability as a managed product, so you point it at a repo and Anthropic handles the sandboxing and scaling.
+      - **Scale to a real repo.** Point `cwd` at a real checkout, add `"Bash"` to `allowed_tools` inside a sandboxed container so the agent can compile with `-fsanitize=address` and confirm crashes, and spawn one `query()` per entry point from the threat model with `asyncio.gather`.
+      - **Wire the report into your tracker.** Validate Step 6's JSON with `jsonschema`, map it to SARIF or your ticket schema, and POST it.
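
The per-entry-point fan-out mentioned under "Scale to a real repo" could look roughly like the sketch below. It is an illustration only: `find_for_entry_point` is a hypothetical stand-in for one `collect(query(...))` call, and `fan_out`, `max_concurrent`, and the entry-point names are assumptions that do not appear in the notebook. The semaphore is one possible way to keep a large threat model from launching every agent at once.

```python
import asyncio


async def find_for_entry_point(entry_point: str) -> str:
    """Hypothetical stand-in for one find-agent call scoped to an entry point."""
    await asyncio.sleep(0)  # placeholder for the real agent round-trip
    return f"<finding><file>{entry_point}</file></finding>"


async def fan_out(entry_points: list[str], max_concurrent: int = 4):
    """Run one find task per entry point, at most `max_concurrent` at a time."""
    gate = asyncio.Semaphore(max_concurrent)

    async def bounded(entry_point: str) -> str:
        async with gate:
            return await find_for_entry_point(entry_point)

    # return_exceptions=True keeps one failed agent from discarding the rest.
    outcomes = await asyncio.gather(
        *(bounded(ep) for ep in entry_points), return_exceptions=True
    )

    results: dict[str, str] = {}
    failures: dict[str, BaseException] = {}
    for entry_point, outcome in zip(entry_points, outcomes):
        if isinstance(outcome, BaseException):
            failures[entry_point] = outcome
        else:
            results[entry_point] = outcome
    return results, failures
```

Inside a notebook the call would be `results, failures = await fan_out([...])`; in a plain script, wrap it in `asyncio.run(...)`. Returning failures separately lets the triage stage see which entry points were never examined.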

Generated by nbdime

github-actions · 2026-05-04T20:50:13Z

PR Review

Recommendation: REQUEST_CHANGES

Summary

This PR ports a well-structured vulnerability detection agent cookbook from the private repo, demonstrating a threat-model → find → triage → report pipeline using the Claude Agent SDK. The architecture and pedagogy are solid, but there are two blocking issues that need resolution before merge.

Actionable Feedback (7 items)

Blocking:

06_The_vulnerability_detection_agent.ipynb (kernel metadata) — Python version mismatch: notebook was executed on Python 3.13.12 but pyproject.toml restricts to requires-python = ">=3.11,<3.13". Re-run the notebook under Python 3.12, confirm all cells execute cleanly, and commit the regenerated output. If 3.13 is genuinely required, widen the constraint via uv (not a direct edit) and update Prerequisites.
06_The_vulnerability_detection_agent.ipynb (setup cell, MODEL_NAME = "claude-opus-4-7") — Uses a different model than every other notebook in the claude_agent_sdk/ series (00_ through 05_ all use claude-opus-4-6) and differs from the CLAUDE.md canonical alias. Either align to claude-opus-4-6, or open a separate PR that upgrades the entire series together and updates CLAUDE.md. Silent divergence within the same series is confusing.

Important (non-blocking but should be fixed):

06_The_vulnerability_detection_agent.ipynb (setup cell, collect() docstring) — Multi-paragraph docstring violates project style (one short line max). Trim to """Drain a message stream, print tool calls, return final assistant text.""" and move the "why factored out" prose to the preceding markdown cell.
06_The_vulnerability_detection_agent.ipynb (report cell, code fence stripping) — The if raw.startswith("```"): strip is fragile: it fails if the model adds prose after the closing fence, or if the JSON contains a literal ```. At minimum add a comment noting the limitation; ideally use a regex or rely solely on the JSONDecodeError handler.
06_The_vulnerability_detection_agent.ipynb (find/triage cells) — collect() silently returns "" if the model emits only tool calls with no TextBlock. Add a guard after each stage that feeds the next: if not findings_text.strip(): raise RuntimeError("find agent produced no text; check trace above").
06_The_vulnerability_detection_agent.ipynb (report cell, REPORT_SCHEMA) — The "file" field conflates path and line range (e.g., "canary.c:8-9"), making programmatic extraction of line numbers require string parsing. Add a "line_range" field or rename to "location" with a comment explaining the format.
General: assert TARGET_DIR.is_dir() — Per the pattern used in other Agent SDK notebooks (e.g., 03_), prefer if not TARGET_DIR.is_dir(): raise RuntimeError(...) to avoid silent no-ops under -O.
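
One possible shape for the fence-stripping and empty-output items above, as a sketch rather than a required implementation. The helper names `require_text` and `extract_json` are hypothetical; only `json` and `re` from the standard library are used. The idea is to try progressively looser candidates and let `json.loads` decide, instead of relying on the output starting with a fence.

```python
import json
import re

# Built from single backticks so this snippet contains no literal fence line.
FENCE = "`" * 3
# First group captures the body of a fenced block, with or without a language tag.
_FENCE_RE = re.compile(FENCE + r"[A-Za-z0-9_-]*\s*\n(.*?)" + FENCE, re.DOTALL)


def require_text(text: str, stage: str) -> str:
    """Fail loudly when a stage returned no assistant text."""
    if not text.strip():
        raise RuntimeError(f"{stage} agent produced no text; check trace above")
    return text


def extract_json(raw: str):
    """Parse model output as JSON, tolerating fences and surrounding prose."""
    candidates = [raw.strip()]
    candidates += [body.strip() for body in _FENCE_RE.findall(raw)]
    # Last resort: the widest {...} span anywhere in the text.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start : end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("no parseable JSON found in model output")
```

In the notebook this would read as `findings_text = require_text(await collect(...), "find")` after each stage that feeds the next, and `report = extract_json(report_json)` in the report cell. A fenced body that itself contains a fence is cut short by the regex, but the brace-span fallback still recovers it.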

Detailed Review

Code Quality

The four-stage pipeline (threat model, find, triage, report) is architecturally clean and pedagogically sound. The decision to run triage as an independent pass that re-derives severity instead of inheriting the find agent's conclusions is a real production pattern and is explained well in the markdown. The collect() helper is a sensible factoring that avoids repeating the isinstance ladder at every call site.

The prompt engineering is high quality: the quality-tier rubric in FIND_PROMPT (HIGH VALUE vs LOW VALUE bug classes) is the most important part of a production vuln-hunting prompt and the notebook calls this out explicitly. REPORT_SCHEMA with all fields in required and null for not-applicable is a solid pattern for structured LLM output.

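The %%capture install cell and dotenv.load_dotenv() are aligned with project conventions, and the explicit cwd message on the TARGET_DIR.is_dir() check makes a wrong working directory easy to diagnose (the assert itself should become a raise, per the feedback above).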

Security

disallowed_tools=["Bash"] is applied consistently across all agents, keeping the pipeline read-only and scoped to the vendored canary. The ENGAGEMENT_CONTEXT system prompt block documents authorization scope on every call — a good defensive pattern. The CVP link in Step 1's markdown is responsible and directly relevant. The .gitignore entry for THREAT_MODEL.md correctly prevents generated agentic artifacts from being committed.

Suggestions

jsonschema is mentioned in Step 6's prose ("in production you'd validate with jsonschema") but not installed in the %%capture cell; readers who want to follow through on the suggestion without leaving the notebook will hit an ImportError.
The principle of least privilege demonstrated through scoped allowed_tools per agent (Read/Write/Edit → Read/Grep/Glob → Read/Grep → []) is worth calling out explicitly in the Step 4 markdown or the Summary, since it's one of the strongest architectural points in the notebook.
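
To make the validation suggestion concrete without adding a dependency, here is a hand-rolled, standard-library-only stand-in for what `jsonschema` would check against REPORT_SCHEMA. It is a simplified sketch: `report_errors`, `REQUIRED_KEYS`, and `ALLOWED_SEVERITIES` are hypothetical names, and the checks cover only required keys, the severity enum, and string-or-null field types.

```python
# Mirrors the field list and severity enum in the notebook's REPORT_SCHEMA.
REQUIRED_KEYS = ("id", "category", "severity", "file", "description", "recommendation")
ALLOWED_SEVERITIES = ("critical", "high", "medium", "low")


def report_errors(report) -> list[str]:
    """Return human-readable problems; an empty list means the report passes."""
    if not isinstance(report, dict) or not isinstance(report.get("findings"), list):
        return ["top level must be an object with a 'findings' array"]

    errors: list[str] = []
    for i, finding in enumerate(report["findings"]):
        if not isinstance(finding, dict):
            errors.append(f"findings[{i}]: not an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in finding:
                errors.append(f"findings[{i}]: missing '{key}'")
            elif key == "severity":
                if finding[key] not in ALLOWED_SEVERITIES:
                    errors.append(f"findings[{i}]: severity {finding[key]!r} not in enum")
            elif not isinstance(finding[key], (str, type(None))):
                errors.append(f"findings[{i}]: '{key}' must be a string or null")
    return errors
```

An empty list means the report can go downstream; a non-empty list can be appended to the report prompt for one retry, which is the "validate and retry on failure" loop that Step 6 describes. If `jsonschema` is added to the install cell, this function can be replaced by a schema validation call.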

Positive Notes

The introduction leads with the pain point (false-positive fatigue, fuzzer harness cost) before naming any technology — the right structure for a problem-driven cookbook.
The two-turn ClaudeSDKClient threat model (bootstrap → interview) vs. stateless query() for the remaining stages cleanly illustrates when to use each SDK primitive, and the markdown explains the tradeoff.
The canary target is a perfect teaching artifact: 45 lines, three distinct bug classes, one-byte dispatch, no false complexity.

Add Cybersecurity to .github/registry_schema.json so the new vuln detection cookbook entry passes registry-check.

github-actions · 2026-05-04T21:17:28Z

Notebook Changes

This PR modifies the following notebooks:

📓 `claude_agent_sdk/06_The_vulnerability_detection_agent.ipynb`

nbdiff /dev/null claude_agent_sdk/06_The_vulnerability_detection_agent.ipynb (1fd713e68964981777241b6142a159b6235af899)
--- /dev/null  2026-05-04 21:15:40.131010
+++ claude_agent_sdk/06_The_vulnerability_detection_agent.ipynb (1fd713e68964981777241b6142a159b6235af899)  (no timestamp)
## added /cells:
+  markdown cell:
+    source:
+      # The Vulnerability Detection Agent
+      
+      Security teams want to find memory-safety bugs before attackers do, but the existing tooling makes it hard: static analyzers produce so many false positives that reviewers stop reading, and fuzzers need a hand-written harness per entry point before they find anything. This cookbook shows how to use the **Claude Agent SDK** to build a vulnerability-discovery agent that reads source code with Claude Code's built-in `Read`, `Grep`, and `Glob` tools, reasons about which inputs could corrupt memory, and writes findings a reviewer can act on.
+      
+      **By the end of this cookbook, you'll be able to:**
+      
+      - Run a bootstrap-then-interview threat model as a multi-turn `ClaudeSDKClient` session that writes `THREAT_MODEL.md`
+      - Drive an agentic find loop with built-in `Read`/`Grep`/`Glob` tools instead of hand-rolled file access
+      - Chain find, triage, and report as separate `query()` calls that emit schema-conformant JSON
+      
+      ## Prerequisites
+      
+      **Required knowledge:**
+      - Python fundamentals, including `async`/`await`
+      - Enough C to read a 45-line file and recognize a `memcpy`
+      
+      **Required tools:**
+      - Python 3.11+
+      - Node.js 18+ and the Claude Code CLI: `npm install -g @anthropic-ai/claude-code`
+      - An Anthropic API key ([get one here](https://console.anthropic.com))
+      
+      **Required for any real target:** authorization to assess the code you point this at. This notebook ships a tiny self-contained `canary.c` with planted bugs so you can run everything end-to-end without touching production code.
+  markdown cell:
+    source:
+      ## Step 1: Set up the environment and engagement context
+      
+      We define an `ENGAGEMENT_CONTEXT` block that we pass as `system_prompt` on every agent. It records the scope of this assessment (authorized by the code owner, isolated read-only sandbox, findings headed for responsible disclosure) so every step in the pipeline operates against the same documented ground rules. Keep those three claims true for any real target.
+      
+      > **A note on cyber safeguards:** Claude applies [real-time cyber safeguards](https://support.claude.com/en/articles/14604842-real-time-cyber-safeguards-on-claude) at the API layer. If your work on a real codebase triggers these safeguards, apply to the Cyber Verification Program (CVP) via that page: a free application-based program that lets professionals continue legitimate dual-use security work with minimal interruption.
+  code cell:
+    source:
+      %%capture
+      %pip install -U claude-agent-sdk python-dotenv
+  code cell:
+    source:
+      import json
+      from collections.abc import AsyncIterator
+      from pathlib import Path
+      
+      from dotenv import load_dotenv
+      
+      from claude_agent_sdk import (
+          AssistantMessage,
+          ClaudeAgentOptions,
+          ClaudeSDKClient,
+          Message,
+          ResultMessage,
+          TextBlock,
+          ToolUseBlock,
+          query,
+      )
+      
+      load_dotenv()
+      
+      MODEL_NAME = "claude-opus-4-7"
+      # This notebook expects to be run from the claude_agent_sdk/ directory
+      # (Jupyter's default when you open the file from there). The assert makes
+      # the failure explicit if the kernel was started elsewhere.
+      TARGET_DIR = Path("vulnerability_detection_agent/canary").resolve()
+      assert TARGET_DIR.is_dir(), f"run this notebook from claude_agent_sdk/ (got cwd={Path.cwd()})"
+      
+      ENGAGEMENT_CONTEXT = """\
+      ## Engagement context
+      
+      This is authorized security research conducted as a defensive security
+      assessment on a self-contained canary target vendored in this notebook. The
+      target is read-only source (no execution). Findings are collected for
+      demonstration and responsible-disclosure workflow testing.
+      """
+      
+      
+      async def collect(stream: AsyncIterator[Message]) -> str:
+          """Consume an Agent SDK message stream; print tool calls; return final text.
+      
+          Both ``query()`` and ``ClaudeSDKClient.receive_response()`` return an
+          ``AsyncIterator[Message]`` that terminates after a ``ResultMessage``.
+          This is the same ``async for msg in ...`` loop the other notebooks in this
+          series write inline; it is factored out here because this notebook runs
+          the loop five times (TM bootstrap, TM interview, find, triage, report)
+          and the ``isinstance`` ladder would otherwise repeat verbatim.
+          """
+          final = ""
+          async for msg in stream:
+              if isinstance(msg, AssistantMessage):
+                  for block in msg.content:
+                      if isinstance(block, ToolUseBlock):
+                          args = str(block.input)
+                          args = args if len(args) <= 120 else args[:120] + "...}"
+                          print(f"  [tool] {block.name} {args}")
+                      elif isinstance(block, TextBlock) and block.text.strip():
+                          final += block.text
+              elif isinstance(msg, ResultMessage) and msg.is_error:
+                  raise RuntimeError(msg.result)
+          return final
+      
+      
+      print(f"Model: {MODEL_NAME}")
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Model: claude-opus-4-7
+  markdown cell:
+    source:
+      ## Step 2: Load the canary target
+      
+      `vulnerability_detection_agent/canary/canary.c` is a ~45-line C program with three deliberately planted memory-safety bugs (a heap buffer overflow, a stack buffer overflow, and a use-after-free), each reachable through a different "magic byte" at the start of the input. The bugs aren't labeled; the find agent in Step 4 has to locate them by reading the logic the same way it would in real code. When you're ready to try your own code, point `TARGET_DIR` at your checkout.
+  code cell:
+    source:
+      print((TARGET_DIR / "canary.c").read_text())
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          // canary.c
+          // Entry: ./canary <input_file>
+          #include <stdio.h>
+          #include <stdlib.h>
+          #include <string.h>
+          
+          static void parse_alpha(const unsigned char *data, size_t len) {
+              unsigned char *buf = malloc(32);
+              memcpy(buf, data, len);
+              printf("alpha: %02x\n", buf[0]);
+              free(buf);
+          }
+          
+          static void parse_bravo(const unsigned char *data, size_t len) {
+              char name[16];
+              memcpy(name, data, len);
+              name[15] = 0;
+              printf("bravo: %s\n", name);
+          }
+          
+          static void parse_charlie(const unsigned char *data, size_t len) {
+              char *p = malloc(64);
+              if (len > 0 && data[0] == 0xff) {
+                  free(p);
+              }
+              memcpy(p, data, len < 64 ? len : 64);
+              printf("charlie: %p\n", (void *)p);
+          }
+          
+          int main(int argc, char **argv) {
+              if (argc < 2) return 1;
+              FILE *f = fopen(argv[1], "rb");
+              if (!f) return 1;
+              unsigned char buf[4096];
+              size_t n = fread(buf, 1, sizeof buf, f);
+              fclose(f);
+              if (n < 1) return 1;
+              switch (buf[0]) {
+                  case 'A': parse_alpha(buf + 1, n - 1); break;
+                  case 'B': parse_bravo(buf + 1, n - 1); break;
+                  case 'C': parse_charlie(buf + 1, n - 1); break;
+                  default: printf("unknown format\n");
+              }
+              return 0;
+          }
+          
+  markdown cell:
+    source:
+      ## Step 3: Threat-model the target (bootstrap, then interview)
+      
+      A threat model answers "what could go wrong with this system, who would do it, and which outcomes matter?" independently of any specific bug. A threat ("attacker achieves memory corruption via untrusted file parsing") survives a patch; a vulnerability ("line 31 doesn't bounds-check `len`") does not. The find loop hunts vulnerabilities; the threat model tells it where to hunt and tells triage how to score.
+      
+      We build it in two turns of one `ClaudeSDKClient` session:
+      
+      1. **Bootstrap.** Claude reads the code with the built-in `Read` tool and drafts the model (context, assets, entry points & trust boundaries, threats, and **open questions** the code can't answer).
+      2. **Interview.** The application owner answers the open questions, and Claude refines likelihood and impact, then writes `THREAT_MODEL.md` next to the target.
+      
+      Keeping both turns in one client session means the interview turn can see the bootstrap's tool results without us re-sending the source. On a 45-line canary both turns are thin; the point here is the **output shape**: the entry-points table (what you'd partition across parallel find-agents on a real repo) and the open-questions list (the bootstrap-to-interview handoff).
+  code cell:
+    source:
+      (TARGET_DIR / "THREAT_MODEL.md").unlink(missing_ok=True)
+      
+      TM_SCHEMA = """\
+      # Threat Model: <system name>
+      ## 1. System context
+      ## 2. Assets
+      | asset | description | sensitivity |
+      ## 3. Entry points & trust boundaries
+      | entry_point | description | trust_boundary | reachable_assets |
+      ## 4. Threats
+      | id | threat | surface | asset | impact | likelihood |
+      ## 5. Open questions
+      - (things the code alone cannot answer: deployment context, which inputs are
+        attacker-controlled in practice, blast radius)
+      """
+      
+      BOOTSTRAP_PROMPT = f"""\
+      You are bootstrapping a threat model from source code alone; no application
+      owner is available yet. Read `canary.c` in this directory and emit a draft
+      threat model in the schema below. Be explicit in section 5 about what you could
+      NOT determine from the code: those open questions are the agenda for the owner
+      interview. Do not write any files yet.
+      
+      ## Schema
+      
+      {TM_SCHEMA}
+      """
+      
+      OWNER_ANSWERS = """\
+      - Deployment: `canary` is a local CLI that reads a file path from argv; it is
+        not network-facing.
+      - Attacker control: the input file is fully attacker-controlled (think email
+        attachment or downloaded file opened by the user).
+      - Blast radius: the process runs as the invoking user with no sandboxing;
+        memory corruption is code execution as that user.
+      """
+      
+      INTERVIEW_PROMPT = f"""\
+      The application owner has now answered your open questions:
+      
+      {OWNER_ANSWERS}
+      
+      Refine the threat model: update likelihood and impact in section 4 using the
+      owner's answers, resolve every item in section 5 that the answers cover, and
+      add any new threats the deployment context implies. Keep the same schema, then
+      write the refined model to `THREAT_MODEL.md` in this directory.
+      """
+      
+      tm_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          cwd=str(TARGET_DIR),
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=["Read", "Write", "Edit"],
+          disallowed_tools=["Bash"],
+          permission_mode="acceptEdits",
+      )
+      
+      async with ClaudeSDKClient(options=tm_options) as tm_agent:
+          # collect() fully drains receive_response() through its terminating
+          # ResultMessage, so the second query() sees a clean stream.
+          await tm_agent.query(BOOTSTRAP_PROMPT)
+          draft_tm = await collect(tm_agent.receive_response())
+          print("--- bootstrap draft ---\n" + draft_tm + "\n")
+      
+          await tm_agent.query(INTERVIEW_PROMPT)
+          await collect(tm_agent.receive_response())
+      
+      tm_path = TARGET_DIR / "THREAT_MODEL.md"
+      if not tm_path.exists():
+          raise RuntimeError("interview agent did not write THREAT_MODEL.md; check the trace above")
+      threat_model = tm_path.read_text()
+      print("--- refined THREAT_MODEL.md ---\n" + threat_model)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+            [tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}
+          --- bootstrap draft ---
+          `★ Insight ─────────────────────────────────────`
+          - The file dispatches on `buf[0]` into three parsers, each with a distinct memory-safety bug class: heap overflow, stack overflow, and use-after-free. This is a canonical "one bug per parser" canary.
+          - The `len` passed to each parser is `n - 1` (up to 4095), but buffers are sized 32/16/64, so every path is reachable with attacker-controlled overflow length from a single input file.
+          - Threat modeling from source alone can enumerate the *bug classes* and *entry points*, but it cannot tell you who supplies `argv[1]` in production — that determines whether these are local footguns or remote RCE primitives.
+          `─────────────────────────────────────────────────`
+          
+          # Threat Model: canary (file-format parser CLI)
+          
+          ## 1. System context
+          A small C command-line utility invoked as `./canary <input_file>`. It opens the supplied path, reads up to 4096 bytes, and dispatches on the first byte (`A`/`B`/`C`) to one of three parsers. No network, IPC, or privilege-management code is present in the source. Deployment context (who runs it, who supplies the file, whether it is wrapped by a service) is unknown from the code alone.
+          
+          ## 2. Assets
+          | asset | description | sensitivity |
+          | --- | --- | --- |
+          | process memory / control flow | heap and stack of the `canary` process, including return addresses and heap metadata | high — corruption yields arbitrary code execution in the process's security context |
+          | input file contents | bytes read from `argv[1]`; format selector plus parser payload | low–medium (content itself); high as an attack vector |
+          | host execution context | whatever uid/role/container the binary runs as; file-system reach of that context | unknown — depends on deployment |
+          
+          ## 3. Entry points & trust boundaries
+          | entry_point | description | trust_boundary | reachable_assets |
+          | --- | --- | --- | --- |
+          | `argv[1]` path | caller-supplied filesystem path opened with `fopen(..., "rb")` | caller → process (path-traversal / symlink exposure depends on caller privilege) | any file readable by the process |
+          | file contents (first 4096 bytes) | `fread` into `buf`, dispatched by `buf[0]` | file producer → parser | process memory via all three parsers |
+          | `parse_alpha` payload (`A` prefix) | bytes 1..n copied via `memcpy` into a 32-byte heap buffer | untrusted file → heap | heap adjacent to 32-byte allocation |
+          | `parse_bravo` payload (`B` prefix) | bytes 1..n copied into a 16-byte stack buffer | untrusted file → stack | return address, saved frame pointer, stack canary (if enabled) |
+          | `parse_charlie` payload (`C` prefix) | bytes 1..n copied into 64-byte heap buffer; first byte may trigger early `free` | untrusted file → heap | freed chunk metadata, tcache/fastbin state |
+          
+          ## 4. Threats
+          | id | threat | surface | asset | impact | likelihood |
+          | --- | --- | --- | --- | --- | --- |
+          | T1 | Heap buffer overflow: `parse_alpha` copies up to `n-1` (≤4095) bytes into a 32-byte `malloc` without bounds check (canary.c:8-9) | `A`-prefixed input | process memory | heap corruption → potential RCE | high given a malicious file |
+          | T2 | Stack buffer overflow: `parse_bravo` copies up to `n-1` bytes into a 16-byte stack array (canary.c:15-16); the trailing `name[15]=0` does not prevent the overflow, only truncates the printed string | `B`-prefixed input | saved return address / stack | classic stack smash → RCE if no/bypassed stack protector | high |
+          | T3 | Use-after-free / double-free vector: `parse_charlie` frees `p` when `data[0]==0xff` then immediately `memcpy`s into the freed chunk (canary.c:23-26); the subsequent `printf` also leaks the freed pointer | `C`-prefixed input with first payload byte `0xff` | heap allocator metadata | UAF write → heap grooming, info leak of freed address | medium–high |
+          | T4 | Unchecked `malloc` return in `parse_alpha` and `parse_charlie` (canary.c:8, 22) | any `A` or `C` input under memory pressure | process stability | NULL-deref DoS | low in normal ops |
+          | T5 | Pointer disclosure via `printf("%p", p)` in `parse_charlie` (canary.c:27) | any `C` input | ASLR secret | leaks heap address, aids exploitation of T1/T3 | high when combined with T1/T3 |
+          | T6 | Path handling of `argv[1]`: no canonicalization or allow-list; process will open any readable path including symlinks and device nodes | caller-controlled argv | host files | info disclosure or hang (e.g., `/dev/zero`) depending on who runs it | unknown — depends on invoker privilege |
+          | T7 | Silent truncation: only the first 4096 bytes are read; parsers operate on truncated data, which can mask malformed-file detection upstream | any input | integrity of downstream decisions | logic bug, not memory-safety | low |
+          
+          ## 5. Open questions
+          - **Who supplies `argv[1]`?** Local user only, a setuid wrapper, a web upload handler, an MTA, a sandbox runner? This determines whether T1–T3 are local-only footguns or remote code-execution primitives.
+          - **What uid / capabilities / container does the binary run as?** Blast radius of successful RCE hinges on this (root vs. nobody vs. seccomp-confined).
+          - **Compiler and link flags in production builds:** is `-fstack-protector-strong`, `-D_FORTIFY_SOURCE=2`, PIE, RELRO, ASLR, or CFI enabled? These materially change exploitability of T2 and the usefulness of T5's leak.
+          - **Allocator in use** (glibc ptmalloc, musl, jemalloc, hardened_malloc). T3's UAF exploit primitives differ by allocator.
+          - **Is `canary` invoked directly or via a wrapper that validates the file first?** A front-end parser / size cap / magic-byte allow-list upstream could neutralize T1–T3.
+          - **Are crashes monitored?** Unchecked `malloc` (T4) and T6 DoS matter only if uptime is a requirement; source alone does not say.
+          - **Intended threat model for the format itself:** are `A`/`B`/`C` formats specified anywhere (docs, RFC, sibling files) so we can tell "malformed" from "adversarial"?
+          - **Expected lifetime and distribution** of the binary: is this a demo/test fixture (the name "canary" suggests so) or shipped to customers? Affects remediation priority and disclosure path.
+          
+            [tool] Write {'file_path': 'vulnerability_detection_agent/canary/THRE...}
+          --- refined THREAT_MODEL.md ---
+          # Threat Model: canary (local file-format parser CLI)
+          
+          ## 1. System context
+          `canary` is a local command-line utility invoked as `./canary <input_file>`. It
+          opens the supplied path, reads up to 4096 bytes, and dispatches on the first
+          byte (`A`/`B`/`C`) into one of three parsers. It is **not network-facing**. The
+          input file is **fully attacker-controlled** in the intended threat model —
+          typical delivery is a file received by email or downloaded from the web and
+          then opened by the local user. The process runs as the **invoking user with no
+          sandboxing**, so any memory-safety bug that yields control flow yields code
+          execution in that user's security context (home directory, SSH keys, browser
+          profile, cloud credentials, any writable path the user has).
+          
+          ## 2. Assets
+          | asset | description | sensitivity |
+          | --- | --- | --- |
+          | invoking user's account | uid, home directory, shell history, SSH keys, browser profile, cloud tokens, dotfiles, writable mounts | high — compromise equals full user takeover and a foothold for lateral movement |
+          | process memory / control flow | heap and stack of the `canary` process, return addresses, heap metadata | high — corruption is the exploitation primitive for the user-account asset |
+          | input file contents | bytes read from `argv[1]`; format selector plus parser payload | low as data; the primary *attack vector* |
+          | host integrity | persistence locations writable as the user (crontab, `~/.bashrc`, `~/.config/systemd/user`, login items) | high — trivially reachable post-exploitation; no sandbox to contain it |
+          
+          ## 3. Entry points & trust boundaries
+          | entry_point | description | trust_boundary | reachable_assets |
+          | --- | --- | --- | --- |
+          | `argv[1]` path | caller-supplied filesystem path opened with `fopen(..., "rb")` | local user → process (same uid; the boundary is between the untrusted *file content* and the parser, not between the caller and process) | any file readable by the user |
+          | file contents (first 4096 bytes) | `fread` into `buf`; dispatch on `buf[0]` | untrusted attacker (file author) → parser | process memory via all three parsers |
+          | `parse_alpha` payload (`A` prefix) | bytes 1..n copied via `memcpy` into a 32-byte heap buffer | untrusted file → heap | heap adjacent to 32-byte allocation |
+          | `parse_bravo` payload (`B` prefix) | bytes 1..n copied into a 16-byte stack buffer | untrusted file → stack | return address, saved frame pointer, stack canary (if enabled) |
+          | `parse_charlie` payload (`C` prefix) | bytes 1..n copied into a 64-byte heap buffer; first byte may trigger early `free` | untrusted file → heap | freed-chunk metadata, tcache/fastbin state |
+          
+          ## 4. Threats
+          | id | threat | surface | asset | impact | likelihood |
+          | --- | --- | --- | --- | --- | --- |
+          | T1 | Heap buffer overflow: `parse_alpha` copies up to `n-1` (≤4095) bytes into a 32-byte `malloc` without bounds check (canary.c:8-9) | `A`-prefixed attacker file | process memory → user account | critical — RCE as the invoking user; full account takeover, no sandbox containment | high — trivially reachable with a single crafted file |
+          | T2 | Stack buffer overflow: `parse_bravo` copies up to `n-1` bytes into a 16-byte stack array (canary.c:15-16); trailing `name[15]=0` only truncates the print, does not prevent the overflow | `B`-prefixed attacker file | saved return address → user account | critical — classic stack smash to RCE as the user; exploitability depends on stack-protector / PIE / ASLR in the shipped build | high |
+          | T3 | Use-after-free / double-free: `parse_charlie` frees `p` when `data[0]==0xff` then immediately `memcpy`s into the freed chunk (canary.c:23-26) | `C`-prefixed attacker file with first payload byte `0xff` | heap allocator metadata → user account | critical — heap-grooming primitive for RCE as the user | medium–high — reliability varies by allocator but the primitive is clean |
+          | T4 | Unchecked `malloc` return in `parse_alpha` and `parse_charlie` (canary.c:8, 22) | any `A` or `C` input under memory pressure | process stability | low — NULL-deref crash of a user-invoked CLI; annoyance, not compromise | low |
+          | T5 | Pointer disclosure via `printf("%p", p)` in `parse_charlie` (canary.c:27) | any `C` input | ASLR secret | high — directly hands the attacker a heap address, making T1/T3 reliable even with ASLR | high — unconditional on the `C` path |
+          | T6 | Path handling of `argv[1]`: no canonicalization or allow-list; `canary` opens any path the user can read, including symlinks and device nodes (e.g., `/dev/zero` hang) | caller-supplied argv | process stability / caller expectations | low — the caller is already the user, so there is no privilege boundary to cross via path tricks | low |
+          | T7 | Silent truncation: only the first 4096 bytes are read; malformed-file detection upstream can be bypassed by padding the exploit into the first 4 KB | any input | integrity of any upstream "scan then open" pipeline | low on its own; relevant if a scanner is put in front | low |
+          | T8 | **(new)** Social-engineering delivery: the attack surface is "user opens a file." Typical vectors are email attachments, messenger drops, and browser downloads. Any of T1/T2/T3 becomes a one-click RCE given a convincing lure | attacker-authored file delivered to the user | user account | critical — same as T1–T3; this threat names the delivery mechanism | high — the dominant real-world path to triggering T1–T3 |
+          | T9 | **(new)** File-association / handler registration: if `canary` is (or becomes) the registered handler for a file extension or MIME type, double-clicking a download auto-invokes it, removing the need to coach the user into a shell command | OS file-association layer | user account | critical — converts T8 from "run this binary on this file" to "open the attachment" | unknown without deployment config; flagged for owner |
+          | T10 | **(new)** Post-exploitation blast radius: no sandbox, so RCE in `canary` immediately has the user's full ambient authority — SSH keys, browser cookies/session tokens, cloud CLI credentials (`~/.aws`, `~/.config/gcloud`), persistence via `~/.bashrc`, user-level cron, user systemd units, login items | any of T1/T2/T3 succeeding | user account, connected systems | critical — lateral movement into email, cloud, source control is trivial from here | high conditional on T1/T2/T3 |
+          | T11 | **(new)** Corpus / fuzzing exposure: with three obvious memory bugs and a one-byte dispatch, the first hour of `afl-fuzz` or `libFuzzer` against `canary` will produce crashing inputs. If the binary is distributed, researchers and attackers will find these quickly | any attacker with the binary | disclosure timeline | high — forces a short remediation window | high |
+          
+          ## 5. Open questions
+          All prior open questions are resolved by the owner's answers, except the
+          following residual items, which are narrower and still code- or
+          build-dependent:
+          
+          - **Shipped compiler / linker hardening:** is the distributed build compiled
+            with `-fstack-protector-strong`, `-D_FORTIFY_SOURCE=2`, `-fPIE`/`-pie`, full
+            RELRO, and with ASLR enabled on the target OS? Changes exploit reliability of
+            T2 and the value of T5's leak, but not the severity class.
+          - **Allocator in the shipped build** (glibc ptmalloc vs. musl vs. hardened
+            allocator): determines which T3 exploitation primitives are practical.
+          - **File-association registration (T9):** does any installer or desktop entry
+            register `canary` as a handler for an extension or MIME type? If yes, T8
+            collapses into a pure double-click RCE.
+          - **Signed / notarized distribution:** is the binary shipped in a way that OS
+            gatekeepers (macOS Gatekeeper, Windows SmartScreen, Linux desktop "executable
+            bit" prompts) would warn the user before first run? Affects the friction on
+            T8 but not the final impact.
+          - **Telemetry / crash reporting:** are crashes from T1–T4 reported anywhere the
+            defender would see them, or does a failed exploit attempt go unnoticed?
+          
+  markdown cell:
+    source:
+      ## Step 4: Run the agentic find loop
+      
+      With the raw Messages API this step would be a hand-written `while stop_reason == "tool_use"` loop with custom file tools. The Agent SDK handles all of that: we call `query()` once with `allowed_tools=["Read", "Grep", "Glob"]` and `disallowed_tools=["Bash"]`, and Claude Code runs the explore-read-reason loop on its own. `cwd=str(TARGET_DIR)` points the agent at the canary, and `system_prompt={"type": "preset", "preset": "claude_code", "append": ...}` keeps Claude Code's default system prompt (which already tells the agent its working directory) while appending our engagement context, so the agent never has to guess its own location. With `Write` and `Edit` left out of `allowed_tools` and `Bash` explicitly disallowed, the agent stays read-only.
+      
+      **The most important part of the prompt is still the quality-tier rubric.** Without it, LLM vulnerability hunters tend to report every null-pointer dereference and failed assertion they can find, which are real crashes but almost never exploitable. The rubric tells the agent which crash classes to submit (heap, stack, and global buffer overflows; use-after-free and double-free) and which are signposts to keep reading past. This one block is most of the difference between a report a security engineer acts on and one they ignore.
+      
+      A production version would add `"Bash"` to `allowed_tools` so the agent can compile with `-fsanitize=address` and confirm each crash; that belongs inside a locked-down container, so this notebook stays read-only.
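To make the confirmation step concrete, here is a sketch of what a sandboxed agent (or a harness around it) might prepare: an AddressSanitizer build command and one candidate crash input per parser. The input bytes are assumptions inferred from the trigger conditions in the threat model above, and `asan_build_command` and `write_crash_inputs` are hypothetical helpers. Nothing is compiled or executed here.

```python
import tempfile
from pathlib import Path

# Assumed trigger inputs, taken from the threat model's description of each
# parser: one selector byte, then a payload that overruns the fixed-size buffer.
CRASH_INPUTS = {
    "alpha_heap_overflow": b"A" + b"x" * 64,        # 32-byte heap buffer
    "bravo_stack_overflow": b"B" + b"x" * 64,       # 16-byte stack buffer
    "charlie_use_after_free": b"C\xff" + b"x" * 8,  # 0xff payload byte frees early
}


def asan_build_command(source: Path, binary: Path) -> list[str]:
    """Compiler invocation for an AddressSanitizer build with usable stack traces."""
    return ["cc", "-g", "-O1", "-fno-omit-frame-pointer", "-fsanitize=address",
            str(source), "-o", str(binary)]


def write_crash_inputs(out_dir: Path) -> dict[str, Path]:
    """Write each candidate input to out_dir and return a name -> path mapping."""
    paths = {}
    for name, payload in CRASH_INPUTS.items():
        path = out_dir / f"{name}.bin"
        path.write_bytes(payload)
        paths[name] = path
    return paths


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    build_cmd = asan_build_command(Path("canary.c"), work / "canary_asan")
    inputs = write_crash_inputs(work)
    # One run per input; these are only constructed here, never executed.
    run_cmds = [[str(work / "canary_asan"), str(path)] for path in inputs.values()]
    written_sizes = {name: path.stat().st_size for name, path in inputs.items()}

print(" ".join(build_cmd))
```

Inside a locked-down container, each command would go through `subprocess.run`, and an AddressSanitizer report on stderr would count as confirmation of the corresponding finding.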
+  code cell:
+    source:
+      FIND_PROMPT = f"""\
+      Find memory-safety bugs in the target source tree using the file tools
+      available to you.
+      
+      ## Threat model
+      
+      Focus on the entry points and threats identified here; you do not need to
+      re-derive them.
+      
+      {threat_model}
+      
+      ## Quality tiers: what to report
+      
+      **HIGH VALUE (report these):**
+      - heap-buffer-overflow (especially WRITE)
+      - heap-use-after-free / double-free
+      - stack-buffer-overflow
+      - global-buffer-overflow
+      
+      **LOW VALUE (note but keep looking):**
+      - assertion failures (clean abort, no corruption)
+      - stack exhaustion from recursion (DoS only)
+      - null-pointer deref at fixed small offsets
+      
+      ## Output
+      
+      For each HIGH VALUE finding emit a block:
+      
+      <finding>
+      <id>F-NN</id>
+      <file>path:line</file>
+      <category>heap-buffer-overflow | stack-buffer-overflow | use-after-free | ...</category>
+      <description>one paragraph: root cause, attacker control, trigger condition</description>
+      </finding>
+      """
+      
+      find_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          cwd=str(TARGET_DIR),
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=["Read", "Grep", "Glob"],
+          disallowed_tools=["Bash"],
+      )
+      
+      findings_text = await collect(query(prompt=FIND_PROMPT, options=find_options))
+      print("\n" + findings_text)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+            [tool] Glob {'pattern': '**/*'}
+            [tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}
+          
+          `★ Insight ─────────────────────────────────────`
+          - The one-byte dispatch at line 38 makes each parser independently reachable with a trivial file prefix (`A`/`B`/`C`), so each bug is a standalone attack surface.
+          - `parse_bravo`'s `name[15]=0` is a common false-safety pattern: it null-terminates for the `printf`, but the `memcpy` on line 16 has already written past the 16-byte frame before that line runs.
+          - `parse_charlie` composes two primitives (conditional free then unconditional write) into a clean UAF whose trigger byte is attacker-chosen, which is rarer than an accidental UAF and nastier to fix.
+          `─────────────────────────────────────────────────`
+          
+          <finding>
+          <id>F-01</id>
+          <file>canary.c:8-9</file>
+          <category>heap-buffer-overflow</category>
+          <description>`parse_alpha` allocates a fixed 32-byte heap buffer and then `memcpy`s `len` attacker-controlled bytes into it with no bounds check. `len` is `n-1` where `n` is up to 4096, so an `A`-prefixed file delivers up to 4095 bytes into the 32-byte chunk, overwriting adjacent heap metadata and neighboring allocations. Trigger: any file whose first byte is `A` and whose total size exceeds 33 bytes.</description>
+          </finding>
+          
+          <finding>
+          <id>F-02</id>
+          <file>canary.c:15-16</file>
+          <category>stack-buffer-overflow</category>
+          <description>`parse_bravo` copies up to `n-1` attacker-controlled bytes into a 16-byte stack array `name[16]` without bounds-checking `len`. The subsequent `name[15]=0` only truncates the `printf` output; the out-of-bounds write in `memcpy` has already clobbered the saved frame pointer, return address, and any stack canary. Trigger: any file whose first byte is `B` and whose payload exceeds 16 bytes.</description>
+          </finding>
+          
+          <finding>
+          <id>F-03</id>
+          <file>canary.c:22-26</file>
+          <category>use-after-free</category>
+          <description>`parse_charlie` allocates 64 bytes, conditionally frees `p` when the first payload byte equals `0xff`, and then unconditionally `memcpy`s into `p`. When the trigger byte is present, the `memcpy` writes into a freed chunk, corrupting allocator freelist metadata (tcache/fastbin on glibc) and yielding a standard heap-grooming primitive toward arbitrary write / control-flow hijack. The subsequent `printf("%p", p)` also leaks the heap pointer, making exploitation reliable under ASLR. Trigger: a file starting with `C` followed by byte `0xff`.</description>
+          </finding>
+          
+          Three high-value findings (F-01 heap overflow, F-02 stack overflow, F-03 UAF with pointer disclosure) match threats T1/T2/T3+T5 in the threat model. No additional high-value memory-safety bugs in the 45-line source beyond these.
+  markdown cell:
+    source:
+      ## Step 5: Triage the raw findings
+      
+      The find agent is tuned for recall, so its output usually contains duplicates (one root cause reached from two paths) and occasionally a finding that doesn't hold up. Triage is the filter: a fresh `query()` re-reads the code, **verifies** each finding against the actual lines, **collapses duplicates** by root cause, and **derives severity** from reachability across the trust boundaries in the threat model. The find prompt asks for no severity field, and triage scores each finding itself rather than trusting the find agent's framing; an independent second read is a cheap way to catch overconfidence.
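In this notebook the model does the de-duplication, but a deterministic pre-pass can flag likely duplicates before the triage prompt is built. The sketch below parses the `<finding>` blocks the find prompt asks for and groups them by file location and category. `parse_findings` and `group_by_location` are hypothetical helpers, the sample text is invented for illustration, and matching on location is only a heuristic, not root-cause analysis.

```python
import re
from collections import defaultdict

FINDING_RE = re.compile(r"<finding>(.*?)</finding>", re.DOTALL)
FIELD_RE = re.compile(r"<(id|file|category|description)>(.*?)</\1>", re.DOTALL)


def parse_findings(text: str) -> list[dict[str, str]]:
    """Extract each <finding> block into a dict of its child fields."""
    findings = []
    for block in FINDING_RE.findall(text):
        fields = {name: value.strip() for name, value in FIELD_RE.findall(block)}
        findings.append(fields)
    return findings


def group_by_location(findings: list[dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Map (file, category) to the ids that share it; groups of 2+ are duplicate candidates."""
    groups = defaultdict(list)
    for finding in findings:
        key = (finding.get("file", ""), finding.get("category", ""))
        groups[key].append(finding.get("id", "?"))
    return dict(groups)


# Invented sample: F-01 and F-02 point at the same lines and category.
sample = """
<finding>
<id>F-01</id>
<file>canary.c:8-9</file>
<category>heap-buffer-overflow</category>
<description>memcpy of len bytes into a 32-byte allocation.</description>
</finding>
<finding>
<id>F-02</id>
<file>canary.c:8-9</file>
<category>heap-buffer-overflow</category>
<description>Same copy, reached via the dispatch in main.</description>
</finding>
<finding>
<id>F-03</id>
<file>canary.c:15-16</file>
<category>stack-buffer-overflow</category>
<description>memcpy of len bytes into a 16-byte stack array.</description>
</finding>
"""

findings = parse_findings(sample)
groups = group_by_location(findings)
duplicate_candidates = [ids for ids in groups.values() if len(ids) > 1]
print(duplicate_candidates)
```

The candidate groups could be passed to triage as hints; the final call on whether two findings share a root cause still belongs to the triage agent.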
+  code cell:
+    source:
+      TRIAGE_PROMPT = f"""\
+      Triage these findings against the source in this directory and the threat model
+      below. For each: verify it is real (cite the line), derive severity from
+      reachability across the trust boundaries in the threat model, and collapse
+      duplicates by root cause.
+      
+      ## Threat model
+      
+      {threat_model}
+      
+      ## Raw findings
+      
+      {findings_text}
+      """
+      
+      triage_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          cwd=str(TARGET_DIR),
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=["Read", "Grep"],
+          disallowed_tools=["Bash"],
+      )
+      
+      triaged_text = await collect(query(prompt=TRIAGE_PROMPT, options=triage_options))
+      print("\n" + triaged_text)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+            [tool] Read {'file_path': 'vulnerability_detection_agent/canary/cana...}
+          
+          `★ Insight ─────────────────────────────────────`
+          - All three bugs sit behind the same trust boundary (attacker-authored file → parser), reached by a single-byte dispatch in `main`. No auth, no sandbox, same-uid blast radius, so reachability is identical for all three and severity is driven by the memory-corruption primitive itself.
+          - F-03 is really two bugs fused at one call site: the UAF (T3) and the `printf("%p")` heap-pointer disclosure (T5). Collapsing them under F-03 is correct by root cause (both are in `parse_charlie`'s 6 lines), but the leak is what upgrades the UAF from "unreliable under ASLR" to "one-shot".
+          `─────────────────────────────────────────────────`
+          
+          ## Triage
+          
+          | id | verdict | line(s) | threat | severity | notes |
+          | --- | --- | --- | --- | --- | --- |
+          | F-01 | real | canary.c:8-9 | T1 | **Critical** | `malloc(32)` then `memcpy(buf, data, len)` with `len` up to 4095 (from `main` `n-1`, line 39). Attacker file `A` + ≥33 bytes overflows heap; no sandbox → RCE as invoking user. |
+          | F-02 | real | canary.c:15-16 | T2 | **Critical** | `char name[16]` then `memcpy(name, data, len)`; `name[15]=0` on line 17 runs *after* the OOB write and only truncates the `printf`. Classic stack smash; exploit reliability depends on shipped hardening (open question in threat model) but severity class unchanged. |
+          | F-03 | real | canary.c:22-27 | T3 + T5 | **Critical** | `malloc(64)` on line 22, conditional `free(p)` on line 24 when `data[0]==0xff`, unconditional `memcpy` on line 26 → UAF into freelist metadata. Line 27 `printf("charlie: %p", p)` leaks the heap pointer unconditionally, collapsing ASLR for both T1 and T3. Keep as one finding (root cause = `parse_charlie`), but call out the leak explicitly. |
+          
+          **Duplicates:** none to collapse. Three distinct root causes in three distinct parsers.
+          
+          **Coverage check against threat model:** F-01/F-02/F-03 cover T1/T2/T3/T5. T4 (NULL-`malloc`) and T6–T7 (path handling, 4 KB truncation) are intentionally out of scope as memory-safety findings and the threat model already rates them low. T8–T11 are delivery/deployment/program concerns, not source bugs — not expected in this pass.
+  markdown cell:
+    source:
+      ## Step 6: Emit a structured report
+      
+      Downstream systems (issue trackers, dashboards, SIEMs) need structured data. A final toolless `query()` converts the triaged findings into JSON that conforms to an explicit schema. We mark every field required and let every field except `severity` (which must be one of four enum values) be `null` for "not applicable", so the model doesn't have to guess about optionality; in production you'd validate with `jsonschema` and retry on failure.
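The validate-and-retry idea can be sketched without extra dependencies. The checker below is a hand-rolled stand-in for `jsonschema` that covers only the rules this report relies on (required keys and the severity enum). `generate_valid_report` takes any callable that maps feedback text to raw model output, so the model call itself is left as an assumption. All names here are hypothetical.

```python
import json

REQUIRED_KEYS = ("id", "category", "severity", "file", "description", "recommendation")
SEVERITIES = {"critical", "high", "medium", "low"}


def validation_errors(report) -> list[str]:
    """Return human-readable problems; an empty list means the report passes."""
    findings = report.get("findings") if isinstance(report, dict) else None
    if not isinstance(findings, list):
        return ["top-level object must have a 'findings' array"]
    errors = []
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            errors.append(f"findings[{i}] is not an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in finding:
                errors.append(f"findings[{i}] is missing '{key}'")
        severity = finding.get("severity")
        if not (isinstance(severity, str) and severity in SEVERITIES):
            errors.append(f"findings[{i}].severity must be one of {sorted(SEVERITIES)}")
    return errors


def generate_valid_report(generate, max_attempts: int = 3) -> dict:
    """Call generate(feedback) until its output parses and validates, or give up."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(feedback)
        try:
            report = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"Previous output was not valid JSON: {exc}"
            continue
        errors = validation_errors(report)
        if not errors:
            return report
        feedback = "Previous output failed validation: " + "; ".join(errors)
    raise ValueError(f"no valid report after {max_attempts} attempts ({feedback})")


# Stand-in for the model: fails twice, then returns a conforming report.
canned_outputs = iter([
    "not json",
    '{"findings": [{"id": "F-01", "severity": "urgent"}]}',
    '{"findings": [{"id": "F-01", "category": "heap-buffer-overflow", '
    '"severity": "critical", "file": "canary.c", "description": null, '
    '"recommendation": null}]}',
])
feedback_log = []


def fake_generate(feedback: str) -> str:
    feedback_log.append(feedback)
    return next(canned_outputs)


report = generate_valid_report(fake_generate)
print(report["findings"][0]["severity"])
```

In the notebook, `generate` would wrap the report `query()` call and append the feedback string to the prompt, so each retry tells the model what was wrong with the previous attempt.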
+  code cell:
+    source:
+      # Every key is in `required` so the model never silently drops a field; values
+      # may be null when a field is not applicable (e.g., no recommendation yet),
+      # except `severity`, which must be one of the four enum values.
+      REPORT_SCHEMA = {
+          "type": "object",
+          "properties": {
+              "findings": {
+                  "type": "array",
+                  "items": {
+                      "type": "object",
+                      "properties": {
+                          "id": {"type": ["string", "null"]},
+                          "category": {"type": ["string", "null"]},
+                          "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
+                          "file": {"type": ["string", "null"]},
+                          "description": {"type": ["string", "null"]},
+                          "recommendation": {"type": ["string", "null"]},
+                      },
+                      "required": ["id", "category", "severity", "file", "description", "recommendation"],
+                  },
+              }
+          },
+          "required": ["findings"],
+      }
+      
+      REPORT_PROMPT = f"""\
+      Convert the triaged findings below into strict JSON conforming to this schema.
+      Every field is required; use null for not-applicable. Respond with JSON only,
+      no surrounding prose or code fences.
+      
+      ## Schema
+      
+      {json.dumps(REPORT_SCHEMA, indent=2)}
+      
+      ## Triaged findings
+      
+      {triaged_text}
+      """
+      
+      report_options = ClaudeAgentOptions(
+          model=MODEL_NAME,
+          system_prompt={"type": "preset", "preset": "claude_code", "append": ENGAGEMENT_CONTEXT},
+          allowed_tools=[],
+      )
+      
+      report_json = await collect(query(prompt=REPORT_PROMPT, options=report_options))
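+      raw = report_json.strip()
+      if raw.startswith("```"):
+          # Drop the opening fence line (with any language tag) and the closing
+          # fence; partition() avoids an IndexError when there is no newline.
+          raw = raw.partition("\n")[2].rsplit("```", 1)[0].strip()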
+      try:
+          report = json.loads(raw)
+      except json.JSONDecodeError as e:
+          print(f"[report agent did not return clean JSON: {e}]\n")
+          print(raw)
+      else:
+          print(json.dumps(report, indent=2))
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          {
+            "findings": [
+              {
+                "id": "F-01",
+                "category": "Heap buffer overflow",
+                "severity": "critical",
+                "file": "canary.c",
+                "description": "parse_alpha allocates malloc(32) then memcpys attacker-controlled data of length up to 4095 bytes (from main's n-1 at line 39). An attacker-authored file beginning with 'A' followed by >=33 bytes overflows the heap allocation. With no sandbox, this yields RCE as the invoking user.",
+                "recommendation": "Validate len against the allocation size before memcpy (e.g., require len <= 32) or size the allocation from len. Reject oversized inputs at the parser boundary and add bounds-checked copy helpers."
+              },
+              {
+                "id": "F-02",
+                "category": "Stack buffer overflow",
+                "severity": "critical",
+                "file": "canary.c",
+                "description": "parse_bravo declares char name[16] then memcpys attacker-controlled data of length up to 4095 bytes into it. The name[15]=0 null-termination on line 17 executes after the out-of-bounds write and only truncates the subsequent printf; it does not prevent the stack smash. Exploit reliability depends on compile-time hardening but the vulnerability class is unchanged.",
+                "recommendation": "Bound len to sizeof(name)-1 before memcpy, or use strncpy/snprintf with explicit size. Ensure stack protectors (-fstack-protector-strong), FORTIFY_SOURCE, and PIE are enabled in the build."
+              },
+              {
+                "id": "F-03",
+                "category": "Use-after-free with heap address disclosure",
+                "severity": "critical",
+                "file": "canary.c",
+                "description": "parse_charlie mallocs 64 bytes at line 22, conditionally frees p at line 24 when data[0]==0xff, then unconditionally memcpys into p at line 26, corrupting freelist metadata. Line 27's printf(\"charlie: %p\", p) unconditionally leaks the heap pointer, defeating ASLR and upgrading the UAF (and the F-01 heap overflow) from unreliable to one-shot exploitable.",
+                "recommendation": "Set p=NULL after free and guard subsequent uses; restructure so the free and the write cannot both execute on the same pointer. Remove the %p disclosure (or gate it behind a debug build) to preserve ASLR."
+              }
+            ]
+          }
+  markdown cell:
+    source:
+      ## Summary and next steps
+      
+      You've built the full threat-model, find, triage, report pipeline with the Agent SDK: one multi-turn `ClaudeSDKClient` session for the threat model and three one-shot `query()` calls for the rest, with Claude Code's built-in file tools doing the exploration. The key patterns to take with you:
+      
+      - **`ClaudeSDKClient` for conversations, `query()` for one-shots.** The threat-model interview needs the bootstrap turn in context; find, triage, and report don't depend on each other's tool transcripts, so stateless calls are simpler.
+      - **`cwd` + `allowed_tools` replace hand-rolled tools.** `Read`/`Grep`/`Glob` scoped to the target directory is the whole find-agent scaffold.
+      - **Triage is a separate pass.** Re-verifying and re-scoring independently of the find agent catches overconfidence cheaply.
+      
+      ### Going further
+      
+      - **Use the hosted version.** [Claude Code Security](https://claude.com/claude-code-security) runs this same find-and-triage capability as a managed product, so you point it at a repo and Anthropic handles the sandboxing and scaling.
+      - **Scale to a real repo.** Point `cwd` at a real checkout, add `"Bash"` to `allowed_tools` inside a sandboxed container so the agent can compile with `-fsanitize=address` and confirm crashes, and spawn one `query()` per entry point from the threat model with `asyncio.gather`.
+      - **Wire the report into your tracker.** Validate Step 6's JSON with `jsonschema`, map it to SARIF or your ticket schema, and POST it.
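
The fan-out mentioned under "Scale to a real repo" could look roughly like the sketch below. It assumes a hypothetical `scan_entry_point` coroutine standing in for `await collect(query(...))`, and the `max_concurrent` limit is an illustrative addition, not something the notebook defines.

```python
import asyncio


async def scan_entry_point(entry_point: str) -> str:
    """Hypothetical stand-in for `await collect(query(prompt=..., options=...))`."""
    await asyncio.sleep(0)  # placeholder for the real agent call
    return f"findings for {entry_point}"


async def scan_all(entry_points: list[str], max_concurrent: int = 4) -> dict[str, str]:
    """Run one scan per entry point, at most `max_concurrent` at a time."""
    limiter = asyncio.Semaphore(max_concurrent)

    async def bounded(entry_point: str) -> str:
        async with limiter:
            return await scan_entry_point(entry_point)

    # gather() returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(bounded(ep) for ep in entry_points))
    return dict(zip(entry_points, results))


if __name__ == "__main__":
    # Function names taken from the canary.c report output above.
    targets = ["parse_alpha", "parse_bravo", "parse_charlie"]
    for name, text in asyncio.run(scan_all(targets)).items():
        print(f"{name}: {text}")
```

Bounding concurrency with a semaphore is one way to keep a large threat model from launching every agent call at once; the right limit depends on your rate limits and sandbox capacity.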

Generated by nbdime

stale

github-actions

PR Review

Recommendation: COMMENT

Summary

Ports a well-structured vulnerability-detection agent cookbook from the private repo. The multi-turn threat-model session + stateless find/triage/report pipeline is a clean, pedagogically sound demonstration of ClaudeSDKClient vs query() usage patterns.

Actionable Feedback (4 items)

06_The_vulnerability_detection_agent.ipynb (setup cell with assert TARGET_DIR.is_dir()) — Replace assert with raise RuntimeError. Python's -O flag silently disables assert statements, so a CI runner with optimization enabled would skip the guard entirely. A RuntimeError with explicit instructions is more robust:
```
if not TARGET_DIR.is_dir():
    raise RuntimeError(
        f"TARGET_DIR not found: {TARGET_DIR}\n"
        "Open this notebook from the claude_agent_sdk/ directory, "
        "or set TARGET_DIR to an absolute path."
    )
```
.gitignore — The entry claude_agent_sdk/vulnerability_detection_agent/canary/THREAT_MODEL.md is a sub-project artifact. Move it to claude_agent_sdk/.gitignore (or a new vulnerability_detection_agent/canary/.gitignore) to avoid polluting the root gitignore with path-specific patterns.
06_The_vulnerability_detection_agent.ipynb (Step 6 report cell, REPORT_SCHEMA) — The schema collapses path and line number into a single file field (e.g., "canary.c:8-9"). This will be hard for downstream consumers (SARIF, issue trackers) to parse reliably. Consider adding a separate "line" field, especially since the summary cell already recommends wiring the output into trackers.
06_The_vulnerability_detection_agent.ipynb (collect() docstring) — The 7-line docstring restates what the adjacent markdown cells already explain. Per project style, a one-liner suffices: """Drain an Agent SDK message stream, print tool calls, and return the final text."""
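
To illustrate the `file` field concern above, here is a minimal sketch of what a downstream consumer would have to do today to recover a path and line range from a combined value such as "canary.c:8-9". The `split_location` helper and the `Location` shape are hypothetical and not part of the notebook; a separate "line" field in the schema would make this parsing unnecessary.

```python
import re
from typing import Optional, TypedDict


class Location(TypedDict):
    path: str
    start_line: Optional[int]
    end_line: Optional[int]


# Accepts "path", "path:12", or "path:12-15" (assumed formats, based on "canary.c:8-9").
_LOCATION = re.compile(r"^(?P<path>.+?)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def split_location(file_field: str) -> Location:
    """Split a combined "path:start-end" string into its parts."""
    match = _LOCATION.match(file_field.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {file_field!r}")
    start = int(match["start"]) if match["start"] else None
    # A single line number means the range starts and ends on that line.
    end = int(match["end"]) if match["end"] else start
    return Location(path=match["path"], start_line=start, end_line=end)


if __name__ == "__main__":
    print(split_location("canary.c:8-9"))
```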

Detailed Review

Code Quality

The async patterns are correct throughout: async with ClaudeSDKClient for the multi-turn session and await collect(query(...)) for stateless one-shots. The collect() helper is a nice factoring that avoids repeating the isinstance ladder four times. The JSON fence-stripping in Step 6 (raw.split("\n", 1)[1].rsplit("```", 1)[0]) handles the common case. A .strip() call after the fence block would tidy any newline left around the payload, although json.loads already ignores surrounding whitespace; the less forgiving edge case is a fence with no newline after it, where split("\n", 1)[1] raises IndexError. Both are minor for a demo notebook.
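
A slightly more defensive version of the Step 6 fence handling might look like the sketch below. The `strip_code_fence` and `parse_report` names are hypothetical helpers, not part of the notebook, and the regex only targets a single fence wrapping the entire reply.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example easy to embed

# Opening fence with an optional language tag, a body, then a closing fence.
_FENCED = re.compile(
    r"^" + FENCE + r"[\w-]*[ \t]*\n(?P<body>.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the fenced body if the whole reply is fenced, else the trimmed text."""
    text = text.strip()
    match = _FENCED.match(text)
    return match.group("body").strip() if match else text


def parse_report(text: str) -> dict:
    """Parse the agent reply as JSON; raises json.JSONDecodeError on bad input."""
    return json.loads(strip_code_fence(text))


if __name__ == "__main__":
    reply = FENCE + 'json\n{"findings": []}\n' + FENCE
    print(parse_report(reply))
```

Input that does not match the fence pattern is passed through unchanged, so a malformed reply surfaces as a `json.JSONDecodeError` that the existing try/except already reports.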

Security

The canary.c intentionally contains memory-safety bugs — this is clearly labeled in both the notebook prose and the file header. The disallowed_tools=["Bash"] constraint keeping agents read-only is a good safety default, and the explanation for why Bash was excluded (needs a sandboxed container) sets the right expectation for readers who want to extend the notebook.

The ENGAGEMENT_CONTEXT system prompt appended to every agent is excellent practice — it surfaces authorization scope as a first-class concern rather than a buried comment.

Suggestions

The three learning-objective bullets in the introduction are slightly implementation-focused ("Run a bootstrap-then-interview session…") rather than outcome-focused ("Produce a structured threat model from source code alone…"). Not blocking, but outcome framing reads better for readers skimming to decide whether the notebook is relevant to them.
A ## Setup markdown cell before the %%capture %pip install cell would separate dependency installation from the conceptual steps, consistent with the pattern used in the other notebooks in this series.

Positive Notes

load_dotenv() is used correctly; no hardcoded API keys anywhere.
claude-opus-4-7 is the current latest Opus model — correct choice.
The registry entry format, Cybersecurity category addition to the schema enum, and eugeneyan-ant author entry all follow established patterns exactly.
canary.c is well-designed as a teaching artifact: three distinct bug classes, each reachable via a one-byte dispatch, in 45 lines — clean and self-contained.
The multi-turn ClaudeSDKClient session for the threat model versus single-shot query() calls for stateless steps is a textbook demonstration of when to use each pattern, and the prose in Step 3 makes the distinction explicit.

- Rebase onto current main (resolves DIRTY merge state from PRs anthropics#573, anthropics#595)
- Re-execute notebook end-to-end against live XPOZ MCP + Claude Sonnet 4.6 (804 posts sampled across Twitter/Reddit/Instagram, structured analysis committed as cell outputs per CONTRIBUTING.md guidance)
- ruff check + ruff format clean (sorted imports, formatted code)
- authors.yaml sorted alphabetically (validate_authors_sorted.py passes)
- registry.yaml description: 'analyse' -> 'analyze' (US English)

github-actions Bot previously requested changes May 4, 2026

feat: add Cybersecurity to registry category enum

1fd713e

Add Cybersecurity to .github/registry_schema.json so the new vuln detection cookbook entry passes registry-check.

github-actions Bot reviewed May 4, 2026

abelribbink self-assigned this May 5, 2026

abelribbink approved these changes May 5, 2026

abelribbink removed their assignment May 5, 2026

PedramNavid merged commit 876d099 into main May 5, 2026
9 checks passed

oferw-xpz mentioned this pull request May 6, 2026

feat(misc): Social Media Intelligence with XPOZ MCP #529

Open

9 tasks

PedramNavid merged 2 commits into main from pedram/vulnerability-detection-agent
