Commit d0d1de8

jdnichollsc and claude committed
feat: full building-block catalog, portable evals, self-contained repo

- add service-decomposition block (monolith vs microservices, granularity, gateway, discovery), wired into routing table, ownership map, building-blocks index, README; cross-linked from consistency-coordination
- reframe "common compositions" as falsifiable hypotheses + add a cross-block synthesis worksheet with 3 worked constraint cascades
- add docs/study-path.md (interview-prep path) and bring the failure-mode guide in-repo as docs/GUIDE.md
- fix confirmed errors: exactly-once wording (broker EOS vs cross-system), SSD latency figure, geo-routing trigger collision (dns vs content-delivery), caching/consistent-hashing ownership, service-decomposition latency math
- self-containment: remove build/eval scaffolding that carried machine paths; gitignore *.workflow.js; all references now in-repo
- portable evals: meta/evals/run_checks.py (self-locating stdlib checker — BOTEC golden fixture, eval-data integrity, self-containment invariant) + GitHub Actions CI
- record eval iteration-2 (with-skill 30/30, no regression; new block + synthesis validated)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 6b6fb3e commit d0d1de8

32 files changed

Lines changed: 1291 additions & 684 deletions

.github/workflows/checks.yml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+name: checks
+
+# Portable, machine-independent eval/CI gate for the plugin.
+# Runs the stdlib-only self-locating checker — BOTEC golden fixture,
+# eval-data integrity, and the self-containment invariant (no machine paths).
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+  workflow_dispatch:
+
+jobs:
+  checks:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+      - name: Run portable checks
+        run: python3 meta/evals/run_checks.py
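
The workflow above delegates everything to `meta/evals/run_checks.py`, whose source is not part of this view. As a rough illustration of what a "self-locating" checker with a "no machine paths" invariant could look like, here is a minimal stdlib-only sketch. The regex, the function names (`find_machine_paths`, `repo_root`, `scan_repo`), and the scanned file suffixes are all assumptions made for this example, not the script's actual contents.

```python
import re
from pathlib import Path

# ASSUMPTION: what counts as a "machine path" is guessed here (home directories
# on macOS, Linux, and Windows). The real checker's rules are not shown.
MACHINE_PATH = re.compile(
    r"(?<![\w.])(?:/(?:Users|home)/[^/\s]+|[A-Za-z]:\\Users\\[^\\\s]+)"
)


def find_machine_paths(text: str) -> list[str]:
    """Return every home-directory-style prefix found in the text."""
    return [m.group(0) for m in MACHINE_PATH.finditer(text)]


def repo_root(script: Path) -> Path:
    """Locate the repo root relative to <root>/meta/evals/<script>.

    Callers would pass Path(__file__).resolve(), so the result does not
    depend on the current working directory.
    """
    return script.parents[2]


def scan_repo(root: Path, suffixes=(".md", ".json", ".py", ".yml")) -> dict[str, list[str]]:
    """Map each offending file (relative to root) to the paths it leaks."""
    hits: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes or ".git" in path.parts:
            continue
        found = find_machine_paths(path.read_text(encoding="utf-8", errors="replace"))
        if found:
            hits[path.relative_to(root).as_posix()] = found
    return hits
```

A CI entry point built on this would call `scan_repo(repo_root(Path(__file__).resolve()))` and exit non-zero when the result is non-empty, which is what lets a single `python3` step act as the gate.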

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -3,3 +3,6 @@
 node_modules/
 __pycache__/
 *.pyc
+meta/*.workflow.js
+meta/**/*.workflow.js
+meta/ALIGNMENT-BRIEF.md

README.md

Lines changed: 15 additions & 10 deletions
@@ -6,7 +6,7 @@
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Claude Code](https://img.shields.io/badge/Claude_Code-Plugin-blueviolet)](https://code.claude.com/docs/en/plugins)
-[![Skills](https://img.shields.io/badge/Building_Blocks-21_Skills-brightgreen)](#building-blocks)
+[![Skills](https://img.shields.io/badge/Building_Blocks-22_Skills-brightgreen)](#building-blocks)
 [![Self-contained](https://img.shields.io/badge/Runtime_deps-none-success)](#design-principle)
 [![Providers](https://img.shields.io/badge/Providers-AWS%20%C2%B7%20Azure%20%C2%B7%20GCP%20%C2%B7%20Temporal-blue)](#provider-modularity)
 
@@ -34,7 +34,7 @@ flowchart TB
 direction TB
 L0["L0 Frame &nbsp;· requirements-scoping · back-of-the-envelope"]
 L1["L1 Edge &nbsp;&nbsp;· dns · load-balancing · content-delivery"]
-L2["L2 Contract · api-design"]
+L2["L2 Services · api-design · service-decomposition"]
 L3["L3 State &nbsp;· data-storage · caching · blob-store · sequencer · sharded-counters · distributed-search"]
 L4["L4 Async &nbsp;· messaging-streaming · task-scheduling"]
 L5["L5 Correctness · consistency-coordination"]
@@ -121,19 +121,21 @@ Claude runs the full loop — clarifies scope, sizes it with back-of-the-envelop
 
 1. **Run the workflow.** `/design <system>` runs the whole process (or dispatches to the **`system-design-orchestrator`** agent). Best for full designs and interview practice — it scores the result against the GUIDE quality bar and persists the design doc.
 2. **Start a design conversationally.** Trigger the **system-design** orchestrator skill ("design WhatsApp-scale messaging"); it runs the same loop and routes to the blocks.
+> **Preparing for an interview?** [`docs/study-path.md`](docs/study-path.md) sequences the plugin into a learn → drill → self-test path (method, numbers, building-block syllabus, mistakes checklist, and the practice bank).
+
 3. **Reason about one part.** Trigger a building block directly ("what caching strategy here?", "SQL or NoSQL for this?", "how do I shard this table?") — the recipe, trade-offs, and provider variants for just that part.
 
 ---
 
 ## Building blocks
 
-21 skills: an orchestrator, a diagram engine, and 19 building blocks arranged **bottom-up** so each layer depends only on the ones beneath it.
+22 skills: an orchestrator, a diagram engine, and 20 building blocks arranged **bottom-up** so each layer depends only on the ones beneath it.
 
 | Layer | Blocks | What they decide |
 |:------|:-------|:-----------------|
 | **L0 Frame** | `requirements-scoping` · `back-of-the-envelope` | what to build, how big |
 | **L1 Edge** | `dns` · `load-balancing` · `content-delivery` | get traffic in, served close |
-| **L2 Contract** | `api-design` | the interface boundary |
+| **L2 Services** | `api-design` · `service-decomposition` | the interface boundary + how the system is split into services |
 | **L3 State** | `data-storage` · `caching` · `blob-store` · `sequencer` · `sharded-counters` · `distributed-search` | store it, read it fast |
 | **L4 Async** | `messaging-streaming` · `task-scheduling` | decouple, schedule, absorb spikes |
 | **L5 Correctness** | `consistency-coordination` | CAP, ordering, consensus |
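
The README states that the blocks are arranged bottom-up so each layer depends only on the ones beneath it. One way to make that rule mechanically checkable is sketched below. The layer membership is copied from the L0 to L5 rows visible in this hunk (higher layers are outside the hunk and omitted); the dependency edges and the helper names `layer_of` and `upward_edges` are hypothetical, invented only to show the check.

```python
# Layer membership as listed in the hunk above (only L0-L5 are visible there).
LAYERS: dict[str, list[str]] = {
    "L0": ["requirements-scoping", "back-of-the-envelope"],
    "L1": ["dns", "load-balancing", "content-delivery"],
    "L2": ["api-design", "service-decomposition"],
    "L3": ["data-storage", "caching", "blob-store", "sequencer",
           "sharded-counters", "distributed-search"],
    "L4": ["messaging-streaming", "task-scheduling"],
    "L5": ["consistency-coordination"],
}


def layer_of(block: str) -> int:
    """Return the numeric layer (0 for L0, ...) that owns a block."""
    for name, blocks in LAYERS.items():
        if block in blocks:
            return int(name[1:])
    raise KeyError(f"unknown block: {block}")


def upward_edges(deps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (block, dependency) pairs that break the bottom-up rule.

    An edge is valid only when the dependency sits in a strictly lower layer.
    """
    return [(b, d) for b, d in deps if layer_of(d) >= layer_of(b)]


# HYPOTHETICAL edges, for illustration only; the repo's real cross-links
# are not shown in this diff.
EXAMPLE_DEPS = [
    ("consistency-coordination", "service-decomposition"),  # L5 -> L2, allowed
    ("caching", "load-balancing"),                          # L3 -> L1, allowed
    ("api-design", "messaging-streaming"),                  # L2 -> L4, flagged
]
```

Reading "beneath" strictly means same-layer references are also flagged; loosening `>=` to `>` would permit them if the catalog intends to allow peers to reference each other.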
@@ -212,7 +214,7 @@ A realistic eval harness measures whether the skills make Claude design *better*
 
 - **Eval set** `meta/evals/evals.json`: multi-turn exercises (URL shortener, rate limiter, news feed, observability pipeline, typeahead, WhatsApp) that each lead with different blocks. Scored on 6 GUIDE behaviors (clarify / quantify / trade-offs / failure / pivot / concrete-API) + a composition check.
 - **Trigger evals** `meta/evals/trigger-evals.json`: ~20 should / should-not routing queries.
-- **Run a comparison** (with-skill vs baseline, then a judge): see `meta/evals/whatsapp-eval.workflow.js`. The first run (`meta/evals/iteration-1/`) scored **with-skill 30/30 vs baseline 20/30** with composition confirmed real.
+- **Run a comparison** (with-skill vs baseline, then a judge) — the method is in `meta/evals/README.md`; run it with whatever orchestration you have. The first run (`meta/evals/iteration-1/`) scored **with-skill 30/30 vs baseline 20/30** with composition confirmed real.
 - **Deterministic check** (the one objectively-scriptable surface):
 ```bash
 python3 skills/back-of-the-envelope/scripts/test_botec.py # asserts calculator == golden fixture
@@ -235,13 +237,16 @@ system-design-skills/
 │   └── design.md  # /design <system> — runs the workflow
 ├── skills/
 │   ├── system-design/  # ORCHESTRATOR — reasoning loop + routing + templates
-│   ├── <19 building blocks>/  # SKILL.md + references/ (+ providers/ for component blocks)
+│   ├── <20 building blocks>/  # SKILL.md + references/ (+ providers/ for component blocks)
 │   └── architecture-diagram/  # DIAGRAM ENGINE — self-contained HTML+SVG
-├── meta/  # maintainer docs (not skills)
+├── docs/  # shared, in-repo references (not skills)
+│   ├── GUIDE.md  # the ten failure modes (source of the method)
+│   ├── study-path.md  # interview-prep study path (links the pieces)
+│   └── hero.png  # README banner
+├── meta/  # maintainer docs + eval set (not skills)
 │   ├── SKILL-CONTRACT.md  # the authoring contract every block follows
 │   ├── PLAN.md  # creation plan + status
-│   ├── *.workflow.js  # authoring / eval / research workflows
-│   └── evals/  # realistic eval set + results
+│   └── evals/  # realistic eval set + results (evals.json, iteration-1/)
 ├── LICENSE
 └── README.md
 ```
@@ -275,7 +280,7 @@ Built on the shoulders of the best system-design resources:
 - [*System Design Interview*](https://bytebytego.com/) (Alex Xu / ByteByteGo) — the four-step process and back-of-the-envelope numbers
 - *Grokking Modern System Design* — the bottom-up building-block catalog
 - [*Designing Data-Intensive Applications*](https://dataintensive.net/) (Martin Kleppmann) — data-systems fundamentals
-- The project's `GUIDE.md` — the ten failure modes that shape the reasoning loop
+- [`docs/GUIDE.md`](docs/GUIDE.md) — the ten system-design failure modes that shape the reasoning loop (condensed, runtime version embedded in `skills/system-design/references/failure-modes.md`)
 
 ---
 
agents/system-design-orchestrator.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ Invoke the `system-design` skill (Skill tool) to load the method — the reasoni
 
 1. **Clarify requirements** — invoke `requirements-scoping`. Separate functional / non-functional / explicitly out-of-scope. Pick 2–4 core features; say so. Ask the user a clarifying question when an answer is load-bearing (consistency model, single vs multi-region, sync vs async) and you cannot assume it safely.
 2. **Estimate scale** — invoke `back-of-the-envelope`. Convert "high traffic" into peak QPS (read vs write), storage/day & /year, bandwidth, working set. Let the numbers force the architecture.
-3. **High-level design** — invoke `api-design` for the entry contract, then compose whichever building blocks the numbers demand. The **full catalog is the building-blocks index** in the `system-design` skill (21 blocks across the bottom-up layers L0–L7) — consult it; don't design from the short list below. Common reaches: `dns`, `load-balancing`, `content-delivery` (edge); `data-storage`, `caching`, `blob-store`, `sequencer`, `sharded-counters`, `distributed-search` (state); `messaging-streaming`, `task-scheduling` (async); `observability`, `distributed-logging` (ops). Pick the *cheapest* design that meets the constraints.
+3. **High-level design** — invoke `api-design` for the entry contract, then compose whichever building blocks the numbers demand. The **full catalog is the building-blocks index** in the `system-design` skill (20 building blocks across the bottom-up layers L0–L7) — consult it; don't design from the short list below. Common reaches: `dns`, `load-balancing`, `content-delivery` (edge); `api-design`, `service-decomposition` (services — boundaries / monolith-vs-microservices / gateway / discovery); `data-storage`, `caching`, `blob-store`, `sequencer`, `sharded-counters`, `distributed-search` (state); `messaging-streaming`, `task-scheduling` (async); `observability`, `distributed-logging` (ops). Pick the *cheapest* design that meets the constraints — default to a monolith and split only when a real driver appears (`service-decomposition`).
 4. **Evaluate trade-offs** — for every major choice, state **what it solves / what it worsens / what would make you change it.** Never name a tool without this.
 5. **Stress-test failure modes** — invoke `resilience-failure`. Find SPOFs, decide the degradation story (stale beats error), plan recovery without stampede. Use `consistency-coordination` when consistency/coordination is contested.
 6. **Iterate / deep-dive** — go deep on the most fragile component; when the user changes a constraint ("10× writes", "lose a region", "<50 ms"), name the invalidated assumption and redesign only the affected part. Invoke `scaling-evolution` to project the next bottleneck.
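
Step 2 in the list above asks for "high traffic" to be converted into peak QPS, storage per day and per year, and bandwidth. The arithmetic can be sketched as follows. This is not the calculator shipped in `skills/back-of-the-envelope/scripts/`; the function name `estimate`, its parameters, and the flat `peak_factor` multiplier are assumptions chosen for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Estimate:
    avg_read_qps: float
    avg_write_qps: float
    peak_read_qps: float
    peak_write_qps: float
    storage_bytes_per_day: int
    storage_bytes_per_year: int
    peak_write_bytes_per_sec: float


def estimate(dau: int, reads_per_user: int, writes_per_user: int,
             bytes_per_write: int, peak_factor: float = 2.0) -> Estimate:
    """Turn daily-active-user assumptions into sizing numbers.

    ASSUMPTION: peak load is modelled as average load times a flat
    peak_factor; real traffic shapes need a measured ratio.
    """
    reads_per_day = dau * reads_per_user
    writes_per_day = dau * writes_per_user
    avg_read = reads_per_day / SECONDS_PER_DAY
    avg_write = writes_per_day / SECONDS_PER_DAY
    storage_day = writes_per_day * bytes_per_write
    return Estimate(
        avg_read_qps=avg_read,
        avg_write_qps=avg_write,
        peak_read_qps=avg_read * peak_factor,
        peak_write_qps=avg_write * peak_factor,
        storage_bytes_per_day=storage_day,
        storage_bytes_per_year=storage_day * 365,
        peak_write_bytes_per_sec=avg_write * peak_factor * bytes_per_write,
    )
```

For example, `estimate(dau=10_000_000, reads_per_user=20, writes_per_user=2, bytes_per_write=1_000, peak_factor=3.0)` gives roughly 231 average and 694 peak writes per second, and 20 GB of new data per day. Numbers of that size are what decide whether a single primary database is enough or sharding has to enter the design.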

commands/design.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Run the system-design process for: **$ARGUMENTS**
 Drive this as a collaborative design session using the `system-design-skills` plugin, following the GUIDE reasoning loop. Do not jump to a finished architecture.
 
 1. **Launch the orchestrator.** Use the Task tool to run the `system-design-orchestrator` agent on the problem above. It loads the `system-design` skill (the method) and composes the building-block skills.
-- If subagent launch is unavailable, run the loop inline yourself: invoke the `system-design` skill first, then route to building-block skills as each concern arises — invoke each skill, don't paraphrase it. The full catalog (21 blocks, bottom-up layers) is the building-blocks index inside the `system-design` skill; reach for whichever fit, including the less-obvious ones (`dns`, `sequencer`, `blob-store`, `observability`, `distributed-logging`, `distributed-search`, `task-scheduling`, `sharded-counters`) alongside the core (`requirements-scoping`, `back-of-the-envelope`, `api-design`, `data-storage`, `caching`, `load-balancing`, `messaging-streaming`, `consistency-coordination`, `resilience-failure`, `content-delivery`, `scaling-evolution`).
+- If subagent launch is unavailable, run the loop inline yourself: invoke the `system-design` skill first, then route to building-block skills as each concern arises — invoke each skill, don't paraphrase it. The full catalog (the 20 building blocks, bottom-up layers) is the building-blocks index inside the `system-design` skill; reach for whichever fit, including the less-obvious ones (`service-decomposition`, `dns`, `sequencer`, `blob-store`, `observability`, `distributed-logging`, `distributed-search`, `task-scheduling`, `sharded-counters`) alongside the core (`requirements-scoping`, `back-of-the-envelope`, `api-design`, `data-storage`, `caching`, `load-balancing`, `messaging-streaming`, `consistency-coordination`, `resilience-failure`, `content-delivery`, `scaling-evolution`). For "monolith vs microservices / where are the service boundaries", that is `service-decomposition`.
 
 2. **Work the loop out loud:** clarify requirements (functional / non-functional / out-of-scope) → estimate scale with numbers → propose a high-level design + API → articulate trade-offs (solves / worsens / when-to-change) per major choice → stress-test failure modes → iterate, pivoting the affected part when a constraint changes.
 
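
The loop above requires a trade-off statement (solves / worsens / when-to-change) for every major choice. A small record type can make an incomplete statement easy to spot. The class `TradeOff`, the helper `missing_parts`, and the sample entry are hypothetical and exist only to illustrate the three-part shape; the plugin itself expresses this in prose.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TradeOff:
    choice: str       # the component or decision being justified
    solves: str       # what the choice fixes
    worsens: str      # what it makes harder or more expensive
    change_when: str  # the signal that would reverse the decision


def missing_parts(t: TradeOff) -> list[str]:
    """List the fields left blank, so an unjustified choice is visible."""
    return [f.name for f in fields(t) if not getattr(t, f.name).strip()]


# ILLUSTRATIVE entry only; the wording is not taken from the plugin.
read_cache = TradeOff(
    choice="read-through cache in front of the primary store",
    solves="read latency and load on the database",
    worsens="staleness and invalidation complexity",
    change_when="writes approach reads, or stale data becomes unacceptable",
)
```

A design review could then reject any entry for which `missing_parts` returns a non-empty list, which mirrors the orchestrator's rule that a tool is never named without its trade-off.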