Commit 4d5842d

Merge pull request #4 from armstrongl/feat/onboarding-docs
docs: add onboarding guides (how-it-works + get-started)
2 parents cfdb86c + cabd423 commit 4d5842d

6 files changed

Lines changed: 347 additions & 22 deletions

.github/copilot-instructions.md

Lines changed: 35 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1 +1,36 @@
11

2+
3+
4+
## vexp context tools <!-- vexp v1.2.30 -->
5+
6+
**MANDATORY: use `run_pipeline` — do NOT grep, glob, or read files manually.**
7+
vexp returns pre-indexed, graph-ranked context in a single call.
8+
9+
### Workflow
10+
1. `run_pipeline` with your task description — ALWAYS FIRST (replaces all other tools)
11+
2. Make targeted changes based on the context returned
12+
3. `run_pipeline` again only if you need more context
13+
14+
### Available MCP tools
15+
- `run_pipeline`**PRIMARY TOOL**. Runs capsule + impact + memory in 1 call.
16+
Auto-detects intent. Includes file content. Example: `run_pipeline({ "task": "fix auth bug" })`
17+
- `get_context_capsule` — lightweight, for simple questions only
18+
- `get_impact_graph` — impact analysis of a specific symbol
19+
- `search_logic_flow` — execution paths between functions
20+
- `get_skeleton` — compact file structure
21+
- `index_status` — indexing status
22+
- `get_session_context` — recall observations from sessions
23+
- `search_memory` — cross-session search
24+
- `save_observation` — persist insights (prefer run_pipeline's observation param)
25+
26+
### Agentic search
27+
- Do NOT use built-in file search, grep, or codebase indexing — always call `run_pipeline` first
28+
- If you spawn sub-agents or background tasks, pass them the context from `run_pipeline`
29+
rather than letting them search the codebase independently
30+
31+
### Smart Features
32+
Intent auto-detection, hybrid ranking, session memory, auto-expanding budget.
33+
34+
### Multi-Repo
35+
`run_pipeline` auto-queries all indexed repos. Use `repos: ["alias"]` to scope. Run `index_status` to see aliases.
36+
<!-- /vexp -->

AGENTS.md

Lines changed: 4 additions & 2 deletions
@@ -19,11 +19,13 @@ the task accurately.
1919
<!-- AGENTS-INDEX-START -->
2020

2121
| Doc | When to load | Last validated | Status | Paths |
22-
| ---------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | -------------- | ------- | ------------------------------------------------------------------------- |
22+
|---|---|---|---|---|
2323
| [AGENTS.md structure](docs/agents-md-structure.md) | Load when editing AGENTS.md preamble, modifying the index table format, or updating build-index.py. | 2026-03-12 | current | `AGENTS.md`<br>`scripts/agents/build-index.py` |
2424
| [Automation workflow](docs/automation-workflow.md) | Load when modifying GitHub Actions workflows, debugging CI runs, or changing staleness detection logic. | 2026-03-12 | current | `.github/workflows/**`<br>`scripts/agents/**` |
2525
| [Frontmatter schema](docs/frontmatter-schema.md) | Load when authoring new docs, reviewing frontmatter validation, or modifying the build-index script. | 2026-03-12 | current | `docs/**`<br>`scripts/agents/build-index.py`<br>`.agentsrc.yaml` |
26-
| [LLM prompt design (provider-agnostic)](docs/llm-prompt-design-agnostic.md) | Load when implementing a provider-agnostic LLM layer or porting frontmatter generation to non-Claude providers. | 2026-03-12 | current | |
26+
| [Get started](docs/get-started.md) | Load when setting up code-docs in a new repo, migrating existing docs to use frontmatter, or troubleshooting initial configuration. | 2026-03-28 | current | |
27+
| [How it works](docs/how-it-works.md) | Load when evaluating code-docs for adoption, onboarding to the system, or seeking an end-to-end understanding of the documentation lifecycle. | 2026-03-28 | current | |
28+
| [LLM prompt design (provider-agnostic)](docs/llm-prompt-design-agnostic.md) | Load when implementing a provider-agnostic LLM layer or porting frontmatter generation to non-Claude providers. | 2026-03-12 | current | |
2729
| [LLM prompt design for Claude Code](docs/llm-prompt-design-claude.md) | Load when modifying the Claude Code task prompt, adjusting CI frontmatter generation, or debugging LLM output. | 2026-03-12 | current | `.github/agents/**`<br>`.github/workflows/docs-sync.yml` |
2830

2931
<!-- AGENTS-INDEX-END -->

README.md

Lines changed: 8 additions & 1 deletion
@@ -96,7 +96,14 @@ No additional configuration required.
9696

9797
**Scale.** At a few dozen docs, scanning the index is fast and cheap. At several hundred, tag-based or semantic filtering would help. That's out of scope here.
9898

99-
## Detailed specs
99+
## Documentation
100+
101+
### Guides
102+
103+
- [How it works](docs/how-it-works.md) — end-to-end walkthrough of the system lifecycle
104+
- [Get started](docs/get-started.md) — add code-docs to an existing repo
105+
106+
### Detailed specs
100107

101108
- [Frontmatter schema](docs/frontmatter-schema.md)
102109
- [AGENTS.md structure](docs/agents-md-structure.md)

docs/automation-workflow.md

Lines changed: 7 additions & 19 deletions
@@ -17,8 +17,6 @@ title: "Automation workflow"
1717

1818
This system uses two discrete GitHub Actions workflows. The first keeps frontmatter and the AGENTS.md index in sync whenever a doc changes. The second detects stale docs on a schedule and when relevant code paths change. Both workflows write their changes back to the repo via pull requests rather than committing directly to any branch.
1919

20-
---
21-
2220
## Overview
2321

2422
| Workflow | File | Trigger | Responsibility |
@@ -30,8 +28,6 @@ The two workflows are intentionally decoupled. `docs-sync.yml` owns content accu
3028

3129
Both workflows delegate their logic to scripts in a `scripts/agents/` directory. Keeping logic out of YAML makes it testable locally and reusable across repos.
3230

33-
---
34-
3531
## Workflow 1: `docs-sync.yml`
3632

3733
This workflow runs whenever a file in `docs/` is added or modified on any branch. It calls an LLM to generate or refresh frontmatter, then regenerates the AGENTS.md index table, and opens a PR with both sets of changes.
@@ -70,8 +66,6 @@ The workflow opens a PR using the `peter-evans/create-pull-request` action. The
7066

7167
If a PR with this label already exists for the branch, the action updates it rather than opening a duplicate.
7268

73-
---
74-
7569
## Workflow 2: `docs-staleness.yml`
7670

7771
This workflow detects docs that are stale by two independent signals: time elapsed since `lastValidated`, and code changes that touched paths listed in a doc's frontmatter. It opens a PR that updates the `status` field in AGENTS.md for any flagged docs.
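
The second signal can be pictured as a glob match between the files changed in a push and the `paths` a doc declares. The sketch below is a minimal illustration, not the logic in `check-staleness.py`: the function name `touched_by_change` is hypothetical, and Python's `fnmatch` is assumed as a stand-in for whatever glob semantics the real script uses (in `fnmatch`, `*` also matches `/`, so `src/auth/**` covers nested files).

```python
from fnmatch import fnmatch


def touched_by_change(doc_paths: list[str], changed_files: list[str]) -> bool:
    """Return True if any changed file matches any glob in the doc's `paths`.

    Hypothetical helper: `doc_paths` is the `paths` list from a doc's
    frontmatter, `changed_files` is the list of files changed by a push.
    """
    return any(
        fnmatch(changed, pattern)
        for pattern in doc_paths
        for changed in changed_files
    )
```

A doc with `paths: []` never matches under this sketch, which lines up with time-based checks being the only signal for such docs.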
@@ -117,8 +111,6 @@ The staleness workflow opens a PR targeting `main` (not the triggering branch).
117111

118112
The PR is a notification mechanism. A human reviews it, validates the flagged docs, updates `lastValidated` in the relevant frontmatter, and merges. The `docs-sync.yml` workflow then picks up the frontmatter change and regenerates the index.
119113

120-
---
121-
122114
## Provider-agnostic LLM layer
123115

124116
The scripts call a thin wrapper (`scripts/agents/llm.py`) that abstracts the LLM provider. The wrapper reads two environment variables:
@@ -144,8 +136,6 @@ defaults:
144136

145137
Secrets are stored as GitHub Actions secrets (`AGENTS_LLM_API_KEY`) and never committed to the repo.
146138

147-
---
148-
149139
## Shared tooling
150140

151141
Both workflows call scripts in `scripts/agents/`. The scripts and their responsibilities are:
@@ -159,8 +149,6 @@ Both workflows call scripts in `scripts/agents/`. The scripts and their responsi
159149

160150
All scripts accept a `--dry-run` flag that prints intended changes without writing them. This makes local testing straightforward without needing to stub the LLM.
161151

162-
---
163-
164152
## Failure modes and guards
165153

166154
**LLM call fails:** The frontmatter script catches API errors and writes a `status: llm-error` field to the affected doc's frontmatter. The index regeneration step still runs and reflects the error status in AGENTS.md. The PR is still opened so a human can see which file failed.
@@ -173,17 +161,17 @@ All scripts accept a `--dry-run` flag that prints intended changes without writi
173161

174162
**Rate limiting:** The frontmatter script processes changed files sequentially with a configurable delay between calls (`AGENTS_LLM_DELAY_MS`, default 500ms). For repos with many simultaneous doc changes, this prevents bursting the API.
175163

176-
---
177-
178164
## Reusable template strategy
179165

180166
To stamp this system onto a new repo:
181167

182-
1. Copy `.github/workflows/docs-sync.yml` and `docs-staleness.yml`.
183-
2. Copy `scripts/agents/` in its entirety.
184-
3. Add `.agentsrc.yaml` to the repo root and set `defaults.maxAgeDays` and `llm` config.
185-
4. Add `AGENTS_LLM_API_KEY` to the repo's GitHub Actions secrets.
186-
5. Create the `docs/` directory and add an initial `AGENTS.md`.
168+
1. Copy `.github/workflows/docs-sync.yml` and `.github/workflows/docs-staleness.yml`.
169+
2. Copy `.github/agents/frontmatter-prompt.md` (the task prompt used by the docs-sync workflow).
170+
3. Copy `scripts/agents/` in its entirety.
171+
4. Copy `requirements.txt`.
172+
5. Add `.agentsrc.yaml` to the repo root and set `defaults.maxAgeDays`. The `llm:` block is optional and only needed if using the provider-agnostic LLM layer described above.
173+
6. Add `ANTHROPIC_API_KEY` to the repo's GitHub Actions secrets. This is the secret name used by `docs-sync.yml` for the Claude Code implementation. If using the provider-agnostic LLM layer instead, the secret name is `AGENTS_LLM_API_KEY`.
174+
7. Create the `docs/` directory (if it doesn't already exist) and add an initial `AGENTS.md` with boundary markers (`<!-- AGENTS-INDEX-START -->` and `<!-- AGENTS-INDEX-END -->`).
187175

188176
No other configuration is required. The push trigger paths in `docs-staleness.yml` should be updated to reflect the repo's actual code paths, but the workflow runs safely without them (it will only perform time-based checks until paths are configured).
189177

docs/get-started.md

Lines changed: 173 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,173 @@
1+
---
2+
description: "Load when setting up code-docs in a new repo, migrating existing docs to use frontmatter, or troubleshooting initial configuration."
3+
lastValidated: "2026-03-28"
4+
maxAgeDays: 90
5+
tags:
6+
- migration
7+
- onboarding
8+
- setup
9+
title: "Get started"
10+
---
11+
12+
# Get started
13+
14+
This guide walks through adding code-docs to an existing repo that already has a `docs/` folder. By the end, your repo will have automated frontmatter generation, an AGENTS.md index, and staleness detection.
15+
16+
## What to copy
17+
18+
The [reusable template strategy](automation-workflow.md#reusable-template-strategy) section in the automation workflow doc has the authoritative file checklist. In summary, you need:
19+
20+
- **Workflows**: `.github/workflows/docs-sync.yml` and `.github/workflows/docs-staleness.yml`
21+
- **Task prompt**: `.github/agents/frontmatter-prompt.md`
22+
- **Scripts**: `scripts/agents/` (all three files: `build-index.py`, `check-staleness.py`, `frontmatter.py`)
23+
- **Dependencies**: `requirements.txt` (`PyYAML >= 6.0`)
24+
- **Config**: `.agentsrc.yaml`
25+
- **Index**: `AGENTS.md` with boundary markers
26+
27+
Copy these from the [code-docs repo](https://github.com/armstrongl/code-docs) into your repo, preserving the directory structure.
28+
29+
After copying, two things need attention:
30+
31+
**GitHub Actions secret.** Add `ANTHROPIC_API_KEY` to your repo's GitHub Actions secrets (Settings > Secrets and variables > Actions). The `docs-sync.yml` workflow uses this to call Claude Code for frontmatter generation.
32+
33+
**Staleness trigger paths.** Open `.github/workflows/docs-staleness.yml` and update the `push.paths` section to include your repo's actual code paths (e.g., `src/**`, `lib/**`). Without this, the workflow only performs time-based staleness checks — it won't detect when code changes make a doc stale.
34+
35+
## Configure
36+
37+
### `.agentsrc.yaml`
38+
39+
The default config sets a single value:
40+
41+
```yaml
42+
defaults:
43+
maxAgeDays: 90
44+
```
45+
46+
This is the fallback staleness threshold for any doc that doesn't set its own `maxAgeDays` in frontmatter. Adjust it to match your team's review cadence. 90 days is a reasonable default for most repos.
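
The fallback behavior can be sketched as follows. This is an illustration under stated assumptions, not the code in `check-staleness.py`: the function names are hypothetical, the frontmatter and config are assumed to be already parsed into dictionaries, and treating a doc as stale only when its age strictly exceeds the threshold is a guess at the comparison used.

```python
from datetime import date

DEFAULT_MAX_AGE_DAYS = 90  # assumed to mirror defaults.maxAgeDays in .agentsrc.yaml


def effective_max_age(frontmatter: dict, config: dict) -> int:
    """Return the doc's own maxAgeDays, else the configured default."""
    if "maxAgeDays" in frontmatter:
        return int(frontmatter["maxAgeDays"])
    return int(config.get("defaults", {}).get("maxAgeDays", DEFAULT_MAX_AGE_DAYS))


def is_time_stale(frontmatter: dict, config: dict, today: date) -> bool:
    """Return True if more than the effective maxAgeDays have elapsed.

    The strict `>` comparison is an assumption of this sketch.
    """
    last_validated = date.fromisoformat(frontmatter["lastValidated"])
    age_days = (today - last_validated).days
    return age_days > effective_max_age(frontmatter, config)
```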
47+
48+
### `AGENTS.md`
49+
50+
Open `AGENTS.md` and customize the preamble to describe your repo. The preamble is the first thing agents read on every session. Keep it to three to five sentences that explain what the repo contains and what kind of work agents are likely to do.
51+
52+
Everything above the `<!-- AGENTS-INDEX-START -->` marker is human-authored and never touched by automation. Everything between the markers is regenerated by `build-index.py` on every run.
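
To make the boundary-marker contract concrete, here is a minimal sketch of regenerating only the region between the markers. The function name `replace_index` and the blank-line padding are assumptions for illustration; `build-index.py` may do this differently.

```python
START = "<!-- AGENTS-INDEX-START -->"
END = "<!-- AGENTS-INDEX-END -->"


def replace_index(agents_md: str, new_table: str) -> str:
    """Swap the text between the markers, leaving everything else untouched."""
    start = agents_md.find(START)
    end = agents_md.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("boundary markers missing or out of order")
    head = agents_md[: start + len(START)]  # preamble plus the start marker
    tail = agents_md[end:]                  # end marker plus anything after it
    return f"{head}\n\n{new_table.strip()}\n\n{tail}"
```

Because the preamble is copied through byte for byte, human edits above the start marker survive every regeneration.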
53+
54+
## Migrate existing docs
55+
56+
If your repo already has Markdown files in `docs/`, they need YAML frontmatter before they can appear in the AGENTS.md index. There are two paths: manual and automated.
57+
58+
### Manual: add frontmatter yourself
59+
60+
Add a YAML frontmatter block to the top of each doc. The fields appear in alphabetical order:
61+
62+
```yaml
63+
---
64+
description: "Load when [trigger conditions for when an agent should read this doc]."
65+
lastValidated: "2026-03-28"
66+
maxAgeDays: 90
67+
paths:
68+
- "src/auth/**"
69+
- "src/middleware/session.ts"
70+
tags:
71+
- auth
72+
- security
73+
title: "Authentication flow"
74+
---
75+
```
76+
77+
Two fields are yours to set:
78+
79+
- `lastValidated` — set this to today's date. You are confirming the doc is accurate as of now.
80+
- `maxAgeDays` — set this to your preferred review interval, or omit it to inherit the default from `.agentsrc.yaml`.
81+
82+
The other four fields (`title`, `description`, `paths`, `tags`) are LLM-owned. You can write them yourself, or leave them empty and let the automation fill them in (see below). If you write them manually, the automation will never overwrite them.
83+
84+
The `description` field is the most important. Write it as a trigger condition starting with "Load when", not a topic summary. Keep it under 160 characters.
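
A quick local check for these two rules might look like the sketch below. The helper name `description_problems` is hypothetical and not part of the code-docs scripts.

```python
def description_problems(description: str, max_len: int = 160) -> list[str]:
    """Return a list of rule violations for a frontmatter description."""
    problems = []
    if not description.startswith("Load when"):
        problems.append('does not start with "Load when"')
    if len(description) >= max_len:
        # "Under 160 characters" is read here as a strict limit.
        problems.append(f"is {len(description)} characters, expected under {max_len}")
    return problems
```

An empty list means the description passes both checks. It says nothing about whether the trigger condition is actually useful, which still needs a human read.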
85+
86+
For docs that don't map to specific code paths (architectural overviews, onboarding guides, process docs), set `paths: []` to indicate there are no code paths. Staleness detection will use time-based checks only. If you omit `paths`, automation may add `paths: []` for you.
87+
88+
### Automated: let docs-sync generate frontmatter
89+
90+
If you prefer, add minimal frontmatter to each doc (just `lastValidated` and `maxAgeDays`) and push:
91+
92+
```yaml
93+
---
94+
lastValidated: "2026-03-28"
95+
maxAgeDays: 90
96+
---
97+
```
98+
99+
When the push lands, `docs-sync.yml` detects the changed files and calls Claude Code to generate the missing `title`, `description`, `paths`, and `tags`. It opens a PR with the generated values so you can review them before merging.
100+
101+
This is the fastest path for migrating many docs at once. Push them all in one commit, review the generated descriptions in the PR, and correct any that don't read as clear trigger conditions.
102+
103+
## Verify
104+
105+
After adding frontmatter to your docs (manually or via automation), run the index builder to confirm everything is wired correctly:
106+
107+
```bash
108+
python scripts/agents/build-index.py
109+
```
110+
111+
Check the output in `AGENTS.md`:
112+
113+
- Each doc should have a row in the index table with correct title, description, date, and status.
114+
- No rows should show `missing fields` warnings. If they do, the doc is missing one of the four required frontmatter fields (`title`, `description`, `lastValidated`, `maxAgeDays`).
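
For a pre-flight check before running the index builder, the required-field rule can be sketched like this. The helper is hypothetical and assumes the frontmatter has already been parsed into a dictionary (for example with PyYAML); it is not the check `build-index.py` itself performs.

```python
REQUIRED_FIELDS = ("title", "description", "lastValidated", "maxAgeDays")


def missing_fields(frontmatter: dict) -> list[str]:
    """Return the required fields that are absent or empty, in a fixed order."""
    return [
        name
        for name in REQUIRED_FIELDS
        if frontmatter.get(name) in (None, "")
    ]
```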
115+
116+
If you're using a virtual environment for Python dependencies, set it up first:
117+
118+
```bash
119+
python -m venv .venv
120+
source .venv/bin/activate
121+
pip install -r requirements.txt
122+
```
123+
124+
## Nested directory structures
125+
126+
The default `build-index.py` scans `docs/*.md` — a flat glob that does not recurse into subdirectories. If your repo organizes docs in a nested structure like `docs/guide/`, you need to update the `--docs-dir` argument and fix the path resolution in the script.
127+
128+
The main issue is the repo-root derivation. The script uses `os.path.dirname(docs_dir)` to find the repo root, which assumes a single directory level. For `docs/guide/`, you need an extra `os.path.dirname()` call:
129+
130+
```python
131+
# Default: one level up from docs/ -> repo root
132+
repo_root = os.path.dirname(docs_dir)
133+
134+
# Nested: two levels up from docs/guide/ -> repo root
135+
repo_root = os.path.dirname(os.path.dirname(docs_dir))
136+
```
137+
138+
Without this fix, AGENTS.md links will have wrong relative paths (e.g., `guide/getting-started.md` instead of `docs/guide/getting-started.md`).
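
An alternative that does not depend on nesting depth is to walk upward until a known marker file is found. This is a sketch of a different technique from the `os.path.dirname()` approach the script uses; the function names and the choice of `AGENTS.md` as the marker are assumptions.

```python
from pathlib import Path


def find_repo_root(docs_dir: str, marker: str = "AGENTS.md") -> Path:
    """Walk up from docs_dir until a directory containing `marker` is found."""
    start = Path(docs_dir).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")


def link_path(doc_path: str, repo_root: Path) -> str:
    """Return the doc's path relative to the repo root, using forward slashes."""
    return Path(doc_path).resolve().relative_to(repo_root).as_posix()
```

With this approach, `docs/` and `docs/guide/` resolve to the same root, so links keep their `docs/` prefix at any depth.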
139+
140+
## Troubleshooting
141+
142+
**Docs not appearing in AGENTS.md.** The doc is missing a valid frontmatter block (delimited by `---` at the top of the file), or one of the four required fields (`title`, `description`, `lastValidated`, `maxAgeDays`) is absent. `build-index.py` skips files with no frontmatter and emits a warning row for files with incomplete frontmatter. Run `build-index.py` locally and check for warnings.
143+
144+
**docs-sync PR not appearing after push.** The `ANTHROPIC_API_KEY` secret is not set, or the Claude Code step failed. The workflow uses `continue-on-error: true` on the Claude Code step, so failures are silent. Check the workflow run logs in GitHub Actions for details. The index regeneration step (`build-index.py`) still runs even if Claude Code fails, so the AGENTS.md table will update — but LLM-owned fields that were missing will stay missing.
145+
146+
**Poor auto-generated descriptions.** Descriptions are never regenerated after initial creation. If Claude Code produced a vague or inaccurate description, edit the `description` field in the doc's frontmatter by hand. Review generated descriptions carefully in the first PR, because the automation will not revisit them later.
147+
148+
**Markdownlint MD025 errors.** If your repo uses markdownlint, the YAML frontmatter `title:` field is treated as an H1 heading by default. Combined with a `# Heading` in the doc body, this triggers MD025 (multiple top-level headings). Add this to your `.markdownlint-cli2.yaml`:
149+
150+
```yaml
151+
MD025:
152+
front_matter_title: ""
153+
```
154+
155+
**AGENTS.md invisible to git.** Some tools (such as OpenAI Codex CLI) add generated files to `.git/info/exclude`, which makes them invisible to `git status`. Run `git check-ignore -v AGENTS.md` to check. If it is excluded, remove the relevant line from `.git/info/exclude`.
156+
157+
**`pip install PyYAML` fails with PEP 668.** On systems where the Python installation is marked as externally managed (PEP 668), a system-wide `pip install` is refused with an `externally-managed-environment` error. Use a virtual environment instead:
158+
159+
```bash
160+
python -m venv .venv
161+
source .venv/bin/activate
162+
pip install -r requirements.txt
163+
```
164+
165+
**Multi-commit pushes miss changed files.** The default `docs-sync.yml` uses `HEAD~1..HEAD` to detect changed files, which only sees the last commit. For pushes with multiple commits, use the full push range in your workflow:
166+
167+
```yaml
168+
env:
169+
BEFORE_SHA: ${{ github.event.before }}
170+
AFTER_SHA: ${{ github.sha }}
171+
run: |
172+
FILES=$(git diff --name-only --diff-filter=AM "$BEFORE_SHA" "$AFTER_SHA" -- 'docs/*.md')
173+
```
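
One edge case the push range does not cover by itself: GitHub reports an all-zero `before` SHA when a push creates a branch, so there is no base commit to diff against. The sketch below shows one way to choose the revision pair, written in Python for illustration. The function name is hypothetical, and falling back to `HEAD~1..HEAD` in that case is an assumption, not something the default workflow is known to do.

```python
ZERO_SHA = "0" * 40  # `before` value sent when a branch is first pushed


def changed_docs_command(before_sha: str, after_sha: str) -> list[str]:
    """Build the `git diff` argv listing docs added or modified in a push."""
    base = ["git", "diff", "--name-only", "--diff-filter=AM"]
    if not before_sha or before_sha == ZERO_SHA:
        # No usable base commit: fall back to the last commit only (assumption).
        revs = ["HEAD~1", "HEAD"]
    else:
        revs = [before_sha, after_sha]
    return [*base, *revs, "--", "docs/*.md"]
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)` from a script step.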
