# docs-toolkit Architecture Reference

Technical reference for the `backend.ai-docs-toolkit` package internals.
This document helps developers and AI agents understand the codebase before making changes.

## Overview

A TypeScript-based documentation engine that transforms Markdown into PDF and HTML output.
Two rendering pipelines share a common Markdown processing core.

```
book.config.yaml ─────────┐
                          ├── markdown-processor.ts ──────→ PDF pipeline
docs-toolkit.config.yaml ─┤                                 (Playwright → pdf-lib)
                          ├── markdown-processor-web.ts ──→ HTML preview
                          │                                 (single-page, live-reload)
src/{lang}/*.md ──────────┘
```

## File Map

| File | Purpose |
|------|---------|
| `cli.ts` | CLI entry point. Routes commands: `pdf`, `preview`, `preview:html`, `init`, `agents` |
| `config.ts` | Config loading (`docs-toolkit.config.yaml`), defaults, type definitions |
| `markdown-processor.ts` | PDF Markdown pipeline. Shared utilities: `slugify`, `deduplicateH1`, `substituteTemplateVars`, `normalizeRstTables`, `convertIndentedNotes`, `resolveMarkdownPath` |
| `markdown-processor-web.ts` | Web HTML pipeline. Two-pass rendering with anchor registry. **Already has a `multiPage` flag** (currently always `false`) |
| `markdown-extensions.ts` | Admonition processing, code block title/highlight parsing, figure labels, image size hints |
| `html-builder.ts` | PDF HTML template (cover page, TOC, chapters with page breaks) |
| `html-builder-web.ts` | Web HTML template (sidebar + content, single-page layout, live-reload script) |
| `styles.ts` | PDF CSS (A4 print layout, CJK typography) |
| `styles-web.ts` | Web CSS (Infima variables, responsive layout, admonition styles) |
| `generate-pdf.ts` | PDF orchestrator. Reads config, processes Markdown, renders via Playwright |
| `pdf-renderer.ts` | Playwright PDF rendering, multi-pass page number injection |
| `preview-server.ts` | PDF preview dev server (live-reload) |
| `preview-server-web.ts` | HTML preview dev server (live-reload, image serving) |
| `version.ts` | Version resolution from `package.json` |
| `theme.ts` | PDF theme definitions |
| `sample-content.ts` / `sample-content-markdown.ts` | Style catalog sample content |
| `index.ts` | Public API exports |

## Core Data Types

```typescript
// A processed Markdown chapter ready for rendering
interface Chapter {
  title: string;        // From the book.config.yaml nav entry
  slug: string;         // slugify(title), e.g. "session-page"
  htmlContent: string;  // Rendered HTML
  headings: Heading[];  // Collected during rendering
}

interface Heading {
  level: number;  // 1-6
  text: string;   // Plain text (tags stripped)
  id: string;     // e.g. "session-page-resource-summary-panels"
}
```

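A flat `headings` array is enough to build a nested table of contents for a sidebar or TOC page. The following is an illustrative sketch only. `TocNode` and `buildToc` are hypothetical names, not toolkit exports, and the sketch assumes headings arrive in document order.

```typescript
// Same shape as the Heading interface above.
interface Heading {
  level: number;
  text: string;
  id: string;
}

// Hypothetical tree node for a sidebar / TOC (not part of the toolkit).
interface TocNode {
  heading: Heading;
  children: TocNode[];
}

// Nest headings by level, using a stack of currently open ancestors.
function buildToc(headings: Heading[]): TocNode[] {
  const roots: TocNode[] = [];
  const stack: TocNode[] = [];
  for (const heading of headings) {
    const node: TocNode = { heading, children: [] };
    // Close ancestors that are at the same level or deeper than this heading.
    while (stack.length > 0 && stack[stack.length - 1].heading.level >= heading.level) {
      stack.pop();
    }
    if (stack.length === 0) {
      roots.push(node);
    } else {
      stack[stack.length - 1].children.push(node);
    }
    stack.push(node);
  }
  return roots;
}
```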
## Anchor ID System

All heading IDs follow the pattern `{chapterSlug}-{headingSlug}`:

- Chapter slug: `slugify(nav.title)` from `book.config.yaml`
  - Example: `"Session Page"` → `"session-page"`
- Heading slug: `slugify(headingText)`
  - Example: `"Resource Summary Panels"` → `"resource-summary-panels"`
- Final ID: `"session-page-resource-summary-panels"`

Explicit anchors (`<a id="custom-id">`) keep their raw ID, without a chapter prefix.

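Here is a minimal sketch of the ID scheme. It assumes a simple slug rule: NFKC-normalize, lowercase, then collapse runs of anything that is not a letter, combining mark, or digit into `-`. The toolkit's actual `slugify` in `markdown-processor.ts` may treat punctuation or CJK text differently. `headingId` is a hypothetical helper.

```typescript
// Assumed slug rule; the real slugify() may differ in edge cases.
function slugify(text: string): string {
  return text
    .normalize("NFKC")
    .toLowerCase()
    // Keep Unicode letters, marks, and digits (so Korean/Japanese/Thai survive).
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical helper composing the final heading ID.
function headingId(chapterTitle: string, headingText: string): string {
  return `${slugify(chapterTitle)}-${slugify(headingText)}`;
}

console.log(headingId("Session Page", "Resource Summary Panels"));
// session-page-resource-summary-panels
```

Explicit `<a id>` anchors would skip `headingId` entirely and use their raw ID.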
### Anchor Registry (Web pipeline only)

Built in `markdown-processor-web.ts`:

```typescript
interface AnchorRegistry {
  anchors: Map<string, AnchorEntry[]>;  // rawId → entries across chapters
  resolvedIds: Set<string>;             // all final IDs for quick lookup
}
```

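This section does not show the shape of `AnchorEntry`. Suppose it records at least the owning chapter and the final ID. Under that assumption, pass 1 could populate the registry as follows. `createRegistry`, `registerAnchor`, and both entry fields are hypothetical.

```typescript
// Hypothetical entry shape; the real AnchorEntry may carry more fields.
interface AnchorEntry {
  chapterSlug: string; // chapter the anchor was found in
  resolvedId: string;  // final id attribute in the rendered HTML
}

interface AnchorRegistry {
  anchors: Map<string, AnchorEntry[]>;
  resolvedIds: Set<string>;
}

function createRegistry(): AnchorRegistry {
  return { anchors: new Map(), resolvedIds: new Set() };
}

// Record one anchor: headings get the chapter prefix, explicit <a id> anchors keep their raw ID.
function registerAnchor(
  registry: AnchorRegistry,
  rawId: string,
  chapterSlug: string,
  explicit: boolean,
): void {
  const resolvedId = explicit ? rawId : `${chapterSlug}-${rawId}`;
  const entries = registry.anchors.get(rawId) ?? [];
  entries.push({ chapterSlug, resolvedId });
  registry.anchors.set(rawId, entries);
  registry.resolvedIds.add(resolvedId);
}
```

Keying `anchors` by raw ID means one lookup returns every chapter that defines a given `#overview`. Pass 2 can then choose among those chapters.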
**Two-pass rendering**:
1. Pass 1: Render all chapters and collect headings and explicit anchors into the registry
2. Pass 2: Rewrite `href="#anchor"` links using the registry

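Pass 2 can be pictured as a string rewrite over each chapter's rendered HTML. Both the regex approach and the `resolve` callback are assumptions made for illustration. The toolkit may rewrite links in a different place, for example inside the `marked` renderer. `rewriteFragmentLinks` is a hypothetical name.

```typescript
// Resolver returns the new href, or null to leave the link untouched.
type Resolver = (rawId: string) => string | null;

// Rewrite every in-document link (href="#...") in rendered HTML.
function rewriteFragmentLinks(html: string, resolve: Resolver): string {
  return html.replace(/href="#([^"]+)"/g, (match: string, rawId: string) => {
    const target = resolve(rawId);
    return target === null ? match : `href="${target}"`;
  });
}
```

The sketch leaves unresolved links as they are rather than dropping them, so broken references remain visible in the output.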
**Cross-page link resolution** (`rewriteCrossPageLinks`):
- Same-chapter links: rewrite to the chapter-prefixed resolved ID
- Cross-chapter links (single-page mode): rewrite to `#resolvedId`
- Cross-chapter links (**multi-page mode**): rewrite to `./{targetSlug}.html#resolvedId`
- The `multiPage` parameter already exists but is currently always `false`

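The three rules above fit into a single resolver. This sketch is not the toolkit's `rewriteCrossPageLinks` code. `resolveHref`, the `AnchorEntry` fields, and the "first registered entry wins" choice for cross-chapter targets are all assumptions.

```typescript
interface AnchorEntry {
  chapterSlug: string; // hypothetical field: owning chapter
  resolvedId: string;  // hypothetical field: final id in rendered HTML
}

// Apply the same-chapter / cross-chapter rules to one raw anchor ID.
function resolveHref(
  rawId: string,
  currentSlug: string,
  anchors: Map<string, AnchorEntry[]>,
  multiPage: boolean,
): string | null {
  const entries = anchors.get(rawId);
  if (!entries || entries.length === 0) return null; // unknown anchor

  // Same-chapter link: prefer the entry defined in the current chapter.
  const local = entries.find((e) => e.chapterSlug === currentSlug);
  if (local) return `#${local.resolvedId}`;

  // Cross-chapter link: assume the first registered entry wins.
  const target = entries[0];
  return multiPage
    ? `./${target.chapterSlug}.html#${target.resolvedId}`
    : `#${target.resolvedId}`;
}
```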
## Markdown Processing Pipeline

Both the PDF and Web pipelines share these preprocessing steps (in order):

1. `deduplicateH1` — Remove duplicate H1 headings (RST migration artifact)
2. `substituteTemplateVars` — Replace `|year|`, `|version|`, `|date|`, etc.
3. Image path rewriting — Resolve relative paths for the target environment
4. `normalizeRstTables` — Convert RST grid tables to Markdown tables
5. `convertIndentedNotes` — Convert 3-space indented blocks to blockquotes
6. `processAdmonitions` — Convert `:::note` blocks to HTML divs with icons
7. `processCodeBlockMeta` — Extract `title="..."` and `{1,3-5}` from code fences

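Step 2 amounts to a regex replacement restricted to known variable names. The placeholder syntax comes from the list above. The `vars` map and the function body are assumptions, not the toolkit's `substituteTemplateVars`.

```typescript
// Replace |name| placeholders whose name is a known variable.
// Unknown names are left as-is so Markdown table pipes (e.g. "|a|b|") survive.
function substituteTemplateVars(
  markdown: string,
  vars: Record<string, string>,
): string {
  return markdown.replace(/\|([A-Za-z_]+)\|/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}
```

Restricting replacement to known keys is what keeps this safe to run on text that also contains pipe tables.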
Then `marked` renders the result with a custom renderer that handles headings, images, and code blocks.

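Step 7 has to split a fence info string such as `ts title="app.ts" {1,3-5}` into a language, a title, and the set of highlighted lines. The sketch below assumes the language comes first in that string. `CodeMeta` and `parseCodeMeta` are hypothetical, and the real `processCodeBlockMeta` may accept other syntax.

```typescript
interface CodeMeta {
  lang: string;
  title: string | null;
  highlight: number[]; // 1-based line numbers, sorted and deduplicated
}

// Parse `lang title="..." {1,3-5}` from a code-fence info string.
function parseCodeMeta(info: string): CodeMeta {
  const lang = info.trim().split(/\s+/)[0] ?? "";
  const titleMatch = info.match(/title="([^"]*)"/);
  const rangeMatch = info.match(/\{([\d,\s-]+)\}/);

  const lines = new Set<number>();
  if (rangeMatch) {
    for (const part of rangeMatch[1].split(",")) {
      const [startText, endText] = part.trim().split("-");
      const start = Number(startText);
      const end = endText === undefined ? start : Number(endText);
      // Skip empty or malformed parts such as "{1,}".
      if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1) continue;
      for (let n = start; n <= end; n++) lines.add(n);
    }
  }
  return {
    lang,
    title: titleMatch ? titleMatch[1] : null,
    highlight: Array.from(lines).sort((a, b) => a - b),
  };
}
```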
## Configuration

### `docs-toolkit.config.yaml` (toolkit-level)

Controls engine behavior: title, company, paths, PDF settings, language labels, and agent templates.

Key fields for the website feature:
- `languageLabels` — Display names per language
- `localizedStrings` — UI strings such as "User Guide" and "Table of Contents", per language
- `admonitionTitles` — Localized admonition type labels
- `figureLabels` — "Figure" label per language

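Per-language maps like these are usually read through a lookup that falls back to a default language. The map shape assumed here (`language → key → string`), the `en` fallback, and the keys used in the example are all hypothetical. Check `config.ts` for the real types.

```typescript
// Assumed shape: language code → string key → localized text.
type LocalizedStrings = Record<string, Record<string, string>>;

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Look up a UI string in `lang`, then in `fallbackLang`, then return the key itself.
function localize(
  strings: LocalizedStrings,
  lang: string,
  key: string,
  fallbackLang = "en",
): string {
  for (const l of [lang, fallbackLang]) {
    if (has(strings, l) && has(strings[l], key)) return strings[l][key];
  }
  return key;
}
```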
### `src/book.config.yaml` (content-level)

Defines the navigation structure per language. Each entry has a `title` and a `path`:

```yaml
navigation:
  en:
    - title: Session Page
      path: session_page/session_page.md
```

The `path` is relative to `src/{lang}/`.

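Resolving a navigation entry means joining three parts: the source root, the language, and `path`. The chapter slug comes from `title`. This sketch uses Node's `path` module. `NavEntry`, `resolveNavEntry`, and the `srcDir` argument are hypothetical. The inline slug rule is a simplified stand-in for `slugify`, and the toolkit's own `resolveMarkdownPath` may apply extra checks.

```typescript
import * as path from "node:path";

interface NavEntry {
  title: string;
  path: string; // relative to src/{lang}/
}

// Hypothetical: map a book.config.yaml entry to its Markdown file and chapter slug.
function resolveNavEntry(srcDir: string, lang: string, entry: NavEntry) {
  const file = path.join(srcDir, lang, entry.path);
  const slug = entry.title
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, "-")
    .replace(/^-+|-+$/g, ""); // simplified stand-in for slugify()
  return { file, slug };
}
```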
## Preview Server Architecture

`preview-server-web.ts`:
- Node.js `http.createServer` (no Express or other framework)
- Routes: `/` (HTML page), `/__reload` (live-reload polling), `/images/*` (static files)
- File watching with debounced rebuild (300 ms)
- Serves images from the `src/{lang}/` directory

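The debounce and routing described above take only a few lines of plain Node.js. In this sketch, the route paths and the 300 ms delay come from the list above. `debounce`, `routeOf`, and `createPreviewServer` are illustrative names. The `/__reload` and `/images/*` handler bodies are omitted, and the server is created but never started.

```typescript
import * as http from "node:http";

// Collapse bursts of file-change events into one rebuild after `delayMs` of quiet.
function debounce(fn: () => void, delayMs = 300): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(fn, delayMs);
  };
}

type Route = "page" | "reload" | "image" | "notFound";

// Classify a request path into one of the preview server's routes.
function routeOf(pathname: string): Route {
  if (pathname === "/") return "page";
  if (pathname === "/__reload") return "reload";
  if (pathname.startsWith("/images/")) return "image";
  return "notFound";
}

// Shows how the routes plug into http.createServer; not started here.
function createPreviewServer(renderPage: () => string): http.Server {
  return http.createServer((req, res) => {
    const { pathname } = new URL(req.url ?? "/", "http://localhost");
    if (routeOf(pathname) === "page") {
      res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      res.end(renderPage());
      return;
    }
    // /__reload and /images/* handlers are omitted in this sketch.
    res.writeHead(404);
    res.end();
  });
}
```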
## Extension Points for Website Feature

### Already prepared

1. **`multiPage` flag** in `rewriteCrossPageLinks()` — enables the `./slug.html#id` link format
2. **`AnchorRegistry`** — global anchor index, usable for building the search index
3. **`Chapter` type** — contains all data needed for individual page generation
4. **`styles-web.ts`** — Infima-based CSS, ready to extend

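Each `Chapter` already carries its slug and HTML, so splitting the book into pages mostly comes down to naming files and linking neighbors. This hypothetical sketch (`PagePlan` and `planPages` are not toolkit APIs) assumes one `{slug}.html` file per chapter, in navigation order. That naming is consistent with the `./slug.html#id` link format.

```typescript
interface Chapter {
  title: string;
  slug: string;
  htmlContent: string;
}

// Hypothetical per-page plan for a multi-page build.
interface PagePlan {
  fileName: string;
  chapter: Chapter;
  prev: { title: string; href: string } | null;
  next: { title: string; href: string } | null;
}

// One {slug}.html per chapter, with prev/next links following navigation order.
function planPages(chapters: Chapter[]): PagePlan[] {
  const link = (c: Chapter | undefined) =>
    c ? { title: c.title, href: `./${c.slug}.html` } : null;
  return chapters.map((chapter, i) => ({
    fileName: `${chapter.slug}.html`,
    chapter,
    prev: link(chapters[i - 1]),
    next: link(chapters[i + 1]),
  }));
}
```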
### Needs to be built

1. **`website-builder.ts`** — Multi-page HTML generator (page template with sidebar, prev/next navigation, footer)
2. **`website-generator.ts`** — Build orchestrator (read → process → write individual files + assets)
3. **`search-index-builder.ts`** — Extract text content and build an inverted-index JSON file
4. **`styles-website.ts`** (or an extension of `styles-web.ts`) — Additional CSS for pagination, search UI, footer
5. **`cli.ts`** — New `build:web` command
6. **`config.ts`** — New `website` config section (`editBaseUrl`, GitHub repo info)

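One possible design for `search-index-builder.ts` is an inverted index that maps tokens to page slugs. Runs of Han, Kana, Hangul, or Thai characters become overlapping character bigrams, and other text becomes lowercase words. The same tokenizer runs on queries in the browser. This is a sketch under those assumptions. The token rules, script list, and JSON layout are illustrative choices, and `tokenize`, `buildIndex`, and `search` are hypothetical names.

```typescript
// Characters indexed as overlapping bigrams: Han, Hiragana, Katakana, Hangul, Thai,
// plus the katakana prolonged sound mark (U+30FC), whose Script value is Common.
const BIGRAM_CHAR =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}\p{Script=Thai}\u30FC]/u;
const WORD_CHAR = /[\p{L}\p{M}\p{N}]/u;

// Lowercase words for space-delimited text; character bigrams for CJK/Thai runs.
function tokenize(text: string): string[] {
  const tokens: string[] = [];
  let word = "";
  let run: string[] = [];
  const flushWord = () => {
    if (word !== "") tokens.push(word);
    word = "";
  };
  const flushRun = () => {
    if (run.length === 1) tokens.push(run[0]);
    for (let i = 0; i + 1 < run.length; i++) tokens.push(run[i] + run[i + 1]);
    run = [];
  };
  for (const ch of text.normalize("NFKC").toLowerCase()) {
    if (BIGRAM_CHAR.test(ch)) {
      flushWord();
      run.push(ch);
    } else if (WORD_CHAR.test(ch)) {
      flushRun();
      word += ch;
    } else {
      flushWord();
      flushRun();
    }
  }
  flushWord();
  flushRun();
  return tokens;
}

// Inverted index: token → sorted slugs of pages containing it (serializable as JSON).
function buildIndex(pages: { slug: string; text: string }[]): Record<string, string[]> {
  const index = new Map<string, Set<string>>();
  for (const page of pages) {
    for (const token of tokenize(page.text)) {
      const slugs = index.get(token) ?? new Set<string>();
      slugs.add(page.slug);
      index.set(token, slugs);
    }
  }
  return Object.fromEntries(Array.from(index, ([t, s]) => [t, Array.from(s).sort()]));
}

// Client-side query: slugs of pages containing every query token.
function search(index: Record<string, string[]>, query: string): string[] {
  let result: string[] | null = null;
  for (const token of tokenize(query)) {
    const slugs = Object.prototype.hasOwnProperty.call(index, token) ? index[token] : [];
    result = result === null ? slugs : result.filter((s) => slugs.includes(s));
  }
  return result ?? [];
}
```

One known limitation of this sketch: a one-character CJK query matches only one-character runs, because longer runs are stored purely as bigrams. Storing unigrams as well would fix that, at the cost of a larger index.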
### Key design considerations

- **Image path resolution**: Images currently resolve to absolute file URLs (PDF) or server-relative paths (preview). A static website needs paths relative to each page's location.
- **CSS delivery**: CSS is currently inlined in a `<style>` tag. A static website should use a shared `.css` file to avoid duplicating it across pages.
- **Search index**: Must support CJK and Thai text (Korean, Japanese, Thai). Bigram tokenization is effective for CJK; Thai is not a CJK script but also writes words without spaces between them, so it needs a similar non-whitespace strategy. The index must work entirely client-side (no server, no CDN) for air-gapped deployments.
- **Last updated date**: `git log -1 --format=%aI -- {filepath}` is the most accurate source. Fall back to `fs.statSync().mtime` in non-git environments. This data should be collected during the build and embedded in each page.

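The last-updated lookup and its fallback could look like the sketch below. It runs exactly the `git log` command given above through Node's `child_process.execFileSync`, from the file's own directory. `lastUpdated` is a hypothetical name. The sketch also assumes that empty output means the file is untracked and should fall back to mtime.

```typescript
import { execFileSync } from "node:child_process";
import * as fs from "node:fs";
import * as path from "node:path";

// ISO-8601 author date of the last commit touching `filePath`, else the file's mtime.
function lastUpdated(filePath: string): string {
  try {
    const out = execFileSync(
      "git",
      ["log", "-1", "--format=%aI", "--", path.basename(filePath)],
      { cwd: path.dirname(filePath), encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] },
    ).trim();
    if (out !== "") return out; // empty output: file not tracked by git
  } catch {
    // git not installed, or not inside a repository: fall through to mtime.
  }
  return fs.statSync(filePath).mtime.toISOString();
}
```

Running git from the file's directory lets it find the enclosing repository even when the build is started from another working directory.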