Commit 92ee4f0

QVAC-18431 Docs: markdown negotiation - support for AI agents (#2375)
* doc: fix broken redirect rules
* doc: add fixed redirects for versioned docs
* doc: fix versioned docs - re-add hardcoded rules as Sevalla doesn't handle splats correctly
* doc: test to use placeholders rather than hardcoded paths, using Sevalla syntax
* doc: fix versioned docs - re-add hardcoded rules as Sevalla doesn't handle splats correctly
* doc: test redirect for markdown negotiation
* doc: test markdown negotiation, with only hardcoded paths
* doc: test markdown negotiation with splats
* doc: markdown negotiation tests - placeholders
* doc: final test for markdown negotiation: validating status code
* doc: standardize the trailing slash at the end of all URLs
* doc: testing deduplication of URLs without trailing slash
* doc: redirects - test - add protection for URLs with dot
* doc: redirects - test - markdown negotiation - with correct syntax
* doc: redirects - test - markdown negotiation - with correct syntax
* doc: redirects - test - markdown negotiation - with correct syntax
* doc: redirects - test - fix versioned paths
* doc: fix missing .md files not being generated - versioned API pages
* doc: add comments in redirects to remember why it was required to add the 404 block at the bottom
1 parent 8d1d73a commit 92ee4f0

6 files changed

Lines changed: 181 additions & 95 deletions

docs/website/public/_redirects

Lines changed: 88 additions & 57 deletions
@@ -1,127 +1,158 @@
1-
# Markdown content negotiation — serve the .md sibling of a page when the
2-
# client asks for it via `Accept: text/markdown`. Home maps to `/index.md`;
3-
# every other page `/foo/bar/` maps to `/foo/bar.md`. We need both the
4-
# trailing-slash and no-slash variants so the `:splat` placeholder doesn't
5-
# produce `/foo/.md`.
6-
# / /index.md 301 Header:Accept=text/markdown
7-
# /*/ /:splat.md 301 Header:Accept=text/markdown
8-
# /* /:splat.md 301 Header:Accept=text/markdown
1+
# Markdown content negotiation — serve the `.md` sibling of a page when
2+
# the client requests it via `Accept: text/markdown`. Home maps to
3+
# `/index.md`; every other page `/a/b/.../z/` maps to `/a/b/.../z.md`.
4+
#
5+
# Three properties of these rules and why they are required:
6+
#
7+
# 1. `!` (force) — the source path has an underlying static file (the
8+
# HTML page) that would otherwise be served; the force flag is what
9+
# lets the rule fire at all.
10+
#
11+
# 2. `303` (See Other) — the rule MUST be a 3xx redirect, not a 2xx
12+
# rewrite. With a 2xx rewrite the response body for the `.md` content
13+
# gets cached by Cloudflare under the page's URL key (Cloudflare does
14+
# not honour `Vary: Accept`), poisoning the cache for subsequent
15+
# HTML clients. A redirect is fetched anew by the client, so the
16+
# Markdown response is cached under the `.md` URL key and the HTML
17+
# response stays cached under the page URL key. Among 3xx codes,
18+
# `303` is the semantically correct match for content negotiation
19+
# ("the representation you asked for lives at another URI").
20+
#
21+
# 3. One rule per nesting depth, no splat. `_redirects` placeholders
22+
# (`:name`) match a single path segment — verified empirically
23+
# against Sevalla (a single `/:slug/` rule served HTML, not Markdown,
24+
# for `/ai-capabilities/text-generation/`). Covering the site
25+
# therefore requires one rule per depth. The splat form
26+
# (`/* /:splat.md`) would be greedy but empirically crashes the
27+
# Sevalla edge worker (HTTP 1101) when combined with `Header:Accept`
28+
# — bug in their parser. Current site max depth is 3 segments; one
29+
# extra level is included as headroom.
30+
#
31+
# Loop safety: the source pattern always ends in `/`, the rewrite target
32+
# always ends in `.md` (no slash). Those two URL spaces are disjoint, so
33+
# the client's follow-up `GET /foo.md` does not re-match the rule. This
34+
# only holds because all incoming page requests carry a trailing slash
35+
# (Sevalla's Pretty URLs normalizes bare paths first) AND because Pretty
36+
# URLs skips paths whose final segment contains a dot — so `/foo.md` is
37+
# never normalized into `/foo.md/`, which would otherwise loop.
38+
/ /index.md 303! Header:Accept=text/markdown
39+
/:a/ /:a.md 303! Header:Accept=text/markdown
40+
/:a/:b/ /:a/:b.md 303! Header:Accept=text/markdown
41+
/:a/:b/:c/ /:a/:b/:c.md 303! Header:Accept=text/markdown
42+
/:a/:b/:c/:d/ /:a/:b/:c/:d.md 303! Header:Accept=text/markdown
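Reviewer note: the path mapping and the loop-safety argument in the comment block above can be sketched as a pure function. This is a minimal model, not Sevalla's matcher: the name `markdownTarget` and the constant `MAX_DEPTH` are hypothetical, and the only behaviour assumed is what the comments state (one placeholder per segment, sources always end in `/`, targets always end in `.md`).

```typescript
// Hypothetical helper mirroring the five negotiation rules in this hunk.
// Returns the `.md` redirect target, or null when no rule would match.
const MAX_DEPTH = 4; // assumption: equals the deepest rule, /:a/:b/:c/:d/

function markdownTarget(pagePath: string): string | null {
  if (pagePath === "/") return "/index.md";
  // Every rule source ends in a trailing slash, so `/foo.md` never re-matches.
  if (!pagePath.startsWith("/") || !pagePath.endsWith("/")) return null;
  const segments = pagePath.slice(1, -1).split("/");
  // One rule per nesting depth: deeper paths are simply not covered.
  if (segments.length > MAX_DEPTH) return null;
  // A placeholder stands for exactly one non-empty segment.
  if (segments.some((segment) => segment.length === 0)) return null;
  return "/" + segments.join("/") + ".md";
}
```

Feeding a redirect target back in (for example `/foo.md`) yields `null`, which is the disjointness property the loop-safety paragraph relies on.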
943

1044
# Sevalla CDN does not resolve directory-to-index.html for path segments
1145
# containing dots (e.g. `v0.7.x`). The `:version` placeholder matches a
12-
# single path segment between slashes — so the trailing-slash variant
13-
# captures the version without the trailing slash, avoiding the double-
14-
# slash issue that the splat (`*`) has.
46+
# single path segment between slashes; the `200` rule rewrites each
47+
# versioned page's trailing-slash form to its `index.html`.
48+
#
49+
# Sevalla's Pretty URLs feature normally 301-redirects bare paths to the
50+
# trailing-slash form, but empirically it SKIPS paths whose final segment
51+
# contains a dot (treats them as file requests). So we cannot rely on it
52+
# for `vX.Y.Z` segments — we provide an explicit bare→with-slash 301
53+
# below.
54+
#
55+
# ORDER MATTERS (first-match-wins). The `200` rewrites for the
56+
# with-slash source MUST come before the `301` redirects for the bare
57+
# source. Sevalla's placeholder matcher is lenient about trailing slash
58+
# in the source pattern: a source ending in `:version` (no slash) will
59+
# match both `/foo/X` AND `/foo/X/`. If the `301` rule appeared first,
60+
# it would match the with-slash request too and 301 to itself — an
61+
# infinite redirect loop (verified empirically). Putting the `:version/`
62+
# rule first, with literal trailing slash, makes the more-specific
63+
# pattern win for with-slash requests, leaving the `301` rule to handle
64+
# only the bare form.
1565
#
1666
# Syntax mirrors the doc example
1767
# (https://docs.sevalla.com/static-sites/redirects#placeholders):
1868
# /store/:category/:item /products/:category/:item
19-
# No explicit status code — defaults to 301 redirect.
20-
21-
/reference/api/:version/ /reference/api/:version/index.html 200
22-
/reference/api/:version /reference/api/:version/index.html 200
2369

70+
/reference/api/:version/ /reference/api/:version/index.html 200
2471
/reference/release-notes/:version/ /reference/release-notes/:version/index.html 200
25-
/reference/release-notes/:version /reference/release-notes/:version/index.html 200
72+
/reference/api/:version /reference/api/:version/ 301
73+
/reference/release-notes/:version /reference/release-notes/:version/ 301
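Reviewer note: the ORDER MATTERS comment can be illustrated with a small first-match-wins simulation. This is a simplified model built only from the behaviour described in the comment (a source ending in a placeholder also matches the with-slash path); `matches`, `firstMatch`, and the `Rule` type are hypothetical names, not part of Sevalla's implementation.

```typescript
type Rule = { source: string; target: string; status: 200 | 301 };

// Assumed leniency: a source without a trailing slash also matches the
// same path with a trailing slash.
function matches(source: string, path: string): boolean {
  const lenient = !source.endsWith("/") && path.endsWith("/");
  const src = source.split("/");
  const req = (lenient ? path.slice(0, -1) : path).split("/");
  if (src.length !== req.length) return false;
  return src.every((part, i) => {
    const segment = req[i] ?? "";
    return part.startsWith(":") ? segment.length > 0 : part === segment;
  });
}

function firstMatch(rules: Rule[], path: string): { status: number; location: string } | null {
  for (const rule of rules) {
    if (!matches(rule.source, path)) continue;
    // This sketch only handles the single `:version` placeholder.
    const version = path.split("/").filter((s) => s.length > 0).pop() ?? "";
    return { status: rule.status, location: rule.target.replace(":version", version) };
  }
  return null;
}

const rewrite: Rule = {
  source: "/reference/api/:version/",
  target: "/reference/api/:version/index.html",
  status: 200,
};
const redirect: Rule = {
  source: "/reference/api/:version",
  target: "/reference/api/:version/",
  status: 301,
};

const committedOrder = [rewrite, redirect];
const swappedOrder = [redirect, rewrite];
```

With `swappedOrder`, a request for `/reference/api/v0.7.x/` hits the `301` rule and is sent to the same URL it asked for, which is the loop the comment warns about; with `committedOrder` the `200` rewrite wins for the with-slash form.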
2674

2775
# ======================================================================
2876
# PERMANENT REDIRECTS DUE TO CONTENT MOVE
2977
# ----------------------------------------------------------------------
3078
# Everything above is pattern-based (wildcards / :placeholders).
3179
# Everything below is a 1:1 path move kept for backward compatibility
3280
# with links from qvac.tether.io and older bookmarks (301 → current IA).
81+
# Sevalla's Pretty URLs 301-normalizes bare-path requests to the
82+
# trailing-slash form before these rules run, so only the with-slash
83+
# variants are listed.
3384
# ======================================================================
3485

3586
# Section rename: /about-qvac/* → /about/* (+ deletions folded into home)
3687
/about-qvac/welcome/ / 301
37-
/about-qvac/welcome / 301
3888
/about-qvac/flagship-apps/ / 301
39-
/about-qvac/flagship-apps / 301
4089
/about-qvac/how-it-works/ /about/how-it-works/ 301
41-
/about-qvac/how-it-works /about/how-it-works/ 301
4290
/about-qvac/public-launch/ /about/public-launch/ 301
43-
/about-qvac/public-launch /about/public-launch/ 301
4491
/about-qvac/vision/ /about/vision/ 301
45-
/about-qvac/vision /about/vision/ 301
4692

4793
# Section rename: /sdk/getting-started/* → top-level guides
4894
/sdk/getting-started/ /introduction/ 301
49-
/sdk/getting-started /introduction/ 301
5095
/sdk/getting-started/quickstart/ /quickstart/ 301
51-
/sdk/getting-started/quickstart /quickstart/ 301
5296
/sdk/getting-started/installation/ /installation/ 301
53-
/sdk/getting-started/installation /installation/ 301
5497
/sdk/getting-started/configuration/ /configuration/ 301
55-
/sdk/getting-started/configuration /configuration/ 301
5698

5799
# Section rename: /sdk/examples/ai-tasks/* → /ai-capabilities/*
58100
/sdk/examples/ai-tasks/completion/ /ai-capabilities/text-generation/ 301
59-
/sdk/examples/ai-tasks/completion /ai-capabilities/text-generation/ 301
60101
/sdk/examples/ai-tasks/fine-tuning/ /ai-capabilities/fine-tuning/ 301
61-
/sdk/examples/ai-tasks/fine-tuning /ai-capabilities/fine-tuning/ 301
62102
/sdk/examples/ai-tasks/image-generation/ /ai-capabilities/image-generation/ 301
63-
/sdk/examples/ai-tasks/image-generation /ai-capabilities/image-generation/ 301
64103
/sdk/examples/ai-tasks/multimodal/ /ai-capabilities/multimodal/ 301
65-
/sdk/examples/ai-tasks/multimodal /ai-capabilities/multimodal/ 301
66104
/sdk/examples/ai-tasks/ocr/ /ai-capabilities/ocr/ 301
67-
/sdk/examples/ai-tasks/ocr /ai-capabilities/ocr/ 301
68105
/sdk/examples/ai-tasks/rag/ /ai-capabilities/rag/ 301
69-
/sdk/examples/ai-tasks/rag /ai-capabilities/rag/ 301
70106
/sdk/examples/ai-tasks/text-embeddings/ /ai-capabilities/text-embeddings/ 301
71-
/sdk/examples/ai-tasks/text-embeddings /ai-capabilities/text-embeddings/ 301
72107
/sdk/examples/ai-tasks/text-to-speech/ /ai-capabilities/text-to-speech/ 301
73-
/sdk/examples/ai-tasks/text-to-speech /ai-capabilities/text-to-speech/ 301
74108
/sdk/examples/ai-tasks/transcription/ /ai-capabilities/transcription/ 301
75-
/sdk/examples/ai-tasks/transcription /ai-capabilities/transcription/ 301
76109
/sdk/examples/ai-tasks/translation/ /ai-capabilities/translation/ 301
77-
/sdk/examples/ai-tasks/translation /ai-capabilities/translation/ 301
78110
/sdk/examples/ai-tasks/voice-assistant/ /ai-capabilities/voice-assistant/ 301
79-
/sdk/examples/ai-tasks/voice-assistant /ai-capabilities/voice-assistant/ 301
80111

81112
# Section rename: /sdk/examples/p2p/* → /p2p-capabilities/*
82113
/sdk/examples/p2p/blind-relays/ /p2p-capabilities/blind-relays/ 301
83-
/sdk/examples/p2p/blind-relays /p2p-capabilities/blind-relays/ 301
84114
/sdk/examples/p2p/delegated-inference/ /p2p-capabilities/delegated-inference/ 301
85-
/sdk/examples/p2p/delegated-inference /p2p-capabilities/delegated-inference/ 301
86115

87116
# Section rename: /sdk/examples/utilities/* → /runtime/*, /models/*, /configuration/plugins/*
88-
/sdk/examples/utilities/logging/ /runtime/logging/ 301
89-
/sdk/examples/utilities/logging /runtime/logging/ 301
90-
/sdk/examples/utilities/profiler/ /runtime/profiler/ 301
91-
/sdk/examples/utilities/profiler /runtime/profiler/ 301
92-
/sdk/examples/utilities/runtime-lifecycle/ /runtime/lifecycle/ 301
93-
/sdk/examples/utilities/runtime-lifecycle /runtime/lifecycle/ 301
94-
/sdk/examples/utilities/download-lifecycle/ /models/download-lifecycle/ 301
95-
/sdk/examples/utilities/download-lifecycle /models/download-lifecycle/ 301
96-
/sdk/examples/utilities/sharded-models/ /models/sharded-models/ 301
97-
/sdk/examples/utilities/sharded-models /models/sharded-models/ 301
98-
/sdk/examples/utilities/plugin-system/ /configuration/plugins/ 301
99-
/sdk/examples/utilities/plugin-system /configuration/plugins/ 301
117+
/sdk/examples/utilities/logging/ /runtime/logging/ 301
118+
/sdk/examples/utilities/profiler/ /runtime/profiler/ 301
119+
/sdk/examples/utilities/runtime-lifecycle/ /runtime/lifecycle/ 301
120+
/sdk/examples/utilities/download-lifecycle/ /models/download-lifecycle/ 301
121+
/sdk/examples/utilities/sharded-models/ /models/sharded-models/ 301
122+
/sdk/examples/utilities/plugin-system/ /configuration/plugins/ 301
100123
/sdk/examples/utilities/write-custom-plugin/ /configuration/plugins/write-custom-plugin/ 301
101-
/sdk/examples/utilities/write-custom-plugin /configuration/plugins/write-custom-plugin/ 301
102124

103125
# Section rename: /sdk/api → /reference/api, /sdk/release-notes → /reference/release-notes
104126
/sdk/api/ /reference/api/ 301
105-
/sdk/api /reference/api/ 301
106127

107128
# Section rename: /sdk/tutorials/* → /tutorials/*
108129
/sdk/tutorials/electron/ /tutorials/electron/ 301
109-
/sdk/tutorials/electron /tutorials/electron/ 301
110130
/sdk/tutorials/expo/ /tutorials/expo/ 301
111-
/sdk/tutorials/expo /tutorials/expo/ 301
112131

113132
# Section rename: /http-server → /cli/http-server
114133
/http-server/ /cli/http-server/ 301
115-
/http-server /cli/http-server/ 301
116134

117135
# ======================================================================
118136
# CATCH-ALL 404 — MUST BE THE LAST RULE
119137
# ----------------------------------------------------------------------
120138
# Sevalla serves /404.html automatically for unresolved paths, but with
121139
# HTTP 200, which breaks SEO, link checkers, analytics and HTTP clients.
122140
# This explicit rule forces a real `404 Not Found` status while still
123-
# rendering the same /404.html body. Only reached when no static file and
124-
# no rule above matches.
141+
# rendering the same /404.html body. Reached in two distinct cases:
142+
#
143+
# 1. No static file matches the request AND no earlier rule matches.
144+
# 2. An earlier `200` rewrite points at a target that doesn't exist
145+
# (e.g. `/reference/api/notaversion/` → rewrite to
146+
# `/reference/api/notaversion/index.html` which isn't built).
147+
# Sevalla's pipeline continues evaluating rules in this case, and
148+
# this catch-all gives those orphaned rewrites a clean 404 instead
149+
# of falling back to Sevalla's soft-200 default.
150+
#
151+
# Known edge case (intentionally not fixed): a direct request to
152+
# `/404.html` returns 200 because Sevalla's static-file resolution runs
153+
# before `_redirects` and serves the file as-is, bypassing this rule.
154+
# `/404.html` is not in the sitemap, not linked from any page, and not
155+
# referenced in any external surface; practical traffic to it is zero.
125156
# ======================================================================
126157

127158
/* /404.html 404
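Reviewer note: the two cases that reach the catch-all, plus the known `/404.html` edge case, can be modelled as below. This is a deliberately reduced sketch of the pipeline as the comments describe it (static files first, then rules in order, with a missing rewrite target falling through); the `staticFiles` set and the `resolve` function are hypothetical, and the forced negotiation rules are left out.

```typescript
type Outcome = { status: number; body: string };

// Hypothetical build output: the files that exist on the CDN.
const staticFiles = new Set<string>([
  "/404.html",
  "/reference/api/v0.7.x/index.html",
]);

function resolve(path: string): Outcome {
  // 1. Static-file resolution runs before `_redirects`.
  if (staticFiles.has(path)) return { status: 200, body: path };

  // 2. `200` rewrite for versioned API pages. When the target file is
  //    missing, evaluation continues with the later rules.
  const versioned = /^\/reference\/api\/([^/]+)\/$/.exec(path);
  if (versioned) {
    const target = `/reference/api/${versioned[1]}/index.html`;
    if (staticFiles.has(target)) return { status: 200, body: target };
  }

  // 3. Catch-all: `/* /404.html 404`.
  return { status: 404, body: "/404.html" };
}
```

In this model `/reference/api/notaversion/` and any unknown path both end with a real 404 status, while a direct request for `/404.html` is answered by step 1 with 200, matching the edge case the comment accepts.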

docs/website/scripts/generate-llm-md-files.ts

Lines changed: 13 additions & 6 deletions
@@ -18,13 +18,20 @@
1818
* this script handles the layout entirely in user-space.
1919
*
2020
* URL → file mapping:
21-
* '/' → out/index.md
22-
* '/quickstart' → out/quickstart.md
23-
* '/reference/api' → out/reference/api.md
21+
* '/' → out/index.md
22+
* '/quickstart' → out/quickstart.md
23+
* '/reference/api' → out/reference/api.md
24+
* '/reference/api/v0.10.x' → out/reference/api/v0.10.x.md (archived)
2425
*
25-
* Archived per-section versions (e.g. `/reference/api/v0.7.0`) are not in
26-
* the manifest, so they are not emitted as `.md` either — consistent with
27-
* `llms.txt`, `llms-full.txt`, `sitemap.xml`, and per-page `noindex`.
26+
* Archived per-section versions ARE included in the manifest. The HTML
27+
* for those pages renders publicly (with `noindex` + canonical-to-latest
28+
* for SEO posture); the per-page `.md` is just its Markdown
29+
* representation, so it must exist for the in-page "Copy as Markdown"
30+
* action and the `Accept: text/markdown` content-negotiation flow
31+
* (configured in `public/_redirects`) to resolve cleanly. The aggregate
32+
* catalogs (`llms.txt`, `llms-full.txt`, `sitemap.xml`) keep filtering
33+
* archives — see the comment in `llm-md-manifest.json/route.ts` for the
34+
* rationale.
2835
*
2936
* Usage (invoked from `package.json` after `next build`):
3037
* bun run scripts/generate-llm-md-files.ts
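Reviewer note: the URL → file mapping listed in the header comment can be expressed as one small function. The script's real implementation is not part of this hunk, so `outFileForUrl` is a hypothetical name, and tolerating a trailing slash on the input is an assumption added for illustration.

```typescript
// Hypothetical helper reproducing the documented URL -> file mapping.
function outFileForUrl(url: string, outDir: string = "out"): string {
  // Assumption: leading and trailing slashes are dropped, so '/quickstart'
  // and '/quickstart/' resolve to the same file.
  const trimmed = url.replace(/^\/+|\/+$/g, "");
  // The home page has no slug of its own and is written as index.md.
  const slug = trimmed === "" ? "index" : trimmed;
  return `${outDir}/${slug}.md`;
}
```

Archived versions need no special case here: `/reference/api/v0.10.x` is just another slug, which is why including archives in the manifest is enough for their `.md` files to be emitted.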

docs/website/src/app/llm-md-manifest.json/route.ts

Lines changed: 36 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,5 @@
11
import { source } from '@/lib/source';
22
import { getLLMText } from '@/lib/get-llm-text';
3-
import { isArchivedPage } from '@/lib/docs-open-graph';
43

54
// Resolves the response at build time so the result is written to
65
// `out/llm-md-manifest.json` as a static file under `output: 'export'`.
@@ -9,19 +8,50 @@ export const revalidate = false;
98

109
/**
1110
* Internal build-time data dump consumed by
12-
* `scripts/generate-llm-md-files.ts`. Emits one entry per non-archived page
13-
* with the processed Markdown body (same format as `llms-full.txt` chunks);
14-
* the post-build splitter reads it, writes one `out/<slug>.md` per entry,
15-
* and then deletes the manifest so it never ships to the CDN.
11+
* `scripts/generate-llm-md-files.ts`. Emits one entry per page (including
12+
* archived versions of indexable sections — see below) with the processed
13+
* Markdown body; the post-build splitter reads it, writes one `out/<slug>.md`
14+
* per entry, and then deletes the manifest so it never ships to the CDN.
1615
*
1716
* This indirection exists because `output: 'export'` does not support
1817
* `rewrites()` and Next.js does not allow `.md` as part of a dynamic route
1918
* segment (e.g. `[[...slug]].md/route.ts` is invalid). A JSON dump consumed
2019
* by a tiny splitter gives us predictable file naming with no `out/...`
2120
* staging tree to clean up.
21+
*
22+
* Policy — archived pages ARE included
23+
* ------------------------------------
24+
* Unlike `sitemap.xml`, `llms.txt`, and `llms-full.txt`, this manifest does
25+
* not filter out archived pages (`isArchivedPage`). The earlier policy was
26+
* to suppress every "AI-friendly" representation of archived sections that
27+
* carry `noindex` + canonical-to-latest (currently `API_SECTION`), but
28+
* conflating "not in aggregate catalogs" with "no per-page `.md`" hurts UX
29+
* for two distinct callers:
30+
*
31+
* - The in-page "Copy as Markdown" action issues `fetch(`${pageUrl}.md`)`.
32+
* When the `.md` is missing the button silently 404s, even though the
33+
* HTML page renders fine.
34+
* - AI agents performing per-page Markdown content negotiation
35+
* (`Accept: text/markdown`) get redirected by `public/_redirects` to a
36+
* `.md` URL that does not resolve, ending the chain in a broken 404.
37+
*
38+
* Per-page `.md` is just an alternate representation of an already-public
39+
* HTML page (the archive HTML still renders, with `noindex` carrying the
40+
* SEO signal). It is conceptually different from the aggregate catalogs
41+
* (`llms.txt` / `llms-full.txt`), which are bulk training-corpus artefacts
42+
* where excluding archives genuinely reduces cross-version AI ingestion.
43+
*
44+
* Net effect of including archived pages here:
45+
* - "Copy as Markdown" works on every page that renders HTML.
46+
* - Markdown content negotiation in `_redirects` resolves cleanly because
47+
* the static `.md` file always exists alongside the static HTML.
48+
* - SEO posture is unchanged: archives stay out of `sitemap.xml`, carry
49+
* `noindex`, and point their canonical at the latest series.
50+
* - Bulk AI training corpus posture is unchanged: archives remain
51+
* excluded from `llms.txt` and `llms-full.txt`.
2252
*/
2353
export async function GET() {
24-
const pages = source.getPages().filter((page) => !isArchivedPage(page));
54+
const pages = source.getPages();
2555

2656
const entries = await Promise.all(
2757
pages.map(async (page) => ({
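Reviewer note: the policy split described in the doc comment (per-page `.md` for every page, aggregate catalogs without archives) can be summarised as two selections over the same page list. The `Page` shape, the `archived` flag, and both function names are hypothetical; `isArchived` is only a stand-in for `isArchivedPage`, whose implementation is not shown in this diff.

```typescript
// Hypothetical page shape; the real `source.getPages()` type is not shown here.
type Page = { url: string; archived: boolean };

// Stand-in for `isArchivedPage` (assumption: its logic is not in this diff).
const isArchived = (page: Page): boolean => page.archived;

// Per-page manifest: every page that renders HTML gets a `.md` sibling.
function manifestPages(pages: Page[]): Page[] {
  return pages;
}

// Aggregate catalogs (llms.txt, llms-full.txt, sitemap.xml): archives stay out.
function catalogPages(pages: Page[]): Page[] {
  return pages.filter((page) => !isArchived(page));
}

const samplePages: Page[] = [
  { url: "/reference/api", archived: false },
  { url: "/reference/api/v0.7.x", archived: true },
];
```

Keeping the two selections separate makes the intent explicit: removing the filter from the manifest changes which `.md` files exist, without touching what the catalogs advertise.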

docs/website/src/lib/docs-json-ld.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -78,7 +78,7 @@ function buildBreadcrumbList(slugs: string[]): JsonLdBlock {
7878
'@type': 'ListItem',
7979
position: i + 2,
8080
name: slugs[i],
81-
item: `${DOCS_SITE_ORIGIN}${accumulatedPath}`,
81+
item: `${DOCS_SITE_ORIGIN}${accumulatedPath}/`,
8282
});
8383
}
8484
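Reviewer note: the effect of the one-character change is easiest to see on a worked breadcrumb list. The sketch below is an assumption-laden reconstruction: only `position: i + 2` and the trailing slash on `item` come from the diff, while `breadcrumbItems`, the way `accumulatedPath` is built, and the example origin are hypothetical.

```typescript
type ListItem = {
  "@type": "ListItem";
  position: number;
  name: string;
  item: string;
};

// Hypothetical reconstruction of the loop body touched by this hunk.
function breadcrumbItems(origin: string, slugs: string[]): ListItem[] {
  let accumulatedPath = "";
  return slugs.map((slug, i) => {
    // Assumption: the path grows by one segment per breadcrumb level.
    accumulatedPath += `/${slug}`;
    return {
      "@type": "ListItem" as const,
      position: i + 2, // assumption: position 1 is the home entry
      name: slug,
      // Trailing slash keeps breadcrumb URLs in the same canonical form
      // as the page URLs served by the site.
      item: `${origin}${accumulatedPath}/`,
    };
  });
}
```

For `["reference", "api"]` this yields items ending in `/reference/` and `/reference/api/`, so crawlers following breadcrumb links should not incur the bare-path to trailing-slash redirect.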
