Commit ab6f35b

critesjosh and claude committed
fix(docs): extract only main content from API HTML to reduce redundancy
- Extract only <main> element content, skipping sidebar navigation
- Remove breadcrumb div to avoid repeated navigation paths
- Reduces llms-full.txt from ~3.7MB to ~2.4MB with cleaner content

Co-Authored-By: Claude Opus 4.5 <[email protected]>
1 parent 4a29ec6 commit ab6f35b

1 file changed, +8 -1 lines changed

docs/scripts/append_api_docs_to_llms.js

Lines changed: 8 additions & 1 deletion
@@ -35,10 +35,17 @@ const API_DIRS = [
 
 /**
  * Extract text content from HTML, stripping tags and normalizing whitespace.
+ * Only extracts content from <main> element to avoid redundant navigation.
  */
 function htmlToText(html) {
+  // Extract only the <main> content to avoid sidebar/navigation redundancy
+  const mainMatch = html.match(/<main[^>]*>([\s\S]*?)<\/main>/i);
+  const content = mainMatch ? mainMatch[1] : html;
+
   return (
-    html
+    content
+      // Remove the breadcrumb div (first div with navigation links)
+      .replace(/<div><a[^>]*>aztec-nr<\/a>[\s\S]*?<\/div>/i, "")
       // Remove script and style elements entirely
       .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "")
       .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
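To illustrate what the two added steps do in isolation, here is a minimal sketch that applies the same two regular expressions from the diff to a small made-up page. The helper name `extractMainContent` and the `samplePage` markup are hypothetical and exist only for this illustration; the real `htmlToText` goes on to strip scripts, styles, and tags, which is not reproduced here.

```javascript
// Hypothetical helper: mirrors only the <main> extraction and breadcrumb
// removal added in this commit, using the same regexes as the diff.
function extractMainContent(html) {
  // Capture everything between the first <main ...> and the first </main>.
  const mainMatch = html.match(/<main[^>]*>([\s\S]*?)<\/main>/i);
  // Fall back to the full document when no <main> element is present.
  const content = mainMatch ? mainMatch[1] : html;
  // Drop the first attribute-less <div> whose first link text is "aztec-nr".
  return content.replace(/<div><a[^>]*>aztec-nr<\/a>[\s\S]*?<\/div>/i, "");
}

// Made-up page structure, assumed for illustration only.
const samplePage = [
  "<html><body>",
  '<nav><a href="/other">Sidebar link</a></nav>',
  "<main>",
  '<div><a href="../index.html">aztec-nr</a> - <a href="index.html">some_module</a></div>',
  "<h1>Function example</h1>",
  "<p>Body text.</p>",
  "</main>",
  "</body></html>",
].join("");

const extracted = extractMainContent(samplePage);
console.log(extracted);
```

Two properties of these patterns are worth keeping in mind. The lazy `[\s\S]*?` stops at the first `</main>`, so a page with nested or multiple `<main>` elements would be cut short. The breadcrumb pattern only matches a `<div>` with no attributes that is immediately followed by an `<a>` whose text is exactly `aztec-nr`, so a change in the generated markup would leave the breadcrumb in place rather than raise an error.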
