
LLM markdown output contains raw Astro island hydration scripts #679

@decepulis


Problem

The llms-markdown integration generates .md files from built HTML pages for LLM consumption. However, pages with interactive Astro islands (demos, tabs, copy buttons) include <script> and <style> tags in the built HTML, and these leak into the markdown output.

As a result, chunks of minified JavaScript (Astro island hydration code, Sentry initialization, etc.) appear inline in the markdown, which makes the output noisy and unhelpful for LLMs.

Example: https://vjs10-site.netlify.app/docs/framework/react/reference/play-button.md

Root Cause

The integration clones [data-llms-content] elements and strips [data-llms-ignore] descendants, but it doesn't remove <script> or <style> tags before passing the HTML to Turndown. Astro islands render as <astro-island> custom elements with inline <script> tags for hydration. Turndown treats the script's text content as regular text and includes it in the markdown.
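To make the failure mode concrete, here is a dependency-free sketch that models the cloned content as a plain element tree. The `El` type, `stripIgnored`, `naiveText`, and the sample page are all hypothetical stand-ins (not the real DOM, Turndown, or the integration's code); `naiveText` only mimics the behavior described above, where text inside elements without a dedicated rule is kept.

```typescript
// Hypothetical, simplified stand-in for a DOM element (not the real DOM API).
type El = {
  tag: string;
  attrs?: Record<string, string>;
  children: Array<El | string>;
};

// Mirrors the existing step: drop any subtree marked [data-llms-ignore].
function stripIgnored(el: El): El {
  return {
    ...el,
    children: el.children
      .filter((c) => typeof c === "string" || !("data-llms-ignore" in (c.attrs ?? {})))
      .map((c) => (typeof c === "string" ? c : stripIgnored(c))),
  };
}

// Crude stand-in for the conversion step: every text node is kept,
// including the text inside <script>, which is what causes the leak.
function naiveText(el: El): string {
  return el.children.map((c) => (typeof c === "string" ? c : naiveText(c))).join("\n");
}

// Hypothetical page content: prose, an ignored nav, and an island with an inline script.
const content: El = {
  tag: "div",
  attrs: { "data-llms-content": "" },
  children: [
    { tag: "p", children: ["Renders a play button."] },
    { tag: "nav", attrs: { "data-llms-ignore": "" }, children: ["Edit this page"] },
    {
      tag: "astro-island",
      children: [
        { tag: "script", children: ["/* placeholder for minified island hydration code */"] },
        { tag: "pre", children: ["<PlayButton />"] },
      ],
    },
  ],
};

const output = naiveText(stripIgnored(content));
console.log(output);
```

The ignored nav is gone, but the script placeholder still sits between the prose and the code sample, matching the symptom in the example page.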

Fix

Strip <script> and <style> tags from the cloned content before Turndown conversion. This removes the hydration scripts while preserving the server-rendered HTML children (including code samples inside tab panels).
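A minimal sketch of that fix, again on a hypothetical `El` tree rather than the real DOM: `STRIP_TAGS`, `shouldDrop`, `stripForLlm`, and `textOf` are illustrative names, not the integration's actual functions. The point it shows is that only the <script> and <style> subtrees are removed, while the <astro-island> wrapper and its server-rendered children stay in place.

```typescript
// Hypothetical, simplified stand-in for a DOM element (not the real DOM API).
type El = {
  tag: string;
  attrs?: Record<string, string>;
  children: Array<El | string>;
};

// Tags whose text content is code or CSS, not prose, and should never reach the markdown.
const STRIP_TAGS = new Set(["script", "style"]);

// True if this element (and its whole subtree) should be dropped.
function shouldDrop(el: El): boolean {
  return STRIP_TAGS.has(el.tag) || "data-llms-ignore" in (el.attrs ?? {});
}

// Recursively removes dropped elements and keeps every other child,
// so server-rendered content inside <astro-island> survives.
function stripForLlm(el: El): El {
  const children: Array<El | string> = [];
  for (const child of el.children) {
    if (typeof child === "string") {
      children.push(child);
    } else if (!shouldDrop(child)) {
      children.push(stripForLlm(child));
    }
  }
  return { ...el, children };
}

// Collects the remaining text, standing in for the Turndown conversion step.
function textOf(el: El): string {
  return el.children.map((c) => (typeof c === "string" ? c : textOf(c))).join("\n");
}

// Hypothetical page content with an island carrying inline <style> and <script>.
const content: El = {
  tag: "div",
  attrs: { "data-llms-content": "" },
  children: [
    { tag: "p", children: ["Renders a play button."] },
    { tag: "nav", attrs: { "data-llms-ignore": "" }, children: ["Edit this page"] },
    {
      tag: "astro-island",
      children: [
        { tag: "style", children: ["/* placeholder for island styles */"] },
        { tag: "script", children: ["/* placeholder for minified island hydration code */"] },
        { tag: "pre", children: ["<PlayButton />"] },
      ],
    },
  ],
};

const cleaned = stripForLlm(content);
console.log(textOf(cleaned));
```

On a real DOM clone, the equivalent step would plausibly be something like `clone.querySelectorAll("script, style").forEach((el) => el.remove())` before conversion (`clone` being a hypothetical variable name). Turndown can also discard these elements itself via `turndownService.remove(["script", "style"])`; the issue describes the DOM-level variant, applied to the cloned content.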

PR: #678

Metadata


Type: No type
Project status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
