Automate llms.txt generation #1857
Conversation
Pull Request Overview
This PR introduces automated generation of llms.txt and llms-full.txt files for the Qdrant documentation. It uses GitHub Models for content summarization and ensures these files stay synchronized with documentation changes.
- Adds a Python script that scans Hugo content and generates summaries using GitHub Models API
- Creates a GitHub Actions workflow to automatically run the generation script on content changes
- Updates configuration documentation comments to clarify the full_scan_threshold_kb parameter behavior
Reviewed Changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| automation/generate-llms-txt.py | Core script that processes Hugo content and generates llms.txt files with AI summaries |
| .github/workflows/generate-llms-txt.yml | GitHub Actions workflow to automate the generation process on content changes |
| qdrant-landing/content/documentation/guides/configuration.md | Updated comments for full_scan_threshold_kb parameter |
| qdrant-landing/content/documentation/concepts/indexing.md | Updated comments for full_scan_threshold parameter |
```python
# Load the current state of the llms.txt file to avoid duplicates
with open(os.path.join(OUTPUT_DIR, "llms.txt"), "r", encoding="utf-8") as llms_file:
    existing_urls = {line.split("](")[1].split(")")[0] for line in llms_file if line.startswith("- [")}
```
Copilot AI (Aug 14, 2025):
The code attempts to read from llms.txt before checking if it exists. If the file doesn't exist on first run, this will raise a FileNotFoundError. Consider using a try-except block or checking file existence first.
Suggested change:

```python
try:
    with open(os.path.join(OUTPUT_DIR, "llms.txt"), "r", encoding="utf-8") as llms_file:
        existing_urls = {line.split("](")[1].split(")")[0] for line in llms_file if line.startswith("- [")}
except FileNotFoundError:
    existing_urls = set()
```
```python
# Load the paths to all the published content in Hugo and process them sequentially
# to generate the llms.txt and llms-full.txt files.
with (open(os.path.join(OUTPUT_DIR, "llms.txt"), "a+", encoding="utf-8") as llms_file, \
```
Copilot AI (Aug 14, 2025):
Opening llms.txt in append mode ('a+') after reading existing URLs will result in duplicating content since the file pointer is at the end. Consider opening in write mode ('w') and rewriting the entire file, or handle the file pointer position correctly.
Suggested change:

```diff
-with (open(os.path.join(OUTPUT_DIR, "llms.txt"), "a+", encoding="utf-8") as llms_file, \
+with (open(os.path.join(OUTPUT_DIR, "llms.txt"), "w", encoding="utf-8") as llms_file, \
```
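To illustrate the file-pointer issue the reviewer describes, here is a minimal, self-contained sketch (illustrative names, not the PR's exact code): reading the existing entries first and then rewriting the whole file in `"w"` mode avoids the duplicates that `"a+"` would produce, since `"a+"` always writes at the end of the file.

```python
import os
import tempfile

# Illustrative sketch (not the PR's exact code): merge existing entries
# with newly discovered ones, then rewrite the file from scratch.
def rewrite_llms_txt(path, entries):
    # Collect URLs already present so a caller can skip re-summarizing them.
    existing_urls = set()
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            existing_urls = {
                line.split("](")[1].split(")")[0]
                for line in f
                if line.startswith("- [")
            }

    # Rewriting in "w" mode regenerates every entry exactly once,
    # so repeated runs never append duplicates.
    with open(path, "w", encoding="utf-8") as f:
        for title, url in entries:
            f.write(f"- [{title}]({url})\n")
    return existing_urls

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "llms.txt")
    entries = [("Indexing", "https://qdrant.tech/documentation/concepts/indexing/")]
    rewrite_llms_txt(path, entries)         # first run: file does not exist yet
    seen = rewrite_llms_txt(path, entries)  # second run: same entry, no duplicate
    with open(path, encoding="utf-8") as f:
        assert len(f.readlines()) == 1
```

With `"a+"` in place of `"w"`, the second run would leave two identical lines in the file.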
```yaml
# `full_scan_threshold_kb`, the query planner will use full-scan search instead of HNSW index
# traversal for better performance.
# Note: 1Kb = 1 vector of size 256
full_scan_threshold: 10000
```
Copilot AI (Aug 14, 2025):
The parameter name 'full_scan_threshold' is inconsistent with the configuration guide which uses 'full_scan_threshold_kb'. This should likely be 'full_scan_threshold_kb' for consistency.
Suggested change:

```diff
-full_scan_threshold: 10000
+full_scan_threshold_kb: 10000
```
Co-authored-by: Copilot <[email protected]>
Force-pushed from 0765fe1 to 7498193.
```python
# Call the GitHub Models API to generate a summary
client = openai.OpenAI(
    api_key=os.environ.get("GITHUB_TOKEN"),
    base_url="https://models.github.ai/inference",
```
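For context, a hedged sketch of how such a client is typically used for summarization; the model id, prompt, and `summarize` wrapper below are assumptions for illustration, not the PR's exact code:

```python
import os

# Sketch only (assumed names, not the PR's exact code): wrap the GitHub Models
# call so the OpenAI SDK is only imported when a summary is actually requested.
def summarize(markdown_text: str, model: str = "openai/gpt-4o-mini") -> str:
    import openai  # the official OpenAI SDK, pointed at the GitHub Models endpoint

    client = openai.OpenAI(
        api_key=os.environ["GITHUB_TOKEN"],  # GitHub Models accepts a repo token
        base_url="https://models.github.ai/inference",
    )
    response = client.chat.completions.create(
        model=model,  # hypothetical model choice
        messages=[
            {"role": "system", "content": "Summarize this documentation page in one or two sentences."},
            {"role": "user", "content": markdown_text},
        ],
    )
    return response.choices[0].message.content
```

In a GitHub Actions job, `GITHUB_TOKEN` is the workflow's built-in token, so no separate API key needs to be provisioned.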
Do you propose calling a language model as part of the CI process?
Yes, but it is supposed to run only on newly added Hugo content, except for the first run. I assumed the overall meaning of a doc should not change much over time, so a summary only needs to be created for new subpages.
This PR contains a script that automatically generates llms.txt and llms-full.txt, so whenever we change anything in the content, it is automatically reflected in these files.
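That incremental behavior can be sketched as follows (illustrative names, not the PR's exact code): pages whose URLs already appear in llms.txt are skipped, so only new subpages trigger a summarization call.

```python
# Illustrative sketch (not the PR's exact code): only pages whose URL is not
# yet listed in llms.txt get a (potentially expensive) summary generated.
def pages_needing_summary(discovered_pages, existing_urls):
    """discovered_pages: iterable of (title, url) pairs from the Hugo content scan."""
    return [(title, url) for title, url in discovered_pages if url not in existing_urls]

existing_urls = {"https://qdrant.tech/documentation/concepts/indexing/"}
discovered = [
    ("Indexing", "https://qdrant.tech/documentation/concepts/indexing/"),
    ("Snapshots", "https://qdrant.tech/documentation/concepts/snapshots/"),
]
new_pages = pages_needing_summary(discovered, existing_urls)
# Only the Snapshots page would be summarized on this run.
```

On the first run `existing_urls` is empty, so every page is summarized once; subsequent runs only pay for genuinely new content.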
Key characteristics
The current state of both files was generated with this script. I also added a GitHub Action that should keep them up to date automatically.