A reusable theme component to manage and generate the site's `robots.txt`.
- Features
- Demo
- Installation
- Configuration
- Compatibility
- Contributing
- Licensing, copyright
- Author information
- Automatically excludes all bots and crawlers in non-production environments by default.
- Sane and useful `Disallow:` defaults.
- Supports crawler-specific blocking and per-path exclusions.
- Automatically manages sitemap references:
  - If a sitemap is enabled (default: `sitemap.xml`), its URL is added to `robots.txt`.
  - If a sitemap is disabled or renamed, the reference is updated or omitted accordingly.
Clone the repository and run the included example content (requires Hugo, Go, and Git):

```bash
git clone https://github.com/foundata/hugo-component-robotstxt.git
cd ./hugo-component-robotstxt/exampleSite
HUGO_MODULE_WORKSPACE=hugo.work hugo server --ignoreVendorPaths "**"
```

Or look at the following pages using this theme component:
Add the following module path(s) to your `theme:` configuration:

```yaml
theme:
  - "golang.foundata.com/hugo-component-robotstxt"
```

Hugo automatically fetches and imports theme module paths as Go/Hugo modules, so you do not need to list them under `module.imports` manually. Using modules requires Hugo, Go, and Git to be installed on your system.
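If you prefer Hugo's explicit module configuration over the `theme:` shortcut, a roughly equivalent setup would look like the following sketch (not taken verbatim from this project's documentation):

```yaml
# Sketch: explicit module import, equivalent to listing the path under "theme:".
module:
  imports:
    - path: "golang.foundata.com/hugo-component-robotstxt"
```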
From the root directory of your Hugo site, initialize a new Git repository (if you haven't already), then add the theme as a Git submodule:

```bash
git submodule add https://github.com/foundata/hugo-component-robotstxt.git themes/robotstxt
```

Now reference the theme directory name in your `theme:` configuration:

```yaml
theme:
  - "robotstxt"
```
ℹ️ Heads-up: You have to set `enableRobotsTXT: true` (which is `false` by default) and make sure `robotstxt` is not listed at `disableKinds` (which should be OK by default). Otherwise, no `robots.txt` will be created.
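For instance, a configuration along the lines of the following sketch (the `disableKinds` values are only illustrative) keeps `robots.txt` generation enabled:

```yaml
enableRobotsTXT: true                # false by default, must be enabled
disableKinds: ["taxonomy", "term"]   # fine as long as "robotstxt" is not listed here
```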
Example:

```yaml
# Enable generation of robots.txt file.
enableRobotsTXT: true

params:
  robotsTxt:
    # Block all user agents ("Disallow: /") in non-production environments.
    excludeNonProduction: true
    exclude:
      # Version control
      - "/.git/"
      # System and metadata dirs
      - "/.well-known/"
      # Log and temp files
      - "/*.log$"
      - "/*.tmp$"
      - "/*.bak$"
    excludeCrawlers:
      - "GPTBot" # OpenAI / ChatGPT indexing
      - "ChatGPT-User" # OpenAI / ChatGPT plugins, used for direct actions in the name of a ChatGPT user
```
This section documents the theme options you can place under `params.robotsTxt` in your Hugo configuration. The example configurations are safe to copy and paste. All keys are optional, and the theme falls back to sensible behavior unless otherwise noted.
`excludeNonProduction`

- Type: Boolean.
- Default: `true`
- Purpose: When `true`, the template adds the following directive in non-production builds:

  ```
  User-agent: *
  Disallow: /
  ```

  Production detection is based on either:

  - `hugo.IsProduction`
  - `.Site.Params.env == "production"`

- Example (config):

  ```yaml
  params:
    robotsTxt:
      excludeNonProduction: true
  ```
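To illustrate the second detection path, a site parameter like the following sketch (the parameter name and value follow the `.Site.Params.env` check quoted above) makes the component treat a build as production, so the blanket `Disallow: /` block is not emitted:

```yaml
params:
  # Sketch: with this set, the component treats the build as production
  # and does not emit the "User-agent: *" / "Disallow: /" block.
  env: "production"
```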
`exclude`

- Type: List of strings.
- Default: `["/.git/", "/*.log$", "/*.tmp$", "/*.bak$", "/.well-known/"]`
- Purpose:
  - List of path patterns.
  - Each entry becomes a `Disallow:` rule for all crawlers (`User-agent: *`).
- Example (config):

  ```yaml
  params:
    robotsTxt:
      exclude:
        - "/download/"
        - "*.asc$"
  ```

  becomes the following in `robots.txt`:

  ```
  User-agent: *
  Disallow: /download/
  Disallow: *.asc$
  ```
`excludeCrawlers`

- Type: List of strings.
- Default: `[]` (empty list)
- Purpose:
  - List of crawler user-agent names to exclude. Most companies provide some kind of list, e.g.:
- Reminder: `robots.txt` is an advisory mechanism. It prevents compliant crawlers from fetching URLs, but does not protect sensitive files from direct access.
- Example (config): Each entry creates a crawler-specific block:

  ```yaml
  params:
    robotsTxt:
      excludeCrawlers:
        - "ia_archiver"
        - "GPTBot"
  ```

  becomes the following in `robots.txt`:

  ```
  User-agent: ia_archiver
  Disallow: /

  User-agent: GPTBot
  Disallow: /
  ```
There is nothing to configure, but the component is aware of Hugo's sitemap configuration:

- By default, Hugo generates the sitemap as `/sitemap.xml`.
- If disabled (`disableKinds = ["sitemap"]`) or if `sitemap.filename` is set to an empty string, no `Sitemap:` line is emitted.
- If a custom filename is set (e.g. `sitemap.filename = "mysite-map.xml"`), the generated `robots.txt` will correctly reference it.
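For example, renaming the sitemap via Hugo's standard `sitemap.filename` option (a sketch; the filename is made up) is picked up automatically:

```yaml
sitemap:
  # Sketch: robots.txt will then reference /mysite-map.xml instead of /sitemap.xml.
  filename: "mysite-map.xml"
```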
This project is compatible with Hugo (extended) ≥ v0.148.0 and should always work with the latest Hugo release (we usually run the latest Hugo ourselves and fix issues promptly). It has been tested at least with:
If your version isn't listed, it might still work. Just give it a try.
See `CONTRIBUTING.md` if you want to get involved.

This project's functionality is mature, so there might be little activity on the repository in the future. Don't be fooled by this: the project is under active maintenance and used daily by the maintainers.
Copyright (c) 2025 foundata GmbH (https://foundata.com)
This project is licensed under the GNU General Public License v3.0 or later (SPDX-License-Identifier: `GPL-3.0-or-later`); see `LICENSES/GPL-3.0-or-later.txt` for the full text.
The `REUSE.toml` file provides detailed licensing and copyright information in a human- and machine-readable format. This includes parts that may be subject to different licensing or usage terms, such as third-party components. The repository conforms to the REUSE specification. You can use `reuse spdx` to create an SPDX software bill of materials (SBOM).
This project was created and is maintained by foundata.