foundata/hugo-component-robotstxt

Hugo theme component: hugo-component-robotstxt (manage robots.txt)

A reusable theme component to manage and generate the site's robots.txt.

Features

  • Automatically excludes all bots and crawlers in non-production environments by default.
  • Sane and useful Disallow: defaults.
  • Supports crawler-specific blocking and per-path exclusions.
  • Automatically manages sitemap references:
    • If a sitemap is enabled (default: sitemap.xml), its URL is added to robots.txt.
    • If a sitemap is disabled or renamed, the reference is updated or omitted accordingly.

Demo

Clone the repository and run the included example content (requires Hugo, Go, and Git):

git clone https://github.com/foundata/hugo-component-robotstxt.git
cd ./hugo-component-robotstxt/exampleSite
HUGO_MODULE_WORKSPACE=hugo.work hugo server --ignoreVendorPaths "**"

Installation

Using Hugo modules

Add the following module path(s) to your theme: configuration:

theme:
  - "golang.foundata.com/hugo-component-robotstxt"

Hugo automatically fetches and imports theme module paths as Go/Hugo modules, so you do not need to list them under module.imports manually. Using modules requires Hugo, Go, and Git to be installed on your system.

Using Git submodules

From the root directory of your Hugo site, initialize a new Git repository (if you haven't already), then add the theme as a Git submodule:

git submodule add https://github.com/foundata/hugo-component-robotstxt.git themes/robotstxt

Now reference the theme directory name in your theme: configuration:

theme:
  - "robotstxt"

Configuration

ℹ️ Heads-up: You must set enableRobotsTXT: true (it is false by default) and make sure robotstxt is not listed in disableKinds (by default, it is not). Otherwise, no robots.txt is created.

Example:

# Enable generation of robots.txt file.
enableRobotsTXT: true

params:
  robotsTxt:
    # Block all user agents ("Disallow: /") in non-production environments.
    excludeNonProduction: true
    exclude:
      # Version control
      - "/.git/"
      # System and metadata dirs
      - "/.well-known/"
      # Log and temp files
      - "/*.log$"
      - "/*.tmp$"
      - "/*.bak$"
    excludeCrawlers:
      - "GPTBot" # OpenAI / ChatGPT indexing
      - "ChatGPT-User" # OpenAI / ChatGPT plugins, used for direct actions in the name of a ChatGPT user

Settings

This section documents the theme options you can place under params.robotsTxt in your Hugo configuration. The example configurations are safe to copy-paste. All keys are optional, and the theme falls back to sensible defaults unless otherwise noted.

excludeNonProduction

  • Type: Boolean.
  • Default: true
  • Purpose: When true, the template adds the following directives in non-production builds, blocking all crawlers:
    User-agent: *
    Disallow: /
    
    Production detection is based on either:
    • hugo.IsProduction
    • .Site.Params.env == "production"
  • Example (config):
    params:
      robotsTxt:
        excludeNonProduction: true
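
The second detection path above can be triggered from the site configuration. The following is a minimal sketch, assuming the component reads a top-level params.env key (inferred from the .Site.Params.env check above). It is not a confirmed excerpt from the component's documentation.

```yaml
params:
  # Assumption: top-level "env" parameter, matching the
  # `.Site.Params.env == "production"` check described above.
  env: "production"
  robotsTxt:
    # Keep blocking crawlers in every build that is not detected as production.
    excludeNonProduction: true
```

Note that hugo.IsProduction depends on Hugo's own environment setting, which defaults to production for a plain hugo build and to development for hugo server, so an explicit params.env is likely only needed when you cannot rely on that default.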

exclude

  • Type: List of strings.
  • Default: ["/.git/", "/*.log$", "/*.tmp$", "/*.bak$", "/.well-known/"]
  • Purpose:
    • List of path patterns.
    • Each entry becomes a Disallow: rule for all crawlers (User-agent: *).
  • Example (config):
    params:
      robotsTxt:
        exclude:
          - "/download/"
          - "*.asc$"
    becomes the following in robots.txt:
    User-agent: *
    Disallow: /download/
    Disallow: *.asc$
    

excludeCrawlers

Sitemap handling

There is nothing to configure here; the component follows Hugo's own sitemap configuration:

  • By default, Hugo generates the sitemap as /sitemap.xml, and robots.txt references it.
  • If disabled (disableKinds = ["sitemap"]) or if sitemap.filename is set to an empty string, no Sitemap: line is emitted.
  • If a custom filename is set (e.g. sitemap.filename = "mysite-map.xml"), the generated robots.txt will correctly reference it.
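
The cases above map to standard Hugo settings. The following is a small sketch in YAML, using the filename from the bullet above as an illustrative value:

```yaml
# Custom sitemap filename: per the notes above, the generated robots.txt
# should reference mysite-map.xml instead of sitemap.xml.
sitemap:
  filename: "mysite-map.xml"

# Alternative: disable the sitemap entirely, in which case no
# Sitemap: line is emitted. Uncomment to try it:
# disableKinds:
#   - "sitemap"
```

In both cases nothing needs to change under params.robotsTxt, since the component reads these values from Hugo's configuration.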

Compatibility

This project is compatible with Hugo (extended) ≥ v0.148.0 and should always work with the latest Hugo release (we usually run the latest Hugo ourselves and fix issues promptly). It has been tested at least with:

If your version isn't listed, it might still work. Just give it a try.

Contributing

See CONTRIBUTING.md if you want to get involved.

This project's functionality is mature, so there might be little activity on the repository in the future. Don't be fooled by this: the project is under active maintenance and used daily by the maintainers.

Licensing, copyright

Copyright (c) 2025 foundata GmbH (https://foundata.com)

This project is licensed under the GNU General Public License v3.0 or later (SPDX-License-Identifier: GPL-3.0-or-later), see LICENSES/GPL-3.0-or-later.txt for the full text.

The REUSE.toml file provides detailed licensing and copyright information in a human- and machine-readable format. This includes parts that may be subject to different licensing or usage terms, such as third-party components. The repository conforms to the REUSE specification. You can use reuse spdx to create an SPDX software bill of materials (SBOM).

Author information

This project was created and is maintained by foundata.