
Stamp sitemap index entries with per-file lastmod #16837

Open
jdevalk wants to merge 1 commit into withastro:main from jdevalk:sitemap-index-per-file-lastmod

jdevalk commented May 22, 2026

Fixes #16838.

Changes

The problem (#16838): @astrojs/sitemap writes per-URL <lastmod> into the child sitemaps but never into the <sitemap> entries of sitemap-index.xml. The index gets a <lastmod> only if you set the global lastmod option, and then every entry carries the same date. So the index cannot tell a crawler which child sitemap actually changed — even though the freshness data is already computed and sitting in the child sitemaps.

This PR derives each index entry's <lastmod> from the child sitemap it points to:

  • Each <sitemap> entry is stamped with the most recent <lastmod> among the URLs that land in that file. URLs are written in source order, limit per file, so the date is computed from items.slice(i * limit, (i + 1) * limit).
  • Works for both chunked (chunks) and non-chunked output, and stays accurate when a sitemap overflows into multiple numbered files.
  • When a child sitemap has no per-URL lastmod, the entry falls back to the configured lastmod option — existing behaviour preserved.
  • customSitemaps entries keep using the global lastmod (there are no items to derive a date from).
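
The per-file derivation described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SitemapItemSketch` type and the `newestLastmod` and `indexLastmods` helpers are hypothetical names, and the sketch assumes `lastmod` values are ISO 8601 strings and that `limit` is at least 1.

```typescript
// Hypothetical item shape for illustration; the integration's real types may differ.
interface SitemapItemSketch {
  url: string;
  lastmod?: string; // ISO 8601 date string, when the URL carries one
}

// Newest lastmod among the items of one child sitemap file.
// Returns the fallback (the configured global lastmod) when no item has a usable date.
function newestLastmod(items: SitemapItemSketch[], fallback?: string): string | undefined {
  let newest: number | undefined;
  for (const item of items) {
    if (!item.lastmod) continue;
    const time = Date.parse(item.lastmod);
    if (Number.isNaN(time)) continue; // ignore unparseable dates
    if (newest === undefined || time > newest) newest = time;
  }
  return newest === undefined ? fallback : new Date(newest).toISOString();
}

// One lastmod per output file: file i holds items[i * limit, (i + 1) * limit).
// Assumes limit >= 1.
function indexLastmods(
  items: SitemapItemSketch[],
  limit: number,
  fallback?: string,
): (string | undefined)[] {
  const fileCount = Math.ceil(items.length / limit);
  const result: (string | undefined)[] = [];
  for (let i = 0; i < fileCount; i++) {
    result.push(newestLastmod(items.slice(i * limit, (i + 1) * limit), fallback));
  }
  return result;
}
```

With three items and `limit` set to 2, the first file takes the newer of its two dates, and a second file whose only item has no `lastmod` falls back to the configured value.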

Before / after, for the reproduction in #16838:

<!-- before -->
<sitemap><loc>https://example.com/sitemap-0.xml</loc></sitemap>

<!-- after -->
<sitemap><loc>https://example.com/sitemap-0.xml</loc><lastmod>2024-09-15T00:00:00.000Z</lastmod></sitemap>

The changeset is patch. It is a behaviour change for anyone setting per-URL lastmod via serialize (their index now carries accurate per-file dates), so happy to bump to minor if preferred.

Testing

New test/index-lastmod.test.ts:

  • Chunked — distinct lastmod values across blog/glossary chunks; asserts each index entry surfaces the newest date in its child sitemap, and that a chunk with no per-URL lastmod falls back to the configured lastmod.
  • Non-chunked, multiple files (entryLimit: 1, so each URL gets its own file); asserts every index entry's lastmod equals the date in the child sitemap it points to (exercises the per-file slicing for i > 0).
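
For readers who want to check the index output by hand, the kind of assertion described above might look like the sketch below. It is not taken from test/index-lastmod.test.ts: `readIndexEntries` and `IndexEntrySketch` are hypothetical names, and the regex-based parsing assumes the simple, un-nested `<sitemap>` entries shown in the before/after example.

```typescript
// Hypothetical shape for one <sitemap> entry of sitemap-index.xml.
interface IndexEntrySketch {
  loc: string;
  lastmod: string | undefined;
}

// Extract the <loc> and optional <lastmod> of every <sitemap> entry.
// Regex parsing is a shortcut for illustration; a real XML parser is more robust.
function readIndexEntries(indexXml: string): IndexEntrySketch[] {
  const entries: IndexEntrySketch[] = [];
  const sitemapPattern = /<sitemap>([\s\S]*?)<\/sitemap>/g;
  let match: RegExpExecArray | null;
  while ((match = sitemapPattern.exec(indexXml)) !== null) {
    const body = match[1] ?? '';
    const loc = /<loc>([^<]*)<\/loc>/.exec(body);
    if (!loc) continue; // skip entries without a location
    const lastmod = /<lastmod>([^<]*)<\/lastmod>/.exec(body);
    entries.push({ loc: loc[1] ?? '', lastmod: lastmod ? lastmod[1] : undefined });
  }
  return entries;
}
```

A test could then compare each entry's `lastmod` with the newest date found in the child sitemap at `loc`.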

Full @astrojs/sitemap suite passes (40/40). biome, eslint, knip, and tsc -b are clean.

Docs

No docs change needed — this refines the existing lastmod behaviour with no new or changed API surface. The lastmod option keeps working as a fallback for child sitemaps without per-URL dates.

The sitemap index gave every `<sitemap>` entry the same global `lastmod`,
so crawlers could not tell which child sitemaps actually changed. Each
index entry is now stamped with the newest `lastmod` of the URLs in the
child sitemap it points to, falling back to the configured `lastmod`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

changeset-bot Bot commented May 22, 2026

🦋 Changeset detected

Latest commit: 0b6f306

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 8 packages
| Name | Type |
| --- | --- |
| @astrojs/sitemap | Patch |
| @test/sitemap-chunks | Patch |
| @test/sitemap-dynamic | Patch |
| @test/sitemap-i18n-fallback | Patch |
| @test/sitemap-ssr | Patch |
| @test/sitemap-static | Patch |
| @test/sitemap-trailing-slash | Patch |
| @test/astro-vercel-integration-assets | Patch |


github-actions Bot added the "pkg: integration" label (Related to any renderer integration (scope)) May 22, 2026
ematipico (Member) commented

The problem with this PR is that there's no issue filed that actually shows there's a problem. And I admit, the listed changes don't help frame the issue. Hence, I don't know what I'm reviewing.

jdevalk (Author) commented May 22, 2026

Thanks @ematipico, fair — there was no issue and the framing was thin. Fixed both:

The short version of what's being reviewed: index <lastmod> should reflect the child sitemap it points to (so crawlers can tell which child changed), and today it never does.


Development

Successfully merging this pull request may close these issues.

@astrojs/sitemap: <lastmod> is missing from sitemap-index.xml entries
