Stamp sitemap index entries with per-file lastmod #16837
Open
jdevalk wants to merge 1 commit into
The sitemap index gave every `<sitemap>` entry the same global `lastmod`, so crawlers could not tell which child sitemaps actually changed. Each index entry is now stamped with the newest `lastmod` of the URLs in the child sitemap it points to, falling back to the configured `lastmod`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🦋 Changeset detected
Latest commit: 0b6f306
The changes in this PR will be included in the next version bump. This PR includes changesets to release 8 packages.
Member
The problem with this PR is that there's no issue filed that actually shows there's a problem. And I admit, the listed changes don't help frame the issue. Hence, I don't know what I'm reviewing.
Author
Thanks @ematipico, fair point: there was no issue and the framing was thin. Fixed both:
The short version of what's being reviewed: index
Fixes #16838.
Changes
The problem (#16838):
`@astrojs/sitemap` writes per-URL `<lastmod>` into the child sitemaps but never into the `<sitemap>` entries of `sitemap-index.xml`. The index gets a `<lastmod>` only if you set the global `lastmod` option, and then every entry carries the same date. So the index cannot tell a crawler which child sitemap actually changed, even though the freshness data is already computed and sitting in the child sitemaps.

This PR derives each index entry's `<lastmod>` from the child sitemap it points to:

- Each `<sitemap>` entry is stamped with the most recent `<lastmod>` among the URLs that land in that file. URLs are written in source order, `limit` per file, so the date is computed from `items.slice(i * limit, (i + 1) * limit)`.
- Works for both chunked (`chunks`) and non-chunked output, and stays accurate when a sitemap overflows into multiple numbered files.
- If no URL in a child sitemap has a per-URL `lastmod`, the entry falls back to the configured `lastmod` option (existing behaviour preserved).
- `customSitemaps` entries keep using the global `lastmod` (there are no items to derive a date from).

Before / after, for the reproduction in #16838:
The changeset is `patch`. It is a behaviour change for anyone setting per-URL `lastmod` via `serialize` (their index now carries accurate per-file dates), so I'm happy to bump to `minor` if preferred.

Testing
New `test/index-lastmod.test.ts`:

- Per-URL `lastmod` values across `blog`/`glossary` chunks: asserts each index entry surfaces the newest date in its child sitemap, and that a chunk with no per-URL `lastmod` falls back to the configured `lastmod`.
- `entryLimit: 1`, so each URL gets its own file: asserts every index entry's `lastmod` equals the date in the child sitemap it points to (exercises the per-file slicing for `i > 0`).

Full `@astrojs/sitemap` suite passes (40/40). `biome`, `eslint`, `knip`, and `tsc -b` are clean.

Docs
No docs change needed: this refines the existing `lastmod` behaviour with no new or changed API surface. The `lastmod` option keeps working as a fallback for child sitemaps without per-URL dates.
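For reviewers, the per-file derivation described under Changes can be sketched in a few lines of TypeScript. This is a minimal, self-contained illustration, not the PR's actual code: the types, the function names (`newestLastmod`, `buildIndexEntries`, `customIndexEntries`), and the `${name}-${i}.xml` file naming are all hypothetical stand-ins, and dates are compared with `Date.parse`. It follows the rules listed above: slice the items per file, take the newest per-URL date, fall back to the configured `lastmod`, and leave custom sitemaps on the global value.

```typescript
// Hypothetical sketch of per-file <lastmod> derivation for a sitemap index.
// None of these names are @astrojs/sitemap's real API.

interface SitemapItem {
  url: string;
  lastmod?: string; // per-URL date (e.g. set via `serialize`); may be absent
}

interface IndexEntry {
  loc: string;
  lastmod?: string;
}

// Newest parseable per-URL lastmod among `items`, or `fallback`
// (standing in for the configured global `lastmod`) when none is set.
function newestLastmod(items: SitemapItem[], fallback?: string): string | undefined {
  let newestTime = -Infinity;
  let newest: string | undefined;
  for (const item of items) {
    if (item.lastmod === undefined) continue;
    const time = Date.parse(item.lastmod);
    if (Number.isNaN(time)) continue; // ignore unparseable dates
    if (time > newestTime) {
      newestTime = time;
      newest = item.lastmod; // keep the string exactly as the child sitemap has it
    }
  }
  return newest ?? fallback;
}

// One index entry per numbered child file. URLs are written in source order,
// `limit` per file, so file i holds items.slice(i * limit, (i + 1) * limit).
function buildIndexEntries(
  baseUrl: string, // assumed to end with "/"
  name: string,
  items: SitemapItem[],
  limit: number,
  fallback?: string,
): IndexEntry[] {
  if (limit < 1) throw new RangeError("limit must be >= 1");
  const entries: IndexEntry[] = [];
  const fileCount = Math.ceil(items.length / limit);
  for (let i = 0; i < fileCount; i++) {
    const fileItems = items.slice(i * limit, (i + 1) * limit);
    entries.push({
      loc: `${baseUrl}${name}-${i}.xml`,
      lastmod: newestLastmod(fileItems, fallback),
    });
  }
  return entries;
}

// Custom sitemaps have no items to derive a date from, so they keep the global lastmod.
function customIndexEntries(locs: string[], fallback?: string): IndexEntry[] {
  return locs.map((loc) => ({ loc, lastmod: fallback }));
}

// Usage: three URLs, two per file, with a global lastmod as the fallback.
const items: SitemapItem[] = [
  { url: "https://example.com/a/", lastmod: "2024-01-05" },
  { url: "https://example.com/b/", lastmod: "2024-03-01" },
  { url: "https://example.com/c/" }, // no per-URL lastmod
];
const entries = buildIndexEntries("https://example.com/", "sitemap", items, 2, "2023-12-31");
// entries[0] → { loc: "https://example.com/sitemap-0.xml", lastmod: "2024-03-01" }
// entries[1] → { loc: "https://example.com/sitemap-1.xml", lastmod: "2023-12-31" }
```

Calling `buildIndexEntries` with `limit` set to 1 mirrors the `entryLimit: 1` test case: each index entry's `lastmod` is then the single date in its own file, or the fallback when that file has none.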