Dynamically create sitemap.xml #4060
Merged
Conversation
just1602 approved these changes Dec 16, 2024
This was referenced Dec 16, 2024
veganstraightedge added a commit that referenced this pull request Dec 17, 2024
Same data as `/sitemap.xml`, but as a flat file list of URLs - #4060. The purpose is to give an archivist a simple way to make a backup of the whole site using cURL/wget/similar means.

# TODO

- add URLs of CSS/JS files
- add URLs of images (!!!)
- add URLs of PDFs (downloads of zines, posters, etc.)
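A flat list like this is easy to generate; here is a minimal plain-Ruby sketch of the idea (the host and paths below are placeholders, not the site's real data):

```ruby
# Sketch only: emit a flat, newline-separated list of absolute URLs,
# as a `/sitemap.txt` endpoint would serve them. HOST and PATHS are
# illustrative placeholders, not the site's real data.
HOST = "https://example.com"
PATHS = ["/", "/about", "/library"]

def flat_url_list(host, paths)
  paths.map { |path| host + path }.join("\n") + "\n"
end

print flat_url_list(HOST, PATHS)
```

An archivist could then save that output to a file and feed it to wget, e.g. `wget --input-file=urls.txt --wait=1`.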
veganstraightedge added a commit that referenced this pull request Dec 17, 2024
Cleanup following:

- #4155
- #4060

# Summary

- remove `sitemap_generator` config initializer
- remove `sitemap_generator` in `Procfile` and a test
- remove `sitemap_generator` gem
- make xml/txt formats explicit in the routes (`curl .../sitemap.xml` was getting the `.txt` version mistakenly)
- remove duplicate `/tce` URL in both sitemaps
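The explicit-format fix could look roughly like this in `config/routes.rb` — the controller and action names here are assumptions for illustration, not necessarily what the commit uses:

```ruby
# config/routes.rb (sketch — controller/action names are hypothetical).
# Pinning the format on each route keeps `curl .../sitemap.xml` from
# falling through to the `.txt` variant.
get "sitemap.xml", to: "sitemaps#show",  defaults: { format: :xml }
get "sitemap.txt", to: "sitemaps#index", defaults: { format: :txt }
```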
Issues

- closes #3780
- closes #1192
An investigation and discoveries
So, it turns out that we maybe never had a live sitemap.xml or .xml.gz in production, this whole time. 🤦🏻
The reason it seemed to work in development and seemed to succeed in the deploy/release in production, yet was never actually available in production, is… Heroku's ephemeral filesystem.
So, what was happening was: in development, running `bundle exec rails sitemap:create` or `sitemap:refresh`, etc., would create the `sitemap.xml.gz` in our local `/public` folder. And stay there. Seems good.

In production, during the release stage of a production deploy (as defined in the `Procfile`), the `sitemap:refresh` would "succeed", but the `/public` folder it was created in wouldn't necessarily be on the actual dyno(s) serving any real requests. AFAICT.

My preferred requirements
When working on this, I went round and round trying to make it work with all of these conditions:

- `robots.txt`

The gem suggests, and has functionality for, storing the generated file somewhere else (say, S3), but I'd like to keep it in its well-known location.
Conclusion
In the end, I decided to create a sitemaps controller and dynamically create the file at:

- `/sitemap.xml`
- `robots.txt` (a separate issue/PR to do!)

The challenge and risk, of course, is performance, namely around articles and some of the tools (zines, etc.) that have the biggest tables to scan, especially since most items in the sitemap never change.
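As a rough illustration of the dynamic approach (plain Ruby, not the PR's actual code — the function name and URLs here are invented), the XML can be built from whatever URLs exist at request time rather than from a pregenerated file:

```ruby
# Sketch only: build the <urlset> XML dynamically from the current
# list of URLs. In the app, this kind of logic would live in a
# SitemapsController action and its view, pulling from the database.
def sitemap_xml(urls)
  entries = urls.map { |url| "  <url><loc>#{url}</loc></url>" }.join("\n")
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" \
  "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" \
  "#{entries}\n" \
  "</urlset>\n"
end

puts sitemap_xml(["https://example.com/"])
```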
But that not-ever-really-changing-ness is what allowed me to use Rails' fragment caching around each `<url>` item in the long `<urlset>` list and reduce the page load time from ~1s to ~200ms (depending on warm cache, etc.). Even at 1s, it's not the end of the world, since (I'm suspecting) this file doesn't get read a ton.

TODO follow up
- remove `sitemap:refresh` from `Procfile`
- remove `config/sitemap.rb`
- remove the `sitemap_generator` gem
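For reference, the per-`<url>` fragment caching described in the Conclusion can be approximated outside Rails like this — a plain-Ruby sketch of the idea, where the record shape is invented; in the app it would be Rails' `cache` view helper keyed off each record:

```ruby
require "time"

# Plain-Ruby approximation of fragment caching: each <url> fragment is
# cached under a key built from the record's id and updated_at, so
# unchanged records are never re-rendered. This is what makes the
# "most items never change" property pay off.
CACHE = {}

def url_fragment(record)
  key = [record[:id], record[:updated_at]]
  CACHE[key] ||=
    "<url><loc>#{record[:loc]}</loc>" \
    "<lastmod>#{record[:updated_at].iso8601}</lastmod></url>"
end
```

When a record is edited, its `updated_at` changes, the key changes, and only that one fragment is rebuilt — the same recache-on-touch behavior Rails' cache keys give you.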