Commit aae5061

Dynamically create sitemap.xml (#4060)
# Issues

- #3780
- #1192

closes #3780
closes #1192

# An investigation and discoveries

So, it turns out that we maybe never had a live sitemap.xml or .xml.gz in production, this whole time. 🤦🏻

The reason that it seemed to work in development and seemed to succeed in the deploy/release in production, but was not actually available in production, is… Heroku's ephemeral filesystem.

> [Ephemeral Disk](https://devcenter.heroku.com/articles/active-storage-on-heroku#ephemeral-disk)
>
> Heroku has an “ephemeral” hard drive, this means that you can write files to disk, but those files will not persist after the application is restarted.

So, what was happening was: in development, running `bundle exec rails sitemap:create` or `sitemap:refresh` etc. would create the `sitemap.xml.gz` in our local `/public` folders. And stay there. Seems good.

In production, during the release stage of a production deploy (as defined in the [`Procfile`](https://github.com/crimethinc/website/blob/5c88026595d2ad22ea872a8427063ff617e851fe/Procfile#L1)), the `sitemap:refresh` would "succeed", but then the `/public` folder it was created in wouldn't necessarily be in the actual dyno(s) serving any real requests. AFAICT.

---

# My preferred requirements

When working on this, I went round and round trying to make it work with all of these conditions:

- [sitemap_generator](https://github.com/kjvarga/sitemap_generator) gem (great gem, love it, would use in non-Heroku deployed apps)
- Heroku's ephemeral filesystem
- and keeping its location on the same domain at the root, where it can be auto-discovered without first reading `robots.txt`

The gem [suggests and has functionality](https://github.com/kjvarga/sitemap_generator?tab=readme-ov-file#upload-sitemaps-to-a-remote-host-using-adapters) to store the generated file somewhere else (say, S3), but I'd like to keep it in its _well known_ location.

---

# Conclusion

In the end, I decided to create a sitemaps controller and dynamically create the file to:

- keep it at `/sitemap.xml`
- not have to deal with Heroku's ephemeral filesystem
- keep its discovery well known and not dependent on `robots.txt` (a separate issue/PR to do!)

The challenge and risk, of course, is performance. Namely around articles and some of the tools (zines, etc.), which have the biggest tables to scan. Especially since most items in the sitemap never change.

But that _not ever really changing-ness_ is what allowed me to use Rails' fragment caching around each `<url>` item in the long list of `<urlset>` and reduce the page load time from ~1s to ~200ms (depending on warm cache, etc.). Even at 1s, it's not the end of the world, since (I'm suspecting) this file doesn't get read a ton.

# TODO follow up

- [ ] remove `sitemap:refresh` from `Procfile`
- [ ] remove `config/sitemap.rb`
- [ ] remove `sitemap_generator` gem
- [ ] keep an eye on this in production
- [ ] add any files/paths that I missed (or skipped for now because I ran out of steam)
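
The fragment caching idea from the conclusion can be sketched in plain Ruby. This is a simplified stand-in, not Rails' implementation: `Record`, `FragmentCache`, and `render_url` are hypothetical names, and the cache key (id plus `updated_at`) only approximates what Rails derives from a record. It shows why entries that never change are rendered once, and why an edit invalidates only its own entry.

```ruby
require 'time'

# Hypothetical record type standing in for an ActiveRecord model.
Record = Struct.new(:id, :slug, :updated_at)

# Hypothetical in-memory cache; Rails would use the configured cache store.
class FragmentCache
  attr_reader :misses

  def initialize
    @store  = {}
    @misses = 0
  end

  # Key on id + updated_at, so editing a record produces a new key.
  def fetch(record)
    key = "#{record.id}-#{record.updated_at.to_i}"
    @store.fetch(key) do
      @misses += 1
      @store[key] = yield
    end
  end
end

# Hypothetical renderer for one <url> entry.
def render_url(record)
  "<url><loc>https://example.com/#{record.slug}</loc>" \
    "<lastmod>#{record.updated_at.utc.iso8601}</lastmod></url>"
end

cache = FragmentCache.new
zine  = Record.new(1, 'zines/example', Time.utc(2020, 1, 1))

first  = cache.fetch(zine) { render_url(zine) } # miss: rendered and stored
second = cache.fetch(zine) { render_url(zine) } # hit: served from the store

zine.updated_at = Time.utc(2024, 6, 1)          # an edit changes the key
third = cache.fetch(zine) { render_url(zine) }  # miss again, fresh markup
```

In this sketch the renderer runs twice for three requests; in the real view, a warm cache would skip the partial render for every unchanged record.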
1 parent 5c88026 commit aae5061

7 files changed: +315 -8 lines changed

app/controllers/sitemap_controller.rb

+107
@@ -0,0 +1,107 @@
+class SitemapController < ApplicationController
+  STATIC_PATHS = %w[
+    about
+    arts/submission-guidelines
+    books/into-libraries
+    books/lit-kit
+    contact
+    faq
+    games/j20
+    kickstarter/2017
+    library
+    start
+    steal-something-from-work-day
+    store
+    tce
+    tools
+  ].freeze
+
+  TO_CHANGE_EVERYTHING_LANGUAGES = [
+    # in YAML files
+    ToChangeEverythingController::TO_CHANGE_ANYTHING_YAMLS.dup,
+    # in /public folder
+    %w[czech deutsch polski slovenscina slovensko]
+  ].flatten.freeze
+
+  def show
+    @latest_article = Article.published.english.first
+    @last_modified = @latest_article&.updated_at || Time.current
+    @urls = []
+
+    # articles feed, for all languages with articles
+    @localized_feeds = Locale.unscoped.order(name_in_english: :asc)
+
+    # categories
+    @categories = Category.all
+
+    # articles
+    @articles = live_published_articles
+
+    # articles by year
+    @article_years = (1996..Time.zone.today.year).to_a
+
+    # static-ish pages
+    @static_paths = STATIC_PATHS
+
+    # To Change Everything (TCE)
+    @to_change_everything_languages = TO_CHANGE_EVERYTHING_LANGUAGES
+
+    # languages
+    @locales = languages
+
+    # tools
+    # books
+    @books = Book.published.live
+
+    # logos
+    @logos = Logo.published.live
+
+    # posters
+    @posters = Poster.published.live
+
+    # stickers
+    @stickers = Sticker.published.live
+
+    # videos
+    @videos = Video.published.live
+
+    # zines
+    @zines = Zine.published.live
+
+    # journals / issues
+    @journals = Journal.published.live
+    @issues = Issue.published.live
+
+    # podcasts / episodes
+    @podcasts = Podcast.published.live
+    @episodes = Episode.published.live
+
+    # TODO: add @TOOL_latest_modified to each tool, used in view for lastmod: in url tag partial
+    # TODO: add contradictionary definitions pages to sitemap
+    # TODO: add tags index and show pages to sitemap
+    # TODO: add steal-something-from-work-day localized pages
+  end
+
+  private
+
+  def sitemap_url = Data.define(:loc, :lastmod)
+
+  # languages
+  def languages
+    Locale.live.each do |locale|
+      unicode_url = language_url locale: locale.name.downcase.tr(' ', '-')
+      slug_url = language_url locale: locale.slug.to_sym
+      english_url = language_url locale: locale.name_in_english.downcase.tr(' ', '-')
+
+      [unicode_url, slug_url, english_url].uniq
+    end
+  end
+
+  def live_published_articles
+    Rails.cache.fetch([:sitemap, @latest_article, :live_published_articles], expires_in: 12.hours) do
+      Article.live
+             .published
+             .select(:id, :updated_at, :draft_code, :published_at, :publication_status, :slug)
+    end
+  end
+end
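
One detail in `languages` may be worth a second look: `each` returns its receiver, so the method as written would hand back the locales rather than the URL arrays built inside the block. The view in this commit builds those URLs itself, so this may not matter in practice. A minimal sketch of a `flat_map` variant follows, assuming a hypothetical `FakeLocale` struct and a hypothetical `language_url` stand-in for the route helper (neither is the app's real code):

```ruby
# Hypothetical stand-ins: the real app uses an ActiveRecord model and a route helper.
FakeLocale = Struct.new(:name, :slug, :name_in_english)

def language_url(locale)
  "https://example.com/languages/#{locale}"
end

# flat_map collects each block's return value and flattens one level,
# so the result is a single list of URLs.
def language_urls(locales)
  locales.flat_map do |locale|
    [
      language_url(locale.name.downcase.tr(' ', '-')),
      language_url(locale.slug),
      language_url(locale.name_in_english.downcase.tr(' ', '-'))
    ].uniq
  end
end

locales = [
  FakeLocale.new('Español', 'es', 'Spanish'),
  FakeLocale.new('English', 'en', 'English')
]

each_result = locales.each(&:slug) # each returns the receiver, not the block results
urls = language_urls(locales)
```

Here `each_result` is the original `locales` array, while `urls` is a flat list with the duplicate `english` entry removed by `uniq`.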

app/middlewares/rack/clean_path.rb

+1-1
@@ -77,7 +77,7 @@ def redirect location
     end
 
     def exit_early?
-      @req.path == '/' || @req.path.starts_with?('/assets')
+      @req.path == '/' || @req.path.starts_with?('/assets') || @req.path.include?('sitemap')
     end
 
     # get path before Unicode character got smooshed into it
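
The new `include?('sitemap')` check skips path cleaning for any path that contains the word, not only `/sitemap.xml`. That may be intentional (it also covers `/sitemap.xml.gz`). A small sketch of the difference, assuming standalone hypothetical predicates (`exit_early?` here takes the path as an argument, plain Ruby's `start_with?` stands in for ActiveSupport's `starts_with?`, and the sample paths are made up):

```ruby
# Hypothetical standalone version of the predicate from the diff.
def exit_early?(path)
  path == '/' || path.start_with?('/assets') || path.include?('sitemap')
end

# Hypothetical narrower alternative: only the sitemap route itself.
def exit_early_exact?(path)
  path == '/' || path.start_with?('/assets') || path == '/sitemap.xml'
end

# Made-up sample paths for illustration.
paths = ['/sitemap.xml', '/assets/app.css', '/articles/sitemap-of-dreams', '/about']

broad  = paths.select { |path| exit_early?(path) }
narrow = paths.select { |path| exit_early_exact?(path) }
```

With these samples, the broad check also bypasses cleaning for the article path, while the narrow one does not. Whether that matters depends on the slugs the site actually uses.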

app/models/podcast.rb

+6
@@ -6,6 +6,12 @@ class Podcast < ApplicationRecord
 
   validates :slug, presence: true, uniqueness: true
 
+  # hardcoding .published & .live to find all, since Podcast doesn't include Publishable
+  # rubocop:disable Rails/DuplicateScope
+  scope :published, -> { where.not(id: nil) }
+  scope :live, -> { where.not(id: nil) }
+  # rubocop:enable Rails/DuplicateScope
+
   def path
     "/podcasts/#{slug}"
   end

app/views/sitemap/_url.xml.erb

+8
@@ -0,0 +1,8 @@
+<%# locals: (loc:, lastmod: Time.current, changefreq: :weekly, priority: 0.5) %>
+
+<url>
+  <loc><%= loc %></loc>
+  <lastmod><%= lastmod.iso8601 %></lastmod>
+  <changefreq><%= changefreq %></changefreq>
+  <priority><%= priority %></priority>
+</url>
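
To see what one `<url>` entry looks like once rendered, the partial's markup can be run through Ruby's standard-library ERB. This is only a sketch under assumptions: `render_sitemap_url` is a hypothetical helper, keyword defaults stand in for the Rails-only `locals:` magic comment, `Time.now.utc` stands in for `Time.current`, and plain ERB does not HTML-escape values the way Rails views do.

```ruby
require 'erb'
require 'time'

# Same markup as the partial, without the Rails-only locals: comment.
URL_TEMPLATE = ERB.new(<<~XML)
  <url>
    <loc><%= loc %></loc>
    <lastmod><%= lastmod.iso8601 %></lastmod>
    <changefreq><%= changefreq %></changefreq>
    <priority><%= priority %></priority>
  </url>
XML

# Hypothetical helper; defaults mirror the ones declared in the partial.
def render_sitemap_url(loc:, lastmod: Time.now.utc, changefreq: :weekly, priority: 0.5)
  URL_TEMPLATE.result_with_hash(
    loc: loc, lastmod: lastmod, changefreq: changefreq, priority: priority
  )
end

xml = render_sitemap_url(
  loc: 'https://example.com/about',
  lastmod: Time.utc(2024, 1, 2, 3, 4, 5)
)
puts xml
```

Only `loc` is required; callers such as the homepage entry override `changefreq` and `priority`, and everything else falls back to the defaults.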

app/views/sitemap/show.xml.erb

+182
@@ -0,0 +1,182 @@
+<?xml version='1.0' encoding='utf-8'?>
+<urlset xmlns:xsi = 'http://www.w3.org/2001/XMLSchema-instance'
+        xmlns = 'http://www.sitemaps.org/schemas/sitemap/0.9'
+        xmlns:image = 'http://www.google.com/schemas/sitemap-image/1.1'
+        xmlns:video = 'http://www.google.com/schemas/sitemap-video/1.1'
+        xmlns:news = 'http://www.google.com/schemas/sitemap-news/0.9'
+        xmlns:mobile = 'http://www.google.com/schemas/sitemap-mobile/1.0'
+        xmlns:pagemap = 'http://www.google.com/schemas/sitemap-pagemap/1.0'
+        xmlns:xhtml = 'http://www.w3.org/1999/xhtml'
+        xsi:schemaLocation = 'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'>
+
+  <%= cache @last_modified do %>
+    <%# homepage %>
+    <%= render 'sitemap/url', loc: root_url, lastmod: @last_modified, changefreq: :daily, priority: 1.0 %>
+
+    <%# default articles feed, for english articles %>
+    <%= render 'sitemap/url', loc: feed_url, lastmod: @last_modified, changefreq: :daily, priority: 1.0 %>
+  <% end %>
+
+  <%# Atom feeds discovery page, for all languages with articles %>
+  <%= render 'sitemap/url', loc: feeds_url, lastmod: @last_modified %>
+
+  <%# articles feed, for all languages with articles %>
+  <% @localized_feeds.each do |locale| %>
+    <%# Atom feed %>
+    <%= render 'sitemap/url', loc: feed_url(locale.abbreviation), lastmod: @last_modified %>
+
+    <%# JSON feed (https://jsonfeed.org) %>
+    <%= render 'sitemap/url', loc: json_feed_url(locale.abbreviation), lastmod: @last_modified %>
+  <% end %>
+
+  <%# categories %>
+  <%= render 'sitemap/url', loc: categories_url, lastmod: @last_modified %>
+  <% @categories.each do |category| %>
+    <%# category Atom feeds %>
+    <%= render 'sitemap/url', loc: category_feed_url(category.slug), lastmod: @last_modified %>
+    <%# category JSON feeds %>
+    <%= render 'sitemap/url', loc: category_json_feed_url(category.slug), lastmod: @last_modified %>
+    <%# category pages %>
+    <%= render 'sitemap/url', loc: category_url(category.slug), lastmod: @last_modified %>
+  <% end %>
+
+  <%# static-ish pages %>
+  <% @static_paths.each do |path| %>
+    <%= render 'sitemap/url', loc: [root_url, path].join('/'), lastmod: @last_modified %>
+  <% end %>
+
+  <%# article years %>
+  <% @article_years.each do |year| %>
+    <% lastmod = DateTime.new(year).end_of_day %>
+    <% lastmod = @last_modified if year == Time.current.year %>
+
+    <%= render 'sitemap/url', loc: article_archives_url(year: year), lastmod: lastmod %>
+  <% end %>
+
+  <%# To Change Everything (TCE) %>
+  <%= render 'sitemap/url', loc: to_change_everything_url, lastmod: @last_modified %>
+
+  <% @to_change_everything_languages.each do |tce_language| %>
+    <%= render 'sitemap/url', loc: to_change_everything_url(lang: tce_language), lastmod: @last_modified %>
+    <%= render 'sitemap/url', loc: [to_change_everything_url(lang: tce_language), '/get'].join, lastmod: @last_modified %>
+  <% end %>
+
+  <%# articles %>
+  <% @articles.find_each do |article| %>
+    <% cache article do %>
+      <%= render 'sitemap/url', loc: [root_url, article.path].join, lastmod: article.updated_at %>
+    <% end %>
+  <% end %>
+
+  <%# language pages %>
+  <%= render 'sitemap/url', loc: languages_url, lastmod: @last_modified %>
+
+  <% Locale.live.each do |locale| %>
+    <%
+      # TODO: move these URLs to routes/model/helper
+      unicode_url = language_url locale.name.downcase.tr(' ', '-')
+      slug_url = language_url locale.slug.to_sym
+      english_url = language_url locale.name_in_english.downcase.tr(' ', '-')
+
+      urls = [unicode_url, slug_url, english_url].uniq
+    %>
+
+    <% urls.each do |url| %>
+      <%= render 'sitemap/url', loc: url, lastmod: @last_modified %>
+    <% end %>
+  <% end %>
+
+  <%# support %>
+  <%= render 'sitemap/url', loc: support_url, lastmod: @last_modified %>
+
+  <%# search %>
+  <%= render 'sitemap/url', loc: search_url, lastmod: @last_modified %>
+  <%= render 'sitemap/url', loc: advanced_search_url, lastmod: @last_modified %>
+
+  <%# tools %>
+  <%# books %>
+  <%= render 'sitemap/url', loc: books_url, lastmod: @last_modified %>
+  <%= render 'sitemap/url', loc: books_extras_url(:work), lastmod: @last_modified %>
+  <% @books.find_each do |book| %>
+    <% cache book do %>
+      <%= render 'sitemap/url', loc: book_url(book.slug), lastmod: book.updated_at %>
+    <% end %>
+  <% end %>
+
+  <%# logos %>
+  <%= render 'sitemap/url', loc: logos_url, lastmod: @last_modified %>
+  <% @logos.find_each do |logo| %>
+    <% cache logo do %>
+      <%= render 'sitemap/url', loc: logo_url(logo.slug), lastmod: logo.updated_at %>
+    <% end %>
+  <% end %>
+
+  <%# posters %>
+  <%= render 'sitemap/url', loc: posters_url, lastmod: @last_modified %>
+  <% @posters.find_each do |poster| %>
+    <% cache poster do %>
+      <%= render 'sitemap/url', loc: poster_url(poster.slug), lastmod: poster.updated_at %>
+    <% end %>
+  <% end %>
+
+  <%# stickers %>
+  <%= render 'sitemap/url', loc: stickers_url, lastmod: @last_modified %>
+  <% @stickers.find_each do |sticker| %>
+    <% cache sticker do %>
+      <%= render 'sitemap/url', loc: sticker_url(sticker.slug), lastmod: sticker.updated_at %>
+    <% end %>
+  <% end %>
+
+  <%# videos / music %>
+  <%= render 'sitemap/url', loc: music_url, lastmod: @last_modified %>
+  <%= render 'sitemap/url', loc: videos_url, lastmod: @last_modified %>
+  <% @videos.find_each do |video| %>
+    <% cache video do %>
+      <%= render 'sitemap/url', loc: video_url(video.slug), lastmod: video.updated_at %>
+    <% end %>
+  <% end %>
+
+  <%# zines %>
+  <%= render 'sitemap/url', loc: zines_url, lastmod: @last_modified %>
+  <% @zines.find_each do |zine| %>
+    <% cache zine do %>
+      <%= render 'sitemap/url', loc: zine_url(zine.slug), lastmod: zine.updated_at %>
+    <% end %>
+  <% end %>
+
+  <%# journals / issues %>
+  <%= render 'sitemap/url', loc: journals_url, lastmod: @last_modified %>
+  <% @journals.find_each do |journal| %>
+    <% cache journal do %>
+      <%= render 'sitemap/url', loc: journal_url(journal.slug), lastmod: journal.updated_at %>
+
+      <% journal.issues.each do |issue| %>
+        <% cache issue do %>
+          <%= render 'sitemap/url',
+                     loc: issue_url(slug: journal.slug, issue_number: issue.issue),
+                     lastmod: issue.updated_at %>
+        <% end %>
+      <% end %>
+    <% end %>
+  <% end %>
+
+  <%# podcasts / episodes %>
+  <%= render 'sitemap/url', loc: podcasts_url, lastmod: @last_modified %>
+  <% @podcasts.find_each do |podcast| %>
+    <% cache podcast do %>
+      <%= render 'sitemap/url', loc: podcast_url(podcast.slug), lastmod: podcast.updated_at %>
+
+      <% podcast.episodes.each do |episode| %>
+        <% cache episode do %>
+          <%= render 'sitemap/url',
+                     loc: episode_url(slug: podcast.slug, episode_number: episode.episode_number),
+                     lastmod: episode.updated_at %>
+          <%= render 'sitemap/url',
+                     loc: episode_transcript_url(slug: podcast.slug, episode_number: episode.episode_number),
+                     lastmod: episode.updated_at %>
+        <% end %>
+      <% end %>
+    <% end %>
+  <% end %>
+
+</urlset>

config/routes.rb

+9-6
@@ -1,6 +1,8 @@
 require 'sidekiq/web'
 
 Rails.application.routes.draw do
+  get 'sitemap.xml', to: 'sitemap#show', as: :sitemap
+
   # TODO: After switching the site auth to Devise, enable this auth protected route
   # # Sidekiq admin interface to monitor background jobs
   # authenticate :user, ->(user) { user.admin? } do
@@ -82,10 +84,11 @@
   get 'articles/:id_or_slug/collection_posts', to: 'collection_posts#index'
 
   # Categories
-  get 'categories', to: 'categories#index', as: :categories
-  get 'categories/:slug/page(/1)', to: redirect { |path_params, _| "/categories/#{path_params[:slug]}" }
-  get 'categories/:slug(/page/:page)', to: 'categories#show', as: :category
-  get 'categories/:slug/feed(/:lang)', to: 'categories#feed', defaults: { format: 'atom' }, as: :category_feed
+  get 'categories', to: 'categories#index', as: :categories
+  get 'categories/:slug/page(/1)', to: redirect { |path_params, _| "/categories/#{path_params[:slug]}" }
+  get 'categories/:slug(/page/:page)', to: 'categories#show', as: :category
+  get 'categories/:slug/feed(/:lang).json', to: 'categories#feed', defaults: { format: 'json' }, as: :category_json_feed
+  get 'categories/:slug/feed(/:lang)', to: 'categories#feed', defaults: { format: 'atom' }, as: :category_feed
 
   # Tags
   get 'tags/:slug/page(/1)', to: redirect { |path_params, _| "/tags/#{path_params[:slug]}" }
@@ -144,8 +147,8 @@
   get 'random', to: 'tools#random', as: :random_tool
 
   # Site search
-  get 'search', to: 'search#index'
-  get 'search/advanced', to: redirect('/search')
+  get 'search', to: 'search#index', as: :search
+  get 'search/advanced', to: redirect('/search'), as: :advanced_search
 
   # Support
   get 'support', to: 'support#new', as: :support

spec/requests/clean_paths_spec.rb

+2-1
@@ -16,9 +16,10 @@
   end
 
   it 'does not strip .xml extension from sitemap.xml' do
+    # TODO: update this test when we remove sitemap_generator gem and scheduled jobs
     Rake::Task['sitemap:refresh:no_ping'].invoke
 
-    get 'http://example.com/sitemap.xml.gz'
+    get 'http://example.com/sitemap.xml'
 
     expect(response).to have_http_status(:ok)
     expect(response.header['Location']).to be_nil
