Add simple Hugo configuration for static PINVAL generation #40
61dc406 to 13f1c00
13f1c00 to eed6430
```
@@ -0,0 +1,278 @@
---
```
This is an example of the Markdown frontmatter that Hugo needs for a multi-card PIN in order to render a PINVAL report. The higher-level vision here is that eventually we'll have a GitHub workflow (#37) that runs a Python script (#38) that queries the PINVAL tables in Athena (ccao-data/data-architecture#793) and generates one of these Markdown files in the hugo/content/ directory for every PIN in a tri (or in a list of input PINs); then it will call the hugo command to compile those Markdown files into output HTML pages using the template defined in the hugo/layouts/ subdir below.
Makes sense to me. It's a sort of params file that our HTML layout uses to dynamically generate the static webpage.
```
@@ -0,0 +1,144 @@
---
```
Here's a single-card example. The main difference is that the cards array only has one element.
```toml
baseURL = 'http://example.org/'
languageCode = 'en-us'
title = 'PINVAL'
disableKinds = ['sitemap', 'rss']
```
This is just Hugo boilerplate. It's not really important at this point.
```html
<!-- Boilerplate for mobile responsiveness -->
<meta name="viewport" content="width=device-width, initial-scale=1">

<title>{{ .Title }}</title>
```
This is the basic syntax for taking a Hugo parameter that's defined in the Markdown frontmatter and templating it into an HTML layout. Note that while most frontmatter parameters are stored on the top-level .Params object, .Title is a reserved parameter that is stored separately.
You'll often see parameters and variables referenced using leading dots (as here) and dollar signs in this template. Those are used to reference the context in which the variable should appear. The notion of context is a little tricky, but I thought this explanation was helpful for grounding it in the perspective of Go templates, which are the foundational technology on which Hugo is built.
Thanks for the context here. From what I understand, the . represents the current context, which includes our .Params object containing our frontmatter (the YAML in our Markdown file). The $ is something that is declared and used within the Go template itself to assign variables, build logic, etc.
```html
</div>
</div>

{{ $is_multicard := gt (len .Params.cards) 1 }}
```
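gt is a builtin Hugo function that works similarly to the > operator in R or Python. Per the docs I linked above, template variable names always start with a dollar sign, and a variable stays in scope until the end of the if, with, or range block it is declared in (or until the end of the template, if it isn't declared inside one). That's why we can still reference $is_multicard in the <script> block further down.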
```html
{{ if $is_multicard }}
<!-- Multicard: Create a tab for each card to show its comps -->
<ul class="nav nav-tabs" id="propertyCardTabs" role="tablist">
{{ range $index, $card := .Params.cards }}
```
range is a Hugo builtin analogous to a for loop.
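To see what `range $index, $card := ...` expands to, this sketch renders the `initializeMap` lines from the `<script>` block with Go's `text/template` for two hypothetical cards numbered 1 and 2. `$index` is the zero-based position and `$card` is the element. Hugo actually renders layouts with `html/template`, which escapes values according to their JavaScript context, so the whitespace it emits around `{{ $index }}` may differ from this plain-text output.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Mirrors the multi-card loop in the DOMContentLoaded handler: range binds
// the zero-based position to $index and the current card to $card.
const initTmpl = `{{ range $index, $card := .Params.cards }}initializeMap("map-{{ $card.card_num }}", {{ $index }});
{{ end }}`

// renderInit expands the loop for the given cards, as a build would.
func renderInit(cards []map[string]any) (string, error) {
	t, err := template.New("init").Parse(initTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	data := map[string]any{"Params": map[string]any{"cards": cards}}
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderInit([]map[string]any{{"card_num": 1}, {"card_num": 2}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```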
```html
role="tabpanel"
aria-labelledby="card-{{ $index }}-tab"
>
{{ template "card-content" (dict "card" $card "is_multicard" true) }}
```
template is a Hugo builtin that allows us to pass some parameters into a template and render that template inside the current template. We use it here to abstract out the content of the comps section, which is basically identical for all cards, but needs to be rendered multiple times for multi-cards.
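A template invoked with `template` does not inherit the caller's variables, which is why anything it needs has to be passed in explicitly. This Go sketch reproduces the pattern with `text/template`. Because the standard library has no `dict`, it registers a hypothetical `dict` helper that mimics Hugo's (alternating keys and values); `cardTmpl` and `renderCards` are likewise illustrative names. Unlike the layout, which passes a literal `true` inside the multi-card branch, the sketch passes `$is_multicard` so that one loop covers both cases.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// dict is a hypothetical stand-in for Hugo's dict function, which Go's
// text/template does not provide: it pairs up alternating keys and values.
func dict(kv ...any) (map[string]any, error) {
	if len(kv)%2 != 0 {
		return nil, errors.New("dict needs an even number of arguments")
	}
	m := make(map[string]any, len(kv)/2)
	for i := 0; i < len(kv); i += 2 {
		k, ok := kv[i].(string)
		if !ok {
			return nil, fmt.Errorf("dict key %v is not a string", kv[i])
		}
		m[k] = kv[i+1]
	}
	return m, nil
}

// The defined template cannot see $is_multicard, so the caller packs the card
// and the flag into a dict and passes it in as the new dot.
const cardTmpl = `{{ define "card-content" }}<div id="map-{{ if .is_multicard }}{{ .card.card_num }}{{ else }}main{{ end }}"></div>
{{ end }}` +
	`{{ $is_multicard := gt (len .Params.cards) 1 }}` +
	`{{ range $card := .Params.cards }}{{ template "card-content" (dict "card" $card "is_multicard" $is_multicard) }}{{ end }}`

func renderCards(cards []map[string]any) (string, error) {
	t, err := template.New("cards").Funcs(template.FuncMap{"dict": dict}).Parse(cardTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	data := map[string]any{"Params": map[string]any{"cards": cards}}
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	multi, _ := renderCards([]map[string]any{{"card_num": 1}, {"card_num": 2}})
	single, _ := renderCards([]map[string]any{{"card_num": 1}})
	fmt.Print(multi, single)
}
```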
```html
</div>

<!-- Define a template for card content -->
{{ define "card-content" }}
```
Here we use the define builtin to define the card-content template that we rendered above.
TIL that Hugo parses all of the define blocks before other parts of the code, so you can call a template earlier in the code than the line in which it is defined.
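That ordering behavior comes from Go's `text/template`: `Parse` reads the whole source and collects every `define` before `Execute` runs anything. A minimal check, with the hypothetical names `orderTmpl` and `renderOrder`, calls `card-content` on a line above its definition.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// The template call sits on a line above the define block in the source.
// This works because Parse collects every define before Execute runs.
const orderTmpl = `{{ template "card-content" . }}
{{ define "card-content" }}card {{ .card_num }}{{ end }}`

func renderOrder(card map[string]any) (string, error) {
	t, err := template.New("order").Parse(orderTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, card); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderOrder(map[string]any{"card_num": 2})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```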
```html
<!-- Comp map -->
<h2 class="mb-3">Top 5 comparable sales</h2>
<div
  id="map-{{ if .is_multicard }}{{ .card.card_num }}{{ else }}main{{ end }}"
  class="map-container"
>
</div>
```
The map is mostly defined using JavaScript in the <script> tags below. Here, we create a container for it that we can target using our JS code.
```js
document.addEventListener("DOMContentLoaded", function() {
  {{ if $is_multicard }}
    {{ range $index, $card := .Params.cards }}
      initializeMap("map-{{ $card.card_num }}", {{ $index }});
      initializeTable("comp-table-{{ $card.card_num }}");
    {{ end }}
    renderMapsOnDisplay(mapRegistry);
  {{ else }}
    initializeMap("map-main", 0);
    initializeTable("comp-table-main");
  {{ end }}
});
```
The interaction between Hugo template code and JavaScript code is a little bit sneaky here, because the Hugo template code runs at build time, while the JavaScript code runs at run time. In other words, the Hugo template code is executed when Hugo is compiling the template into an output HTML page, while the JavaScript code is executed when the user's browser loads the compiled page. So when we build the site, Hugo will output one of two different versions of this function depending on the value of the $is_multicard variable, for example:
```js
// Example 1: is_multicard === true, 2 cards with card_num 1 and 2
document.addEventListener("DOMContentLoaded", function() {
  initializeMap("map-1", 0);
  initializeTable("comp-table-1");
  initializeMap("map-2", 1);
  initializeTable("comp-table-2");
  renderMapsOnDisplay(mapRegistry);
});

// Example 2: is_multicard === false
document.addEventListener("DOMContentLoaded", function() {
  initializeMap("map-main", 0);
  initializeTable("comp-table-main");
});
```

For any given PIN, only one of these code blocks will exist in the output HTML file. Then, when a user loads the file in their browser, the browser runs whichever version of the function was written into that file. Does that make sense?
This makes sense to me. So you're essentially highlighting the difference between building the HTML file and running the JavaScript code inside that built HTML file?
```yaml
    - "char_fbath"
    - "char_hbath"
- card_num: 2
  location:
```
So our new multi-card strategy for comps spits out the same comps for all cards within 2-3 card properties. I was going to suggest some sort of condensed data structure here, but as I started typing this I realized we will have different comps for 4+ card PINs. So I think a bit of duplication in the 2-3 card case is ideal here.
I would be open to a condensed data structure! I didn't spend too much time thinking about the 2-3 card case, which I'm saving for #31.
wagnerlmichael left a comment
Looks good to me. This mock-up is more than enough for me to incorporate into the Python script and start developing. Nice work!
This PR adds a simple Hugo site that we can use to generate PINVAL reports. For now, the Hugo configuration is defined in parallel to the existing Quarto configuration, so that we can maintain the legacy Quarto process while we continue work on the Hugo process. Once the Hugo process is production-ready, I'll put up a follow-up PR that removes the legacy Quarto doc for the sake of cleanliness.
Testing
To test out the Hugo site:
1. Check out this branch in your local `pinval` repo
2. Install Hugo, e.g. with `sudo snap install hugo`
3. Run `hugo version` to confirm that it installed correctly
4. Navigate to the `pinval/pinval/` subdirectory
5. Run `hugo serve`
6. Visit http://localhost:1313/example-single-card/ to view the sample single-card report
7. Visit http://localhost:1313/example-multi-card/ to view the sample multi-card report

Benchmarking
Build time
I was curious how fast Hugo can build this report, and how much memory/CPU we'd need to build a full tri's worth of reports. I used a quick command like this one to copy the single-card report N times:
I tried building 500k reports but ran out of memory on my 16GB laptop. Public repo GitHub runners have 16GB RAM while private repo runners have 7GB RAM, so this indicates that we probably can't build an entire tri in one shot.
I started at 100k reports and incrementally increased the number of reports until I ran out of memory. On my machine, memory starts to max out and things start to slow down around 250k reports, but 200k reports run very fast:
The important number there is the `real` wall clock time: 90 seconds to generate 200k reports.

While this benchmarking means we likely can't build a whole tri in one shot on a GitHub public repo runner (let alone a private repo runner), we do have one option to increase the number of reports we generate in one go without increasing the memory allocation on our runners: segmenting our reports by township. Since the biggest town (Lake) has ~190k PINs, which is below the 200k reports that built successfully on my 16GB laptop, public repo runners should have enough RAM to generate the biggest town in one shot. So we can segment by town and then call `hugo build` with the `--renderSegments` flag for each town in the tri in order to build an entire tri in one job. (If we want to get even fancier/more performant, we could parallelize each township in a separate GitHub workflow using dynamic job matrices.) If this seems good to you, I'll open a follow-up issue to perform this segmenting and we can pick it up once we're further along with #37.

Disk usage
While I was benchmarking performance, I was also curious about realistic disk usage.
200k reports use roughly 7.7GB of storage on disk:
This is because each report is about 28KB:
We can cut about a third off of that using the `--minify` argument to `hugo build`:

This suggests a total disk usage of about 500k × 20KB ≈ 10GB per tri. However, we should expect this to be larger, because not all predictors are present in the characteristics table yet. I would guess the final report will be on the order of 1.5x as large as this test report, or ~15GB per tri.