Add simple Hugo configuration for static PINVAL generation #40

Merged
jeancochrane merged 12 commits into main from jeancochrane/36-convert-quarto-doc-to-hugo-template
May 9, 2025

Member

@jeancochrane jeancochrane commented May 5, 2025

This PR adds a simple Hugo site that we can use to generate PINVAL reports. For now, the Hugo configuration is defined in parallel to the existing Quarto configuration, so that we can maintain the legacy Quarto process while we continue work on the Hugo process. Once the Hugo process is production-ready, I'll put up a follow-up PR that removes the legacy Quarto doc for the sake of cleanliness.

Testing

To test out the Hugo site:

  • Open WSL
  • Clone or navigate to the pinval repo
  • Make sure you have Hugo installed: sudo snap install hugo
    • This will prompt you for your terminal user password
    • Run hugo version to confirm that it installed correctly
      • You might need to open a new shell before the CLI is available in your path
  • Navigate to the pinval/pinval/ subdirectory
  • Run hugo serve
  • Navigate to http://localhost:1313/example-single-card/ to view the sample single-card report
  • Navigate to http://localhost:1313/example-multi-card/ to view the sample multi-card report

Benchmarking

Build time

I was curious how fast Hugo can build this report, and how much memory/CPU we'd need to build a full tri's worth of reports. I used a quick command like this one to copy the single-card report N times:

$ export N=500000
$ time for i in $(seq 1 $N); do cp content/example-single-card.md "content/example-single-card-${i}.md"; done               

I tried running on 500k reports but ended up running out of memory on my 16GB laptop. Public repo GitHub runners have 16GB RAM while private repo runners have 7GB RAM, so this indicates that we probably can't build an entire tri in one shot.

I started at 100k reports and incrementally increased the number of reports until I ran out of memory. On my machine, memory starts to max out and things start to slow down at around 250k reports, but a 250k-report build still runs very fast:

$ time hugo build
Start building sites …
hugo v0.147.0-7d0039b86ddd6397816cc3383cb0cfa481b15f32+extended linux/amd64 BuildDate=2025-04-25T15:26:28Z VendorInfo=snap:0.147.0

                   |   EN
-------------------+---------
  Pages            | 250002
  Paginator pages  |      0
  Non-page files   |      0
  Static files     |      7
  Processed images |      0
  Aliases          |      0
  Cleaned          |      0

Total in 88588 ms

real    1m30.243s
user    12m14.501s
sys     1m50.962s

The important number there is the real (wall clock) time: 90 seconds to generate 250k reports.

While this benchmarking means we likely can't build a whole tri in one shot on a GitHub public repo runner (let alone a private repo runner), we do have one option to increase the number of reports we generate in one go without increasing the memory allocation on our runners: segmenting our reports by township. Since the biggest town (Lake) has ~190k PINs, public repo runners should have enough RAM to build the biggest town in one shot, so we can segment by town and then call hugo build with the --renderSegments flag once per town to build an entire tri in one job. (If we want to get even fancier/more performant, we could build the townships in parallel as separate jobs using a dynamic job matrix.) If this seems good to you, I'll open a follow-up issue to perform this segmenting and we can pick it up once we're further along with #37.
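As a rough sketch of the per-township plan, the Go snippet below turns a map of report counts per township into one `hugo build --renderSegments <town>` command each, and refuses any township that exceeds a page budget. It assumes (hypothetically) that the Hugo config defines one segment per township, named after the township; that segment config is not shown, the `example_town` count is made up, and the 250,000-page budget comes from where the laptop benchmark above started to max out.

```go
package main

import (
	"fmt"
	"sort"
)

// planSegmentBuilds returns one `hugo build --renderSegments <town>` command
// per township, sorted by township name. It returns an error if any township
// has more reports than maxPages, since that segment would not fit in a build.
//
// ASSUMPTION: the Hugo config defines one segment per township, named after it.
func planSegmentBuilds(reportsByTown map[string]int, maxPages int) ([]string, error) {
	towns := make([]string, 0, len(reportsByTown))
	for town := range reportsByTown {
		towns = append(towns, town)
	}
	sort.Strings(towns) // map iteration order is random, so sort for stable output

	cmds := make([]string, 0, len(towns))
	for _, town := range towns {
		if n := reportsByTown[town]; n > maxPages {
			return nil, fmt.Errorf("township %q has %d reports, above the %d-page budget", town, n, maxPages)
		}
		cmds = append(cmds, "hugo build --renderSegments "+town)
	}
	return cmds, nil
}

func main() {
	// Lake's ~190k PINs come from the note above; example_town is made up.
	counts := map[string]int{"lake": 190000, "example_town": 30000}
	// Budget: roughly where a 16GB machine started to max out in the benchmark.
	cmds, err := planSegmentBuilds(counts, 250000)
	if err != nil {
		panic(err)
	}
	for _, c := range cmds {
		fmt.Println(c)
	}
}
```

Running it prints the `example_town` command before the `lake` one. In a real workflow each string would become its own build step, or its own job if we went the dynamic matrix route.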

Disk usage

While I was benchmarking performance, I was also curious about realistic disk usage.

250k reports use roughly 7.7GB of storage on disk:

$ du -sh public
7.7G    public

This is because each report is about 28KB:

$ du -sh public/example-single-card/index.html
28K     public/example-single-card/index.html

We can cut nearly a third (about 30%) off of that by passing the --minify flag to hugo build:

$ hugo build --minify
...
Total in 99ms
$ du -sh public/example-single-card/index.html
20K     public/example-single-card/index.html

This suggests a total disk usage of about 500k × 20KB ≈ 10GB per tri. However, we should expect the real number to be larger, because not all predictors are present in the characteristics table yet. I would guess the final report will be on the order of 1.5x as large as this test report, or ~15GB per tri.
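The disk estimate works out as follows, using decimal units and the 1.5x growth factor guessed above for the final report:

```latex
\begin{aligned}
\text{minify savings per report} &\approx \frac{28\,\text{KB} - 20\,\text{KB}}{28\,\text{KB}} \approx 29\% \\
\text{test-report size per tri} &\approx 500{,}000 \times 20\,\text{KB} = 10{,}000{,}000\,\text{KB} \approx 10\,\text{GB} \\
\text{final estimate per tri} &\approx 1.5 \times 10\,\text{GB} = 15\,\text{GB}
\end{aligned}
```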

@jeancochrane jeancochrane linked an issue May 5, 2025 that may be closed by this pull request
@jeancochrane jeancochrane force-pushed the jeancochrane/36-convert-quarto-doc-to-hugo-template branch from 61dc406 to 13f1c00 May 6, 2025 16:13
@jeancochrane jeancochrane force-pushed the jeancochrane/36-convert-quarto-doc-to-hugo-template branch from 13f1c00 to eed6430 May 6, 2025 16:13
@@ -0,0 +1,278 @@
---
Member Author

This is an example of the Markdown frontmatter that Hugo needs for a multi-card PIN in order to render a PINVAL report. The higher-level vision here is that eventually we'll have a GitHub workflow (#37) that runs a Python script (#38) that queries the PINVAL tables in Athena (ccao-data/data-architecture#793) and generates one of these Markdown files in the hugo/content/ directory for every PIN in a tri (or in a list of input PINs); then it will call the hugo command to compile those Markdown files into output HTML pages using the template defined in the hugo/layouts/ subdir below.

Member

Makes sense to me. It's a sort of params file through which our html layout dynamically generates the static webpage html file.

@@ -0,0 +1,144 @@
---
Member Author

Here's a single-card example. The main difference is that the cards array only has one element.

Comment on lines +1 to +4
baseURL = 'http://example.org/'
languageCode = 'en-us'
title = 'PINVAL'
disableKinds = ['sitemap', 'rss']
Member Author

This is just Hugo boilerplate. It's not really important at this point.

<!-- Boilerplate for mobile responsiveness -->
<meta name="viewport" content="width=device-width, initial-scale=1">

<title>{{ .Title }}</title>
Member Author

This is the basic syntax for taking a Hugo parameter that's defined in the Markdown frontmatter and templating it into an HTML layout. Note that while most frontmatter parameters are stored on the top level .Params object, .Title is a reserved parameter that is stored separately.

You'll often see parameters and variables referenced using leading dots (as here) and dollar signs in this template. Those are used to reference the context in which the variable should appear. The notion of context is a little tricky, but I thought this explanation was helpful for grounding it in the perspective of Go templates, which is the foundational technology on which Hugo is built.
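A small, self-contained way to see `.` versus `$` is Go's standard `text/template` package, which Hugo builds on (Hugo layouts use the `html/template` variant, which adds context-aware escaping). In this sketch, the hypothetical `Page` struct stands in for Hugo's page object: inside `range`, the dot is rebound to the current card, so the page title is only reachable through `$`, which always refers to the data passed to `Execute`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Page is a hypothetical stand-in for Hugo's page object: Title is its own
// field, while the other front matter keys live in the Params map.
type Page struct {
	Title  string
	Params map[string]any
}

// At the top level, dot is the Page. Inside range, dot is rebound to each
// card, so the page title must be reached through $, the root data.
const pageTmpl = `<title>{{ .Title }}</title>
{{ range .Params.cards }}<p>{{ $.Title }}: card {{ .card_num }}</p>
{{ end }}`

func render(p Page) (string, error) {
	t, err := template.New("page").Parse(pageTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(Page{
		Title: "PINVAL",
		Params: map[string]any{
			"cards": []map[string]any{{"card_num": 1}, {"card_num": 2}},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Running it prints the title once and then one `<p>PINVAL: card N</p>` line per card. Writing `.Title` instead of `$.Title` inside the loop would not find the page title, because each card map has no `Title` key.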

Member

Thanks for the context here. From what I understand, the . represents the current context, which includes our .Params object containing our frontmatter (the YAML info in our md file). The $ is something that's declared and used within the Go template itself to assign variables, build logic, etc.

</div>
</div>

{{ $is_multicard := gt (len .Params.cards) 1 }}
Member Author

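gt is a builtin Hugo function that works similarly to the > operator in R or Python. The variable name starts with a dollar sign because every variable in a Go template does; since we assign it at the top level of the template (outside any if, range, or with block), it stays in scope for the rest of the template. It is not visible inside templates invoked with template, though, which is why we pass is_multicard into card-content explicitly below.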
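The `$is_multicard` line can be exercised directly with Go's `text/template`, since `gt`, `len`, and `$` variables are all standard Go template features. This is a sketch with a hypothetical `Page` type and made-up output strings rather than the real layout:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Page is a hypothetical stand-in for Hugo's page object.
type Page struct {
	Params map[string]any
}

// Every Go template variable starts with $. Declared at the top level, outside
// any if/range/with, $is_multicard stays in scope until the end of the template.
const layout = `{{ $is_multicard := gt (len .Params.cards) 1 }}` +
	`{{ if $is_multicard }}tabs for {{ len .Params.cards }} cards{{ else }}single card{{ end }}`

var tmpl = template.Must(template.New("layout").Parse(layout))

func render(cards []map[string]any) (string, error) {
	var sb strings.Builder
	err := tmpl.Execute(&sb, Page{Params: map[string]any{"cards": cards}})
	return sb.String(), err
}

func main() {
	for _, cards := range [][]map[string]any{
		{{"card_num": 1}},
		{{"card_num": 1}, {"card_num": 2}},
	} {
		out, err := render(cards)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```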

{{ if $is_multicard }}
<!-- Multicard: Create a tab for each card to show its comps -->
<ul class="nav nav-tabs" id="propertyCardTabs" role="tablist">
{{ range $index, $card := .Params.cards }}
Member Author

range is a Hugo builtin analogous to a for loop.
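As a sketch of how the two-variable form of `range` behaves, this Go `text/template` snippet renders one tab per card. `$index` is the zero-based position in the `cards` list and `$card` is the element itself. The `<li>` markup is a simplified, hypothetical stand-in for the real tab markup, and a plain map stands in for `.Params`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Simplified, hypothetical tab markup: one <li> per card.
const tabsTmpl = `{{ range $index, $card := .cards }}<li id="card-{{ $index }}-tab">Card {{ $card.card_num }}</li>
{{ end }}`

var tabs = template.Must(template.New("tabs").Parse(tabsTmpl))

func renderTabs(cards []map[string]any) (string, error) {
	var sb strings.Builder
	// A plain map stands in for .Params here.
	err := tabs.Execute(&sb, map[string]any{"cards": cards})
	return sb.String(), err
}

func main() {
	out, err := renderTabs([]map[string]any{{"card_num": 1}, {"card_num": 2}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Note that the tab IDs use `$index` (0, 1, ...) while the label uses `card_num`, so the two numbers differ by one whenever card numbers start at 1.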

role="tabpanel"
aria-labelledby="card-{{ $index }}-tab"
>
{{ template "card-content" (dict "card" $card "is_multicard" true) }}
Member Author

template is a Hugo builtin that allows us to pass some parameters into a template and render that template inside the current template. We use it here to abstract out the content of the comps section, which is basically identical for all cards, but needs to be rendered multiple times for multi-cards.

</div>

<!-- Define a template for card content -->
{{ define "card-content" }}
Member Author

Here we use the define builtin to define the card-content template that we rendered above.

Member

TIL that Hugo parses all of the define blocks before other parts of the code, so that you can call a template earlier in the code than the line in which it is defined
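The `define`/`template` pattern can be reproduced with plain Go `text/template`, with two caveats: `dict` is a Hugo function, so the sketch registers a hypothetical stand-in with the same call shape, and it passes `$is_multicard` through `dict` in both cases instead of hardcoding `true`. It also shows why values have to be passed in explicitly: inside `card-content`, the dot is the map built by `dict`, and variables from the calling template are not in scope. The call appears before the `define` block, which works because parsing registers every defined template before anything executes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// dict is a hypothetical stand-in for Hugo's dict function: it turns
// alternating key/value arguments into a map.
func dict(pairs ...any) (map[string]any, error) {
	if len(pairs)%2 != 0 {
		return nil, errors.New("dict needs an even number of arguments")
	}
	m := make(map[string]any, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		key, ok := pairs[i].(string)
		if !ok {
			return nil, fmt.Errorf("dict key %v is not a string", pairs[i])
		}
		m[key] = pairs[i+1]
	}
	return m, nil
}

// "card-content" is invoked before its define block; Parse registers every
// define up front, so the name resolves when the template executes. The
// invoked template does not inherit $is_multicard, so it is passed via dict.
const layout = `{{ $is_multicard := gt (len .cards) 1 }}` +
	`{{ range .cards }}{{ template "card-content" (dict "card" . "is_multicard" $is_multicard) }};{{ end }}` +
	`{{ define "card-content" }}map-{{ if .is_multicard }}{{ .card.card_num }}{{ else }}main{{ end }}{{ end }}`

var tmpl = template.Must(template.New("layout").Funcs(template.FuncMap{"dict": dict}).Parse(layout))

func renderMapIDs(cards []map[string]any) (string, error) {
	var sb strings.Builder
	err := tmpl.Execute(&sb, map[string]any{"cards": cards})
	return sb.String(), err
}

func main() {
	single, err := renderMapIDs([]map[string]any{{"card_num": 1}})
	if err != nil {
		panic(err)
	}
	multi, err := renderMapIDs([]map[string]any{{"card_num": 1}, {"card_num": 2}})
	if err != nil {
		panic(err)
	}
	fmt.Println(single) // map-main;
	fmt.Println(multi)  // map-1;map-2;
}
```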

Comment on lines +365 to +371
<!-- Comp map -->
<h2 class="mb-3">Top 5 comparable sales</h2>
<div
id="map-{{ if .is_multicard }}{{ .card.card_num }}{{ else }}main{{ end }}"
class="map-container"
>
</div>
Member Author

The map is mostly defined using JavaScript in the <script> tags below. Here, we create a container for it that we can target using our JS code.

Comment on lines +495 to +506
document.addEventListener("DOMContentLoaded", function() {
{{ if $is_multicard }}
{{ range $index, $card := .Params.cards }}
initializeMap("map-{{ $card.card_num }}", {{ $index }});
initializeTable("comp-table-{{ $card.card_num }}");
{{ end }}
renderMapsOnDisplay(mapRegistry);
{{ else }}
initializeMap("map-main", 0);
initializeTable("comp-table-main");
{{ end }}
});
Member Author

The interaction between Hugo template code and JavaScript code is a little bit sneaky here, because the Hugo template code is run at build time, while the JavaScript code is run at run time. In other words, the Hugo template code is executed when Hugo is compiling the template into an output HTML page, while the JavaScript code is executed when the user's browser loads the compiled page. So when we're building the site in our build step, Hugo will output one of two different versions of this function depending on the value of the $is_multicard variable, for example:

    // Example 1: is_multicard === true, 2 cards (card_num 1 and 2)
    document.addEventListener("DOMContentLoaded", function() {
      initializeMap("map-1", 0);
      initializeTable("comp-table-1");
      initializeMap("map-2", 1);
      initializeTable("comp-table-2");
      renderMapsOnDisplay(mapRegistry);
    });

    // Example 2: is_multicard === false
    document.addEventListener("DOMContentLoaded", function() {
      initializeMap("map-main", 0);
      initializeTable("comp-table-main");
    });

For any given PIN, only one of these code blocks will exist in the output HTML file. Then, when a user loads the file in their browser, the browser runs that block, which calls the initialization functions defined in the template's JavaScript. Does that make sense?

Member

This makes sense to me. Essentially, you're highlighting the difference between building the HTML file and running the JavaScript code within that built HTML file?

Member Author

Exactly!

@jeancochrane jeancochrane changed the title [WIP] Add simple Hugo configuration for static PINVAL generation Add simple Hugo configuration for static PINVAL generation May 7, 2025
- "char_fbath"
- "char_hbath"
- card_num: 2
location:
Member

So our new multi-card strategy for comps spits out the same comps for all cards within 2-3 card properties. I was going to suggest some sort of condensed data structure here, but as I started typing this I realized we will have different comps for 4+ card PINs. So I think a bit of duplication in the 2-3 card case is ideal here.

Member Author

I would be open to a condensed data structure! I didn't spend too much time thinking about the 2-3 card case, which I'm saving for #31.

Member

@wagnerlmichael wagnerlmichael left a comment

Looks good to me. This mock-up is more than enough for me to incorporate it into the python script and start developing. Nice work

@jeancochrane jeancochrane merged commit 15336bd into main May 9, 2025
1 check passed
Development

Successfully merging this pull request may close these issues.

Convert Quarto doc to Hugo template
