Metadata propagation: context-bound fields (grapher_config, presentation Jinja) leak into non-dimensional downstream variables #5931

@lucasrodes

Description

One-liner

When a dimensioned variable with Jinja-templated presentation.grapher_config (e.g. un_wpp.population, whose templates reference << age >> and << sex >>) is combined via arithmetic with another variable, or consumed by a downstream step that produces a non-dimensional output, its context-bound metadata can survive into the result. Grapher then fails at render time with UndefinedError: 'age' is undefined.

Context

Variable metadata today is designed to be propagated through pandas arithmetic and YAML overlays, which is the right default for origins, sources, licenses — those are legitimately inherited. But presentation (and especially presentation.grapher_config) is a context-bound field: its Jinja templates only make sense in the exact dimensional context where they were authored. Propagating them silently into a different context is a footgun.

Two mechanisms contribute:

  1. Runtime combine in lib/catalog/owid/catalog/core/indicators.py (combine_indicators_metadata, ~L881–993):

    • origins/sources/licenses → combined (unique union) ✅ sensible.
    • display and presentation → kept only if all operands are identical, else None (L808–824, L827–844).
    • Division (/) keeps only the first operand's metadata (L625–630).
    • Net effect: if one operand has rich presentation and the other has None, they're not identical — so presentation is dropped. But if operands happen to share it (or one path funnels un_wpp into the result), the Jinja survives with no sanity check against the output's dimensions.
  2. YAML overlay merge in lib/catalog/owid/catalog/core/yaml_metadata.py (_merge_variable_metadata, L148–180):

    • merge_fields=["presentation", "grapher_config"] (L152) → deep-merge semantics.
    • A downstream *.meta.yml that overrides presentation.title_public does not clear sibling presentation.grapher_config.subtitle inherited at runtime. The author has to know to explicitly write subtitle: "" to clear it.
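
The two mechanisms can be imitated with plain dictionaries. This is a minimal sketch, not the code in indicators.py or yaml_metadata.py: combine_presentation and deep_merge are hypothetical stand-ins that assume the behaviour described above (keep presentation only if all operands are identical; merge nested keys instead of replacing them).

```python
from copy import deepcopy


def combine_presentation(operands):
    """Keep the shared value only if every operand carries the same one, else None."""
    first = operands[0]
    return deepcopy(first) if all(op == first for op in operands) else None


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`; untouched siblings survive."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


un_wpp = {
    "grapher_config": {
        "subtitle": "<%- if age == '0' %>Children under 1 year old.<%- endif %>"
    }
}

# One operand has presentation, the other has none: not identical, so it is dropped.
dropped = combine_presentation([un_wpp, None])

# Both operands carry the same presentation: the template survives unchanged.
inherited = combine_presentation([un_wpp, deepcopy(un_wpp)])

# A downstream overlay that only sets title_public leaves the subtitle in place...
overlaid = deep_merge(inherited, {"title_public": "Number of stunted children"})

# ...unless the author clears it explicitly.
cleared = deep_merge(
    inherited,
    {"title_public": "Number of stunted children", "grapher_config": {"subtitle": ""}},
)
```

In this sketch, overlaid still contains the age-dependent subtitle next to the new title_public, which is the state that later fails in grapher.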

Together these make it very easy to ship a grapher variable whose subtitle Jinja can't resolve in the new context.

Example

un_wpp.population defines a Jinja subtitle tied to age/sex:

# etl/steps/data/garden/un/2024-07-12/un_wpp.meta.yml
presentation:
  grapher_config:
    subtitle: |-
      <%- if age == '0' %>
      <%- if sex == 'all' %>Children under 1 year old.<%- elif sex == 'female' %>Girls...<%- endif %>
      <%- elif age == '18+' %>...<%- endif %> {definitions.global.projections}

malnutrition/2024-12-16/malnutrition.py multiplies a WDI rate by un_wpp.population:

tb_under_five = tb_population[(tb_population["age"] == "0-4") & ...]
tb = pr.merge(tb, tb_under_five, on=["country", "year"])
for col in COLUMNS:
    tb[COLUMNS[col]] = ((tb[col] / 100) * tb["population"]).round(0).astype("Int64")
tb = tb.drop(columns=[..., "sex", "age", "variant"])
tb = tb.format(["country", "year"], short_name="malnutrition")

malnutrition.meta.yml overrides title_public but not grapher_config.subtitle. At grapher time:

jinja2.exceptions.UndefinedError: 'age' is undefined
ValueError: Error expanding Jinja in metadata for column 'number_of_stunted_children' with dim values: {}.

The subtitle in the failing metadata is byte-for-byte the template from un_wpp.population — inherited all the way through despite the output having no dimensions.
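
The failure can be reproduced outside the ETL with jinja2 alone. The sketch below assumes an environment configured with the <% %> and << >> delimiters seen in the templates above and with StrictUndefined; the real ETL environment may be set up differently, and render is a hypothetical helper.

```python
import jinja2

# Assumed environment: custom delimiters as used in the issue's templates, and
# StrictUndefined so that a missing variable raises instead of rendering as "".
env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
    undefined=jinja2.StrictUndefined,
)

SUBTITLE = (
    "<%- if age == '0' %>Children under 1 year old."
    "<%- else %>Age group << age >>.<%- endif %>"
)


def render(template, dim_values):
    """Render a metadata template against one set of dimension values."""
    return env.from_string(template).render(**dim_values)


# In the dimensional context where the template was written, it renders.
with_dims = render(SUBTITLE, {"age": "0", "sex": "all"})

# With no dimension values at all (the malnutrition table), the same template fails.
try:
    render(SUBTITLE, {})
    error = None
except jinja2.exceptions.UndefinedError as exc:
    error = str(exc)
```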

A similar fix pattern already exists at etl/steps/data/garden/demography/2024-07-12/un_wpp_historical.meta.yml (commit 482b09c), where authors manually set subtitle: "" / note: "" to neutralize inherited Jinja. That pattern is the workaround, and it's also evidence that the default is wrong.

Solution space (for discussion)

A few directions, roughly ordered by blast radius:

  1. Don't propagate presentation across arithmetic by default. Treat it like title/description_short (which already reset) rather than origins. Motivation: presentation describes how a specific indicator should be presented; it's not a property of the data. Downstream authors who want it must set it explicitly. This is my preferred option — it matches what users expect.

  2. Strip Jinja-templated presentation.grapher_config fields when dimensions change. At combine time, detect Jinja syntax (<%, <<) in string fields and drop them if the operand context differs. More surgical but more magic.

  3. Validate Jinja renders at garden save time, not at grapher time. Attempt to render every templated metadata field against the current table's dimensions; fail fast with a clear error pointing at the authoring step rather than 3 steps downstream. Doesn't fix the propagation, but makes the footgun cheap to debug.

  4. Change YAML overlay semantics for presentation: if a downstream .meta.yml specifies anything under presentation, treat it as a full replace rather than deep merge (opt-in deep-merge via an explicit key?). Riskier — lots of existing YAML relies on partial overrides of presentation.topic_tags etc.

  5. Make "dimensioned Jinja" a first-class concept. Mark grapher_config.subtitle fields that reference dimensions with metadata (e.g. _dims_required: [age, sex]), and drop them automatically when those dims are not in the output table's index. Solves it cleanly but requires tagging.
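
Options (2) and (5) can be compared side by side with a small sketch. Everything here is hypothetical: the helper names, the choice of <% and << as the markers to look for, and the shape of _dims_required (assumed to be a mapping from field name to the dimensions that field needs). Deciding when the operand context differs is left to the caller.

```python
JINJA_MARKERS = ("<%", "<<")  # assumed delimiters, as in the un_wpp template


def is_templated(value):
    """True for strings that contain any of the assumed template markers."""
    return isinstance(value, str) and any(marker in value for marker in JINJA_MARKERS)


def strip_templated(grapher_config):
    """Option (2): drop every string field that contains template syntax."""
    return {k: v for k, v in grapher_config.items() if not is_templated(v)}


def drop_unsatisfied(grapher_config, output_dims):
    """Option (5): drop fields whose declared dimensions are not all in the output."""
    required = grapher_config.get("_dims_required", {})
    kept = {
        key: value
        for key, value in grapher_config.items()
        if key != "_dims_required" and set(required.get(key, ())) <= set(output_dims)
    }
    # Keep the tags only for fields that survived.
    still_tagged = {k: v for k, v in required.items() if k in kept}
    if still_tagged:
        kept["_dims_required"] = still_tagged
    return kept


plain = {
    "subtitle": "<%- if age == '0' %>Children under 1 year old.<%- endif %>",
    "note": "Projections use the medium variant.",
}
tagged = {**plain, "_dims_required": {"subtitle": ["age", "sex"]}}

stripped = strip_templated(plain)
flat_output = drop_unsatisfied(tagged, ["country", "year"])
dimensional_output = drop_unsatisfied(tagged, ["country", "year", "age", "sex"])
```

The first helper guesses from string contents, which is where the "magic" comes from; the second relies only on explicit tags, which is the tagging cost mentioned in (5).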

I lean toward (1) + (3): stop propagating presentation across arithmetic (it's almost never the right thing), and add a save-time Jinja lint so the failure mode moves from "grapher step 500 lines deep" to "garden step tells you which field is broken."
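
A save-time lint along these lines could look roughly like the sketch below. It is an assumption-laden illustration: lint_metadata and iter_strings are hypothetical names, the delimiters are the ones assumed from the templates above, and the check is static (it lists variables a template references via jinja2.meta.find_undeclared_variables) instead of actually rendering each field, so it may flag variables that sit in branches never taken.

```python
import jinja2
from jinja2 import meta

# Assumed delimiters, matching the templates quoted in this issue.
env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
)


def iter_strings(node, path=""):
    """Yield (dotted_path, value) for every string in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{i}]")
    elif isinstance(node, str):
        yield path, node


def lint_metadata(column, metadata, dims):
    """Return one message per field whose template needs variables missing from dims."""
    problems = []
    for path, text in iter_strings(metadata):
        missing = meta.find_undeclared_variables(env.parse(text)) - set(dims)
        if missing:
            problems.append(f"{column}: {path} references undefined {sorted(missing)}")
    return problems


metadata = {
    "presentation": {
        "title_public": "Number of stunted children",
        "grapher_config": {
            "subtitle": "<%- if age == '0' %><%- if sex == 'all' %>"
            "Children under 1 year old.<%- endif %><%- endif %>"
        },
    }
}

problems = lint_metadata("number_of_stunted_children", metadata, dims=[])
no_problems = lint_metadata("population", metadata, dims=["age", "sex"])
```

Run at garden save time, a check like this would name the column and the exact field, so the error points at the authoring step.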

Happy to prototype (1) behind a feature flag in combine_indicators_metadata if there's appetite.
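
As a rough idea of what the flag could look like: the sketch below is hypothetical throughout. The environment variable name is invented, and combine_presentation is a stand-in for the presentation branch of the combine step, not the existing function.

```python
import os
from copy import deepcopy

FLAG = "OWID_PROPAGATE_PRESENTATION"  # hypothetical environment variable


def propagate_presentation():
    """Current behaviour stays on unless the flag is explicitly set to "0"."""
    return os.environ.get(FLAG, "1") != "0"


def combine_presentation(operands):
    """Hypothetical stand-in for how presentation is combined across operands."""
    if not propagate_presentation():
        return None  # option (1): never inherit presentation across arithmetic
    first = operands[0]
    # Behaviour as described in this issue: keep only if all operands are identical.
    return deepcopy(first) if all(op == first for op in operands) else None
```

Defaulting to the current behaviour would let individual steps opt in first, before the default is flipped.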
