Commit 5a83374

docs: topic-based filtering metadata for publications (#1310) (#1314)
* docs: topic-based filtering of publications (#1310)

  The publications pages used to render a single flat bibliography with no way to navigate by subject and no stable anchors for downstream sites to deep-link into.

  Each SUEWS and community bib entry now carries a topic slug from an 8-label controlled vocabulary derived from the corpus (energy-balance, water-balance, storage-heat, radiation, anthropogenic-heat, carbon-flux, building-energy, model-infrastructure). The pages render per-topic subsections with stable .. _pub-<slug>: anchors and keep an "All publications" flat fallback. Every entry also carries an abstract so the corpus is browsable without re-fetching.

  A new /audit-refs project skill enforces the convention without network access and can optionally enrich abstracts via WoS / Crossref for future additions, with a --crossref-only fallback for collaborators without a WoS key. The rule at .claude/rules/docs/bib-topic-tags.md documents the vocabulary and the multi-topic policy.

  Also corrects year/month on two bib entries to match authoritative Crossref and WoS print-publication metadata:
  - A11 (Allen et al., joc.2210): 2010 sep -> 2011 nov
  - I11 (Iamarino et al., joc.2390): 2011 jul -> 2012 sep

* chore: rename audit-refs skill to curate-refs

  Avoids naming collision with the existing audit-pr skill, and better captures the scope of the tool: enforce topic-tag convention and backfill missing metadata via WoS/Crossref.

* docs: narrow GH#1310 scope to metadata only

  Revert the publications-page layout changes so this PR carries only the bib metadata work (topic slugs, abstracts, print-year corrections) plus the curate-refs skill and the convention rule. Per-topic RST subsections and stable pub-<slug> anchors move to a follow-up PR once the landing site is ready to consume them.

  Restores:
  - docs/source/related_publications.rst (flat :all: bibliography)
  - docs/source/community_publications.rst (original submission note)
  - docs/source/workflow.rst (raw Recent_publications link, unchanged)
1 parent bf09d71 commit 5a83374

6 files changed

+807
-15
lines changed

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Bibliography topic tags

Rules for the `keywords` field on entries in `docs/source/assets/refs/refs-SUEWS.bib` and `docs/source/assets/refs/refs-community.bib`.

These bib files drive `docs/source/related_publications.rst` and `docs/source/community_publications.rst`, which render per-topic subsections with stable `.. _pub-<slug>:` anchors for external deep-linking. The curator at `.claude/skills/curate-refs/` enforces this convention and backfills missing metadata.

---

## Every entry MUST carry at least one topic slug

```bibtex
@article{KEY,
  ...
  keywords = {energy-balance, water-balance},
}
```

Slug format:

- Lowercase only.
- Hyphen-separated (`energy-balance`, not `energy_balance` or `EnergyBalance`).
- No spaces, no uppercase.
- Drawn from the controlled vocabulary below. Do not invent new slugs without updating the vocabulary.
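
The format rules above can be checked mechanically. The following is a minimal sketch, assuming the same pattern as `SLUG_RE` in the audit script added by this commit; the helper name `is_valid_slug` is hypothetical and the check covers format only, not vocabulary membership.

```python
import re

# Same pattern as SLUG_RE in audit.py: a lowercase letter first, then
# lowercase alphanumeric segments joined by single hyphens.
SLUG_RE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")


def is_valid_slug(slug: str) -> bool:
    """Return True when `slug` satisfies the format rules (hypothetical helper)."""
    return SLUG_RE.match(slug) is not None


# The three spellings quoted in the list above.
for candidate in ("energy-balance", "energy_balance", "EnergyBalance"):
    print(candidate, is_valid_slug(candidate))
```

Only the first spelling passes; the underscore and mixed-case variants are rejected by the pattern.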

## Controlled vocabulary

- `energy-balance` — surface energy balance partitioning, flux schemes (Q*, QE, QH).
- `water-balance` — urban hydrology: evapotranspiration, snow, irrigation, runoff, densification impacts.
- `storage-heat` — delta-QS parameterisation lineage (OHM, AnOHM).
- `radiation` — net all-wave radiation (NARP), SOLWEIG, mean radiant temperature, aerosol radiative effects.
- `anthropogenic-heat` — QF modelling (LUCY, GreaterQF), building/traffic/metabolism emissions.
- `carbon-flux` — urban CO₂ exchange, biogenic vs anthropogenic sources, tree sequestration.
- `building-energy` — urban meteorology for building energy simulations (vertical profiles, uTMY).
- `model-infrastructure` — SUEWS and SuPy code, coupling with atmospheric models (WRF, CBL), reanalysis forcing workflows.

Keep this list in sync with the header comment of both bib files and the `VOCAB` set in `.claude/skills/curate-refs/scripts/audit.py`.
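
Drift between the documented list and the code can be detected with a small comparison. This is a sketch under stated assumptions: the helper names `slugs_in_rule` and `vocabulary_drift` are hypothetical, the sample text is an illustrative excerpt rather than the real rule file, and a real check would probably restrict the scan to the vocabulary section.

```python
import re

# Matches a bullet that opens with a backticked slug, e.g. "- `radiation` ...",
# and captures the slug. Bullets that start with plain prose do not match.
BULLET_RE = re.compile(r"^- `([a-z0-9-]+)`", re.MULTILINE)


def slugs_in_rule(markdown: str) -> set[str]:
    """Collect slugs that open a bullet line (hypothetical helper)."""
    return set(BULLET_RE.findall(markdown))


def vocabulary_drift(markdown: str, vocab: set[str]) -> tuple[set[str], set[str]]:
    """Return (slugs only in the rule text, slugs only in the code set)."""
    documented = slugs_in_rule(markdown)
    return documented - vocab, vocab - documented


# Illustrative excerpt and code set, deliberately out of sync.
sample = "- `radiation` (NARP, SOLWEIG)\n- `carbon-flux` (urban CO2 exchange)\n"
only_in_rule, only_in_code = vocabulary_drift(sample, {"radiation", "storage-heat"})
print(sorted(only_in_rule), sorted(only_in_code))
```

Both returned sets are empty when the two lists agree.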

## Multi-topic policy

Papers can (and should) carry multiple slugs when they substantively contribute across themes. They appear in every relevant topic section on the docs pages; the "All publications" section de-duplicates. Target average is ~1.5–2 tags per paper; don't stretch to include themes the paper only mentions in passing.
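
The tags-per-paper target can be tracked with a short calculation. A minimal sketch: `parse_slugs` mirrors the function of the same name in the audit script, `mean_tags_per_entry` is a hypothetical helper, and the keyword values are made up for illustration rather than taken from the bib files.

```python
def parse_slugs(raw: str) -> list[str]:
    """Split a comma-separated `keywords` value into trimmed slugs."""
    return [s.strip() for s in raw.split(",") if s.strip()]


def mean_tags_per_entry(keyword_fields: list[str]) -> float:
    """Average number of slugs per entry (hypothetical helper)."""
    if not keyword_fields:
        return 0.0
    total = sum(len(parse_slugs(raw)) for raw in keyword_fields)
    return total / len(keyword_fields)


# Made-up keyword values: two entries with two slugs, one with a single slug.
fields = ["energy-balance, water-balance", "radiation", "storage-heat, energy-balance"]
print(round(mean_tags_per_entry(fields), 2))  # 5 slugs over 3 entries
```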

## Expanding the vocabulary

When adding a new slug:

1. Update the vocabulary list above.
2. Update the header comment of both bib files.
3. Update `VOCAB` in `.claude/skills/curate-refs/scripts/audit.py`.
4. Add a new topic section in `docs/source/related_publications.rst` with a `.. _pub-<slug>:` anchor and filtered bibliography directive.
5. Rerun `/curate-refs` to confirm all entries still pass.

## Programmatic enforcement

Run before committing any bib change:

```
/curate-refs
```

The skill documentation at `.claude/skills/curate-refs/SKILL.md` covers:

- Base audit (no network, no API key required) — convention check only.
- `/curate-refs --enrich` — optionally fetch missing `abstract` fields via WoS/Crossref cascade (requires `WOS_EXPANDED_API_KEY` or `WOS_API_KEY`; `--crossref-only` fallback for collaborators without a WoS key).

The existing user-level `refs-checker` skill handles DOI-to-metadata verification against Crossref/WoS — complementary and different purpose.
Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
---
name: curate-refs
description: Curate SUEWS bib files — enforce topic-tag convention and backfill missing metadata. Base mode checks every entry for a valid topic slug from the controlled vocabulary; optional --enrich mode fetches missing abstracts via WoS/Crossref. Use before committing any change to docs/source/assets/refs/refs-SUEWS.bib or refs-community.bib, or whenever asked to "curate refs", "check bib tags", "verify topic slugs". Complementary to the user-level refs-checker skill which verifies DOI-to-metadata correctness.
---

# curate-refs

Reference-library curator for SUEWS publication bib files. Enforces the topic-tag convention (every entry carries a valid slug from the controlled vocabulary defined in `.claude/rules/docs/bib-topic-tags.md`) and backfills missing metadata. Pairs with `docs/source/related_publications.rst` which renders per-topic subsections via `sphinxcontrib-bibtex`'s `:filter:` directive on the `keywords` field.

## When to run

- Before committing any change to `refs-SUEWS.bib` or `refs-community.bib`.
- After adding a new bib entry or expanding the vocabulary.
- Whenever the user asks to "curate refs", "check bib tags", "verify topic slugs".

## Scope

- This skill checks **convention compliance**: every entry has a `keywords` field, every slug is in the approved vocabulary, slug format is lowercase-hyphen, no duplicate citation keys, required fields present, abstracts populated (informational, not a failure).
- It does **not** verify DOI-to-paper correctness — use the user-level `refs-checker` skill for that (`/Users/tingsun/.claude/scripts/bib_audit.py`).

## Base invocation (no network, no API key)

```bash
uv run --no-project --with requests python .claude/skills/curate-refs/scripts/audit.py \
  docs/source/assets/refs/refs-SUEWS.bib \
  docs/source/assets/refs/refs-community.bib
```

Exit code is non-zero if any convention violation is found. Missing abstracts are reported as warnings only.
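
Because the audit signals failure through its exit code, a wrapper such as a Git hook can gate on it. The following is a sketch only: `run_audit` and `gate` are hypothetical helpers, the command is assumed to be run from the repository root, and the example calls the script with the current interpreter rather than through `uv run`.

```python
import subprocess
import sys

# Paths as they appear in this commit.
AUDIT = ".claude/skills/curate-refs/scripts/audit.py"
BIB_FILES = [
    "docs/source/assets/refs/refs-SUEWS.bib",
    "docs/source/assets/refs/refs-community.bib",
]


def run_audit(command: list[str]) -> bool:
    """Run `command` and report whether it exited with status 0 (hypothetical helper)."""
    return subprocess.run(command, check=False).returncode == 0


def gate(command: list[str]) -> int:
    """Translate the audit result into an exit status for a calling hook."""
    return 0 if run_audit(command) else 1


# In a hook script one might end with:
#     sys.exit(gate([sys.executable, AUDIT, *BIB_FILES]))
```

A non-zero status from `gate` would stop the commit when the script is installed as a pre-commit hook; warnings about missing abstracts do not affect the status.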

## Enrichment (WoS/Crossref)

Populate missing `abstract` fields in place. Idempotent — entries already carrying a non-empty abstract are skipped.

```bash
# With Ting's WoS key (set WOS_EXPANDED_API_KEY or WOS_API_KEY in env)
uv run --no-project --with requests python .claude/skills/curate-refs/scripts/enrich.py \
  docs/source/assets/refs/refs-SUEWS.bib \
  docs/source/assets/refs/refs-community.bib

# Without a WoS key (for collaborators)
uv run --no-project --with requests python .claude/skills/curate-refs/scripts/enrich.py \
  docs/source/assets/refs/refs-SUEWS.bib \
  docs/source/assets/refs/refs-community.bib \
  --crossref-only
```

Cascade: WoS Expanded → WoS Starter → Crossref → OpenAlex. Flags:

- `--crossref-only` — skip WoS (for collaborators without an API key).
- `--dry-run` — report sources without modifying files.
- `--delay SECONDS` — pause between API calls (default 0.3).

If no key is set and `--crossref-only` is absent, the script prints a one-line warning and still runs using Crossref + OpenAlex.
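
The cascade and the skip-if-present rule can be pictured as an ordered fallback. This is an illustrative sketch, not the contents of `enrich.py` (which is not shown in this commit view): `first_abstract` and `needs_enrichment` are hypothetical names, the fetchers are stubs with no network access, and the DOI is a placeholder.

```python
from typing import Callable, Optional

# A fetcher takes a DOI and returns an abstract, or None/"" when it has none.
Fetcher = Callable[[str], Optional[str]]


def needs_enrichment(existing: Optional[str]) -> bool:
    """Entries with a non-empty abstract are skipped, so reruns change nothing."""
    return existing is None or not existing.strip()


def first_abstract(
    doi: str, sources: list[tuple[str, Fetcher]]
) -> tuple[Optional[str], Optional[str]]:
    """Try each source in order; return (source name, abstract) for the first hit."""
    for name, fetch in sources:
        abstract = fetch(doi)
        if abstract and abstract.strip():
            return name, abstract.strip()
    return None, None


# Stub fetchers standing in for real API clients, in cascade order.
sources = [
    ("wos-expanded", lambda doi: None),
    ("wos-starter", lambda doi: ""),
    ("crossref", lambda doi: "  An example abstract.  "),
    ("openalex", lambda doi: "never reached"),
]
print(first_abstract("10.0000/example", sources))
```

Under this model, a Crossref-only run would correspond to passing the list without its two WoS entries.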

## Typical workflow

1. Add a new bib entry (with `keywords` populated per the vocabulary).
2. Run the base audit to catch slug typos or missing fields.
3. If the new entry lacks an abstract, run the enrichment pass.
4. Commit the bib file with the populated abstract and keyword slug.

## Controlled vocabulary

Source of truth: `.claude/rules/docs/bib-topic-tags.md`. Kept in sync with the header comment of both bib files and the `VOCAB` set in `scripts/audit.py`. Expanding the vocabulary is a four-file edit documented in the rule.

## Complementary skills

- `refs-checker` (user-level): verifies DOI-to-paper metadata via WoS/Crossref. Catches the "wrong DOI points to a plausible-sounding paper" failure mode that convention audit can't see.
- `sync-docs` (project): checks doc-code content consistency.
- `lint-code` (project): checks code style.

Run `refs-checker` for citation correctness, `curate-refs` for topic-tag convention and metadata backfill, `sync-docs` for doc-code consistency.
Lines changed: 185 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,185 @@
#!/usr/bin/env python3
"""Audit SUEWS bib files for topic-tag convention compliance.

No network, no API key required. Every entry must carry a non-empty
`keywords` field whose values are drawn from the controlled vocabulary
defined below. Slugs must be lowercase, hyphen-separated, no spaces,
no uppercase.

Usage:
    uv run --no-project --with requests python audit.py <bib-file> [<bib-file>...]

Exit codes:
    0  all convention checks passed (abstract warnings may appear)
    1  one or more convention violations found
"""
from __future__ import annotations

import argparse
import re
import sys
from pathlib import Path

VOCAB: set[str] = {
    "energy-balance",
    "water-balance",
    "storage-heat",
    "radiation",
    "anthropogenic-heat",
    "carbon-flux",
    "building-energy",
    "model-infrastructure",
}

SLUG_RE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")
ENTRY_START = re.compile(r"^@[A-Za-z]+\{([^,\s]+)\s*,", re.MULTILINE)

REQUIRED_FIELDS = ("title", "author", "year", "doi")


def _line_number(text: str, offset: int) -> int:
    return text.count("\n", 0, offset) + 1


def _match_field(body: str, name: str) -> tuple[int, int, str] | None:
    pattern = re.compile(rf"(^|\n)\s*{name}\s*=\s*\{{", re.IGNORECASE)
    m = pattern.search(body)
    if not m:
        return None
    open_brace = m.end() - 1
    depth = 1
    i = open_brace + 1
    while i < len(body) and depth > 0:
        c = body[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        i += 1
    if depth != 0:
        return None
    return m.start() + len(m.group(1)), i - 1, body[open_brace + 1:i - 1]


def extract_field(body: str, name: str) -> str | None:
    match = _match_field(body, name)
    return match[2] if match else None


def parse_slugs(raw: str) -> list[str]:
    return [s.strip() for s in raw.split(",") if s.strip()]


def audit_entry(entry: dict, file_path: str, all_keys: dict[str, str],
                violations: list[str], warnings: list[str]) -> None:
    key = entry["key"]
    body = entry["body"]
    line = entry["line"]
    prefix = f"{file_path}:{line} [{key}]"

    # Duplicate citation key check (across all files)
    if key in all_keys and all_keys[key] != f"{file_path}:{line}":
        violations.append(f"{prefix}: duplicate citation key (also at {all_keys[key]})")
    all_keys[key] = f"{file_path}:{line}"

    # keywords field
    keywords_raw = extract_field(body, "keywords")
    if keywords_raw is None:
        violations.append(f"{prefix}: missing `keywords` field")
    else:
        slugs = parse_slugs(keywords_raw)
        if not slugs:
            violations.append(f"{prefix}: `keywords` field is empty")
        for slug in slugs:
            if not SLUG_RE.match(slug):
                violations.append(
                    f"{prefix}: invalid slug format `{slug}` "
                    "(lowercase, hyphen-separated, no spaces)"
                )
            elif slug not in VOCAB:
                violations.append(
                    f"{prefix}: slug `{slug}` not in controlled vocabulary "
                    f"(allowed: {', '.join(sorted(VOCAB))})"
                )

    # Required fields
    for field in REQUIRED_FIELDS:
        val = extract_field(body, field)
        if val is None or not val.strip():
            violations.append(f"{prefix}: missing or empty `{field}`")

    # Abstract (warning only — collaborators without WoS access can still pass)
    abstract = extract_field(body, "abstract")
    if abstract is None or not abstract.strip():
        warnings.append(f"{prefix}: missing `abstract` (run `/curate-refs --enrich` if you have WoS/Crossref access)")


def find_entries(text: str) -> list[dict]:
    starts = list(ENTRY_START.finditer(text))
    entries = []
    for i, m in enumerate(starts):
        start = m.start()
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        entries.append({
            "key": m.group(1),
            "start": start,
            "end": end,
            "body": text[start:end],
            "line": _line_number(text, start),
        })
    return entries


def audit_file(path: Path, all_keys: dict[str, str]) -> tuple[int, list[str], list[str]]:
    text = path.read_text(encoding="utf-8")
    entries = find_entries(text)
    violations: list[str] = []
    warnings: list[str] = []
    for entry in entries:
        audit_entry(entry, str(path), all_keys, violations, warnings)
    return len(entries), violations, warnings


def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__.splitlines()[0] if __doc__ else "")
    ap.add_argument("paths", nargs="+", help="Bib files to audit")
    ap.add_argument("--quiet", action="store_true",
                    help="Suppress per-file summary (only show violations/warnings/total)")
    args = ap.parse_args()

    all_keys: dict[str, str] = {}
    all_violations: list[str] = []
    all_warnings: list[str] = []
    total_entries = 0

    for p in args.paths:
        path = Path(p)
        if not path.exists():
            print(f"[error] {p} not found", file=sys.stderr)
            return 1
        n, violations, warnings = audit_file(path, all_keys)
        total_entries += n
        all_violations.extend(violations)
        all_warnings.extend(warnings)
        if not args.quiet:
            print(f" {p}: {n} entries, {len(violations)} violations, {len(warnings)} warnings")

    if all_warnings:
        print("\n=== warnings ===")
        for w in all_warnings:
            print(f" {w}")

    if all_violations:
        print("\n=== violations ===")
        for v in all_violations:
            print(f" {v}")
        print(f"\n[FAIL] {len(all_violations)} violation(s) across {total_entries} entries")
        return 1

    print(f"\n[OK] {total_entries} entries pass convention audit"
          f" ({len(all_warnings)} warning(s))")
    return 0


if __name__ == "__main__":
    sys.exit(main())
