UMEP-dev
diff --git a/.claude/rules/docs/bib-topic-tags.md b/.claude/rules/docs/bib-topic-tags.md
Lines changed: 65 additions & 0 deletions
diff --git a/.claude/skills/curate-refs/SKILL.md b/.claude/skills/curate-refs/SKILL.md
Lines changed: 73 additions & 0 deletions
diff --git a/.claude/skills/curate-refs/scripts/audit.py b/.claude/skills/curate-refs/scripts/audit.py
Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,65 @@
+# Bibliography topic tags
+
+Rules for the `keywords` field on entries in `docs/source/assets/refs/refs-SUEWS.bib` and `docs/source/assets/refs/refs-community.bib`.
+
+These bib files drive `docs/source/related_publications.rst` and `docs/source/community_publications.rst`, which render per-topic subsections with stable `.. _pub-<slug>:` anchors for external deep-linking. The curator at `.claude/skills/curate-refs/` enforces this convention and backfills missing metadata.
+
+---
+
+## Every entry MUST carry at least one topic slug
+
+```bibtex
+@article{KEY,
+   ...
+   keywords = {energy-balance, water-balance},
+}
+```
+
+Slug format:
+
+- Lowercase letters and digits only, starting with a letter.
+- Hyphen-separated (`energy-balance`, not `energy_balance` or `EnergyBalance`).
+- No spaces or underscores.
+- Drawn from the controlled vocabulary below. Do not invent new slugs without updating the vocabulary.
+
+## Controlled vocabulary
+
+- `energy-balance` — surface energy balance partitioning, flux schemes (Q*, QE, QH).
+- `water-balance` — urban hydrology: evapotranspiration, snow, irrigation, runoff, densification impacts.
+- `storage-heat` — delta-QS parameterisation lineage (OHM, AnOHM).
+- `radiation` — net all-wave radiation (NARP), SOLWEIG, mean radiant temperature, aerosol radiative effects.
+- `anthropogenic-heat` — QF modelling (LUCY, GreaterQF), building/traffic/metabolism emissions.
+- `carbon-flux` — urban CO₂ exchange, biogenic vs anthropogenic sources, tree sequestration.
+- `building-energy` — urban meteorology for building energy simulations (vertical profiles, uTMY).
+- `model-infrastructure` — SUEWS and SuPy code, coupling with atmospheric models (WRF, CBL), reanalysis forcing workflows.
+
+Keep this list in sync with the header comment of both bib files and the `VOCAB` set in `.claude/skills/curate-refs/scripts/audit.py`.
+
+## Multi-topic policy
+
+Papers can (and should) carry multiple slugs when they substantively contribute across themes. They appear in every relevant topic section on the docs pages; the "All publications" section de-duplicates. Target average is ~1.5–2 tags per paper; don't stretch to include themes the paper only mentions in passing.
+
+## Expanding the vocabulary
+
+When adding a new slug:
+
+1. Update the vocabulary list above.
+2. Update the header comment of both bib files.
+3. Update `VOCAB` in `.claude/skills/curate-refs/scripts/audit.py`.
+4. Add a new topic section in `docs/source/related_publications.rst` with a `.. _pub-<slug>:` anchor and filtered bibliography directive.
+5. Rerun `/curate-refs` to confirm all entries still pass.
+
+## Programmatic enforcement
+
+Run before committing any bib change:
+
+```
+/curate-refs
+```
+
+The skill documentation at `.claude/skills/curate-refs/SKILL.md` covers:
+
+- Base audit (no network, no API key required): convention check only.
+- `/curate-refs --enrich`: optionally fetch missing `abstract` fields via the WoS/Crossref/OpenAlex cascade (uses `WOS_EXPANDED_API_KEY` or `WOS_API_KEY` when set; `--crossref-only` skips WoS for collaborators without a key).
+
+The existing user-level `refs-checker` skill verifies DOI-to-metadata correctness against Crossref/WoS; it is complementary and serves a different purpose.
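The multi-topic policy in `bib-topic-tags.md` sets a target of roughly 1.5 to 2 tags per paper. A minimal sketch of how that average could be measured is given here. `tag_stats`, `KEYWORDS_RE` and `SAMPLE` are hypothetical names that are not part of the curator, and the regex assumes flat `keywords = {a, b}` values with no nested braces.

```python
import re
from collections import Counter

# Assumption: keywords values are flat, e.g. `keywords = {a, b}` (no nested braces).
KEYWORDS_RE = re.compile(
    r"^\s*keywords\s*=\s*\{([^{}]*)\}", re.IGNORECASE | re.MULTILINE
)


def tag_stats(bib_text: str) -> tuple[float, Counter]:
    """Return (mean tags per tagged entry, per-slug counts) for a bib string."""
    per_entry = [
        [s.strip() for s in m.group(1).split(",") if s.strip()]
        for m in KEYWORDS_RE.finditer(bib_text)
    ]
    counts = Counter(slug for slugs in per_entry for slug in slugs)
    mean = sum(len(s) for s in per_entry) / len(per_entry) if per_entry else 0.0
    return mean, counts


# Illustrative input only; the keys and titles are made up.
SAMPLE = """
@article{A2020,
   title = {First},
   keywords = {energy-balance, water-balance},
}
@article{B2021,
   title = {Second},
   keywords = {energy-balance},
}
"""

mean, counts = tag_stats(SAMPLE)
print(f"mean tags per entry: {mean:.2f}")
print(counts.most_common())
```

Entries without a `keywords` field are not counted in the mean; the base audit already reports those as violations.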
@@ -0,0 +1,73 @@
+---
+name: curate-refs
+description: Curate SUEWS bib files — enforce topic-tag convention and backfill missing metadata. Base mode checks every entry for a valid topic slug from the controlled vocabulary; optional --enrich mode fetches missing abstracts via WoS/Crossref. Use before committing any change to docs/source/assets/refs/refs-SUEWS.bib or refs-community.bib, or whenever asked to "curate refs", "check bib tags", "verify topic slugs". Complementary to the user-level refs-checker skill which verifies DOI-to-metadata correctness.
+---
+
+# curate-refs
+
+Reference-library curator for SUEWS publication bib files. Enforces the topic-tag convention (every entry carries a valid slug from the controlled vocabulary defined in `.claude/rules/docs/bib-topic-tags.md`) and backfills missing metadata. Pairs with `docs/source/related_publications.rst`, which renders per-topic subsections by applying the `:filter:` option of `sphinxcontrib-bibtex`'s `bibliography` directive to the `keywords` field.
+
+## When to run
+
+- Before committing any change to `refs-SUEWS.bib` or `refs-community.bib`.
+- After adding a new bib entry or expanding the vocabulary.
+- Whenever the user asks to "curate refs", "check bib tags", "verify topic slugs".
+
+## Scope
+
+- This skill checks **convention compliance**: every entry has a `keywords` field, every slug is in the approved vocabulary and uses the lowercase-hyphen format, citation keys are unique across files, and the required fields (`title`, `author`, `year`, `doi`) are present. Missing abstracts are reported as warnings, not failures.
+- It does **not** verify DOI-to-paper correctness; use the user-level `refs-checker` skill for that (`~/.claude/scripts/bib_audit.py`).
+
+## Base invocation (no network, no API key)
+
+```bash
+uv run --no-project --with requests python .claude/skills/curate-refs/scripts/audit.py \
+    docs/source/assets/refs/refs-SUEWS.bib \
+    docs/source/assets/refs/refs-community.bib
+```
+
+Exit code is non-zero if any convention violation is found. Missing abstracts are reported as warnings only.
+
+## Enrichment (WoS/Crossref)
+
+Populate missing `abstract` fields in place. Idempotent — entries already carrying a non-empty abstract are skipped.
+
+```bash
+# With a WoS key (set WOS_EXPANDED_API_KEY or WOS_API_KEY in the environment)
+uv run --no-project --with requests python .claude/skills/curate-refs/scripts/enrich.py \
+    docs/source/assets/refs/refs-SUEWS.bib \
+    docs/source/assets/refs/refs-community.bib
+
+# Without a WoS key (for collaborators)
+uv run --no-project --with requests python .claude/skills/curate-refs/scripts/enrich.py \
+    docs/source/assets/refs/refs-SUEWS.bib \
+    docs/source/assets/refs/refs-community.bib \
+    --crossref-only
+```
+
+Cascade: WoS Expanded → WoS Starter → Crossref → OpenAlex. Flags:
+
+- `--crossref-only` — skip WoS (for collaborators without an API key).
+- `--dry-run` — report sources without modifying files.
+- `--delay SECONDS` — pause between API calls (default 0.3).
+
+If no key is set and `--crossref-only` is absent, the script prints a one-line warning and still runs using Crossref + OpenAlex.
+
+## Typical workflow
+
+1. Add a new bib entry (with `keywords` populated per the vocabulary).
+2. Run the base audit to catch slug typos or missing fields.
+3. If the new entry lacks an abstract, run the enrichment pass.
+4. Commit the bib file with the populated abstract and keyword slug.
+
+## Controlled vocabulary
+
+Source of truth: `.claude/rules/docs/bib-topic-tags.md`. Kept in sync with the header comment of both bib files and the `VOCAB` set in `scripts/audit.py`. Expanding the vocabulary touches the rule file, both bib files, `scripts/audit.py`, and `docs/source/related_publications.rst`; the steps are documented in the rule.
+
+## Complementary skills
+
+- `refs-checker` (user-level): verifies DOI-to-paper metadata via WoS/Crossref. Catches the "wrong DOI points to a plausible-sounding paper" failure mode that the convention audit cannot see.
+- `sync-docs` (project): checks doc-code content consistency.
+- `lint-code` (project): checks code style.
+
+Run `refs-checker` for citation correctness, `curate-refs` for topic-tag convention and metadata backfill, `sync-docs` for doc-code consistency.
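`SKILL.md` describes the enrichment cascade (WoS Expanded → WoS Starter → Crossref → OpenAlex), but `enrich.py` is not part of this diff. The sketch below shows one way such a first-hit-wins cascade could be structured; it is not the script's implementation. `fetch_abstract`, `Provider`, `STUBS`, the `"wos"` name prefix and the placeholder DOI are all assumptions made for illustration, and the stub providers make no network calls.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Sequence

# A provider maps a DOI to abstract text, or None when it has nothing.
Provider = Callable[[str], Optional[str]]


def fetch_abstract(
    doi: str,
    providers: Sequence[tuple[str, Provider]],
    *,
    crossref_only: bool = False,
    delay: float = 0.3,
) -> Optional[tuple[str, str]]:
    """Return (source name, abstract) from the first provider that has one."""
    for name, lookup in providers:
        # Assumption: WoS providers are identified by a "wos" name prefix.
        if crossref_only and name.startswith("wos"):
            continue
        abstract = lookup(doi)
        if abstract and abstract.strip():
            return name, abstract.strip()
        time.sleep(delay)  # pause before trying the next source
    return None


# Stub providers standing in for real API clients (hypothetical, offline).
STUBS: list[tuple[str, Provider]] = [
    ("wos-expanded", lambda doi: None),
    ("wos-starter", lambda doi: ""),
    ("crossref", lambda doi: "An example abstract."),
    ("openalex", lambda doi: "Unused fallback."),
]

print(fetch_abstract("10.0000/example", STUBS, delay=0))
```

Passing the providers as an ordered list keeps the fallback order explicit, and returning the source name alongside the text is what a `--dry-run` style report would need.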
@@ -0,0 +1,185 @@
+#!/usr/bin/env python3
+"""Audit SUEWS bib files for topic-tag convention compliance.
+
+No network, no API key required. Every entry must carry a non-empty
+`keywords` field whose values are drawn from the controlled vocabulary
+defined below. Slugs must be lowercase, hyphen-separated, no spaces,
+no uppercase.
+
+Usage:
+    uv run --no-project --with requests python audit.py <bib-file> [<bib-file>...]
+
+Exit codes:
+    0   all convention checks passed (abstract warnings may appear)
+    1   one or more convention violations found
+"""
+from __future__ import annotations
+
+import argparse
+import re
+import sys
+from pathlib import Path
+
+VOCAB: set[str] = {
+    "energy-balance",
+    "water-balance",
+    "storage-heat",
+    "radiation",
+    "anthropogenic-heat",
+    "carbon-flux",
+    "building-energy",
+    "model-infrastructure",
+}
+
+SLUG_RE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")
+ENTRY_START = re.compile(r"^@[A-Za-z]+\{([^,\s]+)\s*,", re.MULTILINE)
+
+REQUIRED_FIELDS = ("title", "author", "year", "doi")
+
+
+def _line_number(text: str, offset: int) -> int:
+    return text.count("\n", 0, offset) + 1
+
+
+def _match_field(body: str, name: str) -> tuple[int, int, str] | None:
+    pattern = re.compile(rf"(^|\n)\s*{name}\s*=\s*\{{", re.IGNORECASE)  # brace-delimited values only
+    m = pattern.search(body)
+    if not m:
+        return None
+    open_brace = m.end() - 1
+    depth = 1  # nesting depth just inside the opening brace
+    i = open_brace + 1
+    while i < len(body) and depth > 0:  # scan forward to the matching close brace
+        c = body[i]
+        if c == "{":
+            depth += 1
+        elif c == "}":
+            depth -= 1
+        i += 1
+    if depth != 0:  # unbalanced braces: treat the field as absent
+        return None
+    return m.start() + len(m.group(1)), i - 1, body[open_brace + 1:i - 1]  # (field start, close-brace index, value)
+
+
+def extract_field(body: str, name: str) -> str | None:
+    match = _match_field(body, name)
+    return match[2] if match else None
+
+
+def parse_slugs(raw: str) -> list[str]:
+    return [s.strip() for s in raw.split(",") if s.strip()]
+
+
+def audit_entry(entry: dict, file_path: str, all_keys: dict[str, str],
+                violations: list[str], warnings: list[str]) -> None:
+    key = entry["key"]
+    body = entry["body"]
+    line = entry["line"]
+    prefix = f"{file_path}:{line} [{key}]"
+
+    # Duplicate citation key check (across all files)
+    if key in all_keys and all_keys[key] != f"{file_path}:{line}":
+        violations.append(f"{prefix}: duplicate citation key (also at {all_keys[key]})")
+    all_keys[key] = f"{file_path}:{line}"
+
+    # keywords field
+    keywords_raw = extract_field(body, "keywords")
+    if keywords_raw is None:
+        violations.append(f"{prefix}: missing `keywords` field")
+    else:
+        slugs = parse_slugs(keywords_raw)
+        if not slugs:
+            violations.append(f"{prefix}: `keywords` field is empty")
+        for slug in slugs:
+            if not SLUG_RE.match(slug):
+                violations.append(
+                    f"{prefix}: invalid slug format `{slug}` "
+                    "(lowercase, hyphen-separated, no spaces)"
+                )
+            elif slug not in VOCAB:
+                violations.append(
+                    f"{prefix}: slug `{slug}` not in controlled vocabulary "
+                    f"(allowed: {', '.join(sorted(VOCAB))})"
+                )
+
+    # Required fields
+    for field in REQUIRED_FIELDS:
+        val = extract_field(body, field)
+        if val is None or not val.strip():
+            violations.append(f"{prefix}: missing or empty `{field}`")
+
+    # Abstract (warning only — collaborators without WoS access can still pass)
+    abstract = extract_field(body, "abstract")
+    if abstract is None or not abstract.strip():
+        warnings.append(f"{prefix}: missing `abstract` (run `/curate-refs --enrich` if you have WoS/Crossref access)")
+
+
+def find_entries(text: str) -> list[dict]:
+    starts = list(ENTRY_START.finditer(text))
+    entries = []
+    for i, m in enumerate(starts):
+        start = m.start()
+        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
+        entries.append({
+            "key": m.group(1),
+            "start": start,
+            "end": end,
+            "body": text[start:end],
+            "line": _line_number(text, start),
+        })
+    return entries
+
+
+def audit_file(path: Path, all_keys: dict[str, str]) -> tuple[int, list[str], list[str]]:
+    text = path.read_text(encoding="utf-8")
+    entries = find_entries(text)
+    violations: list[str] = []
+    warnings: list[str] = []
+    for entry in entries:
+        audit_entry(entry, str(path), all_keys, violations, warnings)
+    return len(entries), violations, warnings
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(description=__doc__.splitlines()[0] if __doc__ else "")
+    ap.add_argument("paths", nargs="+", help="Bib files to audit")
+    ap.add_argument("--quiet", action="store_true",
+                    help="Suppress per-file summary (only show violations/warnings/total)")
+    args = ap.parse_args()
+
+    all_keys: dict[str, str] = {}
+    all_violations: list[str] = []
+    all_warnings: list[str] = []
+    total_entries = 0
+
+    for p in args.paths:
+        path = Path(p)
+        if not path.exists():
+            print(f"[error] {p} not found", file=sys.stderr)
+            return 1
+        n, violations, warnings = audit_file(path, all_keys)
+        total_entries += n
+        all_violations.extend(violations)
+        all_warnings.extend(warnings)
+        if not args.quiet:
+            print(f"  {p}: {n} entries, {len(violations)} violations, {len(warnings)} warnings")
+
+    if all_warnings:
+        print("\n=== warnings ===")
+        for w in all_warnings:
+            print(f"  {w}")
+
+    if all_violations:
+        print("\n=== violations ===")
+        for v in all_violations:
+            print(f"  {v}")
+        print(f"\n[FAIL] {len(all_violations)} violation(s) across {total_entries} entries")
+        return 1
+
+    print(f"\n[OK] {total_entries} entries pass convention audit"
+          f" ({len(all_warnings)} warning(s))")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
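`VOCAB` in `audit.py` duplicates the list in `bib-topic-tags.md`, and both documents ask for the two to be kept in sync by hand. A drift check could be sketched as follows. `vocab_from_rule`, `vocab_drift` and `RULE_SAMPLE` are hypothetical and not part of this change; the sketch assumes each vocabulary bullet starts with the slug in backticks under a `## Controlled vocabulary` heading, as in the rule file above.

```python
import re

# Matches bullets that begin with a backticked slug, e.g. "- `radiation` ...".
BULLET_RE = re.compile(r"^- `([a-z0-9-]+)`", re.MULTILINE)


def vocab_from_rule(markdown: str) -> set[str]:
    """Collect slugs from bullets under the '## Controlled vocabulary' heading."""
    _, _, rest = markdown.partition("## Controlled vocabulary")
    section = rest.split("\n## ", 1)[0]  # stop at the next level-2 heading
    return set(BULLET_RE.findall(section))


def vocab_drift(markdown: str, vocab: set[str]) -> tuple[set[str], set[str]]:
    """Return (slugs only in the rule file, slugs only in the code set)."""
    documented = vocab_from_rule(markdown)
    return documented - vocab, vocab - documented


# Illustrative rule text; the real file has eight vocabulary bullets.
RULE_SAMPLE = """\
## Slug format

- `not-a-vocab-entry` appears before the vocabulary heading.

## Controlled vocabulary

- `energy-balance`: surface energy balance.
- `water-balance`: urban hydrology.

## Multi-topic policy

- `ignored` sits under a later heading.
"""

only_in_rule, only_in_code = vocab_drift(RULE_SAMPLE, {"energy-balance", "radiation"})
print("only in rule file:", sorted(only_in_rule))
print("only in code:", sorted(only_in_code))
```

Both returned sets are empty when the two lists agree, so a check like this could fail the audit whenever either one is non-empty.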