hawkrobe
diff --git a/CHANGELOG.md
Lines changed: 25 additions & 0 deletions b/CHANGELOG.md
diff --git a/Linglib.lean
Lines changed: 0 additions & 2 deletions b/Linglib.lean
diff --git a/Linglib/Core/Agent/RationalAction.lean
Lines changed: 6 additions & 66 deletions b/Linglib/Core/Agent/RationalAction.lean
diff --git a/Linglib/Core/InformationTheory.lean
Lines changed: 183 additions & 0 deletions b/Linglib/Core/InformationTheory.lean
diff --git a/Linglib/Phenomena/Imprecision/Studies/EgreEtAl2023.lean
Lines changed: 2 additions & 2 deletions b/Linglib/Phenomena/Imprecision/Studies/EgreEtAl2023.lean
@@ -4,6 +4,31 @@ The release clock (`v4.29.1`, ...) tracks Lean/mathlib compatibility and is what
 
 ## [Unreleased]
 
+## [0.230.537] - 2026-04-29
+
+### RSA/Divergence.lean dissolved into Core/InformationTheory.lean
+
+Follow-up to 0.230.536: the slimmed Hellinger-only `Theories/Pragmatics/RSA/Divergence.lean` had no RSA-specific content (Bhattacharyya, Hellinger, and the Bretagnolle–Huber sorry are info-theory primitives), and its `two_hellingerDistSq_le_klFinite` theorem references `Core.klFinite` — which forced `klFinite` itself to migrate too (RationalAction already imported InformationTheory, so going the other way would cycle).
+
+- **Moved KL machinery from `Core/Agent/RationalAction.lean` to `Core/InformationTheory.lean`.** `klFinite`, `kl_eq_sum_klFun`, `kl_nonneg`, `kl_nonneg'`, `klFinite_eq_negEntropy_sub_crossEntropy`, `klFinite_pi_single_eq_neg_log`, `expected_log_eq_neg_klFinite_plus_negEntropy` all now live in `namespace Core.InformationTheory`. RationalAction's Gibbs-variational and entropy sections continue to use them via `open Core.InformationTheory (klFinite kl_nonneg kl_nonneg' kl_eq_sum_klFun)`.
+- **Moved Hellinger family from `Theories/Pragmatics/RSA/Divergence.lean` to `Core/InformationTheory.lean`.** `bhattacharyyaCoeff`, `hellingerDistSq`, `hellingerDist`, `hellingerDistSq_nonneg_of_bc_le_one`, `two_hellingerDistSq_le_klFinite` (Bretagnolle–Huber, sorry+TODO).
+- **Deleted `Theories/Pragmatics/RSA/Divergence.lean`** and its import line in `Linglib.lean`.
+- **Consumer renames.** TTG: `Core.klFinite` → `Core.InformationTheory.klFinite`. EgreEtAl: docstring ref. HerbstrittFranke2019: `import RSA.Divergence` → `import Core.InformationTheory`; `open RSA.Divergence` → `open Core.InformationTheory`.
+- **Architectural payoff.** Single home for finite-distribution divergences alongside entropy / MI / JSD / ΔP. RationalAction.lean is back to being about agents; KL is info theory. Linglib's `Core.InformationTheory` namespace now mirrors mathlib's `Mathlib.InformationTheory` shape (entropy + KL + Hellinger as siblings).
+- **Build.** 5629 jobs green (one fewer than before, since `Divergence.lean` is gone). Only sorry: the Bretagnolle–Huber TODO.
+
+## [0.230.536] - 2026-04-29
+
+### RSA/Divergence.lean mathlib-shape refactor (4-agent audit driven)
+
+- **Consolidated triple KL stipulation.** Three independent finite-KL definitions existed: `RSA.Divergence.klDivergence` (no zero-guard), `Core.klFinite` (with `if p i = 0 then 0` guard, mathlib-routed via `klFun`), `ChannelCapacity.gibbs_inequality` (third KL spelling). Deleted the local `klDivergence`; the single canonical finite-KL is now `Core.klFinite` in `Core/Agent/RationalAction.lean`, which already bridges to mathlib's `InformationTheory.klFun`.
+- **Promoted three RSA-relevant theorems to `Core.klFinite`.** `klFinite_eq_negEntropy_sub_crossEntropy` (cross-entropy decomposition; relaxed hypothesis `∀ i, 0 < q i` → absolute continuity `∀ i, p i ≠ 0 → q i ≠ 0`), `klFinite_pi_single_eq_neg_log` (point-mass KL = negative log = surprisal), `expected_log_eq_neg_klFinite_plus_negEntropy` (Frank-Goodman ↔ Goodman-Stuhlmüller derivation, citing Scontras-Tessler-Franke ProbLang v2 App-02).
+- **`RSA/Divergence.lean` reduced to Hellinger-only (276 → 109 LOC).** Bhattacharyya, `hellingerDistSq`, `hellingerDist`, plus `hellingerDistSq_nonneg_of_bc_le_one` and the new `two_hellingerDistSq_le_klFinite` (Bretagnolle–Huber, **sorry+TODO** with proof sketch). The §5 prose claim "H² ≤ KL" is now a Lean theorem statement, so the Hellinger speaker's permissiveness over the KL speaker becomes a formal corollary once the `sorry` is discharged.
+- **Dropped low-value defs.** `negHellingerDist` (no algebraic content; inlined as `-hellingerDist` at HerbstrittFranke2019:366) and `pointMass` (uses `Pi.single` instead).
+- **Fixed historical attribution error.** Module table no longer lists Frank-Goodman 2012 as a KL user (F&G 2012 is a 1-page Science paper presenting only surprisal; the KL framing is retrospective).
+- **Bibliography.** Added missing `@cite{herbstritt-franke-2019}` (Cognition 186, DOI `10.1016/j.cognition.2018.11.014`).
+- **Consumer updates.** TesslerTenenbaumGoodman2022 + EgreEtAl2023 docstring references switched from `RSA.Divergence.klDivergence` / `kl_eq_neg_crossEntropy_plus_negEntropy` to `Core.klFinite` / `klFinite_eq_negEntropy_sub_crossEntropy`. HerbstrittFranke2019 switched to `-hellingerDist` inline. `lake build`: 5630 jobs green; only sorry added is the new §4 Bretagnolle–Huber TODO.
+
 ## [0.230.535] - 2026-04-29
 
 ### Phenomena/X/Typology.lean dissolution campaign + Complementation audit + substrate sweep
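
The consumer renames listed in the 0.230.537 entry can be illustrated with a minimal sketch of a downstream file after the move. The module path `Linglib.Core.InformationTheory` is inferred from the file location and is an assumption, and the `example` is hypothetical (it is not taken from TTG, EgreEtAl, or HerbstrittFranke2019). It has not been checked against the build.

```lean
-- Assumed module path, inferred from `Linglib/Core/InformationTheory.lean`.
import Linglib.Core.InformationTheory

-- Previously: `open RSA.Divergence` / references to `Core.klFinite`.
open Core.InformationTheory

-- Hypothetical consumer: point-mass KL is surprisal.
-- `Pi.single s 1` plays the role of the dropped `pointMass` definition.
example {ι : Type*} [Fintype ι] [DecidableEq ι]
    (s : ι) (q : ι → ℝ) (hq : q s ≠ 0) :
    klFinite (Pi.single s 1) q = -Real.log (q s) :=
  klFinite_pi_single_eq_neg_log s q hq
```

The only change a consumer should need is the import and the namespace; the lemma names and argument order are the ones shown in the diff.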
 
@@ -21,7 +21,6 @@ import Linglib.Core.UD
 import Linglib.Core.Tree
 import Linglib.Features.Coordination
 import Linglib.Core.Logic.Duality
-import Linglib.Core.Logic.NonBivalence
 import Linglib.Core.Logic.Quantification
 import Linglib.Core.Logic.Quantification.Defs
 import Linglib.Core.Logic.Quantification.Generators
@@ -306,7 +305,6 @@ import Linglib.Core.Probability.DirichletMultinomial
 import Linglib.Core.Probability.PolyaUrn
 import Linglib.Core.Probability.PitmanYor
 import Linglib.Core.Probability.Gaussian
-import Linglib.Theories.Pragmatics.RSA.Divergence
 import Linglib.Theories.Pragmatics.InformationTheory.Channel
 import Linglib.Theories.Pragmatics.InformationTheory.ChannelCapacity
 import Linglib.Theories.Pragmatics.AsymmetricCommunication
 
@@ -846,16 +846,15 @@ theorem RationalAction.fromSoftmax_policy_eq [Nonempty A]
   have hne : ∑ j : A, exp (α * utility s j) ≠ 0 := ne_of_gt hpos
   simp only [hne, ↓reduceIte]
 
--- ============================================================================
--- §3. KL Divergence and Gibbs Variational Principle
--- ============================================================================
-
 /-!
-## KL Divergence and the Gibbs Variational Principle
+## Gibbs Variational Principle
 
 The softmax distribution uniquely maximizes entropy + expected score
 on the probability simplex. This is the mathematical foundation for
-RSA convergence (@cite{zaslavsky-hu-levy-2020}, Proposition 1).
+RSA convergence (@cite{zaslavsky-hu-levy-2020}, Proposition 1). The KL
+machinery used in the proof — `klFinite`, `kl_nonneg`, `kl_nonneg'`,
+`kl_eq_sum_klFun`, and the cross-entropy decomposition — lives in
+`Core.InformationTheory` and is opened below.
 
 ### Proof strategy
 
@@ -869,66 +868,7 @@ Combining: H(p) + α⟨p,s⟩ + KL = log Z = H(q) + α⟨q,s⟩, so KL ≥ 0 ⟹
 
 -/
 
-section KLDivergence
-
-variable {ι : Type*} [Fintype ι]
-
-/-- Finite KL divergence: KL(p ‖ q) = Σ pᵢ · log(pᵢ / qᵢ).
-    Convention: 0 · log(0/q) = 0. -/
-noncomputable def klFinite (p q : ι → ℝ) : ℝ :=
-  ∑ i, if p i = 0 then 0 else p i * Real.log (p i / q i)
-
-/-- When q > 0, each KL term can be written via klFun:
-    p · log(p/q) = q · klFun(p/q) + (p - q). -/
-private theorem kl_term_eq_klFun {p_i q_i : ℝ} (hq : 0 < q_i) (_hp : 0 ≤ p_i) :
-    (if p_i = 0 then (0 : ℝ) else p_i * log (p_i / q_i)) =
-    q_i * InformationTheory.klFun (p_i / q_i) + (p_i - q_i) := by
-  by_cases hp0 : p_i = 0
-  · simp only [hp0, ↓reduceIte, zero_div, InformationTheory.klFun_zero, mul_one, zero_sub,
-               add_neg_cancel]
-  · simp only [hp0, ↓reduceIte]
-    unfold InformationTheory.klFun
-    have hq_ne : q_i ≠ 0 := ne_of_gt hq
-    field_simp
-    ring
-
-/-- Finite KL divergence equals Σ qᵢ · klFun(pᵢ/qᵢ) when Σpᵢ = Σqᵢ. -/
-theorem kl_eq_sum_klFun (p q : ι → ℝ) (hq : ∀ i, 0 < q i) (hp : ∀ i, 0 ≤ p i)
-    (hsum : ∑ i, p i = ∑ i, q i) :
-    klFinite p q = ∑ i, q i * InformationTheory.klFun (p i / q i) := by
-  unfold klFinite
-  have hterm : ∀ i, (if p i = 0 then (0 : ℝ) else p i * log (p i / q i)) =
-      q i * InformationTheory.klFun (p i / q i) + (p i - q i) :=
-    λ i => kl_term_eq_klFun (hq i) (hp i)
-  simp_rw [hterm]
-  rw [Finset.sum_add_distrib]
-  have hcancel : ∑ i, (p i - q i) = 0 := by
-    rw [Finset.sum_sub_distrib, hsum, sub_self]
-  linarith
-
-/-- **Gibbs' inequality (finite form)**: KL(p ‖ q) ≥ 0.
-
-    For distributions p, q with qᵢ > 0, pᵢ ≥ 0, and Σpᵢ = Σqᵢ:
-      Σᵢ pᵢ · log(pᵢ/qᵢ) ≥ 0
-
-    Proof via Mathlib's `klFun_nonneg`: klFun(x) ≥ 0 for x ≥ 0. -/
-theorem kl_nonneg (p q : ι → ℝ) (hq : ∀ i, 0 < q i) (hp : ∀ i, 0 ≤ p i)
-    (hsum : ∑ i, p i = ∑ i, q i) :
-    0 ≤ klFinite p q := by
-  rw [kl_eq_sum_klFun p q hq hp hsum]
-  apply Finset.sum_nonneg
-  intro i _
-  apply mul_nonneg (le_of_lt (hq i))
-  exact InformationTheory.klFun_nonneg (div_nonneg (hp i) (le_of_lt (hq i)))
-
-/-- Alternative KL non-negativity with distribution hypotheses. -/
-theorem kl_nonneg' [Nonempty ι] {p q : ι → ℝ}
-    (hp_nonneg : ∀ i, 0 ≤ p i) (hq_pos : ∀ i, 0 < q i)
-    (hp_sum : ∑ i, p i = 1) (hq_sum : ∑ i, q i = 1) :
-    0 ≤ klFinite p q :=
-  kl_nonneg p q hq_pos hp_nonneg (by rw [hp_sum, hq_sum])
-
-end KLDivergence
+open Core.InformationTheory (klFinite kl_nonneg kl_nonneg' kl_eq_sum_klFun)
 
 -- ============================================================================
 -- §3a. Gibbs Variational Principle
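
The selective `open` above brings only the four listed names into scope. A minimal sketch of what that means for code written after it, assuming the module path `Linglib.Core.InformationTheory` (inferred from the file location) and using hypothetical `example`s that are not part of `RationalAction.lean`:

```lean
import Linglib.Core.InformationTheory

open Core.InformationTheory (klFinite kl_nonneg kl_nonneg' kl_eq_sum_klFun)

-- Names in the `open` list resolve unqualified.
example {ι : Type*} [Fintype ι] (p q : ι → ℝ)
    (hq : ∀ i, 0 < q i) (hp : ∀ i, 0 ≤ p i)
    (hsum : ∑ i, p i = ∑ i, q i) :
    0 ≤ klFinite p q :=
  kl_nonneg p q hq hp hsum

-- Names outside the list (here the expected-log identity) still need
-- the full `Core.InformationTheory.` prefix.
example {ι : Type*} [Fintype ι] (p q : ι → ℝ)
    (hAC : ∀ i, p i ≠ 0 → q i ≠ 0) :
    (∑ i, p i * Real.log (q i)) =
      -klFinite p q + (∑ i, p i * Real.log (p i)) :=
  Core.InformationTheory.expected_log_eq_neg_klFinite_plus_negEntropy p q hAC
```

Listing names explicitly keeps the rest of `Core.InformationTheory` (entropy, JSD, ΔP, Hellinger) out of the agent file's namespace, which fits the stated goal of keeping `RationalAction.lean` about agents.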
 
@@ -1,7 +1,9 @@
 import Mathlib.Algebra.Order.Ring.Rat
 import Mathlib.Analysis.SpecialFunctions.Log.NegMulLog
+import Mathlib.Analysis.SpecialFunctions.Sqrt
 import Mathlib.Algebra.BigOperators.Field
 import Mathlib.Data.Fintype.BigOperators
+import Mathlib.InformationTheory.KullbackLeibler.KLFun
 
 /-!
 # Information-Theoretic Primitives
@@ -39,6 +41,8 @@ return `(Finset α, α → ℝ)` directly).
 - `jsdOf s p q`: Jensen-Shannon divergence
 - `deltaP`: ΔP directional association measure (ℚ-valued, no log)
 - `deltaPCounts`: ΔP from a 2×2 contingency table (ℚ-valued, no log)
+- `klFinite p q`: discrete KL divergence `Σᵢ pᵢ · log(pᵢ/qᵢ)` (with `0·log(0/q)=0` guard)
+- `bhattacharyyaCoeff`, `hellingerDistSq`, `hellingerDist`: Hellinger family
 -/
 
 namespace Core.InformationTheory
@@ -176,4 +180,183 @@ theorem deltaP_eq_zero_of_independent (pX pY : ℚ)
     rw [mul_sub, mul_one, mul_comm pY pX]]
   rw [mul_div_cancel_right₀ pY hne1, sub_self]
 
+/-! ## Kullback–Leibler divergence (finite, ℝ-valued)
+
+Discrete-finite specialization of mathlib's `InformationTheory.klDiv`,
+routed through mathlib's `klFun (x) = x · log x + 1 − x`. -/
+
+section KLDivergence
+
+variable {ι : Type*} [Fintype ι]
+
+/-- Finite KL divergence: `KL(p ‖ q) = Σᵢ pᵢ · log(pᵢ / qᵢ)`.
+    Convention: `0 · log(0/q) = 0` (via the `if`-guard). -/
+noncomputable def klFinite (p q : ι → ℝ) : ℝ :=
+  ∑ i, if p i = 0 then 0 else p i * Real.log (p i / q i)
+
+/-- When `q > 0`, each KL term can be written via `klFun`:
+    `p · log(p/q) = q · klFun(p/q) + (p − q)`. -/
+private theorem kl_term_eq_klFun {p_i q_i : ℝ} (hq : 0 < q_i) (_hp : 0 ≤ p_i) :
+    (if p_i = 0 then (0 : ℝ) else p_i * log (p_i / q_i)) =
+    q_i * _root_.InformationTheory.klFun (p_i / q_i) + (p_i - q_i) := by
+  by_cases hp0 : p_i = 0
+  · simp only [hp0, ↓reduceIte, zero_div, _root_.InformationTheory.klFun_zero, mul_one, zero_sub,
+               add_neg_cancel]
+  · simp only [hp0, ↓reduceIte]
+    unfold _root_.InformationTheory.klFun
+    have hq_ne : q_i ≠ 0 := ne_of_gt hq
+    field_simp
+    ring
+
+/-- Finite KL divergence equals `Σᵢ qᵢ · klFun(pᵢ/qᵢ)` when `Σpᵢ = Σqᵢ`. -/
+theorem kl_eq_sum_klFun (p q : ι → ℝ) (hq : ∀ i, 0 < q i) (hp : ∀ i, 0 ≤ p i)
+    (hsum : ∑ i, p i = ∑ i, q i) :
+    klFinite p q = ∑ i, q i * _root_.InformationTheory.klFun (p i / q i) := by
+  unfold klFinite
+  have hterm : ∀ i, (if p i = 0 then (0 : ℝ) else p i * log (p i / q i)) =
+      q i * _root_.InformationTheory.klFun (p i / q i) + (p i - q i) :=
+    λ i => kl_term_eq_klFun (hq i) (hp i)
+  simp_rw [hterm]
+  rw [Finset.sum_add_distrib]
+  have hcancel : ∑ i, (p i - q i) = 0 := by
+    rw [Finset.sum_sub_distrib, hsum, sub_self]
+  linarith
+
+/-- **Gibbs' inequality (finite form)**: `KL(p ‖ q) ≥ 0`.
+
+    For distributions `p, q` with `qᵢ > 0`, `pᵢ ≥ 0`, and `Σpᵢ = Σqᵢ`:
+    `Σᵢ pᵢ · log(pᵢ/qᵢ) ≥ 0`. Proof via mathlib's `klFun_nonneg`. -/
+theorem kl_nonneg (p q : ι → ℝ) (hq : ∀ i, 0 < q i) (hp : ∀ i, 0 ≤ p i)
+    (hsum : ∑ i, p i = ∑ i, q i) :
+    0 ≤ klFinite p q := by
+  rw [kl_eq_sum_klFun p q hq hp hsum]
+  apply Finset.sum_nonneg
+  intro i _
+  apply mul_nonneg (le_of_lt (hq i))
+  exact _root_.InformationTheory.klFun_nonneg (div_nonneg (hp i) (le_of_lt (hq i)))
+
+/-- Alternative KL non-negativity with distribution hypotheses. -/
+theorem kl_nonneg' [Nonempty ι] {p q : ι → ℝ}
+    (hp_nonneg : ∀ i, 0 ≤ p i) (hq_pos : ∀ i, 0 < q i)
+    (hp_sum : ∑ i, p i = 1) (hq_sum : ∑ i, q i = 1) :
+    0 ≤ klFinite p q :=
+  kl_nonneg p q hq_pos hp_nonneg (by rw [hp_sum, hq_sum])
+
+/-- Cross-entropy decomposition:
+    `KL(p ‖ q) = (Σ pᵢ log pᵢ) − (Σ pᵢ log qᵢ)`
+
+    The hypothesis is **absolute continuity** `p ≪ q`: wherever `p` puts
+    mass, `q` does too. Strictly weaker than `∀ i, 0 < q i`. -/
+theorem klFinite_eq_negEntropy_sub_crossEntropy (p q : ι → ℝ)
+    (hAC : ∀ i, p i ≠ 0 → q i ≠ 0) :
+    klFinite p q = (∑ i, p i * log (p i)) - (∑ i, p i * log (q i)) := by
+  unfold klFinite
+  rw [← Finset.sum_sub_distrib]
+  refine Finset.sum_congr rfl fun i _ => ?_
+  by_cases hP : p i = 0
+  · simp [hP]
+  · rw [if_neg hP, log_div hP (hAC i hP), mul_sub]
+
+/-- KL with a Dirac point mass reduces to negative log-probability (= surprisal):
+    `KL(δₛ ‖ Q) = −log Q(s)`.
+
+    Foundation of standard RSA speaker utility `U(u; s) = log L₀(s | u)`
+    (@cite{frank-goodman-2012}, @cite{goodman-stuhlmuller-2013}). -/
+theorem klFinite_pi_single_eq_neg_log [DecidableEq ι]
+    (s : ι) (q : ι → ℝ) (hQ : q s ≠ 0) :
+    klFinite (Pi.single s 1) q = -log (q s) := by
+  unfold klFinite
+  rw [Finset.sum_eq_single s]
+  · have h1 : Pi.single (M := fun _ => ℝ) s 1 s = 1 := Pi.single_eq_same s 1
+    rw [if_neg (by rw [h1]; exact one_ne_zero), h1, one_mul, one_div, log_inv]
+  · intro j _ hj
+    have h0 : Pi.single (M := fun _ => ℝ) s 1 j = 0 := Pi.single_eq_of_ne hj 1
+    rw [h0, if_pos rfl]
+  · intro h; exact (h (Finset.mem_univ s)).elim
+
+/-- Expected log-likelihood under uncertain beliefs equals negative KL plus
+    speaker entropy: `E_p[log q] = −KL(p ‖ q) + Σ p log p`.
+
+    Since `Σ p log p` is independent of `q`, softmax over utterances cancels
+    it, yielding the equivalence between Frank-Goodman surprisal `log L₀(s|u)`
+    and Goodman-Stuhlmüller belief-weighted utility. -/
+theorem expected_log_eq_neg_klFinite_plus_negEntropy (p q : ι → ℝ)
+    (hAC : ∀ i, p i ≠ 0 → q i ≠ 0) :
+    (∑ i, p i * log (q i)) = -klFinite p q + (∑ i, p i * log (p i)) := by
+  rw [klFinite_eq_negEntropy_sub_crossEntropy p q hAC]; ring
+
+end KLDivergence
+
+/-! ## Hellinger family (Bhattacharyya, squared-Hellinger, Hellinger distance)
+@cite{herbstritt-franke-2019}
+
+Finite-distribution Hellinger family used as an alternative speaker utility
+in RSA pragmatics: @cite{herbstritt-franke-2019} argue (footnote 8) that
+Hellinger distance is necessary for probability expressions because KL
+assigns infinite disutility to messages whose literal interpretation
+assigns zero probability to states the speaker considers possible.
+The Hellinger-vs-KL inequality `2 · H²(P, Q) ≤ KL(P ‖ Q)`
+(Bretagnolle–Huber, sorried) states the Hellinger speaker's permissiveness
+over the KL speaker as a Lean theorem rather than a docstring claim; the
+corollary counts as proved only once the `sorry` is discharged. -/
+
+section Hellinger
+
+variable {ι : Type*} [Fintype ι]
+
+/-- Bhattacharyya coefficient: `BC(P, Q) = Σᵢ √(Pᵢ · Qᵢ)`.
+
+    For probability distributions `BC = 1 ↔ P = Q` and `BC = 0 ↔` disjoint
+    support. -/
+noncomputable def bhattacharyyaCoeff (P Q : ι → ℝ) : ℝ :=
+  ∑ i : ι, √(P i * Q i)
+
+/-- Squared Hellinger distance: `H²(P, Q) = 1 − BC(P, Q)`.
+
+    Ranges from 0 (identical distributions) to 1 (disjoint support).
+    Equivalent to the standard form `(1/2) Σᵢ (√Pᵢ − √Qᵢ)²` for
+    distributions summing to 1. -/
+noncomputable def hellingerDistSq (P Q : ι → ℝ) : ℝ :=
+  1 - bhattacharyyaCoeff P Q
+
+/-- Hellinger distance: `HD(P, Q) = √H²(P, Q)`.
+
+    Satisfies `0 ≤ HD ≤ 1` for probability distributions. Unlike KL,
+    Hellinger distance is always finite and is a proper metric (symmetric,
+    triangle inequality).
+
+    The @cite{herbstritt-franke-2019} speaker utility (their eq. 16) is
+    `EU(m, o, a) = −HD(P_belief, P_LL)`. -/
+noncomputable def hellingerDist (P Q : ι → ℝ) : ℝ :=
+  √(hellingerDistSq P Q)
+
+/-- Squared Hellinger distance is non-negative when `BC ≤ 1`.
+
+    For normalised distributions `Σ Pᵢ = Σ Qᵢ = 1`, Cauchy-Schwarz gives
+    `BC(P, Q) ≤ 1`, hence `H² ≥ 0`. -/
+theorem hellingerDistSq_nonneg_of_bc_le_one (P Q : ι → ℝ)
+    (h : bhattacharyyaCoeff P Q ≤ 1) :
+    0 ≤ hellingerDistSq P Q := by
+  unfold hellingerDistSq; linarith
+
+/-- **Bretagnolle–Huber inequality**: `2 · H²(P, Q) ≤ KL(P ‖ Q)`.
+
+    The standard sharp comparison between Hellinger and KL on probability
+    distributions. Combined with `H² ≥ 0`, yields `H²(P, Q) ≤ KL(P ‖ Q)`,
+    making the Hellinger speaker's choice set a **superset** of the KL
+    speaker's: any utterance the KL speaker can consider, the Hellinger
+    speaker can too — but not conversely.
+
+    **Proof sketch (TODO):** Pointwise `klFun(x) ≥ (√x − 1)²` for `x ≥ 0`
+    (from `log t ≤ t − 1` at `t = 1/√x`; the bound with an extra factor of 2
+    fails at `x = 0`). Multiply by `qᵢ` and sum: `Σ qᵢ klFun(pᵢ/qᵢ) ≥
+    Σ (√pᵢ − √qᵢ)²`. The LHS equals `KL(p ‖ q)` via `kl_eq_sum_klFun`;
+    the RHS equals `2 − 2 · BC(p, q) = 2 · H²(p, q)` for normalised `p, q`.
+    Standard reference: Bretagnolle–Huber (1979). -/
+theorem two_hellingerDistSq_le_klFinite [Nonempty ι] (P Q : ι → ℝ)
+    (_hP_nonneg : ∀ i, 0 ≤ P i) (_hQ_pos : ∀ i, 0 < Q i)
+    (_hP_sum : ∑ i, P i = 1) (_hQ_sum : ∑ i, Q i = 1) :
+    2 * hellingerDistSq P Q ≤ klFinite P Q := by
+  sorry
+
+end Hellinger
+
 end Core.InformationTheory
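
The docstring of `two_hellingerDistSq_le_klFinite` says that, combined with `H² ≥ 0`, it yields `H²(P, Q) ≤ KL(P ‖ Q)`. A minimal sketch of that corollary, assuming the module path `Linglib.Core.InformationTheory` and taking `BC ≤ 1` as a hypothesis (the diff does not show a lemma deriving it from normalisation). The `example` is hypothetical, has not been checked against the build, and inherits the `sorry` from the Bretagnolle–Huber statement.

```lean
import Linglib.Core.InformationTheory

open Core.InformationTheory

-- Hypothetical corollary: H² ≤ KL on normalised distributions.
example {ι : Type*} [Fintype ι] [Nonempty ι] (P Q : ι → ℝ)
    (hP : ∀ i, 0 ≤ P i) (hQ : ∀ i, 0 < Q i)
    (hP1 : ∑ i, P i = 1) (hQ1 : ∑ i, Q i = 1)
    (hBC : bhattacharyyaCoeff P Q ≤ 1) :
    hellingerDistSq P Q ≤ klFinite P Q := by
  -- 2 · H² ≤ KL (currently sorried upstream).
  have h2 := two_hellingerDistSq_le_klFinite P Q hP hQ hP1 hQ1
  -- 0 ≤ H², from BC ≤ 1.
  have h0 := hellingerDistSq_nonneg_of_bc_le_one P Q hBC
  -- H² ≤ 2 · H² ≤ KL is linear arithmetic in the two facts above.
  linarith
```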
@@ -486,7 +486,7 @@ def aroundWeight : Value → Nat
 
 Weight 2 at center, 1 at ±1, 0 elsewhere. Unnormalized weights preserve
 S1 ranking because exp is monotone and the normalization constant is
-independent of u (see `RSA.Divergence.expected_loglik_eq_neg_kl_plus_entropy`). -/
+independent of u (see `Core.InformationTheory.expected_log_eq_neg_klFinite_plus_negEntropy`). -/
 noncomputable def speakerBeliefR (observed w : Value) : ℝ :=
   let d := if observed.toNat ≥ w.toNat then observed.toNat - w.toNat
             else w.toNat - observed.toNat
@@ -501,7 +501,7 @@ triangular "around" posterior after normalization, matching `birWeight`.
 **S1** = KL speaker: the speaker with peaked beliefs chooses the message
 whose L0 posterior best matches those beliefs, measured by expected
 log-likelihood (= negative KL divergence up to constant entropy,
-see `RSA.Divergence.expected_loglik_eq_neg_kl_plus_entropy`). -/
+see `Core.InformationTheory.expected_log_eq_neg_klFinite_plus_negEntropy`). -/
 noncomputable def cfg : RSAConfig Utt Value where
   Latent := Unit
   meaning _ _ u w := match u with