feat: re-add Agora as recommended pipeline, fix Polis repness bugs by nicobao · Pull Request #99 · polis-community/red-dwarf

nicobao · 2025-08-20T15:38:57Z

Updated 2026-03-12: This PR expanded from Polis repness bug fixes into the full Agora pipeline implementation. See this comment for the full technical rationale.

Summary

Re-adds the Agora implementation as the recommended default pipeline, with principled statistical methods replacing Polis's ad-hoc heuristics. Also fixes several bugs in the Polis repness selection.

Polis bug fixes (in `select_representative_statements`)

Fix repful_for calculation to use correct test statistics (pat/pdt)
Remove buggy best-agree/best-of-agrees heuristic (was using disagree data for agree statements)
Fix format_comment_stats to output correct direction fields
Filter out zero-vote statements from significance testing

Agora implementation (new)

rank_representative_statements() — ranks ALL statements per group by effect size, with BH FDR selection using Simes' p-value combination
rank_consensus_statements() — ranks ALL statements by pa/pd with BH selection
compute_effective_agreement_gac() — prod(pa*(1-pd))^(1/n) penalizes divided groups
apply_bh_with_vote_filter() — shared BH helper excluding zero-vote statements
AgoraClusteringResult with ranked_repness, ranked_consensus, effective GAC
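
The vote-filtered BH step described in the list above can be sketched as follows. This is a minimal illustration, not the library's code: the function name `bh_with_vote_filter`, its signature, and its list-based inputs are assumptions made for the example. Only the behavior comes from this PR: zero-vote statements do not count toward the number of hypotheses m, and they come back with an adjusted p-value of 1.0 and `selected=False`.

```python
from typing import List, Tuple


def bh_with_vote_filter(
    p_values: List[float],
    n_agree: List[int],
    n_disagree: List[int],
    fdr: float = 0.10,
) -> Tuple[List[float], List[bool]]:
    """Benjamini-Hochberg over statements with at least one agree or disagree vote.

    Hypothetical helper for illustration; not the red-dwarf API.
    """
    n = len(p_values)
    adjusted = [1.0] * n    # zero-vote statements keep adjusted p = 1.0
    selected = [False] * n  # ...and are never selected

    # Only statements with votes count toward the number of hypotheses m.
    eligible = [i for i in range(n) if n_agree[i] > 0 or n_disagree[i] > 0]
    m = len(eligible)
    if m == 0:
        return adjusted, selected

    # Eligible indices sorted by ascending p-value (rank 1 = smallest).
    order = sorted(eligible, key=lambda i: p_values[i])

    # Step-up adjustment: walk from the largest p-value down,
    # keeping a running minimum of p * m / rank.
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
        selected[i] = running_min <= fdr
    return adjusted, selected
```

Every statement keeps its position in the output, so a caller can sort by effect size separately and still show why a statement was or was not selected.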

Documentation

README: Agora as recommended default, Polis vs Agora comparison table, roadmap
API reference: Agora functions and types, fix base section bug
CHANGELOG: all additions documented
agora-demo.ipynb notebook as recommended quickstart
Cram snapshot tests with Polis vs Agora selection comparison

Closes #73
Supersedes #105

Original PR description (August 2025)

Fixed:

Representative opinions were formatted incorrectly (for example, using repful_for=disagree data instead of agree), especially for "best-agree".
Representative opinions were selected one way and then picked for formatting another way, which led to errors.
Representative opinions are now sorted by repness-test before pick-max is applied, so the best ones are kept.

TODO:

best-agree support was temporarily removed, as the implementation was flawed
Instead of the previous implementation, we should select the best "agree" among the representative opinions that remain after the pick_max filter, if any, and then set "best-agree: true" on it. Sometimes there will be no best-agree, when there are no "agree" representative opinions. That is expected: in my experience there is sometimes no best-agree at all (it also makes sense in general, I think, but I may be wrong).
add unit tests!

nicobao · 2025-08-20T15:42:49Z

Test fails because best-agree was removed

nicobao · 2025-08-25T23:20:36Z

Hi @patcon, any feedback, so we can merge?

nicobao · 2025-09-05T22:10:49Z

Hey @patcon, could you tell me how to go through sufficient_statements to add the following:

best_agree.update({"n-agree": best_agree["n-success"], "best-agree": True})

to the one statement in sufficient_statements that has the maximum repness_test among those with repful_for=agree (if any)?

And if we use best_overall (sufficient_statements is empty) then do the same for best_overall.

(I am a noob when it comes to working with tf data objects)

(As I said earlier, it's possible we don't have any best-agree at all with that method if we only have disagree representative opinions, but that seems fine to me?)

nicobao · 2025-09-23T19:26:25Z

I need your help @patcon to understand why there are so many test errors

- Remove unused 'stat' import
- Simplify significance checks by removing redundant vote count validations
- Streamline repful_for calculation by removing nested conditionals
- Lower minimum confidence threshold from 0.7 to 0.6 for statement selection
- Improve confidence selection to prefer exact pick_max matches over near-misses
- Remove best-agree flag assignment logic

The fallback path in select_representative_statements() was returning a raw pandas Series instead of a properly formatted PolisRepnessStatement dict. This caused downstream consumers (e.g. Zod schema validation) to reject the entire math update when any cluster triggered the fallback, because the output had raw column names (na, nd, statement_id) instead of the expected keys (tid, n-success, repful-for).

The fallback now:

- Formats through format_comment_stats() like the non-fallback path
- Handles best_overall=None by producing an empty list instead of [None]

The group-aware consensus score used raw p_agree per group in the geometric mean, completely ignoring p_disagree. This meant a group genuinely divided (similar levels of agree and disagree) contributed the same score as an undivided group with the same agree level, allowing divided groups to be masked by other groups' strong agreement.

Replace raw p_agree with "effective agreement": p_agree * (1 - p_disagree). This discounts each group's agreement by its disagreement, so a divided group naturally drags down the consensus score while still producing a continuous ranking.

Also document all divergences from the original Polis algorithm:

- Geometric mean normalization (existing)
- Effective agreement (new)
- Progressive confidence lowering for representative statements (existing)
- Significance-based repful_for determination (existing)
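
The difference between the two scores can be illustrated with a short sketch. The function names and the example rates below are assumptions chosen for illustration and are not taken from the library; only the two formulas, `prod(pa)^(1/n)` and `prod(pa*(1-pd))^(1/n)`, come from this PR.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """n-th root of the product of n values."""
    if not values:
        raise ValueError("need at least one group")
    return math.prod(values) ** (1.0 / len(values))


def raw_agreement_gac(p_agree: Sequence[float]) -> float:
    """Geometric mean of per-group agreement rates; ignores disagreement."""
    return geometric_mean(p_agree)


def effective_agreement_gac(
    p_agree: Sequence[float], p_disagree: Sequence[float]
) -> float:
    """Geometric mean of pa * (1 - pd), discounting agreement by disagreement."""
    if len(p_agree) != len(p_disagree):
        raise ValueError("p_agree and p_disagree must have the same length")
    return geometric_mean([pa * (1.0 - pd) for pa, pd in zip(p_agree, p_disagree)])


# Two groups with identical agreement rates in both scenarios (made-up numbers).
p_agree = [0.8, 0.5]
undivided = [0.1, 0.1]   # little disagreement in either group
divided = [0.1, 0.45]    # second group is split almost evenly

# The raw score cannot tell the two scenarios apart...
print(raw_agreement_gac(p_agree))
# ...while the effective score is lower when a group is divided.
print(effective_agreement_gac(p_agree, undivided))
print(effective_agreement_gac(p_agree, divided))
```

Because the discount is multiplicative, the score stays continuous: a slightly divided group lowers the consensus score slightly rather than disqualifying the statement.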

## Why this change

The Polis repness algorithm has several issues that motivated both targeted fixes to the Polis implementation and a new Agora implementation that takes a fundamentally different approach.

### Problems with Polis repness selection

The stock Polis `select_representative_statements()` uses a cascade of ad-hoc heuristics: `beats_best_of_agrees()`, then `beats_best_by_repness_test()`, then `beats_best_of_agrees()` again, capped by `pick_max=5`. This approach has three problems:

- It has no statistical foundation: `pick_max=5` is arbitrary and doesn't adapt to the data. A conversation with 10 statements and one with 1000 get the same cutoff.
- The `best-agree` heuristic was buggy: it used disagree data for agree statements due to a repful_for calculation error.
- The three-regime cascade makes it impossible to produce a single ranked list of all statements, which is what library users need to build their own UIs and selection logic.

### Polis bug fixes (in select_representative_statements)

These fix the existing Polis implementation directly:

- Fix repful_for calculation to use correct test statistics (pat/pdt)
- Remove buggy best-agree/best-of-agrees heuristic
- Simplify selection cascade
- Fix format_comment_stats to use correct direction fields
- Filter out statements with no agree or disagree votes

### Why we keep both implementations

The Polis implementation stays (with the above fixes) because it serves as a **reference baseline**: users need to verify that results match stock Polis behavior. Agora is a separate implementation that shares the same core pipeline (PCA + KMeans) but replaces selection and consensus with principled statistics.

### Why Agora ranks ALL statements then selects

Instead of a black box that returns 5 statements, Agora returns every statement ranked by effect size with a `selected` flag from Benjamini-Hochberg. This lets library consumers:

- Show the full ranking in a UI
- Apply their own selection criteria
- Understand why a statement was or wasn't selected (via adjusted p-values)

### Why Simes' p-value combination (not max, not Fisher)

Each statement has two test statistics: a probability test (is this group's agreement rate significantly above 50%?) and a representativeness test (does this group agree more than others?). We need to combine them into one p-value for BH.

- `max(p_prob, p_rep)` (intersection test): our first attempt. Too conservative: it requires BOTH tests to be independently significant. With m=42 hypotheses and BH fdr=0.10, the rank 1 threshold is only 0.0024 (z>=2.81). Result: 0/0/1 statements selected across 3 groups.
- Fisher's method: assumes independence between tests. But our tests are positively dependent (groups that agree more show higher probability AND higher representativeness). Too lenient: 22/0/17.
- Simes' combination `min(2*p_min, p_max)`: valid under positive dependence (Sarkar & Chang 1997). This is exactly our case. Result: 4/0/8 (vs Polis 5/1/5, comparable but data-driven).

### Why effective agreement GAC

Polis GAC uses `prod(pa)^(1/n)`, the geometric mean of agreement rates. Because pd never enters the formula, a group that is split almost evenly between agree and disagree scores the same as a group with the same agreement rate and very little disagreement. Agora uses `prod(pa*(1-pd))^(1/n)` to discount agreement by disagreement, penalizing divided groups.

### Why zero-vote filtering (in both Polis and Agora)

Statements with na=0 and nd=0 get p-values from Laplace smoothing noise. In Polis, they're now filtered by `is_statement_significant()`. In Agora, they're excluded from the BH hypothesis count via `apply_bh_with_vote_filter()` to avoid inflating m, but are still returned in the ranking with adjusted_p_value=1.0, selected=False.

Closes polis-community#73
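
The rank 1 threshold quoted in the Simes section of this commit message can be checked with a few lines of arithmetic. This is a sketch under one assumption: the z-score is read as a one-sided normal quantile, which the commit message does not state explicitly. The helper name `bh_threshold` is made up for the example.

```python
from statistics import NormalDist

m = 42      # number of hypotheses, from the commit message
fdr = 0.10  # BH false discovery rate, from the commit message


def bh_threshold(rank: int, m: int, fdr: float) -> float:
    """BH critical value for the p-value at a given rank (1 = smallest)."""
    return rank * fdr / m


threshold = bh_threshold(1, m, fdr)  # 0.10 / 42, about 0.0024

# One-sided z-score a single test must reach to pass at rank 1
# (assumption: one-sided test; roughly 2.8).
z_required = NormalDist().inv_cdf(1.0 - threshold)

print(f"rank-1 threshold: {threshold:.4f}")
print(f"required z (one-sided): {z_required:.2f}")
```

With `max(p_prob, p_rep)` both tests must clear this bar at once, which is why that combination selected almost nothing.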

nicobao · 2026-03-12T15:47:34Z

Major update: Re-add Agora implementation as recommended pipeline

This PR has grown beyond the original repness bug fixes. It now includes a full Agora implementation that addresses the fundamental limitations of Polis statement selection, plus documentation positioning Agora as the recommended default.

Why we moved beyond the existing Polis algorithm

The original work on this branch fixed several Polis repness bugs (repful_for using wrong test statistics, best-agree heuristic using disagree data for agree statements, format_comment_stats outputting wrong fields). But as we dug deeper, the problems weren't just bugs — they were structural limitations of the heuristic approach:

pick_max=5 is arbitrary. A conversation with 10 statements and one with 1000 get the same cutoff. There's no statistical basis for "5".
The three-regime cascade (beats_best_of_agrees → beats_best_by_repness_test → beats_best_of_agrees) is opaque. It's impossible to extract a single ranked list from it. Library users who want to build their own UIs or apply custom selection criteria are stuck with a black box that returns 5 statements.
The GAC formula prod(pa)^(1/n) ignores disagreement. A group that is split almost evenly between agree and disagree gets the same consensus score as a group with the same agreement rate and very little disagreement. Divided groups should score lower.

Why keep both implementations?

Polis stays as-is (with targeted bug fixes) because it's the reference baseline. We want users, builders and researchers to be able to verify their results against stock Polis output, as it is the de facto standard, and to compare it with the newer approaches.

Agora is separate and shares the same core pipeline (PCA + KMeans) but replaces statement selection and consensus scoring with principled statistics. It also applies the bug fixes. More to come.

The journey to Simes' combination

Agora uses Benjamini-Hochberg FDR control instead of pick_max. Each statement has two test statistics (probability test + representativeness test) that need combining into one p-value. We explored three approaches:

| Method | Assumption | Result (3 groups) | Verdict |
| --- | --- | --- | --- |
| `max(p_prob, p_rep)` (intersection) | None | 0/0/1 selected | Too conservative: requires BOTH tests independently significant |
| Fisher's `χ² = -2Σln(p)` | Independence | 22/0/17 selected | Too lenient: our tests aren't independent |
| Simes' `min(2·p_min, p_max)` | Positive dependence | 4/0/8 selected | Just right: valid for our case |

The key insight: probability and representativeness tests are positively dependent (groups that agree more show higher probability AND higher representativeness). Simes' method is proven valid under positive dependence (Sarkar & Chang 1997), making it the theoretically correct choice.
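
The three combination rules compared above can be written out for the two-test case. This is a minimal sketch, not the library's code: the function names are hypothetical, and the Fisher p-value uses the closed-form survival function of a chi-square distribution with 4 degrees of freedom, which avoids a SciPy dependency.

```python
import math


def combine_max(p_prob: float, p_rep: float) -> float:
    """Small only when BOTH tests are significant."""
    return max(p_prob, p_rep)


def combine_fisher(p_prob: float, p_rep: float) -> float:
    """Fisher's method for two p-values (assumes independent tests).

    chi2 = -2 * (ln p1 + ln p2) has 4 degrees of freedom under the null,
    and for 4 d.o.f. the survival function is exp(-x/2) * (1 + x/2).
    """
    if p_prob <= 0.0 or p_rep <= 0.0:
        return 0.0
    x = -2.0 * (math.log(p_prob) + math.log(p_rep))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def combine_simes(p_prob: float, p_rep: float) -> float:
    """Simes' combination for two p-values: min(2 * p_min, p_max)."""
    return min(2.0 * min(p_prob, p_rep), max(p_prob, p_rep))


# Made-up example: strong probability test, weak representativeness test.
p_prob, p_rep = 0.001, 0.2
for name, combine in [
    ("max", combine_max),
    ("fisher", combine_fisher),
    ("simes", combine_simes),
]:
    print(f"{name:>6}: {combine(p_prob, p_rep):.4f}")
```

By construction the Simes value never exceeds `max(p_prob, p_rep)`, so it is never more conservative than the first attempt, and unlike Fisher's method it does not rely on the two tests being independent.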

For comparison, Polis selects 5/1/5 on the same data — Agora's 4/0/8 is in the same ballpark but adapts to the data rather than using a fixed cap.

What's in this commit

Polis fixes (in select_representative_statements):

Fix repful_for to use correct test statistics (pat/pdt)
Remove buggy best-agree/best-of-agrees heuristic
Fix format_comment_stats to use correct direction fields
Filter out zero-vote statements from significance testing

Agora implementation (new):

rank_representative_statements() — ranks ALL statements by effect size, BH FDR selection with Simes' p-value combination
rank_consensus_statements() — ranks ALL statements by pa/pd, BH FDR selection
compute_effective_agreement_gac() — prod(pa*(1-pd))^(1/n) penalizes divided groups
apply_bh_with_vote_filter() — shared BH helper excluding zero-vote statements
AgoraClusteringResult dataclass with ranked_repness, ranked_consensus, effective GAC

Documentation:

README updated: Agora as recommended default, Polis vs Agora comparison table
API reference: Agora functions and types documented
CHANGELOG: full list of additions
agora-demo.ipynb notebook as recommended quickstart
Cram snapshot tests for Agora pipeline output with Polis comparison

Closes #73

nicobao changed the title ~~fix: representative opinions selection and stat formatting~~ fix: representative opinions selection and comment stats formatting Aug 20, 2025

nicobao mentioned this pull request Sep 8, 2025

feat(stats): allow users to rank all opinions by representativeness for each group #105

Closed

nicobao force-pushed the fix-repful-for branch from a06b6d8 to c67cb6e Compare September 23, 2025 19:43

nicobao added 22 commits February 28, 2026 23:19

fix: repful_for was wrongly attributed and the wrong fields were filled

d8d602f

fix: use absolute for comparing rat/rdt as it can be negative

fe7d961

fix: try to compare just probabilities

8cbc8ab

fix: try comparing prob without abs

92b333d

fix: testing always agree

305418c

fix: removing format_comment_stats unused argument

a7fa93e

fix: attempt to understand values

90c058d

fix: unify repful_for calc and get rid of bugged best-agree for now

0df1b79

fix: use pat and pdt correctly

0de609c

fix: always pick representative opinons up to pick_max, best first

7235da7

fix: typo in accessing repness-test

8839ac6

fix(repness): lower confidence instead of directly relying on best_overall

19e5782

feat: try to add representative opinions till pick_max & max 0.6 confidence

d7b6b29

fix: remove unecessary sort

175d00d

fix(repness): actually choose all repness for a given confidence

f536da4

feat(repness): improve algorithm to find the best repness

6f7fb62

fix: add back best-agree

bdda19f

feat: what if both are significant

e1e2eeb

feat: improve alg

714edc9

fix: don't take "rep" statemnts with no agree or disagree

bfc2332

fix: syntax error

cd8d2e0

nicobao force-pushed the fix-repful-for branch from e6fd28d to cd8d2e0 Compare February 28, 2026 22:20

nicobao added 3 commits March 3, 2026 23:06

nicobao mentioned this pull request Mar 12, 2026

feat: Agora pipeline — rank all statements with BH selection, fix Polis repness bugs #73

Closed

nicobao changed the title ~~fix: representative opinions selection and comment stats formatting~~ feat: re-add Agora as recommended pipeline, fix Polis repness bugs Mar 12, 2026

nicobao merged commit 8bd5881 into polis-community:main Mar 12, 2026
8 checks passed

nicobao deleted the fix-repful-for branch March 12, 2026 15:51

nicobao merged 25 commits into polis-community:main from nicobao:fix-repful-for
