Deep Funding.md
@@ -106,12 +106,13 @@ Once the competition ends, extra comparisons could be gathered for projects that
- There are better and more modern methods to derive weights from [noisy pairwise comparisons](https://arxiv.org/abs/2510.09333) ([from multiple annotators](https://arxiv.org/abs/1612.04413))
- [Detect and correct for evaluators' bias in the task of ranking items from pairwise comparisons](https://link.springer.com/article/10.1007/s10618-024-01024-z)
- Use active ranking or dueling bandits to [speed up the data gathering process](https://projecteuclid.org/journals/annals-of-statistics/volume-47/issue-6/Active-ranking-from-pairwise-comparisons-and-when-parametric-assumptions-do/10.1214/18-AOS1772.pdf)
- Stop with a "budget stability" rule (expected absolute dollar change from one more batch is below a threshold); a minimal sketch appears after this list
- Do some post-processing on the weights:
- Report accuracy/Brier and use a paired bootstrap to check whether gaps between projects are statistically meaningful (sketch after this list)
- If gaps are not statistically meaningful, bucket rewards (using Zipf's law) so the split feels fair
- If anyone can rate (or jury selection is more relaxed), low-quality raters can be removed with heuristics, or only the best N raters kept (crowd BT)
- To gather more comparisons, a top-k method could be used instead of pure pairwise: show 6 projects and ask for the top 3, with no need to order them (see the helper after this list)
- What would things look like if they were [Bayesian Bradley-Terry](https://erichorvitz.com/crowd_pairwise.pdf) instead of [classic Bradley-Terry](https://gwern.net/resorter)? Since comparisons are noisy and jurors are unreliable, can we [compute distributions instead of "skills"](https://github.com/max-niederman/fullrank)?
- Let dependents set their own weight percentages if they're around
- Instead of one canonical graph, allow different stakeholder groups (developers, funders, users) to maintain their own weight overlays on the same edge structure. Aggregate these views using quadratic or other mechanisms
- If there is a plurality of these "dependency graphs" (or just different sets of weights), the funding organization can choose which one to use! The curators gain a % of the money for their service. This creates a market-like mechanism that incentivizes useful curation.
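
A minimal sketch of deriving weights from pairwise comparisons and applying the "budget stability" stopping rule mentioned above, assuming comparisons arrive in batches as (winner, loser) project indices. The Bradley-Terry fit uses the textbook MM update; the function names, the budget, and the 0.5% threshold are invented for illustration.

```python
import numpy as np

def fit_bradley_terry(wins, iters=200, eps=1e-9):
    """Bradley-Terry skills via the classic MM (Zermelo) update.
    wins[i, j] = number of times project i beat project j."""
    n = wins.shape[0]
    counts = wins + wins.T              # n_ij: total comparisons between i and j
    total_wins = wins.sum(axis=1)       # W_i: total wins of project i
    p = np.ones(n)
    for _ in range(iters):
        denom = (counts / (p[:, None] + p[None, :])).sum(axis=1)
        p = (total_wins + eps) / (denom + eps)
        p = p / p.sum()                 # BT is scale-invariant, so pin the scale
    return p                            # normalized weights, sum to 1

def dollar_allocation(wins, budget):
    """Turn the fitted weights into a funding split."""
    return budget * fit_bradley_terry(wins)

BUDGET = 100_000                        # hypothetical pot size
THRESHOLD = 0.005 * BUDGET              # stop when one more batch moves < 0.5% of the pot

def stable_enough(prev_alloc, new_alloc):
    """Budget-stability rule: total absolute dollar movement caused by the
    latest batch of comparisons is below the threshold."""
    return np.abs(new_alloc - prev_alloc).sum() < THRESHOLD

# Usage sketch: start with wins = np.zeros((n, n)); after each batch add
# wins[winner, loser] += 1 per comparison, recompute dollar_allocation, and
# stop once stable_enough(...) holds for a couple of consecutive batches.
```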
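One possible reading of the two post-processing bullets, reusing `fit_bradley_terry` from the sketch above: a paired bootstrap over the comparison list to check whether one project's lead over another survives resampling, plus a Zipf-style bucketing of payouts when it does not. The resample count, significance level, and 1/rank bucket shares are assumptions, not prescriptions.

```python
import numpy as np

def gap_is_meaningful(comparisons, i, j, n_projects, n_boot=500, alpha=0.05):
    """Paired bootstrap: resample the comparison list with replacement, refit
    Bradley-Terry each time, and count how often project i fails to beat j."""
    comparisons = np.asarray(comparisons)            # rows of (winner, loser) indices
    rng = np.random.default_rng(0)
    flips = 0
    for _ in range(n_boot):
        idx = rng.integers(len(comparisons), size=len(comparisons))
        wins = np.zeros((n_projects, n_projects))
        for winner, loser in comparisons[idx]:
            wins[winner, loser] += 1
        p = fit_bradley_terry(wins)                  # from the sketch above
        flips += p[i] <= p[j]
    return flips / n_boot < alpha                    # i's lead over j rarely vanishes

def zipf_bucket_payouts(buckets, budget):
    """Pay ranked buckets proportionally to 1/rank, split evenly inside a bucket."""
    shares = 1.0 / np.arange(1, len(buckets) + 1)
    shares /= shares.sum()
    return {project: share * budget / len(bucket)
            for bucket, share in zip(buckets, shares)
            for project in bucket}
```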
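The top-3-of-6 screen also expands mechanically into implied pairwise data: every chosen project beats every shown-but-not-chosen project, while ties inside each group stay unknown. A tiny helper, with made-up project names in the example:

```python
from itertools import product

def topk_to_pairs(shown, chosen):
    """Expand one top-k screen into implied (winner, loser) comparisons."""
    not_chosen = [p for p in shown if p not in chosen]
    return list(product(chosen, not_chosen))

# Show 6, ask for the top 3 -> 3 * 3 = 9 implied comparisons per screen.
pairs = topk_to_pairs(shown=["a", "b", "c", "d", "e", "f"], chosen=["b", "d", "f"])
```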
@@ -143,3 +144,4 @@ Once the competition ends, extra comparisons could be gathered for projects that
- Self-declaration needs a "contest process" to resolve issues/abuse.
- Harberger Tax on self-declarations? Bayesian Truth Serum for Weight Elicitation?
- Projects continuously auction off "maintenance contracts" where funders bid on keeping projects maintained. The auction mechanism reveals willingness-to-pay for continued operation. Dependencies naturally emerge as projects that lose maintenance see their dependents bid up their contracts
- [Explore Rank Centrality](https://arxiv.org/pdf/1209.1688). Theoretical and empirical results show that, for a comparison graph with a decent spectral gap, `O(n log n)` pairwise samples suffice for accurate scores and rankings (minimal sketch below).
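
A minimal Rank Centrality sketch following the construction in the linked paper: hop from project i to project j with probability proportional to how often j beat i, and read scores off the stationary distribution. This plain power-iteration version assumes the comparison graph is connected and is not the paper's exact implementation.

```python
import numpy as np

def rank_centrality(wins):
    """Scores from the stationary distribution of the comparison Markov chain.
    wins[i, j] = number of times project i beat project j."""
    n = wins.shape[0]
    counts = wins + wins.T
    # Empirical probability that j beats i, only for pairs actually compared.
    frac = np.divide(wins.T, counts, out=np.zeros((n, n)), where=counts > 0)
    d_max = max(int((counts > 0).sum(axis=1).max()), 1)    # max comparison degree
    P = frac / d_max                                       # transition rate from i towards j
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))               # stay probability
    pi = np.full(n, 1.0 / n)
    for _ in range(10_000):                                # power iteration for pi = pi @ P
        new_pi = pi @ P
        if np.abs(new_pi - pi).sum() < 1e-12:
            break
        pi = new_pi
    return pi / pi.sum()
```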
Impact Evaluators.md
@@ -65,9 +65,11 @@ It's hard to do [[Public Goods Funding]], open-source software, research, etc. t
- Let funders choose which lenses align with their values.
- When collecting data, [pairwise comparisons and rankings are more reliable than absolute scoring](https://anishathalye.com/designing-a-better-judging-system/).
- Humans excel at relative judgments, but struggle with absolute judgments.
- [Many algorithms can be used to convert pairwise comparisons into absolute scores](https://crowd-kit.readthedocs.io/en/latest/).
- Pairwise shines when all the context is in the UX.
- [Data is good at providing comprehensive coverage of things that are countable. Data is bad at dealing with nuances and qualitative concepts that experts intuitively understand.](https://gov.optimism.io/t/lessons-learned-from-two-years-of-retroactive-public-goods-funding/9239)
- Crowds bring natural diversity and help capture human semantics. [Disagreement is signal, not just noise](https://github.com/CrowdTruth/CrowdTruth-core/blob/master/tutorial/Part%20I_%20CrowdTruth%20Tutorial.pdf). There are niches of experts in the crowds.
- Collecting good pairwise data [is similar to collecting good ML/AI training data](https://github.com/cleanlab/cleanlab).
- **Design for composability**. Define clear data structures (graphs, weight vectors) as APIs between layers (see the sketch after this list).
- Multiple communities could share measurement infrastructure.
- Different evaluation methods can operate on the same data.
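
One way to make the composability point concrete, with all names invented for illustration: fix a tiny comparison record and a weight-vector type as the contract between layers, so different aggregation methods (Bradley-Terry, Rank Centrality, ...) and different communities' data can plug into the same interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Comparison:
    """One pairwise judgment: `winner` was preferred over `loser` by `rater`."""
    rater: str
    winner: str
    loser: str

# A weight vector is just project -> share of the budget, summing to 1.
Weights = dict[str, float]

# Any evaluation method is a function of this shape; layers agree only on the
# data structures, not on each other's internals.
Aggregator = Callable[[Iterable[Comparison]], Weights]

def win_rate_aggregator(comparisons: Iterable[Comparison]) -> Weights:
    """Simplest possible aggregator, a stand-in for BT, Rank Centrality, etc."""
    wins: dict[str, int] = {}
    seen: dict[str, int] = {}
    for c in comparisons:
        wins[c.winner] = wins.get(c.winner, 0) + 1
        for project in (c.winner, c.loser):
            seen[project] = seen.get(project, 0) + 1
    raw = {project: wins.get(project, 0) / seen[project] for project in seen}
    total = sum(raw.values()) or 1.0
    return {project: value / total for project, value in raw.items()}
```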