Commit d4944a8

🧠 Added Retro Funding governance dataset (optimism.json) with extracted summaries (post1.txt, rpgf.txt); enriched AI and Data Culture notes (Software 2.0 verification insight, improvement framework, process-over-goals guidance); improved Public Goods Funding readability by bolding key criteria and linking Impact Evaluators
1 parent afd380f commit d4944a8

File tree

6 files changed: +279 -11 lines

Artificial Intelligence Models.md

Lines changed: 1 addition & 0 deletions

@@ -7,6 +7,7 @@
 - Learning to prompt is similar to learning to search in a search engine (you have to develop a sense of how and what to search for).
 - AI tools amplify existing expertise. The more skills and experience you have on a topic, the faster and better the results you can get from working with LLMs on that topic.
 - [LLMs are useful when exploiting the asymmetry between coming up with an answer and verifying the answer](https://vitalik.eth.limo/general/2025/02/28/aihumans.html) (similar to how a sudoku is difficult to solve, but it's easy to verify that a solution is correct).
+- [Software 2.0 automates what we can verify](https://x.com/karpathy/status/1990116666194456651). If a task or job is verifiable, it can be optimized directly or via reinforcement learning, and a neural net can be trained to do it extremely well.
 - [LLMs are good at the things that computers are bad at, and bad at the things that computers are good at](https://www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem). Also good at things that don't have wrong answers.
 - Context is king. Managing the context window effectively is crucial for getting good results.
 - Add websites as context with [jina.ai](https://jina.ai/) or [pure.md](https://pure.md/)
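A minimal toy sketch (not part of the commit) of the "Software 2.0 automates what we can verify" bullet added above: once a task has a programmatic verifier, even a naive mutate-and-keep-the-best loop can optimize candidates against it, which is the same property that lets reinforcement learning train a neural net on verifiable tasks. The target string and scoring rule below are invented for illustration.

```python
import random
import string

# Toy verifiable task: produce a string that matches TARGET.
# The point is only that a programmatic verifier turns the task into
# something a simple search loop (or RL, or gradient descent) can optimize.
TARGET = "hello world"

def verify(candidate: str) -> int:
    """Verifier: number of positions that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate: str) -> str:
    """Randomly change one character."""
    i = random.randrange(len(candidate))
    alphabet = string.ascii_lowercase + " "
    return candidate[:i] + random.choice(alphabet) + candidate[i + 1:]

best = "".join(random.choice(string.ascii_lowercase) for _ in TARGET)
while verify(best) < len(TARGET):
    challenger = mutate(best)
    if verify(challenger) >= verify(best):  # the verifier drives the optimization
        best = challenger

print(best)  # converges to "hello world"
```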

Data Culture.md

Lines changed: 6 additions & 1 deletion

@@ -54,12 +54,16 @@
 - On the other hand, data analysis and data science are domain-level problems and cannot be centralized.
 - Create a single space to [[Data Practices|share the results of analysis and decisions made based on them]].
 - Log changes so everyone can jump in and be aware of what's going on.
-- Log assumptions and lessons learned somewhere. This information should loop back into the data product.
+- [Log assumptions and lessons learned somewhere](https://commoncog.com/no-learning-dont-close-loops/). This information should loop back into the data product.
 - Make the warehouse the source of truth for all the teams.
 - What data is Finance/HR/Marketing using to set their OKRs? Put that on the warehouse and model it.
 - [[Metrics]] should be derived from the most realistic data sources. E.g., using internal databases instead of product tracking for "Users Created".
 - Do you want better data? Hire people interested in data!
 - Having managers tell the data team to "Find Insights" is a telltale mark of bad data management and organizational structure.
+- The method to improve _anything_ over time:
+  1. What are you trying to accomplish?
+  2. How will you know a change is an improvement?
+  3. What changes can you make that might lead to improvement?
 - Good use of data is, ultimately, a question of good epistemology. ("Is this true? What can we conclude? How do we know that?") Good epistemology is hard. It must be taught.
 - **When things are going well, no one cares about data**. The right time to present data is when things are starting to go bad. Use your early warning detection systems to understand when it looks like it's going to be time for data to step in and save the day, and then position data as a solution in whatever framing makes sense. The stakeholders are decision makers and they don't have a ton of time. They're looking to make decisions and solve problems.
 - [So much of data work is about accumulating little bits of knowledge and building a shared context in your org so that it's possible to have the big, earth-shattering revelations we all wish we could drive on a predictable schedule](https://twitter.com/imightbemary/status/1536368160961572864).
@@ -153,6 +157,7 @@
 - Progress in data isn't linear. As a research discipline, you might spend hours making no progress and then have a breakthrough. Or worse, prove your entire approach won't work.
 - [Apply a research mindset to data](https://jxnl.co/writing/2024/10/25/running-effective-ai-standups). Focus on input metrics, build scientific intuition, and embrace uncertainty.
 - [How can science – loosely, the production of facts – do more to "steer" the outcomes of these processes?](https://jscaseddon.co/2024/02/science-for-steering-vs-for-decision-making/)
+- You don't hit a quantitative goal by focusing on the goal; you hit it by focusing on the process. Find the controllable input metrics and drive those.
 - Data is not superior or inferior to intuition or qualitative sensemaking; it is a third sense for operators. Effective decision-making uses all three: intuition, qualitative sensemaking, and data. [Data is just an added sense](https://commoncog.com/data-is-an-added-sense/). Treat data as a tool for building and verifying intuition, not as a replacement for it. Over-reliance on any single sense—data, intuition, or qualitative feedback—limits understanding.
 - Underlying most of the problems around data, there is the question: [how do we represent reality with data, without flattening it](https://denniseirorere.com/posts/graph-the-true-data-and-reality/)?

Public Goods Funding.md

Lines changed: 11 additions & 10 deletions

@@ -11,16 +11,16 @@ Public goods are defined as goods that are both nonexcludable (it's infeasible t

 ## Desirable Criteria

-- Pareto Efficiency. The outcome achieved by the mechanism maximizes the overall welfare or some other desirable objective function.
-- Incentive Compatibility. Designing mechanisms so that participants are motivated to act truthfully, without gaining by misrepresenting their preferences.
-- Individual Rationality. Ensuring that every participant gets non-negative utility (or is at least no worse off) by participating in the mechanism.
-- Budget Balance. The mechanism generates sufficient revenue to cover its costs or payouts, without running a net deficit.
-- Coalition-Proofness. Preventing groups of participants from conspiring to manipulate the mechanism to their advantage.
-- Provable Participation. Even if spending should be kept private, users may want to prove their participation in a funding mechanism in order to boost their reputation or as part of an agreement.
-- Identity and Reputation. To prevent sybil attacks, some form of identity is needed. If reputation is important, a public identity is preferred. If anonymity is required, zero-knowledge proofs or re-randomizable encryption may be necessary. Reputation is an important incentive to fund public goods. Some form of reputation score or record of participation can be useful for repeated games. These scores can help identify bad actors or help communities coalesce around a particular funding venue. [Identity-free mechanisms can also be used](https://victorsintnicolaas.com/funding-public-goods-in-identity-free-systems/).
-- Verifiable Mechanisms. Users may want certain guarantees about a mechanism before or after participation, especially if the mechanism being used is concealed. Ex ante, they may want to upper-bound their spending towards the good; ex post, they may require proof that a sufficient number of individuals contributed.
-- Anti-Collusion Infrastructure. Like secure voting systems, funding mechanisms face the threat of vote buying. Collusion can be discouraged by making it impossible for users to prove how they reported their preferences. This infrastructure must be extended to prevent collusion between the third party and the users.
-- Predictable Schedules. Participants need to know when they are getting funded.
+- **Pareto Efficiency**. The outcome achieved by the mechanism maximizes the overall welfare or some other desirable objective function.
+- **Incentive Compatibility**. Designing mechanisms so that participants are motivated to act truthfully, without gaining by misrepresenting their preferences.
+- **Individual Rationality**. Ensuring that every participant gets non-negative utility (or is at least no worse off) by participating in the mechanism.
+- **Budget Balance**. The mechanism generates sufficient revenue to cover its costs or payouts, without running a net deficit.
+- **Coalition-Proofness**. Preventing groups of participants from conspiring to manipulate the mechanism to their advantage.
+- **Provable Participation**. Even if spending should be kept private, users may want to prove their participation in a funding mechanism in order to boost their reputation or as part of an agreement.
+- **Identity and Reputation**. To prevent sybil attacks, some form of identity is needed. If reputation is important, a public identity is preferred. If anonymity is required, zero-knowledge proofs or re-randomizable encryption may be necessary. Reputation is an important incentive to fund public goods. Some form of reputation score or record of participation can be useful for repeated games. These scores can help identify bad actors or help communities coalesce around a particular funding venue. [Identity-free mechanisms can also be used](https://victorsintnicolaas.com/funding-public-goods-in-identity-free-systems/).
+- **Verifiable Mechanisms**. Users may want certain guarantees about a mechanism before or after participation, especially if the mechanism being used is concealed. Ex ante, they may want to upper-bound their spending towards the good; ex post, they may require proof that a sufficient number of individuals contributed.
+- **Anti-Collusion Infrastructure**. Like secure voting systems, funding mechanisms face the threat of vote buying. Collusion can be discouraged by making it impossible for users to prove how they reported their preferences. This infrastructure must be extended to prevent collusion between the third party and the users.
+- **Predictable Schedules**. Participants need to know when they are getting funded.

 ## Resources

@@ -29,3 +29,4 @@ Public goods are defined as goods that are both nonexcludable (it's infeasible t
 - [List of Public Goods Funding Mechanisms](https://harsimony.wordpress.com/2022/02/10/list-of-public-goods-funding-mechanisms/)
 - [Funding public goods using the Nash product rule](https://victorsintnicolaas.com/funding-public-goods-using-the-nash-product-rule/)
 - [[Deep Funding]]
+- [[Impact Evaluators]]
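A hypothetical sketch (not part of the note) of how two of the criteria above, Budget Balance and Individual Rationality, can be checked mechanically for a toy funding-round outcome. The participant names, reported values, and payments are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    value: float    # reported value of the funded good to this participant
    payment: float  # amount the mechanism charges them

def budget_balanced(participants: list[Participant], payout: float) -> bool:
    """Budget Balance: collected payments cover the payout (no net deficit)."""
    return sum(p.payment for p in participants) >= payout

def individually_rational(participants: list[Participant]) -> bool:
    """Individual Rationality: nobody pays more than the value they report."""
    return all(p.value - p.payment >= 0 for p in participants)

round_outcome = [
    Participant("alice", value=10.0, payment=6.0),
    Participant("bob", value=4.0, payment=3.0),
    Participant("carol", value=2.0, payment=1.0),
]

print(budget_balanced(round_outcome, payout=9.0))  # True: 10 collected >= 9 spent
print(individually_rational(round_outcome))        # True: everyone keeps non-negative surplus
```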

optimism.json

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

post1.txt

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
tl;dr:

In 2023, Optimism ran two mega rounds. In 2024, Optimism ran one round per domain per year. We've learned that mega rounds devolve into popularity contests and annual feedback loops are too slow. In 2025, Optimism should focus on fewer domains, iterate more rapidly, and refine what works.

In 2023, Optimism had everyone vote on everything. In 2024, Optimism ran experiments around expertise-based voting and metrics-based voting. We've learned what humans are good at - and what data is good at. In 2025, Optimism should take the best of both (and set aside the rest).

In both 2023 and 2024, we struggled to measure the success of Retro Funding itself. The time for not knowing how well this works is over.

Full post…

This was supposed to be simple

As Vitalik wrote when initially describing the mechanism: "The core principle behind the concept of retroactive public goods funding is simple: it's easier to agree on what was useful than what will be useful."

Now, nearly two years into this experiment, we should be able to look back at the mechanism of Retro Funding itself and apply this same analysis. Specifically, we want to understand:

- Can people actually agree on what was useful?
- Under what circumstances does retroactive funding produce superior outcomes?

Simple, right? Not entirely.

My team at OSO has been trying to help Optimism answer some of these questions since Round 2. In this post, we'll take a high-level look at what's happened across four rounds of Retro Funding (Rounds 2-5) and share some of our observations.

The good news is that there are some clear learnings about what people are good at agreeing on and how to create the right conditions for consensus. These learnings are the result of deliberate experiments undertaken in 2024, for instance, metrics-based voting in Round 4 and expertise-based voting in Round 5.

The not-so-good news is that we don't (yet) have the data to show that retroactive funding produces superior outcomes.

In our view, solving the measurement problem is the most important thing to get right in 2025. We need to build an engine for measuring the impact of each allocation cycle—one that gives us more than a vague sense that it's working and one that gives the collective more than a single round (per domain) per year to see what's working.

Our recommendations include:

- Continuing to move away from project-based voting
- Finding the right balance of metrics and experts' subjective assessments
- Focusing on a relatively narrow set of domains but with more rapid feedback cycles

Basically, if 2024 was all about experimenting with expertise-based voting and metrics-based voting, then 2025 should be about combining the best of both. Once we discover the optimal combinations, we can expand the scope and complexity of rounds.

What humans are good at (and what data is good at)

Optimism has now completed five rounds that use badgeholder-based voting to allocate tokens. Each round had different design parameters, and these have allowed us to learn different things about what humans and data are good at. Here's a quick summary:
Round 2
  Design parameters: Projects had to be nominated by a badgeholder in order to make the round. Badgeholders had to determine outright token allocations for each of the 200 projects with minimal tooling.
  Key lessons learned: The project nomination process was awkward and time-consuming for badgeholders. Badgeholders needed more structure in the voting process; everyone came up with their own scattershot method (e.g., sharing spreadsheets to rate their favorite projects).

Round 3
  Design parameters: Any project could sign up for one or more domains and self-report their metrics. Badgeholders would do a light review to filter out spam, but left everything else intact. Projects had to get at least 17 votes in order to receive rewards. Over 600 projects ended up getting approved. Badgeholders still had to determine outright token allocations for each of the 600+ projects. Badgeholders could also create "lists" recommending projects and outright allocations.
  Key lessons learned: Voters felt overwhelmed. It was very difficult to differentiate between weak projects and good ones with little reputation; borderline projects led campaigns to reach the quorum line. The domain categories were not strictly enforced and thus mostly useless to badgeholders. Impact metrics were not comparable. Every list was different and there was no quality control. Onchain projects received a smaller share of the token pool than most badgeholders felt they deserved.

Round 4
  Design parameters: Focused only on onchain builders with strict eligibility requirements (based on onchain activity). Badgeholders were provided metrics instead of projects to vote on. Over 200 projects were approved out of 400+ applicants. Projects could receive a multiplier if they were open source.
  Key lessons learned: Badgeholders found voting much easier. Results had a steep power-law distribution, but voters generally felt the top projects were rewarded fairly. Metrics alone couldn't capture quality, momentum, and other nuances, highlighting the need for more complex evaluation signals. Metrics did not work as well for certain "cutting-edge" sub-domains (e.g., 4337-related apps). The open source multiplier was too complex to enforce consistently.

Round 5
  Design parameters: Focused only on OP Stack contributions with strict eligibility requirements (enforced by a small review team). Returned to project-based voting, grouping voters by expertise and assigning each to a single category of 20-30 projects, rather than hundreds.
  Key lessons learned: Results were very flat, and voters felt the top projects were not rewarded sufficiently. There was a perverse incentive to divide work across multiple smaller contributions/teams in order to receive more tokens. Grouping by expertise revealed significant differences between experts and non-experts, both in project selection and allocation strategy. After seeing both allocations, voters preferred experts' selections.
In Round 6, which is currently underway, Optimism is experimenting with impact attestations and a more aggressive set of distribution options for badgeholders.

Clearly, the optimal Retro Funding design hasn't been found yet. But we do see some recurring themes:

- Humans are good at relative comparisons (what's more valuable).
- Humans are bad at outright comparisons (how much more valuable).
- Data is good at providing comprehensive coverage of things that are countable.
- Data is bad at dealing with nuances and qualitative concepts that experts intuitively understand.

We've also learned that people only reveal their true opinions after seeing the result. This follows basic product theory: you need to show people something and iterate based on their reactions in order to build something they actually want.

Metrics-driven, experts in the loop

These hard-earned lessons inform our recommendation for how Optimism approaches future round designs. The goal should be to combine the best parts of what humans and data are good at.

Here is the basic framework we propose:

1. We use metrics-based evaluations to propose initial token allocations within a domain.
2. We let subject matter experts review and fine-tune the metrics, and choose the best allocations.
3. Over time, we identify which metrics best align with experts' qualitative assessments, refining the models through consistent backtesting.

This approach combines data's systematic reach with human intuition's nuanced adjustments. Metrics establish a quantitative foundation, ensuring that projects are assessed objectively and fairly, while expert review adds layers of qualitative nuance, including quality, innovation, and momentum. An iterative feedback loop lets us adjust metrics based on expert insights—particularly valuable when experts consistently revise scores or highlight lower-scoring projects. This process is similar to RLHF (Reinforcement Learning from Human Feedback) in machine learning, but with an emphasis on retaining clear, interpretable inputs for expert adjustments.

Practically, we implement this by establishing metrics within a domain, generating proposals for initial allocations (by weighting metrics into an evaluation algorithm), and refining the allocations with expert input. Such an approach should work best in domains with lots of verifiable data, e.g., onchain builders and software dependencies.
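A hypothetical sketch (not from the post) of the "weighting metrics into an evaluation algorithm" step: normalize each metric across projects, combine the normalized shares with weights that experts could later fine-tune, and turn the combined score into an initial allocation proposal. The metric names, weights, and project figures are invented for illustration.

```python
# Assumed metric weights that an expert panel could later adjust.
metric_weights = {
    "gas_fees": 0.5,
    "active_addresses": 0.3,
    "trusted_users": 0.2,
}

# Made-up per-project measurements for one domain.
projects = {
    "project_a": {"gas_fees": 120.0, "active_addresses": 4_000, "trusted_users": 900},
    "project_b": {"gas_fees": 30.0, "active_addresses": 9_000, "trusted_users": 300},
    "project_c": {"gas_fees": 5.0, "active_addresses": 500, "trusted_users": 50},
}

def normalized_shares(metric: str) -> dict[str, float]:
    """Scale one metric so the values across projects sum to 1."""
    total = sum(data[metric] for data in projects.values())
    return {name: data[metric] / total for name, data in projects.items()}

def propose_allocations(token_pool: float) -> dict[str, float]:
    """Initial proposal: weighted sum of normalized metrics, scaled to the pool."""
    scores = {name: 0.0 for name in projects}
    for metric, weight in metric_weights.items():
        shares = normalized_shares(metric)
        for name in projects:
            scores[name] += weight * shares[name]
    return {name: round(score * token_pool, 2) for name, score in scores.items()}

print(propose_allocations(token_pool=1_000_000))
# Experts would review this proposal, adjust weights or flag blind spots, and the
# algorithm would be re-run; the weights that best track expert judgment are kept.
```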
This framework should also perform well in fast-evolving domains. Experts can adjust allocations to reflect Optimism's shifting priorities (e.g., prioritizing interoperability transactions over standard transactions), address data blind spots (e.g., fine-tuning metrics for 4337-related projects), and reward innovation (e.g., favoring fast-growing, high-potential projects over more established but static ones).

One essential element is running funding rounds even more frequently and continuously. Doing so lets us pinpoint the metrics most correlated with desirable outcomes and backtest these metrics and evaluation algorithms against historical data. Each round remains an experiment, but the cumulative impact across rounds should reveal a clear, positive trend over time.
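A hypothetical backtesting sketch (not from the post) of "pinpoint the metrics most correlated with desirable outcomes": after a round, compare each candidate metric against the allocations experts ultimately chose and rank metrics by that correlation. All numbers are made up; `statistics.correlation` requires Python 3.10+.

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Tokens the experts ultimately allocated in a past round, ordered
# project_a, project_b, project_c, project_d (made up).
expert_allocations = [400_000, 250_000, 200_000, 150_000]

# Candidate metrics measured for the same projects, same order (made up).
candidate_metrics = {
    "gas_fees":         [120.0, 40.0, 55.0, 10.0],
    "active_addresses": [9_000, 8_500, 1_000, 7_000],
    "trusted_users":    [900, 450, 420, 200],
}

# Rank metrics by how closely they track the expert-chosen allocations.
ranked = sorted(
    ((name, correlation(values, expert_allocations))
     for name, values in candidate_metrics.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: correlation with expert allocations = {r:.2f}")
# Metrics near the top would get more weight in the next round's evaluation
# algorithm; weakly correlated ones would be revisited or dropped.
```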
Hard Choices → Easy Life

The end goal remains super ambitious: to develop predictive models that guide economic policy for the collective. For instance, in 2025, we may want to discover that incentivizing certain behaviors predictably leads to increased interop transaction volume.

Reaching this goal won't be easy. It requires focus, as the outcomes are highly path-dependent.

Currently, hard choices need to be made around domain scopes. Any changes may be unpopular, especially among community members accustomed to existing Retro Funding patterns. However, narrowing the scope and committing to continuous improvement within those scopes are essential steps in reaching the top of the mountain.

Looking back, we'll likely see that some initial assumptions were overly optimistic or naive. But we can't improve by continuing to "spray and pray." Governance is ultimately about making hard choices with limited resources.

Optimism has spent two important years learning, but it's time to double down on what works in 2025.
