Commit c2ad33b

committed
fix: standardize tags
1 parent 7d17e4d commit c2ad33b

1 file changed

+5
-5
lines changed

src/blog/onosaid.mdx

Lines changed: 5 additions & 5 deletions
@@ -2,24 +2,24 @@
 name: On the OSAID Definition
 excerpt: Some reflections on the Open Source AI definition
 date: 2024-11-05
-tags: open source, responsible AI
+tags: responsible-ai
 author: Tarunima
 ---

 I'll start with some belated thoughts on the [Open Source AI definition](https://opensource.org/ai/open-source-ai-definition) that was released last week. At IndiaFOSS in September this year, I spoke about the OSAID and urged people to give feedback on it. I was concerned that people weren't following a very important development closely enough. A version of the talk was [published on the OSI blog](https://opensource.org/blog/data-transparency-in-open-source-ai-protecting-sensitive-datasets) and sparked some discussion on the [OSI forum](https://discuss.opensource.org/t/data-transparency-in-open-source-ai-protecting-sensitive-datasets/588 "Data Transparency in Open Source AI: Protecting Sensitive Datasets"), but this wasn't a conversation that I wanted to casually dip into. I am sharing some more considered reflections here.

-Some background: Tattle's work is open source and has always had a machine learning (AI) component to it. While the open source AI rhetoric reached a crescendo with the release of large language models, it has been background chatter as far back as I can remember. ML developers thinking about licenses have had to do the math of openness based on the components they've used. Questions we've asked ourselves in the context of [Feluda](https://github.com/tattle-made/feluda): is it open source if it relies on ResNet? BERT? What if we use Google Cloud Vision API as one layer of data processing? These were also questions that we had to answer when submitting Feluda to the [DPGA registry](https://www.digitalpublicgoods.net/submission-guide "Submission Guide » Digital Public Goods Alliance"). And then again, [for Uli](https://github.com/tattle-made/Uli "GitHub - tattle-made/Uli: Software and Resources for Mitigating Online Gender Based Violence in India").
+Some background: Tattle's work is open source and has always had a machine learning (AI) component to it. While the open source AI rhetoric reached a crescendo with the release of large language models, it has been background chatter as far back as I can remember. ML developers thinking about licenses have had to do the math of openness based on the components they've used. Questions we've asked ourselves in the context of [Feluda](https://github.com/tattle-made/feluda): is it open source if it relies on ResNet? BERT? What if we use Google Cloud Vision API as one layer of data processing? These were also questions that we had to answer when submitting Feluda to the [DPGA registry](https://www.digitalpublicgoods.net/submission-guide "Submission Guide » Digital Public Goods Alliance"). And then again, [for Uli](https://github.com/tattle-made/Uli "GitHub - tattle-made/Uli: Software and Resources for Mitigating Online Gender Based Violence in India").

-I am not privy to the events that led OSI to start the consultation process. Some blogs implied that it was Meta's misuse of the term in calling Llama Open Source. But from my perspective, there has been plenty of small scale (mis)use of the open source language in AI predating the genAI boom. It just didn't reach the scale of media scrutiny and public debate. The curse of the success of open source is that it has become a common noun to refer to a whole range of things that it wasn't originally intended for. 'Open source → good' has been used as a discursive technique to sidestep constitutional oversight of public infrastructure in India. Perhaps that makes me more sensitive to the misuse of the term open source. Even if I didn't want to call out other projects for what I saw as wrong use, I surely didn't want Tattle to lower the signal-to-noise ratio by calling something open source when it wasn't clear what it meant. I have welcomed clarity on what open source AI means.
+I am not privy to the events that led OSI to start the consultation process. Some blogs implied that it was Meta's misuse of the term in calling Llama Open Source. But from my perspective, there has been plenty of small scale (mis)use of the open source language in AI predating the genAI boom. It just didn't reach the scale of media scrutiny and public debate. The curse of the success of open source is that it has become a common noun to refer to a whole range of things that it wasn't originally intended for. 'Open source → good' has been used as a discursive technique to sidestep constitutional oversight of public infrastructure in India. Perhaps that makes me more sensitive to the misuse of the term open source. Even if I didn't want to call out other projects for what I saw as wrong use, I surely didn't want Tattle to lower the signal-to-noise ratio by calling something open source when it wasn't clear what it meant. I have welcomed clarity on what open source AI means.

 To be clear: that the definition does not require developers to open data makes me uncomfortable. I don't trust research that doesn't publish its data. It is also harder to understand research without its data. Ten minutes with a CSV dump is worth more than two hours on a dataset paper. The [aspirational position](https://sfconservancy.org/news/2024/oct/25/aspirational-on-llm-generative-ai-programming/ "SFC Announces Aspirational Statement on LLM-backed generative AI for Programming") that the Software Freedom Conservancy put out on the use of GenAI in programming is inspiring. There is a world in which the pressure to open source everything along the AI value chain will result in more responsible data collection, and maybe even alternative models of AI development. But pragmatically, we can't reverse the last decade of AI development trajectory. The data-guzzling drive is some time away from abating. And we can't open data for all domains: not on individuals' reproductive health, not on individual spending patterns.

 The choices for an OSAID definition weren't great. Either be maximalist about openness on all fronts and leave out a whole range of AI applications. Or compromise on openness of data and reduce the degree of the four freedoms. But a definition means standing on solid ground rather than shifting sands. AI is a different technical artifact than software, making it difficult to come up with a clean definition. But it derives from (open source) software innovation. Entities, and not just Meta, were using open source to describe their work, even in the absence of a definition. Open source licenses are also mental shortcuts to understand something important about a software project. But without a definition, any invocation of open source in AI would confuse rather than clarify.

 The flaws of the definition aside, I am relieved that we can now (for the most part) objectively evaluate claims. Even if the OSAID definition 'fails' long term, I think the process has been a success. Here are two possible 'failure' outcomes, which to me are still good outcomes:

-1. Over time we realize that the OSAID definition doesn't imply the same goodness as the Open Source Definition (OSD). Some other process results in another definition and over the years it gathers the same social support as the OSD. It may or may not be called the open source AI definition but, to quote Galileo, the essence of things comes first. Names come after.
+1. Over time we realize that the OSAID definition doesn't imply the same goodness as the Open Source Definition (OSD). Some other process results in another definition and over the years it gathers the same social support as the OSD. It may or may not be called the open source AI definition but, to quote Galileo, the essence of things comes first. Names come after.

 2. The rhetoric of openness in AI loses weight. People find other/better ways of describing goodness and responsibility in AI. We all just give up on saying anything about open source in AI (and call out the ones who do). I don't think we could get to this point without having tried our hands at a definition.

-For someone who wasn't in FOSS in the early 2000s, it is hard to know if this process was more or less heated than the OSD consultation. Reading all the blogs and forums, however, has reminded me of all that I love about the FOSS community. Working on online harms means that I am used to staring at the worst of human discourse. I don't take people disagreeing passionately yet respectfully for granted. At present, AI appears to operate under a strong centripetal force of a few large corporations. But I trust the FOSS community to passionately push for a bigger space for public interest and get us to a better definition, if this one doesn't serve the purpose. For now, we're ready to work with this one.
+For someone who wasn't in FOSS in the early 2000s, it is hard to know if this process was more or less heated than the OSD consultation. Reading all the blogs and forums, however, has reminded me of all that I love about the FOSS community. Working on online harms means that I am used to staring at the worst of human discourse. I don't take people disagreeing passionately yet respectfully for granted. At present, AI appears to operate under a strong centripetal force of a few large corporations. But I trust the FOSS community to passionately push for a bigger space for public interest and get us to a better definition, if this one doesn't serve the purpose. For now, we're ready to work with this one.
