Add community.lexicon.preference.ai lexicon #72
Conversation
|
Exciting! lexicon.community could be a great home for this, esp since it stalled within Bluesky PBC. I'd have to think a bit more to fully grok the scopes and usage types, but my main first thought is: if we go to all the effort of a working group etc, maybe we should go ahead and include the other two intents in https://github.com/bluesky-social/proposals/blob/main/0008-user-intents/README.md too, namely bulk datasets and protocol bridging? |
|
LGTM besides the matter of default values and maybe how omission should be interpreted. Agree with @snarfed on the other intents being in scope. Thanks for the quick turnaround. |
|
I'm not certain this language is quite as clear as it should be; as models continue to change in terms of their creation primitives, what seems reasonable now to split between inference and training may not seem that way in 12-18 months. I think "Focus on purpose of use rather than time of ingestion" (IETF AIPREF WG #159) and "Replace current vocabulary with a display-based preferences vocabulary" both contain very effective ideas. I'd like to see this concept fleshed out a bit, especially if the goal is to make this legible to regular users. Things like "scientific use: true" or "generative fiction use: false" may be more relevant than the time-of-ingestion frames currently used here. |
|
As mentioned above, the work at the IETF is highly relevant – not only with regard to the vocabulary itself, but also the attachment mechanisms (https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/). Following the IETF meeting in Toronto a couple of days ago, a new editor’s draft is expected soon. It will include several improvements, in particular on the discoverability of content that has been opted out in the context of search.
This is an important point. Many users may wish to opt out of AI training while still remaining discoverable in what the IETF may call “non-generative search” – meaning AI-assisted search that does not provide AI-generated summaries, synthetic answers, or other substitute outputs. In practice, this can be understood as a narrower form of the IETF Internet draft on display-based preferences already mentioned by @musicjunkieg.
The use cases around RAG and inference – where content is used by AI systems after model training – are expected to be discussed at a future IETF meeting, likely in late summer. That discussion should help clarify whether the emerging IETF vocabulary will provide meaningful value for creators and rightsholders who would like to have a say in how content is used by AI systems post-training. Adding new AI preference expressions may be desirable; however, it should be considered whether AI model developers, AI system providers, or search engines will take them into account. I suggest a realistic approach in this regard.
A second point concerns the attachment mechanism. Should AI preferences be applied only as general account-level settings, or should they rather be attachable to individual posts and media assets? This distinction matters. Content is frequently shared, quoted, or reposted by accounts that are not in a position to decide on rights reservations or permissions. For that reason, attaching such preferences solely at account level may create both practical and legal concerns.
At Liccium, we are working on an asset-level approach in which AI preferences can be bound directly to the individual post and to the underlying media asset (blob) using ISCC fingerprints. This allows preferences to travel with the content itself, rather than depending only on the account or the platform through which it was shared. |
Summary
- community.lexicon.preference.ai lexicon for declaring user preferences regarding AI usage of their public data
- globalScope, entityScope, and collectionScope, so users can set account-wide defaults and carve out exceptions for specific entities or collections
Design
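As a rough illustration of the scoped, tri-state model this section describes, here is a minimal TypeScript sketch of how a consumer might resolve a preference. Everything in it is an assumption made for illustration and is not taken from the lexicon itself: the field names (`rkey`, `scope`, `preferences`), the usage-type keys (`training`, `inference`), the example DID and TID, and in particular the precedence order (entity overrides, then collection overrides, then the global default at `self`).

```typescript
// Hypothetical shapes: these names are illustrative assumptions,
// not the actual community.lexicon.preference.ai schema.
type Preference = "allowed" | "denied"; // an omitted key is the third state: undefined

type Scope =
  | { kind: "global" }
  | { kind: "entity"; entity: string } // assumed: identifier of a specific entity
  | { kind: "collection"; collection: string }; // assumed: NSID of a collection

interface PreferenceRecord {
  rkey: string; // "self" for account-wide defaults, a TID for scoped overrides
  scope: Scope;
  preferences: Partial<Record<string, Preference>>;
}

interface Query {
  usage: string;
  entity?: string;
  collection?: string;
}

// Does a record's scope apply to this query?
function applies(scope: Scope, q: Query): boolean {
  switch (scope.kind) {
    case "global":
      return true;
    case "entity":
      return scope.entity === q.entity;
    case "collection":
      return scope.collection === q.collection;
  }
}

// Assumed precedence: most specific scope first, global default last.
const rank: Record<Scope["kind"], number> = { entity: 0, collection: 1, global: 2 };

function resolve(records: PreferenceRecord[], q: Query): Preference | undefined {
  const candidates = records
    .filter((r) => applies(r.scope, q))
    .sort((a, b) => rank[a.scope.kind] - rank[b.scope.kind]);
  for (const r of candidates) {
    const p = r.preferences[q.usage];
    if (p !== undefined) return p; // first record that declares the preference wins
  }
  return undefined; // nothing declared anywhere: preference stays undefined
}

const records: PreferenceRecord[] = [
  {
    rkey: "self",
    scope: { kind: "global" },
    preferences: { training: "denied", inference: "allowed" },
  },
  {
    rkey: "3jzfcijpj2z2a", // placeholder TID
    scope: { kind: "entity", entity: "did:example:researchlab" },
    preferences: { training: "allowed" }, // override declares only what it changes
  },
];

console.log(resolve(records, { usage: "training" }));
console.log(resolve(records, { usage: "training", entity: "did:example:researchlab" }));
console.log(resolve(records, { usage: "inference", entity: "did:example:researchlab" }));
```

In this sketch the entity override changes only `training`, so an `inference` query for the same entity falls through to the global default, and a usage type that no record declares resolves to `undefined` rather than to an implicit allow or deny.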
Each preference is tri-state: allowed, denied, or undefined (omitted). The record at key self with globalScope establishes account-wide defaults. Additional records keyed by TID are scoped overrides that only need to declare the preferences they change; everything else falls through to the default.
Consumer resolution order:
self
Related work