
Add community.lexicon.preference.ai lexicon #72

Merged
ngerakines merged 1 commit into main from ngerakines/community.lexicon.preference.ai
Apr 25, 2026

Conversation

@ngerakines

Contributor

Summary

  • Introduces the community.lexicon.preference.ai lexicon for declaring user preferences regarding AI usage of their public data
  • Decomposes AI usage into four distinct categories (training, inference, synthetic content generation, embedding), each with independent allow/deny controls
  • Supports scoped overrides via globalScope, entityScope, and collectionScope so users can set account-wide defaults and carve out exceptions for specific entities or collections

Design

Each preference is tri-state: allowed, denied, or undefined (omitted). The record at key self with globalScope establishes account-wide defaults. Additional records keyed by TID are scoped overrides that only need to declare the preferences they change — everything else falls through to the default.
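The fall-through behaviour can be sketched roughly as follows. This is a minimal illustration, not the lexicon's schema: the category keys (`training`, `inference`, `syntheticContent`, `embedding`), the string values, and the `apply_override` helper are all assumptions made for the example, with `None` standing in for an omitted (undefined) preference.

```python
from typing import Dict, Optional

# Hypothetical category keys, loosely based on the four usage types named in
# the summary; the real lexicon field names may differ.
CATEGORIES = ("training", "inference", "syntheticContent", "embedding")

Preferences = Dict[str, Optional[str]]  # "allowed", "denied", or None (undefined)


def apply_override(defaults: Preferences, override: Preferences) -> Preferences:
    """Return the effective preferences for one scoped override.

    A value declared in the override wins; anything the override omits
    falls through to the account-wide default.
    """
    effective: Preferences = {}
    for category in CATEGORIES:
        value = override.get(category)
        if value is None:
            value = defaults.get(category)
        effective[category] = value
    return effective


# Account-wide defaults (the record at key "self" in this sketch).
defaults = {"training": "denied", "inference": "allowed"}
# A scoped override that only changes one preference.
override = {"training": "allowed"}

effective = apply_override(defaults, override)
```

Here the override changes only `training`; `inference` keeps the default, and the two categories declared nowhere stay undefined.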

Consumer resolution order:

  1. Entity-scoped override matching the consumer's DID or domain
  2. Collection-scoped override matching the content's NSID
  3. Global default at key self
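One way a consumer might implement this resolution order is sketched below. The `PreferenceRecord` shape, the scope labels, and the `resolve` function are hypothetical and only illustrate the ordering. The sketch also assumes that a preference left undefined at one tier falls through to the next tier, and that the first declared value within a tier wins; the PR text does not settle either point.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class PreferenceRecord:
    """Hypothetical in-memory view of one preference record."""
    scope: str                     # "global", "entity", or "collection" (assumed labels)
    target: Optional[str] = None   # DID/domain for entity scope, NSID for collection scope
    preferences: Dict[str, str] = field(default_factory=dict)


def _first_declared(records: Iterable[PreferenceRecord], category: str) -> Optional[str]:
    # Return the first explicitly declared value for the category, if any.
    for record in records:
        value = record.preferences.get(category)
        if value is not None:
            return value
    return None


def resolve(
    records: List[PreferenceRecord],
    category: str,
    consumer_ids: Set[str],
    collection_nsid: str,
) -> Optional[str]:
    """Resolve one preference: entity override, then collection override, then global."""
    tiers = [
        [r for r in records if r.scope == "entity" and r.target in consumer_ids],
        [r for r in records if r.scope == "collection" and r.target == collection_nsid],
        [r for r in records if r.scope == "global"],
    ]
    for tier in tiers:
        value = _first_declared(tier, category)
        if value is not None:
            return value
    return None  # undefined everywhere; interpretation is left to the consumer


records = [
    PreferenceRecord("global", None, {"training": "denied", "inference": "allowed"}),
    PreferenceRecord("entity", "did:example:researcher", {"training": "allowed"}),
    PreferenceRecord("collection", "app.bsky.feed.post", {"inference": "denied"}),
]
```

With these sample records, a consumer identified as `did:example:researcher` resolves `training` from its entity override, while `inference` on `app.bsky.feed.post` comes from the collection override; any other consumer gets the global default for `training`.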

Related work

@snarfed

snarfed commented Apr 4, 2026

Member

Exciting! lexicon.community could be a great home for this, esp since it stalled within Bluesky PBC.

I'd have to think a bit more to fully grok the scopes and usage types, but my main first thought is: if we go to all the effort of a working group etc., maybe we should go ahead and include the other two intents in https://github.com/bluesky-social/proposals/blob/main/0008-user-intents/README.md too, bulk datasets and protocol bridging?

@rudyfraser

LGTM besides the matter of default values and maybe how omission should be interpreted. Agree with @snarfed on the other intents being in scope. Thanks for the quick turnaround.

@musicjunkieg

I'm not certain this language is quite as clear as it should be; as models continue to change in terms of their creation primitives, what seems reasonable now to split between inference and training may not seem that way in 12-18 months.

I think "Focus on purpose of use rather than time of ingestion" (IETF AIPREF WG #159) and "Replace current vocabulary with a display-based preferences vocabulary" both have very effective ideas. I'd like to see this concept fleshed out a bit, especially if the goal is to make this legible to regular users.

Things like "scientific use: true" or "generative fiction use: false" may be more relevant than the time-of-ingestion frames currently used here.

@sposth

sposth commented Apr 23, 2026

As mentioned above, the work at the IETF is highly relevant – not only with regard to the vocabulary itself, but also the attachment mechanisms.

https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/
https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/05/

Following the IETF meeting in Toronto a couple of days ago, a new editor’s draft is expected soon. It will include several improvements, in particular on the discoverability of content that has been opted out in the context of search.

This is an important point. Many users may wish to opt out of AI training while still remaining discoverable in what the IETF may call “non-generative search” – meaning AI-assisted search that does not provide AI-generated summaries, synthetic answers, or other substitute outputs. In practice, this can be understood as a narrower form of the IETF Internet draft on display-based preferences mentioned already by @musicjunkieg.

The use cases around RAG and inference – where content is used by AI systems after model training – are expected to be discussed at a future IETF meeting, likely in late summer. That discussion should help clarify whether the emerging IETF vocabulary will provide meaningful value for creators and rightsholders who would like to have a say in how content is used by AI systems post-training.

Adding new AI preference expressions may be desirable; however, it should be considered whether AI model developers, AI system providers, or search engines will actually take them into account. I suggest a realistic approach in this regard.

A second point concerns the attachment mechanism. Should AI preferences be applied only as general account-level settings, or should they rather be attachable to individual posts and media assets? This distinction matters. Content is frequently shared, quoted, or reposted by accounts that are not in a position to decide on rights reservations or permissions. For that reason, attaching such preferences solely at account level may create both practical and legal concerns.

At Liccium, we are working on an asset-level approach in which AI preferences can be bound directly to the individual post and to the underlying media asset (blob) using ISCC fingerprints. This allows preferences to travel with the content itself, rather than depending only on the account or the platform through which it was shared.

@ngerakines ngerakines merged commit 02044ea into main Apr 25, 2026
1 check failed
5 participants