151 changes: 151 additions & 0 deletions proposals/4456-safety-harms-appendix.md
Member Author

Implementation requirements:

  • 2+ MSCs in different areas using this. For example, search redirection and reporting.

Member Author

Technically met by the following two MSCs, but I'd like to see each further along in the spec process before considering them true implementations of this MSC:

@@ -0,0 +1,151 @@
# MSC4456: Harms taxonomy
Member

I am very dubious that we should be baking a taxonomy like this into the Matrix spec, because:

  1. The spec is already huge
  2. Harms can be very subjective and will encourage bikeshedding or bloat. e.g. where is Lese-majesty on the list?
  3. In practice we'll always need an 'other' fallback with a natural language explanation anyway - why not use natural language all along?
  4. Why do we care about semantic codes here at all?
  5. I suspect that we're going to see more and more LLM-based moderation functionality in future, which will be quite happy to process natural language reasons rather than trying to create a set of reason enumerations

At the least, I'd expect the reasons to sit in an external registry somewhere to avoid bloating the spec.

@thetayloredman thetayloredman May 1, 2026

To be fair, many online reporting platforms have a flow for specifically selecting harms information which I believe to be the point of the MSC:
image

It feels reasonable to want to put a list of some common harms in to make the interface better and potentially aid in tooling without the intrinsic requirement for LLMs :D

I do agree that it might be better outside of the spec, but the question for me is where does this get defined, because it feels very necessary.

> 1. The spec is already huge

Building a better way to communicate necessarily involves a bunch of detail!

> 2. Harms can be very subjective and will encourage bikeshedding or bloat.

That is a risk. Defining a baseline taxonomy rather than a comprehensive one may help to mitigate that risk.

> 3. In practice we'll always need an 'other' fallback with a natural language explanation anyway - why not use natural language all along?
> 4. Why do we care about semantic codes here at all?

With semantic codes, clients can more easily build better reporting flows, offering tailored advice based on the type of harm the user has experienced (e.g. direction to helplines, law enforcement, how to keep themselves safe). Servers and communities can use the codes to communicate why enforcement action was taken against a user or piece of content (a requirement in many safety laws). Safety teams can use user-provided codes to triage reports more effectively, both with human teams, and by routing to the most cost/time-effective automated flows. When all servers in a Matrix federation share a common taxonomy of harms, it simplifies sharing details of those harms over federation.

> 5. I suspect that we're going to see more and more LLM-based moderation functionality in future, which will be quite happy to process natural language reasons rather than trying to create a set of reason enumerations

Using semantic reasons enables the use of more cost-effective & faster single-purpose models rather than slower, more expensive general models, and enables routing to appropriate humans for review.

> At the least, I'd expect the reasons to sit in an external registry somewhere to avoid bloating the spec.

An alternative: reference an external standard, as we do with RFCs elsewhere in the spec. Unfortunately, there doesn't appear to be a suitable standard that provides this taxonomy at present, but this could be something to explore and then replace this proposal down the line. The DTSP framework (ISO/IEC 25389) is an example of nascent work in safety standardisation, but it doesn't do what we need here. The spec could also offer appendices for this type of content, if there are concerns. I think the spec should contain appropriate guidance on building safe servers and clients, so I'd be comfortable with us including it in the body of the spec.

Member Author

The closest I can find for an existing reference are AT Proto's com.atproto.moderation.defs and tools.ozone.report.defs models. Obviously, these definitions are highly targeted at AT Proto's use cases, but the parallels in this MSC should be fairly evident as well :)

We may benefit from just copying AT Proto's definitions directly, or working with them to create an external standard that works for both of us. This MSC currently suggests we do something similar to what AT Proto did: create an appendix/definition that exists within their world and refer to it as needed.

The next closest I can find for an existing reference is the European Commission's Transparency Database API which describes content in two ways: a Category and a Category Specification. By nature of it being backed by the Digital Services Act (DSA), it's highly focused on that particular regulatory environment - it does not easily apply to other environments such as the UK, US, Australia, or Canada (despite these places copying most of each other's work in law creation).

Related work is from the World Economic Forum (WEF) which attempts to describe harms in ways that users can understand, but is hardly a "harm identifier" list. Their report can be found here.

The Trust & Safety Professional Association (TSPA) attempts to list the types of abuse, but also doesn't define machine-friendly identifiers for those abuse types. It may be possible to ask them to create a machine-friendly taxonomy for their list, though I expect it'll be too broad for our purposes in Matrix.

IFTAS is primarily used by ActivityPub, but is not a standards organization. They do however provide definitions for 3 types of harmful content (and how to deal with it): by actor, behaviour, or content. Like TSPA, we might be able to ask them to consider a machine-friendly specification for these types of harms. Being associated with ActivityPub might make them more applicable to Matrix too.

If we really don't want to host the list as Matrix, we can probably look to the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG) to establish a set of identifiers. The DPVCG might not take on the work because not all harms are privacy related.

OASIS might be able to help create a standard external to Matrix as well, though their primary output locations are the ISO and IEC. It may be faster/easier/different to go through a local national body instead, like the Standards Council of Canada (SCC). The DTSP Safe Framework Specification is hosted as ISO/IEC 25389 (as Jim mentions), so it's plausible that we could get a similar harms taxonomy specification there too.

DTSP might also be able to help create a standard to reference.


> [!WARNING]
> **Content Warning**: This proposal discusses and identifies harmful content, but does not attempt
> to describe the harm posed in detail. This includes identifiers for child safety, sexual abuse,
> self-harm, and other types of harm a user may encounter on the open internet.

*This MSC is part of "Reporting v2" - a project led by the Foundation’s T&S team to improve communication
and effectiveness of reports on Matrix.*

When a user reports something, they are ideally able to say what kind of harm they believe it
causes. Safety teams use this information to guide their initial investigation, and can reclassify
a report as needed to better match the harm actually caused. This is in contrast to an (often blank)
free-form report reason, where the safety team needs to guess at the harm caused.

Identified harms are also helpful when events or searches are rejected due to backend safety tooling,
like in [MSC4387](https://github.com/matrix-org/matrix-spec-proposals/pull/4387).

This proposal introduces a new [appendix](https://spec.matrix.org/v1.18/appendices/) to list out
harms common to safety legislation across the world. Other proposals are expected to actually use
this taxonomy - this proposal simply introduces it as a dependency for other MSCs.


## Proposal

The following standardized harm identifiers are expected to be used by Matrix safety tooling and features.
For example, when reporting something, the user can express their opinion of what kind of harm is caused
by the thing they're reporting. Similarly, a (policy) server might reject an event due to a common
harm.

The identifiers chosen represent the similarities between safety legislation in various jurisdictions
(UK, Australia, Canada, EU, etc) and what other services, especially Bluesky, offer to their users
in reporting flows. The identifiers use
the [Common Namespaced Identifier Grammar](https://spec.matrix.org/v1.18/appendices/#common-namespaced-identifier-grammar),
and therefore allow custom harms to be represented. When a proposal uses harm identifiers, it SHOULD
ensure that custom identifiers are accompanied by a specified identifier for added clarity/compatibility.
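
As a rough illustration of the paragraph above, the following Python sketch validates an identifier
against the Common Namespaced Identifier Grammar as it is understood here (lowercase letters, digits,
`-`, `_`, and `.`, starting with a letter, 1 to 255 characters) and checks the SHOULD about custom
identifiers. The function names are hypothetical, and the sketch makes two assumptions this proposal
does not state: that harms arrive as a list of identifiers, and that any identifier in the `m.`
namespace counts as a specified one.

```python
import re

# Assumed reading of the grammar in the linked appendix: [a-z0-9._-] only,
# must start with [a-z], and be between 1 and 255 characters long.
_IDENTIFIER_RE = re.compile(r"[a-z][a-z0-9._-]{0,254}")


def is_valid_identifier(identifier: str) -> bool:
    """Return True if the string fits the namespaced identifier grammar."""
    return _IDENTIFIER_RE.fullmatch(identifier) is not None


def is_spec_namespace(identifier: str) -> bool:
    """Return True for identifiers in the `m.` namespace (assumed to mean "specified")."""
    return identifier.startswith("m.")


def has_specified_companion(identifiers: list[str]) -> bool:
    """Check the SHOULD: custom identifiers are accompanied by a specified identifier."""
    has_custom = any(not is_spec_namespace(i) for i in identifiers)
    has_specified = any(is_spec_namespace(i) for i in identifiers)
    return has_specified or not has_custom
```

Under these assumptions, `["org.example.crypto_scam", "m.spam.fraud"]` passes the check while
`["org.example.crypto_scam"]` alone does not. Here `org.example.crypto_scam` is a made-up custom
identifier.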

The harms, their categories, and suggested names are:

**Spam**

* `m.spam` - General/Other
* `m.spam.fraud` - Fraud/Phishing
* `m.spam.impersonation` - Impersonation
* `m.spam.election_interference` - Election Interference
* `m.spam.flooding` - Flooding
Comment on lines +43 to +46

It may be worthwhile to have a separate m.misinformation category, especially to include other forms of synthetic media/deepfakes besides m.adult.deepfake (not all deepfakes are inherently sexual, so there could be e.g. m.misinformation.deepfake). "Election interference" feels like it wouldn't always be a subcategory of spam.

Having them be independent also means you can classify m.misinformation.fraud alongside m.spam where they come in a list.

Additionally, if the UX was designed to align with these categories, it wouldn't make sense for a user to view these under "spam," IMHO.

Member Author

Looking at Bluesky (which partly inspired this list), they label "spam" as "Misleading - spam or other inauthentic behaviour or deception". We might want to adopt similar labeling for the "Spam" category we have here.

They also consider deepfakes to be primarily adult content. I suspect that if a user was reporting a deepfake that wasn't easily classified as adult content then they'd use "impersonation" or "other misleading content" (to use the Bluesky terms).


**Adult Content & Safety**

* `m.adult` - General/Other
* `m.adult.sexual_abuse` - Sexual Abuse
* `m.adult.ncii` - Non-Consensual Intimate Imagery
  * Recognizing that these links deal with sexual abuse topics, more information about NCII can be
    found at [StopNCII](https://stopncii.org/), [INHOPE](https://inhope.org/EN/articles/what-is-ncii),
    and [Meta's Safety Center](https://www.meta.com/safety/topics/bullying-harassment/ncii/).
* `m.adult.deepfake` - Deepfake
* `m.adult.animal_sexual_abuse` - Animal Sexual Abuse
* `m.adult.sexual_violence` - Sexual Violence

**Harassment**

* `m.harassment` - General/Other
* `m.harassment.trolling` - Trolling
* `m.harassment.targeted` - Targeted
* `m.harassment.hate` - Hate
* `m.harassment.doxxing` - Doxxing/Personal Information

**Violence**

* `m.violence` - General/Other
* `m.violence.animal_welfare` - Animal Welfare
* `m.violence.threats` - Threatening/Threats
* `m.violence.graphic` - Graphic/Gore
* `m.violence.glorification` - Glorification/Promotion
* `m.violence.extremist` - Extremism
* `m.violence.human_trafficking` - Human Trafficking
* `m.violence.domestic` - Domestic/Intimate Partner

**Child Safety**

* `m.child_safety` - General/Other
* `m.child_safety.csam` - Child Sexual Abuse Material (CSAM)
* `m.child_safety.grooming` - Grooming
* `m.child_safety.privacy_violation` - Privacy
* `m.child_safety.harassment` - Harassment

**Dangers**

* `m.danger` - General/Other
* `m.danger.self_harm` - Self Harm
* `m.danger.eating_disorder` - Eating Disorder
* `m.danger.challenges` - Challenges, including Social Media Challenges
* `m.danger.substance_abuse` - Substance Abuse

**Terms of Service**

* `m.tos` - General/Other
* `m.tos.hacking` - Hacking/Computer Misuse
* `m.tos.prohibited` - Prohibited Items (Drugs, Weapons, etc)
* `m.tos.ban_evasion` - Ban Evasion

**Other**

* `m.other` - Other Concern
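
For tooling, the list above could be held as a plain lookup table. The Python sketch below is one
possible, non-normative representation: the identifiers and suggested names are copied from the
list, while `HARMS`, `category_of`, and `harms_in` are hypothetical names. It also assumes that a
harm's category is the first two dot-separated segments of its identifier, which matches the list
as written but is not stated as a rule by this proposal.

```python
from typing import Optional

# Identifier -> suggested name, in the order the proposal lists them.
HARMS: dict[str, str] = {
    "m.spam": "General/Other",
    "m.spam.fraud": "Fraud/Phishing",
    "m.spam.impersonation": "Impersonation",
    "m.spam.election_interference": "Election Interference",
    "m.spam.flooding": "Flooding",
    "m.adult": "General/Other",
    "m.adult.sexual_abuse": "Sexual Abuse",
    "m.adult.ncii": "Non-Consensual Intimate Imagery",
    "m.adult.deepfake": "Deepfake",
    "m.adult.animal_sexual_abuse": "Animal Sexual Abuse",
    "m.adult.sexual_violence": "Sexual Violence",
    "m.harassment": "General/Other",
    "m.harassment.trolling": "Trolling",
    "m.harassment.targeted": "Targeted",
    "m.harassment.hate": "Hate",
    "m.harassment.doxxing": "Doxxing/Personal Information",
    "m.violence": "General/Other",
    "m.violence.animal_welfare": "Animal Welfare",
    "m.violence.threats": "Threatening/Threats",
    "m.violence.graphic": "Graphic/Gore",
    "m.violence.glorification": "Glorification/Promotion",
    "m.violence.extremist": "Extremism",
    "m.violence.human_trafficking": "Human Trafficking",
    "m.violence.domestic": "Domestic/Intimate Partner",
    "m.child_safety": "General/Other",
    "m.child_safety.csam": "Child Sexual Abuse Material (CSAM)",
    "m.child_safety.grooming": "Grooming",
    "m.child_safety.privacy_violation": "Privacy",
    "m.child_safety.harassment": "Harassment",
    "m.danger": "General/Other",
    "m.danger.self_harm": "Self Harm",
    "m.danger.eating_disorder": "Eating Disorder",
    "m.danger.challenges": "Challenges, including Social Media Challenges",
    "m.danger.substance_abuse": "Substance Abuse",
    "m.tos": "General/Other",
    "m.tos.hacking": "Hacking/Computer Misuse",
    "m.tos.prohibited": "Prohibited Items (Drugs, Weapons, etc)",
    "m.tos.ban_evasion": "Ban Evasion",
    "m.other": "Other Concern",
}


def category_of(identifier: str) -> Optional[str]:
    """Return the category identifier (e.g. `m.spam`) of a listed harm, or None if unlisted."""
    if identifier not in HARMS:
        return None
    # Assumption: the category is the first two dot-separated segments.
    return ".".join(identifier.split(".")[:2])


def harms_in(category: str) -> list[str]:
    """Return every listed identifier that belongs to the given category, in list order."""
    return [i for i in HARMS if category_of(i) == category]
```

Returning `None` for unlisted identifiers is a design choice for this sketch: it leaves the handling
of custom namespaced harms to the caller rather than guessing a category for them.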

## Implementation considerations

* A reporting dialog using these harms might have a two-tier dropdown: one for the category (spam,
harassment, etc) and another for the specific harm caused (defaulting to "General/Other").

* Because `m.other` is the only harm under the "Other" category, a two-tier dropdown might skip the
second dropdown for this category.

* Clients should generally aim to keep definitions/titles of the harms brief to be as broadly applicable
as possible.

* Future MSCs are encouraged to explore using the harms list *safely*. [MSC4204](https://github.com/matrix-org/matrix-spec-proposals/pull/4204)
and [MSC4205](https://github.com/matrix-org/matrix-spec-proposals/pull/4205) both discuss challenges
related to classifying content against users.
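
The two-tier dropdown described in the first two points could be modelled roughly as follows. This
Python sketch uses only a small subset of the harms to stay short. `build_dropdowns` and the shape
of the returned dictionaries are hypothetical; a real client would adapt this to its own UI
framework.

```python
# Subset of the harms list, for illustration only.
CATEGORY_TITLES = {
    "m.spam": "Spam",
    "m.harassment": "Harassment",
    "m.other": "Other",
}
HARMS = {
    "m.spam": "General/Other",
    "m.spam.fraud": "Fraud/Phishing",
    "m.harassment": "General/Other",
    "m.harassment.doxxing": "Doxxing/Personal Information",
    "m.other": "Other Concern",
}


def build_dropdowns(harms: dict[str, str], titles: dict[str, str]) -> list[dict]:
    """Group harms by category and describe the first and second dropdown for each."""
    grouped: dict[str, list[tuple[str, str]]] = {}
    for identifier, name in harms.items():
        # Assumption: the category is the first two dot-separated segments.
        category = ".".join(identifier.split(".")[:2])
        grouped.setdefault(category, []).append((identifier, name))

    menu = []
    for category, options in grouped.items():
        menu.append({
            "category": category,
            "title": titles[category],
            # The category identifier itself is the "General/Other" default.
            "default": category,
            # A category with a single harm (such as "Other") gets no second dropdown.
            "options": options if len(options) > 1 else [],
        })
    return menu
```

With the subset above, the "Spam" entry offers two options and defaults to `m.spam`, while the
"Other" entry has an empty `options` list, so the client can skip the second dropdown.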


## Potential issues

* The specified harms list might not be extensive enough to encompass all possible harms in the online
world. It's expected that custom identifiers for commonly used harms will become MSCs to expand the
list as needed. Other MSCs are expected to describe ways of expressing custom namespaced harms that
are usable in different environments.


## Alternatives

No significant alternatives. Having a standard taxonomy of harms is important for safety teams to
better handle reports and for safety tooling to express why it has rejected content.


## Security considerations

None relevant. Proposals which build upon the specified harm identifiers may have their own security
considerations.


## Unstable prefix

While this proposal is not considered stable, implementations should use `org.matrix.msc4456` in place
of `m` in the identifiers. For example, `m.spam` becomes `org.matrix.msc4456.spam`. This prefix change
is done to mitigate possible changes to the final identifiers list ahead of acceptance.
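
A minimal Python sketch of the prefix swap, assuming implementations want to convert in both
directions; the helper names `to_unstable` and `to_stable` are hypothetical. Identifiers outside
the relevant namespace are returned unchanged.

```python
STABLE_PREFIX = "m."
UNSTABLE_PREFIX = "org.matrix.msc4456."


def to_unstable(identifier: str) -> str:
    """Map a stable `m.*` harm identifier to its unstable form."""
    if identifier.startswith(STABLE_PREFIX):
        return UNSTABLE_PREFIX + identifier[len(STABLE_PREFIX):]
    return identifier


def to_stable(identifier: str) -> str:
    """Map an unstable `org.matrix.msc4456.*` identifier back to its stable form."""
    if identifier.startswith(UNSTABLE_PREFIX):
        return STABLE_PREFIX + identifier[len(UNSTABLE_PREFIX):]
    return identifier
```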


## Dependencies

No direct dependencies. Several other MSCs are expected to build upon this proposal, however.