
FIRES Integration - Moderation Recommendations Feed and General Retractions#71

Draft
sgrigson wants to merge 9 commits into eigenmagic:main from sgrigson:fires-integration

Conversation

Contributor

@sgrigson sgrigson commented Mar 30, 2026

This PR intends to bring the FIRES protocol support into Fediblockhole.

The hope is that this provides several benefits, chief among them a form of state management that hasn't existed in Fediblockhole before: you can check for new changes since your last update and identify retractions as specific entity items.

Everything gets converted (for now) into Mastodon blocklist operations: 'drop' and 'reject' map to 'suspend', and 'filter' maps to 'silence'.
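That mapping can be sketched as follows; the function and constant names here are illustrative, not the PR's actual identifiers:

```python
# Illustrative sketch of the policy -> Mastodon severity mapping described
# above. Names are hypothetical, not the PR's actual code.
POLICY_TO_SEVERITY = {
    'drop': 'suspend',
    'reject': 'suspend',
    'filter': 'silence',
}

def severity_for(policy: str, max_severity: str = 'suspend') -> str:
    """Map a FIRES recommendedPolicy to a Mastodon block severity,
    capped at max_severity (least to most severe: silence, suspend)."""
    order = ['silence', 'suspend']
    severity = POLICY_TO_SEVERITY.get(policy, 'silence')
    if order.index(severity) > order.index(max_severity):
        severity = max_severity
    return severity
```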

The FIRES Project is definitely worth checking out. If Fediblockhole supports RapidBlock, which I'm not even sure is used anymore, it should definitely support FIRES.

There's a production server at https://fires.1sland.social.

You can freely test against the datasets there, or use them in your own projects.

If you're interested in querying the endpoints directly, pass an 'Accept' header of application/json (or the JSON-LD type, application/ld+json) to get back JSON data rather than HTML.
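For example, a minimal offline sketch of that header handling (the helper name is hypothetical; the commented-out request targets the production server mentioned above):

```python
# Build the Accept header needed to get JSON back from a FIRES server
# instead of HTML. The helper name is hypothetical; this does not hit
# the network.
def fires_headers(ld: bool = False) -> dict:
    """Accept header for FIRES endpoints: plain JSON or JSON-LD."""
    return {'Accept': 'application/ld+json' if ld else 'application/json'}

# Usage (commented out so the sketch stays offline):
# import requests
# resp = requests.get('https://fires.1sland.social/', headers=fires_headers())
# data = resp.json()
```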

In addition, this builds on the override_private_comment support I added previously: when something Fediblockhole has added (identified by the override_private_comment) is removed from a list, it can now be retracted. That's the less safe version of retractions.

You should check out the README in the PR for more notes on how this is intended to work.

@sgrigson sgrigson changed the title from "FIRES Integration - Moderation Recommendations Feed" to "FIRES Integration - Moderation Recommendations Feed and General Retractions" Mar 30, 2026
# Optional: max_severity to cap the highest severity applied.
blocklist_fires_sources = [
# { server = 'https://fires.example.com' }, # all datasets on this server
# { server = 'https://fires.example.com', datasets = ['dataset-uuid-1', 'dataset-uuid-2'] },

Datasets cannot be addressed via UUID, only the absolute IRI for the dataset. You can go:

  1. GET /.well-known/nodeinfo -> /nodeinfo/2.1 -> metadata.fires.datasets
  2. GET metadata.fires.datasets URI as application/ld+json

And then get the individual datasets that way. I think this is currently best documented through the Conformance Test Suite, which is almost ready: fedimod/fires#237 (it works; we just need to add a few more things).

The protocol for FediMod FIRES does not actually define any URL structure; instead it takes a "follow your nose" approach: all resources link to other resources.


@sgrigson sgrigson Mar 30, 2026


Yeah, that's cumbersome; I'm going to implement proper discovery of labels and other things.

I've pushed up some changes, but the .well-known/nodeinfo discovery is probably the way to go at the top level. Thanks.


In the future there'll be a labelsets key in metadata.fires from well-known, and labels will be deprecated (it'll refer to the first labelset created on the server).

@sgrigson sgrigson requested a review from ThisIsMissEm March 30, 2026 19:50

@ThisIsMissEm ThisIsMissEm left a comment


Left a heap of comments. Some of them explain rationale that I haven't yet clearly stated in the documentation. (Pull requests are really welcome! It's a huge amount of work for one person.)

# 1. Server-wide: fetch all datasets from a FIRES server
# { server = 'https://fires.example.com' }
# 2. Cherry-pick: fetch specific datasets from a server by UUID
# { server = 'https://fires.example.com', datasets = ['uuid-1', 'uuid-2'] }

Suggested change
# { server = 'https://fires.example.com', datasets = ['uuid-1', 'uuid-2'] }
# { datasets = ['https://fires.example/datasets/uuid-1', 'https://fires.example/datasets/uuid-2'] }

# 2. Cherry-pick: fetch specific datasets from a server by UUID
# { server = 'https://fires.example.com', datasets = ['uuid-1', 'uuid-2'] }
# 3. Direct URL: paste a dataset URL directly
# { url = 'https://fires.example.com/datasets/uuid-1' }

Suggested change
# { url = 'https://fires.example.com/datasets/uuid-1' }
# { dataset = 'https://fires.example.com/datasets/uuid-1' }

Comment on lines +101 to +104
fires_allowlists = []
fires_retractions = set() # domains retracted by trusted FIRES sources
if not conf.no_fetch_fires:
fires_blocks, fires_allows, fires_retractions = fetch_from_fires(

I generally recommend doing it as "fetch changes from this dataset", and then applying those changes in order; the changes endpoint is sorted by insertion time (internally each record is tracked with a UUIDv7, which is time-ordered).

How you apply those changes is up to your software, but if the entries are (oldest to newest):

recommendation https://a.example recommendedPolicy=drop
recommendation https://a.example recommendedPolicy=filter, recommendedFilters=reject-reports
recommendation https://a.example recommendedPolicy=accept

Then the final result would be a single rule for https://a.example that is the policy of "accept", with no filters applied.

That is, you apply the records in order, and they are overwrites, not merges.
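That overwrite behaviour can be sketched roughly like this, using the example records above; the field and function names are assumptions, not the PR's actual code:

```python
# Sketch of applying FIRES change records oldest-to-newest, where each
# recommendation for a domain fully overwrites (not merges with) the
# previous state. Record shapes follow the example in the comment above.
def apply_changes(changes: list[dict]) -> dict:
    state: dict[str, dict] = {}
    for rec in changes:
        domain = rec['domain']
        if rec.get('type') == 'retraction':
            state.pop(domain, None)
        else:
            # Overwrite: the newest record replaces earlier state entirely,
            # including dropping filters the new record doesn't carry.
            state[domain] = {
                'policy': rec.get('recommendedPolicy'),
                'filters': rec.get('recommendedFilters', []),
            }
    return state

changes = [
    {'domain': 'https://a.example', 'recommendedPolicy': 'drop'},
    {'domain': 'https://a.example', 'recommendedPolicy': 'filter',
     'recommendedFilters': ['reject-reports']},
    {'domain': 'https://a.example', 'recommendedPolicy': 'accept'},
]
final = apply_changes(changes)
# final: a single rule for https://a.example with policy 'accept', no filters
```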

return blocklists


def _parse_dataset_url(url: str) -> tuple:

Dataset URLs are not meant to be parsed. The reference server happens to use this format, but other implementations may not.


To determine whether something is a FIRES server, make the request through nodeinfo to discover that information, then follow your nose from there.
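A rough sketch of that discovery flow, assuming the nodeinfo 2.1 document carries metadata.fires.datasets as described earlier in the thread (the fetcher is injected so the sketch stays offline; all names are illustrative):

```python
# Follow .well-known/nodeinfo to the server's FIRES datasets URI, rather
# than assuming any URL structure. `fetch_json(url) -> dict` is injected
# so this sketch needs no network access.
def discover_fires_datasets(base_url: str, fetch_json) -> str:
    wellknown = fetch_json(base_url.rstrip('/') + '/.well-known/nodeinfo')
    links = wellknown['links']
    # Prefer the 2.1 schema link; fall back to the first link if absent.
    node_href = next(
        (link['href'] for link in links if link['rel'].endswith('/2.1')),
        links[0]['href'],
    )
    nodeinfo = fetch_json(node_href)
    return nodeinfo['metadata']['fires']['datasets']
```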

Comment on lines +353 to +357
for ds in datasets:
ds_id_url = ds.get("id", "")
if ds_id_url:
ds_id = ds_id_url.rstrip("/").split("/")[-1]
fetch_list.append((server_url, ds_id))

Don't parse the URLs. These are the same as id properties in ActivityPub: they are where the object lives, and the structure of the URL does not carry any information.

README.md Outdated
Comment on lines +468 to +472
FIRES recommendations include labels from the
[IFTAS shared vocabulary](https://about.iftas.org/library/shared-vocabulary-labels/)
(e.g., "Hate Speech", "CSAM", "Spam"). These are mapped to the `public_comment`
field on domain blocks, so instance admins can see why a domain was recommended
for blocking.

This is just one known label vocabulary, based on the Digital Trust & Safety Partnership's labels, which they released under a CC-BY license. I know, for instance, that Garden Fence has its own label vocabulary. Others may exist in the future too.


This means an `accept` from a FIRES dataset acts as an override, the same as
adding a domain to a CSV allowlist. It does not call any instance API to
explicitly allow the domain — it simply prevents it from being blocked.

This is correct if you're not doing federation policies and are using just the binary approach Mastodon has of domain blocks OR domain allows.

README.md Outdated

With `ignore_accept` enabled, `accept` recommendations are silently skipped.
Block recommendations (`drop`, `reject`, `filter`) and retractions still work
normally.

This is incorrect: it is entirely possible for a Recommendation of drop to become a Recommendation of accept without an intermediate Retraction.

README.md Outdated
Block recommendations (`drop`, `reject`, `filter`) and retractions still work
normally.

### Retractions: removing blocks that are no longer recommended

Suggested change
### Retractions: removing blocks that are no longer recommended
### Retractions: removing data that is no longer recommended or advised

README.md Outdated
Comment on lines +511 to +514
This is the FIRES-native approach. When a trusted FIRES source explicitly
retracts a domain, the block is removed from your instance — **regardless of
who originally added it** — as long as no other source in your merged list still
recommends blocking it.

You would almost certainly want to do bookkeeping to check "did this dataset add this domain as a recommendation/advisory?" so you can tell whether a retraction is valid. Otherwise a retraction from one dataset may override a recommendation from another.

If one dataset's latest state for an entity is a retraction or tombstone and another dataset has an advisory or recommendation, you have a few ways of merging that: manual merging, least permissive (i.e., the most severe policy applies), or most permissive (the least severe policy applies).

You could also do automatic merging and fall back to asking an operator to resolve a merge conflict where an appropriate final policy cannot be determined.

@sgrigson sgrigson requested a review from ThisIsMissEm March 31, 2026 13:46
Optional per-source keys:
max_severity -- cap the highest severity (default: 'suspend')
ignore_accept -- skip 'accept' policy entries (default: false)
retractions -- honor retractions from this source (default: false)

I'd argue this should probably default to true, but with user interaction.

@jpwarren jpwarren marked this pull request as draft March 31, 2026 22:02
@jpwarren jpwarren added enhancement New feature or request kudos! You made the world a bit better. Thanks! labels Mar 31, 2026
@jpwarren jpwarren added this to the v0.5.0 milestone Mar 31, 2026
@jpwarren
Member

Thanks very much for making this!

I've set the PR to draft while people are iterating on the code. Once it's stable, we can set it back to ready for merge.

I'll try to find some time in the next couple of days to review the changes and provide any guidance that might be useful on overall architecture or style things if I see any. I don't want to leave you hanging.


@ThisIsMissEm ThisIsMissEm left a comment


Another round of comments!

Comment on lines +311 to +312
if source_idx > 0:
time.sleep(2)

What's this for?

Contributor Author


To prevent hitting 429 errors, hopefully. I suppose I could just rely on exponential backoff.

max_severity = source.get("max_severity", "suspend")
ignore_accept = source.get("ignore_accept", False)
honor_retractions = source.get("retractions", False)
language = source.get("language", "en")

Here's the full list of locales the reference server uses, but really it's just any BCP-47 language tag: https://github.com/fedimod/fires/blob/main/components/fires-server/config/locales.ts (I use this limited list to make the UI approachable)

So this should probably be en-US as that's the default locale: https://github.com/fedimod/fires/blob/main/components/fires-server/start/env.ts#L75


# Check if this block was added by one of the datasets
# that is now retracting it
private_comment = getattr(serverblock, 'private_comment', '') or ''

One day the Mastodon team will add like a correlation_id or something to domain blocks and domain allows to allow adding that metadata. Or just an arbitrary metadata json blob that is completely pass-through. I just won't be implementing it for the foreseeable future.

def __init__(self, base_url: str):
self.base_url = base_url.rstrip("/")
def __init__(self, dataset_url: str):
self.dataset_url = dataset_url.rstrip("/")

I'd take the user's input verbatim; whilst the reference server doesn't care about the trailing slash, other implementations might.

Comment on lines +137 to +141
if response.status_code == 429 and attempt < retries - 1:
wait = (attempt + 1) * 5
log.warning(f"FIRES: rate limited on {url}, waiting {wait}s (attempt {attempt + 1}/{retries})")
time.sleep(wait)
continue

ha, you met my rate limiter? What threshold triggered it?


Fwiw, I've just added the documentation for the rate limit headers that are actually present, so you can follow those: adonisjs/v7-docs#30

Comment on lines +201 to +204
else:
# Try extracting a slug from a URL as fallback
slug = label_ref.rstrip("/").split("/")[-1]
names.append(slug)

You shouldn't need this, labels must have a name or nameMap

comment = (comment or "").strip()
if label_text and comment:
return f"{label_text} — {comment}"
return label_text or comment

There doesn't seem to be a max length in Mastodon for domain block comments, and I'm not sure about other software. It wouldn't be unreasonable to cap this at, say, 1k or 2k characters, or a max grapheme count (if UTF-8).
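A hedged sketch of such a cap (the 1,000-character budget is an arbitrary choice, not a documented Mastodon limit, and the helper name is hypothetical):

```python
# Trim a domain-block comment to a fixed character budget, marking the
# cut with an ellipsis. The 1000-char limit is an assumption, not a
# documented Mastodon constraint.
MAX_COMMENT_CHARS = 1000

def cap_comment(comment: str, limit: int = MAX_COMMENT_CHARS) -> str:
    if len(comment) <= limit:
        return comment
    return comment[: limit - 1].rstrip() + '…'
```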


# No policy means informational only — skip it
if not policy:
continue

Might be worth adding a log line here, the validator in the reference server does actually currently enforce all changes have a recommendedPolicy: https://github.com/fedimod/fires/blob/main/components/fires-server/app/validators/admin/dataset_change.ts#L80

I would keep this logic, but just log that you encountered a recommendation without a recommended policy.

domain=domain,
severity=severity,
public_comment=public_comment,
private_comment=f"FIRES:{dataset_url}",

You may want to add a sha256 hash of the change record's id too, which should be enough to deduplicate once tombstones are a thing.
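One possible shape for that, as a sketch; the comment format (and the extension of the `FIRES:` prefix seen in the diff above) is hypothetical:

```python
import hashlib

# Tag the private comment with a sha256 of the change record's id so a
# later tombstone/retraction for the same record can be matched up.
# The "FIRES:<dataset>#<digest>" format is an illustrative assumption.
def private_comment_for(dataset_url: str, change_id: str) -> str:
    digest = hashlib.sha256(change_id.encode('utf-8')).hexdigest()
    return f"FIRES:{dataset_url}#{digest}"
```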

log.info(f"FIRES: incremental update for dataset {dataset_id}")
snapshot = client.get_snapshot(dataset_id)
log.info(f"FIRES: incremental update for {dataset_url}")
snapshot = client.get_snapshot()

This is really get_changes here; I can see why you might merge those methods, but I'd probably keep them separate.
