Improve handling of activities signed by non-resolvable actors #472

@jfietkau

Summary

Fedify currently cannot receive activities signed by non-fetchable actors. The HTTP 401 errors it emits in response to such actors cause remote servers to retry delivery of the same activity, in some cases indefinitely. Since some legitimate deletions of actors fall under this pattern, Fedify applications may miss out on information about deleted actors.

Problem

During normal operation of a Fedify application running in the wild (federating with lots of servers), my logs show several errors of the following pattern per minute:

06:41:02.815 ERR fedify·runtime·docloader Failed to fetch document: 404 "[deleted AP actor]" {
  [lots of HTTP headers]
}
06:41:02.819 ERR fedify·federation·inbox Failed to verify the request's HTTP Signatures.
06:41:02.819 WRN fedify·federation·http "POST" "/inbox": 401

They are difficult to investigate as (from what I can tell) the activity payload does not reach the inbox listeners, but my assumption is that many of these are Delete activities concerning remote actors.

The problem here (as I understand it) is that a signed activity is received from a remote actor which Fedify does not have cached. It attempts to fetch the signing key, which fails with an error code of 404 or 410 from the remote server. This causes the signature validation to fail. The inbox endpoint responds to the POST with a 401 error code, which in turn may prompt the remote server to keep retrying delivery.

The most common case of this pattern in the wild is actor deletions (i.e. a Delete activity where the same actor is both the actor and the object), but there may be other situations where a signing actor is unavailable.
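To make the "actor deletion" shape concrete, here is a minimal sketch of how it could be detected on the raw JSON payload. It assumes the signer appears in the activity's `actor` property and that `actor` and `object` are each either an IRI string or an embedded object with an `id`. The names `idOf` and `isActorSelfDelete` are hypothetical and not part of Fedify's API.

```typescript
// Hypothetical helpers for illustration only; not Fedify API.
type JsonObject = { [key: string]: unknown };

// Returns the IRI of a property that is either a string or an object with a string `id`.
function idOf(value: unknown): string | null {
  if (typeof value === "string") return value;
  if (typeof value === "object" && value !== null && !Array.isArray(value)) {
    const id = (value as JsonObject)["id"];
    return typeof id === "string" ? id : null;
  }
  return null;
}

// True when the activity is a Delete whose actor and object share the same IRI.
function isActorSelfDelete(activity: JsonObject): boolean {
  if (activity["type"] !== "Delete") return false;
  const actor = idOf(activity["actor"]);
  const object = idOf(activity["object"]);
  return actor !== null && actor === object;
}
```

A Delete of a note, or a Delete sent by one actor about another, would not match this check.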

The result is that the Fedify application cannot process the activity, and there is a lot of pointless traffic.

Proposed Solution

The SWICG has a draft community report on “ActivityPub and HTTP Signatures” which contains a section titled “Handling Deletes of actors”. I believe that this is highly relevant guidance.

If an inbox receives a signed Delete activity where the signing actor (key not cached) is also the object of the Delete activity, and an attempt to fetch the actor results in a 410 status code, then signature verification should be skipped. The request should be answered with a status code from the 2xx family (I suppose 202 would make sense) and the Delete activity should be passed on to any existing inbox listeners. The 410 status code affirms that the remote actor was intentionally and permanently deleted.

If the Delete activity has the same shape as in the previous example, but an attempt to fetch the actor results in an error code of 404, it... gets more complicated. ☹️

I believe that most of the activities in my logs that are signed by actors whose URI resolves to a 404 error are accurate Delete activities, where the remote server has permanently deleted the actor in question. However, a 404 error on a remote actor is much more likely to happen accidentally and temporarily than a 410. For example, if a remote server goes down for maintenance, less experienced admins may simply “take everything down” and serve 404 errors (instead of the correct 503) while maintenance is ongoing. In such cases, a malicious third server could attempt to forge unverifiable Delete activities for temporarily unavailable actors.

For this and presumably other reasons, the SWICG draft report recommends:

If fetching the keyId (...) fails with an HTTP 404 error, the actor may have been deleted, or something else may have happened. Do not process the Delete.

If Fedify is to follow the SWICG advice, activities signed by actors which resolve to 404 errors (and for which the key is not cached, hence the signature being unverifiable) should not be passed to inbox listeners, as the activity should not be acted on.

This leaves the issue of attempted redelivery. My Fedify server has only been operating for two days and I am getting a lot of traffic signed by 404ing actors, almost all of which I presume to be redelivery attempts of the same activity. It is clogging my logs and causing pointless traffic on both servers. Thus, I add my own proposal to the SWICG guidance:

Suppose an inbox receives a signed Delete activity where the signing actor (key not cached) is also the object of the Delete activity, an attempt to fetch the actor results in a 404 status code, and the server sending the HTTP request that wraps the Delete activity is verifiably the same server that the actor URI resolves to (that is, the request's Host header matches the host in the actor URI). In that case, the Delete activity should not be processed (i.e. in Fedify's case, not passed to inbox listeners), but the request should be acknowledged with a 2xx (success) status code instead of a 401, quelling the remote server's attempts to redeliver the activity.

This might need more investigation, but I believe it can result in reduced server traffic and slightly improved performance for all Fedify applications.
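The rules proposed in this section can be summarized as a small decision function. This is only a sketch under stated assumptions: the type and function names are hypothetical, nothing here is Fedify API, and how the sending host is verified is left open, so the sketch simply takes it as an input.

```typescript
// Hypothetical types for illustration only; not Fedify API.
interface UnverifiableRequest {
  selfDelete: boolean;        // Delete whose signing actor is also its object
  keyFetchStatus: number;     // HTTP status returned when fetching the signing actor
  actorHost: string;          // host part of the actor URI
  senderHost: string | null;  // host the request verifiably came from, or null if unknown (assumption)
}

interface InboxDecision {
  status: number;    // HTTP status to answer the POST with
  dispatch: boolean; // whether to pass the activity to inbox listeners
}

function decideUnverifiable(req: UnverifiableRequest): InboxDecision {
  // 410 Gone: the actor was deliberately deleted, so accept and process the Delete.
  if (req.selfDelete && req.keyFetchStatus === 410) {
    return { status: 202, dispatch: true };
  }
  // 404 from the actor's own server: acknowledge to stop redelivery, but do not process.
  if (
    req.selfDelete &&
    req.keyFetchStatus === 404 &&
    req.senderHost !== null &&
    req.senderHost.toLowerCase() === req.actorHost.toLowerCase()
  ) {
    return { status: 202, dispatch: false };
  }
  // Everything else keeps the current behavior.
  return { status: 401, dispatch: false };
}
```

The 404 branch never dispatches, which keeps it in line with the SWICG advice quoted above; it only changes the response code.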

Alternatives Considered

The current behavior could be left as-is for actors emitting 404 errors, as their state is indeterminate. However, for actors emitting 410 errors, I believe enabling the processing of their deletions by Fedify applications is an important step for data integrity and for honoring user expectations for account deletions on the fediverse.

In cases where a non-fetchable actor results in a failed signature verification, Fedify may additionally consider adding the type, actor and object (if applicable) of the received activity to the error log message. This would make it easier to verify whether I am correct that actor deletions are the main culprit of this pattern.
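As a rough illustration of the logging idea, the following sketch builds a short summary string from the raw request body. The function name `describeActivity` and the output format are assumptions made for this example, not existing Fedify behavior.

```typescript
// Hypothetical helper for illustration only; not Fedify API.
function describeActivity(body: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return "unparseable activity";
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return "unparseable activity";
  }
  const activity = parsed as Record<string, unknown>;

  // Renders a property that is an IRI string or an object with a string `id`; "?" otherwise.
  const ref = (value: unknown): string => {
    if (typeof value === "string") return value;
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      const id = (value as Record<string, unknown>)["id"];
      if (typeof id === "string") return id;
    }
    return "?";
  };

  const type = typeof activity["type"] === "string" ? activity["type"] : "?";
  return `type=${type} actor=${ref(activity["actor"])} object=${ref(activity["object"])}`;
}
```

Appending such a summary to the "Failed to verify the request's HTTP Signatures." message would show at a glance whether the unverifiable requests are mostly actor deletions.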

Scope / Dependencies

No response
