Add hashing for activity ids #6366

Open
uOJackDu wants to merge 18 commits into LemmyNet:main from uOJackDu:outbox-activity-id

@uOJackDu

For issue #6341.

Add hashing for activity ids so that the ids of the activity objects do not change on every request.


let mut ordered_items = vec![];
for post_view in post_views {
let post_ap_id = post_view.post.ap_id.clone();
Author

This is the only place where I am passing in the object_id. I think it makes the ids for Announce and Create stable. Not sure where else I could use it.

Member

This also needs to be in crates/apub/activities/src/create_or_update/post.rs so that both code paths generate the same id.

@uOJackDu uOJackDu marked this pull request as ready for review February 27, 2026 09:10
Comment thread crates/apub/activities/Cargo.toml Outdated

  let create_or_update =
-   CreateOrUpdatePage::new(post.into(), &person, &community, kind, &context).await?;
+   CreateOrUpdatePage::new(post.into(), &person, &community, kind, None, &context).await?;
Member

Suggested change
- CreateOrUpdatePage::new(post.into(), &person, &community, kind, None, &context).await?;
+ CreateOrUpdatePage::new(post.into(), &person, &community, kind, Some(post.ap_id), &context).await?;

Member

Actually, this can be a problem when editing the post: each Update activity would then have the same id. So you also need to hash the timestamp (published_at or updated_at).

.into();

- let id = generate_activity_id(kind.clone(), &context)?;
+ let id = generate_activity_id(kind.clone(), None, &context)?;
Member

Suggested change
- let id = generate_activity_id(kind.clone(), None, &context)?;
+ let id = generate_activity_id(kind.clone(), Some(comment.ap_id), &context)?;

Author

Do we need to use the timestamps here for Update as well?

Comment thread crates/apub/activities/src/lib.rs Outdated
/// Generate a unique ID for an activity, in the format:
/// `http(s)://example.com/receive/create/202daf0a-1489-45df-8d2e-c8a3173fed36`
- fn generate_activity_id<T>(kind: T, context: &LemmyContext) -> Result<Url, ParseError>
+ fn generate_activity_id<T>(
Member

To avoid passing None in so many places, you could do:

fn generate_activity_id(kind, context) {
  generate_activity_id_with_object_id(kind, None, context)
}

fn generate_activity_id_with_object_id(kind, object_id, context) {
  // hash kind and object_id here, or fall back to a random id when object_id is None
}
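
The wrapper idea in this comment could look roughly like the sketch below. It is a minimal, dependency-free illustration, not the PR's code: the signatures are simplified (no `LemmyContext`, plain strings instead of `Url`), the hostname is hard-coded, an atomic counter stands in for a random UUID, and `DefaultHasher` stands in for whatever digest the PR actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical stand-in for a random UUID: a process-wide counter.
static NEXT_RANDOM: AtomicU64 = AtomicU64::new(1);

/// Convenience wrapper for call sites that have no object id to hash.
fn generate_activity_id(kind: &str) -> String {
    generate_activity_id_with_object_id(kind, None)
}

/// Full version: hashes the object id when one is given, otherwise
/// falls back to a fresh, non-repeating suffix.
fn generate_activity_id_with_object_id(kind: &str, object_id: Option<&str>) -> String {
    let suffix = match object_id {
        Some(object_id) => {
            // DefaultHasher::new() is deterministic within one build, but its
            // algorithm is not guaranteed to stay the same across Rust releases,
            // so a real implementation would want a fixed digest instead.
            let mut hasher = DefaultHasher::new();
            kind.hash(&mut hasher);
            object_id.hash(&mut hasher);
            hasher.finish()
        }
        None => NEXT_RANDOM.fetch_add(1, Ordering::Relaxed),
    };
    format!("https://example.com/activities/{}/{:016x}", kind, suffix)
}
```

With this shape, only the call sites that want a stable id need to mention the object id; everything else keeps calling the short form.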

@dessalines
Member

dessalines commented Mar 2, 2026

What is the minimum amount of info needed to create uniqueness for these generated ap_ids?

Seems like kind/action and numeric_object_id should be enough, right?

@Nutomic
Member

Nutomic commented Mar 3, 2026

That is not enough, because if you edit a post, it is federated as an Update<Page> activity each time. The kind and object id will therefore be identical every time, but the generated ap_id must be different (otherwise it will be ignored as a duplicate on the receiving side). So the object's created_at/updated_at timestamp also needs to go into the hash.
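
To make the point concrete, here is a small sketch of an id derived from kind, object id, and timestamp. Everything in it is an assumption for illustration: `stable_activity_id` is a hypothetical helper, timestamps are passed as pre-formatted strings, and FNV-1a is used only because it needs no external crate (a real implementation would more likely use a cryptographic digest such as SHA-256).

```rust
/// FNV-1a (64-bit), a dependency-free stand-in for a real digest.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: the same (kind, object, timestamp) triple always
/// yields the same id, while a new timestamp yields a new id.
fn stable_activity_id(hostname: &str, kind: &str, object_ap_id: &str, timestamp: &str) -> String {
    // Separate the fields with a NUL byte so that ("ab", "c") and
    // ("a", "bc") do not produce the same seed.
    let seed = format!("{}\0{}\0{}", kind, object_ap_id, timestamp);
    format!("{}/activities/{}/{:016x}", hostname, kind, fnv1a_64(seed.as_bytes()))
}
```

Requesting the outbox twice for an unedited post gives the same id both times, while an edit (which changes the timestamp) produces a distinct Update id that the receiving side will not discard as a duplicate.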

@uOJackDu uOJackDu force-pushed the outbox-activity-id branch from 18c6db2 to 91f043f Compare March 29, 2026 00:29
@dessalines
Member

Bump on this to address the comments above. As stated, you'll probably need to hash the object id as well as the published / updated time.

@uOJackDu
Author

uOJackDu commented Apr 7, 2026

Hi, should we use timestamps for Update comments as well? Or is that not needed, since I don't see any comment-related actions when making a request to the outbox of a community?
I don't see any Update activities showing up in the outbox either, but we still hash timestamps to ensure they are unique, so I just want to know whether we should do that for comments or other objects as well.

) -> LemmyResult<()> {
- let announce = AnnounceActivity::new(object.clone(), community, context)?;
+ let announce = AnnounceActivity::new(object.clone(), community, None, context)?;
let inboxes = ActivitySendTargets::to_local_community_followers(community.id);
Author

uOJackDu commented Apr 8, 2026

Are we sure we want the activity id for Announce activities to be the hash in the outbox, and a random UUID in the sent_activity table? For Create activities, both are the same.

Member

The same activity must always have the same ID.

Comment thread crates/apub/activities/src/lib.rs Outdated
.get(..16) // should not fail
.ok_or(UntranslatedError::CouldntGenerateHash)?
.try_into()?,
))
Member

You don't really need a Uuid; you can return digest.to_string() and probably get rid of this new error. Also move this code inline, as the method is only called in a single place.
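
As a rough illustration of using the digest directly as the id suffix: the thread does not show the digest type, so the sketch below assumes raw bytes and hex-encodes them by hand. `to_hex` is a hypothetical helper; if the real digest type implements `Display`, `digest.to_string()` would make it unnecessary.

```rust
/// Hex-encodes digest bytes so the whole digest can serve as the id suffix.
/// No truncation to 16 bytes is needed, so there is no fallible slice
/// conversion and no extra error variant.
fn to_hex(digest: &[u8]) -> String {
    digest.iter().map(|byte| format!("{:02x}", byte)).collect()
}
```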

Comment thread crates/apub/activities/src/lib.rs Outdated
};

let id = format!("{}/activities/{}/{}", hostname, kind_str, uuid);
Url::parse(&id).map_err(|e| LemmyError::from(anyhow::anyhow!(e)))
Member

Suggested change
- Url::parse(&id).map_err(|e| LemmyError::from(anyhow::anyhow!(e)))
+ Ok(Url::parse(&id)?)
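
The shorter form relies on `?` calling `From::from` on the error, so it only compiles when the function's error type has a `From` impl for the parser's error type. The sketch below shows the two forms side by side using only the standard library; `AppError` is a hypothetical stand-in for `LemmyError`, and integer parsing stands in for `Url::parse`.

```rust
use std::num::ParseIntError;

/// Hypothetical application error, standing in for LemmyError.
#[derive(Debug)]
struct AppError(String);

// This impl is what lets `?` convert the parser's error automatically.
impl From<ParseIntError> for AppError {
    fn from(err: ParseIntError) -> Self {
        AppError(err.to_string())
    }
}

/// Verbose form: converts the error by hand.
fn parse_port_verbose(input: &str) -> Result<u16, AppError> {
    input.parse::<u16>().map_err(|e| AppError::from(e))
}

/// Short form: `?` performs the same conversion through `From`.
fn parse_port(input: &str) -> Result<u16, AppError> {
    Ok(input.parse::<u16>()?)
}
```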

seed_url.set_fragment(Some(&timestamp.to_rfc3339()));
Some(&seed_url.clone())
}
};
Member

Private message also needs the same logic. It would be best for generate_activity_id() to take an optional timestamp param (or to make a separate method), to avoid duplicate code. There's nothing wrong with hashing the timestamp for Create activities as well, so that will be simpler.

And you don't need to change anything in this file; it only needs to be in CreateOrUpdatePage::new.

Comment on lines +65 to +67
let timestamp = comment.updated_at.unwrap_or(comment.published_at); // use the latest timestamp
let mut seed_url = ap_id;
seed_url.set_fragment(Some(&timestamp.to_rfc3339()));
Member

Doesn't need to be in this PR, but these things should be trait functions.

Actors have an ApubActor trait, which defines functions like generate_local_actor_url, read_from_name, etc.

There's no reason why that couldn't also be done for the other types, and activities on them also.
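
One possible shape for such a trait, sketched under heavy assumptions: the trait name `ApubObject` and its methods are hypothetical and not part of Lemmy, timestamps are plain Unix seconds rather than `DateTime` values, and the seed is built as a string instead of setting a fragment on a `Url`.

```rust
/// Hypothetical trait; the name and methods are illustrative only.
trait ApubObject {
    fn ap_id(&self) -> &str;
    fn published_at(&self) -> u64;
    fn updated_at(&self) -> Option<u64>;

    /// Latest timestamp: the edit time if the object was edited,
    /// otherwise the creation time.
    fn latest_timestamp(&self) -> u64 {
        self.updated_at().unwrap_or_else(|| self.published_at())
    }

    /// Seed to hash into an activity id: the ap_id with the latest
    /// timestamp appended as a fragment.
    fn activity_seed(&self) -> String {
        format!("{}#{}", self.ap_id(), self.latest_timestamp())
    }
}

/// Minimal stand-in for a comment row.
struct Comment {
    ap_id: String,
    published_at: u64,
    updated_at: Option<u64>,
}

impl ApubObject for Comment {
    fn ap_id(&self) -> &str {
        &self.ap_id
    }
    fn published_at(&self) -> u64 {
        self.published_at
    }
    fn updated_at(&self) -> Option<u64> {
        self.updated_at
    }
}
```

Posts and private messages would then only need to implement the three accessors to share the same seed logic, which is the duplication this comment is pointing at.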

3 participants