Conversation

@benjaminfrueh
Contributor

@benjaminfrueh benjaminfrueh commented Sep 19, 2025

📝 Summary

The oc_text_steps table was growing significantly. One reason was orphaned steps accumulating in the database: steps without an existing session were never removed by the existing cleanup job.

This addresses orphaned steps only. A separate PR will add garbage collection for very old text sessions to prevent long-term accumulation.

The steps are deleted in batches of 1000, and execution is limited to 30 seconds.
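
For readers who want to see the shape of the batching, here is a minimal sketch in Python. It is not the code from this PR: `delete_in_batches`, `delete_batch` and the injectable `clock` are hypothetical names, and only the batch size of 1000 and the 30 second limit come from the description above.

```python
import time
from typing import Callable


def delete_in_batches(
    delete_batch: Callable[[int], int],
    batch_size: int = 1000,
    time_limit_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call delete_batch(batch_size) until nothing is left or time runs out.

    delete_batch is a hypothetical callback that deletes at most
    `batch_size` rows and returns how many rows it actually deleted.
    """
    deadline = clock() + time_limit_s
    total = 0
    while clock() < deadline:
        deleted = delete_batch(batch_size)
        total += deleted
        if deleted < batch_size:
            # A short batch means there is nothing left to delete.
            break
    return total


# Usage with an in-memory stand-in for the orphaned rows.
orphaned = list(range(2500))


def fake_delete(limit: int) -> int:
    batch = orphaned[:limit]
    del orphaned[:limit]
    return len(batch)


total_deleted = delete_in_batches(fake_delete)
```

Stopping at the deadline keeps a single run short. Whatever is left over would be picked up by the next run of the job.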

Fixes: #3740
Fixes: #3915

🏁 Checklist

  • Code is properly formatted (npm run lint / npm run stylelint / composer run cs:check)
  • Sign-off message is added to all commits
  • Tests (unit, integration and/or end-to-end) passing and the changes are covered with tests
  • Documentation (README or documentation) has been updated or is not required

@max-nextcloud
Collaborator

Thanks a lot for tackling this @benjaminfrueh !

The wording here is confusing and we've been wanting to fix this for a while - but never got to it.
A session is a single user's editing connection. It can be ephemeral. For example, if the network is flaky and the user disconnects for more than 30 seconds, the client will reconnect and create a new session. If multiple users edit the same document, there are multiple sessions at the same time.
The steps basically capture the changes a user made within a given session. Even if the session does not exist anymore, they continue to be relevant: even if I am not editing anymore and my session has been cleaned up, the changes that I made should persist.

However, every 30 seconds we autosave the document including the full yjs editing history (documentState). Someone who starts to edit will only fetch the latest documentState and all steps that have been created afterwards. So for people joining, steps with an id lower than the lastSavedVersion in the corresponding document are not needed anymore. However, there may be people who are already connected and have not yet received all steps up to the lastSavedVersion, so we cannot clean them up right away.
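
To make the joining case concrete, here is a small Python sketch of which steps a newly joining client would still need. The `Step` class and `steps_for_joining_client` are hypothetical, and treating the boundary as `id >= lastSavedVersion` is an assumption taken from the wording "lower than the lastSavedVersion".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    id: int           # auto-incrementing id, used as the version
    document_id: int


def steps_for_joining_client(
    steps: List[Step], document_id: int, last_saved_version: int
) -> List[Step]:
    """Steps a joining client still needs after loading documentState.

    Assumption: steps with an id lower than last_saved_version are already
    contained in the saved documentState.
    """
    return [
        s for s in steps
        if s.document_id == document_id and s.id >= last_saved_version
    ]


all_steps = [Step(8, 1), Step(9, 2), Step(12, 1), Step(13, 1)]
needed = steps_for_joining_client(all_steps, document_id=1, last_saved_version=10)
needed_ids = [s.id for s in needed]  # steps 12 and 13 of document 1
```

This only covers clients that join fresh. Clients that are already connected may still be waiting for older steps, which is why those cannot be removed immediately.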

I'm sorry this is such a mess to begin with. It evolved over time and we did not manage to clean it up. I'm happy to talk it through with you to figure out which steps we can remove and how to clean this up for the future.

Member

@mejo- mejo- left a comment

Thanks for tackling this. Code changes look good to me. I only have a minor nitpicking comment.

Also, these issues seem related and it might be worth mentioning them in the commit or PR message:

@github-project-automation github-project-automation bot moved this to 🧭 Planning evaluation (don't pick) in 📝 Office team Sep 22, 2025
@benjaminfrueh benjaminfrueh moved this from 🧭 Planning evaluation (don't pick) to 🏗️ In progress in 📝 Office team Sep 22, 2025
@blizzz blizzz moved this from 🏗️ In progress to 📄 To do (~10 entries) in 📝 Office team Sep 22, 2025
@benjaminfrueh benjaminfrueh moved this from 📄 To do (~10 entries) to 🏗️ In progress in 📝 Office team Sep 22, 2025
@benjaminfrueh
Contributor Author

Hi @max-nextcloud and @mejo- , thank you for the feedback. I've pushed some changes.

  • Only orphaned steps with a version (id) < lastSavedVersion are deleted, with a 24h safety buffer
  • Steps with no existing document are deleted
  • Steps with an existing session or version >= lastSavedVersion are not deleted
  • Added Qb suffixes to the query builder variables
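
The rules above can be sketched as a single predicate. This is an illustration in Python, not the query from the PR: the `Step` fields and `may_delete` are hypothetical, and it assumes the 24h safety buffer is measured against a per-step timestamp and that steps without a document are deleted regardless of age.

```python
from dataclasses import dataclass
from typing import Optional

SAFETY_BUFFER_S = 24 * 60 * 60  # 24h safety buffer from the list above


@dataclass
class Step:
    id: int                            # step version (auto-incrementing id)
    session_exists: bool               # does the step's session still exist?
    last_saved_version: Optional[int]  # None when the document is gone
    timestamp: int                     # unix seconds (assumed field)


def may_delete(step: Step, now: int) -> bool:
    """Return True if the cleanup rules would allow deleting this step."""
    if step.last_saved_version is None:
        return True   # no existing document
    if step.session_exists:
        return False  # session still exists
    if step.id >= step.last_saved_version:
        return False  # not yet covered by a saved documentState
    # Orphaned and already saved: delete only once the buffer has passed.
    return now - step.timestamp > SAFETY_BUFFER_S


now = 1_000_000
old = now - 2 * SAFETY_BUFFER_S
decisions = [
    may_delete(Step(5, False, 10, old), now),    # orphaned, saved, old
    may_delete(Step(5, False, 10, now), now),    # inside the 24h buffer
    may_delete(Step(5, True, 10, old), now),     # session still exists
    may_delete(Step(12, False, 10, old), now),   # version >= lastSavedVersion
    may_delete(Step(12, True, None, now), now),  # document no longer exists
]
```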

If you have any more suggestions or proposed changes for the cleanup I would be happy to discuss them with you in detail.

@benjaminfrueh benjaminfrueh requested a review from mejo- September 22, 2025 16:20
@mejo-
Member

mejo- commented Sep 24, 2025

Points from discussion with @benjaminfrueh:

  • Condition to keep steps of active sessions doesn't help for scenario when client comes back after long time. Steps from sessions that got closed in the meantime will be missing then as well.
  • What happens if client comes back after a long time and asks for steps older than last one available on server? Code seems like I get an incomplete subset of steps (if I ask for steps > 10 and server has only steps starting from 15 onwards, I get steps 15+).
  • What's the consequence client-side? Does it reconnect because applying the steps results in the yjs state that indicates missing steps?
  • Shall we return an error server-side instead if asking for steps older than those that exist?
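
A tiny Python sketch of the scenario in the second point, under the assumption that the server simply returns every stored step with an id greater than the one the client asks for. `get_steps_since` is a hypothetical stand-in for the server-side lookup.

```python
from typing import Iterable, List


def get_steps_since(available_ids: Iterable[int], last_known_id: int) -> List[int]:
    """Return the ids of all stored steps newer than last_known_id."""
    return [i for i in sorted(available_ids) if i > last_known_id]


# The client last saw step 10, but steps 11 to 14 were already cleaned up.
server_ids = [15, 16, 17]
received = get_steps_since(server_ids, last_known_id=10)
```

Here `received` contains 15, 16 and 17. Nothing in the response tells the client that steps 11 to 14 ever existed.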

@max-nextcloud
Collaborator

@mejo- You are right. I was confused about what the query does. I thought it only removes steps for documents where no sessions exist anymore at all. But it would also remove steps from a session that stopped existing even if further sessions exist.

Right now the client would fetch the steps that it can get. It may not notice that steps are missing if no steps arrive that depend on them. If later steps depend on missing steps, the client will notice and attempt to recover the state by loading the saved documentState and all steps since then. However, we have observed that this can lead to small parts of the content being duplicated, so I would not want to rely on it.

Detecting the missing step server side does not seem possible right now as we are just using the auto-incrementing id to retrieve newer steps. We used to have a version count that was per document and increased in steps of one - but I thought it was a clever idea to remove that 😶‍🌫️ - which I've been regretting ever since 😉 . But now you cannot tell if some step is missing in between the version provided by the client and the latest step.
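
A short Python sketch of why the difference matters. `has_gap` is a hypothetical check, not existing code. It works when versions are counted per document in steps of one, and it cannot work on the shared auto-incrementing id, because ids used by other documents look exactly like deleted steps.

```python
from typing import Iterable


def has_gap(versions: Iterable[int], client_version: int) -> bool:
    """True if a version between client_version and the newest is missing.

    Only meaningful when versions increase by exactly one per document.
    """
    expected = client_version + 1
    for v in sorted(v for v in versions if v > client_version):
        if v != expected:
            return True
        expected += 1
    return False


# Per-document counter: a gap really means that steps are gone.
complete = has_gap([11, 12, 13], client_version=10)    # False
cleaned_up = has_gap([15, 16, 17], client_version=10)  # True

# Shared auto-incrementing id: ids 12 and 13 belong to another document.
# Nothing was deleted here, yet the same check reports a gap.
false_alarm = has_gap([11, 14, 15], client_version=10)  # True
```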

@benjaminfrueh benjaminfrueh moved this from 🏗️ In progress to 👀 In review in 📝 Office team Oct 2, 2025
@github-actions
Contributor

github-actions bot commented Oct 4, 2025

Hello there,
Thank you so much for taking the time and effort to create a pull request to our Nextcloud project.

We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process.

Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6

Thank you for contributing to Nextcloud and we hope to hear from you soon!

(If you believe you should not receive this message, you can add yourself to the blocklist.)

Collaborator

@max-nextcloud max-nextcloud left a comment

Thanks a lot for tackling this!

@mejo- and I observed an out-of-sync state that was persisted to the documentState and could be recovered by replaying all the steps.

We need to make sure we can always load the editing session from the documentState and the steps that follow it.

Right now this does not seem to be the case. Let's discuss together what can be done about this and then get this merged once we are sure the client does not write incomplete documentState anymore. See also #7704 and #7692 for the bigger picture.

Collaborator

@max-nextcloud max-nextcloud left a comment

There have been no reconnect attempts in the last week and we also observed none. So it seems fine to clean up the orphaned steps as planned.

@max-nextcloud
Collaborator

@mejo- I think your change request has been addressed, right?

@mejo-
Member

mejo- commented Nov 12, 2025

@max-nextcloud agreed, let's get this merged. I guess it should also be backported, right @benjaminfrueh?

@mejo- mejo- merged commit f7c552e into main Nov 12, 2025
78 of 80 checks passed
@mejo- mejo- deleted the fix/steps-cleanup-orphaned branch November 12, 2025 14:42
@github-project-automation github-project-automation bot moved this from 👀 In review to ☑️ Done in 📝 Office team Nov 12, 2025
@mejo-
Member

mejo- commented Nov 12, 2025

/backport to stable32

Projects

Status: ☑️ Done

Development

Successfully merging this pull request may close these issues.

Backgroundjob cleanup is incomplete Database leftovers

4 participants