
@Rylan12 Rylan12 commented Dec 11, 2025

Currently, yugabyted kills child processes with SIGKILL, which does not allow them to perform any cleanup. For tserver, this means not cleaning up shared memory segments that were created. The current approach to cleaning these up is to use the previous tserver UUID from the configuration files in the base directory to determine the previous segment name and remove it. However, if the base directory no longer exists, this cleanup cannot be performed. As a consequence, the shared memory segments build up over time. On my machine, the limit for shared memory identifiers is 32, which means it doesn't take too long to encounter an issue with this.

This PR proposes a way to clean up these shared segments at termination without replacing the SIGKILL signal. It doesn't replace the previous approach, so anything that slips through here should still be cleaned up on the next startup. However, I think this may be a slightly cleaner way to handle this.

When yugabyted is cleaning up child processes, it now does the following:

  1. Terminate the tserver process using the pid stored in self.processes
    • This means the process will only be killed if it was actually started by yugabyted, which should preserve existing behavior
  2. Wait 0.5 seconds for the process to clean up
  3. Clean up the shared memory segment using the tserver UUID from self.configs to generate the segment name
  4. Kill the remaining child processes by stopping the entire process group, as is currently done

This should be a relatively minimal change that fails gracefully if any of the required information isn't available, but may help to avoid building up orphaned shared segments in certain cases.
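
The four shutdown steps listed above can be sketched roughly as follows. This is a minimal illustration of the ordering only, not the code from the PR: `stop_children`, `cleanup_segment`, and `kill_group` are hypothetical names, the tserver handle is assumed to be a `subprocess.Popen` object, and `terminate()` (SIGTERM) is an assumed choice of signal. How the segment name is derived from the tserver UUID is not shown in this thread, so it is left to the caller-supplied `cleanup_segment` callable.

```python
import subprocess
import sys
import time


def stop_children(tserver_proc, cleanup_segment, kill_group, grace_seconds=0.5):
    """Hypothetical helper showing only the order of the four steps."""
    # Step 1: signal tserver only if we hold a handle for it, i.e. we started it.
    if tserver_proc is not None and tserver_proc.poll() is None:
        tserver_proc.terminate()  # SIGTERM on POSIX; assumed signal for this sketch
        # Step 2: fixed grace period so the process can run its own cleanup.
        time.sleep(grace_seconds)
    # Step 3: best-effort removal of the segment; a failure must not block step 4.
    try:
        cleanup_segment()
    except Exception as exc:
        print(f"shared memory cleanup skipped: {exc}", file=sys.stderr)
    # Step 4: stop everything else, e.g. by signalling the whole process group.
    kill_group()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    stop_children(child, lambda: print("segment removed"), lambda: print("group killed"))
    child.wait()
```

Catching the exception in step 3 is what makes the sketch "fail gracefully": if the UUID or segment is unavailable, the process group is still stopped as before.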


CLAassistant commented Dec 11, 2025

CLA assistant check
All committers have signed the CLA.


netlify bot commented Dec 11, 2025

Deploy Preview for infallible-bardeen-164bc9 ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 70dd64f
🔍 Latest deploy log https://app.netlify.com/projects/infallible-bardeen-164bc9/deploys/69616b5ce4b1d000085c6ffa
😎 Deploy Preview https://deploy-preview-29676--infallible-bardeen-164bc9.netlify.app

@hari90 hari90 requested review from druzac and nchandrappa December 11, 2025 22:53
Contributor

hari90 commented Dec 11, 2025

@nmalladi can we make yugabyted use SIGTERM instead of SIGKILL now that the DB supports graceful shutdown?
cc @druzac

@Rylan12 Rylan12 force-pushed the yugabyted-cleanup-shared-mem branch from c599bae to 3fa7e40 Compare January 6, 2026 19:15
@Rylan12 Rylan12 changed the title from "yugabyted: cleanup shared memory segments from tserver on interrupt" to "yugabyted: stop child processes with SIGTERM instead of SIGKILL" Jan 6, 2026
Author

Rylan12 commented Jan 6, 2026

I tested simply using SIGTERM instead of SIGKILL, and I can no longer reproduce the leak with the methods above 🎉

To simplify the PR, I've just switched the signal and removed the additional cleanup logic. However, I can add it back if you think having an extra safety measure is worthwhile (I'm not convinced it makes sense for it to live in yugabyted anymore, given SIGTERM works).

Thanks!
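
For readers who want to see why the signal switch matters, here is a small self-contained demonstration on a POSIX system. It is not yugabyted or tserver code: the child script, the marker file (a stand-in for a shared memory segment), and the `resource_left_behind` helper are all made up for illustration. The child removes its marker in a SIGTERM handler; SIGKILL cannot be caught, so no handler runs.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical child: creates a marker file and removes it when it receives SIGTERM.
CHILD_SOURCE = textwrap.dedent("""
    import os, signal, sys, time
    marker = sys.argv[1]
    open(marker, "w").close()      # stand-in for a resource such as a shm segment

    def on_term(signum, frame):
        os.remove(marker)          # cleanup only runs if the signal can be caught
        sys.exit(0)

    signal.signal(signal.SIGTERM, on_term)
    print("ready", flush=True)
    while True:
        time.sleep(0.1)
""")


def resource_left_behind(sig):
    """Start the child, stop it with `sig`, and report whether its marker survived."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "segment.marker")
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE, marker],
            stdout=subprocess.PIPE,
            text=True,
        )
        child.stdout.readline()    # block until the handler is installed
        child.send_signal(sig)
        child.wait(timeout=10)
        child.stdout.close()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("leaked after SIGTERM:", resource_left_behind(signal.SIGTERM))
    print("leaked after SIGKILL:", resource_left_behind(signal.SIGKILL))
```

With SIGTERM the handler runs and the marker is gone; with SIGKILL the marker is left behind, which mirrors the leak described at the top of this PR.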

@Rylan12 Rylan12 requested a review from nchandrappa January 6, 2026 19:19
@Rylan12 Rylan12 requested a review from druzac January 9, 2026 20:57
bin/yugabyted Outdated
Comment on lines 3195 to 3198
for p in self.processes.values():
    p.kill()
self.script.delete_pidfile()
time.sleep(0.5)
Contributor

Looks good. I'd just ask you to bump the wait time, 0.5 is extremely short. Let's do 10 seconds.

Contributor

Ah, a 10 second blind wait would be very annoying. Alright, 2 alternatives:

  1. Keep the blind wait, and change the sleep to 3 seconds
  2. Remove the blind wait - poll process status and only wait as long as any of the signalled processes are still alive.
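
The second alternative (poll instead of sleeping blindly) might look something like the sketch below. It assumes the signalled processes are `subprocess.Popen` objects, which may not match the objects stored in `self.processes`; `wait_for_exit`, `timeout`, and `poll_interval` are hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def wait_for_exit(procs, timeout=10.0, poll_interval=0.1):
    """Poll until every process has exited or `timeout` seconds have passed.

    Returns the list of processes that are still alive at that point.
    """
    deadline = time.monotonic() + timeout
    alive = [p for p in procs if p.poll() is None]
    while alive and time.monotonic() < deadline:
        time.sleep(poll_interval)
        alive = [p for p in alive if p.poll() is None]
    return alive


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.terminate()
    stragglers = wait_for_exit([child], timeout=3.0)
    print("still running:", [p.pid for p in stragglers])
```

The timeout acts as an upper bound rather than a fixed delay: the call returns as soon as the last process exits, and anything still listed in the return value could then be escalated to SIGKILL.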

Author

Gotcha, makes sense. I added wait_until_stop calls for each process, which looks like it should do the trick? However, this might only check for the presence of the pidfile, which isn't the same as checking whether the process is actually running. Can you advise whether this approach is sufficient?
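
To illustrate the distinction raised here between "the pidfile exists" and "the process is running", one common POSIX technique is to probe the pid with signal 0. The sketch below is a generic example and says nothing about what wait_until_stop actually does; `pid_is_running` is a hypothetical helper.

```python
import os
import subprocess
import sys


def pid_is_running(pid):
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)            # signal 0 performs error checking only
    except ProcessLookupError:
        return False               # no such process
    except PermissionError:
        return True                # process exists but belongs to another user
    return True


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print("before kill:", pid_is_running(child.pid))
    child.kill()
    child.wait()                   # reap the child so its pid entry is released
    print("after kill:", pid_is_running(child.pid))
```

Two caveats apply: a child that has exited but has not been reaped (a zombie) still counts as existing, and a pid can be reused by an unrelated process after the original one exits, so a check like this is a heuristic rather than a guarantee.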

@Rylan12 Rylan12 requested a review from druzac January 12, 2026 15:48