Skip to content

fix issue75: seg fault and listen volcano.sock fail #76

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

linuxfhy
Copy link

No description provided.

@volcano-sh-bot
Copy link
Collaborator

Welcome @linuxfhy!

It looks like this is your first PR to volcano-sh/devices.

Thank you, and welcome to Volcano. 😃

@volcano-sh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign william-wang
You can assign the PR to them by writing /assign @william-wang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@linuxfhy
Copy link
Author

/assign @william-wang

@linuxfhy
Copy link
Author

related issue: #75

@Monokaix Monokaix requested a review from Copilot March 14, 2025 09:42
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request addresses issue75 by improving error handling around the socket listen setup and adjusting the client usage in the allocation process.

  • In the Serve function, the code now attempts to remove the existing socket file and retry listening when net.Listen fails.
  • In the Allocate function, the invocation of util.UseClient has been relocated to ensure proper initialization before proceeding with candidate pod evaluation.
Comments suppressed due to low confidence (1)

pkg/plugin/nvidia/server.go:192

  • Consider verifying that the error is due to an existing socket (e.g., EADDRINUSE) before removing the file to avoid inadvertently deleting an unrelated file.
log.Printf("Listen sock fail and retry for '%s': %s", m.resourceName, err)

Comment on lines +357 to 358
util.UseClient(m.kubeInteractor.clientset)

Copy link
Preview

Copilot AI Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure the relocation of util.UseClient here is intentional and that omitting its call at the original location does not lead to any side effects or missed initialization.

Suggested change
util.UseClient(m.kubeInteractor.clientset)

Copilot uses AI. Check for mistakes.

@linuxfhy linuxfhy closed this Mar 14, 2025
@linuxfhy linuxfhy deleted the dev_20250311_fix_segfault branch March 14, 2025 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants