Skip to content

Conversation

@a-nogikh
Copy link
Collaborator

Enable external abortion of the instance creation process. This is
especially useful for the qemu case where we retry the creation/boot up
to 1000 times, which can take significant time (e.g. it timeouts
syz-cluster pods on unstable kernels).

The context can be further propagated to WaitForSSH, but that requires
another quite significant vm/ refactoring.


In almost all cases these mean some boot time crash.
It also doesn't make much sense to continue string matching since the
boot output may contain the matched strings in benign contexts.


This sequence also asks for another change - we abuse osutil.VerboseError in too many different contexts and in a way that contradicts current Go error wrapping APIs. In this particular case it prevents an adequate ssh timeout error detection (which should have been something like errors.Is(err, vmimpl.ErrSSHTimeout)). I'll see how tricky it will be to replace VerboseErrors Title by some nested Error.

@a-nogikh a-nogikh requested a review from dvyukov September 26, 2025 13:13
@a-nogikh a-nogikh requested a review from avagin as a code owner September 26, 2025 13:13
@a-nogikh a-nogikh removed the request for review from avagin September 26, 2025 13:13
@a-nogikh
Copy link
Collaborator Author

Added two VerboseError refactoring commits.

Enable external abortion of the instance creation process. This is
especially useful for the qemu case where we retry the creation/boot up
to 1000 times, which can take significant time (e.g. it timeouts
syz-cluster pods on unstable kernels).

The context can be further propagated to WaitForSSH, but that requires
another quite significant vm/ refactoring.
In almost all cases these mean some boot time crash.
It also doesn't make much sense to continue string matching since the
boot output may contain the matched strings in benign contexts.
After this change it fits more naturally into the Go's error
functionality.
This is a much cleaner logic than string matching.
@a-nogikh a-nogikh force-pushed the features/vm-qemu-context branch from d85ea03 to 0ff7b35 Compare September 30, 2025 08:25
@a-nogikh
Copy link
Collaborator Author

Fixed the failing test.

Copy link
Collaborator

@tarasmadan tarasmadan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also doesn't make much sense to continue string matching since the boot output may contain the matched strings in benign contexts.

I didn't understand this part. Could you please explain the details here?

@a-nogikh
Copy link
Collaborator Author

a-nogikh commented Oct 1, 2025

The original problem was that, for can't ssh into the instance errors, Error() used to also include the boot output. And the boot output legitimately had several Device or resource busy lines, so it was treated as some temporary qemu problem and we reiterated. While it wasn't.

@a-nogikh a-nogikh added this pull request to the merge queue Oct 1, 2025
Merged via the queue into google:master with commit 1bd1bc7 Oct 1, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants