Graceful otel shutdown on build failure#320
Conversation
Co-authored-by: Ona <no-reply@ona.com>
| }, | ||
| } | ||
|
|
||
| func build(cmd *cobra.Command, args []string) error { |
There was a problem hiding this comment.
build() is the same logic as what was in the Run function above, except that log.Fatals have been replaced by returning an error.
RunE above still logs any errors with log.Fatal, so the leeway log output remains the same
|
I related to CORE-6452 (moving the related issue to our new project). |
| // Ensure cache directory exists with proper permissions | ||
| if err := os.MkdirAll(buildDir, 0755); err != nil { | ||
| log.WithError(err).Fatal("failed to create build directory") | ||
| return nil, xerrors.Errorf("failed to create build directory: %w", err) |
There was a problem hiding this comment.
There was a problem hiding this comment.
It would still result in a log.Fatal indeed when an error gets returned as part of the run command. There's a ton of log.Fatal left throughout the codebase, it would require a larger cleanup to change all
| } | ||
| if serve != "" { | ||
| go serveBuildResult(ctx, serve, localCache, pkg) | ||
| return err |
There was a problem hiding this comment.
Nit, not blocking:
It looks like we could invoke cancel() before returning the error, to be tidy, not a functional problem.
| err = saveBuildResult(context.Background(), save, localCache, pkg) | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| return err |
There was a problem hiding this comment.
Nit, not blocking:
We could wrap the error with with return fmt.Errorf("...: %w", err) to help future us catch where this happened.
The same is true with above errors. I wonder if maybe I am missing a convention?
There was a problem hiding this comment.
didn't want to change the output that leeway produces, assumption was that these errors already include enough details and wrapping them would make them more verbose
Description
When a build fails, leeway was using
log.Fatalto shutdown, but this doesn't run any defer-ed cleanup functions preventing otel from pushing all spans. Results in incomplete tracesThe pr changes the build command to return errors instead of using
log.Fatal, and then have one top-levellog.Fatal(after otel shutdown has run).Related Issue(s)
Relates to CORE-6452
How to test
Complete traces in honeycomb now:

https://ui.honeycomb.io/gitpod/environments/ci/datasets/leeway/result/e9NjDmof1Yh/trace/pct7KaVe6Z4?fields%5B%5D=s_name&fields%5B%5D=s_serviceName&span=96f4a3ca1c7caed8
Documentation
/hold