-
Notifications
You must be signed in to change notification settings - Fork 19
Description
Describe the bug.
Issue
When file streaming (e.g. /spec, /events, /logs) fail due to legitimate issues (e.g. 4XX, 5XX), we currently do not catch early and return to users.
Instead, we attempt to stream the file anyways only for the stream to break (due to the aforementioned issues).
Proposal
Rather than attempting to stream immediately, we should catch any OSMODataStorageClientError at: https://github.com/NVIDIA/OSMO/blob/main/src/service/core/workflow/helpers.py#L354-L363
Before we stream anything, we always make a HEAD call first to ensure the file is accessible, for example:
https://github.com/NVIDIA/OSMO/blob/main/src/lib/data/storage/backends/s3.py#L727
https://github.com/NVIDIA/OSMO/blob/main/src/lib/data/storage/backends/azure.py#L442
This gives us the opportunity to "pre-validate" and surface any real and meaning errors back to the client.
Minimum reproducible example
1. Have invalid S3 credentials (no read scope, expired, etc)
2. Attempt to fetch logs/events/specRelevant log output
Error message: Failed to fetch logs: Response ended prematurely
Error code: 1Environment
No response
Code of Conduct
- I agree to follow NVIDIA OSMO's Code of Conduct
- I have searched the open bugs and have found no duplicates for this bug report