
Commit b87f1ee

GAUNSDclaude and Claude authored
chore(autorag+automl): improvements - S3 timeout error handling + FileExplorer UX (#7219)
* chore(autorag+automl): S3 timeout error handling improvements

Functional changes:
- Add a 10s connect/TLS timeout to the S3 HTTP client via a custom transport so unreachable endpoints (air-gapped, misconfigured) fail fast instead of hanging until the OpenShift route's 30s gateway timeout.
- Set RetryMaxAttempts=1 on the AWS config to avoid compounding the timeout on dead endpoints.
- Add isS3ConnectivityError() to classify net.Error timeouts, net.OpError, and net.DNSError as connectivity failures.
- Add s3ConnectivityErrorMessage() to return a user-facing 503 message explaining the likely cause and remediation steps.
- Wire connectivity error detection into all S3 handlers (GetS3File, PostS3File, GetS3Files, GetS3FileSchema) so they return 503 with the actionable message instead of a generic 500.

Test coverage:
- isS3ConnectivityError: 9-case table-driven test covering timeout, OpError, DNSError, wrapped variants, nil, and non-connectivity errors (access denied).
- s3ConnectivityErrorMessage: asserts the bucket name and key phrases appear in the message.
- Handler-level 503 tests: GetS3File, GetS3Files, PostS3File (on key resolution), and PostS3File (on upload) all return 503 with the connectivity error message when the S3 endpoint is unreachable.
- s3ConnectTimeout constant: asserts the value is 10s.
- NewRealS3Client: verifies client construction succeeds with the new timeout-aware HTTP transport.

All changes are mirrored across both automl and autorag BFFs.

Generated-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>

* chore: Add UI handling + tests

* chore: Add UI fix for view details (minor issue)

* chore: PR feedback

automl: Add TestGetS3FileSchemaHandler_ConnectivityError_Returns503 to exercise the 503 branch when GetCSVSchema hits an unreachable S3 endpoint.
Added GetCSVSchema override to connectivityErrorS3Client.

automl+autorag: Extract buildS3AWSConfig() from NewRealS3Client so RetryMaxAttempts can be directly asserted in tests. The AWS SDK does not expose RetryMaxAttempts on the constructed *s3.Client, so the previous test could only verify client creation, not the config value. The extracted function returns the raw aws.Config, enabling a real assertion (cfg.RetryMaxAttempts == 1) without DNS or network dependencies.

Rename TestNewRealS3Client_SetsRetryMaxAttemptsToOne to TestNewRealS3Client_CreatesClientWithValidCredentials and switch the endpoint to a literal IP to remove the DNS dependency.

* chore: PR feedback

Add a net.ErrClosed sentinel check to isS3ConnectivityError() in both the automl and autorag BFFs. A closed keep-alive socket can surface as net.ErrClosed without a *net.OpError wrapper, causing it to fall through to a 500 instead of the intended 503. The new check catches both direct and wrapped net.ErrClosed errors. Includes unit tests for both direct and wrapped net.ErrClosed cases in both packages.

* chore: PR feedback

Add per-call context deadlines (s3MetadataTimeout = 15s) on read-only S3 handler operations to bound the response-header phase. net/http's WriteTimeout sets a conn deadline but does NOT cancel r.Context(), so if an S3 endpoint accepts the TCP connection but never sends response headers, metadata calls could hang indefinitely. File-transfer handlers (GetObject, UploadObject) are intentionally excluded because legitimate large payloads can exceed any static timeout.
Handlers protected:
- autorag: GetS3FilesHandler (ListObjects)
- automl: GetS3FilesHandler (ListObjects), GetS3FileSchemaHandler (GetCSVSchema)

Tests added:
- Positive: verify ListObjects/GetCSVSchema receive a deadline-aware context
- Negative: verify GetObject does NOT get a handler-imposed deadline

* chore: Self PR review

* chore: PR feedback

- isS3ConnectivityError: add context.DeadlineExceeded as the first check so metadata timeouts produce 503; narrow net.OpError to dial-only
- PostS3FileHandler: wrap resolveNonCollidingS3Key with s3MetadataTimeout
- client.go: remove dead HTTPClient from buildS3AWSConfig, consolidate s3ConnectTimeout dial/TLS settings into the NewRealS3Client transport
- client_test.go: replace the constant self-assertion test with transport inspection (TestNewRealS3Client_TransportHasConnectTimeout)
- s3_handler_test.go: add DeadlineExceeded and OpError{write} test cases, update the upload mock from Op:"write" to Op:"dial", add TestPostS3FileHandler_SetsMetadataTimeoutForResolveKey
- autorag s3_handler.go: reorder declarations (constants/types first)
- FileExplorer.spec.tsx: scope dropdown assertions with within() and role-based queries instead of global screen.queryByText

All changes applied symmetrically to both automl and autorag packages.
Generated-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* chore: PR feedback

* chore: PR feedback

* chore: make fmt

* chore: PR feedback

- Hide "View details" kebab action when the file is already being viewed
- Add "Select file"/"Select folder" kebab action for unselected items
- Delegate onRemoveSelection to the parent handler to clear filesToView
- Add a table row kebab actions test suite (7 tests)
- Fix the eye icon test that relied on a redundant "View details" click

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 1c9bfb5 commit b87f1ee

16 files changed

Lines changed: 1595 additions & 88 deletions

File tree

packages/automl/bff/internal/api/s3_handler.go

Lines changed: 80 additions & 5 deletions
@@ -7,11 +7,13 @@ import (
 	"io"
 	"mime"
 	"mime/multipart"
+	"net"
 	"net/http"
 	"path"
 	"regexp"
 	"strconv"
 	"strings"
+	"time"
 
 	"github.com/aws/aws-sdk-go-v2/service/s3/types"
 	"github.com/julienschmidt/httprouter"
@@ -24,6 +26,15 @@ import (
 	apierrors "k8s.io/apimachinery/pkg/api/errors"
 )
 
+// s3MetadataTimeout is the deadline for read-only S3 metadata operations
+// (ListObjects, HeadObject, GetCSVSchema, etc.) that should complete quickly.
+// This bounds the response-header phase: if the endpoint accepts the TCP
+// connection but never sends response headers, r.Context() alone won't cancel
+// the call because net/http's WriteTimeout sets a conn deadline, not a context
+// cancellation. File transfers (GetObject, UploadObject) are excluded because
+// legitimate large payloads can exceed any static timeout.
+const s3MetadataTimeout = 15 * time.Second
+
 // resolvedS3 holds a ready-to-use S3 client and the resolved bucket name.
 type resolvedS3 struct {
 	client s3int.S3ClientInterface
@@ -170,6 +181,42 @@ func (app *App) resolveS3Client(w http.ResponseWriter, r *http.Request, secretNa
 	return &resolvedS3{client: s3Client, bucket: bucket}, true
 }
 
+// isS3ConnectivityError checks whether an error is caused by a network-level
+// failure to reach the S3 endpoint (e.g. connection timeout, DNS failure,
+// connection refused, or a metadata timeout deadline). This lets handlers
+// return an actionable 503 instead of a generic 500 when the endpoint is
+// unreachable — common in air-gapped or misconfigured environments.
+func isS3ConnectivityError(err error) bool {
+	if errors.Is(err, context.DeadlineExceeded) {
+		return true
+	}
+	if errors.Is(err, net.ErrClosed) {
+		return true
+	}
+	var netErr net.Error
+	if errors.As(err, &netErr) && netErr.Timeout() {
+		return true
+	}
+	var opErr *net.OpError
+	if errors.As(err, &opErr) && opErr.Op == "dial" {
+		return true
+	}
+	var dnsErr *net.DNSError
+	return errors.As(err, &dnsErr)
+}
+
+// s3ConnectivityErrorMessage returns a user-facing message for S3 connectivity failures.
+func s3ConnectivityErrorMessage(bucket string) string {
+	return fmt.Sprintf(
+		"Unable to connect to the S3 storage endpoint for bucket '%s'. "+
+			"The endpoint may be unreachable from this cluster. "+
+			"If this is a disconnected or air-gapped environment, "+
+			"verify the S3 endpoint URL in the data connection secret "+
+			"points to a storage service accessible within the cluster network.",
+		bucket,
+	)
+}
+
 // GetS3FileHandler retrieves a file from S3 storage.
 // Query parameters:
 //   - secretName (optional): Kubernetes secret with S3 credentials.
@@ -218,6 +265,11 @@ func (app *App) GetS3FileHandler(w http.ResponseWriter, r *http.Request, _ httpr
 		return
 	}
 
+	if isS3ConnectivityError(err) {
+		app.serviceUnavailableResponseWithMessage(w, r, err, s3ConnectivityErrorMessage(s3.bucket))
+		return
+	}
+
 	app.serverErrorResponse(w, r, err)
 	return
 }
@@ -279,16 +331,21 @@ func (app *App) PostS3FileHandler(w http.ResponseWriter, r *http.Request, _ http
 		return
 	}
 
-	ctx := r.Context()
 	bucket := s3.bucket
-	resolvedKey, err := resolveNonCollidingS3Key(ctx, s3.client, bucket, key, app.effectivePostS3CollisionAttempts())
+	metadataCtx, metadataCancel := context.WithTimeout(r.Context(), s3MetadataTimeout)
+	defer metadataCancel()
+	resolvedKey, err := resolveNonCollidingS3Key(metadataCtx, s3.client, bucket, key, app.effectivePostS3CollisionAttempts())
 	if err != nil {
 		if errors.Is(err, ErrMaxCollisionsExceeded) {
 			app.conflictResponse(w, r,
 				fmt.Sprintf("unable to find unique filename after %d attempts; try a different base name",
 					app.effectivePostS3CollisionAttempts()))
 			return
 		}
+		if isS3ConnectivityError(err) {
+			app.serviceUnavailableResponseWithMessage(w, r, err, s3ConnectivityErrorMessage(bucket))
+			return
+		}
 		app.serverErrorResponse(w, r, fmt.Errorf("error resolving S3 key for upload: %w", err))
 		return
 	}
@@ -356,7 +413,7 @@ func (app *App) PostS3FileHandler(w http.ResponseWriter, r *http.Request, _ http
 	}
 	limitedFile := http.MaxBytesReader(nil, filePart, maxFilePartBytes)
 	defer limitedFile.Close()
-	if err := s3.client.UploadObject(ctx, bucket, resolvedKey, limitedFile, contentType); err != nil {
+	if err := s3.client.UploadObject(r.Context(), bucket, resolvedKey, limitedFile, contentType); err != nil {
 		var maxBytesErr *http.MaxBytesError
 		if errors.As(err, &maxBytesErr) {
 			app.payloadTooLargeResponse(w, r, s3FilePartTooLargeMsg)
@@ -371,6 +428,10 @@ func (app *App) PostS3FileHandler(w http.ResponseWriter, r *http.Request, _ http
 			app.forbiddenResponse(w, r, fmt.Sprintf("access denied uploading to S3 '%s/%s'", bucket, resolvedKey))
 			return
 		}
+		if isS3ConnectivityError(err) {
+			app.serviceUnavailableResponseWithMessage(w, r, err, s3ConnectivityErrorMessage(bucket))
+			return
+		}
 		app.serverErrorResponse(w, r, fmt.Errorf("error uploading file to S3: %w", err))
 		return
 	}
@@ -560,7 +621,9 @@ func (app *App) GetS3FileSchemaHandler(w http.ResponseWriter, r *http.Request, _
 		return
 	}
 
-	ctx := r.Context()
+	ctx, cancel := context.WithTimeout(r.Context(), s3MetadataTimeout)
+	defer cancel()
+
 	schemaResult, err := s3.client.GetCSVSchema(ctx, s3.bucket, key)
 	if err != nil {
 		var noSuchKey *types.NoSuchKey
@@ -595,6 +658,11 @@ func (app *App) GetS3FileSchemaHandler(w http.ResponseWriter, r *http.Request, _
 		return
 	}
 
+	if isS3ConnectivityError(err) {
+		app.serviceUnavailableResponseWithMessage(w, r, err, s3ConnectivityErrorMessage(s3.bucket))
+		return
+	}
+
 	app.serverErrorResponse(w, r, err)
 	return
 }
@@ -637,7 +705,9 @@ func (app *App) GetS3FilesHandler(w http.ResponseWriter, r *http.Request, _ http
 		return
 	}
 
-	ctx := r.Context()
+	ctx, cancel := context.WithTimeout(r.Context(), s3MetadataTimeout)
+	defer cancel()
+
 	result, err := s3.client.ListObjects(ctx, s3.bucket, s3int.ListObjectsOptions{
 		Path:   params.path,
 		Search: params.search,
@@ -657,6 +727,11 @@ func (app *App) GetS3FilesHandler(w http.ResponseWriter, r *http.Request, _ http
 		return
 	}
 
+	if isS3ConnectivityError(err) {
+		app.serviceUnavailableResponseWithMessage(w, r, err, s3ConnectivityErrorMessage(s3.bucket))
+		return
+	}
+
 	app.serverErrorResponse(w, r, err)
 	return
 }
