Skip to content

Conversation

@stayrascal
Copy link
Contributor

Changes Made

support configure some customized keywords of s3 error message, if the configured keywords matched the actual s3 error message, which will retry the s3 request based on S3 ClassifyRetry.

Notes: we retried UnexpectedEof error last time, and we meet a new timeout error mentioned in #5043 during UploadPart, I think we might not list all potential transient errors, so it's better to support configured these error messages, so that we don't need modify code once meet another error.

Related Issues

#5043

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly

@github-actions github-actions bot added the feat label Oct 25, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Adds configurable custom retry messages for S3 requests to handle transient errors that aren't covered by AWS SDK's default retry classifiers. The implementation adds a custom_retry_msgs field (defaulting to ["UnexpectedEof", "Timeout"]) and a custom RetryCustomRetrier that checks error messages against configured keywords.

Major Issues Found:

  • Critical: Two unintended default value changes in src/common/io-config/src/s3.rs: num_tries reduced from 25 to 2, and multipart_max_concurrency reduced from 100 to 8. These will significantly impact retry behavior and upload performance.
  • Type mismatch: In src/daft-sql/src/modules/config.rs, multipart_max_concurrency is cast from i64 to u64 but the field expects u32.

Implementation Details:

  • The retry classifier uses substring matching on the Debug representation of errors (format!("{err:?}"))
  • Proto definitions, Python bindings, and SQL config support are all properly updated
  • Test expectations updated to reflect new config display output

Confidence Score: 1/5

  • This PR has critical issues that must be fixed before merging - unintended default value changes will significantly impact production behavior
  • Score reflects two critical unintended changes to default retry and concurrency settings that will degrade system reliability and performance, plus a type mismatch bug in SQL config parsing
  • Pay immediate attention to src/common/io-config/src/s3.rs (default value changes) and src/daft-sql/src/modules/config.rs (type mismatch)

Important Files Changed

File Analysis

Filename Score Overview
src/common/io-config/src/s3.rs 1/5 Adds custom_retry_msgs field but unintentionally changes critical defaults: num_tries 25→2, multipart_max_concurrency 100→8
src/daft-io/src/s3_like.rs 4/5 Implements custom retry classifier that checks error messages against configured keywords, logic looks correct
src/daft-sql/src/modules/config.rs 2/5 Adds SQL config support for new fields, but has type mismatch: casts i64 to u64 for multipart_max_concurrency which is u32

Sequence Diagram

sequenceDiagram
    participant User
    participant S3Config
    participant S3Client
    participant RetryCustomRetrier
    participant AWS_S3

    User->>S3Config: Configure custom_retry_msgs=["UnexpectedEof", "Timeout"]
    User->>S3Client: Initiate S3 request (read/write)
    S3Client->>AWS_S3: Send HTTP request
    AWS_S3-->>S3Client: Error response (e.g., "UnexpectedEof")
    S3Client->>RetryCustomRetrier: classify_retry(error)
    RetryCustomRetrier->>RetryCustomRetrier: Check if error contains custom retry msg
    RetryCustomRetrier-->>S3Client: RetryAction::server_error()
    S3Client->>S3Client: Apply backoff and retry logic
    S3Client->>AWS_S3: Retry request
    AWS_S3-->>S3Client: Success response
    S3Client-->>User: Return result
Loading

9 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

@codecov
Copy link

codecov bot commented Oct 25, 2025

Codecov Report

❌ Patch coverage is 64.78873% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.54%. Comparing base (b7f90c0) to head (f0f8843).

Files with missing lines Patch % Lines
src/daft-sql/src/modules/config.rs 16.66% 15 Missing ⚠️
src/daft-io/src/s3_like.rs 70.83% 7 Missing ⚠️
src/common/io-config/src/python.rs 62.50% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5447      +/-   ##
==========================================
- Coverage   71.56%   71.54%   -0.02%     
==========================================
  Files         996      996              
  Lines      126628   126673      +45     
==========================================
+ Hits        90622    90631       +9     
- Misses      36006    36042      +36     
Files with missing lines Coverage Δ
src/common/io-config/src/s3.rs 81.25% <100.00%> (+1.12%) ⬆️
src/common/io-config/src/python.rs 50.35% <62.50%> (+0.13%) ⬆️
src/daft-io/src/s3_like.rs 64.73% <70.83%> (+0.09%) ⬆️
src/daft-sql/src/modules/config.rs 17.59% <16.66%> (-0.05%) ⬇️

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kevinzwang kevinzwang self-requested a review October 28, 2025 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant