feat: support customized retries error message of S3 request #5447
Conversation
Greptile Overview
Greptile Summary
Adds configurable custom retry messages for S3 requests to handle transient errors that aren't covered by AWS SDK's default retry classifiers. The implementation adds a custom_retry_msgs field (defaulting to ["UnexpectedEof", "Timeout"]) and a custom RetryCustomRetrier that checks error messages against configured keywords.
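For orientation, here is a minimal sketch of what the new field could look like on the config struct, based only on the summary above; the struct name and the surrounding field are illustrative context, not the PR's actual code:

```rust
/// Illustrative subset of an S3 config; only custom_retry_msgs is the new
/// field described in this PR, the rest is hypothetical context.
#[derive(Clone, Debug)]
pub struct S3ConfigSketch {
    /// Existing retry budget (default on main is 25 per the review below).
    pub num_tries: u32,
    /// New: error-message keywords that should force a retry even when the
    /// AWS SDK's default classifiers would give up.
    pub custom_retry_msgs: Vec<String>,
}

impl Default for S3ConfigSketch {
    fn default() -> Self {
        Self {
            num_tries: 25,
            custom_retry_msgs: vec!["UnexpectedEof".to_string(), "Timeout".to_string()],
        }
    }
}
```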
Major Issues Found:
- Critical: Two unintended default value changes in src/common/io-config/src/s3.rs: num_tries reduced from 25 to 2, and multipart_max_concurrency reduced from 100 to 8. These will significantly impact retry behavior and upload performance.
- Type mismatch: In src/daft-sql/src/modules/config.rs, multipart_max_concurrency is cast from i64 to u64 but the field expects u32 (a safe-conversion sketch follows this list).
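As a hedged illustration of how the type mismatch above could be fixed, assuming the SQL layer hands the setting over as an i64; the function name is made up for this sketch:

```rust
/// Convert the SQL-provided value straight to the u32 the config field
/// expects, rejecting negative or out-of-range inputs instead of wrapping.
fn parse_multipart_max_concurrency(value: i64) -> Result<u32, String> {
    u32::try_from(value)
        .map_err(|_| format!("multipart_max_concurrency must fit in a u32, got {value}"))
}

fn main() {
    assert_eq!(parse_multipart_max_concurrency(100), Ok(100));
    assert!(parse_multipart_max_concurrency(-1).is_err());
    assert!(parse_multipart_max_concurrency(i64::MAX).is_err());
}
```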
Implementation Details:
- The retry classifier uses substring matching on the Debug representation of errors (format!("{err:?}")); see the sketch after this list
- Proto definitions, Python bindings, and SQL config support are all properly updated
- Test expectations updated to reflect the new config display output
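A minimal sketch of the substring check described in the first bullet, kept independent of the AWS SDK's ClassifyRetry trait that the real RetryCustomRetrier plugs into; the function name is illustrative:

```rust
use std::fmt::Debug;

/// True when the error's Debug rendering contains any configured keyword,
/// e.g. "UnexpectedEof" or "Timeout".
fn matches_custom_retry_msg<E: Debug>(err: &E, custom_retry_msgs: &[String]) -> bool {
    let rendered = format!("{err:?}");
    custom_retry_msgs.iter().any(|msg| rendered.contains(msg.as_str()))
}

fn main() {
    let msgs = vec!["UnexpectedEof".to_string(), "Timeout".to_string()];
    let err = std::io::Error::new(std::io::ErrorKind::UnexpectedEof, "stream ended early");
    // The Debug output of this error includes "UnexpectedEof", so it matches.
    assert!(matches_custom_retry_msg(&err, &msgs));
}
```

In the actual classifier, a match translates into a retryable action (the sequence diagram below shows RetryAction::server_error()), while a non-match falls back to the SDK's default classification.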
 
Confidence Score: 1/5
- This PR has critical issues that must be fixed before merging: unintended default value changes will significantly impact production behavior
- Score reflects two critical unintended changes to default retry and concurrency settings that will degrade system reliability and performance, plus a type mismatch bug in SQL config parsing
- Pay immediate attention to src/common/io-config/src/s3.rs (default value changes) and src/daft-sql/src/modules/config.rs (type mismatch)
Important Files Changed
File Analysis
| Filename | Score | Overview | 
|---|---|---|
| src/common/io-config/src/s3.rs | 1/5 | Adds custom_retry_msgs field but unintentionally changes critical defaults: num_tries 25→2, multipart_max_concurrency 100→8 | 
| src/daft-io/src/s3_like.rs | 4/5 | Implements custom retry classifier that checks error messages against configured keywords, logic looks correct | 
| src/daft-sql/src/modules/config.rs | 2/5 | Adds SQL config support for new fields, but has type mismatch: casts i64 to u64 for multipart_max_concurrency which is u32 | 
Sequence Diagram
sequenceDiagram
    participant User
    participant S3Config
    participant S3Client
    participant RetryCustomRetrier
    participant AWS_S3
    User->>S3Config: Configure custom_retry_msgs=["UnexpectedEof", "Timeout"]
    User->>S3Client: Initiate S3 request (read/write)
    S3Client->>AWS_S3: Send HTTP request
    AWS_S3-->>S3Client: Error response (e.g., "UnexpectedEof")
    S3Client->>RetryCustomRetrier: classify_retry(error)
    RetryCustomRetrier->>RetryCustomRetrier: Check if error contains custom retry msg
    RetryCustomRetrier-->>S3Client: RetryAction::server_error()
    S3Client->>S3Client: Apply backoff and retry logic
    S3Client->>AWS_S3: Retry request
    AWS_S3-->>S3Client: Success response
    S3Client-->>User: Return result
9 files reviewed, 3 comments
Codecov Report
❌ Patch coverage is
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #5447      +/-   ##
==========================================
- Coverage   71.56%   71.54%   -0.02%     
==========================================
  Files         996      996              
  Lines      126628   126673      +45     
==========================================
+ Hits        90622    90631       +9     
- Misses      36006    36042      +36     
Changes Made
Support configuring customized keywords for S3 error messages: if a configured keyword matches the actual S3 error message, the S3 request is retried based on S3 ClassifyRetry.
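To make the intended behavior concrete, here is a hedged, self-contained sketch of that retry decision; send_request, the backoff, and the error text are stand-ins for illustration rather than Daft's actual S3 code:

```rust
use std::{thread, time::Duration};

/// Stand-in for an S3 call that fails transiently on the first two attempts.
fn send_request(attempt: u32) -> Result<&'static str, String> {
    if attempt < 2 {
        Err("dispatch failure: UnexpectedEof while reading response body".to_string())
    } else {
        Ok("200 OK")
    }
}

fn main() {
    let custom_retry_msgs = ["UnexpectedEof", "Timeout"];
    let num_tries = 5;
    for attempt in 0..num_tries {
        match send_request(attempt) {
            Ok(resp) => {
                println!("succeeded on attempt {attempt}: {resp}");
                break;
            }
            // A configured keyword matched: treat the error as transient,
            // back off, and retry.
            Err(e) if custom_retry_msgs.iter().any(|m| e.contains(*m)) => {
                thread::sleep(Duration::from_millis(50 * u64::from(attempt + 1)));
            }
            Err(e) => {
                println!("non-retryable error: {e}");
                break;
            }
        }
    }
}
```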
Notes: we added a retry for the UnexpectedEof error last time, and we hit a new timeout error mentioned in #5043 during UploadPart. We likely cannot enumerate all potential transient errors up front, so it is better to make these error messages configurable so that we do not need to modify code every time another error shows up.
Related Issues
#5043
Checklist
docs/mkdocs.yml navigation