Description
Severity
P1 - Urgent, but non-breaking
Current Behavior
Deeplake 4.x: S3 connectivity timeout when accessing hub://activeloop datasets
Summary
Deeplake 4.x fails to connect to Activeloop-hosted datasets (e.g., hub://activeloop/ffhq) with S3 timeout errors, while the same network environment works perfectly with deeplake 3.x and direct HTTP/curl requests.
Environment
- Deeplake version: 4.4.4 (fails) vs 3.9.52 (works)
- Python version: 3.13.9
- OS: Linux (Ubuntu-based HPC cluster)
- Installation method: pip via uv
Actual Behavior
With deeplake 4.4.4:
[S3] Failed to get bucket region for URL: snark-hub/protected/activeloop/ffhq/
with error: [S3] Network connection error: snark-hub curlCode: 28, Timeout was reached
deeplake._deeplake.LogNotexistsError: Dataset does not exist at path 'hub://activeloop/ffhq/'
Network Diagnostics
I performed extensive network diagnostics to confirm the network is functioning properly:
DNS Resolution ✅
snark-hub.s3.amazonaws.com → 52.217.173.177 (resolves correctly)
s3.amazonaws.com → 52.216.221.208 (resolves correctly)
HTTP Connectivity ✅
curl -s -o /dev/null -w "%{http_code}" https://snark-hub.s3.amazonaws.com
# Returns: 403 (expected for unauthenticated access)
# Response time: 0.15s connect, 0.52s total
Port Connectivity ✅
s3.amazonaws.com:443 - connected successfully
s3.amazonaws.com:80 - connected successfully
Direct S3 Access ✅
All S3 endpoints are reachable with sub-second response times:
- https://snark-hub.s3.amazonaws.com/ → HTTP 403 (0.52s)
- https://snark-hub.s3.us-east-1.amazonaws.com/ → HTTP 403 (0.65s)
- https://snark-hub.s3.us-west-2.amazonaws.com/ → HTTP 301 (0.87s)
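For completeness, the checks above can be reproduced from Python alone. The snippet below is a minimal sketch using only the standard library; the hostnames and URLs are the ones listed in this report, and the timings/status codes will of course vary per run.
# Minimal standard-library sketch reproducing the diagnostics above.
import socket
import time
import urllib.error
import urllib.request

hosts = ["snark-hub.s3.amazonaws.com", "s3.amazonaws.com"]
urls = [
    "https://snark-hub.s3.amazonaws.com/",
    "https://snark-hub.s3.us-east-1.amazonaws.com/",
    "https://snark-hub.s3.us-west-2.amazonaws.com/",
]

# DNS resolution
for host in hosts:
    print(host, "->", socket.gethostbyname(host))

# HTTPS reachability: any HTTP status (even 403) proves the endpoint answered;
# only a hang/URLError would match the timeout deeplake 4.x reports.
for url in urls:
    start = time.monotonic()
    try:
        status = urllib.request.urlopen(url, timeout=10).status
    except urllib.error.HTTPError as exc:
        status = exc.code  # e.g. 403 for unauthenticated access
    except urllib.error.URLError as exc:
        status = f"unreachable: {exc.reason}"
    print(f"{url} -> {status} ({time.monotonic() - start:.2f}s)")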
Root Cause Analysis
The network connectivity is fully functional. The issue appears to be in deeplake 4.x's internal Rust-based S3 client, which uses a different connection mechanism than:
- Python's requests library (used by deeplake 3.x)
- The system curl command
The error message reports curlCode: 28 (timeout), yet standard curl requests to the same endpoints complete in well under a second.
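A further cross-check, not performed in the original report but sketched here as a suggestion, is to query the same bucket with boto3 from the same machine. An anonymous head_bucket call is expected to fail with 403 (the bucket is protected), but receiving that answer at all would show that another Python-level S3 client in this environment does not time out the way deeplake 4.x's bundled client does.
# Suggested cross-check (assumes boto3 is installed; not part of deeplake itself).
import boto3
from botocore import UNSIGNED
from botocore.config import Config
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    config=Config(signature_version=UNSIGNED, connect_timeout=5, read_timeout=10),
)
try:
    s3.head_bucket(Bucket="snark-hub")
    print("head_bucket succeeded")
except ClientError as exc:
    # A 403 here still means the request reached S3 and received a response,
    # i.e. there was no connection timeout.
    print("head_bucket answered with:", exc.response["Error"]["Code"])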
Questions
- Does deeplake 4.x's Rust backend have different S3 connection requirements?
- Are there any environment variables or configuration options that might help?
- Is there a known incompatibility with certain network configurations?
Related: This may be related to the API migration from 3.x to 4.x documented at https://docs.deeplake.ai/latest/details/v3_conversion/
Steps to Reproduce
import deeplake
# This fails with deeplake 4.4.4
ds = deeplake.open_read_only('hub://activeloop/ffhq')
Expected/Desired Behavior
The dataset should load successfully, as it does with deeplake 3.x:
import deeplake # version 3.9.52
ds = deeplake.load('hub://activeloop/ffhq', read_only=True)
# SUCCESS: Dataset loads with 70,000 images
Python Version
3.13.9
OS
Ubuntu
IDE
Cursor
Packages
4.4.4
Additional Context
- The dataset exists and is accessible: https://app.activeloop.ai/activeloop/ffhq/
- The ACTIVELOOP_TOKEN environment variable was set correctly
- This was tested on a university HPC cluster (TAU) with standard internet access
- No proxy is required or configured (a quick check of the last two points is sketched below)
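For reference, a minimal, hypothetical sanity check for the token and proxy claims above (not part of deeplake):
import os

print("ACTIVELOOP_TOKEN set:", bool(os.environ.get("ACTIVELOOP_TOKEN")))
# Report any proxy-related variables that happen to be set in the environment.
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy", "NO_PROXY", "no_proxy"):
    if var in os.environ:
        print(f"{var} = {os.environ[var]}")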
Possible Solution
Downgrade to deeplake 3.x:
pip install 'deeplake<4'
Then use the 3.x API:
import deeplake
ds = deeplake.load('hub://activeloop/ffhq', read_only=True)
# Works successfully
Are you willing to submit a PR?
- I'm willing to submit a PR (Thank you!)