OOM when using S3TransferManager.downloadFile() #5744

Open
@fivetran-aakashtiwari

Description

Describe the bug

When we download multiple files concurrently (around 50 files of about 500 MB each), we run into an OOM error. Even after reducing the MaxConcurrency of the S3 client to as low as 5, we still hit the issue.
Heap dump: [screenshot, 2024-12-06 12:45 PM]

Expected Behavior

S3TransferManager should work regardless of how many files are downloaded or how large each file is.

Current Behavior

Some threads fail with an OOM error, so some files are not fully downloaded:

Exception in thread "AwsEventLoop 3" Exception in thread "AwsEventLoop 1" Exception in thread "AwsEventLoop 7" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
09:10:49.954 [AwsEventLoop 5] WARN software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - Transfer failed.
software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: OutOfMemoryError has been raised from JVM.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.handleError(S3CrtResponseHandlerAdapter.java:165)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onFinished(S3CrtResponseHandlerAdapter.java:129)
	at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onFinished(S3MetaRequestResponseHandlerNativeAdapter.java:25)
Exception in thread "AwsEventLoop 1" java.lang.OutOfMemoryError: Java heap space
09:10:50.038 [Thread-2] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.218 [Thread-31] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:50.298 [Thread-14] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.348 [Thread-34] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=                   | 5.0%
09:10:50.368 [Thread-30] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.388 [Thread-17] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====================| 100.0%
09:10:50.408 [Thread-36] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.408 [Thread-28] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.478 [Thread-23] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.498 [Thread-19] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.498 [Thread-39] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.518 [Thread-20] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====================| 100.0%
09:10:50.528 [Thread-35] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:50.625 [sdk-async-response-0-0] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - Transfer complete!
09:10:50.702 [AwsEventLoop 5] WARN software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - Transfer failed.
software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: OutOfMemoryError has been raised from JVM.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.handleError(S3CrtResponseHandlerAdapter.java:165)
	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onFinished(S3CrtResponseHandlerAdapter.java:129)
	at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onFinished(S3MetaRequestResponseHandlerNativeAdapter.java:25)
09:10:51.398 [Thread-9] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.478 [Thread-34] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.508 [Thread-25] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=================== | 95.0%
09:10:51.518 [Thread-5] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |===                 | 15.0%
09:10:51.548 [Thread-30] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.548 [Thread-36] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=================   | 85.0%
09:10:51.568 [Thread-13] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.568 [Thread-8] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |=====               | 25.0%
09:10:51.598 [Thread-23] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.628 [Thread-19] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%
09:10:51.648 [Thread-21] INFO software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener - |====                | 20.0%

I have also attached the full stack trace:
StackTrace.txt

Reproduction Steps

// setting up the client
this.s3AsyncClient =
        S3AsyncClient.crtBuilder()
                .credentialsProvider(StaticCredentialsProvider.create(getCreds()))
                .region(Region.of("us-east-1"))
                .build();
this.s3TransferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build();

//downloading multiple files concurrently 
   protected void downloadWithRetry(String s3Uri, File downloadDir, AtomicLong syncCallDuration, int numFiles) {
        List<CompletableFuture<CompletedFileDownload>> futures = new ArrayList<>();
        String s3Object = extractObjectLocation(s3Uri);
        System.out.println("Downloading " + s3Object + " s3 uri" + s3Uri);
        String[] tokens = s3Object.split("/");
        String fileName = tokens[tokens.length - 1];
        for (int i = 0; i < numFiles; i++) {
            File file = new File(downloadDir, String.valueOf(i) + fileName);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                DownloadFileRequest downloadFileRequest =
                        DownloadFileRequest.builder()
                                .getObjectRequest(b -> b.bucket(s3BucketName).key(s3Object))
                                .addTransferListener(LoggingTransferListener.create())
                                .destination(file)
                                .build();
                var completableFuture = s3TransferManager.downloadFile(downloadFileRequest).completionFuture();

                // force syncing file
                long start = currentTimeMillis();
                fos.getFD().sync();
                syncCallDuration.addAndGet(currentTimeMillis() - start);
                futures.add(completableFuture);
            } catch (IOException e) {
                System.out.println("SEVERE : Failed in download and retry: error : " + e.getMessage());
                throw new UncheckedIOException(e);
            } catch (Exception e) {
                System.out.println("Identified exception during download of file" + e.getMessage());
                throw e;
            }
        }
        AtomicInteger num = new AtomicInteger();
        futures.forEach(
                future -> {
                    try {
                        future.join();
                    } catch (CompletionException e) {
                        System.out.println("CompletionException for : " + num.get() + e.getMessage());
                    } catch (Exception e) {
                        System.out.println("Exception in future.join(): " + num.get() + e.getMessage());
                    }
                    num.getAndIncrement();
                });
    }

I have also attached a link to the dummy file we used to reproduce the issue.
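
The loop in the reproduction starts all numFiles transfers before joining any of them. As a possible application-side mitigation (not something we have verified against this OOM), the number of transfers in flight could be capped with a semaphore. This is only a sketch using the JDK: the class BoundedTransfers and its method names are hypothetical, and the supplier stands in for a call such as s3TransferManager.downloadFile(request).completionFuture().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: limits how many asynchronous transfers run at once. */
public final class BoundedTransfers {
    private final Semaphore permits;

    public BoundedTransfers(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until a permit is free, then starts the transfer. */
    public <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> startTransfer) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = startTransfer.get();
        } catch (RuntimeException e) {
            // The transfer never started, so give the permit back.
            permits.release();
            throw e;
        }
        // Release the permit when the transfer finishes, whether it succeeded or failed.
        return future.whenComplete((result, error) -> permits.release());
    }

    public int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        BoundedTransfers limiter = new BoundedTransfers(2);
        // Stand-in for a real download: a future that is already complete.
        String result = limiter.<String>submit(() -> CompletableFuture.completedFuture("done")).join();
        System.out.println(result + ", permits free: " + limiter.availablePermits());
    }
}
```

In the reproduction, the call inside the for loop would become something like limiter.submit(() -> s3TransferManager.downloadFile(downloadFileRequest).completionFuture()), with the returned future added to the futures list as before. This bounds how many files are being downloaded at once, not how many parts the client buffers per file.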

Possible Solution

No response

Additional Information/Context

Initially we ran without any extra configuration and hit the issue. I then set MaxConcurrency on the client, but even with the value reduced to as low as 5 we still got the OOM error. It works fine when we reduce the concurrency to 1 or 2, but performance drops a lot in that case. This is how we set MaxConcurrency:

this.s3AsyncClient =
        S3AsyncClient.crtBuilder()
                .maxConcurrency(5)
                .credentialsProvider(StaticCredentialsProvider.create(getCreds()))
                .region(Region.of("us-east-1"))
                .build();

Then we reduced minimumPartSizeInBytes from the default 8 MB to 1 MB. With that setting I did not get any OOM error, but performance is lower. This is how we set minimumPartSizeInBytes:

this.s3AsyncClient =
        S3AsyncClient.crtBuilder()
                .minimumPartSizeInBytes(1 * 1024 * 1024L)
                .credentialsProvider(StaticCredentialsProvider.create(getCreds()))
                .region(Region.of("us-east-1"))
                .build();

In the heap dump snapshot, most of the memory (approximately 99%) is consumed by the Byte[] class. We suspect the issue arises because, during the multipart download of a large file, the file is split into multiple smaller parts that are all held in memory, leading to excessive memory usage. That would explain why we no longer see the issue after reducing the minimum part size from 8 MB to 1 MB.
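
To make that suspicion concrete, here is a rough back-of-the-envelope model. It is a sketch of our hypothesis, not a description of how the SDK actually buffers: the class BufferEstimate is hypothetical, and the figure of 200 parts held in memory at once is an assumed number chosen purely for illustration. Under this model, the buffered heap scales linearly with the part size for a fixed number of buffered parts.

```java
/** Hypothetical helper: rough estimate of heap held by buffered download parts. */
public final class BufferEstimate {
    private BufferEstimate() {}

    /** Number of parts needed to cover a file, rounding up. */
    public static long partsPerFile(long fileSizeBytes, long partSizeBytes) {
        return (fileSizeBytes + partSizeBytes - 1) / partSizeBytes;
    }

    /** Heap consumed if the given number of parts is buffered at the same time. */
    public static long bufferedBytes(long bufferedParts, long partSizeBytes) {
        return Math.multiplyExact(bufferedParts, partSizeBytes);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long fileSize = 500 * mib;   // approximate file size from this report
        long bufferedParts = 200;    // assumption: parts held in memory at once

        // Compare the default 8 MiB part size with the reduced 1 MiB part size.
        for (long partSize : new long[] {8 * mib, 1 * mib}) {
            System.out.printf(
                    "part size %d MiB: %d parts per file, %d MiB buffered%n",
                    partSize / mib,
                    partsPerFile(fileSize, partSize),
                    bufferedBytes(bufferedParts, partSize) / mib);
        }
    }
}
```

If the number of buffered parts stays roughly constant, going from 8 MB to 1 MB parts cuts the buffered heap by a factor of 8, which is consistent with the OOM disappearing at the smaller part size.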

AWS Java SDK version used

2.25.64

JDK version used

openjdk version "17.0.13" 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-17.0.13.11.1 (build 17.0.13+11-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.13.11.1 (build 17.0.13+11-LTS, mixed mode, sharing)

Operating System and version

Operating System: Amazon Linux, EC2 instance: c7gd.2xlarge

Labels

bug (This issue is a bug.), p2 (This is a standard priority issue)
