
Long running SnapStart Lambda runs out of ephemeral storage #876

Open
@steven-aerts


Describe the bug

AWS SnapStart has a fixed ephemeral storage size of 512 MB in /tmp. We see long-running Lambdas using the aws-crt-java library running out of ephemeral storage space:

Unable to unpack AWS CRT lib: java.io.IOException: No space left on device
java.io.IOException: No space left on device
at java.base/java.io.FileOutputStream.writeBytes(Native Method)
at java.base/java.io.FileOutputStream.write(Unknown Source)
at software.amazon.awssdk.crt.internal.ExtractLib.extractLibrary(ExtractLib.java:63)
at software.amazon.awssdk.crt.CRT.extractAndLoadLibrary(CRT.java:310)
at software.amazon.awssdk.crt.CRT.loadLibraryFromJar(CRT.java:330)
at software.amazon.awssdk.crt.CRT.<clinit>(CRT.java:50)
at software.amazon.awssdk.crt.CrtResource.<clinit>(CrtResource.java:104)
at software.amazon.awssdk.http.crt.AwsCrtHttpClientBase.<init>(AwsCrtHttpClientBase.java:77)

The reason for that is that SnapStart does not honor the deleteOnExit() call necessary to clean up the shared objects which the aws-crt-java library extracts into the ephemeral storage.

Every time AWS SnapStart re-initializes the Lambda, another copy of the shared library leaks, chipping away roughly 2 MB of the ephemeral storage.
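For illustration, the sketch below shows the general shape of the pattern described above. It is a simplified stand-in, not the actual ExtractLib code; the class name is made up, and it only demonstrates why relying on deleteOnExit() leaks under SnapStart:

```java
// Simplified illustration (not the actual ExtractLib implementation) of the
// extraction pattern the issue describes: the native library is copied into /tmp
// and cleanup relies on File.deleteOnExit(), i.e. on a JVM shutdown hook that a
// SnapStart re-initialization never runs, so each restore leaves another ~2 MB
// copy behind.
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

final class DeleteOnExitPattern {

    static void extractAndLoad(String resourceName) throws IOException {
        File tmp = File.createTempFile("aws-crt-", ".so");
        tmp.deleteOnExit(); // only removed on a normal JVM exit, which SnapStart skips
        try (InputStream in = DeleteOnExitPattern.class
                .getResourceAsStream("/" + resourceName)) {
            if (in == null) {
                throw new IOException("Missing native library: " + resourceName);
            }
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        System.load(tmp.getAbsolutePath());
    }

    private DeleteOnExitPattern() {}
}
```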

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The library does not leak extracted shared objects when AWS SnapStart re-initializes the function.

Current Behavior

A typical example run showing how INIT_REPORT events and timeouts gradually fill the ephemeral storage:

Step 1: A typical request in SnapStart takes 12 ms:

01:25:01.151 | START RequestId: 0axxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx9e Version: 61
01:25:01.163 | END RequestId: 0axxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx9e
01:25:01.163 | REPORT RequestId: 0axxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx9e Duration: 12.18 ms Billed Duration: 13 ms Memory Size: 1024 MB Max Memory Used: 288 MB

Step 2: After it finishes, nothing happens for 96 seconds.
We are then greeted with a trace from our Java constructor, telling us that the Lambda has been re-initialized, something we normally do not see for a Lambda.
AWS Support told us that `INIT_REPORT` points to a software update or runtime optimization step re-initializing/re-snapshotting the SnapStart environment.

01:26:53.004 | [main] INFO  c.t.w.c.j.Lambda - Discovered region for bucket xxxxxxxxx: us-east-1

Step 3: The INIT_REPORT step takes 6 seconds, which is the timeout period of this Lambda:

01:26:48.441 | INIT_REPORT Init Duration: 6006.06 ms
01:26:48.441 | START RequestId: 6dxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxd9 Version: 61

Step 4: This gives the Lambda 34 ms to wake up before it times out again (by the way, INIT_REPORT takes 2 s longer than the billed timeout):

01:26:48.475 | 01:26:48.474Z 6dxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxd9 Task timed out after 6.04 seconds
01:26:48.475 | END RequestId: 6dxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxd9
01:26:48.475 | REPORT RequestId: 6dxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxd9 Duration: 6039.30 ms Billed Duration: 6000 ms Memory Size: 1024 MB Max Memory Used: 150 MB

Step 5: The above failure is repeated twice, hitting the final error condition:

01:38:11.064 | REPORT RequestId: 24xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx73 Duration: 6025.82 ms Billed Duration: 6000 ms Memory Size: 1024 MB Max Memory Used: 147 MB
01:38:13.781 | Unable to unpack AWS CRT lib: java.io.IOException: No space left on device
01:38:13.903 | INIT_REPORT Init Duration: 2809.79 ms Phase: invoke Status: error Error Type: Runtime.BadFunctionCode
01:38:13.904 | START RequestId: 69xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx9e Version: 61
01:38:13.904 | Unknown application error occurred Runtime.BadFunctionCode

Reproduction Steps

We identified two scenarios in which AWS SnapStart might re-initialize a Lambda (detectable via the INIT_REPORT trace in the Lambda logs):

  • the Lambda execution times out (visible in Step 4 in the traces above)
  • AWS SnapStart decides on its own to restart the Lambda, either for an upgrade or for an optimization that takes a new snapshot (visible in Step 2 in the traces above)

To reproduce the scenario more rapidly, you can implement a SnapStart Lambda with a sporadic timeout that emulates this behavior; a minimal sketch of such a handler follows.
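This sketch assumes the standard aws-lambda-java-core RequestHandler interface and the CRT-based SDK HTTP client; the class name and the 1-in-10 timeout probability are made up for illustration:

```java
// Hypothetical reproduction handler: a SnapStart-enabled Lambda that uses the
// CRT-based HTTP client and sporadically sleeps past its configured timeout,
// forcing SnapStart to re-initialize the function and extract the .so again.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.http.crt.AwsCrtHttpClient;

public class LeakReproHandler implements RequestHandler<Object, String> {

    // Building the CRT-based HTTP client during (re-)initialization forces the
    // native library to be extracted into /tmp, as in the stack trace above.
    private final SdkHttpClient crtClient = AwsCrtHttpClient.builder().build();

    @Override
    public String handleRequest(Object input, Context context) {
        // Roughly 1 in 10 invocations deliberately sleeps past the function
        // timeout (configure e.g. 6 seconds), so the runtime is torn down and
        // SnapStart restores a fresh copy, extracting the shared object once more.
        if (Math.random() < 0.1) {
            try {
                Thread.sleep(context.getRemainingTimeInMillis() + 1_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return "ok";
    }
}
```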

Possible Solution

In our Lambdas we have already reduced the likelihood of this happening by increasing timeouts, as Lambda timeouts are one source of the leak, but we can still see the ephemeral disk not being cleaned up.

Possible solutions might be:

  • Make the AWS SnapStart environment honor deleteOnExit() for timeouts, updates, or any other situation that triggers an INIT_REPORT.
  • Update the aws-crt-java library to clean up old libraries, like it already does for Windows.
  • Update the aws-crt-java library to reuse a previously unpacked shared object.
  • Update the aws-crt-java library to delete the shared object on Linux/Unix directly after it is loaded. This removes the filename, but the shared object stays accessible to the Java process through its open file descriptor (see the sketch after this list).
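A minimal sketch of that last option. It is a simplified stand-in under the assumption that the library is shipped as a classpath resource; the class and method names are made up for illustration and are not the actual ExtractLib implementation:

```java
// Sketch of "delete immediately after load" on Linux/Unix: once System.load()
// has mapped the shared object, the directory entry can be unlinked; the process
// keeps the mapping alive, and nothing is left in /tmp for SnapStart restores
// to accumulate.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ExtractAndLoadSketch {

    static void extractAndLoad(String resourceName) throws IOException {
        Path tmp = Files.createTempFile("aws-crt-", ".so");
        try (InputStream in = ExtractAndLoadSketch.class
                .getResourceAsStream("/" + resourceName)) {
            if (in == null) {
                throw new IOException("Native library not found on classpath: " + resourceName);
            }
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        // Load first: the JVM now holds the mapping via its own file descriptor.
        System.load(tmp.toAbsolutePath().toString());
        // Unlink right away instead of relying on deleteOnExit(), which SnapStart
        // re-initialization never runs. On Linux/Unix the loaded library stays usable.
        Files.deleteIfExists(tmp);
    }

    private ExtractAndLoadSketch() {}
}
```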

Additional Information/Context

More detailed traces and logs can also be found in AWS support case 174136266000566.

aws-crt-java version used

0.33.9 (AWS SDK v2 2.30.15)

Java version used

java21

Operating System and version

latest lambda arm64 java21 runtime
