
Conversation

@cdhermann

File rafFile = Paths.get(
    model.getConfig().workingDirectory().get().toString(),
    pageCtx.session.toString() + "-" + pageId + ".page"
).toFile();
rafFile.deleteOnExit();
Owner

What about the case where you want to keep the cache?

Author

I must admit, I was focused on my specific use case, which involved numerous short-lived sessions that rapidly consumed all available disk space. I hadn’t considered restarting the application and resuming a session using its session ID.

@tjake
Owner

tjake commented Feb 22, 2025

Thanks for this. I wonder if the KV cache should be marked ephemeral when the model is created? Otherwise you can never keep the cache around long term (say you want to store threads of different conversations to go back to)
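
One way to read that, as a sketch only: the extra ephemeralKvCache parameter below is hypothetical, not the current Jlama API.

// Hypothetical: choose the cache lifetime when the model is created.
AbstractModel model = ModelSupport.loadModel(
    localModelPath, workingMemory, workingQuantization,
    /* ephemeralKvCache = */ true);

// ephemeralKvCache = true  -> page files are deleted on close/exit
// ephemeralKvCache = false -> page files survive restarts, so stored
//                             conversation threads can be resumed later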

@cdhermann
Author

cdhermann commented Feb 22, 2025

> Thanks for this. I wonder if the KV cache should be marked ephemeral when the model is created? Otherwise you can never keep the cache around long term (say you want to store threads of different conversations to go back to)

Perhaps it's possible to achieve the best of both worlds by explicitly modeling the concept of a session: short-lived sessions that can be deleted, and long-lived sessions that can be resumed.

E.g., something like this:

import java.util.UUID;

/**
 * Represents a session with a unique ID and persistence setting.
 */
public record Session(UUID sessionId, boolean persistent) {

    /**
     * Creates a persistent session with the provided session ID.
     * 
     * <p>
     * This session can be resumed even after the program exits.
     * </p>
     */
    public Session(UUID sessionId) {
        this(sessionId, true);
    }

    /**
     * Creates an ephemeral session with a new random session ID.
     * 
     * <p>
     * All resources are freed when the session is closed.
     * This session cannot be resumed later.
     * </p>
     */
    public Session() {
        this(UUID.randomUUID(), false);
    }
}

....

AbstractModel model = ModelSupport.loadModel(localModelPath, workingMemory, workingQuantization);

// Creates an ephemeral session
Session session = new Session();

Generator.Response response = model.generate(session, ctx, 0.1f, 1024, (s, f) -> {
    // Handle generation callback
});

/*
 * Closes the given session:
 * - Persistent sessions: No deletion of the temporary files
 * - Ephemeral sessions: Deletes temporary files and marks them for deletion on exit
 */
model.close(session);
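
A close() along those lines could dispatch on the persistence flag. A minimal sketch, assuming a kvBufferCache map and a pageFiles() accessor that don't exist in the current code:

public void close(Session session) {
    KvBuffer buffer = kvBufferCache.remove(session.sessionId());
    if (buffer == null) {
        return; // nothing cached for this session
    }
    buffer.close(); // release channels and mapped buffers in both cases
    if (!session.persistent()) {
        // Ephemeral session: remove the backing ".page" files right away.
        for (File pageFile : buffer.pageFiles()) {
            if (!pageFile.delete()) {
                pageFile.deleteOnExit(); // fallback if the file is still mapped
            }
        }
    }
}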

@cdhermann
Author

Since Jlama provides LangChain4j integration, the expectations of LangChain4j users should also be considered. Based on my understanding of the LangChain4j chat memory documentation, there is no default persistence. However, I must admit that I haven't explored the LangChain4j integration and its usage in depth yet.
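
For what it's worth, LangChain4j's chat memory is backed by an in-process store unless a custom ChatMemoryStore is plugged in, so it is ephemeral by default. A sketch against the LangChain4j builder API (version details may differ):

import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.store.memory.chat.InMemoryChatMemoryStore;

class MemoryDefaults {
    static ChatMemory ephemeralMemory() {
        // InMemoryChatMemoryStore is the default store: nothing survives a
        // restart unless a persistent ChatMemoryStore implementation is supplied.
        return MessageWindowChatMemory.builder()
                .id("conversation-1")
                .maxMessages(10)
                .chatMemoryStore(new InMemoryChatMemoryStore())
                .build();
    }
}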

@tjake
Owner

tjake commented Feb 22, 2025

> Based on my understanding of the LangChain4j chat memory documentation, there is no default persistence.

Correct, in this case it would always be ephemeral. But for Jlama I want to handle stored sessions. I can take a crack at fixing this based on your PR!

@tjake tjake requested a review from Copilot May 1, 2025 18:05

Copilot AI left a comment


Pull Request Overview

This PR enables closing sessions to free associated resources and fixes issue #140 by improving resource management and temporary file cleanup.

  • Adds a session-specific close method in KvBufferCache and AbstractModel.
  • Enhances resource cleanup in KvBufferPage by closing file channels, deleting temporary files, and nullifying buffer references.
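
A sketch of what that cleanup could look like inside KvBufferPage; the field names below are assumptions, not the actual diff:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

class KvBufferPageCleanupSketch {
    private RandomAccessFile raf; // backs the memory-mapped page
    private Object page;          // the mapped buffer
    private File rafFile;         // the temporary ".page" file on disk

    public void close() throws IOException {
        if (raf != null) {
            raf.close(); // also closes the underlying file channel
            raf = null;
        }
        page = null;     // drop the buffer reference so it can be GC'd
        if (rafFile != null) {
            Files.deleteIfExists(rafFile.toPath()); // delete now, not only on exit
            rafFile = null;
        }
    }
}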

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File: jlama-core/src/main/java/com/github/tjake/jlama/tensor/KvBufferCache.java
    Introduces a new close(UUID sessionId) method and refines resource cleanup in KvBufferPage.
File: jlama-core/src/main/java/com/github/tjake/jlama/model/AbstractModel.java
    Exposes session-specific resource cleanup via the new close(UUID sessionId) method.
Comments suppressed due to low confidence (1)

jlama-core/src/main/java/com/github/tjake/jlama/tensor/KvBufferCache.java:208

  • [nitpick] Consider catching more specific exceptions rather than a generic Exception when deleting the temporary file to avoid masking important issues.
try { Files.delete(rafFile.toPath()); } catch (Exception e) {
}

public void close(UUID sessionId) {
    KvBuffer buffer = kvBufferCache.get(sessionId);

Copilot AI May 1, 2025


The method close(UUID sessionId) does not check if kvBufferCache.get(sessionId) returns null, which can lead to a NullPointerException if the session ID is not tracked. Consider adding a null check before calling close() on the buffer.

Suggested change
- KvBuffer buffer = kvBufferCache.get(sessionId);
+ KvBuffer buffer = kvBufferCache.get(sessionId);
+ if (buffer == null) {
+     logger.warn("Attempted to close a non-existent KvBuffer for sessionId: {}", sessionId);
+     return;
+ }

@DumiJDev
Contributor

@tjake, I think we can have the best of both worlds (ephemeral and persistent) by giving the user the power to choose what they prefer.

@tjake
Owner

tjake commented Sep 13, 2025

Yes, agreed

@edwardcapriolo
Contributor

If you take vLLM, they have a shared KV cache. Users are encouraged to set a cache_salt if they want to ensure people can't "guess prompts" by looking at the timings of requests in multi-user environments. There is no concept of a user here; the generate call gets a new UUID each time. I think it becomes important to tie each request to a time; sharing the cache IS a good thing. If three people are working on the same problem, they can share a pre-shared SHA salt with each other. Since it is a cache, expiring by time and volume makes the most sense to me. Simply put, you don't want it to grow boundlessly big; everything else is a per-use-case optimization.

@edwardcapriolo
Contributor

edwardcapriolo commented Oct 23, 2025

Take a look at this: "cacheSalt". The idea here is that in multi-user envs I can "guess" other prompts by looking at the timings of the response.

edwardcapriolo/deliverance#6

We all share the cache, which makes sense; when we don't want to, we use a "cache_salt" and the cache is private to those who know the SHA.

https://docs.vllm.ai/en/stable/design/prefix_caching.html
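
A minimal sketch of the salted-key idea (the helper is hypothetical, not from vLLM or Jlama):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

class SaltedCacheKey {
    // Same salt + same prefix -> same key, so collaborators who share the salt
    // share cache hits; everyone else works in an unrelated key space.
    static String cacheKey(String cacheSalt, String promptPrefix) throws NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        sha.update(cacheSalt.getBytes(StandardCharsets.UTF_8));
        sha.update(promptPrefix.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(sha.digest());
    }
}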

My next work is to run a background thread to clean up old entries. We can expire by age or even size.
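
A sketch of that sweeper, assuming a ConcurrentHashMap keyed by cache key with a last-access timestamp per entry (all names hypothetical):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CacheSweeper {
    record Entry(Object kvBuffer, long lastAccessMillis) {}

    static ScheduledExecutorService start(ConcurrentHashMap<String, Entry> cache, long maxAgeMillis) {
        ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor();
        // Once a minute, drop entries that have not been touched within the TTL;
        // removeIf on a ConcurrentHashMap view is safe alongside readers.
        sweeper.scheduleAtFixedRate(() -> {
            long cutoff = System.currentTimeMillis() - maxAgeMillis;
            cache.entrySet().removeIf(e -> e.getValue().lastAccessMillis() < cutoff);
        }, 1, 1, TimeUnit.MINUTES);
        return sweeper;
    }
}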

@edwardcapriolo
Contributor

A further improvement, a dedicated KV cache:
https://github.com/edwardcapriolo/deliverance/pull/new/dedicated_kv



Development

Successfully merging this pull request may close these issues.

A lot of temp files are created but not deleted
