Skip to content

feat(phase-07): server lifecycle — wire JNA/Panama to AbstractChromaRuntime#79

Merged
tazarov merged 20 commits into
mainfrom
feature/phase-7-server-lifecycle
Mar 26, 2026
Merged

feat(phase-07): server lifecycle — wire JNA/Panama to AbstractChromaRuntime#79
tazarov merged 20 commits into
mainfrom
feature/phase-7-server-lifecycle

Conversation

@tazarov

@tazarov tazarov commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Phase 7: Server Lifecycle
Goal: Users can start a Chroma server from Java, retrieve connection details, and cleanly shut it down using try-with-resources in both JNA and Panama backends.
Verification: 9/9 must-haves passed

Retrofits both JnaChromaRuntime and PanamaChromaRuntime to extend AbstractChromaRuntime, centralizing FFI lock serialization, lifecycle management, and error handling into template methods. Adds server lifecycle integration tests for both backends.

Changes

Plan 01: AbstractChromaRuntime Retrofit

  • JNA and Panama runtimes now extend AbstractChromaRuntime instead of implementing ChromaRuntime directly
  • Template methods (callFfiHandle, callFfiInt, callFfiVoid, callFfiBorrowedString, callFfiJson, callFfiFree) centralize lock acquisition, null-handle checking, and error propagation
  • Lifecycle (closed/ensureOpen/close/doClose) lives in the base class with rollback on doClose failure
  • version(), startEmbedded(), startServer() are final template methods with shared config validation
  • Panama MethodHandle fields grouped into private Ffi record (12-param constructor → 2)

Plan 02: Server Lifecycle Integration Tests

  • JnaServerLifecycleTest (6 tests): full accessor verification, double-close idempotency, post-close rejection, null/empty/malformed config
  • PanamaServerLifecycleTest (6 tests): mirrors JNA coverage

Hardening (from 4-round PR review)

  • Fixed readOwnedString exception masking with suppressed exception pattern
  • Fixed readLastError returning synthetic string → returns null (callers have better fallbacks)
  • Fixed EmbeddedSession.close() rollback enabling double-free → stays closed on failure
  • Fixed callFfiVoid stale error slot leak → drains slot in catch path
  • Fixed JNA serverFree/embeddedFree missing error wrapping (regression)
  • Fixed Panama serverFree/embeddedFree bypassing FFI_LOCK (thread-safety)
  • Added ffiLock()/ffiUnlock() helpers for Panama's invokeExact workaround
  • Added validateLibraryPath() to deduplicate init() validation

Key files modified:

  • java/core/src/main/java/.../AbstractChromaRuntime.java
  • java/jna/src/main/java/.../JnaChromaRuntime.java
  • java/panama/src/main/java/.../PanamaChromaRuntime.java
  • java/core/src/main/java/.../EmbeddedSession.java

Key files created:

  • java/jna/src/test/.../JnaServerLifecycleTest.java
  • java/panama/src/test/.../PanamaServerLifecycleTest.java

Test plan

  • core:test — unit tests for AbstractChromaRuntime (lifecycle, callFfiInt, close rollback, FFI serialization)
  • jna:test — JNA integration tests with real native library
  • panama:test — Panama integration tests with real native library
  • 4-round automated PR review (code, errors, comments, types) — all clean

tazarov added 18 commits March 26, 2026 09:37
Two plans in 2 waves: Plan 01 retrofits JNA and Panama backends to
extend AbstractChromaRuntime, Plan 02 adds integration tests covering
the full server lifecycle error matrix.
- Replace `implements ChromaRuntime` with `extends AbstractChromaRuntime`
- Implement readBorrowedString, readOwnedString, readLastError abstract methods
- Route all FFI calls through callFfiHandle, callFfiBorrowedString, callFfiVoid
- Wire ServerSession with method-reference callbacks for port, address, persistPath
- Remove old lastError(String) helper and verbose try-catch blocks
…time

- Replace `implements ChromaRuntime` with `extends AbstractChromaRuntime`
- Implement readBorrowedString, readOwnedString, readLastError abstract methods
- Route all FFI calls through callFfiHandle, callFfiBorrowedString, callFfiVoid
- Wire ServerSession with method-reference callbacks for port, address, persistPath
- Preserve Windows arena close workaround
- Remove old lastError(String) helper and verbose try-catch blocks
- Update AbstractChromaRuntime comment to reflect wired state
- Add 07-01-PLAN.md and 07-01-SUMMARY.md
- Update STATE.md with phase 7 position and decisions
- Update ROADMAP.md progress for phase 7
…fecycle

# Conflicts:
#	.planning/ROADMAP.md
#	.planning/STATE.md
- Restore 07-01-PLAN.md, 07-01-SUMMARY.md, 07-02-PLAN.md
- Restore context, research, validation, and discussion log files
- Files were lost during merge conflict resolution
- 6 test methods per backend covering full error matrix
- Happy path: start, accessors, stop, close with ServerConfigBuilder
- Double close idempotency via AtomicBoolean CAS
- Accessors throw IllegalStateException after close
- Null, empty, and malformed config rejection
- Ephemeral ports via findFreePort(), @tempdir for isolation
- Add 07-02-SUMMARY.md with self-check
- Update STATE.md with phase 7 position and decisions
- Update ROADMAP.md progress for phase 7 (2/2 plans complete)
…fecycle

# Conflicts:
#	.planning/ROADMAP.md
#	.planning/STATE.md
…o AbstractChromaRuntime

- Lift closed/ensureOpen/close into AbstractChromaRuntime with doClose template
- Add callFfiInt for int-returning FFI calls (serverPort), callFfiFree for
  free operations, ffiLock/ffiUnlock for Panama invokeExact workaround
- Make version/startEmbedded/startServer final template methods with shared
  config validation
- Fix readOwnedString to use suppressed exception pattern instead of masking
- Fix readLastError to return null on failure instead of synthetic string
- Fix EmbeddedSession.close() to not rollback closed flag (prevents double-free)
- Drain error slot in callFfiVoid catch path to prevent stale error leaking
- Add error wrapping to JNA serverFree/embeddedFree (regression fix)
- Acquire FFI_LOCK in Panama serverFree/embeddedFree (thread-safety fix)
- Group Panama MethodHandle fields into private Ffi record
- Add validateLibraryPath static helper to eliminate init() duplication
- Add lifecycle, callFfiInt, and close-rollback tests
- Add startServer-after-close coverage to both runtime test suites
@tazarov

tazarov commented Mar 26, 2026

Copy link
Copy Markdown
Contributor Author

@claude PTAL

@claude

claude Bot commented Mar 26, 2026

Copy link
Copy Markdown

Claude finished @tazarov's task in 2m 49s —— View job


Code Review — Phase 7: Server Lifecycle

  • Gather context and understand the request
  • Read key changed files
  • Analyze AbstractChromaRuntime changes
  • Analyze JNA runtime changes
  • Analyze Panama runtime changes
  • Analyze test files and finalize review
  • Post final review

Overall the retrofit is well-executed. The template method pattern cleanly centralizes FFI lock serialization, error handling, and lifecycle management. The hardening fixes from the 4-round review are all correctly applied. A few observations below, ranging from a potential bug to minor style notes.


callFfiVoid loses native error details when lambda throws early

In JnaChromaRuntime.serverStop (and the Panama equivalent), the lambda does its own rc != 0 check and throws before callFfiVoid can call readLastError():

// JnaChromaRuntime.java:109-113
callFfiVoid(() -> {
    int rc = bindings.chroma_server_stop(new Pointer(handle));
    if (rc != 0) {
        throw new ChromaException("server stop failed (rc=" + rc + ")");  // throws here
    }
});

callFfiVoid's catch block then drains readLastError() but discards it. If the shim set a descriptive error message (e.g. "bind failed: address already in use") in the error slot before returning the non-zero rc, that message is silently dropped and the caller only sees "server stop failed (rc=-1)".

This is a diagnostic loss rather than a correctness bug, but it's worth considering using callFfiInt here instead, which naturally surfaces the error message:

// Alternative: surfaces native error message automatically
private void serverStop(long handle) {
    if (handle == 0L) return;
    callFfiInt(() -> bindings.chroma_server_stop(new Pointer(handle)));
}

Fix this →


url() has a compound open-check (minor)

ServerSession.url() calls address() then port(), each with their own ensureOpen(). A concurrent close() between the two calls would cause address() to succeed and port() to throw IllegalStateException. In practice this is only reachable if sessions are shared across threads without external synchronization, which goes against the expected usage pattern — but worth noting for the session-sharing doc if it ever gets written.


Test code duplication — JnaServerLifecycleTest vs PanamaServerLifecycleTest

The six tests are character-for-character identical except for the runtime type. A shared abstract test base would halve the maintenance surface:

abstract class ServerLifecycleTestBase {
    abstract AutoCloseable createRuntime(String libPath);
    // shared @Test methods here
}
class JnaServerLifecycleTest extends ServerLifecycleTestBase { ... }
class PanamaServerLifecycleTest extends ServerLifecycleTestBase { ... }

Not blocking, but Phase 8 will likely add more lifecycle tests.


findFreePort() TOCTOU (informational)

Both test files use the classic ServerSocket(0) trick to find a free port. The port is released before Chroma claims it, so another process can race to bind it first. This is widely accepted in integration test suites — just noting it for awareness.


Correct design decisions (explicitly calling these out)

  • EmbeddedSession.close() stays closed on failure (no rollback) — correct; the native handle is in an unknown state after a failed free, and retrying risks a double-free.
  • AbstractChromaRuntime.close() rolls back on doClose() failure — correct; doClose() throws when sessions are still open (Panama Arena constraint), and the caller should be able to close those sessions and retry.
  • Panama serverFree/embeddedFree use ffiLock()/ffiUnlock() directly — the invokeExact signature-polymorphic constraint is well-documented; this is the right workaround.
  • readLastError() returns null instead of a synthetic string — callers all have fallback messages, this avoids cascading misleading errors.
  • Suppressed exception pattern in readOwnedString — correctly prioritizes the read exception as primary, attaches free failure as suppressed.

Verdict: Looks good to merge. The diagnostic loss in serverStop is the only item worth addressing before merge; the rest are observations or future-phase work.

tazarov added 2 commits March 26, 2026 12:41
serverStop was using callFfiVoid with an inner rc check that threw before
readLastError could run, discarding the native error message. Since the
Rust shim returns negative error codes (SUCCESS=0, errors=-1..-8),
callFfiInt naturally surfaces the native error via readLastError on failure.
Native server may not release SQLite file handles synchronously after
chroma_server_free on Windows, causing @tempdir cleanup to fail with
IOException. Use CleanupMode.NEVER for all tests that start a real
Chroma server — the OS temp cleaner handles eventual cleanup.
@tazarov

tazarov commented Mar 26, 2026

Copy link
Copy Markdown
Contributor Author

@claude

@claude

claude Bot commented Mar 26, 2026

Copy link
Copy Markdown

Claude finished @tazarov's task in 2m 19s —— View job


Code Review — Phase 7: Server Lifecycle

  • Read key changed files
  • Analyze AbstractChromaRuntime
  • Analyze JNA and Panama runtimes
  • Analyze test files
  • Post final review

The prior review's main concern — serverStop losing native error details via callFfiVoid — has been resolved. Both JnaChromaRuntime.serverStop and PanamaChromaRuntime.serverStop now use callFfiInt, which naturally surfaces the shim's error message on failure. Good fix.

One new asymmetry is worth calling out:


JNA readOwnedString lacks suppressed-exception pattern

PanamaChromaRuntime.readOwnedString correctly handles the case where both the read and the free fail — it attaches the free failure as a suppressed exception to the read exception, preserving both:

// PanamaChromaRuntime.java:116-125 — correct
} catch (Throwable t) {
    if (readError != null) {
        readError.addSuppressed(t);   // free failure attached, read failure propagates
    } else {
        throw new ChromaException("failed to free native string", t);
    }
}

JnaChromaRuntime.readOwnedString has no such guard:

// JnaChromaRuntime.java:59-65 — asymmetric
protected String readOwnedString(long address) {
    Pointer ptr = new Pointer(address);
    try {
        return ptr.getString(0);
    } finally {
        bindings.chroma_string_free(ptr);  // if this throws, original exception is silently lost
    }
}

If getString(0) throws (e.g. on invalid encoding) and chroma_string_free also throws in the finally, the JVM replaces the read exception with the free exception. In practice, JNA's Pointer.getString rarely throws, so this is low-likelihood — but it's a correctness asymmetry with Panama that will silently swallow diagnostic information if it does occur.

Fix this →


Remaining observations (unchanged from prior review)

url() compound open-checkServerSession.url() calls address() then port(), each with their own ensureOpen(). A concurrent close() between the two would cause address() to succeed and port() to throw. Low risk in practice since sessions aren't typically shared across threads, but worth noting if session-sharing semantics are ever documented.

Test duplicationJnaServerLifecycleTest and PanamaServerLifecycleTest are character-for-character identical except for the runtime type. An abstract base class would halve maintenance surface for Phase 8 additions.


Confirmed correct design decisions

  • serverStop uses callFfiInt — native error message preserved on failure ✓
  • EmbeddedSession.close() stays closed on failure — no rollback risk of double-free ✓
  • AbstractChromaRuntime.close() rolls back on doClose() failure — caller can retry after resolving open sessions ✓
  • Panama serverFree/embeddedFree use ffiLock()/ffiUnlock() directly — invokeExact constraint well-documented ✓
  • readLastError() returns null instead of synthetic string — callers have better fallbacks ✓
  • Panama readOwnedString suppressed-exception pattern — read exception takes priority ✓
  • callFfiVoid catch path drains error slot — no stale-error leak ✓

Verdict: Looks good to merge. The readOwnedString asymmetry in JNA is the only new item; it's low-likelihood and the PR is otherwise clean. The prior main concern is resolved.

@tazarov tazarov merged commit b5adc08 into main Mar 26, 2026
11 checks passed
@tazarov tazarov deleted the feature/phase-7-server-lifecycle branch March 26, 2026 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant