Skip to content

Latest commit

 

History

History
102 lines (65 loc) · 4.81 KB

File metadata and controls

102 lines (65 loc) · 4.81 KB

ADR-022: Session-Scoped Venvs with Lifecycle Management

Status

Implemented

Date

2026-03-17

Context

The Ansible validator creates Python venvs containing ansible-core (and optionally Ansible collections) to perform plugin introspection (FQCN resolution, redirect detection, deprecation checks). The original implementation used a shared, hash-keyed venv directory under ~/.apme-data/collection-cache/venvs/. This had two problems:

  1. Concurrency bug: build_venv checked for an existing directory and then created it without holding a lock. Two concurrent requests for the same ansible_core_version + collection_specs hash could race, corrupting the shared venv.

  2. No lifecycle management: Venvs accumulated on disk indefinitely. There was no concept of sessions, TTLs, or cleanup. For future features (AI escalation via Abbenay, MCP tools for enterprise collection docstrings), we need venvs that persist for a known session duration and can be queried by other components.

Decision

Replace the shared hash-keyed venv pool with session-scoped venvs managed by VenvSessionManager, owned by the Primary orchestrator and shared read-only with validators.

Each session gets its own isolated directory with:

  • A file lock (.lock) for safe concurrent creation via fcntl.flock()
  • Per-core-version subdirectories with metadata (meta.json) tracking collections, timestamps
  • A venv/ subdirectory within each core-version entry containing the actual virtualenv

Sessions use a client-provided session_id (VS Code workspace hash, CI job ID, stable user name). Within a session, venvs are keyed by ansible_core_version — like tox matrix entries. Collections are installed incrementally (additive, never destructive).

Storage Layout

$SESSIONS_ROOT/
    <session_id>/
        <core_version>/
            venv/           # full virtualenv with ansible-core + collections
            meta.json       # {installed_collections, created_at, last_used_at}
        session.json        # session-level metadata
        .lock               # flock target for creation serialization

Ownership Model

The Primary orchestrator is the sole venv authority (single writer). It calls VenvSessionManager.acquire() before fanning out to validators. Validators mount the sessions volume read-only — they receive a venv_path in ValidateRequest and use it as-is. This eliminates concurrent validator writes and corruption risk.

CLI Integration

# Ephemeral (default, backward compatible)
apme check playbook.yml

# Named session (reusable, VS Code extension use case) — planned
apme check playbook.yml --session my-project --session-ttl 7200

# Session management — planned
apme session list
apme session info my-project
apme session delete my-project
apme session reap --ttl 3600

Alternatives Considered

Alternative 1: Add a Mutex to the Existing Shared Pool

Add fcntl.flock() around build_venv calls in the existing hash-keyed scheme. This fixes the race condition but doesn't provide session lifecycle management needed for future AI escalation and MCP tools.

Rejected: Solves concurrency but not lifecycle requirements.

Alternative 2: Truly Ephemeral tempdir-based Venvs

Use tempfile.mkdtemp() for every request, deleting when done. UV cache makes creation fast (~1-2s).

Rejected: Wasteful for repeated scans in the same session. Named sessions for VS Code extension integration require persistence.

Alternative 3: External Cache Manager (Redis/SQLite)

Use an external store to coordinate venv access across processes.

Rejected: Over-engineered. File locks are sufficient for the single-host, multi-process case. No multi-host coordination needed.

Consequences

Positive

  • Concurrency safe: File locks prevent race conditions during venv creation
  • Backward compatible: Default behavior (no --session flag) creates ephemeral venvs, matching previous semantics
  • Future-ready: Named sessions enable VS Code extension integration and AI escalation workflows where venvs need to persist for collection docstring queries
  • Observable: session list/info/delete/reap CLI subcommands provide visibility into venv lifecycle
  • Atomic metadata: os.replace() for metadata writes prevents corruption

Negative

  • Disk usage: Named sessions persist until TTL expiry or explicit deletion
  • fcntl dependency: Linux/macOS only (not Windows), acceptable for APME's container-first deployment
  • Module-level singleton: _manager in _venv.py means TTL is set on first use; subsequent calls with different TTL are ignored

References

  • src/apme_engine/collection_cache/venv_session.py — Implementation
  • src/apme_engine/validators/ansible/_venv.py — Integration layer
  • tests/test_venv_session.py — 20 tests covering all lifecycle operations