v0.6.0 — Idle Eviction

@johnzfitch johnzfitch released this 28 Mar 10:08
· 3 commits to master since this release
c4554f7

pyghidra-lite v0.6.0

"The JVM giveth, and the JVM taketh away. But now we taketh back."

25 files changed · +5,700 −2,028 · 6 new modules · 5 new test suites

This release folds in two unreleased versions (v0.5.0, v0.5.1) plus the new eviction system. It is the largest update since the initial release.


Idle Memory Eviction NEW

The server no longer pins every binary you've ever touched in the JVM heap for eternity. Binaries idle for 30 minutes are quietly unloaded from memory while their on-disk Ghidra projects stay intact. The next tool call that references an evicted binary reloads it transparently — same analysis, same function names, just a few seconds of reload latency.

Internals

Every tool call through _get_handle() stamps a monotonic access time. A background monitor sweeps every 60s, evicting anything past the idle threshold while always keeping at least the --min-loaded most recently used binaries warm.

The reload path (_get_handle → _find_on_disk → _hot_load_blocking) was already battle-tested for lazy loading. Eviction just gives it more to do.

--evict-after 30   # idle minutes before eviction (0 = off)
--min-loaded 2     # never drop below this many in memory
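
The sweep logic described above can be sketched roughly as follows. This is a minimal illustration, not the server's code: the EvictionTracker class and its touch() and sweep() methods are hypothetical names, and only the two defaults mirror the flags shown above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EvictionTracker:
    """Hypothetical bookkeeping for idle eviction (not pyghidra-lite's class)."""
    evict_after_s: float = 30 * 60   # mirrors --evict-after 30 (minutes)
    min_loaded: int = 2              # mirrors --min-loaded 2
    last_access: dict = field(default_factory=dict)  # name -> monotonic seconds

    def touch(self, name, now=None):
        """Stamp an access time, as each tool call would."""
        self.last_access[name] = time.monotonic() if now is None else now

    def sweep(self, now=None):
        """Return names to evict, oldest first, keeping the min_loaded most recent."""
        if self.evict_after_s <= 0:      # 0 = eviction off
            return []
        now = time.monotonic() if now is None else now
        # Most recently used first; the first min_loaded entries are protected.
        by_recency = sorted(self.last_access, key=self.last_access.get, reverse=True)
        candidates = by_recency[self.min_loaded:]
        victims = [n for n in candidates
                   if now - self.last_access[n] >= self.evict_after_s]
        for name in victims:
            del self.last_access[name]
        return victims[::-1]
```

In this sketch a background thread would call sweep() once per minute and unload whatever it returns. time.monotonic() is used rather than wall-clock time so that system clock adjustments cannot trigger or delay eviction.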

binaries() now shows a third state:

Status     Meaning
ready      Loaded in JVM, fast path
evicted    Was loaded, idle-evicted, on disk. Any tool call reloads it.
complete   Never loaded this session, on disk

Evicted entries include a hint field so the model knows load() is unnecessary.


Stdio Proxy Architecture NEW

pyghidra-lite (bare, no subcommand) now routes through a lightweight stdio↔HTTP proxy. The proxy auto-starts a shared backend on first connection and auto-stops it after 30 minutes of idle.

Before (v0.4.0): each MCP session spawned its own JVM. 8 Claude sessions = 8 JVMs × ~500 MB ≈ 4 GB.
After (v0.6.0): N stdio sessions → 1 shared HTTP backend. Each proxy is ~10 MB, so 8 sessions ≈ one ~500 MB JVM + 8 × ~10 MB ≈ 580 MB.
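
The before/after figures follow from a simple model, sketched here for other session counts. The constants are the approximate numbers quoted above; the function names are illustrative only.

```python
JVM_MB = 500     # approximate per-JVM footprint quoted above
PROXY_MB = 10    # approximate per-proxy footprint quoted above


def memory_before(sessions):
    """v0.4.0 model: one JVM per MCP session."""
    return sessions * JVM_MB


def memory_after(sessions):
    """v0.6.0 model: one shared JVM backend plus one small proxy per session."""
    return JVM_MB + sessions * PROXY_MB


print(memory_before(8), memory_after(8))  # 4000 580
```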

New module: proxy.py (226 lines) with 13 unit tests.


Version Tracking & Bootstrap NEW

Transfer function names between binary versions via bootstrap="reference-binary" on load(). Uses Ghidra's Memory.findBytes() on executable sections; a name transfers only when the byte pattern is unique and lands on a FUN_* entry point.

load(path="/bin/v2.0", bootstrap="v1.0")       # named functions only
load(path="/bin/v2.0", bootstrap="v1.0",
     bootstrap_mode="all")                       # + stable synthetic labels

How bootstrap_mode="all" works

With bootstrap_mode="all", every FUN_* function in the source that byte-matches in the destination gets a stable cross-version label such as BTFN_02B4A120. The result is a complete address map: identical functions get labels, while changed or new functions stay as FUN_* and are immediately visible as the diff.
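
The transfer rule can be illustrated in pure Python over raw bytes. This is a sketch under stated assumptions, not the Ghidra-backed implementation: find_all() and transfer_names() are hypothetical helpers, plain offsets stand in for addresses, and deriving the BTFN_ label from the source FUN_ name is a guess at the labelling scheme.

```python
def find_all(haystack, needle):
    """Return every offset at which needle occurs (overlapping matches included)."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def transfer_names(src_funcs, dst_image, dst_names, mode="named"):
    """src_funcs: {name: function bytes}; dst_names: {offset: current name}.

    Returns {offset: new name} for functions whose bytes match exactly once
    in dst_image and land on a FUN_* entry point.
    """
    renames = {}
    for name, body in src_funcs.items():
        hits = find_all(dst_image, body)
        if len(hits) != 1:                      # absent or ambiguous: skip
            continue
        offset = hits[0]
        if not dst_names.get(offset, "").startswith("FUN_"):
            continue                            # not an unnamed entry point
        if not name.startswith("FUN_"):
            renames[offset] = name              # real names transfer in both modes
        elif mode == "all":
            # Assumed scheme: reuse the source address embedded in the FUN_ name.
            renames[offset] = "BTFN_" + name[len("FUN_"):].upper()
    return renames
```

The uniqueness check is what keeps the transfer conservative: a short or common byte pattern that matches in several places transfers nothing rather than risk a wrong name.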

Supporting tools:

  • rank_sources via binaries(rank_sources=True) — ranks loaded binaries by transferable named function count so you can pick the best reference
  • fresh=True on load() — purges stale analysis and re-imports from scratch. Sends SIGTERM to any in-flight worker before wiping the project directory.

Deep String Search & Runtime Detection NEW

search(mode="deep")

Raw memory scan for ASCII runs that Ghidra hasn't classified as defined strings. Finds strings in data sections, embedded payloads, and lightly-analyzed regions that mode="indexed" misses.
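
A raw scan for ASCII runs can be sketched with a bytes regex. This is a generic illustration of the technique, not the server's implementation; ascii_runs() and the min_len default of 4 are assumptions.

```python
import re


def ascii_runs(data, min_len=4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


sample = b"\x00\x01hello world\x00ab\x00/usr/lib\xff"
print(list(ascii_runs(sample)))  # [(2, 'hello world'), (17, '/usr/lib')]
```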

search(type="blob", query="offset,size")

Extract strings from arbitrary memory regions. Designed for embedded runtimes — after info(detail="full") identifies a payload with strategy="search_payload", pass the offset directly.
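
As a rough illustration of the "offset,size" query shape, the helpers below parse such a string and slice the matching region out of a byte buffer. Both function names are hypothetical, and accepting 0x-prefixed values is an assumption; the notes do not say whether the tool expects decimal or hexadecimal.

```python
def parse_blob_query(query):
    """Split "offset,size" into two ints; base prefixes such as 0x are accepted."""
    offset_text, size_text = query.split(",")
    offset, size = int(offset_text.strip(), 0), int(size_text.strip(), 0)
    if offset < 0 or size <= 0:
        raise ValueError(f"bad blob query: {query!r}")
    return offset, size


def blob_region(image, query):
    """Return the bytes covered by an "offset,size" query."""
    offset, size = parse_blob_query(query)
    return image[offset:offset + size]
```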

Embedded runtime detection

info(detail="full") now detects Bun, Electron, Node.js, PyInstaller, and UPX runtimes embedded in binaries. Each detection includes a suggested analysis strategy.

search(type="extract") extracts BunFS filesystems from Bun single-executables (runs in background).
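
Detection of this kind often comes down to looking for marker bytes. The sketch below is not pyghidra-lite's detector: guess_runtimes() is a hypothetical name, and the markers are commonly cited signatures for UPX, PyInstaller, and Bun that are assumed here rather than taken from this release. Electron and Node.js are omitted because no equally simple marker is assumed for them.

```python
# Assumed marker bytes; the real detector may use different or additional checks.
MARKERS = {
    "UPX": b"UPX!",                           # UPX packer header magic
    "PyInstaller": b"MEI\x0c\x0b\x0a\x0b\x0e",  # PyInstaller archive cookie
    "Bun": b"\n---- Bun! ----\n",             # Bun single-executable trailer
}


def guess_runtimes(data):
    """Return the names of runtimes whose marker bytes appear in data."""
    return [name for name, marker in MARKERS.items() if marker in data]
```

A marker hit is only a hint. A string table that merely mentions "UPX!" would also match, so a real detector would want structural checks on top.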


Security Hardening

Fix                  Detail
TOCTOU race          ProjectWatcher._check_and_load now opens status files with O_NOFOLLOW, rejecting symlinks atomically
Job queue bounds     _MAX_QUEUED_JOBS=32 prevents unbounded accumulation
rmtree logging       Replaced ignore_errors=True with onerror handler that logs failures
_kill_job() mutex    threading.Lock around _active_jobs.pop() prevents race with async loop
delete_binary        Exact-match-first with INVALID_PARAMS + candidate list on ambiguity
Autopurge safety     Skips shutil.rmtree on projects currently loaded in backend._projects
Path policy          Replaced allow_any_path + allowed_paths with single --restrict-path list
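
The O_NOFOLLOW fix can be sketched in a few lines. This is a POSIX-only illustration of the technique; read_status_file() is a hypothetical helper, not the body of ProjectWatcher._check_and_load.

```python
import os


def read_status_file(path):
    """Open path without following a final-component symlink, then read it."""
    # With O_NOFOLLOW the open itself fails (OSError, typically ELOOP) if path
    # is a symlink, so there is no gap between a check and the open.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as handle:
        return handle.read()
```

Checking os.path.islink() first and then calling open() leaves a window in which the file can be swapped for a symlink. Folding the check into the open call closes that window.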

Test Coverage

5 new test suites, 62+ new tests:

Suite                     Coverage
test_deep_search.py       search_strings_deep, batch_search_strings, extract_strings_from_blob
test_runtime_detect.py    detect_embedded_runtime for Bun, Electron, Node, PyInstaller, UPX
test_scan_jobs.py         Async scan jobs, extract_bunfs, AnalysisProgressListener
test_proxy.py             Proxy lifecycle, auto-start, idle timeout, PID management
test_server.py            +8 tests for _rank_sources_blocking, _kill_job mutex, delete ambiguity

Breaking Changes

  • Default transport is now proxy (stdio→HTTP). Direct --transport stdio still works but spawns an isolated JVM.
  • --allow-any-path / --allowed-paths removed. Use --restrict-path (repeatable, empty = unrestricted).
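
The "empty = unrestricted" semantics of --restrict-path might be modelled like this. It is a sketch only: is_path_allowed() is a hypothetical name, and resolving paths before comparison is an assumption about how the server normalizes them.

```python
from pathlib import Path


def is_path_allowed(path, restrict_paths):
    """True if restrict_paths is empty or path lies under one of its roots."""
    if not restrict_paths:          # empty list = unrestricted
        return True
    target = Path(path).resolve()   # collapses ".." and follows symlinks
    for root in restrict_paths:
        root_path = Path(root).resolve()
        if target == root_path or root_path in target.parents:
            return True
    return False
```

Comparing resolved path components rather than string prefixes keeps a sibling directory such as /data/samples-other from passing a /data/samples restriction.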

Full Changelog

All commits since v0.4.0
  • c4554f7 Bump version to 0.6.0
  • 035e299 Track evicted binaries: show 'evicted' status with reload hint
  • 3fa21f7 Add idle eviction: unload binaries from JVM, keep on disk
  • 7a36abf Ignore JVM crash dumps (hs_err_pid*.log)
  • 871c902 Track test suites for deep search, runtime detection, and scan jobs
  • 3ac927d Add stdio proxy as default transport
  • d5a54ed Fix bootstrap transfer and live analysis job status (#9)
  • f32510d Version tracking bootstrap + fresh scan (0.5.1) (#8)
  • 8148019 Add CLAUDE.md with MCP registry publish instructions
  • 82a396d Add MCP registry workflow and update to v0.4.0 (#7)

@johnzfitch · MIT · PyPI · MCP Registry