pyghidra-lite v0.6.0
"The JVM giveth, and the JVM taketh away. But now we taketh back."
25 files changed · +5,700 −2,028 · 6 new modules · 5 new test suites
This release folds in two unreleased versions (v0.5.0, v0.5.1) plus the new eviction system. It is the largest update since the initial release.
Idle Memory Eviction NEW
The server no longer pins every binary you've ever touched in the JVM heap for eternity. Binaries idle for 30 minutes are quietly unloaded from memory while their on-disk Ghidra projects stay intact. The next tool call that references an evicted binary reloads it transparently — same analysis, same function names, just a few seconds of reload latency.
Internals
Every tool call through _get_handle() stamps a monotonic access time. A background monitor sweeps every 60 s and evicts anything past the idle threshold, while always keeping at least the --min-loaded most recently used binaries warm.
The reload path (_get_handle → _find_on_disk → _hot_load_blocking) was already battle-tested for lazy loading. Eviction just gives it more to do.
--evict-after 30 # idle minutes before eviction (0 = off)
--min-loaded 2 # never drop below this many in memory
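The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the server's code: `IdleEvictor`, `touch`, and `sweep` are hypothetical names, and the real monitor also has to unload the program from the JVM, which is omitted here.

```python
import time
from dataclasses import dataclass, field


@dataclass
class IdleEvictor:
    """Hypothetical stand-in for the server's eviction bookkeeping."""
    evict_after_s: float = 30 * 60   # mirrors --evict-after 30 (0 = off)
    min_loaded: int = 2              # mirrors --min-loaded 2
    last_access: dict = field(default_factory=dict)

    def touch(self, name, now=None):
        # Stamp a monotonic access time, as each tool call would.
        self.last_access[name] = time.monotonic() if now is None else now

    def sweep(self, now=None):
        # Return the names evicted by this sweep (empty when eviction is off).
        if self.evict_after_s <= 0:
            return []
        now = time.monotonic() if now is None else now
        # Most recently used first; the first min_loaded entries are never evicted.
        by_recency = sorted(self.last_access, key=self.last_access.get, reverse=True)
        evicted = []
        for name in by_recency[self.min_loaded:]:
            if now - self.last_access[name] >= self.evict_after_s:
                del self.last_access[name]
                evicted.append(name)
        return evicted
```

A background thread would call `sweep()` once per sweep interval. The `now` parameter exists only so the logic can be exercised without waiting.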
binaries() now shows a third state:
| Status | Meaning |
|---|---|
| ready | Loaded in JVM, fast path |
| evicted | Was loaded, idle-evicted, on disk. Any tool call reloads it. |
| complete | Never loaded this session, on disk |
Evicted entries include a hint field so the model knows load() is unnecessary.
Stdio Proxy Architecture NEW
pyghidra-lite (bare, no subcommand) now routes through a lightweight stdio↔HTTP proxy. The proxy auto-starts a shared backend on first connection and auto-stops it after 30 minutes of inactivity.
- Before (v0.4.0): Each MCP session spawned its own JVM. 8 Claude sessions = 8 JVMs × ~500 MB = 4 GB.
- After (v0.6.0): N stdio sessions → 1 shared HTTP backend. Each proxy is ~10 MB. 8 sessions ≈ 580 MB.
New module: proxy.py (226 lines) with 13 unit tests.
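The memory figures above follow from simple arithmetic. The sketch below reproduces them, assuming the approximate footprints quoted in this section (~500 MB per JVM, ~10 MB per proxy); the function names are illustrative only.

```python
JVM_MB = 500    # approximate per-JVM footprint quoted in this section
PROXY_MB = 10   # approximate per-proxy footprint quoted in this section


def isolated_mb(sessions: int) -> int:
    """v0.4.0 model: every session spawns its own JVM."""
    return sessions * JVM_MB


def shared_mb(sessions: int) -> int:
    """v0.6.0 model: one shared JVM backend plus one small proxy per session."""
    if sessions == 0:
        return 0
    return JVM_MB + sessions * PROXY_MB


print(isolated_mb(8))  # 4000
print(shared_mb(8))    # 580
```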
Version Tracking & Bootstrap NEW
Transfer function names between binary versions via bootstrap="reference-binary" on load(). Uses GhidraMemory.findBytes() on executable sections — a name transfers only when the byte pattern is unique and lands on a FUN_* entry point.
load(path="/bin/v2.0", bootstrap="v1.0")   # named functions only
load(path="/bin/v2.0", bootstrap="v1.0",
     bootstrap_mode="all")                 # + stable synthetic labels
How bootstrap_mode="all" works
When "all", every FUN_* function from the source that byte-matches in the destination gets a stable cross-version label like BTFN_02B4A120. This creates a complete address map — identical functions get labels, changed/new functions stay as FUN_* and are immediately visible as the diff.
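The uniqueness rule can be illustrated in plain Python over raw bytes. This sketch does not use the Ghidra API. `find_all` and `transfer_names` are hypothetical helpers, and the CRC32-based `BTFN_` suffix is an assumption made only to get a label that is stable across versions; the release does not state how the real suffix is derived.

```python
import zlib


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs, overlapping hits included."""
    hits, start = [], 0
    while (idx := haystack.find(needle, start)) != -1:
        hits.append(idx)
        start = idx + 1
    return hits


def transfer_names(src_funcs, dst_image, dst_entries, mode="named"):
    """Map destination offsets to transferred names.

    src_funcs:   list of (name, body_bytes) from the reference binary
    dst_image:   raw bytes of the destination's executable section
    dst_entries: {offset: current_name} for destination function entry points
    """
    renames = {}
    for name, body in src_funcs:
        is_named = not name.startswith("FUN_")
        if not is_named and mode != "all":
            continue  # default mode transfers named functions only
        hits = find_all(dst_image, body)
        if len(hits) != 1:
            continue  # absent or ambiguous pattern: never transfer
        offset = hits[0]
        if not dst_entries.get(offset, "").startswith("FUN_"):
            continue  # must land on a still-unnamed entry point
        # Assumed label scheme: CRC32 of the matched bytes (illustrative only).
        renames[offset] = name if is_named else f"BTFN_{zlib.crc32(body):08X}"
    return renames
```

Anything left as FUN_* after a `mode="all"` pass either changed between versions or matched ambiguously, which is why the remainder reads as a diff.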
Supporting tools:
- rank_sources via binaries(rank_sources=True): ranks loaded binaries by transferable named function count so you can pick the best reference.
- fresh=True on load(): purges stale analysis and re-imports from scratch. Sends SIGTERM to any in-flight worker before wiping the project directory.
Deep String Search & Runtime Detection NEW
search(mode="deep")
Raw memory scan for ASCII runs that Ghidra hasn't classified as defined strings. Finds strings in data sections, embedded payloads, and lightly-analyzed regions that mode="indexed" misses.
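Conceptually, a raw scan for ASCII runs looks like the following. This is a generic sketch over a bytes buffer, not the pyghidra-lite implementation; `ascii_runs` and the minimum length of 4 are assumptions chosen for illustration.

```python
import re


def ascii_runs(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


sample = b"\x00\x01hello\x00ab\x00/usr/lib\xff"
for offset, text in ascii_runs(sample):
    print(hex(offset), text)
```

The two-byte run `ab` is skipped because it is shorter than `min_len`. A scan like this finds runs whether or not the analyzer has typed them as strings.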
search(type="blob", query="offset,size")
Extract strings from arbitrary memory regions. Designed for embedded runtimes — after info(detail="full") identifies a payload with strategy="search_payload", pass the offset directly.
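A query of the form "offset,size" could be parsed along these lines. `parse_blob_query` is a hypothetical helper, and accepting both decimal and 0x-prefixed hex is an assumption of this sketch rather than documented behavior of the tool.

```python
def parse_blob_query(query: str) -> tuple[int, int]:
    """Parse 'offset,size' into two integers (decimal or 0x-prefixed hex)."""
    offset_text, size_text = (part.strip() for part in query.split(","))
    offset, size = int(offset_text, 0), int(size_text, 0)
    if offset < 0 or size <= 0:
        raise ValueError(f"invalid blob region: {query!r}")
    return offset, size


offset, size = parse_blob_query("0x1000, 256")
region = bytes(8192)[offset:offset + size]  # stand-in for reading program memory
print(offset, size, len(region))
```

A malformed query (wrong number of fields, non-numeric text, or a non-positive size) raises `ValueError` instead of silently reading the wrong region.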
Embedded runtime detection
info(detail="full") now detects Bun, Electron, Node.js, PyInstaller, and UPX runtimes embedded in binaries. Each detection includes a suggested analysis strategy.
search(type="extract") extracts BunFS filesystems from Bun single-executables (runs in background).
Security Hardening
| Fix | Detail |
|---|---|
| TOCTOU race | ProjectWatcher._check_and_load now opens status files with O_NOFOLLOW, rejecting symlinks atomically |
| Job queue bounds | _MAX_QUEUED_JOBS=32 prevents unbounded accumulation |
| rmtree logging | Replaced ignore_errors=True with onerror handler that logs failures |
| _kill_job() mutex | threading.Lock around _active_jobs.pop() prevents race with async loop |
| delete_binary | Exact-match-first with INVALID_PARAMS + candidate list on ambiguity |
| Autopurge safety | Skips shutil.rmtree on projects currently loaded in backend._projects |
| Path policy | Replaced allow_any_path + allowed_paths with single --restrict-path list |
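As an illustration of the TOCTOU fix in the table, opening with O_NOFOLLOW makes the symlink check and the open a single operation. The helper name `read_status_file` is hypothetical, and the sketch assumes a POSIX platform where `os.O_NOFOLLOW` is available.

```python
import os


def read_status_file(path: str) -> str:
    """Read a status file, refusing to follow a symlink at the final path component."""
    # O_NOFOLLOW makes open() itself fail (OSError, typically ELOOP) if path is a
    # symlink, so there is no gap between "check it is not a link" and "open it".
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "r", encoding="utf-8") as handle:
        return handle.read()
```

O_NOFOLLOW only guards the last path component. Symlinked parent directories are still followed, so the directory being watched must itself be trusted.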
Test Coverage
5 new test suites, 62+ new tests:
| Suite | Coverage |
|---|---|
| test_deep_search.py | search_strings_deep, batch_search_strings, extract_strings_from_blob |
| test_runtime_detect.py | detect_embedded_runtime for Bun, Electron, Node, PyInstaller, UPX |
| test_scan_jobs.py | Async scan jobs, extract_bunfs, AnalysisProgressListener |
| test_proxy.py | Proxy lifecycle, auto-start, idle timeout, PID management |
| test_server.py | +8 tests for _rank_sources_blocking, _kill_job mutex, delete ambiguity |
Breaking Changes
- Default transport is now proxy (stdio→HTTP). Direct --transport stdio still works but spawns an isolated JVM.
- --allow-any-path / --allowed-paths removed. Use --restrict-path (repeatable, empty = unrestricted).
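The "empty = unrestricted" semantics of --restrict-path can be pictured with a check like the one below. `is_path_allowed` is a hypothetical helper and the server's actual policy code may differ. The sketch needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def is_path_allowed(candidate: str, restrict_paths: list[str]) -> bool:
    """Return True if candidate is permitted under the given --restrict-path roots."""
    if not restrict_paths:
        return True  # empty list means unrestricted
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    return any(resolved.is_relative_to(Path(root).resolve()) for root in restrict_paths)
```

Resolving before comparing keeps a path such as `/opt/samples/../secret` from slipping past a `/opt/samples` restriction. The comparison is per path component, so `/opt/samples-old` does not match `/opt/samples` either.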
Full Changelog
All commits since v0.4.0
- c4554f7 Bump version to 0.6.0
- 035e299 Track evicted binaries: show 'evicted' status with reload hint
- 3fa21f7 Add idle eviction: unload binaries from JVM, keep on disk
- 7a36abf Ignore JVM crash dumps (hs_err_pid*.log)
- 871c902 Track test suites for deep search, runtime detection, and scan jobs
- 3ac927d Add stdio proxy as default transport
- d5a54ed Fix bootstrap transfer and live analysis job status (#9)
- f32510d Version tracking bootstrap + fresh scan (0.5.1) (#8)
- 8148019 Add CLAUDE.md with MCP registry publish instructions
- 82a396d Add MCP registry workflow and update to v0.4.0 (#7)
@johnzfitch · MIT · PyPI · MCP Registry