Skip to content

Refactor to clarify plugin boundary#5

Merged
jcushman merged 2 commits into
mainfrom
plugin-boundary
Mar 18, 2026
Merged

Refactor to clarify plugin boundary#5
jcushman merged 2 commits into
mainfrom
plugin-boundary

Conversation

@jcushman

Copy link
Copy Markdown
Collaborator

This comes out of a conversation with @cmsetzer about plugin architecture -- do we really want a rust library with rust plugins to depend on a python loader that loads python-wrapped plugins for discovery?

The options for cross-language cross-binary plugin package management, discovery, and communication are very complex and full of tradeoffs.

To avoid figuring that out now, I instead turned to making the boundary between controller and plugins well defined and versioned, and making all of the answers to "plugin package management, discovery, and communication" confined to an SDK that plugins can depend on. This means for now we can do pip install binoc binoc-sqlite; binoc diff ... and it works, but later if we want to get rid of python entirely and shared objects entirely and have plugins communicate over IPC, all that will take is recompiling the plugins against a different SDK version.

So -- in this PR, a new model_plugins/ directory with three plugins, two rust and one pure python. The SDK takes care of taking their substantive code and wrapping it in a library (in python's case) or shared object (in Rust's case) that exposes a declarative description of what the plugin can do and what SDK version it uses, and is bundled as a python package. The SDK on the binoc controller side finds those via python's entrypoints mechanism, and loads the .so or library, fetches its declaration and registers it at the various extension points if the versions match. (A detail is that now that communication is a bit more expensive, plugins must declaratively state what files they want to see, no can_handle escape hatch. A plugin can still just say it wants to see everything, of course.)

This means that although we're still wrapping a rust project in python, the python is only used at startup and for pure-python plugins -- the rust binary loads rust plugins directly. I think this might actually be a pretty stable set-point in package management/discovery/communication design space -- having python is as close to a common denominator as anything, the ergonomics are great, the python isn't costing anything ...

The trickiest bit here is defining a stable controller-plugin interface we can version, that's reasonably efficient. See the Plugin SDK and ABI Safety ADR for more, but the general idea is there's a DataAccess object for accessing the two snapshots and the temp dir for intermediate files, and that's how the controller and plugins all work on the same underlying data -- later that can be a ramdisk or whatever it needs to be, if disk is a problem. And there's typed structures for the communication layer, "please compare these two files", "please modify this IR" etc. That's serialized to json, which is probably fine since it's just metadata. There's a method for reading and writing to a cache that is currently just files in the temp dir, but could be shared memory or something later if like the csv reordering-detection transformer didn't want to re-load the csv already loaded by the csv comparator and needed some way to keep that in RAM.

I added traces of all the SDK touchpoints to the test-vector golden files (every time it sends a message back and forth or touches the data store or cache), so we can audit if the data flow is making sense and notice if it changes.

Machine-generated details on the PR:

--

Extract binoc-sdk crate; C ABI plugin loading; declarative dispatch

Implements the Plugin SDK and ABI Safety ADR.

What changed

  • New binoc-sdk crate — plugin-facing surface (traits, IR, types, DataAccess, export_plugin! macro) extracted from binoc-core. Plugins depend on SDK only; no binoc-core dep.
  • DataAccess trait replaces Item.physical_path + CompareContext. Three construction modes: controller-owned session, C ABI plugin (shared data_root + workspace), extract-only. Filesystem-backed store()/load() cache visible across ABI boundary.
  • DiffNode.source_items — transient ItemPair on each node (#[serde(skip)]), set by controller. Transformers re-parse source data directly instead of JSON-serialized cache. Replaces closed ReopenedData enum.
  • Declarative routingcan_handle removed. ComparatorDescriptor.scope (Files/Containers/Any) replaces is_dir() checks. CompareResult::Skip is the escape hatch for content-dependent rejection.
  • SDK version checking at Controller::new() — rejects incompatible plugins with a named error. 0.x policy: [MIN_COMPATIBLE_MINOR, host_minor].
  • C ABI plugin loadingexport_plugin! macro generates extern "C" entry points (describe, compare, reopen, extract, transform, render) with JSON serialization. Host loads via libloading, wraps in NativeComparator/NativeTransformer/NativeOutputter. Panic-safe (catch_unwind).
  • Unified discovery — single binoc.plugins entry point group. Auto-detects native vs Python plugins.
  • model-plugins/ — three reference plugins moved/added: binoc-sqlite (Rust comparator, C ABI), binoc-row-reorder (Rust transformer, re-parses source CSV), binoc-html (pure Python outputter).
  • binoc-sqlite moved from workspace root to model-plugins/binoc-sqlite.

Deleted

  • binoc-core/src/{ir,traits,types}.rs — moved to SDK
  • binoc-core/src/types::CompareContext, Item, ReopenedData — replaced by DataAccess/ItemRef
  • README.md (superseded by AGENTS.md + docs/)

Not in scope

  • WASM/IPC transport (ABI designed to support it; not implemented)
  • Arrow IPC or other high-performance cache formats for store()/load()

@jcushman jcushman requested a review from cmsetzer March 13, 2026 17:42
@parkan

parkan commented Mar 14, 2026

Copy link
Copy Markdown

looks like you identified substantially the same limitations as I did in #4 + https://github.com/parkan/binoc-plugin-hdf5 but felt more free to do a big rewrite 😉

this is definitely the right direction and I'm glad you are taking it, I'd be happy to withdraw my PR in favor of this approach

my concerns with this as implemented are (1) how much re-parsing will need to be done in practice, for my example plugin it's fine (metadata reads are cheap) but larger block-level diffs can hurt, though need to think this through a bit more (2) store()/load() offers less type safety

it also looks like the model-plugins now provides more separation between core/sdk and plugin namespaces; for any 3rd party plugins I would still recommend a naming scheme like binoc-plugin-*

data_root.join(".cache")
}

fn safe_cache_key(key: &str) -> String {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is actually safe -- if two plugins define the same key, this will collide; prefer namespacing, maybe with an optional override if you know you want cross-plugin communication

@parkan

parkan commented Mar 14, 2026

Copy link
Copy Markdown

the other regression, aside from giving up type safety/cache key collisions and re-parsing, is that skips can now be quite expensive

@cmsetzer cmsetzer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! I've got thoughts on various details but nothing that should block this; I say let's get this merged and go from there.

@jcushman jcushman merged commit 67e2005 into main Mar 18, 2026
2 checks passed
@cmsetzer cmsetzer deleted the plugin-boundary branch March 25, 2026 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants