Skip to content

Add optional allowlist to pickle serde deserializer#818

Open
elijahbenizzy wants to merge 2 commits into
mainfrom
improve/pickle-serde-allowlist
Open

Add optional allowlist to pickle serde deserializer#818
elijahbenizzy wants to merge 2 commits into
mainfrom
improve/pickle-serde-allowlist

Conversation

@elijahbenizzy

Copy link
Copy Markdown
Contributor

Summary

Adds an optional allowlist to the pickle deserializer used by Burr's serde system (burr/integrations/serde/pickle.py), mirroring the shape of #794's pydantic allowlist work.

Without an allowlist: backward-compatible — deserialize_pickle continues to call pickle.loads(...) and emits a one-time SecurityWarning per registration site so users see the noise.

With an allowlist: a _RestrictedUnpickler(pickle.Unpickler) subclass overrides find_class(module, name) to validate against the allowlist and raises pickle.UnpicklingError for anything not on it.

API

Three ways to set the allowlist, in resolution order (highest priority first):

  1. Per-call kwarg: deserialize_pickle(value, allowlist=[...])
  2. At registration time: register_type_to_pickle(cls, allowlist=[...])
  3. Process-wide: set_pickle_serde_allowlist([...])
  4. Otherwise: legacy unsafe path + SecurityWarning

Allowlist entries are (module, qualname) tuples — e.g. ("myapp.models", "User") — rather than prefix strings. Pickle attacks routinely reach for specific symbols within otherwise-trusted modules (e.g. builtins.eval, os.system, subprocess.Popen), so per-class granularity matters more than for pydantic where the registered name is already a class.

Trust model (now in the docstring)

Pickle deserialization is unsafe when the bytes come from a source the application doesn't fully control — including persistence backends that have a separate access model (SQLite files, Redis, S3, filesystems). An attacker with write access to the backend can plant a malicious __reduce__ payload that triggers code execution when state is loaded. The allowlist closes that primitive.

Tests

8 tests in tests/integrations/serde/test_pickle.py (was 1):

  • _is_allowed truth table for allowlist matching
  • Malicious __reduce__ payload (calling a sentinel function via __reduce__) is blocked when an allowlist is set; sentinel never runs
  • Legitimate object roundtrips when its (module, qualname) is on the allowlist
  • Module-level set_pickle_serde_allowlist applies to fresh registrations
  • Instance-level allowlist (at register_type_to_pickle) overrides module default
  • Per-call allowlist kwarg overrides both
  • Legacy path (no allowlist) still works and emits the SecurityWarning exactly once per registration site

Wider tests/integrations/serde/ + tests/core/test_serde.py runs 24/24 green.

Introduces a configurable allowlist of (module, qualname) pairs for the
pickle-based deserializer registered by register_type_to_pickle().  When
an allowlist is configured, deserialization uses a restricted unpickler
that refuses to import classes outside the allowlist and raises
pickle.UnpicklingError instead.  When no allowlist is configured,
behavior remains backward-compatible and a runtime warning is emitted
once per call site to encourage adoption.

The allowlist can be supplied (a) per-call at deserialize time, (b)
per-registration via register_type_to_pickle(cls, allowlist=...), or (c)
process-wide via set_pickle_serde_allowlist([...]).
@github-actions github-actions Bot added the area/integrations External integrations (LLMs, frameworks) label Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/integrations External integrations (LLMs, frameworks)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant