Author: CTO, UnboxAPI
Date: 2026-05-27
Scope: Public release of the safety-proxy / context-injection interface
skeleton (Python interface definitions + one trivial reference hook) under
Apache-2.0 to UnboxAPI-SafetyProxy. This memo covers risks introduced by
publishing the skeleton, not risks of operating the proprietary UnboxAPI
production runtime.
| Asset | Owner | Trust level |
|---|---|---|
Interface definitions (interfaces.py) |
UnboxAPI (publisher) | Trusted authorship; integrity guaranteed by signed commit + signed tag + Sigstore attestation. |
LoggingHook reference implementation |
UnboxAPI (publisher) | Trusted authorship; ships as example only. |
CallContext fields at runtime (produced by callers) |
Third-party callers | Untrusted. Every field must be treated as untrusted data. |
| Production rule library, classifiers, spend-cap logic | UnboxAPI | Not shipped. Interface-only release. |
| Consumer hook implementations (derived works) | Third-party developers | Out of scope — they implement the interface; we cannot audit their logic. |
The trust boundary is clear: we publish an interface and one trivial example hook; we do not publish production safety logic. Third parties that implement hooks or build on this library own their own security posture.
Risk: A developer reads "safety proxy" in the repo name or PyPI description
and integrates LoggingHook as a real safety gate, believing it blocks
malicious tool calls. This is a misuse scenario enabled by the name and framing
of the repo.
Attack path:
- Developer
pip install unboxapi-safety-proxy. - Registers
LoggingHook. - Ships to production believing the proxy "has safety rules."
- Every call is permitted regardless of content.
Mitigations:
- Prominent
⚠ NOT PRODUCTION SAFETY ⚠banner at the top of README (first visible content; non-negotiable). HookAction.ALLOWreason string explicitly reads"unconditional ALLOW — NOT a security decision"so log inspection reveals the lack of enforcement.LoggingHookdocstring contains "performs zero security evaluation and is not suitable for use as a safety control" (verbatim).pyproject.tomldescription field reads"NOT PRODUCTION SAFETY — see README".- SECURITY.md opens with the
⚠ NOT PRODUCTION SAFETY ⚠callout. - No hook in this repo is named, documented, or structured to resemble a
blocking rule. The reference hook name is
Logging, notSafety,Filter, orGuard.
Residual risk: A developer who ignores all warnings can still misuse this library. That risk is inherent to publishing any open-source skeleton; we mitigate by maximising warning surface.
Risk: The CallContext fields (tool_name, tool_args, tenant_id,
metadata) originate from callers and may contain adversarial content.
A hook implementation that passes these fields into an LLM system prompt,
a format string, or a shell command is vulnerable to injection.
Attack paths (reference hook — LoggingHook):
- T2.1 Log injection via format string: If a hook used
%sor f-string interpolation to build a log message fromcontext.tool_name, an attacker could inject log-forging content (e.g. newlines, fake structured log fields). Mitigation:LoggingHook.evaluatelogscontext.tool_nameviaextra={}structured logging, not into the message format string. The docstring explicitly warns against changing this pattern. - T2.2 Upstream injection in production hooks (not shipped): Any hook
that feeds
contextfields into an LLM prompt must treat them as user-role, bounded, escaped input — never as system instructions. Codified in theSafetyHook.evaluatedocstring:"Treat every field of context as untrusted data"and"Pass context fields directly into format strings or system prompts"is listed as a MUST NOT.
Mitigations (shipped in this skeleton):
CallContextisfrozen=True(immutable dataclass) — hooks cannot mutate the context and leave poisoned state for downstream hooks.SafetyHook.evaluatedocstring explicitly lists injection vectors in MUST NOT section.LoggingHookuses structured logging pattern resistant to log injection.
Residual risk: Third-party hook implementations are outside our control.
The interfaces.py docstrings are the primary mitigation available to us;
we cannot audit consumer code.
v0.1.0 ships zero runtime dependencies (dependencies = [] in
pyproject.toml). Standard library only.
Build-time dependencies:
setuptools>=68andwheel(build only, not installed in consumer envs).mypy==1.10.0(CI type-check; pinned; not a runtime dep).
CI tooling:
gitleaks8.21.2 binary (pinned; hash-verified via Releases tag).osv-scanner1.8.5 binary (pinned; hash-verified via Releases tag).semgrep/semgrepcontainer (pulled at CI time; SBOM-of-nothing since no lockfile shipped).
Mitigations:
- Zero runtime dependency surface at v0.1.0.
- Dependabot enabled on the repo for the moment a runtime dep lands.
- CI tool versions pinned by release tag.
- SBOM (CycloneDX) published as a release asset; enumerates source files + LICENSE + zero transitive deps.
- Branch protection + CODEOWNERS prevent unapproved dependency additions.
Residual risk: CI action tags (actions/checkout@v4, etc.) are major-
version-mutable. This is a low-severity supply-chain vector accepted at
v0.1.0 (logged as RA-1); pin to commit SHAs in v0.1.1.
Risk: Publishing the safety-proxy interface reveals architectural patterns of the proprietary runtime, potentially helping competitors reverse-engineer the moat.
Assessment: The interface (hookable lifecycle points, CallContext,
HookResult) is commodity architecture. The moat is:
- The production rule library (not shipped).
- Vetted vertical SemanticMaps (not shipped).
- Prompt-injection classifiers (not shipped).
- EU AI Act disclosure wrappers (not shipped).
Publishing the interface shape transfers none of the moat. Reviewed against [DHA-8 §A.3] and [DHA-5 §2.2].
Additionally, gitleaks runs on the full commit range before push to
guarantee no accidental secrets or internal endpoint references land in the
public repo.
Risk: Third parties implement SafetyHook and introduce vulnerabilities
(prompt injection, SSRF, command injection) in their own hook code. These
vulnerabilities are not in this repo but may be attributed to it.
Mitigations we own:
SafetyHook.evaluatedocstring is explicit about MUST NOT patterns (blocking I/O, process spawning, LLM injection, context mutation).CallContextimmutability prevents one hook poisoning context for another.- SECURITY.md has a coordinated-disclosure path so third-party vulnerabilities can be disclosed to us privately and we can issue advisory guidance.
Residual risk: We cannot audit third-party hooks. The docstring mitigations are best-effort guidance.
| ID | Risk | Severity | Accepted? | Fix-by |
|---|---|---|---|---|
| RA-1 | CI action tags pinned to major version, not commit SHA | Low | Yes — no runtime artifact impact | v0.1.1 |
| RA-2 | PGP release key not yet minted | Low | Yes — plaintext security@unboxapi.pro is operable |
v0.1.1 |
| RA-3 | CODEOWNERS literal placeholder must be substituted pre-push | Low → mitigated by runbook hard-fail | Yes | First step of push runbook |
| RA-4 | Third-party hook implementations not auditable | Inherent | Yes — docstring mitigations are best-effort | Ongoing |
-
gitleaksclean on full commit range — zero findings -
osv-scannerclean — zero High/Critical -
semgrep(p/owasp-top-ten+p/supply-chain+p/python) clean — zero High/Critical -
mypy --strictpasses on package -
/security-reviewskill run on final diff — all High/Critical addressed or justified - License/IP review — Apache-2.0; no GPL/AGPL transitives; no copied snippets without attribution
- Branch protection on
main: required PR review, required status checks, no direct push, no force-push, linear history - Signed commits enforced; release tag signed
- Sigstore artifact attestation on release assets
- CycloneDX SBOM published as release asset
- SECURITY.md published; coordinated-disclosure contact active
- CODEOWNERS
@<cto-github-handle>substituted (HARD-FAIL if skipped) - Dependabot + secret scanning + Advanced Security enabled
- CEO sign-off recorded as comment on DHA-28
- Founder sign-off recorded as comment on DHA-28 (founder-authority on public push)
Memo status: ready for CEO + Founder review as item 1 of the Plan v2 §3.0 security gate.