[Evaluation] Monty (Pydantic) — Python Code Sandbox for Layer 5 Tool Restrictions #103

@Steffen025

Summary

What: Evaluate Monty (Pydantic's sandboxed Python interpreter) as a way to run Python code safely in pai-collab's untrusted contribution zone.

Why: Layer 5 (Tool Restrictions) currently blocks Python execution entirely. Monty would let agents validate, test, and review Python contributions from external contributors inside a sandbox instead of on the host.

Impact: Unlocks Python as a first-class contribution type while keeping the 6-layer defense model intact.

Technology

Monty — A minimal, secure Python interpreter written in Rust:

  • Startup: 0.06 ms reported (vs. ~195 ms for Docker)
  • Strict sandbox: no filesystem, network, or environment access
  • Resource limits: memory, execution time, and stack depth
  • External functions: sandboxed code can call only functions the host explicitly exposes
  • Serialization: execution state can be saved to pause and later resume a run

Source: https://github.com/pydantic/monty
Docs: https://docs.pydantic.dev/monty/
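
To make the "host-controlled external functions" idea concrete, here is a minimal sketch of the pattern in plain Python. It does not use Monty's API; `make_dispatcher`, `ExternalCallDenied`, and the `word_count` helper are hypothetical names chosen for illustration. The point is only that sandboxed code asks for a function by name and the host decides whether that name exists at all.

```python
from typing import Any, Callable, Mapping


class ExternalCallDenied(Exception):
    """Raised when sandboxed code requests a function the host did not expose."""


def make_dispatcher(allowed: Mapping[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Build a dispatcher that only forwards calls to explicitly exposed functions.

    Hypothetical helper: this illustrates the allowlist pattern, not Monty itself.
    """

    def dispatch(name: str, *args: Any, **kwargs: Any) -> Any:
        fn = allowed.get(name)
        if fn is None:
            # Anything not on the allowlist is rejected, including builtins like open().
            raise ExternalCallDenied(f"external function not exposed: {name!r}")
        return fn(*args, **kwargs)

    return dispatch


# The host exposes exactly one harmless function to the sandboxed code.
dispatch = make_dispatcher({"word_count": lambda text: len(text.split())})
```

With this shape, a request for `word_count` is served by the host, while a request for anything else (for example `open`) raises `ExternalCallDenied` before any host code runs.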

Full Evaluation

📄 Complete evaluation: research/2026-02-10-monty-technology-evaluation.md

Use Cases for pai-collab

  1. Python Contribution Validation — Agents can execute and validate Python code from untrusted contributors
  2. Review Mode Enhancement — Extend review-mode with safe Python execution
  3. CI/CD Python Gate — Automated validation for .py files in PRs

Proposed Next Steps

Phase 1: Evaluation (This Issue)

  • Create proof-of-concept in contributions/review-mode
  • Test with synthetic Python contributions
  • Measure startup time, memory, detection accuracy
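
The measurements in Phase 1 could be collected with a small harness like the sketch below. It is an assumption about how the PoC might be instrumented, not part of Monty: `time_runs` and `detection_accuracy` are hypothetical helpers, the sandbox call is passed in as a plain callable, and "detection accuracy" is assumed here to mean the fraction of synthetic samples whose accept/reject outcome matches the expected label. Memory is left out because it would need to be measured at the process level for a Rust-backed interpreter.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Tuple


def time_runs(run: Callable[[], object], repeats: int = 100) -> Dict[str, float]:
    """Call `run` repeatedly and summarise wall-clock time in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


def detection_accuracy(results: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of (expected_rejected, actually_rejected) pairs that agree."""
    pairs = list(results)
    if not pairs:
        raise ValueError("no results to score")
    correct = sum(1 for expected, observed in pairs if expected == observed)
    return correct / len(pairs)
```

In a PoC, `run` would wrap one sandboxed execution of a synthetic contribution, and each synthetic sample would contribute one labeled pair to `detection_accuracy`.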

Phase 2: Integration (Future PR)

  • Add Monty dependency to review-mode
  • Implement validatePython() utility
  • Update review-format.md SOP
  • Add CI gate for Python files
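
As a rough picture of what the validatePython() utility and the CI gate might look like, here is a Python sketch. Everything in it is an assumption for discussion: the `SandboxRunner` protocol stands in for whatever adapter ends up wrapping Monty, and `Limits`, `ValidationResult`, `validate_python`, and `python_files` are hypothetical names. The limit defaults are placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Limits:
    """Placeholder resource limits handed to the sandbox adapter."""
    max_duration_s: float = 1.0
    max_memory_bytes: int = 16 * 1024 * 1024
    max_stack_depth: int = 200


@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""


class SandboxRunner(Protocol):
    """Hypothetical adapter around the sandbox; raises on any failure or limit breach."""

    def run(self, source: str, limits: Limits) -> object: ...


def validate_python(
    source: str, runner: SandboxRunner, limits: Limits = Limits()
) -> ValidationResult:
    # Step 1: parse only. compile() does not execute the contribution.
    try:
        compile(source, "<contribution>", "exec")
    except (SyntaxError, ValueError) as exc:
        return ValidationResult(False, f"does not parse: {exc}")
    # Step 2: execute inside the sandbox; never fall back to running on the host.
    try:
        runner.run(source, limits)
    except Exception as exc:
        return ValidationResult(False, f"{type(exc).__name__}: {exc}")
    return ValidationResult(True)


def python_files(changed_paths: Iterable[str]) -> List[str]:
    """Select the files a CI gate would send through validate_python."""
    return [path for path in changed_paths if path.endswith(".py")]
```

Injecting the runner keeps the utility testable with a fake sandbox and means the Monty dependency is confined to one adapter. Note that the parse step uses the host interpreter's grammar, which may accept syntax the sandbox does not support; the sandbox run remains the deciding check.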

Phase 3: Documentation

  • Update TRUST-MODEL.md — Monty as Layer 5 enabler
  • Create SOP: python-contribution-validation.md

Questions for Maintainers

  1. Is this aligned with pai-collab's security philosophy?
  2. Should we prioritize Phase 1 evaluation?
  3. Who should be assigned for PoC implementation?

Labels

type/research, security, layer-5, python, evaluation

Effort Estimate

4-6 hours for Phase 1 PoC

References

  • Related projects: pai-content-filter, pai-secret-scanning
  • Trust model: TRUST-MODEL.md
  • Requested by: @Steffen025 (via Jeremy agent)

Metadata

Assignees

No one assigned

Applied labels

  • governance: Repo-level policy, trust model, and process
  • security: Security and trust related
  • type/idea: Proposal or concept to explore
