[Evaluation] Monty (Pydantic) — Python Code Sandbox for Layer 5 Tool Restrictions

## Summary

**What:** Evaluate Monty (Pydantic's sandboxed Python interpreter) for enabling safe Python code execution in pai-collab's untrusted contribution zone.

**Why:** Layer 5 (Tool Restrictions) currently blocks Python execution entirely. Monty would let agents validate, test, and review Python contributions from external contributors inside a sandbox, without granting that code filesystem, network, or environment access.

**Impact:** Unlocks Python as a first-class contribution type while maintaining the 6-layer defense model.

## Technology

**Monty** is a minimal, secure Python interpreter written in Rust:
- Startup: 0.06 ms reported (vs. ~195 ms for Docker)
- Strict sandbox: no filesystem, network, or environment access
- Resource limits: memory, execution time, stack depth
- External functions: sandboxed code can call only the functions the host explicitly exposes
- Serialization: execution state can be paused, serialized, and resumed
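
To make the "host-controlled external functions" idea concrete, here is a minimal sketch of an allowlist gateway on the host side. It does not use Monty's API; `HostGateway`, `ExternalCallDenied`, and `word_count` are hypothetical names chosen for illustration, and the real integration may look different.

```python
from typing import Any, Callable, Dict


class ExternalCallDenied(Exception):
    """Raised when sandboxed code requests a function the host did not expose."""


class HostGateway:
    """Hypothetical host-side dispatcher: only allowlisted names are callable."""

    def __init__(self, allowed: Dict[str, Callable[..., Any]]) -> None:
        # Copy so later mutation of the caller's dict cannot widen the allowlist.
        self._allowed = dict(allowed)

    def call(self, name: str, *args: Any) -> Any:
        handler = self._allowed.get(name)
        if handler is None:
            # Anything not explicitly exposed is rejected, never looked up elsewhere.
            raise ExternalCallDenied(name)
        return handler(*args)


# Illustrative allowlist: a single pure function with no I/O.
gateway = HostGateway({"word_count": lambda text: len(text.split())})
```

With this shape, a request such as `gateway.call("open", "/etc/passwd")` fails with `ExternalCallDenied` because `open` was never exposed, which is the property Layer 5 would rely on.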

**Source:** https://github.com/pydantic/monty
**Docs:** https://docs.pydantic.dev/monty/

## Full Evaluation

📄 **Complete evaluation:** [research/2026-02-10-monty-technology-evaluation.md](../research/2026-02-10-monty-technology-evaluation.md)

## Use Cases for pai-collab

1. **Python Contribution Validation** — Agents can execute and validate Python code from untrusted contributors
2. **Review Mode Enhancement** — Extend review-mode with safe Python execution
3. **CI/CD Python Gate** — Automated validation for `.py` files in PRs

## Proposed Next Steps

### Phase 1: Evaluation (This Issue)
- [ ] Create proof-of-concept in `contributions/review-mode`
- [ ] Test with synthetic Python contributions
- [ ] Measure startup time, memory, detection accuracy
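
One possible shape for the startup and memory measurements, sketched with the standard library only. The `measure` helper and `placeholder_run` are hypothetical; in the PoC, `placeholder_run` would be replaced by a call that starts the sandbox and runs a synthetic contribution. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory used by a native (Rust) interpreter would need a separate measurement such as process RSS.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict


def measure(run: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Return median/max wall-clock time and peak traced memory for `run`."""
    # Time first, without tracing, so tracemalloc overhead does not skew timings.
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations_ms.append((time.perf_counter() - start) * 1000.0)

    # Then trace a single run to capture peak Python-level allocations.
    tracemalloc.start()
    try:
        run()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return {
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
        "peak_kib": peak_bytes / 1024.0,
    }


def placeholder_run() -> int:
    """Stand-in workload; swap in the real sandbox invocation for the PoC."""
    return sum(range(1000))


report = measure(placeholder_run, repeats=5)
```

Reporting the median alongside the maximum helps separate typical startup cost from one-off outliers such as cold caches.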

### Phase 2: Integration (Future PR)
- [ ] Add Monty dependency to `review-mode`
- [ ] Implement `validatePython()` utility
- [ ] Update `review-format.md` SOP
- [ ] Add CI gate for Python files
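
A rough sketch of how the validation utility and the CI gate could fit together, written in Python for illustration. Everything here is an assumption: `validate_python`, `python_files`, `ValidationResult`, and `syntax_only_runner` are hypothetical names, and the runner is injected so the sketch makes no claim about Monty's actual API. The stand-in runner only parses the source and never executes it.

```python
import ast
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Optional

# A runner receives source text and raises on failure.
# In the PoC this would wrap the sandbox; here it is a pluggable callable.
SandboxRunner = Callable[[str], None]


@dataclass(frozen=True)
class ValidationResult:
    path: str
    ok: bool
    error: Optional[str] = None


def python_files(changed_paths: Iterable[str]) -> List[str]:
    """Select the `.py` files from a PR's changed paths, preserving order."""
    return [p for p in changed_paths if PurePosixPath(p).suffix == ".py"]


def validate_python(path: str, source: str, runner: SandboxRunner) -> ValidationResult:
    """Run one contribution through the runner and capture any failure."""
    try:
        runner(source)
    except Exception as exc:
        return ValidationResult(path, False, f"{type(exc).__name__}: {exc}")
    return ValidationResult(path, True)


def syntax_only_runner(source: str) -> None:
    """Stand-in runner: parses the source but never executes it."""
    ast.parse(source)
```

A CI job could call `python_files` on the PR's changed paths, run `validate_python` on each, and exit non-zero if any result has `ok` set to `False`.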

### Phase 3: Documentation
- [ ] Update `TRUST-MODEL.md` — Monty as Layer 5 enabler
- [ ] Create SOP: `python-contribution-validation.md`

## Questions for Maintainers

1. Is this aligned with pai-collab's security philosophy?
2. Should we prioritize Phase 1 evaluation?
3. Who should be assigned for PoC implementation?

## Labels

`type/research`, `security`, `layer-5`, `python`, `evaluation`

## Effort Estimate

4-6 hours for Phase 1 PoC

## References

- Related projects: pai-content-filter, pai-secret-scanning
- Trust model: [TRUST-MODEL.md](../TRUST-MODEL.md)
- Requested by: @Steffen025 (via Jeremy agent)

