Skip to content

Latest commit

 

History

History
807 lines (584 loc) · 40.7 KB

File metadata and controls

807 lines (584 loc) · 40.7 KB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

DON'T COMMIT UNLESS EXPLICITLY ASKED TO DO SO BY THE USER! Previous commit requests do not matter - don't commit unless you've just been explicitly asked to do so.

Project Overview

Monty is a sandboxed Python interpreter written in Rust. It parses Python code using Ruff's ruff_python_parser but implements its own runtime execution model for safety and performance. This is a work-in-progress project that currently supports a subset of Python features.

Project goals:

  • Safety: Execute untrusted Python code safely without FFI or C dependencies, instead sandbox will call back to host to run foreign/external functions.
  • Performance: Fast execution through compile-time optimizations and efficient memory layout
  • Simplicity: Clean, understandable implementation focused on a Python subset
  • Snapshotting and iteration: Plan is to allow code to be iteratively executed and snapshotted at each function call
  • Cross-platform: Runs on Linux, macOS, and Windows (and any other OS that can run Rust)
  • Targets the latest stable version of Python, currently Python 3.14

Cross-Platform Requirements

Monty must work identically on Linux, macOS, and Windows. Within the Monty sandbox, paths always use POSIX/Linux-style forward slashes (/) regardless of the host OS. The MountTable handles translating between virtual POSIX paths and host-native paths.

Key rules:

  • Virtual paths are always POSIX-style (/mnt/data/file.txt), never Windows-style
  • Host paths use std::path::Path/PathBuf which handles OS differences automatically
  • Avoid #[cfg(unix)]-only code in the main crate — all features must work on all platforms
  • Tests in crates/monty/tests/ should be cross-platform; use helper functions for OS-specific APIs like symlink creation (see symlink_file/symlink_dir in fs_security.rs)
  • CI runs cargo test -p monty --features memory-model-checks on Linux, macOS, and Windows

Important Security Notice

It's ABSOLUTELY CRITICAL that there's no way for code run in a Monty sandbox to access the host filesystem, or environment or to in any way "escape the sandbox".

Monty will be used to run untrusted, potentially malicious code.

Make sure there's no risk of this, either in the implementation, or in the public API that makes it more like that a developer using the pydantic_monty package might make such a mistake.

Possible security risks to consider:

  • filesystem access
  • path traversal to access files the users did not intend to expose to the monty sandbox
  • memory errors - use of unsafe memory operations
  • excessive memory usage - evading monty's resource limits
  • infinite loops - evading monty's resource limits
  • network access - sockets, HTTP requests
  • subprocess/shell execution - os.system, subprocess, etc.
  • import system abuse - importing modules with side effects or accessing __import__
  • external function/callback misuse - callbacks run in host environment
  • deserialization attacks - loading untrusted serialized Monty/snapshot data
  • regex/string DoS - catastrophic backtracking or operations bypassing limits
  • information leakage via timing or error messages
  • Python/Javascript/Rust APIs that accidentally allow developers to expose their host to monty code

Filesystem Mounts (crates/monty/src/fs/)

The MountTable allows mounting real host directories into the sandbox at virtual paths, with configurable access modes (ReadWrite, ReadOnly, OverlayMemory).

CRITICAL SECURITY INVARIANT: The monty runtime MUST NEVER read, write, or obtain any information about any file or directory outside the specific directory that is mounted. This is enforced by:

  • Path canonicalization after mapping virtual → host paths
  • Boundary checks verifying canonical paths remain within the mount
  • Symlink resolution that rejects links pointing outside the mount
  • Virtual-space normalization that prevents .. escape
  • Resolve and Absolute returning virtual paths, never host paths
  • Null byte rejection in all paths

All path resolution goes through fs::path_security::resolve_path() which is the sole security boundary. Changes to path_security.rs require careful security review.

heap.rs and path_security.rs are the two most security-critical files in the codebase.

Subprocess isolation (monty-proto, monty --subprocess, monty-pool)

A monty process can never be made fully crash-proof against memory errors (stack overflow aborts, allocator aborts), so monty can run as isolated worker subprocesses:

  • crates/monty-proto — the wire protocol: a protobuf schema (proto/monty/v1/monty.proto), checked-in prost-generated code (regenerate with make generate-proto; CI enforces sync via make check-proto), 4-byte LE length-prefixed framing, and fallible conversions between wire types and MontyException/etc. Values are special-cased for performance: the monty.v1.MontyObject message is mapped via prost extern_path onto WireObject (src/wire.rs), a hand-written prost::Message impl that encodes borrowed MontyObjects and validates while decoding — no mirror struct, no deep clone on the hot path. tests/differential.rs proves it byte-compatible against a fully prost-generated oracle (tests/oracle/, regenerated and CI-checked together with the main codegen). Parents must treat frames from a (possibly compromised) child as untrusted — wire decoding and proto→Rust conversions validate everything and never panic.
  • monty --subprocess (in crates/monty-cli/src/subprocess.rs) — the child: reads framed requests on stdin, writes framed events on stdout, serving one REPL session per checkout. Strict alternation: one request in, zero or more streamed Print events out, then exactly one turn-ending event.
  • crates/monty-pool — the parent: an elastic pool of workers with crash detection/replacement and a watchdog enforcing a hard per-turn timeout.
  • pydantic_monty.Monty / pydantic_monty.AsyncMonty — the ONLY Python execution surface (there is no in-process Python API): sync and async pools of workers (with Monty() as pool: with pool.checkout() as session: session.feed_run(...), and the async with / await feed_run equivalents).

The contract for crash detection: a child that exits or EOFs without a FatalError event crashed hard; the parent discards it and replaces it. See limitations/pool-architecture.md for host-API divergences from in-process execution.

Bytecode VM Architecture

Monty is implemented as a bytecode VM, same as CPython.

HeapReader API — Safe Heap Access

All heap-allocated Python objects (lists, dicts, strings, etc.) are stored in a paged arena (Heap). The HeapReader API provides compile-time safe access to heap data. This is the primary mechanism for reading and mutating heap objects throughout the codebase.

heap.rs is a critical safety boundary. It contains unsafe code that underpins the soundness of the entire HeapReader/HeapRead system (pointer arithmetic, UnsafeCell access, reader-count invariants). Do NOT modify heap.rs without explicit user approval. Changes to this file require careful review of the safety invariants documented in the code comments.

Core concepts

  • HeapReader<'a, T> — A scoped borrow of the heap that produces HeapRead handles. Created exclusively via HeapReader::with, which takes a for<'a> closure bound makes the lifetime 'a universally quantified, so HeapRead pointers cannot escape the closure.
  • HeapRead<'a, T> — A typed handle to a specific heap entry. Created by heap.read(id) which returns a HeapReadOutput<'a> enum that you match on. Tracks a reader count that prevents the entry from being freed while the handle exists.
  • HeapReadOutput<'a> — Enum over all HeapRead<'a, T> variants (one per HeapData variant). Pattern match to get the typed handle.

Reading and mutating heap data

// Scoped heap access.
// The second argument allows for extra data to be
// passed into the closure, will be rebranded as
// `&'a mut ...` to match the `'a` lifetime of the
// `HeapRead` handle, so the closure can have additional
// context while still having the `for <'a>` safety guarantee.
HeapReader::with(heap, &mut (), |heap, ()| {
    let output = heap.read(some_id);  // returns HeapReadOutput<'a>
    match output {
        HeapReadOutput::List(list) => {
            let items = list.get(heap);           // &List, borrows heap immutably
            let items_mut = list.get_mut(heap);   // &mut List, borrows heap mutably
        }
        _ => { /* ... */ }
    }
})

Key borrowing rules:

  • get(&self, &HeapReader)&T — immutable access, prevents heap mutation while reference lives
  • get_mut(&mut self, &mut HeapReader)&mut T — mutable access, exclusive
  • Multiple HeapRead handles can coexist, but only one can be accessed via get_mut at a time
  • dec_ref() panics if any reader is active — prevents use-after-free

Implementing type methods with HeapRead

Type methods are implemented as impl<'h> HeapRead<'h, T> blocks. The PyTrait<'h> trait provides the common interface:

// Methods on a heap type
impl<'h> HeapRead<'h, List> {
    pub fn append(&mut self, vm: &mut VM<'h, impl ResourceTracker>, item: Value) -> RunResult<()> {
        self.get_mut(vm.heap).items.push(item);
        Ok(())
    }
}

// PyTrait implementation
impl<'h> PyTrait<'h> for HeapRead<'h, List> {
    fn py_type(&self, vm: &VM<'h, impl ResourceTracker>) -> Type { Type::List }
    fn py_len(&self, vm: &VM<'h, impl ResourceTracker>) -> Option<usize> {
        Some(self.get(vm.heap).items.len())
    }
    // ...
}

Reference Count Safety

All types that implement DropWithHeap hold heap references and must be cleaned up correctly on every code path — not just the happy path, but also early returns via ?, continue, conditional branches, etc. A missed drop_with_heap on any branch leaks reference counts. There are three mechanisms for ensuring this, listed in order of preference:

1. defer_drop! macro (preferred)

The simplest and safest approach. Use defer_drop! (or defer_drop_mut! when mutable access to the value is needed) to bind a value into a guard that automatically drops it when scope exits — whether that's normal completion, early return via ?, continue, or any other branch. The macro rebinds the value and heap variables as borrows from the guard, so you keep using them by name as before:

let value = self.pop();
defer_drop!(value, heap);          // value is now &Value, heap is now &mut Heap
let result = value.py_repr(heap)?; // guard handles cleanup on all paths

Beyond safety, defer_drop! is often much more concise than inserting drop_with_heap calls in every branch of complex control flow.

defer_drop! gives you an immutable reference to the value. Use defer_drop_mut! when you need a mutable reference (e.g. iterators, values you may swap):

let iter = vm.heap.get_iter(iter_ref);
defer_drop_mut!(iter, vm);
while let Some(item) = iter.for_next(vm)? { ... }

Limitation: because the macro rebinds the heap, it cannot be used inside &mut self methods on the VM where self owns the heap — first assign let this = self; and pass this instead.

2. HeapGuard (when you need control over the value's fate)

Use HeapGuard directly when defer_drop! is too restrictive — specifically when you need to conditionally extract the value instead of dropping it. HeapGuard provides into_inner() and into_parts() to reclaim ownership, while its Drop impl still guarantees cleanup on all other paths:

// HeapGuard needed here because on success we push lhs back onto the stack
// instead of dropping it
let mut lhs_guard = HeapGuard::new(self.pop(), self);
let (lhs, this) = lhs_guard.as_parts_mut();

if lhs.py_iadd(rhs, this.heap)? {
    let (lhs, this) = lhs_guard.into_parts(); // reclaim lhs, don't drop
    this.push(lhs);
    return Ok(());
}
// otherwise lhs_guard drops lhs automatically at scope exit

3. Manual drop_with_heap (for trivially simple cases)

For very simple cases with a single linear code path and no branching between acquiring and releasing the value, a direct drop_with_heap call is fine:

let iter = self.pop();
iter.drop_with_heap(self); // single path, no branching

Avoid manual drop_with_heap whenever there are multiple code paths (branching, ?, continue, early returns) between acquiring and releasing the value — that is exactly where defer_drop! or HeapGuard prevent leaks by guaranteeing cleanup on every path.

Resource-tracked string construction (StringBuilder)

Any code that builds a String whose final size is not already bounded by an already-tracked input must use StringBuilder (in crates/monty/src/string_builder.rs) rather than String::with_capacity(...).push(...). Intermediate Strings live on the Rust heap outside the ResourceTracker, so a loop-built string can OOM the host before allocate_string ever consults the tracker — this is exactly the class of bug that hit str.expandtabs (huge tabsize amplifying a single tab into a multi-gigabyte allocation).

StringBuilder actively reserves bytes with the tracker (via on_grow) as it grows, not just previews. This matters for nested builds: a str.join that invokes user-defined __str__ methods, an f-string spec that evaluates an inner expression, etc. With a preview-only check, each builder would only see the committed memory and miss the outer's in-progress buffer — together they could exceed the limit. Reservations are released on Drop (cleanup on ? / early-return paths) or in finish(heap) (which folds the handoff to allocate_string into the builder so the final size is re-added via on_allocate exactly once). Growth is amortized via 2× doubling:

// Bounded size known up front (padding to a given width):
let mut builder = StringBuilder::with_capacity(width * fillchar.len_utf8(), vm.heap.tracker())?;
builder.push_str(s)?;
for _ in 0..pad { builder.push(fillchar)?; }
builder.finish(vm.heap)

// Size not bounded up front (e.g. attacker-controlled multiplier):
let mut builder = StringBuilder::new(vm.heap.tracker());
for c in input.chars() { builder.push(c)?; }
builder.finish(vm.heap)

StringBuilder also implements fmt::Write, so write!(builder, ...), format_args!, and the existing py_repr_fmt(f, ...) machinery work against a tracker-protected buffer. fmt::Error is payload-free, so any ResourceError raised by a write is stashed on the builder and surfaced by finish(heap) — callers using write! don't need to thread the tracker error themselves.

When the input is already bounded (e.g. s.to_lowercase(), slicing, to_owned() of an existing tracked string), passing a plain String / &str to allocate_string is fine — the result is bounded by a known multiple of an already-tracked input, so no amplification is possible.

Dev Commands

IMPORTANT: before running cargo build or cargo run, it is likely necessary to run make install-py to ensure that the Python virtual environment is available for build.

Instead use the following make commands:

make install-py           Install python dependencies
make install-js           Install JS package dependencies
make install              Install the package, dependencies, and pre-commit for local development
make dev-py               Install the python package for development
make build-js             Build the JS package (compile TypeScript)
make lint-js              Lint JS code with oxlint
make test-js              Test the JS package (builds the monty binary the workers run)
make dev-py-release       Install the python package for development with a release build
make build-wasm           Build the wasm artifacts (requires the wasm32-wasip1-threads toolchain)
make test-wasm            Test the in-process API against the wasm build (requires a prior build-wasm)
make dev-py-pgo           Install the python package for development with profile-guided optimization
make format-rs            Format Rust code with fmt
make format-py            Format Python code - WARNING be careful about this command as it may modify code and break tests silently!
make format-js            Format JS code with prettier
make format               Format Rust code, this does not format Python code as we have to be careful with that
make lint-rs              Lint Rust code with clippy and import checks
make clippy-fix           Fix Rust code with clippy
make generate-proto       Regenerate monty-proto's checked-in code from the .proto schema
make check-proto          Verify monty-proto's checked-in code matches the .proto schema
make lint-py              Lint Python code with ruff
make lint                 Lint the code with ruff and clippy
make format-lint-rs       Format and lint Rust code with fmt and clippy
make format-lint-py       Format and lint Python code with ruff
make test-no-features     Run rust tests without any features enabled
make test-memory-model-checks Run rust tests with memory-model-checks enabled - THIS IS EXTREMELY SLOW, SHOULD MOSTLY BE RUN IN CI OR IF ABSOLUTELY NECESSARY
make test-ref-count-return Run rust tests with ref-count-return enabled
make test-cases           Run tests cases only
make test-type-checking   Run rust tests on monty_type_checking
make pytest               Run Python tests with pytest
make test-py              Build the python package (debug profile) and run tests
make test-docs            Test docs examples only
make test                 Run rust tests
make testcov              Run Rust tests with coverage, print table, and generate HTML report
make complete-tests       Fill in incomplete test expectations using CPython
make update-typeshed      Update vendored typeshed from upstream
make bench                Run benchmarks
make bench-pool           Run subprocess pool benchmarks (spawn, checkout, wire round-trips)
make dev-bench            Run benchmarks to test with dev profile
make profile              Profile the code with pprof and generate flamegraphs
make type-sizes           Write type sizes for the crate to ./type-sizes.txt (requires nightly and top-type-sizes)
make main                 run linting and the most important tests
make help                 Show this help (usage: make help)

Use the /python-playground skill to check cpython and monty behavior.

Releasing

See RELEASING.md for the release process.

Exception

It's important that exceptions raised/returned by this library match those raised by Python.

Wherever you see an Exception with a repeated message, create a dedicated method to create that exception src/exceptions.rs.

When writing exception messages, always check src/exceptions.rs for existing methods to generate that message.

Argument extraction — ALWAYS use #[derive(FromArgs)]

Whenever you add or modify a Rust-side function, method, type constructor, or OsFunction handler that takes anything beyond the trivial 0/1/2-positional shapes already covered by ArgValues::check_zero_args / get_one_arg / get_two_args / get_zero_one_arg / into_pos_only, you MUST use #[derive(FromArgs)] (re-exported as monty::args::FromArgs).

Hand-written args.into_parts() loops are not acceptable for any signature that has multiple positionals with defaults, keyword arguments, *args, or **kwargs — they are a known source of reference-count leaks, divergent error messages, and duplicated boilerplate. FromArgs generates the dispatch, conflict detection, default handling, and refcount cleanup mechanically. See crates/monty-macros/README.md for the full attribute surface (c_error, c_error_named, pos_only, kw_only, varargs, varkwargs, default, static_string, …) and how to extend the macro or add new FromValue impls.

If a callsite needs custom per-argument coercion (e.g. value_to_float for math, a TimeDelta type check, a bytes-or-str union), declare the field as Value and run the coercion in the function body after the from_args call — the macro still handles the parsing, your code just adds the final validation step.

Code style

Avoid local imports, unless there's a very good reason, all imports should be at the top of the file.

Avoid fn my_func<T: MyTrait>(..., param: T) style function definitions, STRONGLY prefer fn my_func(param: impl MyTrait) syntax since changes are more localized. This includes in trait definitions and implementations.

Also avoid using functions and structs via a path like std::borrow::Cow::Owned(...), instead import Cow globally with use std::borrow::Cow;.

STRONGLY prefer expression-oriented style: use if/match as expressions with a trailing (tail) expression rather than early return with a guard clause. E.g. prefer

if cond { a } else { b }

over

if cond {
    return a;
}
b

This applies to function bodies and block expressions alike. Only use early return when it genuinely simplifies control flow (e.g. several guard clauses at the top of a function).

This applies even more strongly to long if cond { ... } else if cond2 { ... } ... else { ... } chains — keep them as a single expression yielding a value, rather than scattering return statements through each branch.

NEVER use allow() in rust lint markers, instead use expect() so any unnecessary markers are removed. E.g. use

#[expect(clippy::too_many_arguments)]

NOT!

#[allow(clippy::too_many_arguments)]

Docstrings and comments.

IMPORTANT: every struct, enum and function should have an informative but concise docstring to explain what it does and why; and any considerations or potential foot-guns of using that type.

The only exception is trait implementation methods where a docstring is not necessary if the method is self-explanatory.

It's important that docstrings cover the motivation and primary usage patterns of code, not just the simple "what it does".

Similarly, you should add comments to code, especially if the code is complex or esoteric.

Only add examples to docstrings of public functions and structs, examples should be <=8 lines, if the example is more, remove it.

If you add example code to docstrings, it must be run in tests. NEVER add examples that are ignored.

If you encounter a comment or docstring that's out of date - you MUST update it to be correct.

Similarly, if you encounter code that has no docstrings or comments, or they are minimal, you should add more detail.

Always use single back-ticks in python docstrings - they should be markdown, not rst!

NOTE: COMMENTS AND DOCSTRINGS ARE EXTREMELY IMPORTANT TO THE LONG TERM HEALTH OF THE PROJECT.

NOTE: COMMENTS AND DOCSTRINGS SHOULD BE CONCISE - EXCESSIVELY VERBOSE DOCSTRINGS MAKE THE CODE HARDER TO READ AND MAINTAIN!

Tests

Do NOT write tests within modules unless explicitly prompted to do so.

Tests should live in the relevant tests/ directory.

Commands:

# Build the project
cargo build

# Run tests (this is the best way to run all tests as it enables the memory-model-checks feature)
make test-memory-model-checks

# Run crates/monty/test_cases tests only
make test-cases

# Run a specific test
cargo test -p monty --test TEST --features memory-model-checks str__ops
cargo run -p monty-datatest --features memory-model-checks str__ops

# Run the interpreter on a Python file
cargo run -- <file.py>

See more test commands above.

Experimentation and Playground

Read Makefile for other useful commands.

You can use the ./playground directory (excluded from git, create with mkdir -p playground) to write files when you want to experiment by running a file with cpython or monty, e.g.:

  • python3 playground/test.py to run the file with cpython
  • cargo run -- playground/test.py to run the file with monty

DO NOT use /tmp or pipe code to the interpreter, or use python3 -c ... as it requires extra permissions and can slow you down!

More details in the "python-playground" skill.

Test File Structure

Most functionality should be tested via python files in the crates/monty/test_cases directory.

DO NOT create many small test files. This would be unmaintainable.

ALWAYS consolidate related tests into single files using multiple assert statements. Follow crates/monty/test_cases/fstring__all.py as the gold standard pattern:

# === Section name ===
# brief comment if needed
assert condition, 'descriptive message'
assert another_condition, 'another descriptive message'

# === Next section ===
x = setup_value
assert x == expected, 'test description'

Each assert should have a descriptive message.

Do NOT Write tests like assert 'thing' in msg it's lazy and inexact unless explicitly told to do so, instead write tests like assert msg == 'expected message' to ensure clarity and accuracy and most importantly, to identify differences between Monty and CPython.

When to Create Separate Test Files

Only create a separate test file when you MUST use one of these special expectation formats:

  • """TRACEBACK:...""" - Test expects an exception with full traceback (PREFERRED for error tests)
  • # Raise=Exception('message') - Test expects an exception without traceback verification - NOT RECOMMENDED, use TRACEBACK instead
  • # ref-counts={...} - Test checks reference counts (special mode)
  • you're writing tests for a different behavior or section of the language

For everything else, add asserts to an existing test file or create ONE consolidated file for the feature.

File Naming

Name files by feature, not by micro-variant:

  • str__ops.py - all string operations (add, iadd, len, etc.)
  • list__methods.py - all list method tests
  • str__add_basic.py, str__add_empty.py, str__add_multiple.py - TOO GRANULAR

Expectation Formats (use sparingly)

Only use these when assert won't work (on last line of file):

  • # Return=value - Check repr() output (prefer assert instead)
  • # Return.str=value - Check str() output (prefer assert instead)
  • # Return.type=typename - Check type() output (prefer assert instead)
  • # Raise=Exception('message') - Expect exception without traceback (REQUIRES separate file)
  • """TRACEBACK:...""" - Expect exception with full traceback (PREFERRED over # Raise=)
  • # ref-counts={...} - Check reference counts (REQUIRES separate file)
  • No expectation comment - Assert-based test (PREFERRED)

Do NOT use # Return= when you could use assert instead

Traceback Tests (Preferred for Errors)

For tests that expect exceptions, prefer traceback tests over # Raise= or try / except because they verify:

  • The full traceback with all stack frames
  • Correct line numbers for each frame
  • Function names in the traceback
  • The caret markers (~) pointing to the error location

Traceback test format - add a triple-quoted string at the end of the file starting with \nTRACEBACK::

def foo():
    raise ValueError('oops')

foo()
"""
TRACEBACK:
Traceback (most recent call last):
  File "my_test.py", line 4, in <module>
    foo()
    ~~~~~
  File "my_test.py", line 2, in foo
    raise ValueError('oops')
ValueError: oops
"""

Key points:

  • The filename in the traceback should match the test file name (just the basename, not the full path)
  • Use ~ for caret markers (the test runner normalizes CPython's ^ to ~)
  • The <module> frame name is used for top-level code
  • Tests run against both Monty and CPython, so the traceback must match both

If you don't care about the traceback or it intentionally differs from cpython (e.g. for json) and you want to test multiple cases in the same file, use this style

try:
    ...
    assert False, 'expected <task> to fail'
except <ErrorType> as exc:
    assert str(exc) = '<expected exception message>'

IMPORTANT: don't just check that an exception is raised, you should always check the exception message.

IMPORTANT: DON'T BE LAZY. If the exception differs between cpython and Monty, either fix the exception message, or stop and report the problem!

Only use # Raise= when you only care about the exception type/message and not the traceback and you can't use a try/except block.

Python fixture markers

You may mark python files with:

  • # call-external to support calling external functions
  • # run-async to support running async code

NEVER MARK TESTS AS XFAIL UNDER ANY CIRCUMSTANCES!!! INSTEAD FIX THE BEHAVIOR SO THAT THE TEST PASSES.

Never mark tests as:

  • # xfail=cpython - Test is required to fail on CPython
  • # xfail=monty - Test is required to fail on Monty

NEVER MARK TESTS AS XFAIL UNDER ANY CIRCUMSTANCES!!! INSTEAD FIX THE BEHAVIOR SO THAT THE TEST PASSES.

All these markers must be at the start of comment lines to be recognized.

Other Notes

  • Prefer single quotes for strings in Python tests
  • Do NOT add # noqa or # pyright: ignore comments to test code, instead add the failing code to pyproject.toml
  • The ONLY exception is await expressions outside of async functions, where you should add # pyright: ignore
  • Run make lint-py after adding tests
  • Use make complete-tests to fill in blank expectations
  • Regression tests run via datatest-stable harness in crates/monty-datatest/src/main.rs, use make test-cases to run them

Rust integration tests and insta snapshots

In crates/*/tests/*.rs (but not crates/monty/test_cases/), use insta assert_snapshot! for multi-line strings, serialized output, error messages otherwise fuzz-checked via .contains(...), and any fixture currently compared via a hand-rolled UPDATE_EXPECT helper (use external snapshots under tests/snapshots/).

Keep assert_eq! for scalars, enums, and structural values (MontyObject, Vec, etc.), and for principled membership checks like vec.contains(...).

Workflow: write assert_snapshot!(value, @"");, then cargo insta test --accept to populate (plain INSTA_UPDATE=always does not update inline @"..." snapshots — you need the cargo insta subcommand, installed via cargo install cargo-insta). Add insta = { workspace = true } to [dev-dependencies] when introducing it to a new crate.

Python Package (pydantic-monty)

The Python package provides Python bindings for the Monty interpreter, located in crates/monty-python/. Execution always happens in monty worker subprocesses — there is no in-process execution API. The surface is Monty (sync pool) and AsyncMonty (async pool), each with pool.checkout(...) sessions driven by feed_run (a coroutine on async sessions).

Structure

  • crates/monty-python/src/ - Rust source for PyO3 bindings
  • crates/monty-python/python/pydantic_monty/_monty.pyi - Type stubs for the Python module
  • crates/monty-python/tests/ - Python tests using pytest

Building and Testing

Dependencies needed for python testing are installed in crates/monty-python/pyproject.toml. To install these dependencies, use uv sync --all-packages --only-dev.

# Build the Python package for development (required before running tests)
make dev-py

# Run Python tests
make test-py

# Or run pytest directly (after dev-py)
uv run pytest

# Run a specific test file
uv run pytest crates/monty-python/tests/test_basic.py

# Run a specific test
uv run pytest crates/monty-python/tests/test_basic.py::test_simple_expression

Python Test Guidelines

Check and follow the style of other python tests.

Make sure you put tests in the correct file.

DO NOT use python/pytest tests for monty core functionality! When testing core functionality, add tests to crates/monty/test_cases/ or crates/monty/tests/. Only use python/pytest tests for pydantic_monty functionality testing.

NEVER use class-based tests. All tests should be simple functions.

Use @pytest.mark.parametrize whenever testing multiple similar cases.

Use snapshot from inline-snapshot for all test asserts.

NEVER do the lazy assert '...' in ... instead always do assert value == snapshot(), then run the test and inline-snapshot will fill in the missing value in the snapshot() call.

Use pytest.raises for expected exceptions, like this

with pytest.raises(ValueError) as exc_info:
    session.feed_run(code, print_callback=callback)
assert exc_info.value.args[0] == snapshot('stopped at 3')

Reference Counting

Heap-allocated values (Value::Ref) use manual reference counting. Key rules:

  • Cloning: Use clone_with_heap(heap) which increments refcounts for Ref variants.
  • Dropping: Call drop_with_heap(heap) when discarding an Value that may be a Ref.

Container types (List, Tuple, Dict) also have clone_with_heap() methods.

Mutability of the heap parameter is asymmetric — do not assume the two methods take the same kind of borrow:

  • clone_with_heap takes &impl ContainsHeap (immutable). The refcount field lives behind interior mutability, so inc_ref is &self on Heap. This means you can call clone_with_heap while other immutable borrows of the heap (e.g. a HeapRead handle obtained via .get(heap)) are still live.
  • Heap::allocate is also &self for the same reason — entry storage and the allocation tracker are behind interior mutability. New heap entries can be created without a &mut Heap.
  • drop_with_heap takes &mut impl ContainsHeap, because dropping may free entries and run destructors, which mutates the heap.

If you find yourself fighting the borrow checker around clone_with_heap or allocate, the fix is almost never &mut — it is more likely that you are passing the wrong receiver (e.g. vm instead of vm.heap) or holding a &mut borrow elsewhere that should be &.

Cycle collection — Bacon–Rajan trial deletion

Reference counting alone cannot reclaim cycles. Monty uses Bacon–Rajan trial deletion (Heap::collect_cycles in crates/monty/src/heap.rs).

Resource limits: When resource limits (allocations, memory, time) are exceeded, execution terminates with a ResourceError. No guarantees are made about the state of the heap or reference counts after a resource limit is exceeded. The heap may contain orphaned objects with incorrect refcounts. This is acceptable because resource exhaustion is a terminal error - the execution context should be discarded.

JavaScript Package (@pydantic/monty, crates/monty-js/)

The JavaScript package is a napi-rs binding over monty-pool — the same Rust pool/protocol engine pydantic_monty uses — wrapped by a thin TypeScript layer. The native binding exposes turn-level primitives (NativePool, NativeSession.feed/resume*); the TypeScript drive loop answers suspension events (external functions, os callbacks, async futures) where promises are native. Pool elasticity, watchdogs, crash recovery, framing and value conversion all live in Rust.

Structure

  • crates/monty-js/src/ - Rust napi crate: pool.rs (NativePool / NativeSession over monty-pool), convert.rs (JS ↔ MontyObject), exceptions.rs, limits.rs, mount.rs, and monty_cls.rs (the legacy in-process API, the only surface available on wasm)
  • crates/monty-js/ts/ - TypeScript wrapper: pool.ts (Monty), session.ts (MontySession + drive loop), errors.ts, binary.ts (monty binary resolution), mount.ts, native.ts (turn-object typings), wasm.ts (in-process API wrapper, exported as @pydantic/monty/wasm)
  • index.js / index.d.ts - napi-generated loader (created by npm run build:napi; gitignored)
  • crates/monty-js/npm/ - generated platform packages shipping the napi .node library and the monty binary (@pydantic/monty-<platform>, selected via optionalDependencies; napi create-npm-dirs + scripts/create-platform-packages.mjs)
  • crates/monty-js/__test__/ - Tests using ava (wasm_*.spec.ts cover the in-process API and also run against the wasm build in CI)

Current API

import { Monty } from '@pydantic/monty'

await using pool = await Monty.create({ maxProcesses: 8, requestTimeout: 30 })
await using session = await pool.checkout({ typeCheck: false })

await session.feedRun('x = 21') // session state persists across feeds
const result = await session.feedRun('x * 2', {
  inputs: { y: 1 },
  externalFunctions: { fetch: async (url: string) => '...' }, // sync or async
  printCallback: (stream, text) => {},
})

Errors: MontyError (base), MontySyntaxError, MontyRuntimeError, MontyTypingError, and MontyCrashedError (worker death; pool recovers). MountDir and the os/NOT_HANDLED callback work like the Python package.

See crates/monty-js/README.md for full API documentation.

Building and Testing

make install-js   # npm install
make build-js     # napi debug build + compile TypeScript
make test-js      # builds the napi binding + debug monty binary, then runs ava
make lint-js      # oxlint
make format-js    # prettier
make smoke-test-js  # packs + installs the package and platform binary package

Tests run straight from ts/ via @oxc-node/core against the locally built .node; the workers resolve the monty binary from the workspace target/debug build automatically.

JavaScript Test Guidelines

  • Tests use ava and live in crates/monty-js/__test__/
  • Tests are written in TypeScript; use the setupPool helper from __test__/helpers.ts
  • Follow the existing test style in the __test__/ directory

WebAssembly build (@pydantic/monty/wasm)

The legacy in-process API (Monty, MontySnapshot, MontyRepl, runMontyAsync, ...) ships inside the same @pydantic/monty package under the /wasm subpath, for browsers and other environments where subprocesses are impossible (the crate's wasm32-wasip1-threads napi target; the subprocess pool is #[cfg]-gated off there). On Node.js, the subprocess pool is always preferred — a sandbox crash in the in-process API takes the host process with it. Built and tested in CI; building locally requires the wasm toolchain (make build-wasm).

Limitations documentation (./limitations/)

Every pull request that adds, changes, or removes user-visible behavior MUST land (or update) a markdown document under ./limitations/ describing how the feature DIVERGES from CPython and what subset of the CPython surface area Monty actually implements. The directory is the single source of truth for "what does Monty not do that CPython does" — module-level docstrings and inline comments are not sufficient on their own.

NOTE: ./limitations/ SHOULD ONLY INCLUDE INFORMATION ABOUT BEHAVIOR DIVERGENCES FROM CPython, not points that describe behavior that matches CPython's behavior.

One file per feature, named after the builtin / module / construct it covers (e.g. limitations/open.md, limitations/asyncio.md, limitations/bytecode_interpretter.md). Add new sections to an existing file when the feature is already documented; only create a new file when there is no fit.

Keep entries concise but comprehensive — list every known divergence, including ones that "feel obvious". A divergence that is not written down is one that future readers (and future Claude) will assume does not exist. Reviewers should reject PRs that change behavior without updating ./limitations/ if necessary.

Structure each file around what a Python user would actually try:

  • Arguments/options that are rejected or ignored.
  • Methods/attributes that raise AttributeError.
  • Behaviour that differs from CPython even when the API exists.
  • Error types / messages that differ from CPython.

Avoid implementation detail unless it explains a user-visible quirk.

NOTES

ALWAYS consider code quality when adding new code, if functions are getting too complex or code is duplicated, move relevant logic to a new file. Make sure functions are added in the most logical place, e.g. as methods on a struct where appropriate.

The code should follow the "newspaper" style where public and primary functions are at the top of the file, followed by private functions and utilities. ALWAYS put utility, private functions and "sub functions" underneath the function they're used in.

It is important to the long term health of the project and maintainability of the codebase that code is well structured and organized, this is very important.

ALWAYS run make format-rs and make lint-rs after making changes to rust code and fix all suggestions to maintain code quality.

ALWAYS run make lint-py after making changes to python code and fix all suggestions to maintain code quality.

ALWAYS update this file when it is out of date.

NEVER add imports anywhere except at the top of the file, this applies to both python and rust.

NEVER write unsafe code, if you think you need to write unsafe code, explicitly ask the user or leave a todo!() with a suggestion and explanation.

When you get asked a question like "Is X really the best approach" ANSWER THE QUESTION! don't try to make a chance based on a perceived instruction in the question!