@AlirezaShamsoshoara

Summary

This PR adds screenshot capture functionality to the OpenApp environment and fixes dependency issues that were causing import errors.

Key Changes

  • Screenshot Feature: Environment now captures a base64-encoded PNG screenshot and returns it in the observation after every reset() and step() call
  • Dependency Fixes: Resolved beartype version conflicts and added missing fastmcp dependency to Dockerfile
  • Documentation: Added troubleshooting guide and screenshot usage instructions to README

Type of Change

  • Bug fix
  • New feature
  • Documentation

Changes

Screenshot Implementation (envs/openapp_env/server/openapp_environment.py)

  • Added _current_screenshot instance variable to track screenshot state
  • Added _extract_screenshot() helper method to convert BrowserGym numpy arrays to base64 PNG
  • Updated reset() and step() to extract and return screenshots
  • Updated all _execute_* methods (click, fill, goto, scroll, send_keys) to capture screenshots
  • Updated _update_observation_from_page() to capture screenshots from Playwright
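
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_extract_screenshot()` code: the function name `to_base64_png` and its dispatch rules are assumptions, and the array branch assumes a uint8 height x width x channels array plus an installed Pillow.

```python
import base64
import io
from typing import Any, Optional


def to_base64_png(shot: Any) -> Optional[str]:
    """Normalize a screenshot to a base64 PNG string (hypothetical helper)."""
    if shot is None:
        return None
    if isinstance(shot, str):
        # Assumption: a string is already base64-encoded, so pass it through.
        return shot
    if isinstance(shot, (bytes, bytearray)):
        # Playwright's page.screenshot() returns PNG bytes by default.
        return base64.b64encode(bytes(shot)).decode("ascii")
    if hasattr(shot, "shape"):
        # Array-like input (e.g. a numpy screenshot from BrowserGym).
        # Pillow is imported lazily so the other branches need no extras.
        from PIL import Image

        buffer = io.BytesIO()
        Image.fromarray(shot).save(buffer, format="PNG")
        return base64.b64encode(buffer.getvalue()).decode("ascii")
    # Unknown type: report "no screenshot" rather than raising.
    return None
```

Returning `None` for unrecognized input keeps the observation's `Optional[str]` field valid even when capture fails, which is one plausible reading of the error handling described in this PR.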

Dockerfile Fixes (envs/openapp_env/server/Dockerfile)

  • Added fastmcp dependency (required by openenv-core for MCP support)
  • Added beartype force-reinstall step after all pip installs to fix version conflicts
  • beartype is pinned to >=0.15,<0.18 for py-key-value-aio compatibility
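
To check whether a given beartype version falls inside the pinned range, something like the sketch below may help. It is a simplified comparison on the numeric release segment only, not a full PEP 440 implementation, and the helper names are hypothetical.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '0.17.2' -> (0, 17, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies_beartype_pin(version: str) -> bool:
    """Return True if the version lies in the pinned range >=0.15,<0.18."""
    return (0, 15) <= parse_release(version) < (0, 18)
```

In a live environment the input could come from `importlib.metadata.version("beartype")`, which returns the installed version as a string.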

Dependencies (envs/openapp_env/pyproject.toml)

  • Added Pillow>=10.0.0 for screenshot image processing

Example Script (examples/openapp_example.py)

  • Added --test-screenshots flag to verify screenshot functionality
  • Screenshots saved to examples/screenshot_output/ with descriptive names
  • Works with both local and docker modes
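
A rough sketch of how the flag and the output naming could fit together is shown below. The `--mode` and `--test-screenshots` flags and the file name patterns come from this PR's test plan; the function names, the default mode, and the parser layout are assumptions and may differ from the real script.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags used in the test plan."""
    parser = argparse.ArgumentParser(description="OpenApp example (sketch)")
    # Default of "local" is an assumption for this sketch.
    parser.add_argument("--mode", choices=["local", "docker"], default="local")
    parser.add_argument("--test-screenshots", action="store_true")
    return parser


def screenshot_path(
    output_dir: str,
    mode: str,
    step: Optional[int] = None,
    action: Optional[str] = None,
) -> Path:
    """Return a descriptive file path such as local_step_1_goto.png."""
    if step is None:
        name = f"{mode}_reset.png"
    else:
        name = f"{mode}_step_{step}_{action}.png"
    return Path(output_dir) / name
```

For example, `screenshot_path("examples/screenshot_output", "docker", 2, "click")` would point at `docker_step_2_click.png` inside the output directory.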

Documentation (envs/openapp_env/README.md)

  • Added "Screenshots" section with usage examples and test instructions
  • Added troubleshooting section for beartype_this_package import errors
  • Added PYTHONNOUSERSITE fix for conda/virtualenv users

Other

  • Added beartype pin to main pyproject.toml for local development

Alignment Checklist

Before submitting, verify:

  • I have read .claude/docs/PRINCIPLES.md and this PR aligns with our principles
  • I have checked .claude/docs/INVARIANTS.md and no invariants are violated
  • I have run /pre-submit-pr (or bash .claude/hooks/lint.sh and tests) and addressed all issues

RFC Status

  • Not required (bug fix, docs, minor refactoring)

Test Plan

Local Mode

# Terminal 1: Start OpenApps server
cd /path/to/OpenApps
uv run launch.py

# Terminal 2: Test screenshots
export OPENAPPS_URL=http://localhost:5001
export PYTHONNOUSERSITE=1
python examples/openapp_example.py --mode local --test-screenshots

Docker Mode

# Build the updated Docker image
cd envs/openapp_env
docker build -t openapp-env:latest -f server/Dockerfile .

# Test screenshots
export PYTHONNOUSERSITE=1
python examples/openapp_example.py --mode docker --test-screenshots

Expected Output

  • Screenshots saved to examples/screenshot_output/
  • Files like local_reset.png, local_step_1_goto.png, etc.
  • Each screenshot should be a valid PNG image of the browser state
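
One way to confirm that a returned screenshot is a plausible PNG, using only the standard library, is sketched below. The helper names are hypothetical; the check covers the PNG signature and the IHDR header only and does not verify chunk CRCs or pixel data.

```python
import base64
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature or truncated data")
    # The first chunk must be IHDR with a 13-byte payload.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("not a PNG: first chunk is not a valid IHDR")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def dimensions_from_base64(screenshot: str) -> Tuple[int, int]:
    """Decode a base64 screenshot string and return its PNG dimensions."""
    return png_dimensions(base64.b64decode(screenshot))
```

Calling `dimensions_from_base64(obs.screenshot)` on an observation (assuming the field is populated) would raise `ValueError` for anything that is not PNG data, which gives a quick sanity check before writing files to disk.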

Claude Code Review

Alignment Review Report from Claude Code

Automated Checks

  • Lint: ✅ PASS - 79 files already formatted
  • Debug code: ✅ CLEAN - Print statements found are in docstrings/examples only (not actual debug code)

Open RFCs Context

  • RFC 000 (Project Phases): Status: In Review
  • RFC 001 (Abstractions): Status: In Review
  • RFC 002 (Env Spec): Status: In Review
  • RFC 003 (MCP Support): Status: In Review
  • RFC 004 (Rubrics): Status: Not specified

None of these RFCs directly conflict with the screenshot feature addition.

Tier 1: Fixes Required

None identified. All mechanical issues are clean:

  • No lint failures
  • No debug code in production paths
  • No uninitialized variables or type errors
  • No security issues (no credentials exposed)

Tier 2: Alignment Discussion

Principle Conflicts

None identified. The changes align with OpenEnv principles:

  1. ✅ Simple Gymnasium-style API: Screenshot is returned through the existing Observation model without changing the reset()/step() signatures
  2. ✅ Type safety: Screenshot field already existed in OpenAppObservation as Optional[str] - this PR just populates it
  3. ✅ Container isolation: No changes to container security model
  4. ✅ Rewards in environment: No changes to reward computation

RFC Conflicts

None identified. The changes don't conflict with any open RFCs:

  • RFC 001 (Abstractions): Screenshot is part of the observation, not a new API
  • RFC 002 (Env Spec): Observation fields are environment-specific, so a screenshot field is valid
  • RFC 003 (MCP Support): Adding fastmcp dependency aligns with this RFC
  • RFC 004 (Rubrics): No interaction with rubrics functionality

Invariant Check

All invariants preserved:

  • ✅ Gymnasium API signatures: reset() and step() signatures unchanged
  • ✅ Generic type safety: OpenAppObservation already had the screenshot field typed
  • ✅ Pydantic serialization: Screenshot is base64 string, JSON-compatible
  • ✅ Agent isolation: No simulation controls exposed
  • ✅ Client-server separation: Changes are server-side only (except README docs)
  • ✅ Rewards in environment: Unchanged

Summary

  • 0 mechanical issues to fix
  • 0 alignment points for human review
  • 0 RFC conflicts to discuss

Recommendation: This PR is ready for merge. The changes are additive (populating an existing field), fix real dependency issues, and include good documentation.

@AlirezaShamsoshoara self-assigned this on Jan 27, 2026
@AlirezaShamsoshoara added the bug, documentation, and enhancement labels on Jan 27, 2026
@meta-cla bot added the CLA Signed label on Jan 27, 2026
@greptile-apps

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

This PR adds screenshot capture functionality to the OpenApp environment and fixes critical dependency issues.

Key Changes:

  • Screenshot Feature: The screenshot field in OpenAppObservation (which already existed in the model) is now populated with base64-encoded PNG screenshots after reset() and step() operations
  • Dependency Fixes: Resolved beartype version conflicts (<0.18 pin) and added missing fastmcp dependency to support MCP functionality
  • Implementation: Added _extract_screenshot() helper that handles multiple screenshot formats (numpy arrays from BrowserGym, bytes from Playwright) with robust error handling
  • Testing: Added --test-screenshots flag to example script for validation, with screenshots saved to examples/screenshot_output/
  • Documentation: Comprehensive README updates with usage examples and troubleshooting for the PYTHONNOUSERSITE=1 fix

Alignment with OpenEnv Principles:

  • ✅ Gymnasium API signatures unchanged - this populates an existing optional field
  • ✅ Type safety preserved - screenshot field was already typed as Optional[str] in the Pydantic model
  • ✅ No invariants violated - changes are purely additive and server-side
  • ✅ Rewards inside environment - no changes to reward computation
  • ✅ Client-server separation maintained - no client code imports server modules

Technical Quality:

  • Clean implementation with proper error handling and logging
  • No security concerns (no credentials exposed, proper isolation maintained)
  • Lint checks pass, no debug code in production paths
  • Well-documented with clear usage examples and troubleshooting guides

Confidence Score: 5/5

  • This PR is safe to merge with no risk - all changes are additive and well-tested
  • Perfect score (5/5) because: (1) Screenshot field already existed in the model, this PR just populates it; (2) All changes are additive with no breaking modifications; (3) Dependency fixes address real compatibility issues; (4) Comprehensive error handling prevents failures; (5) No invariants violated, lint passes, no security concerns; (6) Excellent documentation and testing infrastructure
  • No files require special attention - all implementations are clean and well-tested

Important Files Changed

Filename | Overview
envs/openapp_env/server/openapp_environment.py | Added screenshot capture functionality with robust error handling for numpy/bytes/string formats
envs/openapp_env/server/Dockerfile | Fixed beartype version conflict and added fastmcp dependency for MCP support
examples/openapp_example.py | Added --test-screenshots flag with comprehensive testing and output saving functionality

Sequence Diagram

sequenceDiagram
    participant Agent
    participant OpenAppEnv
    participant BrowserGym
    participant Playwright
    participant OpenApps

    Note over Agent,OpenApps: Screenshot Capture Flow

    Agent->>OpenAppEnv: reset()
    OpenAppEnv->>BrowserGym: reset()
    BrowserGym->>Playwright: navigate to OpenApps URL
    Playwright->>OpenApps: GET /
    OpenApps-->>Playwright: HTML response
    Playwright-->>BrowserGym: screenshot (numpy array)
    BrowserGym-->>OpenAppEnv: obs with screenshot
    OpenAppEnv->>OpenAppEnv: _extract_screenshot(obs)
    Note over OpenAppEnv: Convert numpy array → PIL Image → PNG → base64
    OpenAppEnv-->>Agent: OpenAppObservation with screenshot

    Agent->>OpenAppEnv: step(action)
    alt BrowserGym Action (bid-based)
        OpenAppEnv->>BrowserGym: step(action_string)
        BrowserGym->>Playwright: execute action
        Playwright->>OpenApps: interact with page
        OpenApps-->>Playwright: updated page
        Playwright-->>BrowserGym: screenshot (numpy array)
        BrowserGym-->>OpenAppEnv: obs with screenshot
        OpenAppEnv->>OpenAppEnv: _extract_screenshot(obs)
    else Playwright Direct Action (CSS selector)
        OpenAppEnv->>Playwright: click/fill(selector)
        Playwright->>OpenApps: interact with page
        OpenApps-->>Playwright: updated page
        OpenAppEnv->>Playwright: screenshot()
        Playwright-->>OpenAppEnv: screenshot (bytes)
        OpenAppEnv->>OpenAppEnv: base64 encode
    end
    OpenAppEnv-->>Agent: OpenAppObservation with screenshot

@AlirezaShamsoshoara changed the title from "Ali/bug fix/openapp env 01" to "[Enhancement] OpenApp ScreenShot Feature" on Jan 27, 2026
@AlirezaShamsoshoara changed the title from "[Enhancement] OpenApp ScreenShot Feature" to "[Enhancement] OpenApp ScreenShot feature" on Jan 27, 2026