
[Prompt registry] Phase 2 — Port static-tier sections behind the snap…#3659

Merged: VascoSch92 merged 4 commits into main from vasco/static-part on Jun 15, 2026



VascoSch92 (Member) commented Jun 11, 2026

HUMAN:

This is phase 2 of #3606. In this PR, we only port the static part of the system prompt into the new structure.

  • A human has tested these changes.

AGENT:


Why

Part of the prompt-registry roadmap (#3606; proposal #2827), which replaces the monolithic system_prompt.j2 + post-render refine() with a typed section registry whose static/dynamic cache split is declared per-section and
unit-testable.

This PR is the static-tier half of Phase 2 (#3610): it ports the ~17 static blocks of system_prompt.j2 into typed PromptSection classes assembled by a default registry, so registry.build(ctx).static reproduces today's
prompt byte-for-byte. It is purely additive and lands behind the Phase 0 snapshot — nothing is wired into the runtime prompt yet (that is the Phase 3 cutover).
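To make the target shape concrete, here is a toy, self-contained sketch of a section registry. Only the names PromptSection, PromptContext, template_kwargs, guard, and build(ctx).static come from this PR; the class bodies, the tier attribute, BuiltPrompt, PromptRegistry, and build_toy_registry are hypothetical simplifications, not the SDK's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class PromptContext:
    """Toy stand-in for the SDK's PromptContext; only template_kwargs is modelled."""

    template_kwargs: dict[str, Any] = field(default_factory=dict)


class PromptSection:
    """Toy base class: one typed, Jinja-free block of the system prompt."""

    tier: str = "static"  # cache tier declared per section: "static" or "dynamic"

    def guard(self, ctx: PromptContext) -> bool:
        return True  # render unconditionally unless a subclass overrides this

    def render(self, ctx: PromptContext) -> str:
        raise NotImplementedError


class RoleSection(PromptSection):
    def render(self, ctx: PromptContext) -> str:
        return "<ROLE>\nYou are OpenHands agent.\n</ROLE>"


class SecuritySection(PromptSection):
    def guard(self, ctx: PromptContext) -> bool:
        # Same condition as the guard quoted later in the review thread.
        return bool(ctx.template_kwargs.get("security_policy_filename"))

    def render(self, ctx: PromptContext) -> str:
        return "<SECURITY>\n(policy text)\n</SECURITY>"


@dataclass(frozen=True)
class BuiltPrompt:
    static: str
    dynamic: str | None


class PromptRegistry:
    def __init__(self) -> None:
        self._sections: list[PromptSection] = []

    def register(self, section: PromptSection) -> PromptRegistry:
        self._sections.append(section)  # registration order == output order
        return self

    def build(self, ctx: PromptContext) -> BuiltPrompt:
        def join(tier: str) -> str:
            parts = [
                s.render(ctx)
                for s in self._sections
                if s.tier == tier and s.guard(ctx)
            ]
            return "\n\n".join(parts)  # exactly one blank line between sections

        return BuiltPrompt(static=join("static"), dynamic=join("dynamic") or None)


def build_toy_registry() -> PromptRegistry:
    return PromptRegistry().register(RoleSection()).register(SecuritySection())
```

Declaring the tier on each section is what lets a unit test check the static/dynamic split of a single section without rendering a whole template.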

Summary

  • Add 17 pure-Python PromptSection classes (no Jinja) in context/prompts/sections/static.py, ported verbatim from system_prompt.j2, plus build_default_registry() (context/prompts/default_registry.py) registering them in
    template order.
  • registry.build(ctx).static reproduces AgentBase.static_system_message byte-for-byte across the full Phase 0 matrix (model family × browser × security-analyzer × cli_mode) and the win32 cell; every section also has a
    standalone unit test (no Jinja environment).
  • Additive only: no change to system_prompt.j2, the snapshot oracle, or any runtime path. refine() (win32 shell-term swap) is applied only in the two blocks that actually contain shell terms.
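Scoping refine() to the blocks that contain shell terms can be sketched as follows. The swap table, the has_shell_terms flag, and render_static are assumptions for illustration; the PR does not list which terms refine() actually replaces.

```python
import re

# Hypothetical swap table; the real refine() term list is not given in this PR.
WIN32_SHELL_TERMS = {"bash": "powershell"}


def refine(text: str, platform: str) -> str:
    """Swap shell terms for Windows; return the text unchanged on other platforms."""
    if platform != "win32":
        return text
    for src, dst in WIN32_SHELL_TERMS.items():
        # \b word boundaries: "execute_bash" is left alone because "_" is a word character.
        text = re.sub(rf"\b{re.escape(src)}\b", dst, text)
    return text


def render_static(sections: list[tuple[str, bool]], platform: str) -> str:
    """Join (body, has_shell_terms) pairs; only flagged bodies pass through refine()."""
    return "\n\n".join(
        refine(body, platform) if has_shell_terms else body
        for body, has_shell_terms in sections
    )
```

Applying the swap per section rather than post-render over the whole prompt keeps every other block byte-identical across platforms.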

Issue Number

#3610 (roadmap #3606, proposal #2827)

How to Test

Library-only change with no runtime wiring yet (behind the snapshot), so the end-to-end evidence is the byte-for-byte equivalence test — it builds real Agent instances (not mocks), renders the real static_system_message,
and asserts the registry reproduces it:

uv run pytest tests/sdk/context/prompts/test_default_registry.py -q
# 34 passed — every Phase 0 matrix cell byte-for-byte + win32 + per-section unit tests

Regression (legacy render path untouched) + Phase 1:

uv run pytest tests/sdk/context/prompts/ tests/sdk/context/test_agent_context.py tests/sdk/agent/test_build_prompt_context.py -q
# 167 passed — includes the 48-cell snapshot oracle, unchanged

Full lint/type gate:

uv run pre-commit run --files \
  openhands-sdk/openhands/sdk/context/prompts/sections/__init__.py \
  openhands-sdk/openhands/sdk/context/prompts/sections/static.py \
  openhands-sdk/openhands/sdk/context/prompts/default_registry.py \
  tests/sdk/context/prompts/test_default_registry.py
# ruff format, ruff lint, pycodestyle, pyright, import rules, tool-registration — all pass

Video/Screenshots

N/A — no UI or runtime surface (the registry is not wired into the prompt yet). Evidence is the test output above; the equivalence test exercises real Agent/LLM objects, not mocks.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

  • Behind the snapshot: static_system_message is not yet routed through the registry — that is the Phase 3 cutover. This PR only proves equivalence.
  • Inter-section spacing: the registry joins sections with exactly one blank line, whereas the legacy template leaves 2–5 blank lines around guarded {% if %} blocks. The equivalence test normalizes only those </TAG>…3+ blanks…<TAG>
    boundaries (with one tag-anchored regex), so every section body is still asserted byte-for-byte. The literal whitespace shift lands at the Phase 3 cutover.
  • Lint: static.py carries a file-level # ruff: noqa: E501 for its verbatim long prompt lines; pyproject.toml is untouched.
  • Intentional edge-case divergences (outside the matrix): <SECURITY> is guarded on security_policy_filename (omitted when empty, vs. the template's empty tags); a custom security_policy_filename would resolve its content
    into the context (follow-up).
  • Follow-up: point 2 of [Prompt registry] Phase 2 — Port sections behind the snapshot #3610 — the dynamic-tier sections (DateTime/RepoContext/AvailableSkills/CustomSuffix/CustomSecrets) from system_message_suffix.j2.
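A minimal sketch of a tag-anchored gap normalizer, assuming "3+ blanks" means a run of three or more newlines between a closing tag and the next opening tag. The pattern and the canonicalize name are hypothetical; the test's actual regex is not shown here.

```python
import re

# Hypothetical pattern: a closing tag, a run of 3+ newlines, then an opening tag.
_TAG_GAP = re.compile(r"(</[A-Za-z_][A-Za-z0-9_]*>)\n{3,}(<[A-Za-z_][A-Za-z0-9_]*>)")


def canonicalize(prompt: str) -> str:
    """Collapse oversized </TAG> ... <TAG> gaps to exactly one blank line."""
    return _TAG_GAP.sub(r"\1\n\n\2", prompt)
```

Because the pattern is anchored on tags, newlines inside a section body are never touched. Applied to the legacy excerpt "</SECURITY>\n\n\n<SECURITY_RISK_ASSESSMENT>", it yields the registry's "</SECURITY>\n\n<SECURITY_RISK_ASSESSMENT>".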

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base image
java | amd64, arm64 | eclipse-temurin:17-jdk
python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim
golang | amd64, arm64 | golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:505e243-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-505e243-python \
  ghcr.io/openhands/agent-server:505e243-python

All tags pushed for this build

ghcr.io/openhands/agent-server:505e243-golang-amd64
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-golang-amd64
ghcr.io/openhands/agent-server:vasco-static-part-golang-amd64
ghcr.io/openhands/agent-server:505e243-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:505e243-golang-arm64
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-golang-arm64
ghcr.io/openhands/agent-server:vasco-static-part-golang-arm64
ghcr.io/openhands/agent-server:505e243-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:505e243-java-amd64
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-java-amd64
ghcr.io/openhands/agent-server:vasco-static-part-java-amd64
ghcr.io/openhands/agent-server:505e243-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:505e243-java-arm64
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-java-arm64
ghcr.io/openhands/agent-server:vasco-static-part-java-arm64
ghcr.io/openhands/agent-server:505e243-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:505e243-python-amd64
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-python-amd64
ghcr.io/openhands/agent-server:vasco-static-part-python-amd64
ghcr.io/openhands/agent-server:505e243-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:505e243-python-arm64
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-python-arm64
ghcr.io/openhands/agent-server:vasco-static-part-python-arm64
ghcr.io/openhands/agent-server:505e243-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:505e243-golang
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-golang
ghcr.io/openhands/agent-server:vasco-static-part-golang
ghcr.io/openhands/agent-server:505e243-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:505e243-java
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-java
ghcr.io/openhands/agent-server:vasco-static-part-java
ghcr.io/openhands/agent-server:505e243-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:505e243-python
ghcr.io/openhands/agent-server:505e243ea318470fb9f8f0b16d988a5cc8283bcb-python
ghcr.io/openhands/agent-server:vasco-static-part-python
ghcr.io/openhands/agent-server:505e243-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 505e243-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 505e243-python-amd64) are also available if needed

VascoSch92 requested a review from all-hands-bot on June 11, 2026 14:03
github-actions (bot, Contributor) commented Jun 11, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

github-actions (bot, Contributor) commented Jun 11, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

github-actions (bot, Contributor) commented Jun 11, 2026

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
TOTAL | 30648 | 8440 | 72% |
report-only-changed-files is enabled. No files were changed during this commit :)

all-hands-bot (Collaborator) left a comment

⚠️ QA Report: PASS WITH ISSUES

The new static prompt registry works as an additive SDK API and matches legacy section content after documented gap normalization, but literal byte-for-byte parity with Agent.static_system_message is not true.

Does this PR achieve its stated goal?

Partially. I exercised the SDK as a user would by importing build_default_registry(), constructing real Agent/LLM objects, building prompt contexts, and comparing the registry output to the existing static prompt path across default, analyzer, sandbox-tier, browser-enabled, and Windows-simulated scenarios. Those scenarios matched after the PR's documented inter-section gap normalization and produced no dynamic block, so the static-tier port is functionally usable behind the snapshot. However, exact equality was False for the real default prompt (15187 vs 15191 bytes), so the PR does not literally satisfy the repeated byte-for-byte parity claim.

Phase | Result
Environment Setup | make build completed successfully.
CI Status | 🟡 21 checks passing, 8 pending, 2 skipped at review time; no checks were rerun.
Functional Verification | ⚠️ Registry behavior verified with real SDK objects; exact byte parity issue found.
Functional Verification

Test 1: Establish base behavior without the PR registry

Step 1 — Reproduce / establish baseline (without the fix):
Ran git worktree add --detach /tmp/qa-pr3659-base origin/main and then executed the SDK from that base worktree:

base_has_default_registry= False
base_static_len= 15191
base_starts= You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.
base_has_role= True
base_has_browser= False
base_has_security_policy= True

This shows the base branch has the legacy Agent.static_system_message path but no openhands.sdk.context.prompts.default_registry entry point.

Step 2 — Apply the PR's changes:
Returned to PR head 3fd47bf95719a408dfc5132369a20d811c655da1 on vasco/static-part after bootstrapping with make build.

Step 3 — Re-run with the PR in place:
Executed real SDK agents and the new registry API:

CASE=default_gpt5
  registry_static_len= 15187
  legacy_static_len= 15191
  exact_equal= False
  canonical_equal= True
  dynamic_is_none= True
  has_security= True
  has_security_risk= True
  has_important= True
CASE=anthropic_with_analyzer
  registry_static_len= 14489
  legacy_static_len= 14493
  exact_equal= False
  canonical_equal= True
  dynamic_is_none= True
  has_security= True
  has_security_risk= True
  has_important= True
CASE=sandbox_risk_tiers
  registry_static_len= 15139
  legacy_static_len= 15143
  exact_equal= False
  canonical_equal= True
  dynamic_is_none= True
  has_security= True
  has_security_risk= True
  has_important= True

This shows the new API is usable and the static bodies match after gap normalization, but exact byte-for-byte parity with the live prompt path is not achieved.

Test 2: Browser, Windows, and edge-path behavior

Step 1 — Reproduce / establish baseline (without the fix):
On base, the new registry import was absent, so these sectionized paths could not be exercised there.

Step 2 — Apply the PR's changes:
Used the PR branch's new registry and section classes directly with SDK PromptContext/Agent objects.

Step 3 — Re-run with the PR in place:
Ran browser-enabled, Windows-simulated, and disabled-security-policy cases:

CASE=browser_enabled_agent
  ctx_enable_browser= True
  registry_has_browser= True
  legacy_has_browser= True
  canonical_equal= True
CASE=browser_guard_direct_context
  disabled= False
  enabled= True
CASE=windows_agent
  ctx_platform= windows
  registry_contains_powershell= True
  registry_contains_execute_powershell= False
  canonical_equal= True
CASE=security_policy_disabled
  registry_static_len= 13674
  legacy_static_len= 13703
  exact_equal= False
  canonical_equal= False
  dynamic_is_none= True
  has_security= False
  has_security_risk= True
  has_important= True

This confirms browser and Windows paths behave as expected after normalization. The disabled security policy divergence matches the PR notes and is not runtime-wired yet, but it reinforces that the new registry is not a literal byte-for-byte replacement for all legacy static prompt outputs.

Exact mismatch evidence

Ran a compact diff summary on the default PR prompt:

exact_equal= False
registry_len= 15187 legacy_len= 15191 delta= 4
first_diff_index= 9476
registry_excerpt= 'ser has explicitly requested and would expect\n\n</SECURITY>\n\n<SECURITY_RISK_ASSESSMENT>\n# Security Risk Policy\nWhen using tools that support '
legacy_excerpt= 'ser has explicitly requested and would expect\n\n</SECURITY>\n\n\n<SECURITY_RISK_ASSESSMENT>\n# Security Risk Policy\nWhen using tools that support'

This pinpoints the observed difference as inter-section blank-line spacing.
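A small helper of the kind that could produce the first_diff_index and excerpt lines above. first_diff and diff_excerpt are hypothetical names; the QA script itself is not included in this thread.

```python
from __future__ import annotations


def first_diff(a: str, b: str) -> int | None:
    """Index of the first differing character, or None when the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the shared prefix: they differ only if one is longer.
    return None if len(a) == len(b) else min(len(a), len(b))


def diff_excerpt(a: str, b: str, width: int = 40) -> tuple[int, str, str] | None:
    """First differing index plus a window of each string around it."""
    i = first_diff(a, b)
    if i is None:
        return None
    start = max(0, i - width)
    return i, a[start : i + width], b[start : i + width]
```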

Issues Found

  • 🟡 Minor: The new registry is functionally usable and section bodies match after normalization, but registry.build(ctx).static is not literally byte-for-byte equal to Agent.static_system_message despite the docstring/PR wording implying that level of parity.

This review was created by an AI agent (OpenHands) on behalf of the user.

Review comment thread on openhands-sdk/openhands/sdk/context/prompts/default_registry.py
VascoSch92 marked this pull request as ready for review on June 12, 2026 12:27

all-hands-bot (Collaborator) left a comment

⚠️ QA Report: PASS WITH ISSUES

Verified the new prompt registry by importing it as SDK library code and building real Agent/LLM instances; nominal static-tier scenarios match legacy output, with one documented out-of-matrix divergence for disabled security policy.

Does this PR achieve its stated goal?

Yes, for the stated additive/static-tier goal. On the PR commit, build_default_registry().build(ctx).static matched canonicalized Agent.static_system_message for real GPT-5/browser/security, Anthropic sandbox/security, and simulated win32 shell-refinement scenarios, and the legacy Agent.static_system_message hashes stayed identical to main for the same Linux scenarios. The registry also remained all-static (dynamic_is_none=True). I found one edge divergence for security_policy_filename=""; the PR description already calls this out as outside the matrix/follow-up, so it is not blocking for the behind-snapshot phase but should be tracked before runtime cutover.

Phase | Result
Environment Setup | make build completed and installed the uv-managed workspace packages.
CI Status | 🟡 At review time: 22 checks successful, 8 in progress, and one unresolved-review-threads failure present.
Functional Verification | ⚠️ Nominal registry behavior verified with real SDK objects; one documented edge divergence observed.
Functional Verification

Test 1: Baseline on main — registry is absent, legacy prompt still renders

Step 1 — Establish baseline without the PR:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_prompt_registry_baseline.py:

DEFAULT_REGISTRY_IMPORT=unavailable (ModuleNotFoundError: No module named 'openhands.sdk.context.prompts.default_registry')
SCENARIO gpt5-cli-browser-security
  static_chars=15908 static_sha256_16=eb14bbdf59f0f3c0
  has_browser=True
  has_security=True
  has_security_risk=True
  has_important=True
SCENARIO anthropic-sandbox-security
  static_chars=14460 static_sha256_16=06f4ecdf0f669296
  has_browser=False
  has_security=True
  has_security_risk=True
  has_important=True
SCENARIO gemini-no-security-custom-soul
  static_chars=12797 static_sha256_16=10128628f125ef51
  has_browser=False
  has_security=True
  has_security_risk=True
  has_important=True
RESULT legacy Agent.static_system_message renders for all baseline scenarios

This shows the new user-facing library entry point did not exist on main, while the legacy Agent.static_system_message path rendered successfully for the exercised scenarios.
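The static_sha256_16 values look like truncated SHA-256 digests. Assuming they are the first 16 hex digits of the digest of the UTF-8 prompt (the QA script is not shown), a fingerprint helper and a hypothetical same_as_main check could be as small as this:

```python
import hashlib


def sha256_16(text: str) -> str:
    """First 16 hex digits of the SHA-256 of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def same_as_main(pr_prompt: str, main_fingerprint: str) -> bool:
    # A matching 64-bit prefix is strong, though not absolute, evidence of identical text.
    return sha256_16(pr_prompt) == main_fingerprint
```

Comparing short fingerprints taken on main and on the PR head is enough to show that the legacy render path is unchanged, without diffing full prompts.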

Step 2 — Apply the PR changes:
Checked out PR commit 505e243ea318470fb9f8f0b16d988a5cc8283bcb.

Step 3 — Re-run with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_prompt_registry_pr.py:

SCENARIO gpt5-cli-browser-security
  legacy_chars=15908 legacy_sha256_16=eb14bbdf59f0f3c0 legacy_same_as_main=True
  canonical_chars=15904 registry_chars=15904 static_matches_legacy=True
  dynamic_is_none=True
  has_browser=True
  has_security=True
  has_security_risk=True
  has_important=True
SCENARIO anthropic-sandbox-security
  legacy_chars=14460 legacy_sha256_16=06f4ecdf0f669296 legacy_same_as_main=True
  canonical_chars=14456 registry_chars=14456 static_matches_legacy=True
  dynamic_is_none=True
  has_browser=False
  has_security=True
  has_security_risk=True
  has_important=True
SCENARIO gemini-no-security-custom-soul
  legacy_chars=12797 legacy_sha256_16=10128628f125ef51 legacy_same_as_main=True
  canonical_chars=12793 registry_chars=12768 static_matches_legacy=False
  dynamic_is_none=True
  has_browser=False
  has_security=False
  has_security_risk=True
  has_important=True
  first_diff_at=7922 registry='_' legacy='>'
SCENARIO win32-gpt5-cli-security
  legacy_chars=15218 legacy_sha256_16=36fda475513147a3 legacy_same_as_main=False
  canonical_chars=15214 registry_chars=15214 static_matches_legacy=True
  dynamic_is_none=True
  has_browser=False
  has_security=True
  has_security_risk=True
  has_important=True
  win32_terms powershell=True bash=False
RESULT nominal registry scenarios match canonical legacy static output; documented empty-security-policy edge diverges

This shows the PR adds the registry entry point and that nominal registry output matches the legacy static prompt after the PR's documented inter-section gap canonicalization. The unchanged legacy_sha256_16 values for the Linux scenarios confirm the live legacy prompt path remains additive/unchanged. The win32 simulation confirms the shell-term refinement still matches legacy behavior for the exercised Windows path.

Issues Found

  • 🟡 Minor: security_policy_filename="" diverges from legacy static output (static_matches_legacy=False; registry omits <SECURITY> while legacy includes an empty wrapper). This is documented in the PR as outside the Phase 0 matrix/follow-up, so I do not consider it blocking for this behind-snapshot PR, but it should be resolved or explicitly carried into the Phase 3 cutover plan.

This review was created by an AI agent (OpenHands) on behalf of the user.

</SECURITY>"""

    def guard(self, ctx: PromptContext) -> bool:
        return bool(ctx.template_kwargs.get("security_policy_filename"))
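To illustrate why the registry drops <SECURITY> when security_policy_filename is empty, here is a toy contrast between an unconditional, template-style wrapper and the guarded section above. Ctx, legacy_security_block, and guarded_security_block are hypothetical, and the exact whitespace of the legacy template's empty tags is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Ctx:
    """Hypothetical stand-in for PromptContext; only template_kwargs is modelled."""

    template_kwargs: dict[str, Any] = field(default_factory=dict)


def legacy_security_block(policy_text: str) -> str:
    # Template-style: the wrapper is always emitted, even around empty policy text.
    return f"<SECURITY>\n{policy_text}\n</SECURITY>"


def guarded_security_block(ctx: Ctx, policy_text: str) -> str | None:
    # Registry-style: the same condition as guard() decides whether the section exists.
    if not ctx.template_kwargs.get("security_policy_filename"):
        return None  # section skipped entirely, so no <SECURITY> tags at all
    return f"<SECURITY>\n{policy_text}\n</SECURITY>"
```

Under the same assumptions, this appears consistent with the QA result first_diff_at=7922 registry='_' legacy='>': after the shared prefix <SECURITY, the registry continues with the next section's <SECURITY_RISK_ASSESSMENT> tag, while the legacy prompt closes the empty <SECURITY> tag.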

Collaborator


🟡 Suggestion: QA functional verification found an out-of-matrix parity divergence for Agent(security_policy_filename=""): legacy static_system_message still contains an empty <SECURITY> wrapper, while the registry output omits it (static_matches_legacy=False, has_security=False). The PR description notes this as intentional/follow-up, so it is not blocking for the current behind-snapshot phase, but please track it before the registry is wired into runtime to avoid a cutover behavior change for users who disable the default policy.

This comment was created by an AI agent (OpenHands) on behalf of the user.

Member


^^ I wonder why it would miss <SECURITY>?

enyst (Member) left a comment

Thank you!

VascoSch92 merged commit 42464c7 into main on Jun 15, 2026
42 of 43 checks passed
VascoSch92 deleted the vasco/static-part branch on June 15, 2026 08:28
VascoSch92 pushed a commit that referenced this pull request Jun 15, 2026
Resolve conflicts after #3659 (static-part) landed on main: the static-tier
files now exist on both sides. Keep the dynamic-tier additions (sections,
registry registration, equivalence tests) on top of the merged static tier.

Co-authored-by: openhands <openhands@all-hands.dev>

Labels: None yet
Projects: None yet
3 participants