
refactor(sdk): remove Jinja prompts superseded by the Python prompt registry#3796

Open
VascoSch92 wants to merge 8 commits into main from vasco/clean-jinja

Conversation

VascoSch92 commented Jun 18, 2026

Member

HUMAN:

Deleting old Jinja prompts that are no longer used.


AGENT:

End-to-end evidence: I instantiated real agents and rendered the full system prompt
(not just unit tests). Before deleting system_prompt.j2, I confirmed that the registry
reproduces its Jinja render byte-for-byte for the default and custom policies
(including the Windows shell refinement). Post-deletion render:

$ uv run python -c "<build agents, print static_system_message markers>"
default(claude): len=14504  <ROLE>=True  <SECURITY>=True  default-policy=True  <IMPORTANT>=True
gpt-5: model-specific block present = True
custom policy: CUSTOM_RULE present =True  default-policy leaked =False
disabled policy: any <SECURITY> tag = False

Suites: ruff is clean; the full agent/, context/, fncall-converter, and delegate/ suites
and the API-breakage signature test pass (1260 passed in the broad sweep).
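The marker lines above can be reproduced with a small helper that summarizes a rendered prompt. This is a hypothetical sketch: the probe script behind that output is not part of this page, so `prompt_markers`, `MARKERS`, and the regex are illustrative names; only the tag strings (<ROLE>, <SECURITY>, <IMPORTANT>) come from the output itself.

```python
import re

# Hypothetical marker table; the tag strings mirror the render output above.
MARKERS = {
    "role": "<ROLE>",
    "security": "<SECURITY>",
    "important": "<IMPORTANT>",
}

# An opening and closing <SECURITY> tag with only whitespace between them.
EMPTY_SECURITY_BLOCK = re.compile(r"<SECURITY>\s*</SECURITY>")


def prompt_markers(prompt: str) -> dict:
    """Summarize which tagged sections a rendered system prompt contains."""
    report = {name: tag in prompt for name, tag in MARKERS.items()}
    report["len"] = len(prompt)
    report["security_tag_count"] = prompt.count("<SECURITY>")
    report["empty_security_block"] = EMPTY_SECURITY_BLOCK.search(prompt) is not None
    return report


if __name__ == "__main__":
    # With the SDK installed, `sample` would be Agent(...).static_system_message.
    sample = "<ROLE>\nagent\n</ROLE>\n\n<SECURITY>\n\n</SECURITY>"
    print(prompt_markers(sample))
```

With the SDK installed, the same function can be pointed at the `static_system_message` string produced by the How to Test snippet.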

Why

The default system prompt is now assembled from the pure-Python section registry
(#3610 port), but the original Jinja templates were left behind — mostly unused or
duplicating the registry. Keeping both is a drift/maintenance hazard: two sources of
truth for the same prompt.

Summary

  • Delete 10 redundant .j2 templates: 6 dead (unreferenced {% include %} wrappers,
    Python-ported in-context examples, an orphaned delegate description) and 4
    model_specific/* (byte-for-byte duplicated in ModelSpecificSection).
  • Port the last Jinja-only behavior — the custom security-policy {% include %} — into
    SecuritySection (AgentBase._read_custom_security_policy), then delete
    system_prompt.j2 + self_documentation.j2 + security_risk_assessment.j2 +
    security_policy.j2; a custom security_policy_filename now routes through the registry.
  • Net −743/+80 across 20 files; render_template and system_prompt_planning.j2 are
    retained for the planning preset, custom user templates, and subclasses.

Issue Number

Relates to #3610 (prompt-registry port).

How to Test

uv run pytest tests/sdk/context/prompts/ \
  tests/sdk/context/test_prompt_model_spec.py \
  tests/sdk/context/test_prompt_absolute_path.py \
  tests/sdk/agent/test_system_prompt.py \
  tests/sdk/agent/test_security_policy_integration.py

Then render a prompt directly:

uv run python -c "
from openhands.sdk.agent import Agent
from openhands.sdk.llm import LLM
print('SECURITY' in Agent(llm=LLM(model='claude-sonnet-4-5', usage_id='t'), tools=[]).static_system_message)
"

Video/Screenshots

N/A — library/CLI change with no UI. Render output included in the AGENT section above.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

Minor behavior changes for custom security policies (default-policy agents are
unaffected — already on the registry): content is identical, but inter-section spacing
is normalized to single blank lines; a disabled policy (security_policy_filename="")
no longer emits empty <SECURITY></SECURITY> tags; and custom policy files are inserted
verbatim (no longer Jinja-rendered). The static "matches-legacy" test oracle had become
self-referential (default agents take the registry path), so it's removed along with
system_prompt.j2; test_prompt_snapshot.py (byte-exact registry snapshots) carries the
static regression guard, and the dynamic oracle against system_message_suffix.j2 stays.
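The two structural changes in these notes (single blank lines between sections, and no <SECURITY> wrapper for a disabled policy) can be illustrated with a toy assembler. This is a sketch under stated assumptions, not the SDK's registry: `Section`, `assemble`, `always`, and the `security_policy_text` context key are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Context = Mapping[str, object]


def always(ctx: Context) -> bool:
    return True


@dataclass(frozen=True)
class Section:
    """Hypothetical prompt section: `guard` decides inclusion, `render` makes text."""

    name: str
    render: Callable[[Context], str]
    guard: Callable[[Context], bool] = always


def assemble(sections: list, ctx: Context) -> str:
    parts = []
    for section in sections:
        if not section.guard(ctx):
            continue  # a guarded-out section leaves no trace, not even empty tags
        text = section.render(ctx).strip()
        if text:
            parts.append(text)
    # Exactly one blank line between sections, whatever each section returned.
    return "\n\n".join(parts)


def render_security(ctx: Context) -> str:
    # Custom policy text is inserted verbatim; "{{ ... }}" is not evaluated.
    return f"<SECURITY>\n{ctx['security_policy_text']}\n</SECURITY>"


SECTIONS = [
    Section("role", lambda ctx: "<ROLE>\nYou are a helpful agent.\n</ROLE>\n\n\n"),
    Section(
        "security",
        render_security,
        # security_policy_filename="" disables the policy: drop the whole section.
        guard=lambda ctx: ctx.get("security_policy_filename") != "",
    ),
]
```

The trailing newlines in the role section are deliberate: `strip()` plus the `"\n\n"` join is what normalizes spacing regardless of how each section ends.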


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                        Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                            Link
python    amd64, arm64    nikolaik/python-nodejs:python3.13-nodejs22-slim   Link
golang    amd64, arm64    golang:1.21-bookworm                              Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:73475ac-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-73475ac-python \
  ghcr.io/openhands/agent-server:73475ac-python

All tags pushed for this build

ghcr.io/openhands/agent-server:73475ac-golang-amd64
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-golang-amd64
ghcr.io/openhands/agent-server:vasco-clean-jinja-golang-amd64
ghcr.io/openhands/agent-server:73475ac-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:73475ac-golang-arm64
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-golang-arm64
ghcr.io/openhands/agent-server:vasco-clean-jinja-golang-arm64
ghcr.io/openhands/agent-server:73475ac-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:73475ac-java-amd64
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-java-amd64
ghcr.io/openhands/agent-server:vasco-clean-jinja-java-amd64
ghcr.io/openhands/agent-server:73475ac-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:73475ac-java-arm64
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-java-arm64
ghcr.io/openhands/agent-server:vasco-clean-jinja-java-arm64
ghcr.io/openhands/agent-server:73475ac-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:73475ac-python-amd64
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-python-amd64
ghcr.io/openhands/agent-server:vasco-clean-jinja-python-amd64
ghcr.io/openhands/agent-server:73475ac-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:73475ac-python-arm64
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-python-arm64
ghcr.io/openhands/agent-server:vasco-clean-jinja-python-arm64
ghcr.io/openhands/agent-server:73475ac-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:73475ac-golang
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-golang
ghcr.io/openhands/agent-server:vasco-clean-jinja-golang
ghcr.io/openhands/agent-server:73475ac-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:73475ac-java
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-java
ghcr.io/openhands/agent-server:vasco-clean-jinja-java
ghcr.io/openhands/agent-server:73475ac-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:73475ac-python
ghcr.io/openhands/agent-server:73475ac47cf212fe2c02395604a2bb8628c5a02e-python
ghcr.io/openhands/agent-server:vasco-clean-jinja-python
ghcr.io/openhands/agent-server:73475ac-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 73475ac-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 73475ac-python-amd64) are also available if needed
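The tag list follows a regular pattern. The sketch below regenerates it from the commit SHA, the branch name, and the base images; the naming rules are inferred from the 36 tags listed above rather than taken from the repository's build scripts, and `image_tags` is a hypothetical helper.

```python
REGISTRY = "ghcr.io/openhands/agent-server"

BASE_IMAGES = {
    "java": "eclipse-temurin:17-jdk",
    "python": "nikolaik/python-nodejs:python3.13-nodejs22-slim",
    "golang": "golang:1.21-bookworm",
}


def image_tags(full_sha: str, branch: str, base_images: dict, archs=("amd64", "arm64")) -> list:
    short_sha = full_sha[:7]
    branch_slug = branch.replace("/", "-")  # vasco/clean-jinja -> vasco-clean-jinja
    tags = []
    for variant, base in base_images.items():
        # Inferred escaping: "/" -> "_s_" and ":" -> "_tag_".
        base_slug = base.replace("/", "_s_").replace(":", "_tag_")
        # Per-architecture suffixes first, then the multi-arch manifest (no suffix).
        for suffix in [f"-{arch}" for arch in archs] + [""]:
            for name in (
                f"{short_sha}-{variant}",
                f"{full_sha}-{variant}",
                f"{branch_slug}-{variant}",
                f"{short_sha}-{base_slug}",
            ):
                tags.append(f"{REGISTRY}:{name}{suffix}")
    return tags


if __name__ == "__main__":
    for tag in image_tags("73475ac47cf212fe2c02395604a2bb8628c5a02e", "vasco/clean-jinja", BASE_IMAGES):
        print(tag)
```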


github-actions Bot commented Jun 18, 2026

Contributor

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log


github-actions Bot commented Jun 18, 2026

Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log


github-actions Bot commented Jun 18, 2026

Contributor

Coverage

Coverage Report
File                                  Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/agent
   base.py                              354     37     89%   63, 67, 76, 79, 82, 87, 129–130, 269, 300, 326, 330, 334–335, 356, 452, 632, 679–681, 708, 718, 726–727, 769, 771–772, 844–846, 848–850, 886–887, 897–898
TOTAL                                 32462   9020     72%

VascoSch92 requested a review from all-hands-bot June 18, 2026 16:07

all-hands-bot commented Jun 18, 2026

Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot left a comment

Collaborator


⚠️ QA Report: PASS WITH ISSUES

I verified the SDK prompt-rendering paths as a real user would; the main refactor works for default, disabled, normal custom-policy, custom-template, and planning-template usage, but an empty custom policy file now leaks the built-in default policy.

Does this PR achieve its stated goal?

Mostly yes. The PR set out to remove redundant built-in Jinja prompts while keeping the Python prompt registry behavior equivalent and routing custom security policies through it; running Agent(...).static_system_message on base vs. the PR showed identical default Claude/GPT-5 prompt hashes and working normal custom/disabled policy behavior. However, a zero-byte custom security_policy_filename regresses: base rendered no default policy for that custom file, while the PR falls back to the built-in security policy.

Phase results:
  • Environment Setup: make build completed successfully.
  • CI Status: 🟡 No failures observed; many checks pass, with coverage/QA/build jobs still pending when checked.
  • Functional Verification: ⚠️ Core paths pass; empty custom policy file regression found.
Functional Verification

Test 1: Default prompt parity and normal custom-policy behavior

Step 1 — Establish baseline on origin/main:
Ran git checkout --quiet origin/main && uv run python /tmp/qa_prompt_check.py:

default_claude: len=14504 sha=c9d7d3f665687ed7 security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=False claude_specific=True
gpt5: len=15202 sha=0ecf2532b0abacfc security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=True claude_specific=False
disabled_policy: security_tag=True default_policy=False len=13020 sha=6ac4a8b90cd27349
custom_policy: custom_rule=True jinja_variable_literal=False jinja_variable_rendered=False default_policy_leaked=False security_tag_count=1 len=13062 sha=adb80e715e640b47
custom_template: CUSTOM_TEMPLATE model_family=openai_gpt policy=security_policy.j2
planning_template: len=5249 sha=9dfe768574be2daf planning_marker=True

This establishes the old user-facing behavior: default prompts include the expected sections/model-specific instructions; custom policy content replaces the default policy; custom system templates and the planning template render through Jinja.

Step 2 — Apply the PR's changes:
Checked out vasco/clean-jinja at b140712e05c8d9c77b40c1f6116e14223ffd2da5.

Step 3 — Re-run with the PR in place:
Ran git checkout --quiet vasco/clean-jinja && uv run python /tmp/qa_prompt_check.py:

default_claude: len=14504 sha=c9d7d3f665687ed7 security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=False claude_specific=True
gpt5: len=15202 sha=0ecf2532b0abacfc security_tag=True default_policy=True self_doc=True important=True gpt5_inline_review=True claude_specific=False
disabled_policy: security_tag=False default_policy=False len=12991 sha=5b8bface75f397d6
custom_policy: custom_rule=True jinja_variable_literal=True jinja_variable_rendered=False default_policy_leaked=False security_tag_count=1 len=13073 sha=27cb578ec84d66b9
custom_template: CUSTOM_TEMPLATE model_family=openai_gpt policy=security_policy.j2
planning_template: len=5249 sha=9dfe768574be2daf planning_marker=True

This confirms the main refactor goal for normal usage: default Claude and GPT-5 prompts are byte-identical by hash, normal custom policy content is included without default-policy leakage, disabled policy no longer emits empty <SECURITY> tags, and retained Jinja escape hatches still render.

Test 2: Edge case — empty custom security policy file

Step 1 — Establish baseline on origin/main:
Ran git checkout --quiet origin/main && uv run python /tmp/qa_empty_policy_check.py:

empty_custom_policy: security_tag=True default_policy=False policy_path=/tmp/tmp9zhd0u84/empty_policy.j2 len=13022

This shows that, before the PR, choosing a custom policy file did not silently reintroduce the built-in policy even when the file was empty.

Step 2 — Apply the PR's changes:
Checked out vasco/clean-jinja at b140712e05c8d9c77b40c1f6116e14223ffd2da5.

Step 3 — Re-run with the PR in place:
Ran git checkout --quiet vasco/clean-jinja && uv run python /tmp/qa_empty_policy_check.py:

empty_custom_policy: security_tag=True default_policy=True policy_path=/tmp/tmp3c64fja7/empty_policy.j2 len=14504

This shows a regression: a user-supplied empty custom policy file is treated like no custom content and the built-in default policy appears instead.

Issues Found

  • 🟠 Issue: Empty custom security policy files now fall back to the built-in default security policy instead of honoring the selected custom file content.
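One plausible cause of this kind of regression, assumed here rather than confirmed from the diff, is a truthiness check that conflates "no custom file selected" (None) with "custom file with empty content" (""). The sketch contrasts the two checks; `resolve_policy`, `resolve_policy_buggy`, and `DEFAULT_POLICY` are hypothetical names.

```python
from typing import Optional

DEFAULT_POLICY = "# Security Risk Policy\n(built-in rules)"  # placeholder text


def resolve_policy_buggy(custom_text: Optional[str]) -> str:
    # Truthiness check: an empty custom file ("") looks like "no custom file".
    return custom_text if custom_text else DEFAULT_POLICY


def resolve_policy(custom_text: Optional[str]) -> str:
    # Only None means "no custom policy selected"; "" is honored as empty content.
    return DEFAULT_POLICY if custom_text is None else custom_text
```

Distinguishing None from "" at the read step keeps the zero-byte file from silently reintroducing the built-in policy.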

This review was created by an AI agent (OpenHands) on behalf of the user.

Comment thread openhands-sdk/openhands/sdk/context/prompts/sections/static.py Outdated

all-hands-bot left a comment

Collaborator


🟡 Taste Rating: Acceptable — the refactor removes a dead second prompt implementation, but the custom security-policy path now changes a public “template filename” behavior and has a small fallback bug.

[CRITICAL ISSUES]

  • None found that require a blocking review from this automation.

[IMPROVEMENT OPPORTUNITIES]

  • See the inline comments on custom security policy rendering. The data-flow simplification is good, but custom policy files should either preserve the documented Jinja-template contract or the breaking change should be made explicit with docs/deprecation and stronger tests.

[TESTING GAPS]

  • The PR description includes targeted prompt-render evidence and relevant test commands. I did not see eval-monitor evidence plus human confirmation; because this changes prompt assembly/agent behavior, the repo-specific review guide says not to approve yet and to defer to a human maintainer/eval validation.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM
    This removes legacy prompt templates and changes the custom security-policy path in agent prompt construction. The deletion itself is low-complexity and covered by snapshots/tests, but prompt assembly is eval-sensitive and can affect benchmark/agent behavior; custom-policy compatibility is the main user-facing risk.

VERDICT:
Worth merging after maintainer validation: The core cleanup direction is sound, but please address/acknowledge the custom policy compatibility concerns and have a human maintainer decide after the appropriate lightweight evals.

KEY INSIGHT:
The refactor is only safe if the Python registry remains a drop-in replacement not just for default prompts, but also for documented custom security-policy template behavior.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread openhands-sdk/openhands/sdk/agent/base.py
Comment thread openhands-sdk/openhands/sdk/context/prompts/sections/static.py Outdated
VascoSch92 requested a review from all-hands-bot June 19, 2026 14:01

Collaborator

🔍 Review in progress…

We are performing the review through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot left a comment

Collaborator


✅ QA Report: PASS

Prompt rendering works as described: default/model-specific prompts remain stable, custom security policies now route through the registry as raw text, and disabled policies no longer emit empty <SECURITY> tags.

Does this PR achieve its stated goal?

Yes. The PR set out to remove superseded Jinja prompt files while preserving the default Python-registry prompt path and moving custom security policies into that registry path. I exercised the SDK as a user would by constructing real Agent/LLM objects and rendering static_system_message on both origin/main and commit db2ada2e90078f0107613f37c24c9ea6f92fa8da; the default Claude and GPT-5 prompt markers/lengths stayed identical, while the documented custom-policy behavior changed from Jinja-rendered to verbatim raw content and disabled policies stopped producing empty security tags.

Phase results:
  • Environment Setup: uv run python created the project .venv, installed packages, and imported Agent/LLM successfully.
  • CI Status: 🟡 At review time: 21 checks passing, 8 pending, 1 skipped; no failing checks observed.
  • Functional Verification: ✅ Real SDK prompt rendering exercised before/after for default, GPT-5, disabled security policy, custom security policy, and retained planning-template rendering.
Functional Verification

Test 1: Default and model-specific static prompts remain stable

Step 1 — Establish baseline on origin/main:
Ran git checkout --quiet origin/main && uv run python - <<'PY' ... PY, where the script constructed real Agent/LLM objects for default Claude, GPT-5, disabled policy, and custom policy cases and printed prompt markers.

Observed baseline excerpt:

default len 14504
default.has_security_tag=True
default.has_default_policy=True
default.has_self_documentation=True
default.has_claude_important=True
gpt5 len 15202
gpt5.has_gpt5_inline_review_guidance=True

This establishes the user-visible static prompt content that the refactor needs to preserve for normal/default agents.

Step 2 — Apply the PR's changes:
Checked out db2ada2e90078f0107613f37c24c9ea6f92fa8da.

Step 3 — Re-run with the PR in place:
Ran the same SDK script and observed:

default len 14504
default.has_security_tag=True
default.has_default_policy=True
default.has_self_documentation=True
default.has_claude_important=True
gpt5 len 15202
gpt5.has_gpt5_inline_review_guidance=True

This confirms the default registry path still renders the expected security/self-documentation/model-specific sections after deleting the redundant Jinja files.

Test 2: Custom and disabled security policies behave as documented

Step 1 — Establish baseline on origin/main:
The same baseline script created a temporary custom policy file, custom_policy.j2, containing CUSTOM_RULE plus Jinja syntax, and rendered it through Agent.static_system_message.

Observed baseline excerpt:

disabled.has_security_tag=True
disabled.has_default_policy=False
disabled.has_empty_security_block=True
custom.has_security_tag=True
custom.has_default_policy=False
custom.has_custom_rule=True
custom.literal_jinja_kept=False
custom.rendered_jinja_seen=True

This shows the old behavior: disabled policies left an empty <SECURITY> wrapper, and custom policy files were rendered as Jinja.

Step 2 — Apply the PR's changes:
Checked out db2ada2e90078f0107613f37c24c9ea6f92fa8da.

Step 3 — Re-run with the PR in place:
Ran the same SDK script and observed:

disabled.has_security_tag=False
disabled.has_default_policy=False
disabled.has_empty_security_block=False
custom.has_security_tag=True
custom.has_default_policy=False
custom.has_custom_rule=True
custom.literal_jinja_kept=True

This confirms the intended behavior change: custom policy content is inserted verbatim via the registry, the default policy does not leak into custom prompts, and disabling the policy removes the security block rather than emitting an empty one.

Test 3: Retained planning Jinja render path still executes

Step 1 — Apply the PR's changes:
Used the PR commit db2ada2e90078f0107613f37c24c9ea6f92fa8da.

Step 2 — Exercise the surviving render path:
Ran uv run python - <<'PY' ... render_template(agent.prompt_dir, 'system_prompt_planning.j2') ... PY.

Observed:

planning_rendered True
planning_len 5249

This confirms render_template can still render the retained built-in planning template after the deleted prompt files are removed.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

VascoSch92 requested a review from all-hands-bot June 19, 2026 14:56

Collaborator

🔍 Review in progress…

We are performing the review through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot left a comment

Collaborator


✅ QA Report: PASS

Verified the SDK prompt-rendering paths affected by this refactor with real Agent/LLM instantiation; behavior matches the PR’s stated goal and no functional issues were found.

Does this PR achieve its stated goal?

Yes. The PR set out to remove superseded built-in Jinja prompt templates while preserving default prompt behavior and routing custom security policies through the Python prompt registry. Running Agent(...).static_system_message on origin/main and on vasco/clean-jinja showed the default Claude/GPT-5 prompts still include the expected security, self-documentation, and model-specific content, while the custom security-policy behavior changed exactly as documented: the PR injects the custom policy file verbatim, avoids default-policy leakage, and no longer emits <SECURITY> tags when the policy is disabled. I also verified the retained Jinja escape hatch still renders the planning prompt and an absolute custom template.

Phase results:
  • Environment Setup: make build completed; uv dependencies were synced. No test suite, linter, formatter, or pre-commit run was executed.
  • CI Status: 🟡 33 passing, 3 skipped, 1 pending (qa-changes) at verification time.
  • Functional Verification: ✅ SDK prompt rendering and retained Jinja template paths exercised successfully.
Functional Verification

Test 1: Default prompt behavior and custom security-policy delta

Step 1 — Establish baseline without the PR:
Ran git switch --detach origin/main && uv run python /tmp/qa_prompt_probe.py using a script that imports Agent/LLM, instantiates real agents, and reads static_system_message:

default_claude: len=14504
default_claude: security_tag=True
default_claude: default_policy=True
default_claude: self_documentation=True
default_claude: gpt5_inline_thread_help=False
gpt5: len=15202
gpt5: security_tag=True
gpt5: default_policy=True
gpt5: self_documentation=True
gpt5: gpt5_inline_thread_help=True
custom: security_tag=True
custom: custom_rule=True
custom: default_policy_leaked=False
custom: literal_model_expr=False
custom: rendered_model_name=True
custom: false_branch_visible=False
custom: jinja_if_literal=False
disabled: security_tag=True
disabled: default_policy=False
status: baseline=0 pr=0

This confirms the pre-PR state: default prompts rendered expected sections, custom security policies were rendered as Jinja (rendered_model_name=True, literal Jinja markers absent), and disabling the policy still left an empty <SECURITY> block.

Step 2 — Apply the PR's changes:
Checked out vasco/clean-jinja at db2ada2e90078f0107613f37c24c9ea6f92fa8da.

Step 3 — Re-run with the PR in place:
Ran git switch vasco/clean-jinja && uv run python /tmp/qa_prompt_probe.py:

default_claude: len=14504
default_claude: security_tag=True
default_claude: default_policy=True
default_claude: self_documentation=True
default_claude: gpt5_inline_thread_help=False
gpt5: len=15202
gpt5: security_tag=True
gpt5: default_policy=True
gpt5: self_documentation=True
gpt5: gpt5_inline_thread_help=True
custom: security_tag=True
custom: custom_rule=True
custom: default_policy_leaked=False
custom: literal_model_expr=True
custom: rendered_model_name=False
custom: false_branch_visible=True
custom: jinja_if_literal=True
disabled: security_tag=False
disabled: default_policy=False
status: baseline=0 pr=0

This shows the default prompt path is preserved for the sampled Claude/GPT-5 models (same lengths and expected markers), while the changed custom-policy behavior matches the PR description: Jinja syntax remains literal, the built-in default policy does not leak, and a disabled security policy emits no <SECURITY> tag.

Test 2: Retained Jinja prompt escape hatch still works

Step 1 — Baseline expectation:
The PR description says render_template and system_prompt_planning.j2 are retained for the planning preset, custom user templates, and subclasses; this path should continue to be executable after the deletion.

Step 2 — Apply the PR's changes:
Using the same checked-out PR branch, I exercised Agent.static_system_message with system_prompt_filename="system_prompt_planning.j2" and then with an absolute custom Jinja template path.

Step 3 — Run the retained path:
Ran uv run python /tmp/qa_jinja_escape_probe.py:

planning: rendered=True
planning: role_tag=True
planning: security_tag=False
custom_template: CUSTOM TEMPLATE FOR test-model / cli=

This confirms the surviving built-in planning template renders successfully, and an absolute custom system-prompt template is still Jinja-rendered with SDK template variables.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

VascoSch92 requested a review from enyst June 19, 2026 15:11
VascoSch92 marked this pull request as ready for review June 19, 2026 15:11

all-hands-bot left a comment

Collaborator


⚠️ QA Report: PASS WITH ISSUES

Functional SDK prompt rendering works as claimed; the only issue I observed is a non-functional PR description validator failure.

Does this PR achieve its stated goal?

Yes. The PR set out to remove redundant Jinja prompt templates while preserving default prompt behavior and routing custom security policies through the Python registry. I exercised the SDK by instantiating real Agent/LLM objects and rendering static_system_message: default prompt markers remained present, custom policies were inserted verbatim without leaking the default policy, disabled security policies emitted no <SECURITY> block, and retained Jinja escape hatches still rendered.

Phase results:
  • Environment Setup: make build completed and installed the uv workspace environment.
  • CI Status: ⚠️ Most checks are green, but Validate PR description fails with 3 template-section errors. I did not rerun tests.
  • Functional Verification: ✅ Before/after SDK execution confirmed the changed behavior.
Functional Verification

Test 1: Default prompt and custom security policy rendering

Step 1 — Establish baseline on origin/main (b0324013):
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY' ... PY with a script that instantiated Agent(llm=LLM(model="gpt-5"), tools=[]), rendered static_system_message, then repeated with a custom security_policy_filename containing Jinja syntax and with security_policy_filename="".

Output excerpt:

commit b0324013
default markers True True True
custom security block
<SECURITY>

# Custom Policy for gpt-5
-

</SECURITY>
disabled security block <SECURITY>

</SECURITY>

This shows the baseline default prompt worked, but custom policy files were Jinja-rendered and disabling the policy still left an empty <SECURITY> block.

Step 2 — Apply the PR changes:
Checked out vasco/clean-jinja at db2ada2e.

Step 3 — Re-run with the PR in place:
Ran the same SDK rendering flow.

Output excerpt:

commit db2ada2e
default markers True True True
custom security block
<SECURITY>

# Custom Policy for {{ model_name }}
- {% if cli_mode %}CLI_TRUE{% endif %}


</SECURITY>
default policy leaked False
disabled security block <NO SECURITY BLOCK>

This confirms the PR preserves default prompt content (<ROLE>, default security policy, and GPT-5 model-specific preamble all present), inserts custom security policy files verbatim, avoids default-policy leakage, and removes the disabled-policy empty tag block.

Test 2: Retained Jinja escape hatches still render

Step 1 — Baseline:
On origin/main, the same SDK script rendered a temporary custom system prompt and the retained planning template:

custom_system_prompt:  CUSTOM TEMPLATE: gpt-5 / security_policy.j2
planning_template: len= 5249 has_PLANNING= True

This establishes that user-supplied system prompt templates and system_prompt_planning.j2 were valid entry points before the refactor.

Step 2 — PR branch:
On db2ada2e, I re-ran equivalent user-facing rendering:

custom system prompt CUSTOM TEMPLATE: gpt-5 / security_policy.j2
planning template True True

This confirms the retained Jinja pathways still work after deleting the obsolete built-in prompt templates.

CI note

I checked the failing validator log with gh run view 27833734611 --repo OpenHands/software-agent-sdk --job 82376587822 --log-failed.

Output excerpt:

##[error]Keep the `## Why` section from the PR template.
##[error]Keep the `## Summary` section from the PR template.
##[error]Keep the `## How to Test` section from the PR template.
PR description validation failed with 3 error(s).

I did not edit the PR description.

Issues Found

  • 🟡 PR hygiene / CI: Validate PR description is failing because the PR template section headings ## Why, ## Summary, and ## How to Test are not present in the expected form. No functional SDK issue was found in the exercised behavior.
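The three logged errors suggest that the validator looks for literal `## Why`, `## Summary`, and `## How to Test` headings. A minimal sketch of such a check follows; it is a hypothetical reimplementation, not the repository's validator, and `missing_headings` is an illustrative name.

```python
import re

REQUIRED_HEADINGS = ("Why", "Summary", "How to Test")

# A level-2 markdown heading on its own line, e.g. "## How to Test".
HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_headings(body: str) -> list:
    """Return required '## <name>' headings that are absent from a PR body."""
    present = {match.group(1) for match in HEADING.finditer(body)}
    return [name for name in REQUIRED_HEADINGS if name not in present]
```

Under this sketch, plain-text section titles such as "Why" without the leading `## ` would all be reported missing, which matches the shape of the three errors in the log.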

This review was created by an AI agent (OpenHands) on behalf of the user.

The default prompt is assembled from the typed section registry, which also
resolves a custom ``security_policy_filename``. Escape hatches keep the Jinja
path: an inline ``system_prompt`` is returned verbatim; a custom
``system_prompt_filename`` or subclass ``prompt_dir`` renders its own template.
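The dispatch this docstring describes can be summarized as three paths. The sketch below is hypothetical: the function and parameter names are illustrative, the precedence (inline prompt first, then a custom template, then the registry) is assumed from the order in which the docstring lists them, and the rendering and registry steps are passed in as callables rather than taken from the SDK.

```python
from typing import Callable, Optional


def build_system_prompt(
    inline_prompt: Optional[str],
    custom_template: Optional[str],
    render_template: Callable[[str], str],
    assemble_registry: Callable[[], str],
) -> str:
    """Hypothetical dispatch mirroring the docstring's three paths."""
    if inline_prompt is not None:
        # Escape hatch 1: an inline system_prompt is returned verbatim.
        return inline_prompt
    if custom_template is not None:
        # Escape hatch 2: a custom system_prompt_filename (or a template from a
        # subclass prompt_dir) still goes through Jinja rendering.
        return render_template(custom_template)
    # Default: the typed section registry, which also resolves a custom
    # security_policy_filename.
    return assemble_registry()
```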

Member


Do we still support a .j2 file if it's custom?

Member Author


Yes

"verbatim (NOT rendered as a Jinja template). Can be either:\n"
"- A relative filename (e.g., 'security_policy.j2') loaded from the "
"agent's prompts directory\n"
"- An absolute path (e.g., '/path/to/custom_security_policy.j2')\n"

Member


I'm a bit confused here: the comment says it's not rendered as Jinja, but then it goes on to say it can be Jinja? Sorry, I jumped directly into the code; maybe it was explained somewhere?

Member Author


Sorry, I didn't see this comment; the docstring is misleading, and I'll delete it.

What I mean is: if you pass a custom Jinja template, it is rendered as Jinja (with the render_template method); if you don't, we use the built-in prompts, which are inserted verbatim.
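The path rules in the docstring quoted earlier in this thread (a relative filename resolves against the agent's prompts directory, an absolute path is used as-is, and the content is read verbatim) can be sketched as follows. `read_policy_verbatim` is a hypothetical helper, not the SDK's `AgentBase._read_custom_security_policy`, whose body is not shown here.

```python
from pathlib import Path


def read_policy_verbatim(filename: str, prompt_dir: Path) -> str:
    """Load a custom security policy exactly as written (no template rendering)."""
    path = Path(filename)
    if not path.is_absolute():
        # Relative names are resolved against the agent's prompts directory.
        path = prompt_dir / path
    # Returned as-is: any "{{ ... }}" or "{% ... %}" stays literal text.
    return path.read_text(encoding="utf-8")
```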

# Security Risk Policy
When using tools that support the security_risk parameter, assess the safety risk of your actions:

{% if cli_mode | default(true) %}

Member


This is one reason why I think maybe we need Jinja. I know we first made them Python, but that approach is not as easily customizable as this... I wonder if the next step could be, on the contrary, to put more in Jinja rather than less 🤔

Member Author


You're right that a template can be more customizable than Python, but I checked, and the prompts never actually used that flexibility: there is no {% block %}/{% extends %} anywhere, and system_prompt.j2 had no inline logic beyond {{ soul_content }} and include guards.

So every hook the old Jinja exposed is preserved on the registry, just relocated from {% if %}/{% include %} into .guard()/dict lookups.

The only thing genuinely removed is Jinja inside a custom policy file (now verbatim), which was deliberate.

The "override one block without forking" story you're describing would be a new capability; neither the old templates nor the registry expose it today.

If that's what we want, I think the right move is to design a real section-override API (works for both), rather than keep the duplicate templates that only gave us that power in theory.

Moreover, you can always pass a custom template and a kwargs argument for it.
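To make "relocated into .guard()" concrete, here is one way the `{% if cli_mode | default(true) %}` test from the deleted template hunk above could be expressed as a Python guard. The names `cli_mode_guard` and `security_risk_intro` are hypothetical, and the CLI-mode line is a placeholder because the body of that Jinja branch is not shown.

```python
from typing import Mapping


def cli_mode_guard(ctx: Mapping[str, object]) -> bool:
    # Jinja's `cli_mode | default(true)` substitutes true only when the variable
    # is undefined; an explicit None or False stays falsy. dict.get matches that.
    return bool(ctx.get("cli_mode", True))


def security_risk_intro(ctx: Mapping[str, object]) -> str:
    lines = [
        "# Security Risk Policy",
        "When using tools that support the security_risk parameter, "
        "assess the safety risk of your actions:",
    ]
    if cli_mode_guard(ctx):
        lines.append("(CLI-mode guidance; placeholder text)")
    return "\n".join(lines)
```

Using `dict.get` with a default, rather than `ctx.get("cli_mode") or True`, keeps an explicit `cli_mode=False` falsy, as the Jinja filter does.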


3 participants