
fix(codex): preserve UTF-8 in stop summaries #564

Open
valbarko wants to merge 1 commit into zilliztech:main from valbarko:codex/codex-stop-utf8-safe-truncation


valbarko commented Jun 1, 2026


Summary

  • replace byte-based truncation in the Codex stop hook with Unicode-safe character truncation
  • sanitize fallback summary output before appending to memory markdown
  • add a regression test that runs the stop worker with long Cyrillic text and reads the resulting memory file as UTF-8
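
To illustrate the first bullet, here is a minimal Python sketch of why byte-based truncation can corrupt UTF-8 while character-based truncation does not. It is not the PR's implementation: the body of `_truncate_chars` in `stop.sh` is not shown here, and the function names, sample string, and limit below are hypothetical.

```python
def truncate_bytes(text: str, limit: int) -> bytes:
    """Byte-oriented truncation: may cut a multi-byte sequence in half."""
    return text.encode("utf-8")[:limit]


def truncate_chars(text: str, limit: int) -> str:
    """Character-oriented truncation: slices on code point boundaries."""
    return text[:limit]


# Hypothetical sample; each Cyrillic letter occupies 2 bytes in UTF-8.
sample = "Привет, мир"

# Cutting at an odd byte offset leaves half of a 2-byte character behind.
cut = truncate_bytes(sample, 5)
try:
    cut.decode("utf-8")
    bytes_ok = True
except UnicodeDecodeError:
    bytes_ok = False

# Slicing the string itself always yields text that encodes and decodes cleanly.
safe = truncate_chars(sample, 5)
chars_ok = safe.encode("utf-8").decode("utf-8") == safe
```

Python string slicing works on code points, so it can still separate a base character from a combining mark, but the result is always valid UTF-8.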

Verification

  • bash -n plugins/codex/hooks/stop.sh
  • uv run pytest tests/test_codex_stop_hook_utf8.py tests/test_codex_parse_rollout.py
  • uv run ruff check tests/test_codex_stop_hook_utf8.py
  • git diff --check

_json_val "$work_input" "$key" ""
}

_truncate_chars() {


I hit this issue locally with the Codex hook as well. The byte-oriented truncation path can leave invalid UTF-8 in the markdown memory file, and a later run of memsearch index .memsearch/memory then fails with a UnicodeDecodeError when it reads that file.
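
The failure described above can be reproduced in isolation with a short Python sketch. It does not call memsearch; it only assumes that the file is read as strict UTF-8, which is consistent with the reported UnicodeDecodeError. The file name and helper function are hypothetical.

```python
import tempfile
from pathlib import Path


def is_valid_utf8_file(path: Path) -> bool:
    """Return True if the whole file decodes as strict UTF-8."""
    try:
        path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.md"  # hypothetical file name

    # Simulate a summary cut in the middle of a 2-byte Cyrillic character
    # by dropping the final byte of the encoded text.
    memory_file.write_bytes("## Summary\nПривет".encode("utf-8")[:-1])
    broken_is_valid = is_valid_utf8_file(memory_file)

    # The same text written without truncation reads back cleanly.
    memory_file.write_text("## Summary\nПривет", encoding="utf-8")
    intact_is_valid = is_valid_utf8_file(memory_file)
```

A check like this could be run over existing memory files to find any that were already damaged before the fix, though whether that is needed depends on the local setup.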

@googs1025


This PR looks like the right source-side fix for Codex:

  • character-based truncation avoids splitting multi-byte UTF-8 sequences
  • the regression test exercises the worker fallback path with non-ASCII text
