Changes
New Features ✨
- feat: Add OPENAI_SESSION_BACKEND environment variable for session backend selection (#135)
- feat: implement LLM cost savings in CI (#127)
- feat: Add agent identity debug logging to expose coordination issues (#132)
- feat: add internal monologue and smaller steps (#113)
- feat: run evals weekly with specific dataset (#118)
- feat: find prior experiment run across datasets (#117)
- feat: filter evals by connector (#114)
- feat: add evals using arize phoenix (#91)
- feat: explicit locally scoped secrets directory (#95)
- feat: add proper secrets handling and ci-based execution workflows (#90)
- feat: add reporting tools and misc fix bugs (#88)
- feat: web search without playwright, split manager/developer model config (#85)
Bug Fixes 🐛
- fix: conversation id doesn't work with custom session ID values (#137)
- fix: Add validation for manifest streams structure to prevent AttributeError (#130)
- fix(validation): Auto-enable raw responses when zero records extracted (#128)
- fix(connector-builder-agents): Improve emoji detection in update_progress_log (#122)
- fix: add node to evals action (#111)
- fix: make evals work with standalone phoenix client package (#110)
- fix: use .secrets dir within cwd (not parent) (#96)
- fix: update static-args to treat PR number as optional in slash command dispatch (#94)
- fix: return raw responses when requested even with 0 records (#89)
Under the Hood ⚙️
- chore: Remove redundant openai-agents-mcp dependency (#134)
- chore: Upgrade openai-agents from 0.2.11 to 0.3.3 (#133)
- ci(deps): bump pypa/gh-action-pypi-publish from 1.12.4 to 1.13.0 in the minor-and-patch group across 1 directory (#123)
- ci(deps): bump actions/checkout from 4 to 5 (#107)
- ci(deps): bump actions/setup-python from 5 to 6 (#106)
- ci(deps): bump actions/github-script from 7 to 8 (#108)
- chore(evals): restructure YAML to use input/expected top-level keys (#116)
- chore: refactor to remove global state (#92)
Documentation 📖
- docs: split evals runbook into working and non-working sections (#98)