
v0.3.0

Released by github-actions on 11 Oct 13:46 (commit 4e01db7)

Changes

  • feat: add eval summary report (#112)
  • feat: add action to run evals w/ phoenix (#109)

New Features ✨

  • feat: Add OPENAI_SESSION_BACKEND environment variable for session backend selection (#135)
  • feat: implement LLM cost savings in CI (#127)
  • feat: Add agent identity debug logging to expose coordination issues (#132)
  • feat: add internal monologue and smaller steps (#113)
  • feat: run evals weekly with specific dataset (#118)
  • feat: find prior experiment run across datasets (#117)
  • feat: filter evals by connector (#114)
  • feat: add evals using arize phoenix (#91)
  • feat: explicit locally scoped secrets directory (#95)
  • feat: add proper secrets handling and ci-based execution workflows (#90)
  • feat: add reporting tools and fix misc bugs (#88)
  • feat: web search without playwright, split manager/developer model config (#85)

Bug Fixes 🐛

  • fix: conversation id doesn't work with custom session ID values (#137)
  • fix: Add validation for manifest streams structure to prevent AttributeError (#130)
  • fix(validation): Auto-enable raw responses when zero records extracted (#128)
  • fix(connector-builder-agents): Improve emoji detection in update_progress_log (#122)
  • fix: add node to evals action (#111)
  • fix: make evals work with standalone phoenix client package (#110)
  • fix: use .secrets dir within cwd (not parent) (#96)
  • fix: update static-args to treat PR number as optional in slash command dispatch (#94)
  • fix: return raw responses when requested even with 0 records (#89)

Under the Hood ⚙️

  • chore: Remove redundant openai-agents-mcp dependency (#134)
  • chore: Upgrade openai-agents from 0.2.11 to 0.3.3 (#133)
  • ci(deps): bump pypa/gh-action-pypi-publish from 1.12.4 to 1.13.0 in the minor-and-patch group across 1 directory (#123)
  • ci(deps): bump actions/checkout from 4 to 5 (#107)
  • ci(deps): bump actions/setup-python from 5 to 6 (#106)
  • ci(deps): bump actions/github-script from 7 to 8 (#108)
  • chore(evals): restructure YAML to use input/expected top-level keys (#116)
  • chore: refactor to remove global state (#92)

Documentation 📖

  • docs: split evals runbook into working and non-working sections (#98)