[draft] Support more eval configs #1

Draft
jfoote wants to merge 1 commit into Swival:main from jfoote:foote/knobs
@jfoote jfoote commented May 6, 2026

This PR adds some features that I drafted to support evaluating updates to https://github.com/fastly/fastly-agent-toolkit. I figured I'd share them in case they might be useful upstream.

  • forward more skills-related CLI directives from calibra to swival
  • forward aws-profile similarly (for bedrock support)
  • expose trial information to verify.sh
    • via CALIBRA_<DIMENSION> env vars
  • support opt-in running of both verify.sh and reviewer (llm-as-judge) in a single trial
  • support per-task reviewer criteria

I'll remove draft when I think this is ready for review.

- forward more skills-related cli directives from calibra to swival
- forward aws-profile similarly (for bedrock support)
- expose trial information to verify.sh
    - via CALIBRA_<DIMENSION> env vars
- support running both verify.sh and reviewer (llm-as-judge) in a single
  trial
- support per-task reviewer criteria
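
A rough sketch of how trial information might be exposed to verify.sh through CALIBRA_<DIMENSION> environment variables. This is not the code from this PR: the helper name `build_verify_env`, the dimension names, and the upper-casing and hyphen-to-underscore rule are all assumptions made for illustration.

```python
import os


def build_verify_env(dimensions, base_env=None):
    """Return an environment mapping with one CALIBRA_<DIMENSION> entry per dimension.

    Hypothetical helper: the naming rule (upper-case, hyphens to underscores)
    is an assumption, not something confirmed by the PR.
    """
    # Copy so the caller's (or the process's) environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    for name, value in dimensions.items():
        key = "CALIBRA_" + name.upper().replace("-", "_")
        env[key] = str(value)
    return env
```

The resulting mapping could then be passed as the `env` argument of `subprocess.run` when invoking verify.sh, so the script can branch on values such as `$CALIBRA_MODEL` (a hypothetical dimension name).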
Comment thread calibra/runner.py
Comment on lines +463 to +468
if spec.variant.skills.skills_dirs:
    for skills_dir in spec.variant.skills.skills_dirs:
        argv.extend(["--skills-dir", str(skills_dir)])
else:
    argv.append("--no-skills")

Contributor


If this is already defined in [session], we will end up with duplicate flags.
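
One possible way to avoid the duplication, sketched under the assumption that any [session]-level options have already been appended to `argv` before this point (the diff excerpt does not show how that happens). The helper names `has_flag` and `append_skills_flags` are hypothetical, and skipping the variant-level flags is only one of several reasonable policies.

```python
def has_flag(argv, flag):
    """True if `flag` is present as `--flag value` or `--flag=value`."""
    return any(arg == flag or arg.startswith(flag + "=") for arg in argv)


def append_skills_flags(argv, skills_dirs):
    """Append skills flags unless a skills flag is already on the command line.

    Assumption: options coming from [session] are already in argv.
    """
    if has_flag(argv, "--skills-dir") or has_flag(argv, "--no-skills"):
        # Already configured elsewhere; do not emit a second set of flags.
        return argv
    if skills_dirs:
        for skills_dir in skills_dirs:
            argv.extend(["--skills-dir", str(skills_dir)])
    else:
        argv.append("--no-skills")
    return argv
```

Whether the session-level or the variant-level setting should win is a design decision for the PR; this sketch simply lets whatever is already in `argv` take precedence.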
