Commit d1bac82

Authored by dsblank (Douglas Blank) and claude
refactor: restructure into a multi-action package (#26)
Move Scout's flat top-level scripts into an installable `scout` package (src/scout/) and split the distributable GitHub Actions so the repo can host more than one. Each action is now a thin composite wrapper that pip-installs the package and runs a console-script entry point — no checkout/requirements coupling in consumer workflows.

- src/scout/: triage.py, feedback.py, agent.py, init.py, providers/
- evals/: run_eval.py (was scout_eval.py) + seeders/scenarios (dev-only)
- tests/: test_triage / test_feedback / test_init
- pyproject.toml: build-system, [project.scripts] (scout-triage / scout-feedback / scout-init), dev extras, pytest pythonpath
- action.yml: triage wrapper (root path preserved for existing consumers)
- actions/feedback/action.yml: new feedback-sync action
- remove scout-feedback.yml (belongs in the repo that runs Scout) and requirements*.txt (consolidated into pyproject)
- update READMEs, CI (pip install -e .[dev], mypy src/scout), .gitignore

Co-authored-by: Douglas Blank <doug@comet.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 3c978f3 commit d1bac82

27 files changed

Lines changed: 223 additions & 183 deletions

.github/workflows/lint.yml

Lines changed: 9 additions & 4 deletions
@@ -18,21 +18,26 @@ jobs:
           cache: pip
 
       - name: Install dev dependencies
-        run: pip install -r requirements-dev.txt
+        run: pip install -e ".[dev]"
 
       - name: Ruff (lint)
         run: ruff check .
 
       - name: Mypy (type check)
-        run: mypy scout.py scout_feedback.py init_scout.py
+        run: mypy src/scout
 
       - name: Validate action + workflow YAML
-        # actionlint (below) checks workflow files but not action.yml, so parse it here.
+        # actionlint (below) checks workflow files but not action.yml files, so parse them here.
         run: |
           python - <<'PY'
           import glob
           import yaml
-          for f in ["action.yml", *sorted(glob.glob(".github/workflows/*.yml"))]:
+          targets = [
+              "action.yml",
+              *sorted(glob.glob("actions/*/action.yml")),
+              *sorted(glob.glob(".github/workflows/*.yml")),
+          ]
+          for f in targets:
               with open(f) as fh:
                   yaml.safe_load(fh)
               print(f"ok: {f}")
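The validation step above builds its target list from three sources, in a fixed order. A minimal standalone sketch of just that collection logic follows; the helper name `collect_targets` and its `root` parameter are assumptions made for illustration and do not appear in the commit.

```python
import glob
import os


def collect_targets(root: str = ".") -> list[str]:
    """Return the YAML files to validate, as paths relative to root.

    Mirrors the order used in the workflow step: the root action.yml,
    then each actions/*/action.yml, then every workflow file.
    """
    # The root action.yml is listed unconditionally, as in the workflow,
    # so a missing file would surface as an error when it is opened.
    targets = ["action.yml"]
    for pattern in ("actions/*/action.yml", ".github/workflows/*.yml"):
        matches = sorted(glob.glob(os.path.join(root, pattern)))
        targets.extend(os.path.relpath(match, root) for match in matches)
    return targets
```

Sorting each glob separately keeps the output deterministic across filesystems, which makes the `ok: ...` log lines stable from run to run.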

.github/workflows/scout-feedback.yml

Lines changed: 0 additions & 64 deletions
This file was deleted.

.github/workflows/test.yml

Lines changed: 5 additions & 5 deletions
@@ -25,13 +25,13 @@ jobs:
       - name: Validate secrets
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-          OPIK_API_KEY: ${{ secrets.SCOUT_OPIK_API_KEY }}
-          OPIK_WORKSPACE: ${{ vars.OPIK_WORKSPACE }}
+          OPIK_API_KEY: ${{ secrets.OPIK_API_KEY }}
+          OPIK_WORKSPACE: ${{ secrets.OPIK_WORKSPACE }}
         run: |
           MISSING=""
           [ -z "$ANTHROPIC_API_KEY" ] && MISSING="$MISSING ANTHROPIC_API_KEY"
           # Opik is a hard requirement — Scout sources its prompt from Opik and traces there.
-          [ -z "$OPIK_API_KEY" ] && MISSING="$MISSING SCOUT_OPIK_API_KEY"
+          [ -z "$OPIK_API_KEY" ] && MISSING="$MISSING OPIK_API_KEY"
           [ -z "$OPIK_WORKSPACE" ] && MISSING="$MISSING OPIK_WORKSPACE"
           if [ -n "$MISSING" ]; then
             echo "::error::Missing required config:$MISSING"
@@ -51,6 +51,6 @@ jobs:
           SCOUT_ESCALATION_TAG: ${{ vars.SCOUT_ESCALATION_TAG || 'Escalated request' }}
           SCOUT_GITHUB_REPO_OWNER: ${{ vars.SCOUT_GITHUB_REPO_OWNER }}
           SCOUT_GITHUB_REPO_NAME: ${{ vars.SCOUT_GITHUB_REPO_NAME }}
-          OPIK_API_KEY: ${{ secrets.SCOUT_OPIK_API_KEY }}
-          OPIK_WORKSPACE: ${{ vars.OPIK_WORKSPACE }}
+          OPIK_API_KEY: ${{ secrets.OPIK_API_KEY }}
+          OPIK_WORKSPACE: ${{ secrets.OPIK_WORKSPACE }}
           ISSUE_NUMBER: ${{ github.event.inputs.issue_number }}
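The "Validate secrets" step above accumulates the names of unset settings and fails with one combined message. The same check can be sketched in Python for local use; the function name `missing_config` and the `REQUIRED` tuple are illustrative assumptions, with only the three variable names taken from the workflow.

```python
import os
from collections.abc import Mapping

# The three settings the workflow step treats as required.
REQUIRED = ("ANTHROPIC_API_KEY", "OPIK_API_KEY", "OPIK_WORKSPACE")


def missing_config(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required names that are unset or empty.

    An empty string counts as missing, matching the shell test `[ -z "$VAR" ]`.
    """
    return [name for name in REQUIRED if not env.get(name)]
```

Reporting every missing name at once, rather than stopping at the first, saves a round trip when several secrets are unset in a fresh repository.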

.github/workflows/unit-tests.yml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ jobs:
           cache: pip
 
       - name: Install dependencies
-        run: pip install -r requirements-dev.txt
+        run: pip install -e ".[dev]"
 
       - name: Run tests
         run: pytest

.gitignore

Lines changed: 6 additions & 1 deletion
@@ -3,4 +3,9 @@
 __pycache__/
 .pytest_cache/
 .mypy_cache/
-.ruff_cache/
+.ruff_cache/
+
+# Packaging artifacts (pip install -e . / python -m build)
+build/
+dist/
+*.egg-info/

README-OPIK-INTEGRATION.md

Lines changed: 21 additions & 21 deletions
@@ -15,26 +15,26 @@ The fix: an in-memory simulator that behaves like GitHub at the seams Scout actu
 ## Architecture
 
 ```
-scout.py scout_eval.py
+scout.triage evals/run_eval.py
 │ │
 ▼ ▼
 GitHubProvider GitHubSimulator ◄── implements ──┐
 (PyGithub) (in-memory) │
 
 RepositoryProvider
-(providers/base.py)
+(scout/providers/base.py)
 
 
 agent.run_agent
-(agent.py)
+(scout/agent.py)
 ```
 
 `agent.run_agent` is the single agent loop. It depends only on the `RepositoryProvider` protocol — not on PyGithub, not on any module globals. Both backends satisfy the same interface; swapping them changes nothing about the loop, the prompt, the model, or the tool dispatch.
 
 The protocol covers everything Scout's tools (and `main()`) need from "GitHub":
 
 ```python
-# providers/base.py
+# src/scout/providers/base.py
 class RepositoryProvider(Protocol):
     # read
     def get_issue_data(self, issue_number: int) -> dict: ...
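The README text in this hunk says the agent loop depends only on the `RepositoryProvider` protocol, so either backend can be swapped in. A small sketch of that structural-typing idea is below. Only `get_issue_data` comes from the source; `InMemoryProvider` and `summarize` are hypothetical names used for illustration and are not Scout's actual classes or functions.

```python
from typing import Protocol


class RepositoryProvider(Protocol):
    # The one method signature shown in the README excerpt.
    def get_issue_data(self, issue_number: int) -> dict: ...


class InMemoryProvider:
    """Hypothetical backend. It satisfies the protocol by shape alone,
    without inheriting from RepositoryProvider."""

    def __init__(self, issues: dict[int, dict]) -> None:
        self._issues = issues

    def get_issue_data(self, issue_number: int) -> dict:
        return self._issues[issue_number]


def summarize(provider: RepositoryProvider, issue_number: int) -> str:
    """Hypothetical caller that relies only on the protocol."""
    issue = provider.get_issue_data(issue_number)
    return f"#{issue_number}: {issue['title']}"
```

Because `Protocol` checks are structural, a type checker such as mypy accepts `InMemoryProvider` wherever a `RepositoryProvider` is expected, which is what lets a simulator stand in for the production backend without shared base classes.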
@@ -52,7 +52,7 @@ class RepositoryProvider(Protocol):
 
 ## The simulator
 
-`providers/simulator.py` defines `GitHubSimulator` — a real object with mutable state, not a passive fixture. Scenarios build it up with a fluent API; the agent reads and writes against it; assertions inspect both the output text and the resulting state.
+`src/scout/providers/simulator.py` defines `GitHubSimulator` — a real object with mutable state, not a passive fixture. Scenarios build it up with a fluent API; the agent reads and writes against it; assertions inspect both the output text and the resulting state.
 
 ```python
 sim = (
@@ -113,7 +113,7 @@ Example real-GitHub spec:
 
 ## Scenarios — bridging JSON to the simulator
 
-Opik dataset rows are JSON; simulator behavior is Python. `providers/scenarios.py` reconciles them with a small registry:
+Opik dataset rows are JSON; simulator behavior is Python. `src/scout/providers/scenarios.py` reconciles them with a small registry:
 
 ```python
 SCENARIO_BUILDERS: dict[str, Callable[[dict], GitHubSimulator]] = {}
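The hunk above shows only the registry's declaration. One way such a registry could work is sketched below, under stated assumptions: `Simulator` is a plain-dict stand-in for `GitHubSimulator`, and the `build` dispatcher, the registry key `"default"`, and the `search_fails` field are hypothetical. Only `SCENARIO_BUILDERS`, the `@register(...)` decorator form, the `_default` builder name, and the `"scenario"` key are mentioned in the README.

```python
from typing import Callable

# Hypothetical stand-in for GitHubSimulator: a plain dict of simulator state.
Simulator = dict
Builder = Callable[[dict], Simulator]

SCENARIO_BUILDERS: dict[str, Builder] = {}


def register(name: str) -> Callable[[Builder], Builder]:
    """Return a decorator that stores a builder under `name`."""
    def decorator(builder: Builder) -> Builder:
        SCENARIO_BUILDERS[name] = builder
        return builder
    return decorator


@register("default")
def _default(spec: dict) -> Simulator:
    """Baseline builder: copy the spec's files into fresh state."""
    return {"files": dict(spec.get("files", {})), "search_fails": False}


@register("search-rate-limited")
def _search_rate_limited(spec: dict) -> Simulator:
    sim = _default(spec)        # compose the baseline builder...
    sim["search_fails"] = True  # ...then mutate the result
    return sim


def build(spec: dict) -> Simulator:
    """Dispatch on the row's "scenario" key, falling back to "default"."""
    return SCENARIO_BUILDERS[spec.get("scenario", "default")](spec)
```

Keeping builders behind string keys means a JSON dataset row can select Python behavior by name, without the row itself carrying any code.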
@@ -203,7 +203,7 @@ See `evals/starter_scenarios.py` for five worked examples covering duplicate cit
 
 ## How the eval driver works
 
-`scout_eval.py` is the bridge between Opik's `run_tests` and the agent loop:
+`evals/run_eval.py` is the bridge between Opik's `run_tests` and the agent loop:
 
 ```python
 def task(item: dict) -> dict:
@@ -251,8 +251,8 @@ Assertions can reference any of these. Examples:
 
 Both providers route traces through Opik identically — tracing lives at the agent/tool/LLM layer, not the provider layer. What differs is the project name:
 
-- Production runs (`scout.py``GitHubProvider`) trace to `scout:<owner>/<repo>`.
-- Eval runs (`scout_eval.py``GitHubSimulator`) trace to `scout-eval` (override with `SCOUT_EVAL_OPIK_PROJECT`).
+- Production runs (`scout.triage``GitHubProvider`) trace to `scout:<owner>/<repo>`.
+- Eval runs (`evals/run_eval.py``GitHubSimulator`) trace to `scout-eval` (override with `SCOUT_EVAL_OPIK_PROJECT`).
 
 Different projects keep prod triage and eval experiments visually separate in the Opik UI. Eval runs are noisy — you may run a 5-item suite many times while iterating on the prompt — and you don't want that drowning out real triage traces.
 
@@ -276,34 +276,34 @@ ANTHROPIC_API_KEY=... OPIK_API_KEY=... OPIK_WORKSPACE=... \
 GITHUB_TOKEN=unused \
 SCOUT_GITHUB_REPO_OWNER=x SCOUT_GITHUB_REPO_NAME=y \
 SCOUT_EXPERIMENT_NAME=baseline-v1 \
-python scout_eval.py
+python -m evals.run_eval
 ```
 
-`GITHUB_TOKEN` must be set because `scout.py` validates it at import time. For all-simulated suites the value is unused — `unused` is fine. **For suites that include real-GitHub-mode scenarios** (specs with no `files` key), it must be a real token with read access to the target repo. `SCOUT_EXPERIMENT_NAME` is treated as a *prefix*: each run gets `{prefix}-YYYY-MM-DD-HH-MM-SS` appended, so re-running without changing the env var produces a fresh, chronologically sortable experiment in the Opik UI.
+`GITHUB_TOKEN` must be set because the triage module validates it at import time. For all-simulated suites the value is unused — `unused` is fine. **For suites that include real-GitHub-mode scenarios** (specs with no `files` key), it must be a real token with read access to the target repo. `SCOUT_EXPERIMENT_NAME` is treated as a *prefix*: each run gets `{prefix}-YYYY-MM-DD-HH-MM-SS` appended, so re-running without changing the env var produces a fresh, chronologically sortable experiment in the Opik UI.
 
 ## Adding a scenario
 
 1. **Open `evals/starter_scenarios.py`** and append a new item to `STARTER_SCENARIOS` following the shape above.
 2. **Write 3–6 specific assertions** that a judge can answer yes/no clearly. Reference the surfaced output keys (`output`, `final_labels`, `applied_labels`, `search_queries`) when behavior matters more than text.
-3. **If your scenario needs programmable behavior** (flaky tools, multi-call state changes), register a new builder with `@register("your-name")` in `providers/scenarios.py` and reference it via `"scenario": "your-name"`. The base `_default` builder is composable — call it inside your builder and then mutate the result.
-4. **Run `pytest test_scout.py -k Starter`** — three parametrized validation tests will check your scenario is structurally valid before you push.
+3. **If your scenario needs programmable behavior** (flaky tools, multi-call state changes), register a new builder with `@register("your-name")` in `src/scout/providers/scenarios.py` and reference it via `"scenario": "your-name"`. The base `_default` builder is composable — call it inside your builder and then mutate the result.
+4. **Run `pytest tests/test_triage.py -k Starter`** — three parametrized validation tests will check your scenario is structurally valid before you push.
 5. **Re-seed and re-run:**
 
 ```bash
 python -m evals.seed_test_suite
-python scout_eval.py
+python -m evals.run_eval
 ```
 
 ## Files
 
 | Path | Purpose |
 |---|---|
-| `providers/base.py` | `RepositoryProvider` protocol |
-| `providers/github.py` | Production backend (PyGithub) |
-| `providers/simulator.py` | In-memory simulator with fluent builders |
-| `providers/scenarios.py` | Scenario builder registry + default/search-rate-limited builders |
-| `agent.py` | Agent loop, tool definitions, `make_tools`, `make_client` |
-| `scout.py` | Production entry point (env parsing, prompt loading, `main()`) |
-| `scout_eval.py` | Opik Test Suite driver |
+| `src/scout/providers/base.py` | `RepositoryProvider` protocol |
+| `src/scout/providers/github.py` | Production backend (PyGithub) |
+| `src/scout/providers/simulator.py` | In-memory simulator with fluent builders |
+| `src/scout/providers/scenarios.py` | Scenario builder registry + default/search-rate-limited builders |
+| `src/scout/agent.py` | Agent loop, tool definitions, `make_tools`, `make_client` |
+| `src/scout/triage.py` | Production entry point (env parsing, prompt loading, `main()`) |
+| `evals/run_eval.py` | Opik Test Suite driver |
 | `evals/starter_scenarios.py` | Five worked starter scenarios |
 | `evals/seed_test_suite.py` | Idempotent suite seeder |
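The README text in this file's diff describes `SCOUT_EXPERIMENT_NAME` as a prefix that gets `{prefix}-YYYY-MM-DD-HH-MM-SS` appended on each run. A minimal sketch of that naming rule follows. The function name `experiment_name` is hypothetical, and the use of local time from `datetime.now()` is an assumption, since the source does not say which clock or time zone the driver uses.

```python
from datetime import datetime
from typing import Optional


def experiment_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Append a YYYY-MM-DD-HH-MM-SS stamp to the prefix.

    Zero-padded, most-significant-first fields make names that share a
    prefix sort chronologically as plain strings.
    """
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M-%S")
    return f"{prefix}-{stamp}"
```

For example, `experiment_name("baseline-v1", datetime(2025, 1, 2, 3, 4, 5))` yields `baseline-v1-2025-01-02-03-04-05`.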

README.md

Lines changed: 48 additions & 4 deletions
@@ -209,9 +209,47 @@ Anyone viewing an issue can rate Scout's triage comment by adding a 👍 or 👎
 - **1.0** = all 👍, **0.0** = all 👎, otherwise the ratio `👍 / (👍 + 👎)` (e.g. 3 👍 and 1 👎 → `0.75`). Reactions other than 👍/👎 are ignored.
 - The score carries a `reason` that attributes the votes by GitHub login, e.g. `👍 2 (alice, bob) / 👎 1 (carol) from GitHub`, so you can see *who* reacted in the Opik UI alongside the trace.
 
-**How it works.** GitHub fires no event when a reaction is added, so a scheduled workflow (`.github/workflows/scout-feedback.yml`, every 30 min) polls recent issues, reads the reaction counts on Scout's comments, and upserts the score. Each Scout comment carries a hidden marker (`<!-- scout-feedback trace_id=… -->`) that maps it back to its Opik trace. The sync is idempotent — re-running simply recomputes the score from current reactions — so feedback lands in Opik within one cron interval and self-corrects as votes change.
+**How it works.** GitHub fires no event when a reaction is added, so a scheduled workflow polls recent issues every 30 minutes, reads the reaction counts on Scout's comments, and upserts the score. Each Scout comment carries a hidden marker (`<!-- scout-feedback trace_id=… -->`) that maps it back to its Opik trace. The sync is idempotent — re-running simply recomputes the score from current reactions — so feedback lands in Opik within one cron interval and self-corrects as votes change.
 
-**Enabling it.** Add `.github/workflows/scout-feedback.yml` to the repo that runs Scout (it reuses the same `SCOUT_OPIK_API_KEY` secret and `OPIK_WORKSPACE` / `SCOUT_GITHUB_REPO_OWNER` / `SCOUT_GITHUB_REPO_NAME` variables). Trigger it manually from the Actions tab for an immediate sync.
+**Enabling it.** The feedback sync ships as a second action published from this repo, `comet-ml/scout-repo-agent/actions/feedback`, alongside the triage action. Add a scheduled workflow to the repo that runs Scout — it reuses the same `OPIK_API_KEY` secret and `OPIK_WORKSPACE` value as the triage action:
+
+```yaml
+name: Scout Feedback Sync
+
+on:
+  schedule:
+    - cron: '*/30 * * * *' # every 30 minutes
+  workflow_dispatch:
+    inputs:
+      since_days:
+        description: How many days back to scan issues for reactions
+        required: false
+        default: '7'
+        type: string
+
+concurrency:
+  group: scout-feedback-sync
+  cancel-in-progress: false
+
+jobs:
+  sync-feedback:
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    permissions:
+      issues: read
+      contents: read
+    steps:
+      - name: Sync reactions to Opik
+        uses: comet-ml/scout-repo-agent/actions/feedback@main
+        with:
+          github_token: ${{ github.token }}
+          since_days: ${{ github.event.inputs.since_days || '7' }}
+        env:
+          OPIK_API_KEY: ${{ secrets.OPIK_API_KEY }}
+          OPIK_WORKSPACE: ${{ vars.OPIK_WORKSPACE }}
+```
+
+Trigger it manually from the Actions tab for an immediate sync.
 
 > **Scan window.** GitHub does not bump an issue's `updated_at` when a reaction is added, so the sync only re-checks issues with other activity within `SCOUT_FEEDBACK_SINCE_DAYS` (default 7). Reactions on otherwise-quiet older issues may be missed — run the workflow manually with a larger `since_days` to backfill. Because the upsert is idempotent, re-syncing is always safe.
 
@@ -221,15 +259,21 @@ Use the manual trigger workflow in this repo's Actions tab (`Test Scout (Manual)
 
 ## Local development
 
+The code is an installable package under `src/scout/`. Install it (with dev extras) in editable mode:
+
 ```bash
-pip install -r requirements.txt
+pip install -e ".[dev]"
 
 # Copy and fill in the template
 cp .env.example .env
 
-python scout.py
+# Run triage locally (console script registered by the install).
+# Equivalent to `python -m scout.triage`.
+scout-triage
 ```
 
+Run the unit tests and linters with `pytest`, `ruff check .`, and `mypy src/scout`.
+
 `.env.example`:
 ```
 ANTHROPIC_API_KEY=
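The feedback section of this README defines the score as `👍 / (👍 + 👎)` with a `reason` string that names the voters. A hedged sketch of that arithmetic is below. The function name `feedback_score` is hypothetical, returning `None` when nobody has voted is an assumption (the README does not cover that case), and the reason format for a side with zero votes is likewise a guess modeled on the single example in the text.

```python
from typing import Optional


def feedback_score(up: list[str], down: list[str]) -> Optional[tuple[float, str]]:
    """Return (score, reason) from the GitHub logins behind each reaction.

    `up` and `down` hold the logins of 👍 and 👎 voters; other reactions
    are expected to be filtered out before this is called.
    """
    total = len(up) + len(down)
    if total == 0:
        return None  # assumption: no votes means no score to upsert

    def side(emoji: str, logins: list[str]) -> str:
        names = f" ({', '.join(logins)})" if logins else ""
        return f"{emoji} {len(logins)}{names}"

    reason = f"{side('👍', up)} / {side('👎', down)} from GitHub"
    return len(up) / total, reason
```

Because the score is recomputed from the current reaction lists every time, calling this repeatedly with the same inputs gives the same result, which is the property the README relies on when it calls the sync idempotent.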

action.yml

Lines changed: 6 additions & 9 deletions
@@ -26,19 +26,16 @@ runs:
       with:
         python-version: '3.12'
 
-    - name: Cache pip dependencies
-      uses: actions/cache@v4
-      with:
-        path: ~/.cache/pip
-        key: scout-pip-${{ hashFiles(format('{0}/requirements.txt', github.action_path)) }}
-
-    - name: Install dependencies
+    - name: Install Scout
       shell: bash
-      run: pip install -r ${{ github.action_path }}/requirements.txt
+      # ${{ github.action_path }} is this action's directory — the repo root for
+      # the root action. Installing the package pulls runtime deps and registers
+      # the scout-triage console script.
+      run: pip install "${{ github.action_path }}"
 
     - name: Run Scout
       shell: bash
-      run: python ${{ github.action_path }}/scout.py
+      run: scout-triage
       env:
         ANTHROPIC_API_KEY: ${{ inputs.anthropic_api_key }}
         GITHUB_TOKEN: ${{ inputs.github_token }}

actions/feedback/action.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: Scout Feedback Sync
+description: Sync 👍/👎 reactions on Scout's issue comments to Opik as human feedback scores
+author: Comet (comet.com)
+
+inputs:
+  github_token:
+    description: GitHub token with issues:read permission
+    required: true
+  since_days:
+    description: How many days back to scan issues for reactions
+    required: false
+    default: '7'
+
+runs:
+  using: composite
+  steps:
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: '3.12'
+
+    - name: Install Scout
+      shell: bash
+      # ${{ github.action_path }} is this action's directory (actions/feedback);
+      # ../.. is the repo root, where the installable package lives. Installing it
+      # pulls runtime deps and registers the scout-feedback console script.
+      run: pip install "${{ github.action_path }}/../.."
+
+    - name: Sync reactions to Opik
+      shell: bash
+      run: scout-feedback
+      env:
+        GITHUB_TOKEN: ${{ inputs.github_token }}
+        SCOUT_FEEDBACK_SINCE_DAYS: ${{ inputs.since_days }}
+        # OPIK_API_KEY / OPIK_WORKSPACE are read from the environment the caller
+        # sets on the `uses:` step (see README), mirroring the triage action.
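The action hands `since_days` to the console script through the `SCOUT_FEEDBACK_SINCE_DAYS` environment variable. How `scout-feedback` interprets it is not shown in this commit, so the following is only a sketch of one plausible reading: turn the value into a UTC cutoff timestamp for the issue scan. The names `scan_cutoff` and `DEFAULT_SINCE_DAYS`, and the choice to reject values below 1, are assumptions.

```python
import os
from collections.abc import Mapping
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_SINCE_DAYS = 7  # matches the action input's default of '7'


def scan_cutoff(
    env: Mapping[str, str] = os.environ,
    now: Optional[datetime] = None,
) -> datetime:
    """Return the earliest activity time the sync should consider."""
    # An unset or empty variable falls back to the default window.
    raw = env.get("SCOUT_FEEDBACK_SINCE_DAYS") or str(DEFAULT_SINCE_DAYS)
    days = int(raw)
    if days < 1:
        raise ValueError(f"SCOUT_FEEDBACK_SINCE_DAYS must be >= 1, got {days}")
    current = now or datetime.now(timezone.utc)
    return current - timedelta(days=days)
```

Passing `env` and `now` as parameters keeps the function testable without touching the real environment or clock.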
