Add autocompare tracing, UI model add, and CI tests
Implements OpenAI/Anthropic autocompare instrumentation, trace-first logging and dashboard views, monthly model presets, and secondary metrics callbacks. Adds a UI + model replay flow and a minimal CI workflow with pytest + Playwright coverage.
Made-with: Cursor
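
The instrumentation pattern behind "autocompare" is roughly: send the same input to both providers, time each call, run any secondary metric callbacks, and log a trace-first record. A minimal sketch, assuming a local JSONL trace store and placeholder model names (the `log_trace` helper, the trace schema, and the `length_ratio` callback are illustrative, not the repo's actual API):

```python
# Minimal sketch of the autocompare pattern, not the repo's implementation.
# Assumptions: traces are appended to a local JSONL file, model names are
# placeholders, and `length_ratio` stands in for a secondary metric callback.
import json
import time

import anthropic
from openai import OpenAI

openai_client = OpenAI()                  # reads OPENAI_API_KEY
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY


def log_trace(record: dict, path: str = "traces.jsonl") -> None:
    """Append one trace record as a JSON line (stand-in for the real trace store)."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


def length_ratio(a: str, b: str) -> float:
    """Illustrative secondary metric callback: relative output length."""
    return len(a) / max(len(b), 1)


def autocompare(prompt: str) -> dict:
    record = {"input": prompt, "outputs": {}}

    start = time.perf_counter()
    oa = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    oa_text = oa.choices[0].message.content or ""
    record["outputs"]["openai"] = {
        "text": oa_text,
        "latency_s": round(time.perf_counter() - start, 3),
    }

    start = time.perf_counter()
    an = anthropic_client.messages.create(
        model="claude-3-5-haiku-latest",  # placeholder model name
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    an_text = an.content[0].text
    record["outputs"]["anthropic"] = {
        "text": an_text,
        "latency_s": round(time.perf_counter() - start, 3),
    }

    record["match"] = oa_text.strip() == an_text.strip()
    record["metrics"] = {"length_ratio": length_ratio(oa_text, an_text)}
    log_trace(record)
    return record
```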
The demo opens a web dashboard with projects in the sidebar, a results table that truncates long outputs, per-model latency and cost, and aggregate match rates. The image above shows the UI, which you can reproduce by cloning this repo and running: `python examples/demo_dashboard.py`
The dashboard now includes:

- Trace view with input/output inspection
- Model size badges
- Secondary metrics display
- A `+` column action to add another model and replay saved traces against it (see the sketch below)
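
The replay behind that `+` action amounts to reading saved traces and re-sending each input to the newly added model. A minimal sketch, assuming the JSONL trace layout from the earlier example and a placeholder model name (the real UI action presumably goes through the project's own trace store):

```python
# Minimal sketch of replaying saved traces against an additional model.
# Assumptions: traces were logged as JSON lines with an "input" field; the
# file name and model name are placeholders, not the repo's actual API.
import json

from openai import OpenAI

client = OpenAI()


def replay_traces(path: str = "traces.jsonl", model: str = "gpt-4o-mini") -> list[dict]:
    """Re-run every saved input against `model` and collect the new outputs."""
    results = []
    with open(path) as f:
        for line in f:
            trace = json.loads(line)
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": trace["input"]}],
            )
            results.append(
                {
                    "input": trace["input"],
                    "model": model,
                    "output": resp.choices[0].message.content,
                }
            )
    return results
```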
The old `## Roadmap` section is removed; its items were:

- Allow adding additional models directly through the UI
- Add LLM as judge to score outputs that are not structured
- Let developers easily fine-tune models on outputs
## Examples

Two runnable example groups are provided:

- `examples/mock/` for quick local seeding to inspect UI states
  - `seed_basic.py`
  - `seed_traces.py`
  - `seed_secondary_metrics.py`
- `examples/real/` for real SDK usage patterns (requires API keys)
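
The CI workflow mentioned in the description runs pytest with Playwright against the dashboard; a smoke test in that spirit might look like the sketch below (the port, the startup wait, and starting the server via the demo script are assumptions about the app):

```python
# Hypothetical pytest + Playwright smoke test for the demo dashboard.
# Assumptions: the demo serves on http://127.0.0.1:8000 and is started by
# running examples/demo_dashboard.py; adjust URL and timing to the real app.
import subprocess
import sys
import time

import pytest
from playwright.sync_api import sync_playwright

DASHBOARD_URL = "http://127.0.0.1:8000"  # assumed port


@pytest.fixture(scope="module")
def dashboard_server():
    proc = subprocess.Popen([sys.executable, "examples/demo_dashboard.py"])
    time.sleep(3)  # crude wait for the server to come up
    yield
    proc.terminate()
    proc.wait()


def test_dashboard_loads(dashboard_server):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        response = page.goto(DASHBOARD_URL)
        assert response is not None and response.ok  # dashboard served successfully
        browser.close()
```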