Improve SimpleQA eval observability with structured logging and latency metadata by SeanHe727 · Pull Request #1790 · assafelovic/gpt-researcher

SeanHe727 · 2026-05-30T02:23:21Z

Summary

This PR improves the SimpleQA evaluation pipeline by adding structured logging, per-query latency tracking, and reproducibility metadata.

Changes

Added per-query latency measurement using time.perf_counter()
Added structured JSON logging with:
- run metadata: researcher model, timestamp, git commit
- aggregate metrics: accuracy, average latency, p50 latency, p95 latency
- per-query evaluation results
Updated deprecated SMART_LLM_MODEL usage to SMART_LLM
Added a guard for empty generator results to avoid next() failures
Added documentation / example output for the structured eval log
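
As a rough illustration of the changes listed above, here is a minimal sketch of per-query timing with time.perf_counter(), an empty-generator guard, and a structured log with aggregate latency metrics. The PR page does not show the actual schema or helper names, so every function and field name below (timed_call, first_or_none, build_eval_log, latency_s, and so on) is hypothetical, and the nearest-rank percentile is just one reasonable choice.

```python
import json
import time
from datetime import datetime, timezone


def first_or_none(results):
    """Return the first item of an iterable, or None if it is empty.

    Passing a default to next() avoids StopIteration on empty generators.
    """
    return next(iter(results), None)


def timed_call(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an int in 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def build_eval_log(results, model, git_commit):
    """Assemble run metadata, aggregate metrics, and per-query results.

    `results` is a non-empty list of dicts with hypothetical keys
    "query", "correct" (bool), and "latency_s" (float).
    """
    latencies = [r["latency_s"] for r in results]
    correct = sum(1 for r in results if r["correct"])
    return {
        "run": {
            "researcher_model": model,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "git_commit": git_commit,
        },
        "metrics": {
            "accuracy": correct / len(results),
            "avg_latency_s": sum(latencies) / len(latencies),
            "p50_latency_s": percentile(latencies, 50),
            "p95_latency_s": percentile(latencies, 95),
        },
        "queries": results,
    }


if __name__ == "__main__":
    sample = [
        {"query": "q1", "correct": True, "latency_s": 0.1},
        {"query": "q2", "correct": False, "latency_s": 0.4},
    ]
    print(json.dumps(build_eval_log(sample, "hypothetical-model", "d45c2c7"), indent=2))
```

Keeping the log a plain dict makes it easy to serialize with json.dumps and to diff between runs.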

Motivation

The previous eval output was primarily console-based, which made it harder to compare runs, track latency regressions, and reproduce evaluation results across model and configuration changes.

Compatibility

Existing eval behavior is preserved unless structured logging is enabled.
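
One way to keep this kind of logging opt-in is to write the file only when a path is supplied. The sketch below assumes a hypothetical log_path argument; the PR page does not say how structured logging is actually enabled.

```python
import json
from pathlib import Path


def maybe_write_log(log, log_path=None):
    """Write `log` as JSON only when a path is given; return True if written.

    With log_path=None nothing touches the filesystem, so the default
    console-only behavior is unchanged. `log_path` is a hypothetical
    parameter used for illustration.
    """
    if log_path is None:
        return False
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(log, indent=2), encoding="utf-8")
    return True
```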

Testing

Ran SimpleQA eval on a small sample
Verified structured log output is valid JSON
Verified aggregate latency metrics are generated correctly
Verified existing console output still works
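
The JSON and latency checks above could be automated along these lines. This is only a sketch: the key names (metrics, p50_latency_s, p95_latency_s) are assumptions made for illustration and are not taken from the PR.

```python
import json


def check_eval_log(text):
    """Return a list of problems found in a structured eval log string.

    An empty list means the text parsed as JSON and the (hypothetical)
    latency fields are present and consistent.
    """
    try:
        log = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    problems = []
    metrics = log.get("metrics") if isinstance(log, dict) else None
    if not isinstance(metrics, dict):
        return ["missing 'metrics' object"]

    for key in ("p50_latency_s", "p95_latency_s"):
        if not isinstance(metrics.get(key), (int, float)):
            problems.append(f"missing or non-numeric '{key}'")

    # The median should never exceed the 95th percentile.
    if not problems and metrics["p50_latency_s"] > metrics["p95_latency_s"]:
        problems.append("p50 latency exceeds p95 latency")
    return problems
```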

SeanHe727 · 2026-05-30T02:27:23Z

Thanks for reviewing! This PR is intended as a small, backward-compatible improvement to make SimpleQA eval runs easier to reproduce and compare.

It adds structured JSON logging, per-query latency tracking, and run metadata without changing the default eval behavior. Happy to adjust the output format or scope based on maintainer preferences.

eval: add latency tracking, structured JSON logs, and multi-domain da…

d45c2c7

…taset

SeanHe727 force-pushed the add-simpleqa-eval-logging branch from 863ce4b to d45c2c7 on May 31, 2026 20:04

SeanHe727 wants to merge 1 commit into assafelovic:main from SeanHe727:add-simpleqa-eval-logging