feat: add 80-question kbase evaluation test suite #8

Merged: emac-E merged 1 commit into emac-E:main from Lifto:feat/kbase-eval-tests (Apr 15, 2026)

@Lifto commented Apr 15, 2026

Summary

  • Adds config/kbase_tests.yaml with 80 evaluation questions whose answers come from kbase articles (Solutions + Articles) in OKP
  • All source documents are post-January 2025 (after the Gemini 2.5 Flash knowledge cutoff)
  • Covers 15+ topic categories: containers, storage, security/crypto, networking, virtualization, identity/auth, kernel/boot, upgrades/migration, package management, subscriptions, HA clustering, cloud, and more

Purpose

Demonstrate the measurable value of having kbase content available in the CLA backend. Run with --metrics custom:answer_correctness using the standard 3-run protocol.
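The outcome labels used in the results below can be read as a per-question classification over the three runs. The following is a minimal sketch under that assumption: a question counts as reliable PASS if every run passes, reliable FAIL if every run fails, and TRANSIENT otherwise. The function names, the dictionary layout, and the sample question IDs are hypothetical and are not taken from the evaluation harness.

```python
from collections import Counter


def classify(run_results):
    """Label one question from its per-run pass/fail outcomes (assumed rule)."""
    if not run_results:
        raise ValueError("at least one run is required")
    if all(run_results):
        return "PASS (reliable)"
    if not any(run_results):
        return "FAIL (reliable)"
    return "TRANSIENT"


def summarize(results_by_question):
    """Return per-label question counts and the fraction of all runs that passed."""
    counts = Counter(classify(runs) for runs in results_by_question.values())
    total_runs = sum(len(runs) for runs in results_by_question.values())
    passed_runs = sum(sum(runs) for runs in results_by_question.values())
    return counts, passed_runs / total_runs


# Hypothetical results for three questions, three runs each.
example = {
    "q01": [True, True, True],
    "q02": [False, False, False],
    "q03": [True, False, True],
}
counts, avg_pass_rate = summarize(example)
print(dict(counts), f"{avg_pass_rate:.1%}")
```

Under this reading, the average pass rate is computed over individual runs rather than over questions, so a TRANSIENT question still contributes its passing runs.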

Results (80-question experiment, 3 runs each)

| Metric | No-Kbase | With-Kbase | Delta |
|---|---|---|---|
| PASS (reliable) | 9 (11%) | 29 (36%) | +20 |
| FAIL (reliable) | 47 (59%) | 15 (19%) | -32 |
| TRANSIENT | 24 (30%) | 36 (45%) | +12 |
| Mean score | 0.495 | 0.780 | +0.286 |
| Avg pass rate | 25.8% | 60.8% | +35.0 pp |

  • 53 questions improved, 18 same, 9 regressed
  • 14 questions flipped from reliable FAIL → reliable PASS
  • Kbase produces a roughly 2.35× improvement in average pass rate (25.8% → 60.8%); the reliable PASS count rises from 9 to 29, about 3.2×
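The reported figures can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers stated in the results; the variable names are illustrative. It suggests that the 2.35× figure corresponds to the ratio of average pass rates, while the ratio of reliable PASS counts is higher.

```python
TOTAL_QUESTIONS = 80

# Question counts per outcome, copied from the results above.
no_kbase = {"pass": 9, "fail": 47, "transient": 24}
with_kbase = {"pass": 29, "fail": 15, "transient": 36}

# Each condition should account for all 80 questions.
assert sum(no_kbase.values()) == TOTAL_QUESTIONS
assert sum(with_kbase.values()) == TOTAL_QUESTIONS

# The improved/same/regressed breakdown should also cover all 80 questions.
assert 53 + 18 + 9 == TOTAL_QUESTIONS


def pct(count):
    """Share of the 80 questions, rounded to a whole percent."""
    return round(100 * count / TOTAL_QUESTIONS)


no_kbase_pct = {label: pct(n) for label, n in no_kbase.items()}
with_kbase_pct = {label: pct(n) for label, n in with_kbase.items()}

# Ratio of reliable PASS counts (29 / 9).
reliable_pass_ratio = with_kbase["pass"] / no_kbase["pass"]

# Ratio of average pass rates, using the rounded percentages from the results.
avg_pass_rate_ratio = 60.8 / 25.8

print(no_kbase_pct, with_kbase_pct)
print(f"reliable PASS ratio: {reliable_pass_ratio:.2f}x")
print(f"average pass rate ratio: {avg_pass_rate_ratio:.2f}x")
```

The average pass rate ratio computed from the rounded percentages lands within about 0.01 of the reported 2.35×, which is consistent with the original figure having been computed from unrounded values.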

Add config/kbase_tests.yaml with 80 questions whose answers come from
kbase articles (Solutions + Articles) in OKP. All source documents are
post-January 2025 (after Gemini 2.5 Flash cutoff).

Purpose: Demonstrate the value of having kbase content in search_portal
by comparing correctness with and without kbase suppression.

Topics covered: containers, storage, security/crypto, networking,
virtualization, identity/auth, kernel/boot, upgrades/migration,
package management, subscriptions, HA clustering, cloud, and more.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

@emac-E left a comment

Owner

thanks for adding these questions!

@emac-E merged commit 030a1c6 into emac-E:main on Apr 15, 2026
5 of 15 checks passed