feat: Add agent-test-generation skill #26

Open
hocokahu wants to merge 1 commit into warpdotdev:main from hocokahu:add-agent-test-generation-skill

hocokahu commented May 29, 2026


@zachbai I thought it would be useful to add a skill that generates test cases for agentic code. Let me know your thoughts. Monocle2AI is an open-source project under the Linux Foundation.

Summary

Adds agent-test-generation — a skill that scaffolds monocle_test_tools pytest tests for Python AI agent apps across LangGraph, Google ADK, OpenAI, Microsoft Agent Framework, CrewAI, LlamaIndex, and Strands.

Covers seven test categories:

  1. Agent & Tool Routing — positive + negative tests that the right agent/tool runs for each request
  2. Input Validation — verify user inputs are forwarded into agent/tool calls
  3. Output Validation — verify outputs contain expected content
  4. Performance — token-limit and duration bounds
  5. Quality Assessment — dual-mode (see below)
  6. Multi-task Orchestration — complex multi-agent requests
  7. Individual Agent Testing — each sub-agent in isolation
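To make categories 1 and 4 concrete, here is a minimal sketch in plain pytest. It does not use the monocle_test_tools API, which this description does not show; `fake_agent`, its return shape, and the 1-second and 50-token bounds are hypothetical stand-ins chosen only for illustration.

```python
import time

import pytest


def fake_agent(request_text):
    """Hypothetical stand-in for an agent run (not part of monocle_test_tools)."""
    tool = "book_flight" if "flight" in request_text.lower() else "book_hotel"
    return {
        "tool": tool,
        "output": f"Handled by {tool}",
        # Crude token estimate for the sketch: word count plus a fixed overhead.
        "tokens": len(request_text.split()) + 4,
    }


@pytest.mark.parametrize(
    "request_text, expected, forbidden",
    [
        ("Book a flight to Paris", "book_flight", "book_hotel"),
        ("Reserve a hotel in Rome", "book_hotel", "book_flight"),
    ],
)
def test_routing(request_text, expected, forbidden):
    result = fake_agent(request_text)
    assert result["tool"] == expected   # positive: the right tool ran
    assert result["tool"] != forbidden  # negative: the wrong tool did not


def test_performance_bounds():
    start = time.perf_counter()
    result = fake_agent("Book a flight to Paris")
    elapsed = time.perf_counter() - start
    assert elapsed < 1.0            # duration bound (illustrative value)
    assert result["tokens"] <= 50   # token-limit bound (illustrative value)
```

In a real suite the stub would be replaced by the app's own agent entry point, with the bounds tuned to that app.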

Dual-mode quality assessment

The quality file works with or without a cloud key:

  • Local (default, no key) — deterministic contains_output/does_not_have_any_output assertions plus optional BERTScore semantic similarity (auto-skips if bert_score isn't pip-installed). No network, no LLM call.
  • Cloud (OKAHU_API_KEY set) — LLM-as-judge classification via Okahu (sentiment, toxicity, bias, hallucination, etc.). Tests are gated by pytest.mark.skipif so they skip cleanly when the key is absent. The Cloud eval is provided at no cost.
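The gating described above can be sketched with plain pytest. This is an illustration under assumptions, not the file the skill generates: `judge_available`, the test names, the placeholder strings, and the 0.8 similarity threshold are hypothetical, and the Okahu judge call is left out because its API is not shown here. Only `pytest.mark.skipif`, the `bert_score` package, and the `OKAHU_API_KEY` variable come from the description.

```python
import os

import pytest


def judge_available(env=None):
    """Return True when a non-empty OKAHU_API_KEY is set (hypothetical helper)."""
    env = os.environ if env is None else env
    return bool(env.get("OKAHU_API_KEY"))


# Cloud mode: tests carrying this mark skip cleanly when the key is absent.
requires_okahu = pytest.mark.skipif(
    not judge_available(),
    reason="OKAHU_API_KEY not set; skipping LLM-as-judge tests",
)


def test_local_deterministic():
    # Local mode: substring checks only, no network and no LLM call.
    response = "Your flight to Paris is booked."  # placeholder agent output
    assert "Paris" in response
    assert "error" not in response.lower()


def test_local_semantic_similarity():
    # Skips automatically when bert_score is not pip-installed.
    bert_score = pytest.importorskip("bert_score")
    candidate = ["Your flight to Paris is booked."]
    reference = ["The Paris flight has been booked."]
    _, _, f1 = bert_score.score(candidate, reference, lang="en")
    assert f1.item() > 0.8  # illustrative threshold, not from the source


@requires_okahu
def test_cloud_judge():
    # The LLM-as-judge call would go here; it is omitted in this sketch.
    assert judge_available()
```

Running `pytest -rs` lists each skipped test with its reason, which makes it easy to confirm which mode a given run exercised.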

Scaffold monocle_test_tools pytest tests for Python AI agent apps
(LangGraph, Google ADK, CrewAI, LlamaIndex, Strands). Covers seven
test categories: routing, input/output validation, performance,
multi-agent orchestration, individual-agent isolation, and quality
assessment.

Quality assessment is dual-mode: deterministic + optional BERTScore
similarity locally (no API key, no LLM call); LLM-as-judge
classification via Okahu cloud when OKAHU_API_KEY is set.
hocokahu force-pushed the add-agent-test-generation-skill branch from 3caa53b to abd8f01 on June 5, 2026 22:42