Test Cookbook

Copy‑pasteable recipes for common scenarios.

1) Label PR on open

- name: label-flow
  event: pr_opened
  fixture: gh.pr_open.minimal
  mocks:
    overview:
      text: "Overview body"
      tags: { label: feature, review-effort: 2 }
  expect:
    calls:
      - step: overview
        exactly: 1
      - step: apply-overview-labels
        exactly: 1
      - provider: github
        op: labels.add
        at_least: 1
        args: { contains: [feature, "review/effort:2"] }

2) Ignore normal comment

- name: standard-comment
  event: issue_comment
  fixture: gh.issue_comment.standard
  mocks:
    comment-assistant: { text: "", intent: comment_reply }  # empty text → no reply
  expect:
    no_calls:
      - provider: github
        op: issues.createComment
    calls:
      - step: comment-assistant
        exactly: 1

3) `/visor help` reply and prompt check

- name: visor-plain
  event: issue_comment
  fixture: gh.issue_comment.visor_help
  mocks:
    comment-assistant: { text: "Sure, here’s how I can help.", intent: comment_reply }
  expect:
    calls:
      - step: comment-assistant
        exactly: 1
      - provider: github
        op: issues.createComment
        exactly: 1
    prompts:
      - step: comment-assistant
        matches: "(?i)\\/visor\\s+help"

4) Regenerate reviews on command

- name: visor-retrigger
  event: issue_comment
  fixture: gh.issue_comment.visor_regenerate
  mocks:
    comment-assistant: { text: "Regenerating.", intent: comment_retrigger }
    overview: { text: "Overview (regenerated)", tags: { label: feature, review-effort: 2 } }
  expect:
    calls:
      - step: comment-assistant
        exactly: 1
      - step: overview
        exactly: 1

5) Facts enabled (one fact)

- name: facts-enabled
  event: issue_comment
  fixture: gh.issue_comment.visor_help
  env: { ENABLE_FACT_VALIDATION: "true" }
  mocks:
    comment-assistant: { text: "We rely on defaults/visor.yaml line 11 for max_parallelism=4.", intent: comment_reply }
    extract-facts:
      - { id: f1, category: Configuration, claim: "max_parallelism defaults to 4", verifiable: true }
    validate-fact[]:
      - { fact_id: f1, claim: "max_parallelism defaults to 4", is_valid: true, confidence: high, evidence: "defaults/visor.yaml:11" }
  expect:
    calls:
      - step: comment-assistant
        exactly: 1
      - step: extract-facts
        exactly: 1
      - step: validate-fact
        at_least: 1

6) Facts invalid (correction reply)

When a fact is invalid, the correction flow triggers a re-run. Due to goto forward-running dependents, extract-facts and validate-fact also run again.

- name: facts-invalid
  event: issue_comment
  fixture: gh.issue_comment.visor_help
  env: { ENABLE_FACT_VALIDATION: "true" }
  routing:
    max_loops: 1
  mocks:
    comment-assistant: { text: "We rely on defaults/visor.yaml line 11 for max_parallelism=4.", intent: comment_reply }
    extract-facts:
      - { id: f1, category: Configuration, claim: "max_parallelism defaults to 4", verifiable: true }
    validate-fact[]:
      - { fact_id: f1, claim: "max_parallelism defaults to 4", is_valid: false, confidence: high, evidence: "defaults/visor.yaml:11 does not set 4", correction: "max_parallelism defaults to 3" }
  expect:
    calls:
      - step: comment-assistant
        exactly: 2
      - step: extract-facts
        exactly: 2
      - step: validate-fact
        exactly: 2
      - step: aggregate
        exactly: 1
    outputs:
      - step: validate-fact
        where: { path: fact_id, equals: f1 }
        path: correction
        equals: "max_parallelism defaults to 3"

7) Two facts (one invalid)

With two facts extracted where only one is invalid, the correction pass runs for the invalid fact. Due to goto forward-running dependents, extract-facts and validate-fact run again on retry.

- name: facts-two-items
  event: issue_comment
  fixture: gh.issue_comment.visor_help
  env: { ENABLE_FACT_VALIDATION: "true" }
  routing:
    max_loops: 1
  mocks:
    comment-assistant: { text: "We rely on defaults/visor.yaml for concurrency defaults.", intent: comment_reply }
    extract-facts:
      - { id: f1, category: Configuration, claim: "max_parallelism defaults to 4", verifiable: true }
      - { id: f2, category: Feature,       claim: "Fast mode is enabled by default", verifiable: true }
    validate-fact[]:
      - { fact_id: f1, claim: "max_parallelism defaults to 4", is_valid: false, confidence: high, evidence: "defaults/visor.yaml:11", correction: "max_parallelism defaults to 3" }
      - { fact_id: f2, claim: "Fast mode is enabled by default", is_valid: true,  confidence: high, evidence: "src/config.ts:FAST_MODE=true" }
  expect:
    calls:
      - step: comment-assistant
        exactly: 2
      - step: extract-facts
        exactly: 2
      - step: validate-fact
        exactly: 4
      - step: aggregate
        exactly: 1
    outputs:
      - step: validate-fact
        where: { path: fact_id, equals: f1 }
        path: is_valid
        equals: false
      - step: validate-fact
        where: { path: fact_id, equals: f2 }
        path: is_valid
        equals: true

8) GitHub negative mode

- name: github-negative-mode
  event: pr_opened
  fixture: gh.pr_open.minimal
  github_recorder: { error_code: 429 }
  mocks: { overview: { text: "Overview body", tags: { label: feature, review-effort: 2 } } }
  expect:
    calls:
      - step: overview
        exactly: 1
      - step: apply-overview-labels
        exactly: 1
    fail:
      message_contains: "github/op_failed"

9) API tool (`type: api`) with YAML tests

You can verify OpenAPI-to-MCP conversion without real network calls by asserting generated-tool validation behavior:

tools:
  users-api:
    type: api
    name: users-api
    spec: ./fixtures/api-tool-openapi.json
    headers:
      Authorization: "Bearer ${API_TEST_BEARER_TOKEN}"
      X-Tenant-Id: "${API_TEST_TENANT_ID}"
    whitelist: [get*]

checks:
  api-tool-missing-required-input:
    type: mcp
    transport: custom
    method: getUser
    methodArgs: {}
    on: [manual]

tests:
  cases:
    - name: api-tool-generated-operation-validates-input
      event: manual
      fixture: gh.pr_open.minimal
      expect:
        calls:
          - step: api-tool-missing-required-input
            exactly: 1
        outputs:
          - step: api-tool-missing-required-input
            path: issues[0].message
            matches: "(?i)required property 'id'"

This confirms generated operation tools are registered and invoked through transport: custom. The same config supports env-backed custom headers (for example Authorization: "Bearer ${API_TEST_BEARER_TOKEN}").

Also see end-to-end example suites:

examples/api-tools-mcp-example.yaml (embedded tests)
examples/api-tools-ai-example.yaml (embedded tests)
examples/api-tools-inline-overlay-example.yaml (embedded tests)

10) Multi-turn conversation with cross-turn assertions

Simulate a multi-message conversation and assert on each response — including looking back at earlier turns from a later stage.

- name: multi-turn-support-conversation
  flow:
    - name: user-reports-issue
      event: manual
      fixture: local.minimal
      routing: { max_loops: 0 }
      execution_context:
        conversation:
          transport: slack
          thread: { id: "support-thread" }
          messages:
            - { role: user, text: "My API is returning 502 errors" }
          current: { role: user, text: "My API is returning 502 errors" }
      mocks:
        chat[]:
          - text: "A 502 error typically means the upstream service is unreachable. Can you check if your backend is running and the target URL in your API definition is correct?"
          - intent: chat
      expect:
        calls:
          - step: chat
            exactly: 1
        llm_judge:
          - step: chat
            path: text
            prompt: Does the response acknowledge the 502 error and suggest diagnostic steps?

    - name: user-provides-details
      event: manual
      fixture: local.minimal
      routing: { max_loops: 0 }
      execution_context:
        conversation:
          transport: slack
          thread: { id: "support-thread" }
          messages:
            - { role: user, text: "My API is returning 502 errors" }
            - { role: assistant, text: "A 502 error typically means the upstream service is unreachable..." }
            - { role: user, text: "The backend is running. I checked with curl and it works directly." }
          current: { role: user, text: "The backend is running. I checked with curl and it works directly." }
      mocks:
        chat[]:
          - text: "If curl works directly but Tyk returns 502, check: 1) The `target_url` in your API definition matches what curl uses 2) Tyk can resolve the hostname (DNS) 3) Any TLS certificate issues between Tyk and the upstream."
          - intent: chat
      expect:
        calls:
          - step: chat
            exactly: 1
        llm_judge:
          # Assert current response narrows down based on user's info
          - step: chat
            index: last
            path: text
            prompt: |
              The user said curl works directly but Tyk gives 502.
              Does the response narrow down Tyk-specific causes (not repeat generic advice)?
          # Verify first response was appropriately general (before details were known)
          - step: chat
            index: first
            path: text
            prompt: |
              This was the first response before the user provided details.
              Was it appropriately exploratory (asking for info) rather than jumping to conclusions?

11) LLM-as-judge: semantic evaluation

Use llm_judge to evaluate whether AI responses meet semantic criteria that can't be expressed with regex or exact matching.

- name: response-quality-check
  event: manual
  fixture: local.minimal
  mocks:
    chat[]:
      - text: |
          Tyk Gateway uses Redis-based distributed rate limiting through its
          middleware chain. Rate limits are configured per API key or policy
          with `rate` and `per` fields. When exceeded, returns HTTP 429.
      - intent: chat
      - skills: [code-explorer]
  expect:
    calls:
      - step: chat
        exactly: 1
    llm_judge:
      # Simple pass/fail verdict
      - step: chat
        path: text
        prompt: |
          Does this response accurately explain rate limiting?
          It should mention specific mechanisms, not be generic.

      # Structured extraction with assertions
      - step: chat
        path: text
        prompt: Analyze this technical response about rate limiting.
        schema:
          properties:
            mentions_redis:
              type: boolean
              description: "Mentions Redis for distributed rate limiting?"
            mentions_status_code:
              type: boolean
              description: "Mentions HTTP 429 status code?"
            technical_depth:
              type: string
              enum: [shallow, moderate, deep]
          required: [mentions_redis, mentions_status_code, technical_depth]
        assert:
          mentions_redis: true
          mentions_status_code: true

Configure the judge model globally:

tests:
  defaults:
    llm_judge:
      model: gemini-2.0-flash
      provider: google

Or per-assertion with the model field. Set VISOR_JUDGE_MODEL env var as a fallback.

12) Conversation sugar: multi-turn without boilerplate

The conversation: format auto-builds message history from prior turns, removing the need to duplicate execution_context.conversation.messages across stages.

- name: support-conversation
  strict: false
  conversation:
    - role: user
      text: "My API is returning 502 errors"
      mocks:
        chat: { text: "A 502 error typically means the upstream service is unreachable. Can you check if your backend is running?", intent: chat }
      expect:
        calls:
          - step: chat
            exactly: 1
    - role: user
      text: "The backend is running. Curl works directly."
      mocks:
        chat: { text: "If curl works directly but Tyk returns 502, check the target_url in your API definition and DNS resolution.", intent: chat }
      expect:
        outputs:
          - step: chat
            turn: current
            path: text
            matches: "(?i)target_url"
        llm_judge:
          - step: chat
            turn: current
            path: text
            prompt: Does the response narrow down Tyk-specific causes?
          - step: chat
            turn: 1
            path: text
            prompt: Was the first response appropriately exploratory?

Compare this with the equivalent flow: format in recipe #10 — the conversation: format is significantly more concise.

13) Multi-user conversation: group chat isolation

Use the user: field on turns to simulate different users in the same conversation thread. The value is exposed as conversation.current.user in Liquid templates.

- name: group-chat-data-isolation
  conversation:
    turns:
      # User 1 asks about their data
      - role: user
        user: "user-1"
        text: "What are my open tickets?"
        mocks:
          chat: { text: "You have 3 open tickets: AUTH-1, AUTH-2, AUTH-3.", intent: chat }
          getUserContext:
            user: { id: 1, name: "Alice", role: "engineer" }
            tickets: [{ id: "AUTH-1" }, { id: "AUTH-2" }, { id: "AUTH-3" }]
        expect:
          outputs:
            - step: chat
              path: text
              matches: "(?i)3|AUTH|ticket"

      # Different user in the same thread — sees DIFFERENT data
      - role: user
        user: "user-2"
        text: "Show me my tickets too"
        mocks:
          chat: { text: "You have 1 open ticket: API-42.", intent: chat }
          getUserContext:
            user: { id: 2, name: "Bob", role: "manager" }
            tickets: [{ id: "API-42" }]
        expect:
          outputs:
            - step: chat
              path: text
              matches: "(?i)1|API-42|ticket"

How it works in --no-mocks mode:

The test runner sets conversation.current.user from the turn's user: field
Your system prompt uses Liquid: provider_id: "{{ conversation.current.user }}"
The AI reads the rendered prompt and passes the identity as a tool argument
Your backend resolves the identity and returns user-specific data

This enables end-to-end testing of per-user data isolation without mocks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test Cookbook

1) Label PR on open

2) Ignore normal comment

3) `/visor help` reply and prompt check

4) Regenerate reviews on command

5) Facts enabled (one fact)

6) Facts invalid (correction reply)

7) Two facts (one invalid)

8) GitHub negative mode

9) API tool (`type: api`) with YAML tests

10) Multi-turn conversation with cross-turn assertions

11) LLM-as-judge: semantic evaluation

12) Conversation sugar: multi-turn without boilerplate

13) Multi-user conversation: group chat isolation

Related Documentation

FilesExpand file tree

cookbook.md

Latest commit

History

cookbook.md

File metadata and controls

Test Cookbook

1) Label PR on open

2) Ignore normal comment

3) /visor help reply and prompt check

4) Regenerate reviews on command

5) Facts enabled (one fact)

6) Facts invalid (correction reply)

7) Two facts (one invalid)

8) GitHub negative mode

9) API tool (type: api) with YAML tests

10) Multi-turn conversation with cross-turn assertions

11) LLM-as-judge: semantic evaluation

12) Conversation sugar: multi-turn without boilerplate

13) Multi-user conversation: group chat isolation

Related Documentation

3) `/visor help` reply and prompt check

9) API tool (`type: api`) with YAML tests