feat: add AgentLoopDetectionMetric for detecting infinite loops and cyclical patterns #2819

Open
Ruthwik-Data wants to merge 11 commits into confident-ai:main from Ruthwik-Data:feat/agent-loop-detection-metric

@Ruthwik-Data

Contributor

Summary

Adds AgentLoopDetectionMetric to detect production failures where agents get stuck in infinite loops before completing their tasks. Addresses issue #2643.

What's included

Core detection mechanisms

  • Tool call repetition: Detects when the same tool is called repeatedly with identical (or nearly identical) arguments
  • Reasoning stagnation: Identifies when LLM outputs become highly similar across consecutive steps using n-gram overlap (default) or optional embedding-based similarity
  • Call cycles: Finds repeating patterns in tool call sequences (A→B→C→A)
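
As a rough illustration of the second and third mechanisms, the sketch below computes a word n-gram Jaccard overlap between two reasoning outputs and looks for a back-to-back repeating pattern at the end of a tool-call sequence. The function names (`ngram_similarity`, `find_cycle`), the choice of word trigrams, and the Jaccard measure are assumptions made for this example; the PR's actual implementation may differ.

```python
from typing import List, Optional, Sequence, Set, Tuple


def ngram_set(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in `text` (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < n:
        # Short texts: fall back to a single gram holding all the words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of word n-grams, in [0.0, 1.0]."""
    grams_a, grams_b = ngram_set(a, n), ngram_set(b, n)
    if not grams_a and not grams_b:
        return 1.0  # assumption: two empty outputs count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def find_cycle(calls: Sequence[str], min_repeats: int = 2) -> Optional[List[str]]:
    """Return the shortest pattern (length >= 2, more than one distinct tool)
    that repeats back-to-back `min_repeats` times at the end of `calls`."""
    for length in range(2, len(calls) // min_repeats + 1):
        pattern = list(calls[-length:])
        window = list(calls[-length * min_repeats:])
        # A pattern of one repeated tool is plain repetition, not a cycle.
        if len(set(pattern)) > 1 and window == pattern * min_repeats:
            return pattern
    return None
```

In this sketch, `find_cycle(["A", "B", "C", "A", "B", "C"])` returns `["A", "B", "C"]`, and a pair of consecutive reasoning outputs would be treated as stagnant when `ngram_similarity` meets or exceeds `similarity_threshold`.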

API

from deepeval.metrics import AgentLoopDetectionMetric

loop_metric = AgentLoopDetectionMetric(
    threshold=0.5,
    repetition_threshold=3,          # max identical tool calls before flagging
    min_identical_args_ratio=0.9,    # ratio of matching args to count as duplicate
    reasoning_stagnation_detector="ngram",  # "ngram" | "embedding"
    similarity_threshold=0.85,       # similarity at which consecutive reasoning steps count as stagnant
    stall_steps=5,                   # max planning-only steps
)

loop_metric.measure(test_case)
print(loop_metric.score)           # 0.0-1.0 (1.0 = no loops)
print(loop_metric.reason)          # Human-readable explanation  
print(loop_metric.loop_triggers)   # List of LoopTrigger objects with step indices

Output structure

# Example
loop_metric.score = 0.12
loop_metric.reason = "search_web called 4x with identical args in steps [3,4,5,6]"
loop_metric.loop_triggers = [
    LoopTrigger(
        type="tool_repeat",
        tool="search_web",
        steps=[3, 4, 5, 6],
        args_fingerprint="abc123",
        description="search_web called 4x with identical args in steps [3,4,5,6]"
    )
]
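
The PR does not say how `args_fingerprint` is derived. One plausible approach, sketched here purely as an assumption, is to hash a canonical JSON dump of the tool arguments so that key order does not affect the result. `LoopTriggerSketch` and `make_tool_repeat_trigger` are hypothetical stand-ins that mirror the fields shown above; they are not the PR's `LoopTrigger` class.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List


def args_fingerprint(tool_args: Dict[str, Any], length: int = 6) -> str:
    """Short, key-order-independent hash of tool arguments (assumed scheme)."""
    canonical = json.dumps(tool_args, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


@dataclass
class LoopTriggerSketch:
    """Hypothetical stand-in with the same fields as the example output."""
    type: str
    tool: str
    steps: List[int]
    args_fingerprint: str
    description: str


def make_tool_repeat_trigger(
    tool: str, tool_args: Dict[str, Any], steps: List[int]
) -> LoopTriggerSketch:
    """Build a 'tool_repeat' trigger with a human-readable description."""
    step_list = ",".join(str(s) for s in steps)
    description = (
        f"{tool} called {len(steps)}x with identical args in steps [{step_list}]"
    )
    return LoopTriggerSketch(
        type="tool_repeat",
        tool=tool,
        steps=steps,
        args_fingerprint=args_fingerprint(tool_args),
        description=description,
    )
```

Hashing a sorted dump means `{"q": "x", "page": 1}` and `{"page": 1, "q": "x"}` produce the same fingerprint, which is the property a duplicate check would need.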

Design decisions

  1. Deterministic by default: Uses n-gram overlap for reasoning stagnation, so the default path makes no model calls and adds negligible latency. Embedding-based detection is opt-in for higher recall.

  2. min_identical_args_ratio: Prevents false positives on legitimate retry-with-variation patterns (e.g., a search tool re-querying with slightly different params).

  3. Actionable output: The loop_triggers field annotates exactly which steps triggered detection and why, enabling teams to debug and fix agent logic.

  4. Trace-only: Requires test_case._trace_dict with a steps field. Each step should include:

    • tool_name and tool_args for tool calls
    • llm_output or reasoning for reasoning detection
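
To make the trace shape and the role of `min_identical_args_ratio` concrete, here is a minimal sketch. The `trace` dict is a made-up example built from the field names listed above, and `args_match_ratio` and `longest_repeat_run` are hypothetical helpers. It assumes the ratio is the fraction of argument keys whose values match exactly, and that a reasoning-only step breaks a run; neither detail is confirmed by the PR.

```python
from typing import Any, Dict, List, Optional

# Hypothetical trace using the field names described above.
trace = {
    "steps": [
        {"llm_output": "I should search for the docs."},
        {"tool_name": "search_web", "tool_args": {"q": "deepeval docs", "page": 1}},
        {"tool_name": "search_web", "tool_args": {"q": "deepeval docs", "page": 1}},
        {"tool_name": "search_web", "tool_args": {"q": "deepeval docs", "page": 1}},
        {"tool_name": "search_web", "tool_args": {"q": "deepeval metrics", "page": 1}},
    ]
}


def args_match_ratio(a: Dict[str, Any], b: Dict[str, Any]) -> float:
    """Fraction of argument keys (union of both calls) with equal values."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0  # two calls without arguments are treated as identical
    same = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return same / len(keys)


def longest_repeat_run(steps: List[Dict[str, Any]], min_ratio: float = 0.9) -> List[int]:
    """Indices of the longest run of consecutive near-identical tool calls."""
    best: List[int] = []
    current: List[int] = []
    prev: Optional[Dict[str, Any]] = None
    for i, step in enumerate(steps):
        name = step.get("tool_name")
        if name is None:
            # Assumption: a reasoning-only step ends the current run.
            current, prev = [], None
            continue
        args = step.get("tool_args") or {}
        if (
            prev is not None
            and prev["tool_name"] == name
            and args_match_ratio(prev.get("tool_args") or {}, args) >= min_ratio
        ):
            current.append(i)
        else:
            current = [i]
        prev = step
        if len(current) > len(best):
            best = list(current)
    return best
```

With this trace, the three identical `search_web` calls form the longest run, while the fourth call changes one of its two arguments (ratio 0.5) and starts a new run. That is the retry-with-variation case the ratio is meant to leave unflagged.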

Related

Testing

Before merging, I will add:

  • Unit tests with fixture traces (healthy, infinite loop, cyclical)
  • Integration test with @observe decorator
  • Type stubs if needed

Happy to coordinate with @rohitmannur007 to avoid conflicts. Let me know if the API/approach aligns with the intended design!

@vercel

vercel Bot commented Jun 29, 2026

@Ruthwik-Data is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

Development

Successfully merging this pull request may close these issues.

Feature: AgentLoopDetectionMetric — detect infinite loops and cyclical tool-call patterns in agent traces
