@DannylSyph3r DannylSyph3r commented Oct 17, 2025

User description

TraceHound - Observability Enforcement Agent

TraceHound is a multi-language observability guard agent that ensures critical API endpoints are properly instrumented with logging, tracing, error handling, and metrics. It helps teams catch missing observability signals before they become production blind spots.

What It Does

  • Scans backend code across supported languages
  • Validates observability coverage across controllers, services, and data layers
  • Detects missing logging, tracing, error handling, and metrics instrumentation
  • Calculates compliance scores and classifies violations by severity
  • Generates actionable JSON reports with specific recommendations for each gap

Use Case

Production issues are often hard to diagnose because of missing logs or incomplete tracing.
TraceHound helps teams enforce observability standards across APIs, ensuring full visibility into request flows and faster incident detection.
It’s ideal for pre-deployment checks, CI/CD observability gates, and ongoing quality audits.

Files Added

  • agent.toml – Full agent configuration and instruction flow
  • agent.yaml – YAML variant for alternate configuration
  • observability-template.json – Template JSON configuration file that helps coordinate the agent
  • README.md – Complete documentation with examples, configuration structure, and integration guides

Testing

Validated across sample projects to ensure accurate signal detection and deterministic compliance scoring.
Tested using a mix of API routes to confirm that the output reflects the routes analyzed.
The demo video provides a full walkthrough of config validation and report generation.

Documentation

Comprehensive documentation included in README.md, covering:

  • Observability contract structure
  • Supported patterns per language
  • Severity classification and scoring
  • CI/CD integration examples (GitHub Actions, GitLab CI, Jenkins)

Breaking Changes

None. This is a new agent with no impact on existing functionality.

Demo

🎥 See the walkthrough: https://x.com/Slethware/status/1979287741205745882


PR Type

Enhancement


Description

  • Add TraceHound observability enforcement agent with multi-language support

    • Validates logging, tracing, error handling, and metrics instrumentation
    • Generates compliance scores and actionable violation reports
  • Add Undertaker dead code detection agent with confidence scoring

    • Identifies unused functions, classes, variables, imports, and unreachable code
    • Provides deterministic confidence-based scoring (50-100%)
  • Comprehensive documentation and configuration templates for both agents


Diagram Walkthrough

flowchart LR
  A["Agent Framework"] --> B["TraceHound Agent"]
  A --> C["Undertaker Agent"]
  B --> D["Config Validation"]
  B --> E["Pattern Detection"]
  B --> F["Compliance Report"]
  C --> G["Code Discovery"]
  C --> H["Reference Analysis"]
  C --> I["Confidence Scoring"]

File Walkthrough

Relevant files
Documentation
README.md
TraceHound agent documentation and user guide                       

agents/tracehound/README.md

  • Comprehensive documentation for TraceHound observability enforcement
    agent
  • Quick start guide with configuration examples and CLI usage
  • Detailed explanation of execution flow across 4 phases
  • Integration examples for GitHub Actions, GitLab CI, and Jenkins
  • Severity levels and output format specifications
+276/-0 
README.md
Undertaker dead code detection agent documentation             

agents/undertaker/README.md

  • Complete documentation for Undertaker dead code detection agent
  • Quick start guide with CLI usage examples and parameter descriptions
  • Detailed analysis process covering 6 steps from discovery to scoring
  • Confidence scoring rules and interpretation guidelines
  • Use cases, tools used, and integration examples
+189/-0 
Configuration changes
agent.toml
TraceHound agent TOML configuration and instructions         

agents/tracehound/agent.toml

  • Complete agent configuration in TOML format with 558 lines
  • Detailed 9-phase execution flow from config validation to completion
  • Language-specific observability patterns for 7 languages (TypeScript,
    Python, Java, Go, Rust, C#)
  • Comprehensive instructions for contract verification and compliance
    scoring
  • JSON output schema with metadata, summary, violations, and
    recommendations
+558/-0 
agent.yaml
TraceHound agent YAML configuration variant                           

agents/tracehound/agent.yaml

  • YAML variant of TraceHound agent configuration
  • Simplified execution flow with 4 main phases
  • Language-specific pattern definitions for observability signal
    detection
  • Contract verification logic and report generation
  • Output schema matching TOML version for consistency
+293/-0 
observability-template.json
TraceHound observability configuration template                   

agents/tracehound/observability-template.json

  • Template configuration file for TraceHound with 118 lines
  • Quick start guide and help text for users
  • Example contracts for order creation and user authentication
  • Template structure for custom endpoint monitoring
  • Metadata and contract structure documentation
+118/-0 
agent.toml
Undertaker agent TOML configuration and instructions         

agents/undertaker/agent.toml

  • Complete agent configuration for Undertaker in TOML format
  • Core mission and analysis process with 6 steps
  • Confidence scoring rules with tier classifications
  • Technical requirements for ripgrep and filesystem operations
  • Output schema with summary statistics and dead code items
+164/-0 
agent.yaml
Undertaker agent YAML configuration variant                           

agents/undertaker/agent.yaml

  • YAML variant of Undertaker agent configuration
  • Analysis process and confidence scoring rules
  • Technical requirements and output specifications
  • Dead code item schema with identifier, type, location, and confidence
  • Error handling and reliability focus principles
+178/-0 

qodo-free-for-open-source-projects bot commented Oct 17, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified
No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-free-for-open-source-projects bot commented Oct 17, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Re-evaluate the core analysis approach

The current static analysis method uses simple string matching, which is
unreliable. It should be replaced with an approach that parses code into an
Abstract Syntax Tree (AST) for more accurate results.

Examples:

agents/tracehound/agent.toml [90-112]
CRITICAL INSTRUCTION: Do NOT rely on ripgrep for pattern detection in Phase 4 Step 3. Ripgrep has platform-specific issues with regex patterns. Instead, use the following reliable file-reading approach:

For each layer in the traced code flow:
  
  Substep A: Determine required signals based on layer.layer type:
    If layer is "controller": MUST have logging AND error_handling (CRITICAL severity if missing).
    If layer is "service": MUST have logging, SHOULD have error_handling and tracing (HIGH severity if missing).
    If layer is "database": MUST have logging AND error_handling (MEDIUM severity if missing).
  
  Substep B: Read file content using filesystem_read_files for the layer's filepath.

 ... (clipped 13 lines)
agents/undertaker/agent.toml [17-22]
1. Project Discovery: Scan ALL source files throughout the entire repository, excluding only generated/vendor folders. Do not limit analysis to specific routes, directories, or file patterns. Use the filesystem tool for comprehensive directory traversal starting from the project root and identify all files. Analyze all source directories and subdirectories when available for complete coverage.
2. Definition Detection: Find function/class/variable/import definitions using language-specific patterns. Use ripgrep for efficient pattern matching across all source files in the codebase.
3. Reference Counting: For each identifier, count actual usage across the codebase (excluding its definition). Use ripgrep with precise patterns to count references and exclude false positives.
4. Export Analysis: Detect if code is exported/public, which affects confidence scoring.
5. Unreachable Code: Find code after return/throw/break statements in the same block.
6. Confidence Scoring: Apply deterministic confidence rules based on usage patterns.
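
Step 3 (Reference Counting) can be illustrated with a minimal Python sketch. It is not taken from agent.toml: the function name `count_references`, the definition-line regex, and the sample sources are all hypothetical, and plain regular expressions stand in for ripgrep. It counts whole-word occurrences of an identifier while skipping lines that look like its definition.

```python
import re


def count_references(identifier: str, sources: dict[str, str]) -> int:
    """Count whole-word uses of `identifier`, skipping its definition lines."""
    name = re.escape(identifier)
    word = re.compile(rf"\b{name}\b")
    # Hypothetical definition shapes (Python/JavaScript style only);
    # a real agent would need language-specific rules here.
    definition = re.compile(rf"^\s*(def|class|function|const|let|var)\s+{name}\b")
    total = 0
    for text in sources.values():
        for line in text.splitlines():
            if definition.match(line):
                continue  # the definition itself is not a reference
            total += len(word.findall(line))
    return total


# Hypothetical two-file project used only for illustration.
sources = {
    "billing.py": (
        "def compute_tax(x):\n"
        "    return x * 0.2\n"
        "\n"
        "def unused_helper():\n"
        "    pass\n"
    ),
    "app.py": "from billing import compute_tax\nprint(compute_tax(10))\n",
}

print(count_references("compute_tax", sources))    # import line + call site
print(count_references("unused_helper", sources))  # no uses outside the definition
```

Because this is text matching, a mention inside a comment or string literal still counts as a reference, which is the kind of false positive the suggestion below is concerned with.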

Solution Walkthrough:

Before:

// From agents/tracehound/agent.toml
// PHASE 4: CHECK OBSERVABILITY SIGNALS

For each layer in the code flow:
  file_content = filesystem_read_files(layer.filepath)
  
  // LOGGING DETECTION
  is_compliant = false
  logging_patterns = ["logger.", "console.log", ...]
  for pattern in logging_patterns:
    if file_content.contains(pattern):
      is_compliant = true
      break
  if not is_compliant:
    create_violation("logging")

  // ... similar logic for error handling, tracing, etc.

After:

// Hypothetical AST-based approach

For each file in the project:
  ast = parse_file_to_ast(file_path)

  // For TraceHound
  For each contract:
    Find function_node in AST corresponding to the contract's entry point.
    Traverse the call graph starting from function_node.
    Check if nodes representing logging, error handling, etc., are present in the execution path.
    If a required signal is missing, create a violation.

  // For Undertaker
  Find all function/class/variable definitions in the AST.
  For each definition_node:
    Search the entire AST for reference_nodes pointing to it.
    If no references are found, and it's not an exported symbol, mark as dead code.
Suggestion importance[1-10]: 10

Why: The suggestion correctly identifies a fundamental architectural flaw in both the TraceHound and Undertaker agents, whose reliance on simple string matching for static analysis will lead to highly unreliable results.
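
The difference between the two approaches can be sketched for Python sources using the standard-library `ast` module. This is an illustration under stated assumptions, not TraceHound's implementation: the function name `signals_in_function`, the `LOG_METHODS` set, and the sample source are hypothetical, and the sketch inspects a single function body rather than following a call graph.

```python
import ast

# Hypothetical heuristic: any attribute call with one of these names
# (for example logger.info(...)) is treated as a logging call.
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def signals_in_function(source: str, func_name: str) -> dict[str, bool]:
    """Report which observability signals appear inside one function body."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == func_name:
            has_logging = any(
                isinstance(n, ast.Call)
                and isinstance(n.func, ast.Attribute)
                and n.func.attr in LOG_METHODS
                for n in ast.walk(node)
            )
            has_error_handling = any(isinstance(n, ast.Try) for n in ast.walk(node))
            return {"logging": has_logging, "error_handling": has_error_handling}
    raise ValueError(f"function {func_name!r} not found")


# Hypothetical source under analysis.
SOURCE = '''
def create_order(payload):
    # logger.info("this comment would fool a substring check")
    return save(payload)

def login(user):
    try:
        logger.info("login attempt")
        return authenticate(user)
    except Exception:
        logger.exception("login failed")
        raise
'''

print("logger." in SOURCE)                          # substring check passes for the whole file
print(signals_in_function(SOURCE, "create_order"))  # comments are not part of the AST
print(signals_in_function(SOURCE, "login"))
```

The substring check reports logging for the file as a whole because of the comment in `create_order`, while the AST check attributes signals to the specific function and ignores comments. Covering other languages would require a parser for each of them.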

High
Possible issue
Align severity logic with documentation

Update the severity assignment logic in agent.toml to consider both the layer
and signal type, aligning it with the more detailed rules specified in the
README.md.

agents/tracehound/agent.toml [155-158]

-Step 1: Determine severity based on layer type:
-  controller layer: CRITICAL
-  service layer: HIGH
-  database layer: MEDIUM
+Step 1: Determine severity based on the layer and the missing signal type, following the rules from the README:
+  - If signal_type is 'error_handling': severity = "CRITICAL"
+  - If signal_type is 'logging' and layer is 'database': severity = "CRITICAL"
+  - If signal_type is 'logging' and (layer is 'controller' or layer is 'service'): severity = "HIGH"
+  - If signal_type is 'metrics' and layer is 'service': severity = "HIGH"
+  - If signal_type is 'tracing': severity = "MEDIUM"
+  - If signal_type is 'metrics' and layer is 'controller': severity = "MEDIUM"
+  - Otherwise: severity = "LOW"
Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies a significant contradiction between the severity logic in the agent.toml instructions and the README.md documentation, which would lead to incorrect violation prioritization.
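
For reference, the proposed rules read as a first-match lookup. The Python sketch below mirrors the suggested text one rule per branch; the function name `severity` is hypothetical and the agent itself receives these rules as prose instructions, not code.

```python
def severity(signal_type: str, layer: str) -> str:
    """Return the severity of a missing signal, first matching rule wins."""
    if signal_type == "error_handling":
        return "CRITICAL"
    if signal_type == "logging" and layer == "database":
        return "CRITICAL"
    if signal_type == "logging" and layer in ("controller", "service"):
        return "HIGH"
    if signal_type == "metrics" and layer == "service":
        return "HIGH"
    if signal_type == "tracing":
        return "MEDIUM"
    if signal_type == "metrics" and layer == "controller":
        return "MEDIUM"
    return "LOW"


print(severity("logging", "database"))  # CRITICAL
print(severity("metrics", "database"))  # LOW: no explicit rule, falls through
```

Note that under these rules a missing metrics signal in the database layer falls through to LOW, since no explicit rule covers that combination.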

Medium
Correct the required signals logic

Align the required signals logic in PHASE 4 of agent.toml with the README.md and
PHASE 6 by including metrics for controller and service layers to ensure
consistent violation detection.

agents/tracehound/agent.toml [94-97]

-Substep A: Determine required signals based on layer.layer type:
-  If layer is "controller": MUST have logging AND error_handling (CRITICAL severity if missing).
-  If layer is "service": MUST have logging, SHOULD have error_handling and tracing (HIGH severity if missing).
-  If layer is "database": MUST have logging AND error_handling (MEDIUM severity if missing).
+Substep A: Determine required signals based on layer.layer type, aligning with the README and compliance calculation:
+  If layer is "controller": MUST have logging, error_handling. SHOULD have metrics.
+  If layer is "service": MUST have logging, error_handling, metrics. SHOULD have tracing.
+  If layer is "database": MUST have logging, error_handling.
Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies a critical inconsistency between the required signals defined in PHASE 4, the compliance calculation in PHASE 6, and the README.md, which would cause incorrect violation detection and scoring.
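
One way to keep PHASE 4 and PHASE 6 from drifting apart is to express the MUST/SHOULD rules as a single table that both phases read. The Python sketch below encodes the proposed Substep A; the names `REQUIRED` and `missing_signals` are hypothetical and do not appear in agent.toml.

```python
# Proposed Substep A rules, expressed as data (names are illustrative).
REQUIRED = {
    "controller": {"must": {"logging", "error_handling"}, "should": {"metrics"}},
    "service": {"must": {"logging", "error_handling", "metrics"}, "should": {"tracing"}},
    "database": {"must": {"logging", "error_handling"}, "should": set()},
}


def missing_signals(layer: str, found: set[str]) -> dict[str, list[str]]:
    """List the MUST and SHOULD signals a layer lacks, given the signals found."""
    rules = REQUIRED[layer]
    return {
        "must": sorted(rules["must"] - found),
        "should": sorted(rules["should"] - found),
    }


# A service file where only logging was detected.
print(missing_signals("service", {"logging"}))
```

With the rules in one place, violation detection and the compliance calculation cannot disagree about which signals a layer requires.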

Medium
Improve Rust error handling patterns

Update the Rust error handling patterns in agent.yaml to match the more specific
and accurate patterns defined in agent.toml and README.md, such as match and
Err(.

agents/tracehound/agent.yaml [53-57]

 For rust:
   Logging patterns: log::, tracing::, println!, debug!, info!, warn!, error!
-  Error handling patterns: Result<, ?, unwrap, expect
+  Error handling patterns: match, Err(, Result::, ?, .map_err, unwrap, expect
   Tracing patterns: span!, tracing::instrument, Span
   Metrics patterns: Metrics, gauge!, counter!, histogram!
Suggestion importance[1-10]: 6

Why: The suggestion correctly points out that the Rust error handling patterns in agent.yaml are inconsistent with the more accurate patterns in agent.toml and README.md, improving the accuracy of signal detection.

Low
Clarify agent output instructions

Clarify the agent's output instructions in agent.toml to explicitly state the
order of operations: first write the output file, then return the JSON to
standard output before terminating.

agents/undertaker/agent.toml [44-48]

 OUTPUT REQUIREMENTS:
-First, you MUST use the filesystem tool to write the final JSON output to a file named 'dead_code_analysis.json'.
-Second, you MUST return the same valid JSON matching the output_schema to standard output.
-Include summary statistics and detailed findings.
-After writing the file, stop all operations. Do not perform any additional analysis or processing.
+You MUST perform two final actions in this order:
+1. Use the filesystem tool to write the final JSON output to a file named 'dead_code_analysis.json'.
+2. Return the exact same valid JSON matching the output_schema to standard output.
+Ensure both steps are completed before finishing the task. Do not perform any additional analysis after these steps.
Suggestion importance[1-10]: 5

Why: The suggestion correctly identifies an ambiguity in the agent's instructions that could lead to incorrect termination, clarifying the sequence of final operations to ensure both file writing and standard output are completed.
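
The intended ordering can be sketched in Python as follows. Only the filename 'dead_code_analysis.json' comes from the instructions; the function name `emit_report` and the report keys in the usage note are hypothetical placeholders, and the real agent performs these steps through its filesystem tool, not through Python.

```python
import json
import sys
from pathlib import Path


def emit_report(report: dict, directory: Path = Path(".")) -> str:
    """Write the report to disk first, then print the identical JSON."""
    # Serialize once so the file and standard output cannot diverge.
    payload = json.dumps(report, indent=2, sort_keys=True)
    # Step 1: write the file.
    (directory / "dead_code_analysis.json").write_text(payload, encoding="utf-8")
    # Step 2: return the same JSON on standard output.
    sys.stdout.write(payload + "\n")
    return payload
```

Calling `emit_report({"summary": {"total_items": 0}, "dead_code_items": []})` writes the file in the current directory and then prints the same text. Serializing once is the design choice that guarantees "the exact same valid JSON" in both places.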

Low