MCP Tool Responses Do Not Match API Documentation #72

Description

@nikopuf

Summary

The Security Analysis and Compliance & Reporting MCP tools return minimal raw Wazuh data instead of the rich, structured responses described in the API documentation. Every tool tested returns a thin wrapper over the Wazuh API with no enrichment, scoring, correlation, or framework-specific mapping.

Environment

  • Wazuh Version: 4.x (2 agents: xxx, xxxx)
  • MCP Server: http://127.0.0.1:3000
  • Date Tested: 2026-03-31

Tools Tested

1. get_top_security_threats

Called with: { "limit": 5, "time_range": "24h" }

Expected (per docs): Ranked threats with threat_score, threat_name, severity, affected_systems, indicators (source IPs, target ports, attack patterns), impact_assessment (CIA triad), timeline (first detected, peak, status), mitigation_status, and ranking_criteria.

Actual response:

{
  "data": {
    "time_range": "24h",
    "threats": [
      {
        "rule_id": "100200",
        "description": "File modified in /root directory.",
        "level": 7,
        "count": 533,
        "groups": ["syscheck"]
      }
    ],
    "total_unique_rules": 10
  }
}

Missing: threat_score, threat_name, affected_systems, indicators, impact_assessment, timeline, mitigation_status, ranking_criteria, ranking_timestamp.
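The gap can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the documented per-threat field names are exactly those in the "Expected" list above; `DOCUMENTED_THREAT_FIELDS` and `missing_fields` are hypothetical names introduced for illustration and are not part of the MCP server.

```python
# Field names copied from the "Expected (per docs)" list in this issue.
# Whether they are per-threat or top-level keys is an assumption.
DOCUMENTED_THREAT_FIELDS = {
    "threat_score", "threat_name", "severity", "affected_systems",
    "indicators", "impact_assessment", "timeline", "mitigation_status",
    "ranking_criteria",
}


def missing_fields(item: dict, documented: set) -> list:
    """Return the documented field names that are absent from one response item."""
    return sorted(documented - set(item))


# One threat entry from the actual response above.
actual_threat = {
    "rule_id": "100200",
    "description": "File modified in /root directory.",
    "level": 7,
    "count": 533,
    "groups": ["syscheck"],
}

for name in missing_fields(actual_threat, DOCUMENTED_THREAT_FIELDS):
    print("missing:", name)
```

For the entry above, this flags every documented field, including `severity`, because the response only carries the raw rule `level`.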


2. perform_risk_assessment

Called with: {} (environment-wide)

Expected (per docs): overall_risk_score, risk_categories (vulnerability, threat exposure, configuration, compliance with individual scores), critical_findings, risk_trends, mitigation_priorities, and executive_summary.

Actual response:

{
  "data": {
    "total_agents": 2,
    "risk_factors": [],
    "risk_level": "low"
  }
}

Missing: overall_risk_score, risk_categories, critical_findings, risk_trends, mitigation_priorities, executive_summary, confidence.


3. run_compliance_check

Called with: { "framework": "PCI-DSS" } and { "framework": "NIST" }

Expected (per docs): overall_compliance (score, status, requirements met/total), requirement_categories with per-category scores, detailed_findings with per-requirement status/severity/remediation, risk_assessment, remediation_roadmap, and compliance_trends.

Actual response (identical for both PCI-DSS and NIST):

{
  "data": {
    "framework": "PCI-DSS",
    "agents_checked": 2,
    "results": [
      {
        "agent_id": "000",
        "agent_name": "xxxx",
        "sca": {
          "pass": 51,
          "fail": 45,
          "invalid": 87,
          "total_checks": 183,
          "score": 53,
          "policy_id": "cis_amazon_linux_2023",
          "name": "CIS Benchmark for Amazon Linux 2023 Benchmark v1.0.0."
        }
      },
      {
        "agent_id": "002",
        "agent_name": "xxxx",
        "sca": {
          "pass": 118,
          "fail": 119,
          "invalid": 42,
          "total_checks": 279,
          "score": 49,
          "policy_id": "cis_ubuntu24-04",
          "name": "CIS Ubuntu Linux 24.04 LTS Benchmark v1.0.0."
        }
      }
    ]
  }
}

Issues:

  • PCI-DSS and NIST return identical data — no framework-specific filtering or mapping
  • Returns raw SCA (CIS Benchmark) results, not compliance requirement assessments
  • No per-requirement breakdown, no detailed findings, no remediation roadmap
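One way to confirm the first point is to compare two responses after dropping the echoed framework name. This is a sketch only: `strip_framework` and `frameworks_differ` are hypothetical helpers, and the sample payloads are abbreviated from the response shown above.

```python
import copy


def strip_framework(response: dict) -> dict:
    """Copy the response's data section without the echoed framework name."""
    data = copy.deepcopy(response.get("data", {}))
    data.pop("framework", None)
    return data


def frameworks_differ(resp_a: dict, resp_b: dict) -> bool:
    """True if two run_compliance_check responses differ beyond the framework label."""
    return strip_framework(resp_a) != strip_framework(resp_b)


# Abbreviated from the actual response above; only the framework label changes.
pci = {"data": {"framework": "PCI-DSS", "agents_checked": 2,
                "results": [{"agent_id": "000", "sca": {"pass": 51, "fail": 45}}]}}
nist = {"data": {"framework": "NIST", "agents_checked": 2,
                 "results": [{"agent_id": "000", "sca": {"pass": 51, "fail": 45}}]}}

print("framework-specific data:", frameworks_differ(pci, nist))
```

A result of `False` means the framework argument changed nothing except the label.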

Additionally, ISO27001 and FISMA are listed in the docs but are rejected with an error. For ISO27001:

Invalid parameter 'framework': invalid value 'ISO27001'. Use one of: GDPR, HIPAA, NIST, PCI-DSS, SOX

4. generate_security_report

Expected (per docs): Full report with executive_summary, threat_landscape_analysis, vulnerability_management, compliance_status, security_metrics, risk_assessment, financial_analysis, recommendations. Seven report types documented: daily, weekly, monthly, quarterly, incident, compliance, executive.

4a. daily report

Called with: { "report_type": "daily", "include_recommendations": true }

Actual response:

{
  "data": {
    "report_type": "daily",
    "generated_at": "2026-03-31T05:36:33.624128+00:00",
    "sections": {
      "agents": {
        "total": 2,
        "active": 2,
        "disconnected": 0
      },
      "manager": {
        "title": "Wazuh API REST",
        "api_version": "4.14.1",
        "revision": "rc2",
        "hostname": "xxx"
      },
      "vulnerabilities": {
        "total_vulnerabilities": 0,
        "affected_agents": 0,
        "by_severity": {},
        "critical": 0,
        "high": 0,
        "medium": 0,
        "low": 0
      }
    }
  }
}

Missing: executive_summary, threat_landscape_analysis, security_metrics (MTTD, MTTR), incident_summary, recommendations, risk_assessment. The include_recommendations: true parameter has no effect — no recommendations are returned.

4b. incident report

Called with: { "report_type": "incident", "include_recommendations": true }

Actual response: Identical to daily — same agent count, manager info, and vulnerability summary. No incident-specific data (incident timeline, affected systems, root cause analysis, lessons learned, containment actions).

4c. compliance report type — NOT SUPPORTED

Called with: { "report_type": "compliance" }

Response:

Invalid parameter 'report_type': invalid value 'compliance'. Use one of: daily, incident, monthly, weekly

4d. executive report type — NOT SUPPORTED

Called with: { "report_type": "executive" }

Response:

Invalid parameter 'report_type': invalid value 'executive'. Use one of: daily, incident, monthly, weekly

4e. quarterly report type — NOT SUPPORTED (documented but not listed in allowed values)

Report type support summary:

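| Report Type | Documented | Supported | Returns Unique Data |
|-------------|------------|-----------|---------------------|
| daily       | Yes        | Yes       | No (generic agent/vuln summary) |
| weekly      | Yes        | Yes       | Not tested, likely same |
| monthly     | Yes        | Yes       | Not tested, likely same |
| quarterly   | Yes        | No        | N/A |
| incident    | Yes        | Yes       | No (identical to daily) |
| compliance  | Yes        | No        | N/A |
| executive   | Yes        | No        | N/A |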

Root Cause

The MCP tools are thin wrappers over the Wazuh Manager API:

  • get_top_security_threats → single terms aggregation on rule.id sorted by count
  • perform_risk_assessment → agent count + basic threshold check
  • run_compliance_check → SCA scan results from /sca/{agent_id} endpoint, same data regardless of framework
  • generate_security_report → agent summary + manager info + vulnerability counts, identical across all supported report types

No enrichment, correlation, scoring, or framework mapping is performed.
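For reference, the query behind get_top_security_threats is probably close to the following. This is a guess at its shape, not the server's actual code: the `@timestamp` field name, the `now-<range>` date-math filter, and the `top_rules` aggregation name are assumptions.

```python
import json


def top_rules_query(limit: int, time_range: str) -> dict:
    """Build a search body that counts alerts per rule.id within a time window."""
    return {
        "size": 0,  # return aggregation buckets only, no individual alerts
        "query": {"range": {"@timestamp": {"gte": f"now-{time_range}"}}},
        "aggs": {
            # A terms aggregation orders buckets by document count, descending, by default.
            "top_rules": {"terms": {"field": "rule.id", "size": limit}}
        },
    }


print(json.dumps(top_rules_query(5, "24h"), indent=2))
```

A body like this yields only a rule ID and a count per bucket, which matches the thin response shown in section 1.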

Gap Summary

| Capability | Documented | Implemented |
|------------|------------|-------------|
| Security Analysis | | |
| Threat scoring (0-100) | Yes | No |
| Source IP / indicator extraction | Yes | No |
| Impact assessment (CIA triad) | Yes | No |
| Attack timeline construction | Yes | No |
| Mitigation status tracking | Yes | No |
| Risk score calculation | Yes | No (returns only "low"/"medium"/"high") |
| Risk category breakdown | Yes | No |
| Critical findings with remediation | Yes | No |
| Compliance | | |
| Framework-specific requirement mapping | Yes | No (all frameworks return same SCA data) |
| Per-requirement compliance status | Yes | No |
| Remediation roadmap | Yes | No |
| Compliance trends / history | Yes | No |
| ISO27001 / FISMA support | Yes | No (returns error) |
| Reporting | | |
| Executive summaries | Yes | No |
| Financial impact analysis | Yes | No |
| compliance report type | Yes | No (returns "invalid value") |
| executive report type | Yes | No (returns "invalid value") |
| quarterly report type | Yes | No (not in allowed values) |
| Incident-specific report content | Yes | No (identical to daily report) |
| Report type differentiation | Yes | No (daily/incident return same data) |
| include_recommendations parameter | Yes | No effect (no recommendations returned) |
| Threat landscape analysis in reports | Yes | No |
| Security metrics (MTTD, MTTR) | Yes | No |
| Team performance metrics | Yes | No |

Suggested Implementation Path

Option A: Direct Elasticsearch Enrichment (No LLM)

Build enrichment in the MCP tool layer by running additional Elasticsearch queries:

  1. get_top_security_threats: After getting top rule IDs, run sub-queries to extract src.ip, agent.id, @timestamp ranges per rule. Calculate threat score from level * log(count). Build timeline from min/max timestamps.

  2. perform_risk_assessment: Combine SCA scores + alert severity distribution + vulnerability counts + failed auth rates into a weighted risk score. Break down by category.

  3. run_compliance_check: Map CIS benchmark check IDs to framework requirements (PCI-DSS requirement numbers, NIST functions). Return per-requirement pass/fail. This requires a static mapping table.
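The three steps above could be sketched roughly as follows. Everything here is illustrative: the function names, the `max_count` saturation point used to squeeze `level * log(count)` into the documented 0-100 range, the category weights, the requirement IDs, and the `"passed"`/`"failed"` result strings are all assumptions rather than existing behaviour.

```python
import math

MAX_LEVEL = 15  # Wazuh rule levels run from 0 to 15


def threat_score(level: int, count: int, max_count: int = 1_000_000) -> float:
    """Scale level * ln(count) onto 0-100; max_count is an assumed saturation point."""
    if level < 1 or count < 1:
        return 0.0
    raw = level * math.log(count)
    ceiling = MAX_LEVEL * math.log(max_count)
    return round(min(100.0, 100.0 * raw / ceiling), 1)


def weighted_risk(category_scores: dict, weights: dict) -> float:
    """Weighted mean of per-category risk scores (each 0-100)."""
    total_weight = sum(weights[name] for name in category_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * weights[name] for name, score in category_scores.items())
    return round(weighted / total_weight, 1)


def requirement_status(check_results: dict, mapping: dict) -> dict:
    """Map SCA check results onto framework requirements via a static table.

    mapping is {requirement_id: [check_id, ...]}. A requirement passes only if
    every mapped check passed (an assumed policy); any failure marks it failed.
    """
    status = {}
    for requirement, check_ids in mapping.items():
        results = [check_results.get(check_id) for check_id in check_ids]
        if any(result == "failed" for result in results):
            status[requirement] = "fail"
        elif results and all(result == "passed" for result in results):
            status[requirement] = "pass"
        else:
            status[requirement] = "unknown"
    return status


# Rule 100200 from section 1: level 7, seen 533 times.
print("threat score:", threat_score(7, 533))

# Hypothetical category scores and weights; SCA risk taken as 100 minus the SCA score.
scores = {"configuration": 100 - 53, "vulnerability": 0.0}
print("risk score:", weighted_risk(scores, {"configuration": 0.6, "vulnerability": 0.4}))

# Placeholder IDs only; a real table would map CIS check IDs to PCI-DSS or NIST items.
mapping = {"REQ-A": ["1001", "1002"], "REQ-B": ["1003"]}
checks = {"1001": "passed", "1002": "failed", "1003": "passed"}
print("requirements:", requirement_status(checks, mapping))
```

Without some normalisation, `level * log(count)` is unbounded, so the cap is one possible way to reconcile it with the documented 0-100 threat score; other scalings would work equally well.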

Option B: LLM Enrichment Layer

Pass raw data through an LLM to generate:

  • Executive summaries and recommendations
  • Threat scoring with reasoning
  • Impact assessments
  • Remediation roadmaps

Adds latency (~2-5s) and cost per call, but produces the richest output.

Option C: Hybrid

  • Factual data (IPs, counts, timelines, scores): Direct ES queries
  • Analysis (summaries, recommendations, impact narrative): LLM

Reproduction

# Get auth token
TOKEN=$(curl -s -X POST http://127.0.0.1:3000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"api_key":"YOUR_API_KEY"}' | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

# Test any tool
curl -s -X POST http://127.0.0.1:3000/mcp \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "MCP-Protocol-Version: 2024-11-05" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "get_top_security_threats",
      "arguments": { "limit": 5, "time_range": "24h" }
    }
  }'
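To replay every call listed in this issue rather than one at a time, the same request can be built in Python. This is a sketch that mirrors the curl command above using only the standard library; `build_request`, `call_tool`, and `TOOL_CALLS` are hypothetical names, and the response is returned as raw text because its exact framing is not shown here.

```python
import json
import urllib.request

MCP_URL = "http://127.0.0.1:3000/mcp"

# Tool names and arguments exactly as tested in this issue.
TOOL_CALLS = [
    ("get_top_security_threats", {"limit": 5, "time_range": "24h"}),
    ("perform_risk_assessment", {}),
    ("run_compliance_check", {"framework": "PCI-DSS"}),
    ("run_compliance_check", {"framework": "NIST"}),
    ("generate_security_report", {"report_type": "daily", "include_recommendations": True}),
    ("generate_security_report", {"report_type": "incident", "include_recommendations": True}),
]


def build_request(token: str, name: str, arguments: dict, request_id: int = 1):
    """Build the same JSON-RPC tools/call request as the curl example."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "MCP-Protocol-Version": "2024-11-05",
        },
        method="POST",
    )


def call_tool(token: str, name: str, arguments: dict) -> str:
    """Send one tool call and return the raw response body as text."""
    with urllib.request.urlopen(build_request(token, name, arguments), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

With a token obtained as shown above, `for name, args in TOOL_CALLS: print(call_tool(token, name, args))` should reproduce each response in this report.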

Thanks
@nikopuf
