Agent Audit (Argus) — Project Context for Claude Code

Overview

Agent Audit is a security scanner for AI agent code, MCP configurations, and DeFi contracts. It detects agent-specific vulnerabilities that traditional SAST tools miss, mapped to the OWASP Agentic Top 10 (2026) with 10/10 coverage.

Version: 0.18.2
Python: 3.9-3.12
License: MIT
Entry point: agent-audit = "agent_audit.cli.main:cli"
Metrics: 94.6% recall, 87.5% precision, F1=0.91, 1239+ tests

Repository Structure

agent-security-suite/
├── packages/audit/                    # Main Python package
│   ├── agent_audit/
│   │   ├── __init__.py
│   │   ├── version.py                # __version__ = "0.18.2"
│   │   ├── analysis/                 # 21 analyzer modules (confidence scoring, taint, context)
│   │   │   ├── semantic_analyzer.py  # 3-stage credential detection (regex → value → context)
│   │   │   ├── taint_tracker.py      # Data flow: source → sanitizer → sink
│   │   │   ├── tool_boundary_detector.py  # @tool entry point gate for AGENT-034
│   │   │   ├── ts_tool_boundary_detector.py
│   │   │   ├── confidence_matrix.py  # Tier computation and adjustment rules
│   │   │   ├── context_classifier.py # File type detection (test/example/infra)
│   │   │   ├── dangerous_operation_analyzer.py
│   │   │   ├── framework_detector.py # Pydantic/LangChain/CrewAI internal suppression
│   │   │   ├── placeholder_detector.py
│   │   │   ├── value_analyzer.py
│   │   │   ├── identifier_analyzer.py
│   │   │   ├── tool_description_analyzer.py
│   │   │   ├── memory_method_detector.py
│   │   │   ├── env_tracer.py
│   │   │   ├── entropy.py
│   │   │   └── rule_context_config.py
│   │   ├── analyzers/
│   │   │   └── memory_context.py
│   │   ├── scanners/                 # 14 scanner modules
│   │   │   ├── base.py              # BaseScanner ABC
│   │   │   ├── python_scanner.py    # 4017 LOC — AST-based Python analysis
│   │   │   ├── typescript_scanner.py # 952 LOC — TS/JS via tree-sitter
│   │   │   ├── go_scanner.py        # 296 LOC — Go patterns
│   │   │   ├── solidity_scanner.py  # 411 LOC — Solidity contracts
│   │   │   ├── secret_scanner.py    # 855 LOC — Regex + semantic secret detection
│   │   │   ├── config_scanner.py    # 390 LOC — YAML/JSON/ENV files
│   │   │   ├── mcp_config_scanner.py # 1369 LOC — MCP config auditing
│   │   │   ├── mcp_baseline.py      # 530 LOC — MCP baseline drift (rug pull)
│   │   │   ├── mcp_inspector.py     # 607 LOC — Live MCP server introspection
│   │   │   ├── privilege_scanner.py # 1138 LOC — OS privilege escalation
│   │   │   ├── skill_body_scanner.py # 402 LOC — OpenClaw skill instructions
│   │   │   ├── skill_meta_scanner.py # 352 LOC — OpenClaw skill metadata
│   │   │   └── __init__.py
│   │   ├── rules/
│   │   │   ├── engine.py            # RULE_CWE_MAPPING (109 rules), PATTERN_TYPE_TO_RULE_MAP (55+ patterns)
│   │   │   ├── loader.py            # YAML rule loader
│   │   │   └── builtin/             # YAML rule definitions (mirrored from monorepo)
│   │   │       ├── owasp_agentic_v2.yaml
│   │   │       ├── owasp_agentic.yaml
│   │   │       ├── asi_coverage_v030.yaml
│   │   │       ├── mcp_security_v030.yaml
│   │   │       └── langchain_security_v030.yaml
│   │   ├── models/
│   │   │   ├── finding.py           # Finding dataclass, confidence_to_tier(), TIER_THRESHOLDS
│   │   │   ├── risk.py              # Severity/Category enums, Location, RiskScore
│   │   │   ├── suppression.py       # Suppression/ignore model
│   │   │   └── tool.py              # ToolDefinition, PermissionType
│   │   ├── cli/
│   │   │   ├── main.py              # @click group
│   │   │   ├── commands/
│   │   │   │   ├── scan.py          # Main scan orchestration
│   │   │   │   ├── inspect.py       # Live MCP inspection
│   │   │   │   └── init.py          # Config init
│   │   │   └── formatters/
│   │   │       ├── terminal.py      # Rich terminal output + calculate_risk_score()
│   │   │       ├── json.py          # JSON output
│   │   │       └── sarif.py         # SARIF v2.1.0 output
│   │   ├── profiles/
│   │   │   └── defi/                # DeFi profile (AGENT-090 to AGENT-109)
│   │   │       ├── rules.py
│   │   │       ├── scanners/        # solidity, js_ts, defi_secret, agent_payment
│   │   │       ├── analysis/        # llm_analyzer, rpc_analyzer, defi_taint
│   │   │       └── constants/       # defi_protocols, web3_apis, rpc_endpoints
│   │   ├── config/
│   │   │   └── ignore.py            # .agent-audit.yaml loader
│   │   ├── utils/
│   │   │   ├── mcp_client.py
│   │   │   └── compat.py
│   │   └── parsers/
│   │       └── treesitter_parser.py
│   ├── tests/                        # 1239+ tests
│   │   ├── test_agent004_semantic.py
│   │   ├── test_expanded_rules.py
│   │   ├── test_privilege_rules.py
│   │   ├── test_defi_profile.py
│   │   ├── test_go_scanner.py
│   │   ├── test_analysis/            # 17 analyzer test modules
│   │   ├── test_cli/                 # 5 CLI test modules
│   │   ├── test_formatters/          # 4 formatter test modules
│   │   ├── test_config/              # 3 config test modules
│   │   ├── fixtures/                 # Test fixture code
│   │   ├── benchmark/                # Layer 2 benchmark (91 projects)
│   │   ├── ground_truth/             # AVB oracle baseline
│   │   └── e2e/                      # End-to-end tests
│   └── pyproject.toml                # Poetry config, version = "0.18.2"
├── rules/builtin/                    # Monorepo YAML rules (sync to packages/audit before publish)
│   ├── owasp_agentic_v2.yaml
│   ├── owasp_agentic.yaml
│   ├── asi_coverage_v030.yaml
│   ├── mcp_security_v030.yaml
│   └── langchain_security_v030.yaml
├── docs/
│   ├── RULES.md                      # Rule documentation
│   └── reports/                      # Published scan reports
├── CHANGELOG.md
├── README.md / README_CN.md
└── CONTRIBUTING.md

Rule System

Rule ID Ranges

Range	Category	Count
AGENT-001 to AGENT-005	Core injection/privilege/SSRF/credentials/supply chain	5
AGENT-010 to AGENT-011	Prompt injection / goal hijack	2
AGENT-013 to AGENT-020	Identity, supply chain, RCE, memory, inter-agent	8
AGENT-021 to AGENT-025	Cascading failures, trust exploitation	5
AGENT-026 to AGENT-028	LangChain-specific (SSRF, XSS, resource)	3
AGENT-029 to AGENT-033	MCP configuration security	5
AGENT-034 to AGENT-042	Tool misuse, expanded detection	9
AGENT-043 to AGENT-047	OS privilege escalation	5
AGENT-049 to AGENT-053	Supply chain, logging, rogue agents	5
AGENT-054 to AGENT-057	MCP enhanced (drift, shadowing, poisoning)	4
AGENT-058 to AGENT-064	OpenClaw skill security	7
AGENT-083 to AGENT-085	Solidity/Go specific	3
AGENT-090 to AGENT-109	DeFi profile (28 rules)	20
AGENT-110 to AGENT-119	Agent architecture security (v0.19.0)	10

Adding a New Rule (Checklist)

YAML definition: Create/update rules/builtin/<category>_v0XX.yaml
- Required fields: id, title, description, severity, category, owasp_agentic_id, cwe_id, detection (type + patterns), remediation (description + code_example)
Mirror YAML: Copy to packages/audit/agent_audit/rules/builtin/
engine.py: Add entry to RULE_CWE_MAPPING dict and PATTERN_TYPE_TO_RULE_MAP dict
Scanner: Add detection logic to appropriate scanner (python_scanner.py, typescript_scanner.py, config_scanner.py, etc.) — or create new scanner inheriting from BaseScanner
Tests: Create fixture files in tests/fixtures/<category>/ and test functions in tests/test_<category>.py
Docs: Update docs/RULES.md, CHANGELOG.md, README.md/README_CN.md

Confidence Tiers

Tier	Threshold	Meaning
BLOCK	>= 0.92	High confidence — fail CI
WARN	>= 0.60	Medium — warn user
INFO	>= 0.30	Low — informational
SUPPRESSED	< 0.30	Very low — hidden by default

Risk Score Formula

raw = sum(confidence * severity_weight * context_weight) for BLOCK+WARN findings
base_score = 1.8 * ln(1 + raw)
block_bonus = min(2.0, block_count * 0.3)
score = min(9.8, base_score + block_bonus)

Severity weights: critical=3.0, high=1.5, medium=0.5, low=0.2, info=0.1

Scanners

Scanner	LOC	Method	Detects
python_scanner.py	4017	AST + taint	AGENT-001/010/017/018/026/034/041 + more
mcp_config_scanner.py	1369	Config parse	AGENT-029-033, AGENT-054-057
privilege_scanner.py	1138	Regex + config	AGENT-043-047
typescript_scanner.py	952	Tree-sitter	AGENT-010/026/034/041/049
secret_scanner.py	855	Regex + semantic	AGENT-004
mcp_inspector.py	607	Live introspection	MCP server audit
mcp_baseline.py	530	Drift detection	AGENT-054/055
solidity_scanner.py	411	Regex	AGENT-083/084
skill_body_scanner.py	402	Regex	AGENT-058/059/061/062
config_scanner.py	390	Config parse	YAML/JSON/ENV patterns
skill_meta_scanner.py	352	YAML parse	AGENT-063/064
go_scanner.py	296	Regex	AGENT-034/041/085

Build & Test Commands

cd packages/audit
poetry install                          # Install dependencies
poetry run pytest tests/ -x -q          # Run tests (stop on first failure)
poetry run pytest tests/ -v --tb=short  # Verbose with short traceback
poetry run agent-audit scan <path>      # Run scanner
poetry run python ../../tests/benchmark/run_benchmark.py  # Layer 2 benchmark

Critical Implementation Notes

THREE pattern types map to AGENT-034: tool_no_input_validation, eval_exec_expanded, subprocess_expanded — all must be handled when modifying AGENT-034 behavior
Tool boundary gate: is_tool_entry_point() in tool_boundary_detector.py — AGENT-034 only fires within @tool functions
YAML rule sync: Rules in rules/builtin/ must be copied to packages/audit/agent_audit/rules/builtin/ before publishing to PyPI
Version tracked in 3 files: pyproject.toml, version.py, __init__.py
Framework path suppression must apply to ALL code paths producing a rule, not just one
AVB oracle uses FILE + LINE matching (5-line tolerance), no rule_id check

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agent Audit (Argus) — Project Context for Claude Code

Overview

Repository Structure

Rule System

Rule ID Ranges

Adding a New Rule (Checklist)

Confidence Tiers

Risk Score Formula

Scanners

Build & Test Commands

Critical Implementation Notes

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

Agent Audit (Argus) — Project Context for Claude Code

Overview

Repository Structure

Rule System

Rule ID Ranges

Adding a New Rule (Checklist)

Confidence Tiers

Risk Score Formula

Scanners

Build & Test Commands

Critical Implementation Notes