-
Notifications
You must be signed in to change notification settings - Fork 6.2k
Description
Feature Area
Documentation
Is your feature request related to a an existing bug? Please link it here.
NA — This is not a bug report. This is an EU AI Act compliance analysis of CrewAI's codebase using the AIR Blackbox open-source scanner. We've posted similar analyses to Haystack (#10810), LlamaIndex (#20979), and Semantic Kernel (#13657).
Describe the solution you'd like
EU AI Act Compliance Scan — CrewAI Framework Analysis
We ran CrewAI's codebase through AIR Blackbox (air-blackbox comply --scan), an open-source EU AI Act compliance scanner that checks Python code against Articles 9–15.
Result: 15 passing / 11 warnings / 11 failing out of 37 checks.
CrewAI scored the best of any framework we've scanned so far, particularly on Article 14 (Human Oversight) with 6/9 passing.
Standout Findings
| Pattern | Files | Notes |
|---|---|---|
| Input validation (Pydantic) | 384/1,015 (38%) | Highest Pydantic adoption we've seen |
| Fallback/recovery patterns | 107 | Strong recovery architecture |
| Rate limiting / budget controls | 70 | max_rpm, max_execution_time, max_tokens |
| Prompt injection defense | 65 | Dedicated security module |
| Retry/backoff logic | 60 | Robust error recovery |
| Output validation | 52 | output_pydantic enforces structured LLM responses |
| Tracing / observability | 72 | Event bus with typed events across every layer |
| Token expiry / execution bounding | 32 | max_iter, timeouts |
| Human oversight patterns | 31 | allow_delegation, crew delegation controls |
| Agent action audit trail | 15 | Fingerprint-based agent identity |
Notable Architecture Patterns
Security Module (Fingerprint): Dual identifiers per agent (human-readable ID + UUID fingerprint), metadata validation with depth limiting and size caps (10KB max) to prevent DoS. Most security-conscious agent identity system we've scanned.
Built-in Guardrails: hallucination_guardrail.py and llm_guardrail.py integrated into the event bus via llm_guardrail_events. Most frameworks require external tools for this.
A2A Protocol: Full agent communication protocol with AgentCard discovery, authentication (API key + HTTP digest), TLS verification, and extension registry.
Event Bus: Typed events across every layer — agent_events, crew_events, tool_usage_events, llm_events, llm_guardrail_events, flow_events, knowledge_events, mcp_events, a2a_events, system_events.
Flagged Items Worth Reviewing
- LLM call error handling — 77/113 files (68%). Missing in:
security/security_config.py,agents/agent_adapter.py,utilities/internal_instructor.py,rag/chromadb/client.py,rag/core/base_client.py - Unsafe input handling — 17 files flagged for potentially passing raw user input into prompts:
lite_agent.py,a2a/utils/content_type.py,a2a/utils/task.py - Application logging — 100/1,015 files (10%). The event bus handles tracing, but structured
loggingmodule usage is sparse
Questions for Maintainers
- SecurityConfig TODOs — The security module has TODO markers for authentication, scoping rules, and impersonation tokens. What's the roadmap?
- Hallucination guardrail — How does
hallucination_guardrail.pywork? Pattern-based, LLM-judge-based, or something else? - A2A authentication — Is the API key + HTTP digest auth used in production multi-agent deployments, or primarily for CrewAI Enterprise?
- allow_delegation semantics — When an agent delegates, does the delegated agent inherit the original agent's permissions, or operate under its own scope?
- Unsafe input in lite_agent — The 17 flagged files include core agent paths. Can you confirm these handle user input safely before it reaches the LLM?
How to Reproduce
pip install air-blackbox
git clone https://github.com/crewAIInc/crewAI.git
air-blackbox comply --scan ./crewAI -vScanner is Apache 2.0, runs entirely local, no data leaves your machine. Full PDF report available in the AIR Blackbox gateway repo.
Any corrections or context from maintainers will be used to improve the scanner — we've already updated patterns based on feedback from Haystack and LlamaIndex teams.
Describe alternatives you've considered
No response
Additional context
Full PDF report with per-article breakdowns is available at: https://github.com/airblackbox/gateway/blob/main/docs/AIR_Blackbox_CrewAI_Report_v1.pdf
Scanner: air-blackbox on PyPI (Apache 2.0, runs locally)
Previous framework scans and maintainer responses:
- Haystack — Julian Risch responded with detailed corrections (docstrings count higher than detected, Haystack uses its own pipeline error handling)
- LlamaIndex — Logan Markewich responded within 3 minutes confirming AgentMesh is experimental, callback_manager is deprecated, packs are being deleted
- Semantic Kernel — Issue posted, awaiting response
Each maintainer response directly improves scanner accuracy. We'd love the same from CrewAI's team.
Willingness to Contribute
I could provide more detailed specifications