Skip to content

[FEATURE] EU AI Act Compliance Analysis — CrewAI Scores Best of Any Framework Scanned #4845

@shotwellj

Description

@shotwellj

Feature Area

Documentation

Is your feature request related to a an existing bug? Please link it here.

NA — This is not a bug report. This is an EU AI Act compliance analysis of CrewAI's codebase using the AIR Blackbox open-source scanner. We've posted similar analyses to Haystack (#10810), LlamaIndex (#20979), and Semantic Kernel (#13657).

Describe the solution you'd like

EU AI Act Compliance Scan — CrewAI Framework Analysis

We ran CrewAI's codebase through AIR Blackbox (air-blackbox comply --scan), an open-source EU AI Act compliance scanner that checks Python code against Articles 9–15.

Result: 15 passing / 11 warnings / 11 failing out of 37 checks.

CrewAI scored the best of any framework we've scanned so far, particularly on Article 14 (Human Oversight) with 6/9 passing.

Standout Findings

Pattern Files Notes
Input validation (Pydantic) 384/1,015 (38%) Highest Pydantic adoption we've seen
Fallback/recovery patterns 107 Strong recovery architecture
Rate limiting / budget controls 70 max_rpm, max_execution_time, max_tokens
Prompt injection defense 65 Dedicated security module
Retry/backoff logic 60 Robust error recovery
Output validation 52 output_pydantic enforces structured LLM responses
Tracing / observability 72 Event bus with typed events across every layer
Token expiry / execution bounding 32 max_iter, timeouts
Human oversight patterns 31 allow_delegation, crew delegation controls
Agent action audit trail 15 Fingerprint-based agent identity

Notable Architecture Patterns

Security Module (Fingerprint): Dual identifiers per agent (human-readable ID + UUID fingerprint), metadata validation with depth limiting and size caps (10KB max) to prevent DoS. Most security-conscious agent identity system we've scanned.

Built-in Guardrails: hallucination_guardrail.py and llm_guardrail.py integrated into the event bus via llm_guardrail_events. Most frameworks require external tools for this.

A2A Protocol: Full agent communication protocol with AgentCard discovery, authentication (API key + HTTP digest), TLS verification, and extension registry.

Event Bus: Typed events across every layer — agent_events, crew_events, tool_usage_events, llm_events, llm_guardrail_events, flow_events, knowledge_events, mcp_events, a2a_events, system_events.

Flagged Items Worth Reviewing

  1. LLM call error handling — 77/113 files (68%). Missing in: security/security_config.py, agents/agent_adapter.py, utilities/internal_instructor.py, rag/chromadb/client.py, rag/core/base_client.py
  2. Unsafe input handling — 17 files flagged for potentially passing raw user input into prompts: lite_agent.py, a2a/utils/content_type.py, a2a/utils/task.py
  3. Application logging — 100/1,015 files (10%). The event bus handles tracing, but structured logging module usage is sparse

Questions for Maintainers

  1. SecurityConfig TODOs — The security module has TODO markers for authentication, scoping rules, and impersonation tokens. What's the roadmap?
  2. Hallucination guardrail — How does hallucination_guardrail.py work? Pattern-based, LLM-judge-based, or something else?
  3. A2A authentication — Is the API key + HTTP digest auth used in production multi-agent deployments, or primarily for CrewAI Enterprise?
  4. allow_delegation semantics — When an agent delegates, does the delegated agent inherit the original agent's permissions, or operate under its own scope?
  5. Unsafe input in lite_agent — The 17 flagged files include core agent paths. Can you confirm these handle user input safely before it reaches the LLM?

How to Reproduce

pip install air-blackbox
git clone https://github.com/crewAIInc/crewAI.git
air-blackbox comply --scan ./crewAI -v

Scanner is Apache 2.0, runs entirely local, no data leaves your machine. Full PDF report available in the AIR Blackbox gateway repo.

Any corrections or context from maintainers will be used to improve the scanner — we've already updated patterns based on feedback from Haystack and LlamaIndex teams.

Describe alternatives you've considered

No response

Additional context

Full PDF report with per-article breakdowns is available at: https://github.com/airblackbox/gateway/blob/main/docs/AIR_Blackbox_CrewAI_Report_v1.pdf

Scanner: air-blackbox on PyPI (Apache 2.0, runs locally)

Previous framework scans and maintainer responses:

  • Haystack — Julian Risch responded with detailed corrections (docstrings count higher than detected, Haystack uses its own pipeline error handling)
  • LlamaIndex — Logan Markewich responded within 3 minutes confirming AgentMesh is experimental, callback_manager is deprecated, packs are being deleted
  • Semantic Kernel — Issue posted, awaiting response

Each maintainer response directly improves scanner accuracy. We'd love the same from CrewAI's team.

Willingness to Contribute

I could provide more detailed specifications

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions