Commit 4d23ebf

Zie619 and claude committed
Expand cloud scanner to 52 Terraform + 25 CloudFormation resource types
- Add 38 new Terraform resources: Bedrock (guardrails, flows, prompts), SageMaker (pipelines, notebooks, domains), Comprehend, Kendra, Lex, Rekognition, Azure OpenAI/AI Foundry/ML, Google Vertex AI (reasoning engine, datasets, feature stores), Dialogflow CX, Discovery Engine
- Add 20 new CloudFormation resources matching Terraform coverage
- Handle workflow ComponentType → orchestration UsageType
- Extract kind, display_name, description from Terraform metadata
- Add CloudFormation fallback names: AgentName, FlowName, GuardrailName, PipelineName
- Add Scan Levels documentation section to README
- Update demo data with Azure OpenAI, Vertex AI, and Bedrock guardrail
- Add CloudFormation test fixture and 11 new test methods (135 total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9f90fc2 commit 4d23ebf

File tree

7 files changed: +385 −16 lines

README.md

Lines changed: 32 additions & 4 deletions

@@ -9,15 +9,16 @@
   <a href="#demo">Demo</a> &nbsp;|&nbsp;
   <a href="#output-formats">Output Formats</a> &nbsp;|&nbsp;
   <a href="#n8n-workflow-scanning-first-of-its-kind">n8n Scanning</a> &nbsp;|&nbsp;
-  <a href="#risk-scoring">Risk Scoring</a>
+  <a href="#risk-scoring">Risk Scoring</a> &nbsp;|&nbsp;
+  <a href="#scan-levels">Scan Levels</a>
 </p>

 <!-- badges -->
 <p>
   <img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License" />
   <img src="https://img.shields.io/badge/python-3.10%2B-blue.svg" alt="Python" />
   <img src="https://img.shields.io/badge/CycloneDX-1.6-green.svg" alt="CycloneDX" />
-  <img src="https://img.shields.io/badge/tests-124%20passing-brightgreen.svg" alt="Tests" />
+  <img src="https://img.shields.io/badge/tests-135%20passing-brightgreen.svg" alt="Tests" />
   <img src="https://img.shields.io/badge/PRs-welcome-orange.svg" alt="PRs Welcome" />
 </p>
 </div>
@@ -68,7 +69,7 @@ ai-bom scan . --format cyclonedx --output ai-bom.json
 | Model References | gpt-4o, claude-3-5-sonnet, gemini-1.5-pro, llama-3 | Code |
 | API Keys | OpenAI (sk-\*), Anthropic (sk-ant-\*), HuggingFace (hf\_\*) | Code, Network |
 | AI Containers | Ollama, vLLM, HuggingFace, NVIDIA, ChromaDB | Docker |
-| Cloud AI | AWS Bedrock, SageMaker, Vertex AI, Azure Cognitive | Cloud |
+| Cloud AI | AWS Bedrock, SageMaker, Comprehend, Kendra, Lex \| Azure OpenAI, AI Foundry, ML \| Google Vertex AI, Dialogflow CX | Cloud |
 | AI Endpoints | api.openai.com, api.anthropic.com, localhost:11434 | Network |
 | n8n AI Nodes | AI Agents, LLM Chat, MCP Client, Tools, Embeddings | n8n |
 | MCP Servers | Model Context Protocol connections | Code, n8n |
@@ -199,6 +200,33 @@ Every component receives a risk score (0–100):
 | Deprecated model | +10 | Using deprecated AI model |
 | Unpinned model | +5 | Model version not pinned |

+## Scan Levels
+
+ai-bom's detection depth depends on the permissions available at scan time. Each level progressively reveals more shadow AI:
+
+| Level | Access Required | What It Finds | Scanner |
+|-------|----------------|---------------|---------|
+| **Level 1 — File System** | Read-only file access | Source code imports, dependency files, config files, IaC definitions, n8n workflow JSON | Code, Cloud, n8n |
+| **Level 2 — Docker** | + Docker socket access | Running AI containers, GPU allocations, AI model images | Docker |
+| **Level 3 — Network** | + Network/env file access | API endpoints, hardcoded API keys, .env configurations | Network |
+| **Level 4 — Cloud IAM** | + Cloud provider credentials | Managed AI services (Bedrock, SageMaker, Vertex AI, Azure OpenAI) provisioned at infrastructure level | Cloud |
+
+### What each level requires
+
+**Level 1 (default)** — Works out of the box. Just point ai-bom at a directory or Git URL:
+```bash
+ai-bom scan .
+ai-bom scan https://github.com/org/repo.git
+```
+
+**Level 2** — Requires access to the Docker socket or compose files in the scan path. No additional configuration is needed if Dockerfiles/compose files are in the repo.
+
+**Level 3** — Scans `.env`, `.env.local`, `.env.production`, and config files (`.yaml`, `.json`, `.toml`, `.ini`). Detects both endpoint URLs and hardcoded API keys. For maximum coverage, ensure environment files are accessible (they're often gitignored).
+
+**Level 4** — Scans Terraform (`.tf`) and CloudFormation (`.yaml`, `.json`) files for cloud-provisioned AI services. Covers 60+ AWS, Azure, and GCP resource types. Live cloud inventory (not yet available) would require IAM read permissions.
+
+> **Tip:** For CI/CD pipelines, Levels 1–3 are automatic. Level 4 requires IaC files in the repo (Terraform/CloudFormation). A future release will add live cloud API scanning with IAM credentials.
+
 ## Comparison

 How does ai-bom compare to existing supply chain tools?
@@ -258,7 +286,7 @@ git clone https://github.com/trusera/ai-bom.git
 cd ai-bom
 pip install -e ".[dev]"

-# Run tests (124 passing)
+# Run tests (135 passing)
 pytest tests/ -v

 # Run demo
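The Scan Levels table in the README hunk above maps each access level to a set of scanners. As a minimal sketch of that cumulative mapping, assuming hypothetical names (`SCANNERS_BY_LEVEL`, `select_scanners`) that are not part of the actual ai-bom API:

```python
# Illustrative level -> scanner mapping; names are assumptions, not ai-bom's API.
SCANNERS_BY_LEVEL = {
    1: ["code", "cloud", "n8n"],  # read-only file access
    2: ["docker"],                # + Docker socket access
    3: ["network"],               # + network/env file access
    4: ["cloud-iam"],             # + cloud provider credentials (future live scan)
}

def select_scanners(max_level: int) -> list[str]:
    """Return every scanner available at or below the given access level."""
    return [
        scanner
        for level, scanners in SCANNERS_BY_LEVEL.items()
        if level <= max_level
        for scanner in scanners
    ]

print(select_scanners(2))  # ['code', 'cloud', 'n8n', 'docker']
```

Each level strictly adds capability, so a Level-2 scan still runs everything Level 1 would.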

examples/demo-project/infra/main.tf

Lines changed: 45 additions & 0 deletions

@@ -32,3 +32,48 @@ resource "aws_sagemaker_model" "llm_model" {
     model_data_url = "s3://my-bucket/models/fine-tuned-model.tar.gz"
   }
 }
+
+# Azure OpenAI Deployment
+resource "azurerm_cognitive_deployment" "gpt4o" {
+  name                 = "gpt-4o-global"
+  cognitive_account_id = azurerm_cognitive_account.openai.id
+  model_name           = "gpt-4o"
+
+  model {
+    format  = "OpenAI"
+    name    = "gpt-4o"
+    version = "2024-05-13"
+  }
+
+  sku {
+    name     = "GlobalStandard"
+    capacity = 50
+  }
+}
+
+# Google Vertex AI Reasoning Engine
+resource "google_vertex_ai_reasoning_engine" "support_agent" {
+  display_name = "ai-support-agent"
+  description  = "Customer support reasoning engine powered by Gemini"
+  project      = "my-gcp-project"
+  location     = "us-central1"
+}
+
+# AWS Bedrock Guardrail
+resource "aws_bedrock_guardrail" "content_safety" {
+  name        = "production-content-filter"
+  description = "Block harmful and sensitive content in AI responses"
+
+  content_policy_config {
+    filters_config {
+      type            = "HATE"
+      input_strength  = "HIGH"
+      output_strength = "HIGH"
+    }
+    filters_config {
+      type            = "VIOLENCE"
+      input_strength  = "HIGH"
+      output_strength = "HIGH"
+    }
+  }
+}
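The demo resources above are exactly what a Level-1 IaC pass should detect. As a standalone sketch of regex-based Terraform resource detection, with an illustrative subset of the scanner's resource mapping (the helper name and set are assumptions, not the real code):

```python
import re

# Subset of the scanner's Terraform AI resource types, for illustration only.
AI_RESOURCE_TYPES = {
    "azurerm_cognitive_deployment",
    "google_vertex_ai_reasoning_engine",
    "aws_bedrock_guardrail",
}

# Matches Terraform resource headers: resource "<type>" "<name>" {
RESOURCE_HEADER = re.compile(r'resource\s+"([^"]+)"\s+"([^"]+)"')

def find_ai_resources(tf_text: str) -> list[tuple[str, str]]:
    """Return (resource_type, resource_name) pairs for known AI resource types."""
    return [m for m in RESOURCE_HEADER.findall(tf_text) if m[0] in AI_RESOURCE_TYPES]

sample = '''
resource "azurerm_cognitive_deployment" "gpt4o" {
  name = "gpt-4o-global"
}
resource "aws_s3_bucket" "logs" {}
'''
print(find_ai_resources(sample))  # [('azurerm_cognitive_deployment', 'gpt4o')]
```

Non-AI resources (like the S3 bucket) simply fall through the membership check.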

src/ai_bom/demo_data/infra/main.tf

Lines changed: 45 additions & 0 deletions

@@ -32,3 +32,48 @@ resource "aws_sagemaker_model" "llm_model" {
     model_data_url = "s3://my-bucket/models/fine-tuned-model.tar.gz"
   }
 }
+
+# Azure OpenAI Deployment
+resource "azurerm_cognitive_deployment" "gpt4o" {
+  name                 = "gpt-4o-global"
+  cognitive_account_id = azurerm_cognitive_account.openai.id
+  model_name           = "gpt-4o"
+
+  model {
+    format  = "OpenAI"
+    name    = "gpt-4o"
+    version = "2024-05-13"
+  }
+
+  sku {
+    name     = "GlobalStandard"
+    capacity = 50
+  }
+}
+
+# Google Vertex AI Reasoning Engine
+resource "google_vertex_ai_reasoning_engine" "support_agent" {
+  display_name = "ai-support-agent"
+  description  = "Customer support reasoning engine powered by Gemini"
+  project      = "my-gcp-project"
+  location     = "us-central1"
+}
+
+# AWS Bedrock Guardrail
+resource "aws_bedrock_guardrail" "content_safety" {
+  name        = "production-content-filter"
+  description = "Block harmful and sensitive content in AI responses"
+
+  content_policy_config {
+    filters_config {
+      type            = "HATE"
+      input_strength  = "HIGH"
+      output_strength = "HIGH"
+    }
+    filters_config {
+      type            = "VIOLENCE"
+      input_strength  = "HIGH"
+      output_strength = "HIGH"
+    }
+  }
+}

src/ai_bom/scanners/cloud_scanner.py

Lines changed: 105 additions & 12 deletions

@@ -28,38 +28,108 @@ class CloudScanner(BaseScanner):

     # Terraform resource type to (provider, component_type) mapping
     TERRAFORM_AI_RESOURCES = {
+        # --- AWS Bedrock ---
         "aws_bedrockagent_agent": ("AWS Bedrock", ComponentType.agent_framework),
         "aws_bedrockagent_knowledge_base": ("AWS Bedrock", ComponentType.tool),
+        "aws_bedrock_custom_model": ("AWS Bedrock", ComponentType.model),
+        "aws_bedrock_provisioned_model_throughput": ("AWS Bedrock", ComponentType.endpoint),
+        "aws_bedrock_guardrail": ("AWS Bedrock", ComponentType.tool),
+        "aws_bedrock_model_invocation_logging_configuration": ("AWS Bedrock", ComponentType.tool),
+        "aws_bedrockagent_agent_action_group": ("AWS Bedrock", ComponentType.tool),
+        "aws_bedrockagent_agent_alias": ("AWS Bedrock", ComponentType.agent_framework),
+        "aws_bedrockagent_data_source": ("AWS Bedrock", ComponentType.tool),
+        "aws_bedrockagent_flow": ("AWS Bedrock", ComponentType.workflow),
+        "aws_bedrockagent_prompt": ("AWS Bedrock", ComponentType.tool),
+        # --- AWS SageMaker ---
         "aws_sagemaker_endpoint": ("AWS SageMaker", ComponentType.endpoint),
         "aws_sagemaker_model": ("AWS SageMaker", ComponentType.model),
-        "aws_sagemaker_endpoint_configuration": (
-            "AWS SageMaker",
-            ComponentType.endpoint,
-        ),
+        "aws_sagemaker_endpoint_configuration": ("AWS SageMaker", ComponentType.endpoint),
+        "aws_sagemaker_notebook_instance": ("AWS SageMaker", ComponentType.tool),
+        "aws_sagemaker_domain": ("AWS SageMaker", ComponentType.container),
+        "aws_sagemaker_pipeline": ("AWS SageMaker", ComponentType.workflow),
+        "aws_sagemaker_feature_group": ("AWS SageMaker", ComponentType.tool),
+        "aws_sagemaker_space": ("AWS SageMaker", ComponentType.container),
+        "aws_sagemaker_app": ("AWS SageMaker", ComponentType.tool),
+        "aws_sagemaker_model_package_group": ("AWS SageMaker", ComponentType.model),
+        # --- AWS Comprehend ---
+        "aws_comprehend_document_classifier": ("AWS Comprehend", ComponentType.model),
+        "aws_comprehend_entity_recognizer": ("AWS Comprehend", ComponentType.model),
+        # --- AWS Kendra ---
+        "aws_kendra_index": ("AWS Kendra", ComponentType.tool),
+        # --- AWS Lex ---
+        "aws_lexv2models_bot": ("AWS Lex", ComponentType.agent_framework),
+        # --- AWS Rekognition ---
+        "aws_rekognition_project": ("AWS Rekognition", ComponentType.model),
+        # --- Google Vertex AI ---
         "google_vertex_ai_endpoint": ("Google Vertex AI", ComponentType.endpoint),
         "google_vertex_ai_featurestore": ("Google Vertex AI", ComponentType.tool),
         "google_vertex_ai_index": ("Google Vertex AI", ComponentType.tool),
         "google_vertex_ai_tensorboard": ("Google Vertex AI", ComponentType.tool),
+        "google_vertex_ai_dataset": ("Google Vertex AI", ComponentType.tool),
+        "google_vertex_ai_metadata_store": ("Google Vertex AI", ComponentType.tool),
+        "google_vertex_ai_deployment_resource_pool": ("Google Vertex AI", ComponentType.container),
+        "google_vertex_ai_index_endpoint": ("Google Vertex AI", ComponentType.endpoint),
+        "google_vertex_ai_feature_online_store": ("Google Vertex AI", ComponentType.tool),
+        "google_vertex_ai_reasoning_engine": ("Google Vertex AI", ComponentType.agent_framework),
+        "google_notebooks_instance": ("Google Vertex AI", ComponentType.tool),
+        "google_workbench_instance": ("Google Vertex AI", ComponentType.tool),
+        # --- Google ML Engine ---
         "google_ml_engine_model": ("Google ML Engine", ComponentType.model),
+        # --- Google Dialogflow CX ---
+        "google_dialogflow_cx_agent": ("Google Dialogflow CX", ComponentType.agent_framework),
+        # --- Google Discovery Engine ---
+        "google_discovery_engine_search_engine": (
+            "Google Discovery Engine",
+            ComponentType.endpoint,
+        ),
+        # --- Azure AI ---
         "azurerm_cognitive_account": ("Azure AI", ComponentType.llm_provider),
+        "azurerm_cognitive_deployment": ("Azure OpenAI", ComponentType.endpoint),
+        "azurerm_ai_services": ("Azure AI", ComponentType.llm_provider),
+        "azurerm_ai_foundry": ("Azure AI Foundry", ComponentType.tool),
+        "azurerm_ai_foundry_project": ("Azure AI Foundry", ComponentType.tool),
+        # --- Azure ML ---
         "azurerm_machine_learning_workspace": ("Azure ML", ComponentType.tool),
-        "azurerm_machine_learning_compute_cluster": (
-            "Azure ML",
-            ComponentType.container,
-        ),
-        "azurerm_machine_learning_compute_instance": (
-            "Azure ML",
-            ComponentType.container,
-        ),
+        "azurerm_machine_learning_compute_cluster": ("Azure ML", ComponentType.container),
+        "azurerm_machine_learning_compute_instance": ("Azure ML", ComponentType.container),
+        "azurerm_machine_learning_inference_cluster": ("Azure ML", ComponentType.endpoint),
+        "azurerm_machine_learning_synapse_spark": ("Azure ML", ComponentType.container),
+        "azurerm_machine_learning_datastore_blobstorage": ("Azure ML", ComponentType.tool),
     }

     # CloudFormation resource types to (provider, component_type) mapping
     CLOUDFORMATION_AI_RESOURCES = {
+        # --- Bedrock ---
         "AWS::Bedrock::Agent": ("AWS Bedrock", ComponentType.agent_framework),
         "AWS::Bedrock::KnowledgeBase": ("AWS Bedrock", ComponentType.tool),
+        "AWS::Bedrock::AgentAlias": ("AWS Bedrock", ComponentType.agent_framework),
+        "AWS::Bedrock::DataSource": ("AWS Bedrock", ComponentType.tool),
+        "AWS::Bedrock::Flow": ("AWS Bedrock", ComponentType.workflow),
+        "AWS::Bedrock::FlowAlias": ("AWS Bedrock", ComponentType.workflow),
+        "AWS::Bedrock::Guardrail": ("AWS Bedrock", ComponentType.tool),
+        "AWS::Bedrock::Prompt": ("AWS Bedrock", ComponentType.tool),
+        "AWS::Bedrock::ApplicationInferenceProfile": ("AWS Bedrock", ComponentType.endpoint),
+        # --- SageMaker ---
         "AWS::SageMaker::Endpoint": ("AWS SageMaker", ComponentType.endpoint),
         "AWS::SageMaker::Model": ("AWS SageMaker", ComponentType.model),
         "AWS::SageMaker::EndpointConfig": ("AWS SageMaker", ComponentType.endpoint),
+        "AWS::SageMaker::NotebookInstance": ("AWS SageMaker", ComponentType.tool),
+        "AWS::SageMaker::Domain": ("AWS SageMaker", ComponentType.container),
+        "AWS::SageMaker::Pipeline": ("AWS SageMaker", ComponentType.workflow),
+        "AWS::SageMaker::FeatureGroup": ("AWS SageMaker", ComponentType.tool),
+        "AWS::SageMaker::ModelPackage": ("AWS SageMaker", ComponentType.model),
+        "AWS::SageMaker::ModelPackageGroup": ("AWS SageMaker", ComponentType.model),
+        "AWS::SageMaker::InferenceComponent": ("AWS SageMaker", ComponentType.endpoint),
+        "AWS::SageMaker::Space": ("AWS SageMaker", ComponentType.container),
+        # --- Comprehend ---
+        "AWS::Comprehend::DocumentClassifier": ("AWS Comprehend", ComponentType.model),
+        "AWS::Comprehend::Flywheel": ("AWS Comprehend", ComponentType.workflow),
+        # --- Kendra ---
+        "AWS::Kendra::Index": ("AWS Kendra", ComponentType.tool),
+        # --- Lex ---
+        "AWS::Lex::Bot": ("AWS Lex", ComponentType.agent_framework),
+        # --- Rekognition ---
+        "AWS::Rekognition::Project": ("AWS Rekognition", ComponentType.model),
     }

     # Patterns for GPU instance types
@@ -292,6 +362,21 @@ def _extract_terraform_metadata(
         if endpoint_name_match:
             metadata["endpoint_name"] = endpoint_name_match.group(1)

+        # kind = "..." (common in GCP resources)
+        kind_match = re.search(r'kind\s*=\s*"([^"]+)"', block_text)
+        if kind_match:
+            metadata["kind"] = kind_match.group(1)
+
+        # display_name = "..." (common in Azure/GCP resources)
+        display_name_match = re.search(r'display_name\s*=\s*"([^"]+)"', block_text)
+        if display_name_match:
+            metadata["display_name"] = display_name_match.group(1)
+
+        # description = "..." (common across providers)
+        description_match = re.search(r'description\s*=\s*"([^"]+)"', block_text)
+        if description_match:
+            metadata["description"] = description_match.group(1)
+
         return metadata

     def _scan_cloudformation(self, file_path: Path) -> list[AIComponent]:
@@ -345,6 +430,10 @@ def _scan_cloudformation(self, file_path: Path) -> list[AIComponent]:
                 properties.get("ModelId", "")
                 or properties.get("ModelName", "")
                 or properties.get("FoundationModel", "")
+                or properties.get("AgentName", "")
+                or properties.get("FlowName", "")
+                or properties.get("GuardrailName", "")
+                or properties.get("PipelineName", "")
             )

             # Create metadata
@@ -495,6 +584,10 @@ def _infer_usage_type(
             # Default to completion for LLM endpoints
             return UsageType.completion

+        # Workflows are used for orchestration
+        if component_type == ComponentType.workflow:
+            return UsageType.orchestration
+
        # Tools are used for tool_use
        if component_type == ComponentType.tool:
            return UsageType.tool_use
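The two behavior changes in this file, metadata extraction and the workflow → orchestration branch, can be exercised in isolation. A sketch using the same regexes as the diff above, but with plain strings standing in for the real ComponentType/UsageType enums (the function names here are illustrative, not the scanner's actual methods):

```python
import re

def extract_terraform_metadata(block_text: str) -> dict:
    """Pull kind / display_name / description attributes out of an HCL block."""
    metadata = {}
    for key in ("kind", "display_name", "description"):
        # Same pattern shape as the diff: key = "value"
        match = re.search(rf'{key}\s*=\s*"([^"]+)"', block_text)
        if match:
            metadata[key] = match.group(1)
    return metadata

def infer_usage_type(component_type: str) -> str:
    """Simplified inference order: workflow -> orchestration is the new branch."""
    if component_type == "workflow":
        return "orchestration"
    if component_type == "tool":
        return "tool_use"
    return "completion"

block = '''
  display_name = "ai-support-agent"
  description  = "Customer support reasoning engine powered by Gemini"
'''
print(extract_terraform_metadata(block))
print(infer_usage_type("workflow"))  # orchestration
```

On the reasoning-engine block from the demo Terraform, this yields both `display_name` and `description`, which is what lets the scanner report a human-readable component name.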
Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
+AWSTemplateFormatVersion: "2010-09-09"
+Description: Sample CloudFormation template with AI resources for testing
+
+Resources:
+  OrderProcessingFlow:
+    Type: AWS::Bedrock::Flow
+    Properties:
+      Name: order-processing-flow
+      Description: AI flow for processing customer orders
+      ExecutionRoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/BedrockFlowRole"
+
+  ContentGuardrail:
+    Type: AWS::Bedrock::Guardrail
+    Properties:
+      Name: content-safety
+      Description: Content filtering guardrail
+      BlockedInputMessaging: "Input blocked by guardrail"
+      BlockedOutputsMessaging: "Output blocked by guardrail"
+
+  TrainingPipeline:
+    Type: AWS::SageMaker::Pipeline
+    Properties:
+      PipelineName: model-training-pipeline
+      PipelineDescription: Automated model training pipeline
+      RoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/SageMakerRole"
+
+  SearchIndex:
+    Type: AWS::Kendra::Index
+    Properties:
+      Name: knowledge-base-index
+      Description: Enterprise knowledge base search index
+      Edition: ENTERPRISE_EDITION
+      RoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/KendraRole"