TMI Terraform Analyzer
Automated Terraform infrastructure analysis tool for threat modeling using LLM providers (Claude, GPT-4, Grok, Gemini).
The TMI Terraform Analysis Tool automates the analysis of Terraform infrastructure code associated with threat models in the TMI platform. It uses LLM providers (Claude, GPT-4, Grok, or Gemini) via LiteLLM to analyze infrastructure components, relationships, data flows, and security considerations, then generates comprehensive markdown reports stored as notes in TMI.
Repository: /Users/efitz/Projects/tmi-tf-wh
This tool can be used as a CLI for interactive analysis or deployed as a webhook-driven FastAPI service (e.g., on OKE) that processes analysis requests from TMI server events.
- Multi-Provider LLM Support -- Leverages Claude, GPT-4, Grok, or Gemini via LiteLLM for infrastructure analysis
- OAuth Authentication -- Google Sign-In (CLI mode) or Client Credentials (webhook/server mode)
- Smart Repository Discovery -- Automatically identifies GitHub repositories with Terraform code from threat models
- Environment Detection -- Detects multiple Terraform environments (root modules) within a repository and allows selection
- Sparse Cloning -- Efficiently clones only Terraform files (`.tf`, `.tfvars`) from repositories
- Phased AI Analysis -- Three-phase LLM pipeline: inventory extraction, infrastructure analysis, and threat identification/analysis
- Automatic Threat Extraction -- Extracts security vulnerabilities and creates structured threat objects using STRIDE framework, CVSS 4.0 scoring, and CWE classification
- Data Flow Diagrams -- Generates interactive DFD diagrams showing infrastructure components, flows, and trust boundaries
- Comprehensive Reports -- Creates separate inventory and analysis markdown reports with security observations
- TMI Integration:
  - Stores inventory and analysis results as separate notes in threat models
  - Generates and stores data flow diagrams for more precise modeling
  - Creates structured threat objects with STRIDE, CVSS, and CWE metadata
  - Attaches LLM metadata (model, token counts, cost) to all generated artifacts
- Webhook Server Mode -- Runs as a FastAPI service processing analysis requests from an OCI Queue, triggered by TMI webhook events
- Python 3.10 or higher
- UV package manager
- Git
- Access to a TMI server (https://api.tmi.dev)
- API key for at least one LLM provider:
- Anthropic API key (for Claude) - default
- OpenAI API key (for GPT-4)
- x.ai API key (for Grok)
- Google API key (for Gemini)
- OCI Generative AI (via OCI credentials)
- Optional: GitHub personal access token (for higher API rate limits)
Clone the repository:

```bash
cd ~/Projects
git clone <repository-url> tmi-tf-wh
cd tmi-tf-wh
```

Copy the example environment file:

```bash
cp .env.example .env
```

Edit `.env` and set your API keys:

```bash
ANTHROPIC_API_KEY=your_actual_anthropic_api_key_here
GITHUB_TOKEN=your_github_token_here  # Optional but recommended
```

Install dependencies:

```bash
uv sync
```

All configuration is managed through the `.env` file:
| Variable | Description | Default |
|---|---|---|
| `TMI_SERVER_URL` | TMI server URL | `https://api.tmi.dev` |
| `TMI_OAUTH_IDP` | OAuth identity provider (`google` or `tmi`) | `google` |
| `TMI_CLIENT_ID` | Client ID (required if `TMI_OAUTH_IDP=tmi`) | None |
| `TMI_CLIENT_SECRET` | Client secret (required if `TMI_OAUTH_IDP=tmi`) | None |
| `LLM_PROVIDER` | LLM provider: `anthropic`, `openai`, `xai`, `gemini`, or `oci` | `anthropic` |
| `LLM_MODEL` | Model override (optional) | Provider default |
| `LLM_API_KEY` | Generic API key, mapped to provider-specific env var | None |
| `ANTHROPIC_API_KEY` | Claude API key | Required if `LLM_PROVIDER=anthropic` |
| `OPENAI_API_KEY` | OpenAI API key | Required if `LLM_PROVIDER=openai` |
| `XAI_API_KEY` | x.ai API key | Required if `LLM_PROVIDER=xai` |
| `GEMINI_API_KEY` | Google Gemini API key | Required if `LLM_PROVIDER=gemini` |
| `OCI_CONFIG_PROFILE` | OCI config profile | `DEFAULT` |
| `OCI_COMPARTMENT_ID` | OCI compartment ID | Required if `LLM_PROVIDER=oci` |
| `GITHUB_TOKEN` | GitHub personal access token | Optional |
| `MAX_REPOS` | Maximum repositories to analyze | 3 |
| `CLONE_TIMEOUT` | Git clone timeout in seconds | 300 |
| `ANALYSIS_NOTE_NAME` | Base name for the generated note | Terraform Analysis Report |
| `DIAGRAM_NAME` | Base name for the generated diagram | Infrastructure Data Flow Diagram |
Default models per provider:
| Provider | Default Model |
|---|---|
| `anthropic` | `claude-opus-4-6` |
| `openai` | `gpt-5.4` |
| `xai` | `grok-4-1-fast-reasoning` |
| `gemini` | `gemini-3.1-pro-preview` |
| `oci` | `xai.grok-4` |
Note: The model name and timestamp are automatically appended to note and diagram names (e.g., "Terraform Inventory - prod (anthropic/claude-opus-4-6, 2026-04-04 12:00:00 UTC)").
When running as a webhook-driven service (e.g., deployed on OKE), additional configuration is available:
| Variable | Description | Default |
|---|---|---|
| `QUEUE_OCID` | OCI Queue OCID for receiving webhook messages | None |
| `VAULT_OCID` | OCI Vault OCID for secret retrieval | None |
| `WEBHOOK_SECRET` | HMAC shared secret for webhook validation | None |
| `WEBHOOK_SUBSCRIPTION_ID` | Subscription UUID filter | None |
| `MAX_CONCURRENT_JOBS` | Maximum parallel analysis jobs | 3 |
| `JOB_TIMEOUT` | Job timeout in seconds | 3600 |
| `MAX_MESSAGE_AGE_HOURS` | Maximum age for queue messages | 24 |
| `SERVER_PORT` | FastAPI server port | 8080 |
| `TMI_CLIENT_PATH` | Path to TMI Python client | None |
| `SECRET_PROVIDER` | Secret provider: `oci` or `none` | Inferred from `VAULT_OCID` |
LLM_PROVIDER: Selects which LLM backend to use for analysis. All providers are accessed through LiteLLM. The oci provider uses OCI Generative AI with instance principal or config-file authentication.
LLM_MODEL: Optional override for the model name. If unset, the default model for the selected provider is used. If the value does not contain a /, the provider prefix is prepended automatically.
LLM_API_KEY: A convenience variable. When set, the value is copied into the provider-specific environment variable (e.g., ANTHROPIC_API_KEY when LLM_PROVIDER=anthropic). Useful when the key is loaded from a vault at runtime.
TMI_OAUTH_IDP: Set to google for browser-based PKCE OAuth (CLI mode) or tmi for client credentials flow (server/webhook mode).
ANTHROPIC_API_KEY: Get your API key from Anthropic Console. Required for Claude AI analysis.
GITHUB_TOKEN: Optional but highly recommended. Without a token, you are limited to 60 GitHub API requests per hour. With a token, you get 5,000 requests per hour. Create a token at GitHub Settings > Developer settings > Personal access tokens.
MAX_REPOS: Limits the number of repositories analyzed to avoid excessive API costs and processing time. Start with 1-3 repositories.
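The `LLM_API_KEY` convenience mapping described above can be sketched as follows (a simplified illustration; the function name is hypothetical, the variable names come from the table):

```python
# Provider-specific env var each provider expects (from the table above)
PROVIDER_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

def map_generic_api_key(env: dict) -> dict:
    """Copy LLM_API_KEY into the provider-specific variable if set,
    without overwriting an explicitly configured key."""
    provider = env.get("LLM_PROVIDER", "anthropic")
    generic = env.get("LLM_API_KEY")
    target = PROVIDER_KEY_VARS.get(provider)
    if generic and target and not env.get(target):
        env[target] = generic
    return env
```

This keeps vault-loaded keys working regardless of which provider is selected, while an explicitly set provider key always wins.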
```bash
uv run tmi-tf config-info
```

Shows current configuration including API endpoints, limits, LLM model, timestamp, and environment settings.

```bash
uv run tmi-tf auth
```

Opens a browser window for Google OAuth authentication. The token is cached in `~/.tmi-tf/token.json`.

```bash
uv run tmi-tf list-repos <threat-model-id>
```

Lists all repositories associated with a threat model and identifies which ones are GitHub repositories.

```bash
uv run tmi-tf analyze <threat-model-id>
```

The main command: analyzes Terraform code and creates inventory notes, analysis notes, data flow diagrams, and threat objects in TMI.

```bash
uv run tmi-tf analyze <threat-model-id> [OPTIONS]
```

Options:
- `--max-repos INTEGER` - Override maximum number of repositories to analyze
- `--dry-run` - Analyze but do not create notes, diagrams, or threats (output to stdout)
- `--output PATH` - Save markdown reports to files (generates `<name>-inventory.md` and `<name>-analysis.md`)
- `--force-auth` - Force new authentication (ignore cached token)
- `--verbose` - Enable verbose (DEBUG-level) logging
- `--skip-diagram` - Skip generating data flow diagram
- `--skip-threats` - Skip extracting and creating threat objects from security issues
- `--environment, -e TEXT` - Pre-select a Terraform environment by name (skip interactive prompt)
Analyze a threat model and save results to TMI:

```bash
uv run tmi-tf analyze abc-123-def-456
```

Analyze and save the reports to local files:

```bash
uv run tmi-tf analyze abc-123-def-456 --output report.md
```

This produces `report-inventory.md` and `report-analysis.md`.

Dry run to preview analysis without creating artifacts in TMI:

```bash
uv run tmi-tf analyze abc-123-def-456 --dry-run
```

Analyze only the first repository with verbose logging:

```bash
uv run tmi-tf analyze abc-123-def-456 --max-repos 1 --verbose
```

When a repository contains multiple Terraform environments, pre-select one:

```bash
uv run tmi-tf analyze abc-123-def-456 --environment prod
```

Analyze infrastructure only, without generating DFD or threat objects:

```bash
uv run tmi-tf analyze abc-123-def-456 --skip-diagram --skip-threats
```

A complete workflow:

```bash
# 1. Authenticate
uv run tmi-tf auth

# 2. List repositories to see what will be analyzed
uv run tmi-tf list-repos abc-123-def-456

# 3. Run analysis with verbose output
uv run tmi-tf analyze abc-123-def-456 --verbose --output analysis-$(date +%Y%m%d).md
```

The analysis process follows these steps:
Authenticates with TMI server using Google OAuth 2.0 (CLI) or client credentials (server mode). Opens browser for user consent in CLI mode, then caches the JWT token locally.
Fetches the specified threat model and its associated repository references from TMI.
Identifies GitHub repositories and filters to the first MAX_REPOS repositories.
For each repository, performs a sparse clone to download only:
- `.tf` files (Terraform configurations)
- `.tfvars` files (Terraform variables)
This significantly reduces clone time and storage requirements.
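A sparse, blobless clone of this kind can be sketched with standard git commands (an illustration of the approach; the tool's actual invocation in `repo_analyzer.py` may differ):

```python
import subprocess

def sparse_clone_commands(repo_url: str, dest: str,
                          patterns: tuple = ("*.tf", "*.tfvars")) -> list:
    """Build the git command sequence for a blobless sparse clone that
    materializes only the given file patterns."""
    return [
        ["git", "clone", "--filter=blob:none", "--no-checkout", repo_url, dest],
        ["git", "-C", dest, "sparse-checkout", "set", "--no-cone", *patterns],
        ["git", "-C", dest, "checkout"],
    ]

def sparse_clone(repo_url: str, dest: str, timeout: int = 300) -> None:
    """Run the clone; timeout corresponds to the CLONE_TIMEOUT setting."""
    for cmd in sparse_clone_commands(repo_url, dest):
        subprocess.run(cmd, check=True, timeout=timeout)
```

The `--filter=blob:none` option avoids downloading file contents until checkout, and the non-cone sparse-checkout patterns restrict checkout to Terraform files.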
Detects multiple Terraform environments (root modules) within the cloned repository. If multiple environments are found, the CLI prompts the user for selection (or the --environment flag can pre-select one). If only one environment is found, it is auto-selected. In webhook mode, environment selection can be provided in the trigger payload, otherwise all environments are analyzed.
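One simple heuristic for finding candidate environments is to treat any directory containing `.tf` files, outside of shared `modules/` trees, as a root module (illustrative only; the tool's real detection logic lives in `repo_analyzer.py`):

```python
from pathlib import Path

def find_environments(repo_root: str) -> list:
    """Heuristic root-module detection: any directory containing .tf files
    that is not inside a 'modules' directory is treated as an environment."""
    root = Path(repo_root)
    envs = set()
    for tf in root.rglob("*.tf"):
        rel = tf.parent.relative_to(root)
        if "modules" in rel.parts:
            continue  # shared modules are not standalone environments
        envs.add(str(rel))
    return sorted(envs)
```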
Sends Terraform code through a three-phase LLM pipeline:
Phase 1 - Inventory Extraction: Identifies all infrastructure components and services from the Terraform code (compute, storage, network, security resources).
Phase 2 - Infrastructure Analysis: Analyzes component relationships, dependencies, data flows, trust boundaries, and architecture summary using the inventory from Phase 1.
Phase 3a - Threat Identification: Identifies potential security threats based on the inventory and infrastructure analysis.
Phase 3b - Per-Threat Analysis: For each identified threat, performs detailed STRIDE classification, CVSS 4.0 vector scoring, CWE identification, severity assessment, and mitigation recommendations. Each threat gets its own LLM call.
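The phase chaining above can be sketched as follows, where `llm(phase, prompt)` stands in for the LiteLLM call (a hypothetical signature; prompt assembly and response parsing are simplified):

```python
def run_pipeline(terraform_code: str, llm) -> dict:
    """Sketch of the phased pipeline: each phase's output feeds the next,
    and Phase 3b issues one LLM call per identified threat."""
    inventory = llm("inventory", terraform_code)                    # Phase 1
    analysis = llm("infrastructure", inventory + "\n" + terraform_code)  # Phase 2
    threat_list = llm("threat_identification", inventory + "\n" + analysis)  # Phase 3a
    # Phase 3b: detailed STRIDE/CVSS/CWE analysis, one call per threat
    threats = [llm("threat_analysis", t)
               for t in threat_list.splitlines() if t.strip()]
    return {"inventory": inventory, "analysis": analysis, "threats": threats}
```

This structure also explains the failure behavior noted later: if an early phase fails, there is nothing to feed the later phases, so they are skipped for that repository.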
Generates two separate markdown reports:
- Inventory Report -- Detailed listing of discovered infrastructure components and services
- Analysis Report -- Infrastructure relationships, data flows, security observations, and threat modeling recommendations
Creates a data flow diagram (DFD) using LLM-generated structured component and flow data, then builds TMI-native diagram cells. The diagram is stored in the threat model.
Creates structured threat objects in TMI from the security findings, including:
- STRIDE classification
- CVSS 4.0 vector and computed score
- CWE identifiers
- Severity (derived from CVSS score when available)
- Mitigation recommendations
- Links to the generated diagram (when available)
Attaches LLM provenance metadata (provider, model, token counts, estimated cost) to all generated artifacts (notes, diagrams, threats).
The tool generates the following artifacts in your TMI threat model:
Detailed listing of infrastructure components discovered from Terraform code:
- Compute resources (EC2, Lambda, ECS, etc.)
- Storage resources (S3, RDS, DynamoDB, etc.)
- Network resources (VPC, subnets, security groups, etc.)
- Security resources (IAM, KMS, secrets, etc.)
- Services and managed offerings
A comprehensive markdown report including:
- Infrastructure relationships and dependencies
- Data flow mapping
- Trust boundaries
- Security observations and concerns
- Architecture summary
- Threat modeling recommendations
An interactive diagram showing:
- Infrastructure components (processes, data stores, external entities)
- Data flows between components
- Trust boundaries and security zones
Structured threat objects automatically extracted from security analysis, including:
- Name: Clear, concise threat description
- Type: STRIDE classification (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege); can be multi-valued
- Description: Detailed threat description and risk assessment
- Severity: Critical, High, Medium, or Low (derived from CVSS score when available)
- Score: CVSS 4.0 base score (0.0-10.0)
- CVSS: CVSS 4.0 vector string and computed score
- CWE: Associated CWE identifiers (e.g., CWE-284)
- Mitigation: Recommended security controls and remediation strategies
- Status: Open (default for new threats)
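The severity derivation follows the standard CVSS qualitative rating scale (which also defines a "None" rating for a 0.0 score, beyond the four labels listed above); a minimal sketch:

```python
def severity_from_cvss(score: float) -> str:
    """Map a CVSS base score to a qualitative severity using the standard
    CVSS rating scale: None 0.0, Low 0.1-3.9, Medium 4.0-6.9,
    High 7.0-8.9, Critical 9.0-10.0."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```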
```
tmi-tf-wh/
├── tmi_tf/
│   ├── __init__.py
│   ├── cli.py                 # CLI interface (Click commands)
│   ├── config.py              # Configuration management
│   ├── auth.py                # OAuth authentication (Google PKCE, client credentials)
│   ├── analyzer.py            # Shared analysis pipeline (CLI + webhook)
│   ├── tmi_client_wrapper.py  # TMI API client
│   ├── github_client.py       # GitHub API integration
│   ├── repo_analyzer.py       # Repository cloning, sparse checkout, environment detection
│   ├── llm_analyzer.py        # Phased LLM analysis pipeline (via LiteLLM)
│   ├── markdown_generator.py  # Report generation (inventory + analysis)
│   ├── dfd_llm_generator.py   # LLM-based DFD component/flow generation
│   ├── diagram_builder.py     # DFD cell builder (TMI-native format)
│   ├── threat_processor.py    # Threat extraction, STRIDE/CVSS/CWE classification
│   ├── cvss_scorer.py         # CVSS 4.0 vector validation and scoring
│   ├── tf_validator.py        # Terraform file validation and sanitization
│   ├── artifact_metadata.py   # LLM provenance metadata for artifacts
│   ├── retry.py               # Transient LLM error retry logic
│   ├── server.py              # FastAPI webhook server
│   ├── webhook_handler.py     # Webhook payload parsing and HMAC validation
│   ├── worker.py              # Async worker pool for analysis jobs
│   ├── job.py                 # Job tracking and lifecycle
│   ├── queue_client.py        # OCI Queue client
│   ├── addon_callback.py      # Addon status callback
│   └── providers/
│       ├── __init__.py        # Secret provider registry
│       ├── oci.py             # OCI Vault secret provider
│       └── none.py            # No-op secret provider
├── prompts/
│   ├── inventory_system.txt                  # Phase 1 system prompt
│   ├── inventory_user.txt                    # Phase 1 user prompt template
│   ├── infrastructure_analysis_system.txt    # Phase 2 system prompt
│   ├── infrastructure_analysis_user.txt      # Phase 2 user prompt template
│   ├── threat_identification_system.txt      # Phase 3a system prompt
│   ├── threat_identification_user.txt        # Phase 3a user prompt template
│   ├── threat_analysis_system.txt            # Phase 3b system prompt
│   ├── threat_analysis_user.txt              # Phase 3b user prompt template
│   ├── dfd_generation_system.txt             # DFD generation system prompt
│   ├── dfd_generation_user.txt               # DFD generation user prompt template
│   ├── terraform_analysis_system.txt         # Legacy single-pass system prompt
│   └── terraform_analysis_user.txt           # Legacy single-pass user prompt template
├── scripts/
│   └── push-oci.sh            # OCI container registry push script
├── tests/                     # Test suite
├── infra/                     # Infrastructure / deployment configs
├── docs/                      # Additional documentation
├── Dockerfile                 # Multi-stage OCI Linux build for OKE deployment
├── .env                       # Environment configuration (not in git)
├── .env.example               # Example environment file
├── pyproject.toml             # Project dependencies
├── uv.lock                    # UV lock file
├── LICENSE                    # Apache License 2.0
├── CLAUDE.md                  # Claude Code instructions
└── README.md
```
Edit the prompt templates in the prompts/ directory to customize what the LLM analyzes. The analysis uses separate prompt pairs for each phase:
Phase 1 - Inventory (prompts/inventory_system.txt, prompts/inventory_user.txt):
- Defines how infrastructure components and services are extracted from Terraform code
Phase 2 - Infrastructure (prompts/infrastructure_analysis_system.txt, prompts/infrastructure_analysis_user.txt):
- Defines how relationships, data flows, and trust boundaries are identified
Phase 3a - Threat Identification (prompts/threat_identification_system.txt, prompts/threat_identification_user.txt):
- Defines how security threats are identified from the inventory and infrastructure analysis
Phase 3b - Threat Analysis (prompts/threat_analysis_system.txt, prompts/threat_analysis_user.txt):
- Defines how each threat is analyzed for STRIDE classification, CVSS 4.0 scoring, CWE mapping, and mitigation
DFD Generation (prompts/dfd_generation_system.txt, prompts/dfd_generation_user.txt):
- Defines how data flow diagram components and flows are generated from structured analysis data
Modify sparse checkout patterns in repo_analyzer.py to include/exclude file types:
```python
patterns = [
    "*.tf",        # Terraform files
    "*.tfvars",    # Terraform variables
    # Add more patterns as needed:
    # "*.yaml",      # Kubernetes manifests
    # "*.yml",       # CI/CD configs
    # "Dockerfile*", # Docker files
]
```

Change the provider in `.env`:
```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
# Optional: override default model
# LLM_MODEL=gpt-4o
```

Problem: OAuth flow fails or token is rejected
Solution: Clear the cached token and re-authenticate:

```bash
uv run tmi-tf clear-auth
uv run tmi-tf auth
```

Problem: "Token expired" error

Solution: Tokens expire after a set period. Force re-authentication:

```bash
uv run tmi-tf analyze <tm-id> --force-auth
```

GitHub API Rate Limits:
- Unauthenticated: 60 requests/hour
- Authenticated: 5,000 requests/hour
Solution: Set GITHUB_TOKEN in .env with a personal access token.
LLM API Rate Limits:
- Varies by provider and account tier (Anthropic, OpenAI, x.ai, Google)
- Tool implements exponential backoff retry logic for transient errors
Solution: Check your provider's account limits. Upgrade if needed or reduce MAX_REPOS.
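The retry behavior described above can be sketched as exponential backoff with jitter (an illustration of the pattern; the actual logic in `retry.py` may differ in which exceptions it treats as transient):

```python
import random
import time

def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a transient-failure-prone call, doubling the delay each
    attempt and adding random jitter to avoid synchronized retries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```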
Problem: Large repositories timeout during clone
Solution: Increase the timeout in `.env`:

```bash
CLONE_TIMEOUT=600  # 10 minutes
```

Or exclude problematic repositories from the threat model.
Problem: Analysis fails or is truncated for very large Terraform codebases
Cause: LLMs have varying context windows. When the LLM response is truncated, the tool logs a warning with finish_reason=length.
Solutions:
- Reduce `MAX_REPOS` to analyze fewer repositories
- Use `--environment` to analyze one environment at a time
- Split large Terraform files into smaller modules
- Use `--max-repos 1` to analyze one repository at a time
Problem: Analysis is too generic or misses important details
Solution: Customize prompts to be more specific:
- Edit the phase-specific prompt files in `prompts/` to add domain expertise
- Add examples of good analysis to the prompts
- Try a different LLM provider or model (`LLM_PROVIDER`, `LLM_MODEL`)
Problem: Tool does not detect environments or selects the wrong one
Solution: Use the --environment flag to explicitly select an environment by name. Run with --verbose to see which environments were detected and how they were resolved.
- Proof of Concept -- This is a PoC tool, not production-ready
- Token Limits -- LLMs have varying context windows (Claude ~1M, GPT-4 ~128K); very large files may be truncated
- GitHub Only -- Currently supports only GitHub repositories (not GitLab, Bitbucket, etc.)
- Public Repos -- Best suited for public repositories; private repos require GitHub authentication
- Sequential Processing -- Repositories are analyzed sequentially (not parallelized)
- No State Management -- No resume capability if analysis fails mid-way
- API Keys -- Never commit the `.env` file; it contains sensitive credentials
- Token Cache -- OAuth tokens are cached in `~/.tmi-tf/token.json`
- Temporary Files -- Cloned repositories are stored in temp directories and cleaned up automatically
- LLM Response Files -- Raw LLM responses are saved to a session temp directory for debugging
- Network Security -- All API calls use HTTPS
- Webhook Validation -- Server mode validates HMAC signatures on incoming webhook payloads
- AI Limitations -- LLM analysis should complement, not replace, human security review
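The webhook validation mentioned above can be sketched as a constant-time HMAC-SHA256 check against the shared `WEBHOOK_SECRET` (illustrative; the exact header name and payload framing are defined by TMI's webhook contract, and the function name here is hypothetical):

```python
import hashlib
import hmac

def verify_webhook_signature(payload: bytes, signature_hex: str, secret: str) -> bool:
    """Compare the payload's HMAC-SHA256 digest against the signature sent
    with the webhook, using a constant-time comparison to resist timing attacks."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```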
- LLM API -- Each analysis incurs API costs based on tokens processed. The phased pipeline makes multiple LLM calls per repository (inventory, infrastructure, threat identification, plus one call per identified threat). Cost estimates are logged and attached as metadata to generated artifacts
- GitHub API -- Free with authentication token (5,000 requests/hour)
- Storage -- Sparse cloning minimizes storage, but multiple analyses accumulate temporary files
- Analysis Time -- Can take several minutes per repository depending on size and number of threats identified
- Network Dependency -- Requires a stable internet connection for API calls
- Authentication -- OAuth tokens expire and require periodic re-authentication
- Phased Pipeline -- If a phase fails, subsequent phases are skipped for that repository
- Add Repository Context -- Include README files and architecture docs in repositories
- Use Terraform Modules -- Well-structured modules improve analysis quality
- Document Decisions -- Add comments in Terraform explaining security decisions
- Review Analysis -- Always manually review LLM findings for accuracy
- Start Small -- Begin with `MAX_REPOS=1` to test and refine prompts
- Use Dry Run -- Test with `--dry-run` before creating artifacts in TMI
- Save to Files -- Use `--output` to keep historical analysis records
- Limit Scope -- Only add relevant repositories to threat models
- Select Environment -- Use `--environment` to analyze one environment at a time
- Protect API Keys -- Never commit the `.env` file
- Use Read-Only Tokens -- GitHub token only needs repo read access
- Review Before Sharing -- Analysis reports may contain sensitive infrastructure details
- Regular Updates -- Re-run analysis when infrastructure changes
Potential improvements for future versions:
- Support for other Git providers (GitLab, Bitbucket)
- Parallel repository processing
- Resume capability for long-running analyses
- Terraform state file analysis
- Integration with terraform security scanners (tfsec, checkov)
- Custom analysis rules and filters
- Incremental analysis (only changed files)
- Multi-cloud support (AWS, Azure, GCP specific analysis)
- Cost estimation integration
- Compliance framework mapping (PCI-DSS, HIPAA, SOC 2)
This is a proof-of-concept tool. Contributions welcome for:
- Additional cloud provider support
- Enhanced analysis prompts
- Performance improvements
- Additional output formats
- Integration with other security tools
- API-Clients -- API client libraries
- API-Integration -- Integration patterns
- Extending-TMI -- Extension development
- Issue-Tracker-Integration -- Other integrations
For issues and questions:
- Check logs with the `--verbose` flag
- Review configuration with the `config-info` command
- Ensure all prerequisites are installed
- Verify TMI server accessibility
- Check LLM API key validity for your selected provider
- See Getting-Help for support channels
Version: 0.1.0 Status: Proof of Concept License: Apache License 2.0