Security breaches cost organizations an average of $4.45 million per incident, with detection taking 277 days on average. Traditional security approaches—manual log reviews, periodic vulnerability scans, siloed security tools—cannot protect modern cloud environments where infrastructure changes by the minute and threats evolve by the hour. AWS Security Services provide intelligent, automated threat detection, continuous compliance monitoring, and integrated security operations that scale with your infrastructure. Understanding these services deeply is essential for Solutions Architects building secure, compliant systems that protect against sophisticated attacks while maintaining operational efficiency.
The AWS Security Services portfolio addresses security challenges holistically. Amazon GuardDuty analyzes billions of events using machine learning to detect threats in real-time—compromised instances, unauthorized access, crypto-mining. AWS Security Hub aggregates findings from multiple services into a unified dashboard, providing security posture scores and automated compliance checks. Amazon Inspector continuously scans workloads for vulnerabilities and network exposure. Amazon Macie discovers and protects sensitive data using machine learning pattern recognition. AWS Detective investigates security findings by analyzing VPC Flow Logs, CloudTrail, and GuardDuty data to determine root causes. Together, these services transform security from reactive firefighting to proactive threat prevention.
This chapter builds on previous security knowledge (IAM policies from Chapter 2, VPC security groups from Chapter 3, encryption from storage chapters) by adding threat detection, continuous monitoring, and automated response. These services integrate with your existing architecture—GuardDuty analyzes VPC Flow Logs, Security Hub consolidates findings from Inspector scanning your EC2 instances, Macie examines S3 buckets for sensitive data. The chapter covers service architectures, threat detection mechanisms, findings analysis, automated remediation, compliance frameworks, integration patterns, and building security operations centers that detect and respond to threats within minutes instead of months.
Traditional vs Cloud Security:
Traditional Data Center Security:
Perimeter Defense Model:
- Physical security (building access)
- Network firewalls (single entry point)
- Manual security audits (quarterly)
- Static infrastructure (changes monthly)
Characteristics:
✓ Well-understood perimeter
✓ Controlled access points
✓ Physical security measures
✗ Slow to adapt
✗ Manual processes
✗ Cannot scale dynamically
Cloud Security Challenges:
Dynamic Infrastructure:
- Resources created/destroyed constantly
- No fixed perimeter (internet-facing services)
- Multi-account environments
- API-driven changes (thousands/day)
New Attack Vectors:
- API credential theft
- Misconfigured S3 buckets (public data)
- Excessive IAM permissions
- Unpatched AMIs
- Crypto-mining on compromised instances
Complexity:
- 200+ AWS services
- Thousands of configuration options
- Multiple teams deploying independently
- Compliance requirements vary by region
Traditional Tools Fail:
- Manual audits too slow (infrastructure changed)
- Perimeter security insufficient (services internet-facing)
- Static rules don't detect new threats
- Cannot process billions of events
Cloud-Native Security Requirements:
✓ Automated threat detection
✓ Continuous monitoring
✓ Real-time response
✓ Machine learning for anomaly detection
✓ API-driven remediation
✓ Multi-account visibility
AWS Shared Responsibility Model:
AWS Responsibility: Security OF the Cloud
- Physical data center security
- Network infrastructure
- Hypervisor isolation
- Hardware disposal
- Service availability
Customer Responsibility: Security IN the Cloud
- IAM policies and permissions
- Data encryption
- Network configuration (security groups, NACLs)
- Application security
- Patch management
- Access logging and monitoring
- Compliance validation
Example Breakdown:
EC2 Instance Security:
AWS: Physical server, hypervisor, network
Customer: OS patches, application security, data encryption, IAM roles
S3 Bucket Security:
AWS: Infrastructure, availability, hardware
Customer: Bucket policies, encryption, versioning, access logging
RDS Database Security:
AWS: Infrastructure, automated backups, patching (managed)
Customer: Database credentials, encryption keys, security groups, IAM
Critical Insight:
AWS provides secure infrastructure
Customers must configure it securely
AWS Security Services help customers meet their responsibility
Architecture and Threat Detection:
GuardDuty Purpose:
Intelligent threat detection service analyzing:
- VPC Flow Logs (network traffic)
- CloudTrail events (API calls)
- DNS logs (domain queries)
- Kubernetes audit logs (EKS)
Machine Learning Models:
- Baseline normal behavior
- Detect anomalies
- Known malicious IPs/domains
- Crypto-mining signatures
- Reconnaissance patterns
GuardDuty Data Flow:
AWS Account Activity
├── VPC Flow Logs → GuardDuty Analysis Engine
├── CloudTrail Logs → ML Models & Threat Intelligence
├── DNS Logs → Pattern Recognition
└── EKS Audit Logs → Behavioral Analysis
↓
Findings Generated
├── Severity: Low, Medium, High, Critical
├── Type: UnauthorizedAccess, CryptoCurrency, etc.
└── Evidence: IP addresses, resources, timestamps
↓
Integration Points
├── EventBridge (automated response)
├── Security Hub (centralized view)
└── SNS (notifications)
Key Features:
Threat Intelligence:
- AWS threat feeds (billions of IPs/domains)
- Third-party feeds (Proofpoint, CrowdStrike)
- Malicious IP databases
- Known command-and-control servers
Anomaly Detection:
- Unusual API calls
- New ports/protocols
- Login from new locations
- Spikes in API activity
- Resource access patterns
Severity Levels:
Low (Informational):
- Reconnaissance attempts
- Minor policy violations
Example: "Port scan detected from known scanner"
Medium:
- Suspicious activity requiring investigation
- Potential security issues
Example: "Instance communicating with known malicious IP"
High:
- Active threat or policy violation
- Requires immediate attention
Example: "Cryptocurrency mining activity detected"
Critical:
- Confirmed compromise or severe threat
- Emergency response needed
Example: "Backdoor installed on instance, data exfiltration detected"
Common GuardDuty Finding Types:
1. UnauthorizedAccess Findings:
UnauthorizedAccess:EC2/SSHBruteForce
- SSH brute force attempts
- Multiple failed login attempts
- Likely automated attack
Response: Block source IP, review instance security
UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration
- IAM credentials used outside AWS
- Stolen credentials used externally
- Critical security breach
Response: Rotate credentials immediately, investigate compromise
UnauthorizedAccess:IAMUser/TorIPCaller
- API calls from Tor exit nodes
- Anonymized access (suspicious)
- Possible credential theft
Response: Require MFA, investigate user activity
2. CryptoCurrency Findings:
CryptoCurrency:EC2/BitcoinTool.B!DNS
- Instance querying Bitcoin-related domains
- Crypto-mining activity
- Resource theft
Response: Isolate instance, forensic analysis, terminate
3. Backdoor Findings:
Backdoor:EC2/C&CActivity.B!DNS
- Communication with command-and-control server
- Instance compromised
- Active threat
Response: Isolate immediately, snapshot for forensics, terminate
4. Reconnaissance Findings:
Recon:EC2/PortProbeUnprotectedPort
- Port scanning from instance
- Preparing for attack
- Early warning
Response: Review instance purpose, investigate legitimacy
5. Trojan Findings:
Trojan:EC2/DNSDataExfiltration
- Data exfiltration via DNS queries
- Advanced persistent threat
- Ongoing data theft
Response: Isolate, forensic analysis, incident response
6. Impact Findings:
Impact:EC2/AbusedDomainRequest.Reputation
- Queries to known malicious domains
- Malware communication
- Compromised instance
Response: Isolate and investigate
Cost Model:
Free Tier: 30-day trial (full features)
Pricing After Trial:
- CloudTrail analysis: $4.00 per million events
- VPC Flow Log analysis: $1.00 per GB
- DNS log analysis: $0.40 per million events
- EKS audit logs: $0.40 per million events
Typical Monthly Cost:
Small environment (50 resources): $30-50/month
Medium environment (500 resources): $100-200/month
Large environment (5,000+ resources): $500-1,000+/month
Cost Optimization:
- Analyze by log volume, not resource count
- High value vs cost (detect threats early)
- Cheaper than breach ($4.45M average)
Centralized Security Posture Management:
Security Hub Purpose:
- Aggregate findings from multiple services
- Unified security view across accounts
- Compliance framework checks
- Security score dashboard
- Automated remediation workflows
Architecture:
Security Finding Sources:
├── GuardDuty (threat detection)
├── Inspector (vulnerability scanning)
├── Macie (sensitive data discovery)
├── IAM Access Analyzer (permission risks)
├── Firewall Manager (policy compliance)
├── Systems Manager (patch compliance)
├── Third-party tools (Palo Alto, Splunk, etc.)
└── Custom findings (via API)
↓
Security Hub Aggregation
├── Normalize findings (standard format)
├── Calculate security score
├── Apply insights (filters/groupings)
└── Check compliance frameworks
↓
Integration Points
├── EventBridge (automated actions)
├── SIEM tools (Splunk, QRadar)
├── Ticketing systems (ServiceNow, Jira)
└── ChatOps (Slack, Teams)
Security Standards:
1. AWS Foundational Security Best Practices:
- Comprehensive AWS security checks
- IAM, EC2, S3, RDS, Lambda, etc.
- 150+ automated checks
- Free (included)
2. CIS AWS Foundations Benchmark:
- Industry-standard security baseline
- Center for Internet Security standards
- 50+ checks
- IAM, logging, monitoring, networking
3. PCI DSS (Payment Card Industry):
- Credit card data protection
- Required for payment processing
- 40+ checks
- Encryption, access control, monitoring
4. NIST (National Institute of Standards):
- Federal security standards
- Government compliance
- Comprehensive controls
5. ISO 27001:
- International security standard
- Information security management
- Audit-ready documentation
Security Score:
Calculation:
Total Passed Checks / Total Checks × 100
Example:
- Total checks: 100
- Passed: 85
- Failed: 15
- Score: 85%
Target: >90% for production accounts
Finding Format (ASFF):
AWS Security Finding Format:
{
"SchemaVersion": "2018-10-08",
"Id": "finding-id-12345",
"ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
"GeneratorId": "backdoor-instance",
"AwsAccountId": "123456789012",
"Types": ["TTPs/Command and Control/Backdoor"],
"CreatedAt": "2025-01-15T10:30:00.000Z",
"UpdatedAt": "2025-01-15T10:30:00.000Z",
"Severity": {
"Label": "HIGH",
"Normalized": 70
},
"Title": "Backdoor:EC2/C&CActivity.B!DNS",
"Description": "EC2 instance communicating with command-and-control server",
"Resources": [{
"Type": "AwsEc2Instance",
"Id": "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
"Region": "us-east-1",
"Tags": {"Environment": "production"}
}],
"Compliance": {
"Status": "FAILED"
},
"Remediation": {
"Recommendation": {
"Text": "Isolate instance, perform forensic analysis"
}
},
"Workflow": {
"Status": "NEW" // NEW, NOTIFIED, RESOLVED, SUPPRESSED
}
}
Benefits of Standard Format:
✓ Consistent across all services
✓ Easy automation
✓ SIEM integration simplified
✓ Third-party tool compatibility
Insights and Custom Actions:
Insights (Grouping Findings):
Built-in Insights:
1. Resources with most findings
2. Top severity findings
3. Findings by resource type
4. Findings by compliance status
5. Unresolved critical findings
Custom Insights:
Filter: ResourceType = "AwsEc2Instance"
Group by: Tag.Environment
Result: EC2 findings grouped by environment
Use Case: Track security by team/project
Custom Actions (Automated Response):
Example: Auto-remediate S3 public bucket
1. Create EventBridge rule:
- Pattern: S3 bucket public finding
- Target: Lambda function
2. Lambda function:
- Receive finding details
- Call S3 API to block public access
- Update Security Hub (mark resolved)
- Send notification
3. Result: Public bucket fixed automatically (seconds)
Integration Architecture:
Security Hub Finding
↓
EventBridge Rule (filter by finding type)
↓
Lambda Function (remediation logic)
↓
AWS API (fix issue)
↓
Security Hub API (update finding status)
↓
SNS (notify security team)
Vulnerability and Network Exposure Assessment:
Inspector Purpose:
- Continuous vulnerability scanning
- Network reachability analysis
- Package vulnerability detection
- Lambda function assessment
Supported Resources:
- EC2 instances
- Container images (ECR)
- Lambda functions
Assessment Types:
1. Package Vulnerabilities (CVE):
- Scans OS packages
- Application dependencies
- Known vulnerabilities (CVE database)
- Severity scoring (CVSS)
2. Network Reachability:
- Port accessibility
- Security group analysis
- Network path analysis
- Internet exposure
Architecture:
EC2 Instance Assessment:
1. Systems Manager Agent (SSM Agent) required
2. Inspector service scans instance
3. Checks installed packages
4. Compares against CVE database
5. Analyzes network configuration
6. Generates findings
Container Image Assessment:
1. Push image to ECR
2. Inspector automatically scans
3. Checks base image and layers
4. Identifies vulnerabilities
5. Provides remediation guidance
Lambda Function Assessment:
1. Inspector scans function code
2. Checks dependencies
3. Identifies vulnerabilities
4. Continuous monitoring
Finding Severity:
Critical (CVSS 9.0-10.0):
- Remote code execution
- SQL injection
- Authentication bypass
- Immediate action required
High (CVSS 7.0-8.9):
- Privilege escalation
- Information disclosure
- Denial of service
- Urgent attention needed
Medium (CVSS 4.0-6.9):
- Minor security issues
- Limited impact
- Should be addressed
Low (CVSS 0.1-3.9):
- Minimal risk
- Informational
- Optional fix
Finding Example:
Title: CVE-2024-12345 - OpenSSL vulnerability
Severity: High (CVSS 8.2)
Resource: EC2 instance i-1234567890abcdef0
Package: openssl-1.0.2k (vulnerable)
Fix: Upgrade to openssl-1.0.2u or later
Impact: Remote code execution possible
Network Path: Internet → Port 443 → Instance
Remediation:
1. Update package: yum update openssl
2. Restart affected services
3. Verify patch applied
Pricing:
EC2 Scanning:
- $0.30 per instance per month
- 100 instances = $30/month
ECR Image Scanning:
- First scan per image per month: Free
- Subsequent scans: $0.09 per scan
Lambda Function Scanning:
- $0.60 per function per month
- Scans code and dependencies
Sensitive Data Discovery and Protection:
Macie Purpose:
- Discover sensitive data (PII, financial, credentials)
- Data security and privacy compliance
- S3 bucket inventory and classification
- Automated data protection
Machine Learning Detection:
Sensitive Data Types:
- Personal Identifiable Information (PII):
* Names, addresses, phone numbers
* Email addresses
* Social Security numbers
* Driver's license numbers
* Passport numbers
- Financial Information:
* Credit card numbers
* Bank account numbers
* Tax IDs
* Financial statements
- Credentials:
* AWS access keys
* Private keys
* API keys
* Passwords
- Healthcare:
* Medical records
* Health insurance numbers
* Prescription data
- Custom Identifiers:
* Regex patterns (custom)
* Keywords
* Business-specific data
Architecture:
S3 Buckets (Data Sources)
↓
Macie Discovery Jobs
├── Sample objects (automated)
├── Scan content
├── Apply ML models
└── Pattern matching
↓
Classification Results
├── Bucket inventory
├── Object metadata
├── Sensitivity levels
└── Data identifiers
↓
Findings
├── Sensitive data discovered
├── Policy violations
└── Security issues
Discovery Job Types:
1. One-Time Job:
- Scan specific buckets once
- Comprehensive analysis
- Historical data assessment
2. Scheduled Job:
- Daily, weekly, or monthly
- Continuous monitoring
- New data detection
3. Automated Discovery:
- Monitors all buckets
- Samples objects automatically
- Prioritizes analysis
Findings:
Policy Findings:
- Unencrypted bucket
- Public bucket
- Shared with external accounts
- Missing bucket policies
Sensitive Data Findings:
- PII discovered
- Financial data exposed
- Credentials in objects
- Healthcare information
Example Finding:
Title: Multiple credit card numbers discovered
Severity: High
Bucket: customer-data-prod
Object: customers/export-2025-01.csv
Count: 1,524 credit card numbers detected
Exposure: Bucket is public
Recommendation:
1. Block public access immediately
2. Enable encryption
3. Remove sensitive data or mask
4. Implement access logging
Pricing:
Bucket Inventory:
- $0.10 per 1,000 bucket-months
- 100 buckets × 12 months = $12/year
Sensitive Data Discovery:
- $1.00 per GB scanned
- 1 TB discovery job = $1,000
Automated Discovery (Sampling):
- $0.10 per GB scanned (sampled)
- More cost-effective for large datasets
Cost Optimization:
- Use sampling for large buckets
- Schedule jobs during off-hours
- Focus on high-risk buckets first
Security Investigation and Root Cause Analysis:
Detective Purpose:
- Visualize security investigation data
- Analyze root cause of findings
- Correlate events across services
- Interactive graph analysis
Data Sources:
- VPC Flow Logs (network traffic)
- CloudTrail logs (API activity)
- GuardDuty findings (threats)
- Aggregated over time
Investigation Capabilities:
Behavior Graphs:
- Entity relationships (users, roles, IPs, instances)
- Activity timelines
- Anomaly detection
- Historical analysis
Use Cases:
1. Compromised Instance Investigation:
GuardDuty Alert: Instance mining cryptocurrency
↓
Detective Analysis:
- When did unusual activity start?
- What API calls preceded compromise?
- Which IAM credentials accessed instance?
- What network connections were established?
- What data was accessed?
Result: Identify initial attack vector
2. Unusual API Activity:
GuardDuty Alert: Spike in IAM API calls
↓
Detective Analysis:
- Normal baseline for this user?
- Geographic anomaly (new location)?
- What resources accessed?
- Pattern of reconnaissance?
Result: Determine if legitimate or attack
3. Data Exfiltration:
GuardDuty Alert: Large data transfer
↓
Detective Analysis:
- What triggered the transfer?
- Source and destination?
- User/role responsible?
- Historical transfer patterns?
Result: Confirm data breach, scope impact
Visual Analysis:
Example Graph View:
IAM User (john@example.com)
↓
Assumed Role (EC2-Admin)
↓
Launched Instance (i-12345)
↓
Network Connection (malicious IP)
↓
Data Transfer (5 GB to external)
Timeline: Shows exact timestamps
Anomaly: Highlight unusual activity
Context: Compare to historical baseline
Benefits:
✓ Faster investigations (hours → minutes)
✓ Visual correlation of events
✓ Historical context (up to 1 year)
✓ Reduce false positives
✓ Understand attack progression
Pricing:
Data Ingestion:
- $2.00 per GB of VPC Flow Logs
- $2.00 per GB of CloudTrail logs
- $2.00 per GB of GuardDuty findings
Typical Cost:
Small account: $50-100/month
Medium account: $200-500/month
Large account: $1,000+/month
Free Trial: 30 days (full features)
Unified Security Operations:
Complete Security Stack:
Data Collection Layer:
├── VPC Flow Logs
├── CloudTrail
├── DNS Logs
├── Application Logs
└── Network Packet Capture
Detection Layer:
├── GuardDuty (threat detection)
├── Inspector (vulnerability scanning)
├── Macie (sensitive data discovery)
├── IAM Access Analyzer (permission risks)
└── Config (configuration compliance)
Aggregation Layer:
└── Security Hub (central dashboard)
├── Normalize findings
├── Calculate security score
├── Apply compliance frameworks
└── Generate insights
Analysis Layer:
└── Detective (root cause analysis)
├── Behavior graphs
├── Timeline analysis
└── Anomaly detection
Response Layer:
├── EventBridge (routing)
├── Lambda (automation)
├── Systems Manager (remediation)
└── SNS (notifications)
Integration Flow:
GuardDuty Finding (compromised instance)
↓
Security Hub (aggregates finding)
↓
EventBridge Rule (triggers on HIGH severity)
↓
Lambda Function (remediation workflow)
├── Isolate instance (security group change)
├── Snapshot instance (forensics)
├── Create Detective investigation
├── Notify security team (SNS)
└── Create ticket (ServiceNow API)
↓
Security Team Reviews Detective Analysis
↓
Decision: Terminate or Remediate
↓
Security Hub Updated (mark resolved)
Multi-Account Strategy:
Organization Structure:
├── Security Account (GuardDuty/Security Hub master)
├── Log Archive Account (centralized logs)
└── Member Accounts (applications)
Benefits:
✓ Centralized security team
✓ Consolidated findings
✓ Consistent policies
✓ Delegated administration
✓ Cost optimization
Objective: Enable threat detection and create automated response to compromised instances.
Step 1: Enable GuardDuty
# Enable GuardDuty (AWS CLI)
aws guardduty create-detector \
--enable \
--finding-publishing-frequency FIFTEEN_MINUTES
# Get detector ID
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)
echo "GuardDuty enabled with detector ID: $DETECTOR_ID"Step 2: Create SNS Topic for Alerts
# Create SNS topic
TOPIC_ARN=$(aws sns create-topic \
--name guardduty-alerts \
--query 'TopicArn' \
--output text)
# Subscribe email to topic
aws sns subscribe \
--topic-arn $TOPIC_ARN \
--protocol email \
--notification-endpoint security@example.com
# Confirm subscription (check email)Step 3: Create EventBridge Rule for High Severity Findings
// guardduty-rule-pattern.json
{
"source": ["aws.guardduty"],
"detail-type": ["GuardDuty Finding"],
"detail": {
"severity": [7, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9]
}
}# Create EventBridge rule
aws events put-rule \
--name guardduty-high-severity \
--event-pattern file://guardduty-rule-pattern.json \
--state ENABLED
# Add SNS topic as target
aws events put-targets \
--rule guardduty-high-severity \
--targets "Id"="1","Arn"="$TOPIC_ARN"Step 4: Create Lambda Function for Automated Instance Isolation
# lambda_function.py
import boto3
import json
ec2 = boto3.client('ec2')
sns = boto3.client('sns')
def lambda_handler(event, context):
"""
Isolate compromised EC2 instance automatically
"""
# Parse GuardDuty finding
finding = event['detail']
finding_type = finding['type']
severity = finding['severity']
# Extract instance ID
resources = finding.get('resource', {}).get('instanceDetails', {})
instance_id = resources.get('instanceId')
if not instance_id:
print("No instance ID found in finding")
return
print(f"Processing finding: {finding_type}")
print(f"Severity: {severity}")
print(f"Instance ID: {instance_id}")
# Create isolation security group (no inbound/outbound)
try:
response = ec2.create_security_group(
GroupName=f'isolated-{instance_id}',
Description='Isolation SG for compromised instance',
VpcId=resources.get('vpcId')
)
isolation_sg = response['GroupId']
print(f"Created isolation security group: {isolation_sg}")
# Remove all rules (deny all traffic)
# Default: no rules = deny all
except Exception as e:
print(f"Error creating security group: {e}")
# Use existing isolation SG if available
isolation_sg = 'sg-isolation-group-id' # Pre-created SG
# Apply isolation security group to instance
try:
ec2.modify_instance_attribute(
InstanceId=instance_id,
Groups=[isolation_sg]
)
print(f"Instance {instance_id} isolated successfully")
# Create snapshot for forensics
volumes = ec2.describe_instance_attribute(
InstanceId=instance_id,
Attribute='blockDeviceMapping'
)
for volume in volumes.get('BlockDeviceMappings', []):
volume_id = volume.get('Ebs', {}).get('VolumeId')
if volume_id:
snapshot = ec2.create_snapshot(
VolumeId=volume_id,
Description=f'Forensic snapshot for {instance_id}'
)
print(f"Created snapshot: {snapshot['SnapshotId']}")
# Send notification
message = f"""
GuardDuty Alert: Instance Compromised and Isolated
Finding Type: {finding_type}
Severity: {severity}
Instance ID: {instance_id}
Action Taken: Instance isolated, forensic snapshots created
Next Steps:
1. Review GuardDuty finding details
2. Analyze forensic snapshots
3. Determine root cause
4. Terminate or remediate instance
"""
sns.publish(
TopicArn='arn:aws:sns:region:account:guardduty-alerts',
Subject='GuardDuty: Instance Isolated',
Message=message
)
except Exception as e:
print(f"Error isolating instance: {e}")
raise
return {
'statusCode': 200,
'body': json.dumps(f'Instance {instance_id} isolated')
}# Package and deploy Lambda
zip function.zip lambda_function.py
aws lambda create-function \
--function-name guardduty-auto-isolate \
--runtime python3.11 \
--role arn:aws:iam::account:role/lambda-execution-role \
--handler lambda_function.lambda_handler \
--zip-file fileb://function.zip \
--timeout 60
# Add Lambda as EventBridge target
aws events put-targets \
--rule guardduty-high-severity \
--targets "Id"="2","Arn"="arn:aws:lambda:region:account:function:guardduty-auto-isolate"
# Grant EventBridge permission to invoke Lambda
aws lambda add-permission \
--function-name guardduty-auto-isolate \
--statement-id AllowEventBridgeInvoke \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.comStep 5: Test with Sample Finding
# Generate sample finding
aws guardduty create-sample-findings \
--detector-id $DETECTOR_ID \
--finding-types "Backdoor:EC2/C&CActivity.B!DNS"
# Check EventBridge invocations
aws cloudwatch get-metric-statistics \
--namespace AWS/Events \
--metric-name TriggeredRules \
--dimensions Name=RuleName,Value=guardduty-high-severity \
--start-time 2025-01-15T00:00:00Z \
--end-time 2025-01-15T23:59:59Z \
--period 3600 \
--statistics SumExpected Outcomes:
- GuardDuty enabled and analyzing logs
- Email alert received for high-severity findings
- Lambda automatically isolates compromised instances
- Forensic snapshots created
- Security team notified
Objective: Enable Security Hub across multiple accounts with centralized management.
Step 1: Enable Security Hub in Master Account
import boto3
securityhub = boto3.client('securityhub')
# Enable Security Hub
securityhub.enable_security_hub(
Tags={'Environment': 'Production'},
EnableDefaultStandards=True # Enable AWS Foundational Best Practices
)
# Enable additional standards
securityhub.batch_enable_standards(
StandardsSubscriptionRequests=[
{'StandardsArn': 'arn:aws:securityhub:us-east-1::standards/cis-aws-foundations-benchmark/v/1.2.0'},
{'StandardsArn': 'arn:aws:securityhub:us-east-1::standards/pci-dss/v/3.2.1'}
]
)
print("Security Hub enabled with CIS and PCI DSS standards")Step 2: Invite Member Accounts
# In master account
member_accounts = [
{'AccountId': '111111111111', 'Email': 'account1@example.com'},
{'AccountId': '222222222222', 'Email': 'account2@example.com'}
]
# Create members
securityhub.create_members(
AccountDetails=member_accounts
)
# Invite members
for account in member_accounts:
securityhub.invite_members(
AccountIds=[account['AccountId']]
)
print("Member accounts invited")Step 3: Accept Invitation in Member Accounts
# In each member account
# List invitations
invitations = securityhub.list_invitations()
for invitation in invitations['Invitations']:
master_id = invitation['AccountId']
invitation_id = invitation['InvitationId']
# Accept invitation
securityhub.accept_invitation(
MasterId=master_id,
InvitationId=invitation_id
)
print(f"Accepted invitation from master account {master_id}")Step 4: View Aggregated Findings
# In master account - view all findings across accounts
import pandas as pd
def get_security_score():
"""Calculate security score across all accounts"""
# Get compliance results
paginator = securityhub.get_paginator('get_compliance_summary_by_resource_type')
total_passed = 0
total_failed = 0
for page in paginator.paginate():
for result in page['SummaryByResourceType']:
total_passed += result['PassedCount']
total_failed += result['FailedCount']
total_checks = total_passed + total_failed
score = (total_passed / total_checks * 100) if total_checks > 0 else 0
print(f"Security Score: {score:.2f}%")
print(f"Passed Checks: {total_passed}")
print(f"Failed Checks: {total_failed}")
return score
def get_critical_findings():
"""Get all critical findings"""
response = securityhub.get_findings(
Filters={
'SeverityLabel': [{'Value': 'CRITICAL', 'Comparison': 'EQUALS'}],
'WorkflowStatus': [{'Value': 'NEW', 'Comparison': 'EQUALS'}]
},
MaxResults=100
)
findings = response['Findings']
# Create summary
summary = []
for finding in findings:
summary.append({
'Title': finding['Title'],
'Resource': finding['Resources'][0]['Id'],
'Account': finding['AwsAccountId'],
'Severity': finding['Severity']['Label']
})
df = pd.DataFrame(summary)
print("\nCritical Findings:")
print(df)
return findings
# Run reports
score = get_security_score()
critical_findings = get_critical_findings()Objective: Enable continuous vulnerability scanning for EC2 instances and containers.
Step 1: Enable Inspector
# Enable Inspector
aws inspector2 enable \
--resource-types EC2 ECR LAMBDA
# Check status
aws inspector2 batch-get-account-status \
--account-ids $(aws sts get-caller-identity --query Account --output text)Step 2: Install SSM Agent on EC2 Instances
# For Amazon Linux 2
sudo yum install -y amazon-ssm-agent
sudo systemctl enable amazon-ssm-agent
sudo systemctl start amazon-ssm-agent
# Verify agent running
sudo systemctl status amazon-ssm-agent
# Instance IAM role needs AmazonSSMManagedInstanceCore policyStep 3: View Vulnerability Findings
import boto3
inspector = boto3.client('inspector2')
def get_vulnerability_summary():
"""Get summary of vulnerabilities by severity"""
# Get all findings
paginator = inspector.get_paginator('list_findings')
vulnerabilities = {
'CRITICAL': 0,
'HIGH': 0,
'MEDIUM': 0,
'LOW': 0
}
for page in paginator.paginate():
for finding in page['findings']:
severity = finding['severity']
vulnerabilities[severity] += 1
print("Vulnerability Summary:")
for severity, count in vulnerabilities.items():
print(f"{severity}: {count}")
return vulnerabilities
def get_critical_vulnerabilities():
"""Get details of critical vulnerabilities"""
response = inspector.list_findings(
filterCriteria={
'severity': [{'comparison': 'EQUALS', 'value': 'CRITICAL'}]
},
maxResults=50
)
print("\nCritical Vulnerabilities:")
for finding in response['findings']:
print(f"\nTitle: {finding['title']}")
print(f"Resource: {finding['resources'][0]['id']}")
print(f"Package: {finding.get('packageVulnerabilityDetails', {}).get('vulnerablePackages', [{}])[0].get('name')}")
print(f"CVE: {finding.get('packageVulnerabilityDetails', {}).get('vulnerabilityId')}")
print(f"Fix: {finding.get('remediation', {}).get('recommendation', {}).get('text')}")
# Run reports
summary = get_vulnerability_summary()
critical = get_critical_vulnerabilities()Step 4: Automated Patching with Systems Manager
# Create Systems Manager maintenance window
ssm = boto3.client('ssm')
# Create maintenance window
response = ssm.create_maintenance_window(
Name='PatchCriticalVulnerabilities',
Schedule='cron(0 2 ? * SUN *)', # Weekly Sunday 2 AM
Duration=4, # 4 hours
Cutoff=1, # Stop new tasks 1 hour before end
AllowUnassociatedTargets=False
)
window_id = response['WindowId']
# Register targets (instances with critical vulnerabilities)
ssm.register_target_with_maintenance_window(
WindowId=window_id,
ResourceType='INSTANCE',
Targets=[{
'Key': 'tag:PatchGroup',
'Values': ['Critical']
}]
)
# Register patch task
ssm.register_task_with_maintenance_window(
WindowId=window_id,
TaskType='RUN_COMMAND',
TaskArn='AWS-RunPatchBaseline',
Priority=1,
MaxConcurrency='50%',
MaxErrors='25%'
)
print(f"Automated patching configured in window {window_id}")Enterprise Security Architecture:
SOC Integration Pattern:
AWS Security Services → SIEM Platform → Security Analysts
Data Flow:
GuardDuty Findings
Security Hub Findings → EventBridge → Lambda Transform
Inspector Findings ↓
Macie Findings Kinesis Firehose
CloudTrail Events ↓
VPC Flow Logs S3 Bucket (staging)
↓
SIEM (Splunk/QRadar/Elastic)
↓
Security Dashboard
↓
Threat Hunting
↓
Incident Response
Implementation:
# Lambda function to transform findings for SIEM
def transform_for_siem(event):
"""Transform AWS finding to SIEM format"""
finding = event['detail']
# Map to Common Event Format (CEF)
cef_event = {
'timestamp': finding['updatedAt'],
'severity': finding['severity']['normalized'],
'event_type': finding['type'],
'source_ip': finding.get('service', {}).get('action', {}).get('networkConnectionAction', {}).get('remoteIpDetails', {}).get('ipAddressV4'),
'resource_id': finding['resources'][0]['id'],
'account_id': finding['awsAccountId'],
'region': finding['region'],
'description': finding['description'],
'remediation': finding.get('remediation', {}).get('recommendation', {}).get('text')
}
return cef_event
# Kinesis Firehose delivery to SIEM
import boto3
firehose = boto3.client('firehose')
firehose.create_delivery_stream(
DeliveryStreamName='security-findings-to-siem',
DeliveryStreamType='DirectPut',
HttpEndpointDestinationConfiguration={
'EndpointConfiguration': {
'Url': 'https://siem-collector.example.com/ingest',
'AccessKey': 'api-key'
},
'RequestConfiguration': {
'ContentEncoding': 'GZIP'
},
'BufferingHints': {
'SizeInMBs': 5,
'IntervalInSeconds': 60
}
}
)
Automated Playbooks:
# Automated incident response playbooks
class SecurityPlaybook:
"""Automated security incident response"""
def __init__(self):
self.ec2 = boto3.client('ec2')
self.sns = boto3.client('sns')
self.ssm = boto3.client('ssm')
def compromised_instance_response(self, instance_id):
"""
Comprehensive response to compromised instance
"""
print(f"Executing playbook for compromised instance: {instance_id}")
# Step 1: Tag instance
self.ec2.create_tags(
Resources=[instance_id],
Tags=[
{'Key': 'SecurityStatus', 'Value': 'Compromised'},
{'Key': 'IncidentID', 'Value': f'INC-{datetime.now().strftime("%Y%m%d-%H%M%S")}'}
]
)
# Step 2: Isolate instance
isolation_sg = self.isolate_instance(instance_id)
# Step 3: Capture memory dump (if forensics required)
self.capture_memory_dump(instance_id)
# Step 4: Create disk snapshots
snapshots = self.create_forensic_snapshots(instance_id)
# Step 5: Collect logs
self.collect_logs(instance_id)
# Step 6: Notify security team
self.notify_security_team(instance_id, snapshots)
# Step 7: Create investigation case
case_id = self.create_investigation_case(instance_id)
return {
'instance_id': instance_id,
'isolation_sg': isolation_sg,
'snapshots': snapshots,
'case_id': case_id
}
def isolate_instance(self, instance_id):
"""Isolate instance from network"""
# Create/use isolation security group
try:
response = self.ec2.describe_security_groups(
Filters=[{'Name': 'group-name', 'Values': ['forensics-isolation']}]
)
if response['SecurityGroups']:
isolation_sg = response['SecurityGroups'][0]['GroupId']
else:
# Create isolation SG
vpc = self.ec2.describe_instances(InstanceIds=[instance_id])
vpc_id = vpc['Reservations'][0]['Instances'][0]['VpcId']
response = self.ec2.create_security_group(
GroupName='forensics-isolation',
Description='Isolation SG for forensic analysis',
VpcId=vpc_id
)
isolation_sg = response['GroupId']
# Add only SSH from security team subnet
self.ec2.authorize_security_group_ingress(
GroupId=isolation_sg,
IpPermissions=[{
'IpProtocol': 'tcp',
'FromPort': 22,
'ToPort': 22,
'IpRanges': [{'CidrIp': '10.0.100.0/24', 'Description': 'Security team access'}]
}]
)
# Apply isolation SG
self.ec2.modify_instance_attribute(
InstanceId=instance_id,
Groups=[isolation_sg]
)
print(f"Instance {instance_id} isolated with SG {isolation_sg}")
return isolation_sg
except Exception as e:
print(f"Error isolating instance: {e}")
raise
def create_forensic_snapshots(self, instance_id):
"""Create snapshots of all volumes"""
volumes = self.ec2.describe_instance_attribute(
InstanceId=instance_id,
Attribute='blockDeviceMapping'
)
snapshots = []
for device in volumes['BlockDeviceMappings']:
volume_id = device['Ebs']['VolumeId']
snapshot = self.ec2.create_snapshot(
VolumeId=volume_id,
Description=f'Forensic snapshot for incident {instance_id}',
TagSpecifications=[{
'ResourceType': 'snapshot',
'Tags': [
{'Key': 'Purpose', 'Value': 'Forensics'},
{'Key': 'InstanceId', 'Value': instance_id},
{'Key': 'Timestamp', 'Value': datetime.now().isoformat()}
]
}]
)
snapshots.append(snapshot['SnapshotId'])
print(f"Created snapshot {snapshot['SnapshotId']} for volume {volume_id}")
return snapshots
def notify_security_team(self, instance_id, snapshots):
"""Send detailed notification"""
message = f"""
SECURITY INCIDENT: Compromised Instance Detected and Isolated
Instance ID: {instance_id}
Timestamp: {datetime.now().isoformat()}
Actions Taken:
- Instance isolated from network
- Forensic snapshots created: {', '.join(snapshots)}
- Investigation case created
- Memory dump captured (if enabled)
- Logs collected
Next Steps:
1. Review GuardDuty/Security Hub findings
2. Analyze Detective investigation graph
3. Examine forensic snapshots
4. Determine root cause and scope
5. Decide on termination vs remediation
Investigation Dashboard: https://console.aws.amazon.com/detective/
"""
self.sns.publish(
TopicArn='arn:aws:sns:region:account:security-incidents',
Subject=f'INCIDENT: Compromised Instance {instance_id}',
Message=message
)
# Usage
playbook = SecurityPlaybook()
result = playbook.compromised_instance_response('i-1234567890abcdef0')CIS Benchmark Automated Remediation:
# Auto-remediate CIS AWS Foundations Benchmark findings
class CISRemediation:
"""Automated CIS benchmark remediation"""
def __init__(self):
self.iam = boto3.client('iam')
self.s3 = boto3.client('s3')
self.ec2 = boto3.client('ec2')
self.cloudtrail = boto3.client('cloudtrail')
def remediate_finding(self, finding):
"""Route finding to appropriate remediation"""
control_id = finding['ProductFields'].get('ControlId', '')
remediation_map = {
'CIS.1.4': self.ensure_mfa_root_account,
'CIS.2.1': self.ensure_cloudtrail_enabled,
'CIS.2.3': self.enable_s3_bucket_logging,
'CIS.4.1': self.disable_unrestricted_ssh,
'CIS.4.2': self.disable_unrestricted_rdp
}
remediation_func = remediation_map.get(control_id)
if remediation_func:
try:
remediation_func(finding)
print(f"Remediated {control_id}")
except Exception as e:
print(f"Failed to remediate {control_id}: {e}")
else:
print(f"No automated remediation for {control_id}")
def ensure_cloudtrail_enabled(self, finding):
"""CIS 2.1: Ensure CloudTrail enabled in all regions"""
# Check if trail exists
trails = self.cloudtrail.describe_trails()
multi_region_trail = any(
trail.get('IsMultiRegionTrail', False)
for trail in trails['trailList']
)
if not multi_region_trail:
# Create multi-region trail
self.cloudtrail.create_trail(
Name='organization-trail',
S3BucketName='org-cloudtrail-logs-bucket',
IsMultiRegionTrail=True,
EnableLogFileValidation=True,
IncludeGlobalServiceEvents=True
)
# Start logging
self.cloudtrail.start_logging(Name='organization-trail')
print("Created multi-region CloudTrail")
def enable_s3_bucket_logging(self, finding):
"""CIS 2.3: Enable S3 bucket logging"""
bucket_name = finding['Resources'][0]['Id'].split(':')[-1]
# Enable logging
self.s3.put_bucket_logging(
Bucket=bucket_name,
BucketLoggingStatus={
'LoggingEnabled': {
'TargetBucket': 'central-s3-logs-bucket',
'TargetPrefix': f'{bucket_name}/'
}
}
)
print(f"Enabled logging for bucket {bucket_name}")
def disable_unrestricted_ssh(self, finding):
"""CIS 4.1: No security group allows 0.0.0.0/0 ingress on port 22"""
sg_id = finding['Resources'][0]['Id'].split('/')[-1]
# Remove unrestricted SSH rule
try:
self.ec2.revoke_security_group_ingress(
GroupId=sg_id,
IpPermissions=[{
'IpProtocol': 'tcp',
'FromPort': 22,
'ToPort': 22,
'IpRanges': [{'CidrIp': '0.0.0.0/0'}]
}]
)
print(f"Removed unrestricted SSH from {sg_id}")
except Exception as e:
print(f"Could not remove SSH rule: {e}")
# EventBridge integration
def lambda_handler(event, context):
"""Auto-remediate CIS findings from Security Hub"""
finding = event['detail']['findings'][0]
control_id = finding.get('ProductFields', {}).get('ControlId', '')
if control_id.startswith('CIS'):
remediation = CISRemediation()
remediation.remediate_finding(finding)
return {'statusCode': 200}Cost Optimization:
Security Services Cost Management:
GuardDuty Optimization:
- Enable only in regions with resources
- Use organization-level pricing (volume discount)
- Archive old findings to reduce storage
Security Hub Optimization:
- Consolidate to security account
- Use selective standards (not all simultaneously)
- Suppress low-priority findings
Inspector Optimization:
- Scan only internet-facing instances initially
- Schedule scans during off-hours
- Use Systems Manager Inventory for non-critical
Macie Optimization:
- Use automated discovery sampling (vs full scans)
- Focus on high-risk buckets
- Schedule discovery jobs weekly vs daily
Detective Optimization:
- 30-day trial evaluation period
- Enable only when investigation needed
- Consider cost vs breach cost ($4.45M average)
Typical Monthly Costs:
Small Organization (100 resources):
- GuardDuty: $50
- Security Hub: $20
- Inspector: $30
- Total: $100/month
Large Organization (5,000 resources):
- GuardDuty: $500
- Security Hub: $100
- Inspector: $1,500
- Macie: $300
- Detective: $500
- Total: $2,900/month
ROI Calculation:
Cost: $2,900/month = $34,800/year
Average breach cost: $4.45 million
Prevented breaches per year: < 1% still positive ROI
Tip 1: Tune GuardDuty with Trusted IPs and Threat Lists Reduce false positives by whitelisting known IPs (corporate VPNs, partners) and adding custom threat lists.
# Add trusted IP list
guardduty.create_ip_set(
DetectorId=detector_id,
Name='trusted-ips',
Format='TXT',
Location='s3://security-config/trusted-ips.txt',
Activate=True
)
# trusted-ips.txt content:
# 203.0.113.0/24 # Corporate VPN
# 198.51.100.0/24 # Partner networkTip 2: Prioritize Critical Findings Configure Security Hub insights to surface critical findings first—don't drown in low-severity alerts.
Tip 3: Enable Cross-Region Aggregation Use Security Hub aggregation region to consolidate findings from all regions in single view.
# Enable aggregation in us-east-1
securityhub.create_finding_aggregator(
RegionLinkingMode='ALL_REGIONS'
)Tip 4: Implement Finding Suppression Rules Suppress known false positives to reduce noise and focus on real threats.
# Suppress development environment findings
securityhub.create_insight(
Name='Exclude Dev Findings',
Filters={
'ResourceTags': [{'Key': 'Environment', 'Value': 'Development'}]
},
GroupByAttribute='ResourceId'
)
# Update workflow status to suppressed
securityhub.batch_update_findings(
FindingIdentifiers=finding_ids,
Workflow={'Status': 'SUPPRESSED'},
Note={'Text': 'Development environment - expected behavior'}
)Tip 5: Schedule Inspector Scans Strategically Run vulnerability scans during maintenance windows to reduce impact on production workloads.
Tip 6: Use Step Functions for Complex Workflows Orchestrate multi-step incident response with AWS Step Functions for reliability and visibility.
Tip 7: Implement Gradual Rollout for Auto-Remediation Start with notifications only, then enable auto-remediation for low-risk findings, gradually expanding scope.
Phase 1: Monitor Only (Week 1-2)
- GuardDuty findings → SNS notifications
- Review findings manually
- Understand baseline
Phase 2: Auto-Remediate Low-Risk (Week 3-4)
- Public S3 buckets → Auto-block
- Unrestricted security groups → Auto-remove
- Missing encryption → Auto-enable
Phase 3: Auto-Remediate Medium-Risk (Month 2)
- Outdated packages → Auto-patch
- Configuration drift → Auto-correct
Phase 4: Full Automation (Month 3+)
- Compromised instances → Auto-isolate
- Suspicious activity → Auto-investigate
Tip 8: Tag Resources for Automated Response
Use tags to control automated remediation behavior (e.g., AutoRemediate=true, CriticalityLevel=high).
Tip 9: Integrate with ChatOps Send security alerts to Slack/Teams for faster team response and collaboration.
# Send to Slack
import json
import urllib.request
def send_to_slack(finding):
webhook_url = 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
message = {
'text': f"🚨 Security Alert",
'attachments': [{
'color': 'danger',
'title': finding['Title'],
'fields': [
{'title': 'Severity', 'value': finding['Severity']['Label'], 'short': True},
{'title': 'Resource', 'value': finding['Resources'][0]['Id'], 'short': False}
]
}]
}
req = urllib.request.Request(
webhook_url,
data=json.dumps(message).encode('utf-8'),
headers={'Content-Type': 'application/json'}
)
urllib.request.urlopen(req)Tip 10: Maintain Runbooks for Manual Incidents Not everything can be automated—maintain documented procedures for complex incidents requiring human judgment.
Problem: Security teams overwhelmed by thousands of low-priority findings, miss critical threats.
Why It Happens:
- All security services enabled without tuning
- No finding suppression or prioritization
- Development and production treated equally
- No baseline of normal behavior
Impact:
- Critical findings buried in noise
- Team stops reviewing alerts
- Real threats missed
- Increased response time
- Burnout and turnover
Example:
Security Hub Dashboard:
- 5,234 findings total
- 4,800 LOW severity (development resources)
- 350 MEDIUM severity (minor misconfigurations)
- 80 HIGH severity (requires attention)
- 4 CRITICAL severity (LOST in the noise!)
Result: Critical findings not addressed for days
Remedy:
Step 1: Implement Finding Hierarchy
# Prioritize findings by severity AND resource criticality
def calculate_priority_score(finding):
"""Calculate priority score (0-100)"""
severity_scores = {
'CRITICAL': 40,
'HIGH': 30,
'MEDIUM': 20,
'LOW': 10
}
# Base score from severity
score = severity_scores.get(finding['Severity']['Label'], 0)
# Increase score for production resources
tags = finding['Resources'][0].get('Tags', {})
if tags.get('Environment') == 'Production':
score += 30
# Increase score for internet-facing resources
if 'public' in finding['Title'].lower():
score += 20
# Decrease score for development
if tags.get('Environment') == 'Development':
score -= 20
return min(100, max(0, score))
# Filter findings
high_priority = [f for f in findings if calculate_priority_score(f) >= 70]
print(f"High priority findings: {len(high_priority)}")
# Result: 15 findings (manageable)Step 2: Suppress Development Environment Findings
# Auto-suppress development findings
def suppress_dev_findings():
"""Suppress findings from development resources"""
response = securityhub.get_findings(
Filters={
'ResourceTags': [{
'Key': 'Environment',
'Value': 'Development',
'Comparison': 'EQUALS'
}],
'WorkflowStatus': [{
'Value': 'NEW',
'Comparison': 'EQUALS'
}]
}
)
finding_ids = [
{'Id': f['Id'], 'ProductArn': f['ProductArn']}
for f in response['Findings']
]
if finding_ids:
securityhub.batch_update_findings(
FindingIdentifiers=finding_ids,
Workflow={'Status': 'SUPPRESSED'},
Note={'Text': 'Auto-suppressed: Development environment'}
)
print(f"Suppressed {len(finding_ids)} development findings")
# Run daily
suppress_dev_findings()Step 3: Create Focused Insights
# Create insight for actionable findings only
securityhub.create_insight(
Name='Actionable Production Findings',
Filters={
'SeverityLabel': [
{'Value': 'CRITICAL', 'Comparison': 'EQUALS'},
{'Value': 'HIGH', 'Comparison': 'EQUALS'}
],
'WorkflowStatus': [
{'Value': 'NEW', 'Comparison': 'EQUALS'}
],
'ResourceTags': [
{'Key': 'Environment', 'Value': 'Production', 'Comparison': 'EQUALS'}
]
},
GroupByAttribute='ResourceId'
)
# Result: Dashboard shows only critical production findingsPrevention:
- Implement finding prioritization from day one
- Separate development and production alerts
- Regularly review and tune suppression rules
- Use insights to create focused views
- Monitor alert volume trends
- Set team SLAs (critical within 1 hour, high within 24 hours)
Problem: Legitimate activities flagged as threats, wasting investigation time.
Why It Happens:
- GuardDuty doesn't know your normal behavior
- No trusted IP lists configured
- Scheduled jobs flagged as anomalies
- Development testing triggers alerts
Impact:
- Investigation time wasted
- Real threats overlooked
- Team loses trust in alerts
- Delays in incident response
Example:
GuardDuty Finding: UnauthorizedAccess:IAMUser/TorIPCaller
Description: API call from Tor exit node
User: security-scanner-bot
Investigation: This is our authorized penetration testing tool
Result: 30 minutes wasted investigating false positive
Frequency: Daily (automated security scans)
Annual waste: 180 hours (4.5 weeks)
Remedy:
Step 1: Build Baseline During Initial Deployment
# Learning period: First 2 weeks
# Monitor findings without automation
# Document legitimate activities
baseline_activities = {
'security_scanning': {
'users': ['security-scanner-bot'],
'expected_findings': ['UnauthorizedAccess:IAMUser/TorIPCaller'],
'schedule': 'Daily 02:00 UTC',
'duration': '30 minutes'
},
'data_analytics': {
'instances': ['i-analytics-123'],
'expected_findings': ['Recon:EC2/PortProbeUnprotectedPort'],
'reason': 'Legitimate port scanning for network discovery'
}
}
# Document and whitelistStep 2: Configure Trusted IPs and Threat Lists
# Add trusted IP sets
trusted_ips = """
203.0.113.0/24 # Corporate VPN
198.51.100.0/24 # Partner network
192.0.2.0/24 # Security scanning tools
"""
# Upload to S3
s3.put_object(
Bucket='security-config',
Key='trusted-ips.txt',
Body=trusted_ips.encode('utf-8')
)
# Create IP set in GuardDuty
guardduty.create_ip_set(
DetectorId=detector_id,
Name='trusted-corporate-ips',
Format='TXT',
Location='s3://security-config/trusted-ips.txt',
Activate=True
)
# GuardDuty won't alert on traffic from these IPsStep 3: Implement Smart Filtering
def is_false_positive(finding):
"""Check if finding is known false positive"""
finding_type = finding['Type']
resource = finding['Resources'][0]['Id']
# Check against known patterns
false_positive_patterns = [
{
'type': 'UnauthorizedAccess:IAMUser/TorIPCaller',
'user': 'security-scanner-bot',
'reason': 'Authorized security scanning'
},
{
'type': 'Recon:EC2/PortProbeUnprotectedPort',
'tag': 'Purpose=NetworkDiscovery',
'reason': 'Analytics workload'
}
]
for pattern in false_positive_patterns:
if pattern['type'] == finding_type:
# Check additional criteria
if 'user' in pattern:
if pattern['user'] in finding['Title']:
return True, pattern['reason']
return False, None
# Filter findings
for finding in findings:
is_fp, reason = is_false_positive(finding)
if is_fp:
# Suppress finding
securityhub.batch_update_findings(
FindingIdentifiers=[
{'Id': finding['Id'], 'ProductArn': finding['ProductArn']}
],
Workflow={'Status': 'SUPPRESSED'},
Note={'Text': f'Known false positive: {reason}'}
)Prevention:
- Build baseline before enabling automation
- Document legitimate activities
- Configure trusted IPs and threat lists
- Regularly review suppressed findings
- Update baseline as environment changes
- Communicate with teams about expected activities
Problem: Security services can't scan resources without required agents, leaving blind spots.
Why It Happens:
- SSM Agent not installed on EC2 instances
- GuardDuty not enabled in all regions
- Inspector can't scan instances without agent
- CloudTrail not logging all API calls
Impact:
- Vulnerabilities undetected
- Compromises not identified
- Incomplete security posture
- Compliance failures
- False sense of security
Example:
Security Hub Score: 95% (excellent!)
Reality: Only 30% of instances have SSM Agent
Inspector scanning 30% of resources
70% of instances never scanned for vulnerabilities
Actual risk: HIGH (unmonitored resources)
Remedy:
Step 1: Audit Coverage
def audit_security_coverage():
"""Assess security service coverage"""
ec2 = boto3.client('ec2')
ssm = boto3.client('ssm')
# Get all instances
instances = ec2.describe_instances()
total_instances = 0
instance_ids = []
for reservation in instances['Reservations']:
for instance in reservation['Instances']:
if instance['State']['Name'] == 'running':
total_instances += 1
instance_ids.append(instance['InstanceId'])
# Check SSM Agent coverage
managed_instances = ssm.describe_instance_information()
ssm_instance_ids = [
i['InstanceId'] for i in managed_instances['InstanceInformationList']
]
covered = len(ssm_instance_ids)
coverage_pct = (covered / total_instances * 100) if total_instances > 0 else 0
print(f"Total instances: {total_instances}")
print(f"SSM managed: {covered}")
print(f"Coverage: {coverage_pct:.1f}%")
# Identify uncovered instances
uncovered = set(instance_ids) - set(ssm_instance_ids)
if uncovered:
print(f"\nUncovered instances: {len(uncovered)}")
for instance_id in list(uncovered)[:10]:
print(f" - {instance_id}")
return {
'total': total_instances,
'covered': covered,
'coverage_pct': coverage_pct,
'uncovered': list(uncovered)
}
coverage = audit_security_coverage()Step 2: Automated Agent Installation
# Use Systems Manager State Manager to ensure agent installed
ssm.create_association(
Name='AWS-ConfigureAWSPackage',
Parameters={
'action': ['Install'],
'name': ['AmazonSSMAgent']
},
Targets=[
{
'Key': 'InstanceIds',
'Values': ['*'] # All instances
}
],
ScheduleExpression='rate(30 minutes)' # Check every 30 min
)
# Alternative: User data script for new instances
user_data = """#!/bin/bash
# Install SSM Agent on launch
yum install -y amazon-ssm-agent
systemctl enable amazon-ssm-agent
systemctl start amazon-ssm-agent
"""Step 3: Enable Multi-Region Services
def enable_guardduty_all_regions():
"""Enable GuardDuty in all active regions"""
ec2 = boto3.client('ec2')
# Get all regions
regions = ec2.describe_regions()['Regions']
for region in regions:
region_name = region['RegionName']
try:
# Create regional GuardDuty client
regional_guardduty = boto3.client('guardduty', region_name=region_name)
# Check if enabled
detectors = regional_guardduty.list_detectors()
if not detectors['DetectorIds']:
# Enable GuardDuty
regional_guardduty.create_detector(Enable=True)
print(f"Enabled GuardDuty in {region_name}")
else:
print(f"GuardDuty already enabled in {region_name}")
except Exception as e:
print(f"Error in {region_name}: {e}")
enable_guardduty_all_regions()Step 4: Implement Coverage Monitoring
# CloudWatch metric for coverage
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_data(
Namespace='SecurityCoverage',
MetricData=[
{
'MetricName': 'SSMAgentCoverage',
'Value': coverage['coverage_pct'],
'Unit': 'Percent'
}
]
)
# Alarm if coverage drops below 90%
cloudwatch.put_metric_alarm(
AlarmName='LowSecurityCoverage',
MetricName='SSMAgentCoverage',
Namespace='SecurityCoverage',
Statistic='Average',
Period=3600,
EvaluationPeriods=1,
Threshold=90,
ComparisonOperator='LessThanThreshold',
AlarmActions=['arn:aws:sns:region:account:security-alerts']
)Prevention:
- Audit coverage before claiming compliance
- Automate agent installation
- Enable services in all active regions
- Monitor coverage continuously
- Include agent installation in AMI builds
- Set organizational policies requiring agents
Problem: Automated remediation unexpectedly expensive or disruptive.
Why It Happens:
- Auto-terminating instances without approval
- Creating snapshots without lifecycle policies
- Enabling expensive services automatically
- No cost guardrails on remediation
Impact:
- Unexpected AWS bills (thousands/month)
- Service disruptions
- Data loss from aggressive remediation
- Team resistance to automation
Example:
Automated Remediation: Create forensic snapshots
Finding: 50 compromised instances detected (false positive)
Action: Created 200 snapshots (4 volumes each × 50 instances)
Snapshot size: 100 GB each
Cost: 200 × 100 GB × $0.05/GB-month = $1,000/month
Duration: Snapshots retained indefinitely
Annual cost: $12,000 (unplanned)
Remedy:
Step 1: Implement Cost Guardrails
# Check costs before remediation
def can_afford_remediation(action_type, resource_count):
"""Check if remediation within budget"""
cost_limits = {
'snapshot': 100, # Max $100/month
'terminate': 10, # Max 10 instances/day
'isolate': 50 # Max 50 instances/day
}
estimated_costs = {
'snapshot': resource_count * 100 * 0.05, # 100GB @ $0.05/GB
'terminate': 0, # Reduces cost
'isolate': 0 # No direct cost
}
estimated_cost = estimated_costs.get(action_type, 0)
limit = cost_limits.get(action_type, 1000)
if estimated_cost > limit:
print(f"Cost ${estimated_cost} exceeds limit ${limit}")
return False
return True
# Before remediation
if can_afford_remediation('snapshot', 50):
create_snapshots()
else:
notify_team_manual_action_required()Step 2: Implement Approval Workflow for High-Impact Actions
# Require approval for expensive remediation
def require_approval(finding, action):
"""Send approval request for high-impact actions"""
high_impact_actions = ['terminate_instance', 'create_snapshot']
if action in high_impact_actions:
# Create approval task
message = f"""
Remediation Approval Required
Finding: {finding['Title']}
Resource: {finding['Resources'][0]['Id']}
Action: {action}
Estimated Cost: $100
Approve: https://console.aws.amazon.com/approve/{finding['Id']}
Deny: https://console.aws.amazon.com/deny/{finding['Id']}
"""
sns.publish(
TopicArn='arn:aws:sns:region:account:security-approvals',
Subject='Remediation Approval Required',
Message=message
)
# Wait for approval (Step Functions)
# Don't remediate until approved
return False
return True # Auto-approve low-impact actionsStep 3: Implement Snapshot Lifecycle
# Auto-delete forensic snapshots after 30 days
def create_snapshot_with_lifecycle(volume_id, instance_id):
"""Create snapshot with automatic deletion"""
# Create snapshot
snapshot = ec2.create_snapshot(
VolumeId=volume_id,
Description=f'Forensic snapshot for {instance_id}',
TagSpecifications=[{
'ResourceType': 'snapshot',
'Tags': [
{'Key': 'Purpose', 'Value': 'Forensics'},
{'Key': 'DeleteAfter', 'Value': (datetime.now() + timedelta(days=30)).isoformat()}
]
}]
)
snapshot_id = snapshot['SnapshotId']
# Schedule deletion using Lambda + EventBridge
# (Simplified - use DLM in production)
return snapshot_id
# Daily cleanup job
def cleanup_old_forensic_snapshots():
"""Delete snapshots past retention period"""
snapshots = ec2.describe_snapshots(
Filters=[
{'Name': 'tag:Purpose', 'Values': ['Forensics']}
]
)
for snapshot in snapshots['Snapshots']:
tags = {tag['Key']: tag['Value'] for tag in snapshot.get('Tags', [])}
delete_after = tags.get('DeleteAfter')
if delete_after and datetime.fromisoformat(delete_after) < datetime.now():
ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])
print(f"Deleted expired snapshot {snapshot['SnapshotId']}")Prevention:
- Estimate costs before automation
- Implement approval workflows
- Set budget alerts
- Use lifecycle policies
- Test automation in non-production first
- Monitor remediation costs monthly
AWS Security Services provide intelligent, automated threat detection and continuous compliance monitoring that scales with cloud infrastructure. GuardDuty analyzes billions of events to detect threats, Security Hub aggregates findings into unified dashboards, Inspector scans for vulnerabilities, Macie discovers sensitive data, and Detective investigates root causes. Together, these services transform security from reactive to proactive, detecting threats in minutes instead of months.
Key Takeaways:
- Enable GuardDuty in All Accounts: Real-time threat detection using machine learning; analyzes VPC Flow Logs, CloudTrail, DNS logs for threats like crypto-mining, compromised instances, reconnaissance
- Aggregate with Security Hub: Centralized security posture across accounts; compliance framework checks (CIS, PCI DSS); normalize findings from multiple sources for unified view
- Automate Incident Response: EventBridge + Lambda for automated remediation; isolate compromised instances automatically; reduce response time from hours to seconds
- Scan Continuously with Inspector: Detect vulnerabilities (CVEs) in EC2, containers, Lambda; network exposure analysis; prioritize critical findings requiring immediate patches
- Discover Sensitive Data with Macie: Machine learning identifies PII, financial data, credentials in S3; prevent data exposure; compliance with GDPR, HIPAA, PCI DSS
- Investigate with Detective: Visual analysis of security incidents; correlate events across services; determine root cause and attack timeline; reduce investigation time 90%
- Tune for Your Environment: Configure trusted IPs; suppress development findings; implement finding prioritization; reduce alert fatigue; focus on actionable threats
AWS Security Services integrate with existing architecture (VPC, IAM, CloudTrail) while adding threat intelligence and automation. Next chapter covers AWS compliance and governance services—Config for configuration compliance, CloudTrail for audit logging, and Organizations for multi-account management.
Objective: Build complete security monitoring and automated response system.
Scenario: Detect and respond to compromised EC2 instance automatically.
Prerequisites:
- AWS account with admin access
- Running EC2 instance for testing
- SNS email subscription configured
Steps:
- Enable Security Services (30 minutes)
- Enable GuardDuty in all regions
- Enable Security Hub with CIS Benchmark
- Enable Inspector for EC2 scanning
- Enable Macie for S3 bucket analysis
- Verify all services active
- Configure Automated Response (45 minutes)
- Create EventBridge rule for high-severity GuardDuty findings
- Deploy Lambda function for instance isolation
- Configure SNS notifications
- Test with sample GuardDuty finding
- Verify automation works
- Implement Compliance Monitoring (30 minutes)
- Review Security Hub compliance dashboard
- Identify failed CIS checks
- Create remediation Lambda for S3 public buckets
- Enable automated remediation
- Verify compliance score improves
- Vulnerability Scanning (20 minutes)
- Ensure SSM Agent on test instance
- Run Inspector assessment
- Review vulnerability findings
- Create patching automation with Systems Manager
- Verify vulnerabilities remediated
- Security Dashboard (15 minutes)
- Create Security Hub custom insight
- Configure CloudWatch dashboard with metrics
- Set up SNS alerts for critical findings
- Document SOC procedures
Expected Outcomes:
- Complete security monitoring active
- Compromised instances isolated automatically within 60 seconds
- Compliance score >85%
- All critical vulnerabilities identified
- Automated remediation reducing manual work 90%
- Total setup cost: <$50/month
- What data sources does GuardDuty analyze? a) CloudTrail only b) VPC Flow Logs only c) VPC Flow Logs, CloudTrail, DNS logs, EKS audit logs ✓ d) S3 access logs
Answer: C - GuardDuty analyzes VPC Flow Logs, CloudTrail events, DNS logs, and EKS audit logs for comprehensive threat detection
- What is the purpose of AWS Security Hub? a) Detect threats b) Aggregate findings from multiple services ✓ c) Scan for vulnerabilities d) Discover sensitive data
Answer: B - Security Hub aggregates and normalizes findings from GuardDuty, Inspector, Macie, and other services
- Which severity requires immediate response? a) LOW b) MEDIUM c) HIGH d) CRITICAL ✓
Answer: D - CRITICAL findings indicate confirmed compromise or severe threat requiring emergency response
- What does Amazon Inspector scan for? a) Network threats b) Package vulnerabilities (CVEs) and network exposure ✓ c) Sensitive data d) API misuse
Answer: B - Inspector scans for CVE vulnerabilities in packages and analyzes network reachability
- What is required for Inspector to scan EC2 instances? a) GuardDuty enabled b) SSM Agent installed ✓ c) Public IP address d) CloudWatch agent
Answer: B - SSM Agent required for Inspector to scan EC2 instances for vulnerabilities
- What does Amazon Macie discover? a) Network threats b) Vulnerabilities c) Sensitive data (PII, financial info, credentials) ✓ d) Configuration issues
Answer: C - Macie uses machine learning to discover and protect sensitive data like PII, financial information, credentials
- What is AWS Detective used for? a) Threat detection b) Root cause analysis and investigation ✓ c) Vulnerability scanning d) Compliance checks
Answer: B - Detective provides visual analysis for investigating security incidents and determining root causes
- What is the AWS Security Finding Format (ASFF)? a) Encryption algorithm b) Standardized finding format across services ✓ c) Compliance framework d) IAM policy format
Answer: B - ASFF is standardized JSON format for security findings, enabling consistent integration and automation
- What is a recommended response to compromised instance finding? a) Delete immediately b) Ignore if development c) Isolate, snapshot, investigate ✓ d) Reboot instance
Answer: C - Best practice: isolate instance, create forensic snapshots, investigate root cause before termination
- How can you reduce GuardDuty false positives? a) Disable GuardDuty b) Configure trusted IP lists ✓ c) Ignore all findings d) Only enable in production
Answer: B - Configure trusted IP lists for known legitimate sources (corporate VPNs, partners) to reduce false positives
- What is Security Hub's security score based on? a) Number of resources b) Passed checks / Total checks ✓ c) Number of findings d) AWS spending
Answer: B - Security score calculated as: (Passed compliance checks / Total checks) × 100
- What is the purpose of finding suppression? a) Delete findings permanently b) Reduce noise from known acceptable issues ✓ c) Hide security problems d) Save costs
Answer: B - Suppression reduces alert noise by marking known acceptable issues (e.g., development environments, expected behavior)
- What does EventBridge enable for security services? a) Encryption b) Automated response to findings ✓ c) Cost reduction d) Compliance reporting
Answer: B - EventBridge routes security findings to Lambda for automated remediation and response
- What is the recommended approach for automated remediation? a) Enable all automation immediately b) Start with monitoring, gradually enable automation ✓ c) Only automate in development d) Never automate security
Answer: B - Best practice: start with monitoring only, understand baseline, gradually enable automation for low-risk then high-risk actions
- What is the typical cost range for GuardDuty in medium environment? a) $10-50/month b) $100-200/month ✓ c) $1,000-5,000/month d) Free
Answer: B - Medium environment (500 resources) typically costs $100-200/month for GuardDuty