Complete guide for operators, developers, and consumers of Version Guard.
- For Service Owners
- For Platform Operators
- API Reference
- Service Attribution
- Configuration
- Troubleshooting
- Operational Runbooks
# Get fleet-wide summary
./bin/version-guard-cli fleet summary
# With filters
./bin/version-guard-cli fleet summary --cloud-provider=aws --resource-type=aurora
# List all findings
./bin/version-guard-cli finding list
# Filter by status
./bin/version-guard-cli finding list --status=red
# Filter by service
./bin/version-guard-cli finding list --service=my-service
# Export findings to CSV
./bin/version-guard-cli finding export --output=findings.csv
Each finding includes:
- Resource: Which database/cache/cluster
- Current Version: What's running now
- Status: RED/YELLOW/GREEN
- Message: What's wrong
- Recommendation: What to do
- EOL Date: When version support ends
Examples:
# Aurora
Resource: arn:aws:rds:us-east-1:123456:cluster:my-db
Current Version: aurora-mysql 5.6.10a
Status: RED
Message: Version is past End-of-Life (EOL since Nov 2024)
Recommendation: Upgrade to aurora-mysql 8.0.35 immediately
EOL Date: 2024-11-01
# EKS
Resource: arn:aws:eks:us-west-2:123456:cluster/my-cluster
Current Version: k8s-1.27
Status: YELLOW
Message: Version in extended support (6x cost), ends 2025-11-24
Recommendation: Upgrade to k8s-1.31 to exit extended support
EOL Date: 2025-11-24
Action: Upgrade immediately
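- Aurora MySQL 5.6 → 8.0:
  # Via AWS Console or CLI
  # Major version upgrades require --allow-major-version-upgrade. Confirm the exact
  # target version string with: aws rds describe-db-engine-versions --engine aurora-mysql
  aws rds modify-db-cluster \
    --db-cluster-identifier my-db \
    --engine-version 8.0.35 \
    --allow-major-version-upgrade \
    --apply-immediately
- ElastiCache Redis 4.x → 7.x:
  aws elasticache modify-replication-group \
    --replication-group-id my-cache \
    --engine-version 7.0 \
    --apply-immediately
- EKS Kubernetes 1.27 → 1.31:
  # EKS upgrades one minor version at a time: repeat for 1.28, 1.29, 1.30, then 1.31
  aws eks update-cluster-version \
    --name my-cluster \
    --kubernetes-version 1.28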
Action: Schedule upgrade within 90 days
- Review upgrade path
- Test in staging environment
- Schedule maintenance window
- Upgrade during low-traffic period
- Wiz Access (optional; mock inventory can be used instead):
  - Client ID and Client Secret credentials
  - Access to saved reports for Aurora, ElastiCache, EKS
- Temporal Cluster:
  - Temporal server running locally or remote
  - Namespace configured for Version Guard
- AWS Credentials (for EOL API access):
  - IAM permissions for RDS and EKS describe operations
# Build the server
make build
# Set environment variables
export TEMPORAL_ENDPOINT=localhost:7233
export TEMPORAL_NAMESPACE=version-guard-dev
export AWS_REGION=us-west-2
export GRPC_PORT=8080
# Optional: Configure Wiz (otherwise uses mock data)
export WIZ_CLIENT_ID_SECRET=your-client-id
export WIZ_CLIENT_SECRET_SECRET=your-client-secret
export WIZ_AURORA_REPORT_ID=your-report-id
export WIZ_ELASTICACHE_REPORT_ID=your-report-id
export WIZ_EKS_REPORT_ID=your-report-id
# Optional: Configure S3 snapshots
export S3_BUCKET=version-guard-snapshots
export S3_PREFIX=snapshots/
# Run the server
./bin/version-guard
# Build Docker image
make docker-build
# Run container
docker run -p 8080:8080 \
-e TEMPORAL_ENDPOINT=host.docker.internal:7233 \
-e TEMPORAL_NAMESPACE=version-guard-dev \
-e AWS_REGION=us-west-2 \
version-guard:latest
The orchestrator workflow automatically triggers detection workflows for all resource types (Aurora, ElastiCache, EKS) on a schedule.
Manual Trigger via Temporal UI:
- Navigate to your Temporal UI (e.g., http://localhost:8233)
- Start workflow: VersionGuardOrchestratorWorkflow
- Input: {}
- Monitor workflow execution
Monitor Progress:
# Check temporal workflows
temporal workflow list --namespace version-guard-dev
# View workflow details
temporal workflow describe --workflow-id <workflow-id> --namespace version-guard-dev
Version Guard emits the following metrics (if Datadog enabled):
- version_guard.findings.red - Critical issues count
- version_guard.findings.yellow - Warning issues count
- version_guard.findings.total - Total resources scanned
- version_guard.compliance_percentage - Fleet compliance %
- version_guard.detection.duration_ms - Scan duration
- version_guard.inventory.fetch - Inventory fetch success rate
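This guide does not spell out how the compliance percentage is derived from the status counts. The sketch below assumes the simplest reading, GREEN resources divided by all classified resources; the `Counts` type, the `compliancePercentage` function, and the formula itself are assumptions to check against the actual implementation.

```go
package main

import "fmt"

// Counts holds per-status totals, in the spirit of the findings metrics
// (hypothetical type for this sketch).
type Counts struct {
	Red, Yellow, Green int
}

// compliancePercentage returns the share of GREEN resources on a 0-100 scale.
// Assumption: compliance = green / (red + yellow + green).
// Assumption: an empty fleet reports 0 rather than 100.
func compliancePercentage(c Counts) float64 {
	total := c.Red + c.Yellow + c.Green
	if total == 0 {
		return 0
	}
	return 100 * float64(c.Green) / float64(total)
}

func main() {
	c := Counts{Red: 3, Yellow: 7, Green: 90}
	fmt.Printf("compliance: %.1f%%\n", compliancePercentage(c))
}
```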
# If running directly
# Logs output to stdout/stderr
# If running in Docker
docker logs <container-id> -f
# If running in Kubernetes
kubectl logs -n version-guard deployment/version-guard -f
Temporal Workers: Increase the number of Temporal worker replicas for higher throughput.
Detection Frequency: Configure the orchestrator workflow schedule via Temporal schedules or cron triggers.
Default Endpoint: localhost:8080
Get compliance score for a specific service.
Request:
message GetServiceScoreRequest {
string service = 1; // Required: service name
ResourceType resource_type = 2; // Optional: filter by resource type
CloudProvider cloud_provider = 3; // Optional: filter by cloud
}
Response:
message GetServiceScoreResponse {
string service = 1;
ComplianceGrade grade = 2; // BRONZE/SILVER/GOLD
int32 total_resources = 3;
int32 red_count = 4;
int32 yellow_count = 5;
int32 green_count = 6;
float compliance_percentage = 7;
}
Example (grpcurl):
grpcurl \
-plaintext \
-d '{"service": "payments"}' \
localhost:8080 \
block.versionguard.service.VersionGuard/GetServiceScore
List all findings with filters.
Request:
message ListFindingsRequest {
CloudProvider cloud_provider = 1; // Optional: filter by cloud (AWS/GCP/AZURE)
ResourceType resource_type = 2; // Optional: filter by type (AURORA/ELASTICACHE/etc)
string service = 3; // Optional: filter by service name
Status status = 4; // Optional: filter by status (RED/YELLOW/GREEN)
string brand = 5; // Optional: filter by brand
string cloud_account_id = 6; // Optional: filter by AWS account/GCP project
string cloud_region = 7; // Optional: filter by region (us-east-1/us-west-2)
int32 limit = 8; // Optional: max results to return
}
Response:
message ListFindingsResponse {
repeated Finding findings = 1;
int32 total_count = 2;
}
Example:
# Get all RED findings for payments service
grpcurl \
-plaintext \
-d '{"service": "payments", "status": "RED"}' \
localhost:8080 \
block.versionguard.service.VersionGuard/ListFindings
Get aggregate statistics across the fleet.
Request:
message GetFleetSummaryRequest {
CloudProvider cloud_provider = 1; // Optional
ResourceType resource_type = 2; // Optional
}
Response:
message GetFleetSummaryResponse {
int32 total_resources = 1;
int32 red_count = 2;
int32 yellow_count = 3;
int32 green_count = 4;
int32 unknown_count = 5;
float compliance_percentage = 6;
google.protobuf.Timestamp last_scan = 7;
map<string, int32> by_service = 8; // Resources grouped by service
map<string, int32> by_brand = 9; // Resources grouped by brand
map<string, int32> by_cloud_provider = 10; // Resources by cloud (aws/gcp/azure)
}
Example:
# Get AWS Aurora fleet summary
grpcurl \
-plaintext \
-d '{"cloud_provider": "AWS", "resource_type": "AURORA"}' \
localhost:8080 \
block.versionguard.service.VersionGuard/GetFleetSummary
Version Guard attributes infrastructure resources to services using a 3-tier fallback approach to ensure accurate ownership mapping even when resources are poorly tagged.
Priority Order (highest to lowest):
- Resource Tags (Primary, Fastest)
  - Checks AWS tags (configurable via the TAG_APP_KEYS environment variable)
  - Default tag keys: app, application, service (tried in order)
  - Customize to match your organization's tagging conventions
  - Speed: Instant (already in CSV data)
  - Accuracy: Depends on tagging discipline
- Registry Lookup (Fallback, Optional)
  - Maps Cloud Account ID + Region → Service Name
  - Speed: ~100ms per resource (if configured)
  - Accuracy: Authoritative (from your service registry)
  - Enabled: Only if registry client implemented and configured
- Resource Name Parsing (Last Resort)
  - Extracts service from cluster name (e.g., payments-prod-cluster-1 → payments)
  - Speed: Instant (regex parsing)
  - Accuracy: Best effort
Aurora Cluster: arn:aws:rds:us-east-1:123456789012:cluster:untagged-db
Step 1: Check tags
→ Tags: {} (empty)
→ Result: ❌ No service found
Step 2: Registry lookup (if configured)
→ AWS Account: 123456789012
→ Region: us-east-1
→ Registry returns: "payments"
→ Result: ✅ Service = "payments"
(If registry not configured or fails)
Step 3: Parse name
→ Cluster name: "untagged-db"
→ Parse result: "untagged"
→ Result: ⚠️ Service = "untagged" (may be inaccurate)
Version Guard defines a registry.Client interface that you can implement:
// pkg/registry/client.go
type Client interface {
GetServiceByCloudAccount(ctx context.Context, cloudAccountID, region string) (string, error)
}
Example implementation:
package myregistry
import "context"
type MyRegistryClient struct {
endpoint string
apiKey string
}
func NewClient(endpoint, apiKey string) *MyRegistryClient {
return &MyRegistryClient{
endpoint: endpoint,
apiKey: apiKey,
}
}
func (c *MyRegistryClient) GetServiceByCloudAccount(ctx context.Context, accountID, region string) (string, error) {
// Call your internal service registry API
// Return service name or error if not found
return "my-service", nil
}
- Tag your resources properly:
  # Use one of the configured tag keys (default: app, application, or service)
  aws rds add-tags-to-resource \
    --resource-name arn:aws:rds:us-east-1:123:cluster:my-db \
    --tags Key=app,Value=my-service
  # Or match your organization's custom tag convention
  # (if you've customized TAG_APP_KEYS environment variable)
  Note: If your organization uses different tag keys (e.g., team, component), configure Version Guard to match by setting the TAG_APP_KEYS environment variable.
- Ensure registry data is current (if using registry):
  - Keep cloud account mappings up-to-date
  - Update when services migrate across accounts
- Use consistent naming:
  - Include service name in cluster names: {service}-{env}-cluster-{n}
  - Example: payments-prod-cluster-1, billing-staging-cluster-2
# Temporal Configuration
TEMPORAL_ENDPOINT=localhost:7233
TEMPORAL_NAMESPACE=version-guard-dev
TEMPORAL_TASK_QUEUE=version-guard-detection
# Wiz Configuration (Optional - falls back to mock data if not provided)
WIZ_CLIENT_ID_SECRET=your-wiz-client-id-here
WIZ_CLIENT_SECRET_SECRET=your-wiz-client-secret-here
WIZ_CACHE_TTL_HOURS=1
# Wiz Saved Report IDs
WIZ_AURORA_REPORT_ID=your-aurora-report-id
WIZ_ELASTICACHE_REPORT_ID=your-elasticache-report-id
WIZ_EKS_REPORT_ID=your-eks-report-id
# AWS Configuration
AWS_REGION=us-west-2
# S3 Snapshot Storage
S3_BUCKET=version-guard-snapshots
S3_PREFIX=snapshots/
# gRPC Service
GRPC_PORT=8080
# Tag Configuration (customize AWS resource tag keys)
TAG_APP_KEYS=app,application,service
TAG_ENV_KEYS=environment,env
TAG_BRAND_KEYS=brand
# Logging
LOG_LEVEL=info
Customizing Tag Keys:
Version Guard extracts metadata from AWS resource tags to determine service ownership, environment, and brand. By default, it looks for tags like app, application, service, etc. Customize these to match your organization's tagging conventions:
# Example: Organization uses "cost-center" for business units
TAG_BRAND_KEYS=cost-center,department,business-unit
# Example: Organization uses "team" for service attribution
TAG_APP_KEYS=team,squad,component,application
# Example: Organization uses "env" exclusively
TAG_ENV_KEYS=env
Tag keys are tried in order; the first matching tag wins.
See .env.example for a complete template.
Symptom: No new findings appearing
Debug:
# Check Temporal workflow status
temporal workflow describe \
--workflow-id version-guard-orchestrator-v1 \
--namespace version-guard-dev
# Check server logs for errors
Fix:
- Ensure Temporal server is running and accessible
- Verify workflow is registered (check server startup logs)
- Check that workflow schedule exists (if using scheduled runs)
Symptom: failed to fetch Wiz report
Debug:
- Check Wiz credentials are correctly configured
- Verify report IDs are correct
- Check Wiz API status
Fix:
- If Wiz is unavailable, server will automatically fall back to mock inventory
- Verify credentials: echo $WIZ_CLIENT_ID_SECRET
- Check report ID configuration
Symptom: TooManyRequestsException in logs
Fix:
- Exponential backoff is already implemented in EOL providers
- Reduce scan frequency
- Request AWS service quota increase
Symptom: Resource exists but not detected
Debug:
- Check if resource appears in Wiz report CSV
- Verify resource tags are correctly formatted
- Check EOL provider has version data
Fix:
- Ensure resource is in Wiz report (or mock inventory if testing)
- Verify resource tags (app, service, brand)
- Check EOL provider implementation for version coverage
Example: Adding a new database type
- Create EOL Provider:
  touch pkg/eol/custom/mydb.go
  # Implement EOLProvider interface
- Create Inventory Source:
  touch pkg/inventory/wiz/mydb.go
  # Implement InventorySource interface
- Create Detector:
  mkdir pkg/detector/mydb
  touch pkg/detector/mydb/detector.go
  # Implement Detector interface
- Register in Server Main:
  // cmd/server/main.go
  eolProviders[types.ResourceTypeMyDB] = myeol.NewProvider(...)
  invSources[types.ResourceTypeMyDB] = myinv.NewSource(...)
  detectors[types.ResourceTypeMyDB] = mydb.NewDetector(...)
- Add Configuration:
  - Add to .env.example
  - Update orchestrator workflow to include new resource type
- Test & Deploy:
  make test
  make lint
  make build
  ./bin/version-guard
Version Guard provides emitter interfaces for you to implement. See ARCHITECTURE.md - Custom Emitters for details.
Quick Start:
- Implement the interface:
  package myemitter
  import (
      "context"
      "github.com/block/Version-Guard/pkg/emitters"
      "github.com/block/Version-Guard/pkg/types"
  )
  type MyEmitter struct {
      endpoint string
  }
  func (e *MyEmitter) Emit(ctx context.Context, snapshotID string, findings []*types.Finding) (*emitters.IssueTrackerResult, error) {
      // Send findings to your issue tracker, dashboard, etc.
      return &emitters.IssueTrackerResult{IssuesCreated: len(findings)}, nil
  }
- Wire it up in your workflow: Create a custom workflow that reads snapshots from S3 and calls your emitter.
Example: Adding GCP support
See ARCHITECTURE.md - Multi-Cloud Support for detailed steps.
Summary:
- Add CloudProviderGCP to enum
- Create GCP inventory sources (Wiz + GCP Asset Inventory)
- Create GCP EOL providers for CloudSQL, Memorystore, GKE
- Create GCP detectors
- Update workflow orchestrator
- Update configuration
- Test end-to-end
- Deploy
Global Flags:
--endpoint=STRING # gRPC endpoint (env: VERSION_GUARD_ENDPOINT)
-v, --verbose # Enable verbose logging
-h, --help # Show help
# Get fleet-wide summary
./bin/version-guard-cli fleet summary \
[--cloud-provider=aws|gcp|azure] \
[--resource-type=aurora|elasticache|eks] \
[--output-format=text|json|yaml]
# List findings with filters
./bin/version-guard-cli finding list \
[--service=STRING] \
[--status=red|yellow|green] \
[--resource-type=STRING] \
[--cloud-provider=STRING] \
[--limit=INT] \
[--output-format=text|json|yaml]
# Show finding details
./bin/version-guard-cli finding show <resource-id> \
[--output-format=text|json|yaml]
# Export findings to CSV
./bin/version-guard-cli finding export \
[--output=findings.csv] \
[--service=STRING] \
[--status=red|yellow|green]
Q: How often does Version Guard scan resources?
A: Depends on how you configure the orchestrator workflow schedule in Temporal.
Q: Can I force a scan immediately?
A: Yes, manually trigger the orchestrator workflow via Temporal UI or CLI.
Q: What happens if I upgrade a resource?
A: Next scan will detect the new version and auto-resolve the finding.
Q: Does Version Guard automatically upgrade resources?
A: No, Version Guard only detects and reports. You must upgrade manually.
Q: What if my resource version isn't in the EOL database?
A: Finding will show status UNKNOWN. You can extend the EOL provider to add version data.
Q: How do I add a new resource type?
A: See Runbook 1 above.
For more information:
- Architecture: See ARCHITECTURE.md
- GitHub Issues: https://github.com/block/Version-Guard/issues
- Contributing: See CONTRIBUTING.md