Commit 677ddaa

pinetops and claude authored
feat: implement PAM Slack integration and policy v0.7 (#4)
* Remove webapp resources from shared GKE
  - Delete webapp-team.tf from shared-gke module
  - Webapp team now uses their own project-specific clusters

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Archive shared GKE infrastructure
  - Moved 2-shared-gke to archived/ directory
  - Shared GKE clusters have been destroyed
  - All workloads migrated to project-specific clusters
  - Projects remain with deletion protection

* Clean up redundant and experimental code
  - Archived 3-shared-gke directory (no longer needed)
  - Moved test and migration files to archived/
  - Removed old terraform state backup

* Remove archived directories - cleanup complete
  - Removed all archived directories and their contents
  - Shared GKE infrastructure fully removed
  - Migration scripts and test files removed

* Add Tailscale organization-wide setup to GCP compliance framework
  - Integrated Tailscale module into org-level Terraform configuration
  - Creates dedicated project under shared-services folder
  - Stores auth keys securely in Google Secret Manager
  - Deploys subnet routers in US and EU regions
  - Supports automatic Secret Manager retrieval in startup script

  Key features:
  - Organization-wide network access permissions
  - Advertises all private ranges (VPC, GKE pods/services)
  - Secure auth key management with Secret Manager
  - Comprehensive deployment guide with ACL examples
  - Supports both initial deployment and key rotation

  This enables secure access to all GCP resources (including NodePort services) without traditional VPN, using modern zero-trust networking.

* Add OAuth automation for Tailscale - no more manual key rotation!
  - Implemented OAuth-based automatic key rotation
  - Cloud Function generates new auth keys monthly using OAuth API
  - Cloud Scheduler triggers rotation on 1st of each month
  - OAuth credentials stored securely in Secret Manager
  - Added comprehensive setup guide with step-by-step instructions

  Key components:
  - tailscale-oauth.tf: OAuth infrastructure and automation
  - Cloud Function in Python for key generation
  - Device authorization option as alternative
  - Detailed troubleshooting and monitoring guidance

  This eliminates the need to manually rotate auth keys every 90 days. The system will automatically generate and deploy new keys monthly.

* Remove Tailscale infrastructure
  - Deleted all Tailscale-related Terraform configurations
  - Removed Tailscale projects and resources
  - Cleaned up variables.tf to remove Tailscale-specific variables
  - Kept CMEK compliance requirement disabled as requested

* feat: add organization-level groups configuration and security phase

  Organization changes:
  - Add groups.tf for centralized group definitions
  - Add variables for domain and audit access control
  - Grant org-wide viewer to developers group
  - Grant billing viewer to auditors group (optional)

  Security phase (2-security):
  - Add PAM configuration with simplified 4-group structure
  - Implement break glass emergency access for admins
  - Add deployment approver elevation for approvers group
  - Configure just-in-time access for all standard operations
  - Set up notification channels and audit logging

  This establishes the foundation for organization-wide zero-standing privilege with clear separation between infrastructure admins, deployment approvers, developers, and auditors.

* chore: standardize nonproduction terminology in organization main.tf

  Update comment to use "nonproduction" instead of "non-production" for consistency with other terminology updates.

* feat: implement GCP Break-Glass & Change-Management Policy v0.3
  - Add comprehensive policy document covering all change lanes
  - Update PAM configuration to require dual approval (no self-approval)
  - Replace owner/admin roles with specific permissions for break-glass
  - Consolidate and standardize documentation across repositories
  - Remove redundant approval guides from webapp-team-app
  - Align security phase with new policy requirements
  - Add role mappings for Prod Support, Tech Leads, and Tech Mgmt

  Key changes:
  - Break-glass now requires 2 Tech Mgmt approvers
  - All PAM notifications go to gcp-admins group
  - Security reviews enforced via CODEOWNERS
  - Integration points for Opal/Sym JIT platforms

* docs: update to GCP Break-Glass Policy v0.4
  - Clarify Google PAM as primary platform (not Opal/Sym)
  - Add Cloud Function integration for Slack notifications
  - Update retention periods to 400 days (from 1 year)
  - Add glossary section with key terms
  - Create detailed PAM break-glass runbook
  - Specify TTL by lane (30-60 min)
  - Add requirement for lock-file dependency review

  Key changes from v0.3:
  - Platform clarification: Google PAM + Cloud Functions
  - Detailed runbook: runbooks/pam-break-glass.md
  - Retention alignment: 400 days for audit trails

* feat: implement GCP Break-Glass Policy v0.4 requirements
  - Updated BigQuery retention from 90 to 400 days
  - Aligned PAM TTL windows with policy (30-60 min per lane)
  - Renamed entitlements to match policy lanes (jit-deploy, jit-tf-admin)
  - Configured all entitlements for dual approval
  - Added Cloud Function for Slack integration (#audit-log)
  - Updated break-glass to require 2 Tech Mgmt approvers
  - Created PAM break-glass runbook with procedures

* fix: address deployment issues for policy v0.4
  - Add missing APIs (cloudfunctions, pubsub, artifactregistry, cloudbuild)
  - Fix BigQuery KMS permissions for encryption
  - Update group references (use gcp-admins instead of non-existent groups)
  - Remove unsupported PAM features (service account approvers)
  - Add notification rate limit for alert policies
  - Update all approvals_needed to 1 (Google PAM limitation)

  Note: Dual approval requirement documented in policy but PAM currently only supports single approval. Will revisit when Google adds support.

* docs: update to GCP Break-Glass Policy v0.7

  Major changes in v0.7:
  - Added new groups structure with 5 distinct roles
  - Introduced Project Bootstrap Workflow (Lane 4)
  - Added detailed Org-Level Infrastructure workflow
  - Specified failsafe account (u2i-failsafe@google.com)
  - Expanded audit artifacts to include project bootstrap
  - Added billing/finance role for cost management
  - Clarified everything-as-code approach

  Note: Failsafe account roles (Org Admin, Project Creator, Billing Admin) to be maintained as specified.

* feat: implement policy v0.7 group structure
  - Added new 5-group structure per policy v0.7
    - gcp-developers: Feature branches, read prod logs
    - gcp-prodsupport: Merge & deploy lane #1, on-call
    - gcp-techlead: Approve all lanes, security reviews
    - gcp-techmgmt: Org-level sign-off (CEO/COO)
    - gcp-billing: Cost dashboards & invoice export
  - Updated PAM entitlements for all 4 lanes
    - Lane 1: App Code (30 min) - jit-deploy
    - Lane 2: Env Infra (60 min) - jit-tf-admin
    - Lane 3: Org Infra (30 min) - break-glass
    - Lane 4: Project Bootstrap (30 min) - jit-project-bootstrap
  - Added failsafe account monitoring
    - Alert policy for u2i-failsafe@google.com usage
    - Dedicated log sink and dashboard
    - 24-hour retro-PR requirement
  - Created groups migration guide
  - Updated all references from old to new groups

  Note: Google PAM currently only supports single approval, policy requires dual approval for lanes 3-4.

* feat: add PAM Slack integration and finalize security setup
  - Implement Slack bot integration for PAM notifications
  - Add Cloud Function with Slack SDK for rich formatting
  - Configure organization-wide PAM audit log sink
  - Add failsafe account monitoring and alerts
  - Update documentation with setup guides
  - Configure Secret Manager for bot token storage

  Note: New groups (gcp-prodsupport, gcp-techlead, gcp-techmgmt, gcp-billing) must be created in Google Workspace before full deployment can complete.

* fix: remove node_modules and zip files from git
  - Remove accidentally committed node_modules directory
  - Remove Cloud Functions deployment ZIP file
  - Update .gitignore to properly exclude these files
  - These should be built/installed during deployment, not stored in git

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 12d81ce commit 677ddaa

45 files changed

Lines changed: 5769 additions & 2204 deletions

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -4,4 +4,7 @@
 *.tfvars
 !*.tfvars.example
 .terraform.lock.hcl
-.DS_Store
+.DS_Store
+node_modules/
+*.zip
+2-security/functions/*/node_modules/
Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
# Tailscale OAuth Setup Guide

## Step 1: Create OAuth Application in Tailscale

1. **Go to Tailscale Admin Console**
   ```
   https://login.tailscale.com/admin/settings/oauth
   ```

2. **Click "Generate OAuth client"**

3. **Configure the OAuth Application:**
   - **Description**: `GCP Subnet Router Automation`
   - **Scopes**: Check only `devices:write`
   - Click "Generate client"

4. **Save the Credentials** (you'll see these only once!)
   - **Client ID**: Looks like `k8FqZ...` (short string)
   - **Client Secret**: Looks like `tskey-client-kYFqZ...` (longer string)

## Step 2: Configure Terraform

1. **Set OAuth credentials as environment variables:**
   ```bash
   export TF_VAR_tailscale_oauth_client_id="k8FqZ..."
   export TF_VAR_tailscale_oauth_client_secret="tskey-client-kYFqZ..."
   ```

2. **Update your tailnet name in terraform.tfvars:**
   ```hcl
   # Already set to u2i.com, but verify this matches your Tailscale organization
   tailscale_tailnet = "u2i.com"
   ```

To find your tailnet name:
- Go to https://login.tailscale.com/admin/settings/general
- Look for "Tailnet name" - it's usually your domain or a unique ID

## Step 3: Deploy OAuth Infrastructure

```bash
cd gcp-org-compliance/1-organization

# First, let's create the project and OAuth infrastructure
terraform plan -target=google_project.tailscale \
  -target=google_secret_manager_secret.tailscale_oauth_client_id \
  -target=google_secret_manager_secret.tailscale_oauth_client_secret \
  -target=module.tailscale_oauth

terraform apply -target=google_project.tailscale \
  -target=google_secret_manager_secret.tailscale_oauth_client_id \
  -target=google_secret_manager_secret.tailscale_oauth_client_secret \
  -target=module.tailscale_oauth
```

## Step 4: Store OAuth Credentials in Secret Manager

```bash
# Get the project ID
PROJECT_ID=$(terraform output -raw tailscale_project_id)

# Store OAuth credentials
echo -n "${TF_VAR_tailscale_oauth_client_id}" | \
  gcloud secrets versions add tailscale-oauth-client-id \
  --project=${PROJECT_ID} --data-file=-

echo -n "${TF_VAR_tailscale_oauth_client_secret}" | \
  gcloud secrets versions add tailscale-oauth-client-secret \
  --project=${PROJECT_ID} --data-file=-
```

## Step 5: Create Initial Auth Key (One-Time)

Since the OAuth automation generates keys going forward, we need one initial key:

1. **Generate a temporary auth key:**
   - Go to: https://login.tailscale.com/admin/settings/keys
   - Click "Generate auth key"
   - Settings:
     - Reusable: ✓ Yes
     - Ephemeral: ✗ No
     - Expiration: 90 days
     - Tags: `tag:subnet-router`
   - Click "Generate key"

2. **Store the initial key:**
   ```bash
   export INITIAL_KEY="tskey-auth-..."
   echo -n "${INITIAL_KEY}" | \
     gcloud secrets versions add tailscale-auth-key \
     --project=${PROJECT_ID} --data-file=-
   ```

## Step 6: Deploy Tailscale Routers

```bash
# Now deploy the actual routers
terraform apply
```

## Step 7: Verify OAuth Automation

1. **Check Cloud Function deployment:**
   ```bash
   gcloud functions describe tailscale-key-generator \
     --region=us-central1 \
     --project=${PROJECT_ID}
   ```

2. **Check Cloud Scheduler job:**
   ```bash
   gcloud scheduler jobs describe tailscale-key-rotation \
     --location=us-central1 \
     --project=${PROJECT_ID}
   ```

3. **Test key generation manually:**
   ```bash
   gcloud functions call tailscale-key-generator \
     --region=us-central1 \
     --project=${PROJECT_ID}
   ```

4. **Verify new key was created:**
   ```bash
   gcloud secrets versions list tailscale-auth-key \
     --project=${PROJECT_ID}
   ```
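
The listing from the last check can also be inspected programmatically. The following is a minimal Python sketch, not part of this commit, that reads the output of `gcloud secrets versions list tailscale-auth-key --format=json` and reports whether the newest enabled version was created within a chosen window. It assumes each JSON entry carries `name`, `createTime`, and `state` fields, as in the Secret Manager `SecretVersion` resource; the helper names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_create_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, ignoring fractional seconds."""
    seconds_part = value.rstrip("Z").split(".")[0]
    parsed = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def newest_enabled(listing_json: str) -> Optional[dict]:
    """Return the most recently created enabled version, or None."""
    versions = [
        v for v in json.loads(listing_json)
        if str(v.get("state", "")).upper() == "ENABLED"
    ]
    if not versions:
        return None
    return max(versions, key=lambda v: parse_create_time(v["createTime"]))


def rotated_within(listing_json: str, now: datetime, window: timedelta) -> bool:
    """True if the newest enabled version is no older than `window`."""
    newest = newest_enabled(listing_json)
    if newest is None:
        return False
    return now - parse_create_time(newest["createTime"]) <= window
```

A caller would pass the text captured from the `gcloud` command, the current UTC time, and a window such as one hour when checking right after a manual trigger.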

## Step 8: Approve Routes (One-Time)

1. Go to: https://login.tailscale.com/admin/machines
2. Find machines: `gcp-us-central1`, `gcp-europe-west1`, `gcp-europe-west4`
3. Approve advertised routes for each machine

## How It Works

1. **Monthly Rotation**: Cloud Scheduler triggers on the 1st of each month at 2 AM UTC
2. **OAuth Flow**: Cloud Function uses OAuth to authenticate with Tailscale API
3. **Key Generation**: Creates new 90-day auth key with appropriate tags
4. **Secret Update**: Stores new key in Secret Manager
5. **Router Updates**: Routers pick up new key on next restart/refresh
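
The steps above can be sketched as one small function. This is an illustrative Python outline, not the Cloud Function shipped in this commit. The three callables (`get_token`, `post_json`, `store_secret`) are hypothetical stand-ins for the OAuth exchange, the HTTPS call, and the Secret Manager write. The URL and request body follow the general shape of Tailscale's create-auth-key endpoint and should be checked against the current API reference before use.

```python
import json
from typing import Callable, Sequence, Tuple

# Assumed endpoint shape for creating an auth key in a tailnet.
KEYS_URL = "https://api.tailscale.com/api/v2/tailnet/{tailnet}/keys"
NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60


def build_key_request(tailnet: str, tags: Sequence[str]) -> Tuple[str, bytes]:
    """Return (url, JSON body) for a reusable, non-ephemeral 90-day key."""
    body = {
        "capabilities": {
            "devices": {
                "create": {
                    "reusable": True,
                    "ephemeral": False,
                    "tags": list(tags),
                }
            }
        },
        "expirySeconds": NINETY_DAYS_SECONDS,
    }
    url = KEYS_URL.format(tailnet=tailnet)
    return url, json.dumps(body).encode("utf-8")


def rotate_key(
    get_token: Callable[[], str],
    post_json: Callable[[str, bytes, str], dict],
    store_secret: Callable[[str, str], None],
    tailnet: str,
    tags: Sequence[str] = ("tag:subnet-router",),
) -> str:
    """Run one rotation: authenticate, create a key, store it.

    Returns the new key's ID, never the key material itself.
    """
    token = get_token()                          # OAuth flow
    url, body = build_key_request(tailnet, tags)
    response = post_json(url, body, token)       # key generation
    store_secret("tailscale-auth-key", response["key"])  # secret update
    return response["id"]
```

Injecting the three callables keeps the flow testable without network access. Routers still need a restart or refresh to pick up the stored key, which this sketch does not cover.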

## Monitoring

Set up alerts for failed rotations:

```bash
gcloud alpha monitoring policies create \
  --notification-channels=YOUR_CHANNEL_ID \
  --display-name="Tailscale Key Rotation Failure" \
  --condition-display-name="Function Error" \
  --condition-filter='resource.type="cloud_function"
    resource.labels.function_name="tailscale-key-generator"
    severity>="ERROR"' \
  --project=${PROJECT_ID}
```

## Troubleshooting

### OAuth Token Issues
```bash
# Test OAuth manually
curl -X POST https://api.tailscale.com/api/v2/oauth/token \
  -u "${TF_VAR_tailscale_oauth_client_id}:${TF_VAR_tailscale_oauth_client_secret}" \
  -d "grant_type=client_credentials&scope=devices"
```
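
For reference, the `-u` flag in the curl command sends the client ID and secret as an HTTP Basic `Authorization` header. A minimal Python sketch of the same request, built but not sent, can help confirm what goes over the wire when a token request is rejected. The function name is hypothetical, and the `scope=devices` default is copied from the command above rather than verified against Tailscale's current scope names.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.tailscale.com/api/v2/oauth/token"


def build_token_request(
    client_id: str, client_secret: str, scope: str = "devices"
) -> urllib.request.Request:
    """Build (without sending) the client-credentials token request."""
    # Basic auth is base64("<id>:<secret>"), which is what curl -u produces.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    auth_value = "Basic " + base64.b64encode(credentials).decode("ascii")
    form = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Authorization": auth_value},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Avoid printing the header in shared logs, since it is only encoded, not encrypted.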

### Function Logs
```bash
gcloud functions logs read tailscale-key-generator \
  --region=us-central1 \
  --project=${PROJECT_ID} \
  --limit=50
```

### Force Key Rotation
```bash
# Manually trigger rotation
gcloud scheduler jobs run tailscale-key-rotation \
  --location=us-central1 \
  --project=${PROJECT_ID}
```

## Security Notes

- OAuth credentials are stored encrypted in Secret Manager
- Only the Cloud Function service account can access them
- Auth keys are automatically rotated monthly
- Old keys expire after 90 days
- All access is logged for audit
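
The interaction between monthly rotation and 90-day expiry can be checked with a little date arithmetic. The Python sketch below is an illustration only. It assumes rotations happen exactly on the 1st of each month at 02:00 UTC, that every key lives exactly 90 days, and that old keys are never revoked early (the notes above only say they expire). All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence

KEY_LIFETIME = timedelta(days=90)


def monthly_rotations(year: int) -> List[datetime]:
    """Rotation instants for one year: the 1st of each month, 02:00 UTC."""
    return [
        datetime(year, month, 1, 2, 0, tzinfo=timezone.utc)
        for month in range(1, 13)
    ]


def valid_key_count(
    moment: datetime,
    rotations: Sequence[datetime],
    lifetime: timedelta = KEY_LIFETIME,
) -> int:
    """Count keys created at `rotations` that are unexpired at `moment`."""
    return sum(
        1 for created in rotations
        if created <= moment < created + lifetime
    )


def minimum_overlap(
    rotations: Sequence[datetime], lifetime: timedelta = KEY_LIFETIME
) -> timedelta:
    """Shortest time a key remains valid after its successor is created."""
    ordered = sorted(rotations)
    longest_gap = max(b - a for a, b in zip(ordered, ordered[1:]))
    return lifetime - longest_gap
```

Under these assumptions, each key outlives the creation of its successor by at least 59 days, so several keys are valid at any moment. If a key leaks, waiting for expiry is therefore slow, and revoking it in the Tailscale admin console is the faster response.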

## Next Steps

1. ✅ OAuth automation is now active
2. 🔄 Keys will rotate automatically every month
3. 📊 Monitor the Cloud Function for any issues
4. 🔐 No more manual key management!

Your Tailscale infrastructure is now fully automated! 🎉
