This skill provides expert guidance for diagnosing and resolving common cloud.gov and Cloud Foundry issues.
- User reports an application error or unexpected behavior
- Application fails to start or crashes repeatedly
- Deployment fails with an error message
- Service binding or connectivity issues
- Performance problems or resource constraints
Always start by collecting:
# Check application status
cf app <APP_NAME>
# View recent logs
cf logs <APP_NAME> --recent
# Check environment variables
cf env <APP_NAME>
# List bound services
cf services
Categorize the issue based on symptoms:
| Symptom | Category | Start With |
|---|---|---|
| App won't start | Startup failure | Logs, health check config |
| App crashes after running | Runtime error | Logs, memory usage |
| Deployment fails | Push failure | Build logs, manifest |
| Can't reach app | Routing issue | Routes, domains |
| Service connection fails | Binding issue | VCAP_SERVICES, network |
| Slow performance | Resource issue | Memory, instances, scaling |
Symptoms:
- cf push hangs then fails
- Logs show app starting but then killed
- Error: "Process has crashed" or "failed to accept connections"
Diagnosis:
# Check health check configuration
cf get-health-check <APP_NAME>
# Look for port binding in logs
cf logs <APP_NAME> --recent | grep -i "listen\|port\|bind"
Solutions:
- App not listening on correct port - Bind to the $PORT environment variable, not a hardcoded port:
  port = int(os.environ.get('PORT', 8080))
- Health check type mismatch - Use the process type for workers and http for web apps:
  cf set-health-check <APP_NAME> process
- Startup too slow - Increase the timeout in the manifest:
  timeout: 180
- Health check endpoint missing - Ensure /health or the configured endpoint exists and returns 200
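The port and health check points above can be combined in a minimal sketch. It assumes a Python web process built only on the standard library. The `/health` path and the 8080 fallback come from the text above, while `HealthHandler` and `make_server` are illustrative names, not part of any cloud.gov or Cloud Foundry API.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Illustrative handler: answers 200 on /health and 404 elsewhere."""

    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Keep the sketch quiet; a real app would log requests.
        pass


def make_server(port=None):
    """Bind to $PORT when it is set, falling back to 8080 for local runs."""
    if port is None:
        port = int(os.environ.get("PORT", 8080))
    # Listen on all interfaces so the platform router can reach the process.
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` from the start command would run it. If the app uses a framework instead, the same two ideas apply: read the port from the environment and keep the health endpoint cheap.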
Symptoms:
- Push fails during staging phase
- "None of the buildpacks detected a compatible application"
Diagnosis:
# View staging logs (run during push)
cf logs <APP_NAME>
Solutions:
- Missing dependency file - Ensure requirements.txt, package.json, Gemfile, or the equivalent for your language exists
- Wrong buildpack - Explicitly specify in manifest:
  buildpacks:
    - python_buildpack
- Incompatible version - Check that the buildpack supports your runtime version
Symptoms:
- App starts then crashes
- Logs show "Memory quota exceeded" or "OOMKilled"
- cf app shows instances crashing/restarting
Diagnosis:
# Check memory allocation vs usage
cf app <APP_NAME>
# Look for memory errors in logs
cf logs <APP_NAME> --recent | grep -i "memory\|oom\|killed"
Solutions:
- Increase memory allocation:
  cf scale <APP_NAME> -m 1G
- Optimize application memory usage:
  - Reduce worker processes/threads
  - Implement proper garbage collection
  - Check for memory leaks
- For Java apps - Set JVM heap size:
  env:
    JAVA_OPTS: "-Xmx512m -Xms256m"
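For Python apps, one way to act on "check for memory leaks" is the standard library's `tracemalloc` module. The sketch below is only an illustration: `top_allocation_growth` is a hypothetical helper, and the lambda in the example stands in for whatever code path is suspected of leaking.

```python
import tracemalloc


def top_allocation_growth(workload, limit=5):
    """Run `workload` and return the source lines whose allocations grew most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = workload()  # hold a reference so the allocations are still live
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() returns StatisticDiff entries, largest absolute change first.
    growth = after.compare_to(before, "lineno")
    del result
    return growth[:limit]


if __name__ == "__main__":
    # Hypothetical workload standing in for a leaky request handler.
    for stat in top_allocation_growth(lambda: [bytes(1024) for _ in range(1000)]):
        print(stat)
```

Tracing adds overhead, so this is better suited to a local run or a short-lived diagnostic instance than to normal production traffic.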
Symptoms:
- "Connection refused" or "Connection timeout"
- "Authentication failed"
- App works locally but fails on cloud.gov
Diagnosis:
# Verify service is bound
cf services
# Check credentials are in environment
cf env <APP_NAME> | grep -A 30 VCAP_SERVICES
# SSH into container to test connectivity
cf ssh <APP_NAME> -c "nc -zv <db-host> <db-port>"
Solutions:
- Service not bound - Bind and restage:
  cf bind-service <APP_NAME> <SERVICE_NAME>
  cf restage <APP_NAME>
- Credentials not refreshed - Restage after binding:
  cf restage <APP_NAME>
- Parsing VCAP_SERVICES incorrectly - Use proper JSON parsing:
  import json
  import os
  vcap = json.loads(os.environ.get('VCAP_SERVICES', '{}'))
- Connection pool exhaustion - Configure connection limits and timeouts
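As a slightly fuller sketch of the JSON parsing point above, the helper below looks up a bound service instance by name. `find_credentials` is a hypothetical name, and the keys inside `credentials` vary by service broker, so check the output of `cf env <APP_NAME>` before relying on any particular key.

```python
import json
import os


def find_credentials(service_name, environ=None):
    """Return the credentials dict for the bound instance named `service_name`."""
    environ = os.environ if environ is None else environ
    vcap = json.loads(environ.get("VCAP_SERVICES", "{}"))
    # VCAP_SERVICES maps each service label to a list of bound instances.
    for instances in vcap.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance.get("credentials", {})
    raise LookupError(f"service {service_name!r} is not bound to this app")
```

Raising when the instance is missing makes an unbound service show up as an explicit startup error in the logs rather than as a later connection failure.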
Symptoms:
- "Access Denied" when accessing bucket
- "NoSuchBucket" errors
- Credentials appear correct but operations fail
Diagnosis:
# Check S3 credentials in environment
cf env <APP_NAME> | grep -A 20 s3
Solutions:
- Using wrong bucket name - Get it from VCAP_SERVICES, don't hardcode
- Wrong region - Use the region from the service credentials
- Credentials expired - Rebind service:
  cf unbind-service <APP_NAME> <S3_SERVICE>
  cf bind-service <APP_NAME> <S3_SERVICE>
  cf restage <APP_NAME>
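The first two solutions above amount to reading the bucket and region from the binding instead of from configuration. A minimal sketch follows. The `s3` label and the credential key names in `REQUIRED_KEYS` are assumptions about the broker's output, and `s3_settings` is an illustrative helper, so confirm both against what `cf env <APP_NAME>` actually prints.

```python
import json
import os

# Assumed credential key names; confirm against `cf env` output for your broker.
REQUIRED_KEYS = ("bucket", "region", "access_key_id", "secret_access_key")


def s3_settings(environ=None, label="s3"):
    """Read bucket, region, and keys from the first bound instance under `label`."""
    environ = os.environ if environ is None else environ
    vcap = json.loads(environ.get("VCAP_SERVICES", "{}"))
    instances = vcap.get(label, [])
    if not instances:
        raise LookupError(f"no service with label {label!r} is bound")
    creds = instances[0].get("credentials", {})
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"credentials are missing: {', '.join(missing)}")
    return {key: creds[key] for key in REQUIRED_KEYS}
```

The returned dict can then be passed to whichever S3 client the app uses, so a rebind picks up new credentials after a restage without code changes.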
Symptoms:
- App is running but returns 404
- Route appears correct in cf routes
Diagnosis:
# List routes for the app
cf app <APP_NAME> | grep routes
# Check all routes in space
cf routes
Solutions:
- Route not mapped - Map the route:
  cf map-route <APP_NAME> app.cloud.gov --hostname <HOSTNAME>
- Old route from failed deployment - Remap route:
  cf unmap-route <OLD_APP> app.cloud.gov --hostname <HOSTNAME>
  cf map-route <NEW_APP> app.cloud.gov --hostname <HOSTNAME>
Symptoms:
- Timeout when calling external services
- "Connection refused" to external APIs
- Works locally, fails on cloud.gov
Diagnosis:
# Test from container
cf ssh <APP_NAME> -c "curl -v https://api.example.com"
Solutions:
- Egress restricted - Contact cloud.gov support to allow external access
- Firewall on external service - Ask the service owner to allowlist the cloud.gov egress IP ranges
- DNS resolution failure - Verify hostname resolves correctly
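To tell these three causes apart from inside the app, a small probe can separate name resolution from the TCP connection attempt. This is a sketch using only the standard library. `check_endpoint` and its result strings are illustrative, and how each cause actually surfaces (for example, whether blocked egress shows up as a timeout or a refusal) depends on the network configuration.

```python
import socket


def check_endpoint(host, port, timeout=5.0):
    """Classify an attempt as ok, dns_failure, refused, timeout, or unreachable."""
    try:
        # Resolve first so a DNS problem is reported separately.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Any other network-level error, such as no route to host.
        return "unreachable"
```

Logging the result of `check_endpoint("api.example.com", 443)` at startup gives support a concrete data point alongside the `curl -v` output.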
Symptoms:
- cf push completes but app won't start
- Multiple instances show as "crashed"
Diagnosis:
# Get detailed crash information
cf events <APP_NAME>
# Check recent logs
cf logs <APP_NAME> --recent
Solutions:
- Missing start command - Add a Procfile or a command in the manifest:
  command: gunicorn app:app
- Missing environment variables - Set required vars:
  cf set-env <APP_NAME> SECRET_KEY "value"
  cf restage <APP_NAME>
- File permissions - Ensure scripts are executable in repo
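Missing environment variables are easier to diagnose when the app names them at startup. The sketch below shows one way to do that. `SECRET_KEY` comes from the example above, `DATABASE_URL` is a hypothetical second entry, and both helper names are illustrative.

```python
import os
import sys

# Hypothetical list; replace with the variables your app actually needs.
REQUIRED_VARS = ("SECRET_KEY", "DATABASE_URL")


def missing_vars(required=REQUIRED_VARS, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


def require_env(required=REQUIRED_VARS, environ=None):
    """Exit with a clear message instead of failing later with a KeyError."""
    missing = missing_vars(required, environ)
    if missing:
        # sys.exit with a string writes it to stderr and exits with status 1.
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")
```

Calling `require_env()` before the app starts serving puts a single readable line in `cf logs <APP_NAME> --recent` rather than a stack trace.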
Symptoms:
- Old app still receiving traffic
- New app running but not accessible
- Routes in inconsistent state
Diagnosis:
# Check routes on both apps
cf app <APP_NAME>
cf app <APP_NAME>-venerable
# List all routes
cf routes
Solutions:
- Manually complete the cutover:
  # Map route to new app
  cf map-route <APP_NAME> app.cloud.gov --hostname <HOSTNAME>
  # Unmap from old app
  cf unmap-route <APP_NAME>-venerable app.cloud.gov --hostname <HOSTNAME>
  # Delete old app
  cf delete <APP_NAME>-venerable -f
Symptoms:
- High latency on requests
- Timeouts under load
- Performance degradation over time
Diagnosis:
# Check instance count and resource usage
cf app <APP_NAME>
# Look for slow queries or operations in logs
cf logs <APP_NAME> --recent
Solutions:
- Scale horizontally:
  cf scale <APP_NAME> -i 3
- Scale vertically:
  cf scale <APP_NAME> -m 1G
- Optimize database queries - Add indexes, reduce N+1 queries
- Add caching - Use a Redis service for frequently accessed data
Symptoms:
- cf app shows instances cycling
- Intermittent 502/503 errors
- Logs show repeated startup messages
Diagnosis:
# Check crash events
cf events <APP_NAME>
# Monitor logs in real-time
cf logs <APP_NAME>
Solutions:
- Memory leak - Profile application, increase memory temporarily
- Unhandled exceptions - Add proper error handling
- Health check issues - Adjust health check settings
# Application errors
cf logs <APP_NAME> --recent | grep -i "error\|exception\|failed"
# Memory issues
cf logs <APP_NAME> --recent | grep -i "memory\|oom\|heap"
# Connection issues
cf logs <APP_NAME> --recent | grep -i "connection\|timeout\|refused"
# Startup issues
cf logs <APP_NAME> --recent | grep -i "listen\|port\|bind\|starting"
# Database issues
cf logs <APP_NAME> --recent | grep -i "database\|postgres\|mysql\|query"
# Get logs from specific time window
cf logs <APP_NAME> --recent 2>&1 | grep "2026-01-16T14:"
If standard troubleshooting doesn't resolve the issue:
- Platform issues - Check cloud.gov status page
- Support request - File a ticket with cloud.gov support, including:
  - App name and space
  - Error messages
  - Steps already tried
  - Timeline of when the issue started
- Emergency - Use cloud.gov emergency contact for production outages
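When gathering error messages for a ticket, it can help to summarize which kinds of problems dominate the recent logs. The sketch below reuses the keyword groups from the grep filters earlier in this guide. `PATTERNS` and `summarize` are illustrative names, and the matching is as coarse as the grep filters themselves (for example, "port" also matches "support").

```python
import re
from collections import Counter

# Keyword groups copied from the grep filters in this guide.
PATTERNS = {
    "application": r"error|exception|failed",
    "memory": r"memory|oom|heap",
    "connection": r"connection|timeout|refused",
    "startup": r"listen|port|bind|starting",
    "database": r"database|postgres|mysql|query",
}


def summarize(lines):
    """Count how many log lines match each keyword group, ignoring case."""
    compiled = {name: re.compile(rx, re.IGNORECASE) for name, rx in PATTERNS.items()}
    counts = Counter()
    for line in lines:
        for name, rx in compiled.items():
            if rx.search(line):
                # A line can count toward several groups at once.
                counts[name] += 1
    return counts
```

Saving the output of `cf logs <APP_NAME> --recent` to a file and passing the open file to `summarize` yields per-category counts that can be pasted into the ticket.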
# Full diagnostic dump
cf app <APP_NAME> && cf logs <APP_NAME> --recent && cf events <APP_NAME>
# Restart without restaging
cf restart <APP_NAME>
# Restart with fresh staging
cf restage <APP_NAME>
# SSH into running container
cf ssh <APP_NAME>
# Run one-off task
cf run-task <APP_NAME> --command "python manage.py migrate"
# View all apps in space
cf apps
# View service details
cf service <SERVICE_NAME>