@Aditya-DP (Collaborator) commented Dec 17, 2025

Implements automated cleanup for stuck terminating pods and prevents StatefulSet restart failures caused by stale lock files in the telemetry namespace.

Changes Made
🆕 New Features
Automated Pod Cleanup CronJob
Runs every 5 minutes to detect and force-delete pods that have been stuck in the Terminating state for more than 60 seconds
Removes finalizers and performs graceful cleanup
Includes proper RBAC (ServiceAccount, ClusterRole, ClusterRoleBinding)
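
The detection step described above can be sketched roughly as follows. This is an illustration only, not the script shipped in this PR: it assumes pod objects are available as plain dictionaries shaped like Kubernetes API output, and it assumes "stuck" is measured against `metadata.deletionTimestamp`. The function names `parse_k8s_time` and `find_stuck_pods` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the PR description (60 seconds).
STUCK_THRESHOLD = timedelta(seconds=60)


def parse_k8s_time(value):
    """Parse an RFC 3339 UTC timestamp such as '2025-12-17T14:00:00Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def find_stuck_pods(pods, now, threshold=STUCK_THRESHOLD):
    """Return names of pods whose deletionTimestamp lies more than
    `threshold` before `now`. Pods without a deletionTimestamp are
    not terminating and are skipped."""
    stuck = []
    for pod in pods:
        meta = pod["metadata"]
        deleted_at = meta.get("deletionTimestamp")
        if deleted_at is None:
            continue  # pod is not being deleted
        if now - parse_k8s_time(deleted_at) > threshold:
            stuck.append(meta["name"])
    return stuck
```

Given a pod list and the current UTC time, `find_stuck_pods` returns only the candidates for force-deletion; removing finalizers and issuing the delete would be separate steps that call the Kubernetes API.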
🐛 Bug Fixes
StatefulSet Lock File Cleanup
Added initContainers to clean stale lock files before pod startup
MySQL: Removes .sock and .pid files from /var/lib/mysql/
VictoriaMetrics: Removes flock.lock from data directories
Prevents pods from failing to start after ungraceful shutdowns
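
As a rough illustration of the cleanup logic, the sketch below removes files matching a set of glob patterns from a data directory. The initContainers in this PR are not shown here and may well use a shell command instead; the function name `remove_stale_files` and the glob patterns are assumptions made for the example.

```python
from pathlib import Path


def remove_stale_files(data_dir, patterns):
    """Delete files and Unix sockets under `data_dir` that match any
    of the glob `patterns`. Returns the sorted names of removed entries."""
    root = Path(data_dir)
    removed = []
    for pattern in patterns:
        # Materialize the matches first so deletion does not disturb iteration.
        for path in list(root.glob(pattern)):
            # .sock entries are sockets, so is_file() alone would miss them.
            if path.is_file() or path.is_socket():
                path.unlink()
                removed.append(path.name)
    return sorted(removed)
```

For the MySQL case this would be called as `remove_stale_files("/var/lib/mysql", ["*.sock", "*.pid"])`. For VictoriaMetrics, a recursive pattern such as `"**/flock.lock"` would cover nested data directories, assuming the storage path is passed as `data_dir`.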
🔧 Configuration Updates
PVC Access Mode Change: Changed from ReadWriteMany to ReadWriteOnce for:
MySQL database PVC (idrac_telemetry_statefulset.yaml.j2)
VictoriaMetrics storage PVC (victoria-cluster-vmstorage.yaml.j2)
Reason: each StatefulSet pod mounts its own volume, so ReadWriteOnce (RWO) is sufficient, and the single-pod access pattern does not need ReadWriteMany (RWX)

@Aditya-DP changed the title from "Pub/k8s telemetry" to "Add telemetry pod cleanup automation and fix StatefulSet restart issues" on Dec 17, 2025
@abhishek-sa1 deleted the branch dell:pub/k8s_telemetry on December 17, 2025 at 14:16