Conversation

@schitizsharma
Collaborator

Description

Add liveness probe to sync pods

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Backward compatible
  • Shut down NFS
  • Schedule NFS on spot node and trigger a spot interruption

Screenshots or Recordings

Related PRs (If Any):

chore: staging to master (helm chart version v0.0.1)
fix: add write permissions for gh token (#11)
feat: add release workflows (#17)
- Add nfsServer.external.storageClass and size parameters in values.yaml
- Modify shared-storage-pvc.yaml to always create PVC with conditional storage class
- Simplify _helpers.tpl sharedStoragePVC function to use consistent naming
- Add validation error when storageClass is empty and NFS is disabled
- Support RWX storage classes like EFS, Azure Files, GCP Filestore, etc.
- Remove PodAntiAffinity that prevented sync pods from running on same node
- Allow multiple sync operations to run concurrently for better resource utilization
- Keep JobID-based node affinity for resource mapping
- Fix tautological condition in affinity assignment
- Add ConfigMapWatcher using client-go informers for live updates
- Integrate watcher into K8sPodManager with graceful shutdown
- Update pod scheduling to use live mapping instead of static config
- Support debouncing (2s) for rapid ConfigMap changes
- Maintain backward compatibility with existing Helm charts
@schitizsharma changed the title from "Feat/activity pod healthcheck" to "feat: activity pod healthcheck" Oct 23, 2025
Comment on lines +253 to +272
	// Add liveness probe for long-running sync operations
	if req.Command == types.Sync {
		pod.Spec.Containers[0].LivenessProbe = &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{
				Exec: &corev1.ExecAction{
					Command: []string{
						"/bin/sh",
						"-c",
						"echo ok > /mnt/config/.healthcheck",
					},
				},
			},
			InitialDelaySeconds: 10,
			PeriodSeconds:       30,
			TimeoutSeconds:      5,
			FailureThreshold:    3,
			SuccessThreshold:    1,
		}
	}

Collaborator

I have added a constant, AsyncCommands, which lists all long-running tasks. Can we use that here?

	// Add liveness probe for long-running sync operations
	if slices.Contains(constants.AsyncCommands, req.Command) {
		pod.Spec.Containers[0].LivenessProbe = &corev1.Probe{
			...
		}
	}
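
A self-contained sketch of the suggested check, for readers without the pending constant at hand. Everything here is an assumption for illustration: the `Command` type, its values, and the contents of `AsyncCommands` are stand-ins, not the definitions from the repository's constants package.

```go
package main

import (
	"fmt"
	"slices"
)

// Command is a hypothetical stand-in for the worker's command type.
type Command string

// Illustrative command values; the real names live in the repository.
const (
	Sync             Command = "sync"
	ClearDestination Command = "clear-destination"
	Check            Command = "check"
)

// AsyncCommands is an assumed list of long-running commands. The actual
// constant is expected to arrive with a separate PR.
var AsyncCommands = []Command{Sync, ClearDestination}

// needsLivenessProbe reports whether a pod running cmd should get the probe.
func needsLivenessProbe(cmd Command) bool {
	return slices.Contains(AsyncCommands, cmd)
}

func main() {
	for _, cmd := range []Command{Sync, ClearDestination, Check} {
		fmt.Printf("%s -> probe: %v\n", cmd, needsLivenessProbe(cmd))
	}
}
```

Keeping the list in one place means a new long-running command gets the probe by being added to the slice, without touching the pod-building code.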

Collaborator Author

I don't see it in this file: https://github.com/datazip-inc/olake-helm/blob/refactor/olake-worker/worker/constants/constants.go. Where is it located?

Collaborator Author

If it is pending on some other PR, then let's change it later on.

Collaborator

This needs to be added once the clear-destination PR is merged.

shubham19may previously approved these changes Nov 1, 2025
Base automatically changed from refactor/olake-worker to staging November 6, 2025 14:09
@hash-data dismissed shubham19may’s stale review November 6, 2025 14:09

The base branch was changed.

4 participants