
Backup should immediately fail when nodeAgent pods are not running #9698

@Joeavaikath

Description

What steps did you take and what happened:

When a user creates a DPA with defaultSnapshotMoveData: true but doesn't enable NodeAgent, backups hang in WaitingForPluginOperations for hours until timeout. The same issue exists for defaultVolumesToFSBackup. Both features require the NodeAgent DaemonSet to be running, but the DPA validator doesn't enforce this — the misconfiguration is only caught at backup time (and only for FSB; DataMover just silently hangs).
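A validator-side guard could catch this at DPA reconcile time instead of at backup time. The sketch below uses illustrative stand-in types, not the actual OADP API structs; it only shows the shape of the check being requested:

```go
package main

import (
	"errors"
	"fmt"
)

// DPAConfig is an illustrative stand-in for the relevant DPA spec
// fields; the real OADP types differ.
type DPAConfig struct {
	DefaultSnapshotMoveData  bool
	DefaultVolumesToFSBackup bool
	NodeAgentEnabled         bool
}

// validateNodeAgent rejects a DPA that enables a feature requiring
// the NodeAgent DaemonSet without enabling the NodeAgent itself.
func validateNodeAgent(c DPAConfig) error {
	if (c.DefaultSnapshotMoveData || c.DefaultVolumesToFSBackup) && !c.NodeAgentEnabled {
		return errors.New("defaultSnapshotMoveData and defaultVolumesToFSBackup require the NodeAgent to be enabled")
	}
	return nil
}

func main() {
	// Reproduces the misconfiguration described above: DataMover on,
	// NodeAgent off. The validator flags it immediately.
	err := validateNodeAgent(DPAConfig{DefaultSnapshotMoveData: true})
	fmt.Println(err)
}
```

With a check like this, the DPA would be marked not-reconciled up front rather than letting backups hang.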

What did you expect to happen:

The backup fails fast if the NodeAgent is not running.

Steps to Reproduce:

  1. Create a DPA with CSI enabled. 

  2. Deploy a stateful application

  3. Create a backup with the snapshotMoveData flag set to true:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup3
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  includedNamespaces:
  - ocp-mysql
  storageLocation: ts-dpa-1
  snapshotMoveData: true

Actual results:

The backup gets stuck in the WaitingForPluginOperations phase until it hits the timeout.

$ oc get backup test-backup3 -o yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/resource-timeout: 10m0s
    velero.io/source-cluster-k8s-gitversion: v1.27.6+98158f9
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "27"
  creationTimestamp: "2023-10-18T13:21:08Z"
  generation: 5
  labels:
    velero.io/storage-location: ts-dpa-1
  name: test-backup3
  namespace: openshift-adp
  resourceVersion: "212060"
  uid: 1f252fb5-eb00-4efb-b576-12c6f7169a92
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  includedNamespaces:
  - ocp-mysql
  itemOperationTimeout: 4h0m0s
  snapshotMoveData: true
  storageLocation: ts-dpa-1
  ttl: 720h0m0s
status:
  backupItemOperationsAttempted: 2
  expiration: "2023-11-17T13:21:08Z"
  formatVersion: 1.1.0
  phase: WaitingForPluginOperations
  progress:
    itemsBackedUp: 31
    totalItems: 31
  startTimestamp: "2023-10-18T13:21:09Z"
  version: 1

 

Expected results:

The backup should fail immediately when the nodeAgent pods are not running.
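A backup-time fail-fast check could compare the node-agent DaemonSet's desired and ready pod counts before entering any wait loop. This is a sketch over plain integers; a real implementation would read these values from appsv1.DaemonSetStatus via the controller's Kubernetes client:

```go
package main

import "fmt"

// nodeAgentReady reports whether the node-agent DaemonSet has all
// desired pods ready. desired == 0 means the DaemonSet is absent
// (or scheduled on no nodes), which should also fail the backup
// rather than letting it wait on plugin operations forever.
func nodeAgentReady(desired, ready int32) bool {
	return desired > 0 && ready == desired
}

func main() {
	// The scenario from this issue: NodeAgent never deployed.
	if !nodeAgentReady(0, 0) {
		fmt.Println("failing backup: node-agent pods are not running")
	}
}
```

Running this check once when a backup with snapshotMoveData or defaultVolumesToFsBackup starts would turn the multi-hour hang into an immediate, actionable failure.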

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help.

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues. You can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
