[Observability] Stop recording optimistic concurrency conflict errors that are retried in metrics #7111

**What would you like to be added**:

The `work_sync_workload_duration_seconds` metric should not count Kubernetes conflict errors (HTTP 409) as failures. These are expected retriable errors in distributed systems and should not be recorded with `result="error"`.

**Why is this needed**:

## Problem: Conflict Errors Cause False SLO Violations

In our environment, this metric shows a **15% error rate** (16.23x burn rate) and triggers critical alerts, even though the system is actually healthy. Nearly all of the "errors" are Kubernetes conflict errors that the controller retries successfully.

### Sample Logs

```
E0115 02:56:14.736986 Failed to update resource(kind=Deployment, default/nginx-deployment)
in cluster test-cluster-region1, err: Operation cannot be fulfilled on
deployments.apps "nginx-deployment": the object has been modified

E0115 02:56:19.000869 Failed to update resource(kind=Deployment, default/nginx-deployment)
in cluster test-cluster-region1, err: Operation cannot be fulfilled on
deployments.apps "nginx-deployment": the object has been modified

[Pattern repeats - controller retries and eventually succeeds]
```

These are HTTP 409 Conflict errors from Kubernetes optimistic concurrency control - normal, expected, and automatically retried.

### Current Metrics

```promql
# Error rate: 15.8% (target is 99% success)
sum(rate(work_sync_workload_duration_seconds_count{result="error"}[5m])) /
sum(rate(work_sync_workload_duration_seconds_count[5m]))

# Burn rate: 16.23x (burning error budget 16x faster than allowed)
slo:current_burn_rate:ratio{sloth_slo="work-sync-workload-availability"}
```

Nearly all of these "errors" are conflict errors that succeed on retry.
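For reference, the burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). A small Go sketch of that arithmetic follows; the counts are illustrative, not real data from our cluster, and `errorRatio`/`burnRate` are hypothetical helpers.

```go
package main

import "fmt"

// errorRatio mirrors the PromQL above: error-labelled observations divided
// by all observations.
func errorRatio(errCount, totalCount float64) float64 {
	if totalCount == 0 {
		return 0
	}
	return errCount / totalCount
}

// burnRate returns how many times faster than allowed the error budget is
// being consumed: the error ratio divided by the budget (1 - target).
func burnRate(errRatio, sloTarget float64) float64 {
	return errRatio / (1 - sloTarget)
}

func main() {
	// Illustrative counts per 1000 syncs: 158 matches the 15.8% above,
	// 8 is a hypothetical post-fix value.
	for _, errs := range []float64{158, 8} {
		r := errorRatio(errs, 1000)
		fmt.Printf("error ratio %.3f, burn rate %.1fx\n", r, burnRate(r, 0.99))
	}
}
```

It prints `error ratio 0.158, burn rate 15.8x` and then `error ratio 0.008, burn rate 0.8x`. The gap between 15.8x and the 16.23x reported by the Sloth rule presumably comes from the rule evaluating a different window than the 5m query.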

### Why This Matters

**Conflict errors are not failures:**
- Temporary and automatically retried (the controller succeeds on a later attempt)
- Expected in distributed systems (Kubernetes optimistic concurrency control)
- Not user-facing (workloads eventually sync successfully)

**Impact:**
- False critical alerts (teams paged for "healthy" system)
- Misleading dashboards (85% availability shown when actual is >99%)
- Alert fatigue and wasted engineering time

**Current:** Metric measures retry rate ("Did first attempt succeed?")
**Expected:** Metric should measure availability ("Did workload eventually sync?")
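The difference between the two definitions can be made concrete with a toy model in which each workload's sync history is a list of attempt outcomes. `outcome`, `attemptErrorRatio`, and `availability` are hypothetical helpers for illustration only; the real metric is recorded per sync attempt, not per workload history.

```go
package main

import "fmt"

// outcome of a single sync attempt.
type outcome int

const (
	ok       outcome = iota
	conflict         // retriable 409 Conflict
	failed           // genuine, non-conflict failure
)

// attemptErrorRatio is what the metric effectively measures today: every
// attempt that returned any error counts against availability.
func attemptErrorRatio(syncs [][]outcome) float64 {
	attempts, errs := 0, 0
	for _, history := range syncs {
		for _, o := range history {
			attempts++
			if o != ok {
				errs++
			}
		}
	}
	if attempts == 0 {
		return 0
	}
	return float64(errs) / float64(attempts)
}

// availability is the share of workloads whose last attempt succeeded,
// i.e. that eventually synced.
func availability(syncs [][]outcome) float64 {
	if len(syncs) == 0 {
		return 1
	}
	synced := 0
	for _, history := range syncs {
		if len(history) > 0 && history[len(history)-1] == ok {
			synced++
		}
	}
	return float64(synced) / float64(len(syncs))
}

func main() {
	// Hypothetical history: two of three workloads hit one conflict first.
	syncs := [][]outcome{{conflict, ok}, {ok}, {conflict, ok}}
	fmt.Printf("attempt error ratio %.2f, availability %.2f\n",
		attemptErrorRatio(syncs), availability(syncs))
}
```

It prints `attempt error ratio 0.40, availability 1.00`: every workload synced, yet 40% of attempts count as errors.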

---

## Proposed Change

**File:** `pkg/controllers/execution/execution_controller.go`

Skip recording the metric when the sync fails with a conflict error:

```go
func (c *Controller) syncWork(...) (controllerruntime.Result, error) {
	start := time.Now()
	err := c.syncToClusters(ctx, clusterName, work)

	// Don't count conflict errors: they are retriable.
	...
```

**Expected impact:** error rate 15% → <1%, burn rate 16x → <1x


**Iteration Tasks**
- [x] `execution-controller` (@RainbowMango , https://github.com/karmada-io/karmada/pull/7106)
- [ ] `binding-controller` (@AnupamSingh2004, #7121)
- [ ] `cluster-binding-controller` (@AnupamSingh2004, #7121)
