Description
Checklist:
- I've included steps to reproduce the bug.
- I've included the version of argo rollouts.
Describe the bug
I have a DataDog analysis run result that shows 'Successful' even though every measurement phase is either 'Error' or 'Failed'.
Here's the DataDog AnalysisTemplate for dd-analysis2 (the one that incorrectly ends in the 'Successful' phase):
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: dd-analysis2
  ...
spec:
  args:
  ...
  metrics:
  - name: dd-analysis2
    interval: 30s
    count: 5
    successCondition: "result < 0.02"
    failureLimit: 2
    provider:
      datadog:
        interval: 5m
        query: sum:kubernetes.cpu.limits{service:{{ args.service }},workload:{{ args.workload }},env:{{ args.environment }},rollout_revision:{{ args.rollout_revision }}} by {container_name} * 1000
(This is just a test analysis template, and the query is not that useful. I just use a query that will return some data, since my test service receives no traffic.)
Here's the result from argo-rollouts for dd-analysis2 which has the 'successful' phase:
{
  "name": "dd-analysis2",
  "phase": "Successful",
  "measurements": [
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:19:40Z",
      "finishedAt": "2025-10-01T04:19:40Z"
    },
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:19:50Z",
      "finishedAt": "2025-10-01T04:19:50Z"
    },
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:20:00Z",
      "finishedAt": "2025-10-01T04:20:00Z"
    },
    {
      "phase": "Failed",
      "startedAt": "2025-10-01T04:20:10Z",
      "finishedAt": "2025-10-01T04:20:10Z",
      "value": "1000"
    }
  ],
  "count": 1,
  "failed": 1,
  "error": 3
}
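To make the mismatch explicit, here is a minimal Python sketch that tallies the measurement phases in the result above and flags a metric whose overall phase is 'Successful' without a single successful measurement. The `find_inconsistency` helper is hypothetical (it is not part of argo-rollouts), and the JSON is a trimmed copy of the result above with timestamps and messages omitted.

```python
import json
from collections import Counter

# Trimmed copy of the dd-analysis2 metric result from this report
# (timestamps and error messages omitted for brevity).
metric_result = json.loads("""
{
  "name": "dd-analysis2",
  "phase": "Successful",
  "measurements": [
    {"phase": "Error"},
    {"phase": "Error"},
    {"phase": "Error"},
    {"phase": "Failed", "value": "1000"}
  ],
  "count": 1,
  "failed": 1,
  "error": 3
}
""")


def find_inconsistency(result):
    """Hypothetical helper: describe a 'Successful' metric that has no successful measurement."""
    tally = Counter(m["phase"] for m in result.get("measurements", []))
    if result.get("phase") == "Successful" and tally["Successful"] == 0:
        return f"{result['name']}: phase is Successful but measurements are {dict(tally)}"
    return None


print(find_inconsistency(metric_result))
```

For dd-analysis2 this reports three 'Error' measurements and one 'Failed' measurement under a 'Successful' metric phase, which is the inconsistency described in this issue.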
Here's another DataDog AnalysisTemplate, dd-analysis3, that resulted in the correct 'Error' phase:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: dd-analysis3
  ...
spec:
  args:
  ...
  metrics:
  - name: dd-analysis3
    interval: 30s
    count: 5
    successCondition: "result < 0.05"
    failureLimit: 2
    provider:
      datadog:
        interval: 5m
        query: sum:(
          sum:trace.fastapi.request.hits{service:{{args.service}},http.status_code:500,env:{{args.environment}}}.as_count() /
          sum:trace.http.request.hits{service:{{args.service }},env:{{args.environment }}}.as_count()
          )
Here's the argo-rollouts result for dd-analysis3 which shows error phase correctly (error due to no data):
{
  "name": "dd-analysis3",
  "phase": "Error",
  "measurements": [
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:19:40Z",
      "finishedAt": "2025-10-01T04:19:40Z"
    },
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:19:50Z",
      "finishedAt": "2025-10-01T04:19:50Z"
    },
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:20:00Z",
      "finishedAt": "2025-10-01T04:20:00Z"
    },
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:20:10Z",
      "finishedAt": "2025-10-01T04:20:10Z"
    },
    {
      "phase": "Error",
      "message": "invalid operation: < (mismatched types <nil> and float64)",
      "startedAt": "2025-10-01T04:20:20Z",
      "finishedAt": "2025-10-01T04:20:20Z"
    }
  ],
  "message": "invalid operation: < (mismatched types <nil> and float64)",
  "error": 5,
  "consecutiveError": 5
},
The Rollout for the service (the other analyses, promql-analysis and dd-analysis, succeeded as expected):
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: pd-ws-test-workload
  ....
spec:
  strategy:
    canary:
      maxUnavailable: 10%
      maxSurge: 10%
      steps:
      - setWeight: 30
      - analysis:
          args:
          - name: rollout_revision
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['rollout.argoproj.io/revision']
          - name: workload
            value: pd-ws-test-workload
          templates:
          - templateName: promql-analysis
          - templateName: dd-analysis
          - templateName: dd-analysis2
          - templateName: dd-analysis3
      - setWeight: 70
      - pause: {}
      - setWeight: 100
  ...
  analysis:
    successfulRunHistoryLimit: 10
    unsuccessfulRunHistoryLimit: 10
  revisionHistoryLimit: 20
  minReadySeconds: 0
Argo-Rollouts version: 1.6.6
To Reproduce
Expected behavior
I expect dd-analysis2 to end in the 'Failed' phase instead of 'Successful', since all of its measurements are either 'Error' or 'Failed'.
Screenshots
Version
1.6.6
Logs
# Paste the logs from the rollout controller
# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts
# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>
time="2025-10-01T04:20:20Z" level=info msg="Event(v1.ObjectReference{Kind:\"AnalysisRun\", Namespace:\"pd-ws-test-dev\", Name:\"pd-ws-test-workload-6b447b6ff5-36-1\", UID:\"e6e35c8d-e0dc-46f5-87b0-df26092c3c59\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"2375908069\", FieldPath:\"\"}): type: 'Normal' reason: 'MetricSuccessful' Metric 'dd-analysis2' Completed. Result: Successful"
time="2025-10-01T04:20:20Z" level=info msg="Metric 'dd-analysis2' Completed. Result: Successful" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 event_reason=MetricSuccessful namespace=pd-ws-test-dev
time="2025-10-01T04:20:20Z" level=info msg="Metric 'dd-analysis2' transitioned from Running -> Successful" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:20Z" level=info msg="Metric Assessment Result - Successful: Run Terminated" metric=dd-analysis2
time="2025-10-01T04:20:20Z" level=info msg="Skipping measurement: run is terminating" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:10Z" level=info msg="Measurement Completed. Result: Failed" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:10Z" level=warning msg="Datadog will soon deprecate their API v1. Please consider switching to v2 soon." analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:10Z" level=info msg="Running overdue measurement" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:00Z" level=warning msg="Measurement had error: invalid operation: < (mismatched types <nil> and float64)" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:00Z" level=info msg="Measurement Completed. Result: Error" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:00Z" level=warning msg="Datadog will soon deprecate their API v1. Please consider switching to v2 soon." analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:20:00Z" level=info msg="Running overdue measurement" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:50Z" level=warning msg="Measurement had error: invalid operation: < (mismatched types <nil> and float64)" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:50Z" level=info msg="Measurement Completed. Result: Error" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:50Z" level=warning msg="Datadog will soon deprecate their API v1. Please consider switching to v2 soon." analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:50Z" level=info msg="Running overdue measurement" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:40Z" level=warning msg="Measurement had error: invalid operation: < (mismatched types <nil> and float64)" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:40Z" level=info msg="Measurement Completed. Result: Error" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:40Z" level=warning msg="Datadog will soon deprecate their API v1. Please consider switching to v2 soon." analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
time="2025-10-01T04:19:40Z" level=info msg="Running initial measurement" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev
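The logs above are listed newest first, which makes the measurement sequence hard to follow. As a rough aid, here is a Python sketch that parses the `key=value` fields of each line and lists the "Measurement Completed" results for one metric in chronological order. The helper names (`parse_line`, `measurement_results`) are made up for illustration, and the parsing assumes the line format shown in the logs above; the two sample lines are copied from them.

```python
import re

# Matches key=value pairs; a value is either a double-quoted string
# (with backslash escapes inside) or a run of non-whitespace characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Parse one controller log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if raw.startswith('"'):
            # Strip the surrounding quotes and unescape sequences such as \".
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        fields[key] = raw
    return fields


def measurement_results(lines, metric):
    """Return (time, result) for each 'Measurement Completed' line of a metric, oldest first."""
    prefix = "Measurement Completed. Result: "
    out = []
    for line in lines:
        fields = parse_line(line)
        msg = fields.get("msg", "")
        if fields.get("metric") == metric and msg.startswith(prefix):
            out.append((fields["time"], msg[len(prefix):]))
    # The timestamps share one ISO 8601 UTC format, so string order is time order.
    return sorted(out)


# Two lines copied from the logs above (newest first, as in the report).
sample = [
    'time="2025-10-01T04:20:10Z" level=info msg="Measurement Completed. Result: Failed" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev',
    'time="2025-10-01T04:20:00Z" level=info msg="Measurement Completed. Result: Error" analysisrun=pd-ws-test-workload-6b447b6ff5-36-1 metric=dd-analysis2 namespace=pd-ws-test-dev',
]

for when, result in measurement_results(sample, "dd-analysis2"):
    print(when, result)
```

Feeding it the full log output for dd-analysis2 should give the same Error, Error, Error, Failed sequence that appears in the metric result, followed by the 'Successful' assessment logged at 04:20:20Z.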
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.