feat(RELEASE-2120): add managed pipeline retry with mitigations #1683

Open
seanconroy2021 wants to merge 1 commit into konflux-ci:main from seanconroy2021:RELEASE-2120

@seanconroy2021 seanconroy2021 commented Jun 15, 2026

Member

Retry managed PipelineRuns that fail with OOM or timeout. Generic errors are not retried.

  • Add EnsureManagedPipelineProcessingIsCompleted operation to retry a failed attempt or mark the release as failed
  • Keep managed condition as Progressing during retries
  • Apply memory and timeout mitigations on each retry using the failed PipelineRun specs as the base
  • Bump pipeline timeouts when task timeout grows
  • Add attempt label for PipelineRun lookup
  • Handle cleanup across multiple PipelineRun attempts
  • Rename ManagedPipelineAttempt to PipelineAttempt
  • Add GetRoleBindingFromPipelineAttempt and GetReleasePipelineRunAttempt to loader

Assisted-by: Claude

fullsend-ai-review Bot commented Jun 15, 2026

🤖 Finished Review · ✅ Success · Started 8:54 PM UTC · Completed 9:15 PM UTC
Commit: ffde3b2 · View workflow run →

qodo-app-for-konflux-ci Bot commented Jun 15, 2026

PR Reviewer Guide 🔍

(Review updated until commit d1fbb50)

Warning

/review is deprecated. Use /agentic_review instead (removal date not yet scheduled).

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Nil Deref

The retry gate dereferences the ReleasePlanAdmission retry configuration. Ensure the max retries value is always present (or safely defaulted) when retries are enabled to avoid a nil pointer panic.

// EnsureManagedPipelineProcessingIsCompleted is an operation that will ensure that a failed managed
// pipeline attempt is either retried or finalized as a failure.
func (a *adapter) EnsureManagedPipelineProcessingIsCompleted() (controller.OperationResult, error) {
	if !a.release.IsManagedPipelineProcessing() || a.release.HasManagedPipelineProcessingFinished() {
		return controller.ContinueProcessing()
	}

	if !a.release.IsCurrentManagedPipelineAttemptFailed() {
		return controller.ContinueProcessing()
	}

	rpa, err := a.loader.GetActiveReleasePlanAdmissionFromRelease(a.ctx, a.client, a.release)
	if err != nil {
		return controller.RequeueWithError(err)
	}

	if a.release.IsCurrentManagedPipelineAttemptRetriable() && rpa.IsRetryEnabled() &&
		a.release.GetManagedPipelineRetryCount() < *rpa.Status.RetryInfo.MaxRetries {
		return controller.RequeueOnErrorOrContinue(a.retryManagedPipeline())
	}

	patch := client.MergeFrom(a.release.DeepCopy())
	a.release.MarkManagedPipelineProcessingFailed("Release processing failed on managed pipelineRun")
	a.release.MarkReleaseFailed("Release processing failed on managed pipelineRun")
	return controller.RequeueOnErrorOrContinue(a.client.Status().Patch(a.ctx, a.release, patch))
}
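
One way to address the nil-dereference concern above is to read the limit through a helper that treats a missing value as zero retries. This is a minimal sketch only: `retryInfo`, `maxRetriesOrZero`, and `shouldRetry` are hypothetical stand-ins, not the types or helpers in the PR.

```go
package main

import "fmt"

// retryInfo is a hypothetical stand-in for the retry block on the RPA status.
type retryInfo struct {
	MaxRetries *int
}

// maxRetriesOrZero returns the configured limit, or 0 when the block or the
// pointer is nil, so callers never dereference a nil pointer.
func maxRetriesOrZero(info *retryInfo) int {
	if info == nil || info.MaxRetries == nil {
		return 0
	}
	return *info.MaxRetries
}

// shouldRetry reports whether another attempt is allowed.
func shouldRetry(retriable, enabled bool, retryCount int, info *retryInfo) bool {
	return retriable && enabled && retryCount < maxRetriesOrZero(info)
}

func main() {
	limit := 2
	fmt.Println(shouldRetry(true, true, 1, &retryInfo{MaxRetries: &limit})) // below the limit
	fmt.Println(shouldRetry(true, true, 0, nil))                            // missing config, no panic
}
```

With this shape, a missing limit behaves the same as a limit of zero: the failure is finalized instead of panicking.
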
Edge Case

Mitigation application for OOM/timeout relies on last failed task/step names. If these fields are empty or unavailable, the retry override merge may create invalid/ineffective specs; consider guarding and falling back to no mitigations in that case.

// computeRetryOverrides returns the TaskRunSpecs and Timeouts to use for a retry PipelineRun,
// applying mitigations based on the previous attempt's failure reason.
func (a *adapter) computeRetryOverrides(resources *loader.ProcessingResources) ([]tektonv1.PipelineTaskRunSpec, tektonv1.TimeoutFields) {
	originalSpecs := resources.ReleasePlanAdmission.Spec.Pipeline.TaskRunSpecs
	originalTimeouts := resources.ReleasePlanAdmission.Spec.Pipeline.Timeouts

	failedAttempt := a.release.GetCurrentManagedPipelineAttempt()
	if failedAttempt == nil {
		return originalSpecs, originalTimeouts
	}

	mitigations := resources.ReleasePlanAdmission.GetMitigations()
	if mitigations == nil {
		return originalSpecs, originalTimeouts
	}

	failedPipelineRun, err := a.loader.GetReleasePipelineRunAttempt(a.ctx, a.client, a.release, a.release.GetManagedPipelineRetryCount())
	if err != nil || failedPipelineRun == nil {
		a.logger.Info("Failed PipelineRun not found, retrying without mitigations")
		return originalSpecs, originalTimeouts
	}

	// use the failed PipelineRun's specs and timeouts as the base so mitigations accumulate
	baseSpecs := failedPipelineRun.Spec.TaskRunSpecs
	if baseSpecs == nil {
		baseSpecs = originalSpecs
	}
	baseTimeouts := originalTimeouts
	if failedPipelineRun.Spec.Timeouts != nil {
		baseTimeouts = *failedPipelineRun.Spec.Timeouts
	}

	var failedTaskRun *tektonv1.TaskRun
	if failedAttempt.LastTask != "" {
		taskRuns, listErr := a.listTaskRunsForPipelineRun(failedPipelineRun)
		if listErr != nil {
			a.logger.Error(listErr, "Failed to list TaskRuns for mitigation")
		}
		for i := range taskRuns {
			if taskRuns[i].Labels["tekton.dev/pipelineTask"] == failedAttempt.LastTask {
				failedTaskRun = &taskRuns[i]
				break
			}
		}
	}

	switch failedAttempt.FailureReason {
	case v1alpha1.AttemptFailureOOMKillReason:
		if mitigations.OOMKill == nil {
			break
		}
		currentResources := tekton.GetStepComputeResources(failedPipelineRun, failedTaskRun, failedAttempt.LastTask, failedAttempt.LastStep)
		newResources := retry.ApplyMemoryMitigation(currentResources, mitigations.OOMKill)
		if newResources == nil {
			break
		}
		return retry.MergeTaskRunSpecs(baseSpecs, tektonv1.PipelineTaskRunSpec{
			PipelineTaskName: failedAttempt.LastTask,
			StepSpecs: []tektonv1.TaskRunStepSpec{
				{Name: failedAttempt.LastStep, ComputeResources: *newResources},
			},
		}), baseTimeouts

	case v1alpha1.AttemptFailureTaskRunTimeoutReason:
		if mitigations.Timeout == nil || mitigations.Timeout.Task == nil {
			break
		}
		currentTimeout := tekton.GetTaskRunTimeout(failedPipelineRun, failedTaskRun, failedAttempt.LastTask)
		currentTimeouts := tekton.GetPipelineRunTimeouts(failedPipelineRun)
		newTimeout, adjustedTimeouts := retry.ApplyTaskTimeoutMitigation(currentTimeout, currentTimeouts, mitigations.Timeout.Task)
		return retry.MergeTaskRunSpecs(baseSpecs, tektonv1.PipelineTaskRunSpec{
			PipelineTaskName: failedAttempt.LastTask,
			Timeout:          newTimeout,
		}), *adjustedTimeouts

	case v1alpha1.AttemptFailurePipelineRunTimeoutReason:
		if mitigations.Timeout == nil || mitigations.Timeout.Pipeline == nil {
			break
		}
		currentTimeouts := tekton.GetPipelineRunTimeouts(failedPipelineRun)
		newTimeouts := retry.ApplyPipelineTimeoutMitigation(currentTimeouts, mitigations.Timeout.Pipeline)
		if newTimeouts != nil {
			return baseSpecs, *newTimeouts
		}
	}

	return baseSpecs, baseTimeouts
}
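
The guard suggested in the edge-case note above could look roughly like the following. It is a sketch under assumptions: the reason strings and `canTargetMitigation` are hypothetical, and the rule that an OOM override needs both a task and a step name is inferred from the override built in the switch, not confirmed by the PR.

```go
package main

import "fmt"

// Hypothetical failure reasons mirroring the three cases in the switch above.
const (
	reasonOOMKill         = "OOMKill"
	reasonTaskRunTimeout  = "TaskRunTimeout"
	reasonPipelineTimeout = "PipelineRunTimeout"
)

// canTargetMitigation reports whether enough failure detail is present to
// build a meaningful override. When it returns false, the caller would fall
// back to the unmodified specs and timeouts.
func canTargetMitigation(reason, lastTask, lastStep string) bool {
	switch reason {
	case reasonOOMKill:
		// A step-level memory override needs both names.
		return lastTask != "" && lastStep != ""
	case reasonTaskRunTimeout:
		// A task-level timeout override needs the task name.
		return lastTask != ""
	case reasonPipelineTimeout:
		// Pipeline-level timeouts do not target a task or step.
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(canTargetMitigation(reasonOOMKill, "push-image", ""))
	fmt.Println(canTargetMitigation(reasonTaskRunTimeout, "push-image", ""))
}
```
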
📚 Focus areas based on broader codebase context

Error context

EnsureManagedPipelineProcessingIsCompleted finalizes failures with a hard-coded message for both MarkManagedPipelineProcessingFailed and MarkReleaseFailed, which loses useful context about why the attempt failed (e.g., OOM vs timeout vs generic error). Consider propagating the current attempt’s failure message/reason into these calls so the managed condition message and status surface actionable debugging information. (Ref 1)

patch := client.MergeFrom(a.release.DeepCopy())
a.release.MarkManagedPipelineProcessingFailed("Release processing failed on managed pipelineRun")
a.release.MarkReleaseFailed("Release processing failed on managed pipelineRun")
return controller.RequeueOnErrorOrContinue(a.client.Status().Patch(a.ctx, a.release, patch))

Reference reasoning: The existing MarkManagedPipelineProcessingFailed(message string) API is designed to carry a caller-provided message into the managed processed condition via conditions.SetConditionWithMessage, implying callers should pass meaningful failure detail rather than a generic constant string.

📄 References
  1. konflux-ci/release-service/api/v1alpha1/release_types.go [547-581]
  2. konflux-ci/release-service/api/v1alpha1/release_types.go [422-440]

@qodo-app-for-konflux-ci

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removal date not yet scheduled).

Inline suggestions were posted as code suggestions.

Comment thread controllers/release/adapter.go
Comment thread controllers/release/adapter.go

fullsend-ai-review Bot commented Jun 15, 2026

Review

Findings

High

  • [breaking-api] api/v1alpha1/release_types.go:182 — Exported Go type ManagedPipelineAttempt renamed to PipelineAttempt. External repos importing v1alpha1 and referencing ManagedPipelineAttempt will get compilation errors. The JSON serialization tag remains managedPipelineAttempts so wire format is backward compatible, but the Go API surface has changed.
    Remediation: External consumers must update imports from v1alpha1.ManagedPipelineAttempt to v1alpha1.PipelineAttempt. Consider adding a type alias (type ManagedPipelineAttempt = PipelineAttempt) for backward compatibility.

  • [design-direction] api/v1alpha1/release_types.go:712 — Breaking change to the Release state machine: MarkCurrentManagedPipelineAttemptFailed no longer sets managedProcessedConditionType to FailedReason. The condition stays at Progressing after an attempt failure, deferring finalization to EnsureManagedPipelineProcessingIsCompleted. This creates asymmetry with tenant and final pipeline failures which immediately terminate their phases. The deferred failure marking is necessary for retry support but is a significant behavioral change.
    Remediation: Document this state machine change in an ADR or in AGENTS.md to clarify that managed pipeline failures are now deferred to EnsureManagedPipelineProcessingIsCompleted.
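
The type-alias remediation from the breaking-api finding above can be sketched as follows. The struct here is a trimmed, hypothetical version of the real type; only the alias declaration itself is the point.

```go
package main

import "fmt"

// PipelineAttempt is a trimmed, hypothetical version of the renamed type.
type PipelineAttempt struct {
	PipelineRun string
}

// ManagedPipelineAttempt is kept as an alias so existing callers still compile.
//
// Deprecated: use PipelineAttempt.
type ManagedPipelineAttempt = PipelineAttempt

// describe accepts the new type; values declared with the old name are the
// same type, so no conversion is needed.
func describe(a PipelineAttempt) string {
	return "attempt for " + a.PipelineRun
}

func main() {
	old := ManagedPipelineAttempt{PipelineRun: "ns/run-1"}
	fmt.Println(describe(old))
}
```

Because an alias declares a second name for the same type rather than a new type, methods and assignments keep working in both directions.
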

Medium

  • [edge-case] controllers/release/adapter.go:853 — In EnsureManagedPipelineProcessingIsCompleted, the RPA is fetched via GetActiveReleasePlanAdmissionFromRelease before any failure finalization. If the RPA cannot be loaded (deleted, network error), the function requeues with error and the release stays in a Progressing state until resolved. For non-retriable failures (AttemptFailureErrorReason), the retry configuration is irrelevant — the release should be finalized as failed regardless of RPA availability.
    Remediation: Check IsCurrentManagedPipelineAttemptRetriable() first; finalize non-retriable failures before fetching the RPA.

  • [logic-error] api/v1alpha1/release_types.go:712 — The two-phase failure handling (attempt marked failed → separate operation finalizes Release) is less atomic than the original single-patch pattern. While RequeueWithError prevents indefinite stalls, non-retriable failures incur unnecessary requeues through the RPA fetch when the retry decision is already determined.
    See also: [edge-case] finding at controllers/release/adapter.go:853.

  • [design-direction] controllers/release/controller.go:96 — EnsureManagedPipelineProcessingIsCompleted introduces a third operation for managed pipelines (processed → tracked → completed), which is a new pattern compared to the two-operation model used by tenant and final pipelines. Placement between tracking and final pipeline processing is logical but undocumented.
    Remediation: Document this three-operation pattern for managed pipelines in AGENTS.md.
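
The reordering proposed in the edge-case finding above (decide retriability first, load the RPA only when it matters) might look like this. All names are hypothetical; `loadMaxRetries` stands in for the RPA lookup and is passed as a function so the sketch stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errRPAUnavailable is a hypothetical error standing in for a failed RPA lookup.
var errRPAUnavailable = errors.New("release plan admission unavailable")

type action string

const (
	actionRetry    action = "retry"
	actionFinalize action = "finalize"
)

// decide picks the action for a failed attempt. Non-retriable failures are
// finalized without calling loadMaxRetries, so an unavailable RPA cannot
// block them.
func decide(retriable bool, retryCount int, loadMaxRetries func() (int, error)) (action, error) {
	if !retriable {
		return actionFinalize, nil
	}
	limit, err := loadMaxRetries()
	if err != nil {
		return "", err // caller requeues
	}
	if retryCount < limit {
		return actionRetry, nil
	}
	return actionFinalize, nil
}

func main() {
	failing := func() (int, error) { return 0, errRPAUnavailable }
	got, err := decide(false, 0, failing)
	fmt.Println(got, err)
}
```
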


Labels: PR renames an exported Go type (ManagedPipelineAttempt to PipelineAttempt), which is a breaking change for external consumers.

Previous run

Review

Findings

High

  • [architectural-coherence] api/v1alpha1/release_types.go:712 — MarkCurrentManagedPipelineAttemptFailed no longer sets the overall managed condition to Failed (the conditions.SetConditionWithMessage call was removed). This keeps the condition as Progressing to enable retries, but changes the observable state machine. Consumers relying on HasManagedPipelineProcessingFinished() during the retry window will now see Progressing instead of Failed. The finalization path in EnsureManagedPipelineProcessingIsCompleted calls MarkManagedPipelineProcessingFailed when retries are exhausted, which does set the condition to Failed.
    Remediation: Document this state machine change. Consider introducing a distinct condition reason (e.g., RetryPending) to distinguish "failed and retrying" from "still running" Progressing states.

  • [breaking-api] api/v1alpha1/release_types.go:182 — The Go type ManagedPipelineAttempt has been renamed to PipelineAttempt. Any external Go package importing v1alpha1.ManagedPipelineAttempt will fail to compile. The JSON serialization tag managedPipelineAttempts is preserved, so Kubernetes API wire compatibility is maintained. The CRD description is auto-generated and reflects the rename.
    Remediation: Add a type alias for backward compatibility: type ManagedPipelineAttempt = PipelineAttempt with a deprecation comment directing users to PipelineAttempt.

Medium

  • [resource exhaustion / retry bound] api/v1alpha1/releaseplanadmission_types.go:96 — RetryInfo.MaxRetries (*int) has no upper-bound validation. While it is set server-side by the controller (not directly by users in the Spec), the lack of a ceiling is a defense-in-depth gap. A misconfigured ReleaseServiceConfig could set MaxRetries to an arbitrarily large value, causing many retry PipelineRuns with potentially doubled memory limits.

  • [error-handling] controllers/release/adapter.go:844 — In EnsureManagedPipelineProcessingIsCompleted, when retryManagedPipeline() fails, the error is returned via RequeueOnErrorOrContinue with no status update. If retry creation fails persistently (e.g., cannot create the PipelineRun), the release will requeue indefinitely without any status indication of the problem.

  • [naming-conventions] api/v1alpha1/release_types.go:182 — The type rename from ManagedPipelineAttempt to PipelineAttempt breaks the established naming pattern. All other similar types use specific prefixes (ManagedProcessing, TenantProcessing, FinalProcessing). The new name suggests generalization not reflected in actual usage — the type remains exclusively in ManagedPipelineAttempts. See also: [breaking-api] finding at this location.

  • [architectural-coherence] controllers/release/controller.go:98 — New operation EnsureManagedPipelineProcessingIsCompleted is inserted between tracking and final pipeline operations. When retries are exhausted, it calls MarkManagedPipelineProcessingFailed and MarkReleaseFailed, which should cause downstream operations to skip via the IsFailed() gate. The placement is logically correct but the state machine edge cases warrant review.

  • [package-organization] retry/mitigations.go — The existing controllers/utils/retry/ package is moved to a top-level retry/ package. While other top-level packages exist (loader/, syncer/, metadata/, tekton/), the established pattern has utility packages under controllers/utils/.

Previous run (2)

Review

Findings

Medium

  • [missing-test] controllers/release/adapter_test.go — There is no test for the successful retry path in EnsureManagedPipelineProcessingIsCompleted. Tests cover failure finalization ('finalize the failure for a generic error' and 'finalize the failure when retries are not enabled'), but there is no test where IsCurrentManagedPipelineAttemptRetriable() is true, retries are enabled, and retry count is below max — i.e., the path that calls retryManagedPipeline(). While retryManagedPipeline has its own unit tests, the integration through the orchestrating operation is untested.
    Remediation: Add a test case where the release has a retriable failure (e.g., OOMKill), the ReleasePlanAdmission has retry enabled with MaxRetries > 0, and verify that EnsureManagedPipelineProcessingIsCompleted triggers a retry (new attempt appended, PipelineRun created).

  • [data-exposure] loader/loader.go — GetReleasePipelineRunAttempt uses label-based lookup with client.Limit(1) without ordering. If multiple PipelineRuns match the same label set, the returned result is nondeterministic. The PipelineAttempt status already stores the PipelineRun namespaced name, which could be used for a direct Get instead of a List. The security risk from label spoofing is low (PipelineRuns are created in managed namespaces), but a direct lookup would be more robust.
    Remediation: Consider using the PipelineRun name from the PipelineAttempt.PipelineRun field for a direct client.Get instead of label-based query.

  • [design-direction] api/v1alpha1/release_types.go — MarkCurrentManagedPipelineAttemptFailed no longer sets the managed processed condition to Failed (the SetConditionWithMessage call was removed). This is the core behavioral change enabling retries: keeping the condition in Progressing state allows EnsureManagedPipelineProcessingIsCompleted to decide whether to retry or finalize. The test confirms this is intentional. However, this changes the error handling semantics described in AGENTS.md ('mark phase failed + continue').
    Remediation: Update AGENTS.md to document that managed pipeline attempt failures keep the condition in Progressing state to allow retry decisions.

  • [architectural-conflict] retry/mitigations.go — The retry package was moved from controllers/utils/retry/ to the repo root. This makes sense since it's now imported by both controllers/release/ and controllers/releaseplanadmission/, but the new top-level location is not reflected in AGENTS.md's repository structure.
    Remediation: Add retry/ to the repository structure section in AGENTS.md with a description like 'Retry matchers and pipeline mitigation strategies'.

Low

  • [edge-case] controllers/release/adapter.go — In computeRetryOverrides, if the failed PipelineRun has been garbage-collected before the retry, mitigations fall back to original specs instead of accumulating. The fallback is logged ('Failed PipelineRun not found, retrying without mitigations') and is safe, but accumulated mitigations are lost.

  • [fail-open] controllers/release/adapter.go — IsRetryEnabled() checks MaxRetries != nil but not *MaxRetries > 0. With MaxRetries=0 and Enabled=true, the retry gate never triggers (fail-closed, safe), but the state is semantically inconsistent.

  • [privilege-escalation] controllers/release/adapter.go — Memory mitigations accumulate exponentially across retries (2x→4x→8x with multiplier=2) because computeRetryOverrides uses the failed PipelineRun's specs as the base. MaxComputeResources caps this but is optional. See also: [edge-case] finding about accumulation strategy.

  • [naming-convention] api/v1alpha1/release_types.go — Type renamed from ManagedPipelineAttempt to PipelineAttempt (suggesting broader scope) while the field ManagedPipelineAttempts and all methods retain the Managed prefix. Minor naming inconsistency.

  • [doc-style] retry/mitigations.go — enforcePipelineCeiling comment is grammatically incomplete ('ensures pipeline greater than or equal to tasks plus finally'). Should read 'ensures the pipeline timeout is greater than or equal to the sum of tasks and finally timeouts.'

  • [misplaced-abstraction] loader/loader.go — Two parallel RoleBinding loading paths exist: GetRoleBindingFromReleaseStatusPipelineInfo (takes PipelineInfo) and GetRoleBindingFromPipelineAttempt (takes PipelineAttempt). Both parse a namespaced name string and GET the RoleBinding. The duplication is minor since the old path is still needed for non-managed pipelines.

  • [design-direction] controllers/release/adapter.go — cleanupManagedPipelineResources cleans the tenant RoleBinding once from attempts[0], assuming it's shared across all attempts. This is correct by construction (retryManagedPipeline reuses the existing binding) but the assumption is not documented with a code comment.
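
The accumulation described in the privilege-escalation finding above (2x→4x→8x with multiplier=2) is easy to see with plain arithmetic. The sketch below uses hypothetical helpers and integer MiB values instead of Kubernetes resource quantities; a cap of 0 is assumed to mean "no cap", mirroring the optional MaxComputeResources.

```go
package main

import "fmt"

// nextMemoryMiB multiplies the previous limit and applies an optional cap.
// A capMiB of 0 means no cap (an assumption made for this sketch).
func nextMemoryMiB(prev, multiplier, capMiB int64) int64 {
	next := prev * multiplier
	if capMiB > 0 && next > capMiB {
		return capMiB
	}
	return next
}

// limitsAcrossRetries returns the limit used by each retry when every retry
// starts from the previous attempt's limit.
func limitsAcrossRetries(start, multiplier, capMiB int64, retries int) []int64 {
	limits := make([]int64, 0, retries)
	current := start
	for i := 0; i < retries; i++ {
		current = nextMemoryMiB(current, multiplier, capMiB)
		limits = append(limits, current)
	}
	return limits
}

func main() {
	fmt.Println(limitsAcrossRetries(512, 2, 0, 3))    // prints [1024 2048 4096]
	fmt.Println(limitsAcrossRetries(512, 2, 2048, 3)) // prints [1024 2048 2048]
}
```

Starting from 512 MiB, three uncapped retries reach 8x the original limit, while a 2048 MiB cap stops growth after the second retry.
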

Previous run (3)

Review

Findings

Medium

  • [backward-compatibility] controllers/release/adapter.go:553 — EnsureManagedPipelineIsProcessed and EnsureManagedPipelineProcessingIsTracked now use GetReleasePipelineRunAttempt, which requires the new ReleaseAttemptLabel on PipelineRuns. Any managed PipelineRun created before this code change (i.e., before the label was introduced) will lack the label and will not be found. During a rolling upgrade, this could cause the controller to create a duplicate PipelineRun for a Release that already has one in-flight, or to fail to track an existing run.
    Remediation: Add a fallback — if GetReleasePipelineRunAttempt returns nil and the release already has ManagedPipelineAttempts with a PipelineRun reference, fall back to GetReleasePipelineRun. Alternatively, document that operators must drain in-flight releases before upgrading.

  • [stuck-release-liveness] controllers/release/adapter.go:838 — EnsureManagedPipelineProcessingIsCompleted requeues on any error from GetActiveReleasePlanAdmissionFromRelease. If the RPA is permanently deleted while the release is in a failed-attempt state, the controller will requeue indefinitely without ever finalizing the release as failed, creating a stuck release.
    Remediation: Add a fallback that marks the release as failed if the RPA cannot be found (IsNotFound error), rather than endlessly requeueing.
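
The not-found fallback from the stuck-release finding above amounts to classifying the loader error before deciding to requeue. In this sketch `errNotFound` and `errTransient` are hypothetical sentinels; in a controller the check would more likely be `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`, which is left out here to keep the example dependency-free.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for Kubernetes API errors.
var (
	errNotFound  = errors.New("not found")
	errTransient = errors.New("connection refused")
)

type outcome string

const (
	outcomeProceed    outcome = "proceed"
	outcomeMarkFailed outcome = "mark-failed"
	outcomeRequeue    outcome = "requeue"
)

// classifyLoadError maps an RPA load error to the next step: a permanent
// not-found finalizes the release, anything else is retried via requeue.
func classifyLoadError(err error) outcome {
	switch {
	case err == nil:
		return outcomeProceed
	case errors.Is(err, errNotFound):
		return outcomeMarkFailed
	default:
		return outcomeRequeue
	}
}

func main() {
	wrapped := fmt.Errorf("loading release plan admission: %w", errNotFound)
	fmt.Println(classifyLoadError(wrapped))
	fmt.Println(classifyLoadError(errTransient))
}
```
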

Low

  • [error-handling-gap] controllers/release/adapter.go:1004 — In computeRetryOverrides, when the failed PipelineRun is not found, the function falls back to retrying without mitigations. For OOM failures, this may waste a retry attempt. Consider documenting this as intentional fallback behavior.

  • [edge-case] retry/mitigations.go:175 — In addCappedDuration, when base is nil, duration starts at 0 plus the increment. This could set a timeout more restrictive than Tekton's default (1 hour). Consider documenting this behavior.

  • [fragile-assumption] controllers/release/adapter.go:888 — cleanupManagedPipelineResources assumes the tenant RoleBinding is shared across all attempts (only cleans the first attempt's). If retry logic ever changes to create new RoleBindings, cleanup will leak them. Consider adding a comment documenting this assumption.

  • [resource-exhaustion] retry/mitigations.go:52 — Memory mitigation multiplier is applied cumulatively across retries. Without MaxComputeResources (which is optional), memory requests/limits can grow significantly. Consider making MaxComputeResources required when OOMKill mitigations are configured.

  • [missing-upper-bound] api/v1alpha1/releaseplanadmission_types.go:96 — MaxRetries in RetryInfo (status subresource) has no upper-bound kubebuilder validation. Consider adding a Maximum constraint as defense-in-depth.

  • [architectural-debt] controllers/release/adapter.go — MarkCurrentManagedPipelineAttemptFailed no longer sets the overall managed condition to Failed, deferring finalization to EnsureManagedPipelineProcessingIsCompleted. This is a correct design change for retry support, but the deferred-finalization pattern should be documented with a code comment.

  • [documentation-staleness] AGENTS.md:33 — States '~20 sequential operations'; the new EnsureManagedPipelineProcessingIsCompleted should be mentioned in the documented pipeline stages.

  • [api-breaking-change] api/v1alpha1/release_types.go:19 — Go type ManagedPipelineAttempt renamed to PipelineAttempt. Wire format is preserved (JSON tag unchanged). As v1alpha1, this is acceptable but should be noted.

  • [comment-capitalization] retry/mitigations.go — Multiple inline comments use lowercase first letter ('multiply the memory limit', 'bump tasks timeout', 'tekton rejects'). The codebase convention capitalizes the first word of comments.

  • [comment-capitalization] controllers/release/adapter.go:893 — Comments 'clean the shared RoleBinding once from the first attempt' and 'use the failed PipelineRun specs...' use lowercase first letter.

  • [function-naming-consistency] retry/mitigations.go — Apply* prefix on ApplyMemoryMitigation, ApplyTaskTimeoutMitigation, ApplyPipelineTimeoutMitigation; these are pure functions returning new values. The adapter uses compute* for similar patterns.

  • [authorization-scope-mismatch] — No linked GitHub issue. The Jira ticket RELEASE-2120 is referenced per the repo's documented convention. Consider linking the Jira ticket in the PR body for easier reviewer access.
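
For the addCappedDuration edge case above, one possible variant starts from a default when the base is nil instead of from zero. This is a sketch, not the function in the PR: `addCappedDurationWithDefault`, its signature, and the `mins` helper are hypothetical, and the one-hour default is taken from the finding's description of Tekton's default.

```go
package main

import (
	"fmt"
	"time"
)

// assumedDefaultTimeout follows the finding's statement that Tekton's
// default timeout is 1 hour.
const assumedDefaultTimeout = time.Hour

// mins is a small helper that converts whole minutes to a Duration.
func mins(n int) time.Duration {
	return time.Duration(n) * time.Minute
}

// addCappedDurationWithDefault adds increment to base, starting from the
// assumed default when base is nil, and applies ceiling when it is positive.
func addCappedDurationWithDefault(base *time.Duration, increment, ceiling time.Duration) time.Duration {
	start := assumedDefaultTimeout
	if base != nil {
		start = *base
	}
	next := start + increment
	if ceiling > 0 && next > ceiling {
		return ceiling
	}
	return next
}

func main() {
	// With no explicit base, the result grows from the default rather than
	// shrinking to the bare increment.
	fmt.Println(addCappedDurationWithDefault(nil, mins(10), 0))
}
```
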


Labels: PR introduces a new retry feature for managed pipelines with OOM/timeout mitigations.

Previous run (4)

Review

Findings

Medium

  • [error-handling-gap] controllers/release/adapter.go:1014 — In computeRetryOverrides, when the failed PipelineRun cannot be fetched (err != nil or failedPipelineRun == nil), the function logs an info-level message and returns originalSpecs/originalTimeouts. This means a retry will be created WITHOUT mitigations, consuming one of the limited retry attempts for no benefit (e.g., an OOM-killed pipeline retries with the same memory limits).
    Remediation: Consider returning an error to the caller so it can either requeue (if the PipelineRun might appear on a subsequent reconcile) or skip the retry entirely rather than wasting an attempt without mitigations.

  • [logic-error] controllers/release/adapter.go:940 — In retryManagedPipeline, if createManagedPipelineRun fails with a non-retryable error, handlePipelineCreationError marks the release as failed. However, no new PipelineAttempt entry has been appended to ManagedPipelineAttempts yet (that happens in registerManagedProcessingData on the success path), so the creation failure details are not recorded in the per-attempt history.
    Remediation: Record the creation failure in the attempts history, or document that pipeline creation errors are only captured in the release condition.

  • [missing-doc] AGENTS.md — The Architecture section documents the reconciliation pipeline with ~20 sequential operations but does not reflect the newly added EnsureManagedPipelineProcessingIsCompleted operation. This operation handles retry logic for failed managed pipeline attempts and represents a significant behavioral change.
    Remediation: Update AGENTS.md to document managed pipeline retry capability (e.g., "Pipeline processing in order: tenant collectors, managed collectors, tenant, managed (with automatic retry for OOM/timeout failures), final").

Low

  • [logic-edge-case] api/v1alpha1/releaseplanadmission_types.go:136 — IsRetryEnabled() returns true when Enabled=true and MaxRetries is a non-nil pointer to 0. The actual retry guard in EnsureManagedPipelineProcessingIsCompleted correctly prevents retries (0 < 0 = false), so runtime behavior is correct. The method name is slightly misleading.

  • [type-rename-coherence] api/v1alpha1/release_types.go:18 — Type renamed from ManagedPipelineAttempt to PipelineAttempt but the field name remains ManagedPipelineAttempts. Consider adding a comment explaining PipelineAttempt is generic, currently only used for managed pipelines.

  • [scope-expansion] controllers/utils/retry/mitigations.go — New package controllers/utils/retry creates a new organizational pattern not documented in AGENTS.md. Consider moving to controllers/release/retry/ if release-controller-specific.

  • [RBAC-lifecycle] controllers/release/adapter.go:930 — In retryManagedPipeline(), the tenant RoleBinding is looked up from the failed attempt but is not re-created if missing. Currently safe because cleanup only runs after HasManagedPipelineProcessingFinished() (which returns false during retry window), but the assumption could be fragile to future changes.

  • [cleanup-robustness] controllers/release/adapter.go:888 — In cleanupManagedPipelineResources(), the tenant RoleBinding is always retrieved from attempts[0]. All retry attempts currently reuse the same RoleBinding, but this assumption is implicit rather than enforced.

  • [doc-comment-style] controllers/utils/retry/mitigations.go — enforcePipelineCeiling godoc comment has awkward phrasing ("ensures pipeline greater than or equal to tasks plus finally"). Consider rewording for clarity.

  • [missing-feature-doc] README.md — New user-facing retry functionality (configurable mitigations for OOM and timeout failures) has no user documentation explaining configuration or usage.

Previous run (5)

Review

Findings

Medium

  • [missing-test] controllers/release/adapter_test.go — The EnsureManagedPipelineProcessingIsCompleted tests cover skip paths and finalize-failure paths but there is no test for the retry happy path: a retriable failure (OOM/timeout) + retries enabled + under MaxRetries. The core retry orchestration (retryManagedPipeline and computeRetryOverrides) is untested at the adapter level, though the utility functions in controllers/utils/retry have thorough unit tests.
    Remediation: Add at least one test that sets up a retriable failure, configures RetryInfo.Enabled=true with MaxRetries>0, and verifies the retry path is taken.

  • [error-handling] controllers/release/adapter.go — In computeRetryOverrides, when the failed PipelineRun cannot be fetched (GetReleasePipelineRunAttempt returns error), the function falls back to originalSpecs with log "retrying without mitigations" — meaning the retry uses the same resource limits that caused the OOM/timeout. When listTaskRunsForPipelineRun fails, mitigation still applies at the task level rather than step level (degraded but functional).
    Remediation: Consider returning an error when the failed PipelineRun cannot be fetched, so the caller can decide whether to requeue rather than create a retry with identical resources.

Low

  • [edge-case] controllers/release/adapter.go — In retryManagedPipeline, the tenantRoleBinding is loaded from the failed attempt rather than the first attempt. All attempts share the same RoleBinding (cleanup only runs after all processing finishes), but the assumption is implicit. A clarifying comment would help.

  • [edge-case] controllers/utils/retry/mitigations.go — ApplyMemoryMitigation accepts a multiplier of exactly 1.0, which produces no actual change in memory, resulting in a retry with identical resources that will likely fail the same way.

  • [resource-escalation] api/v1alpha1/retryable_pipeline.go — MaxRetries on RetryPolicy has Minimum=0 but no Maximum kubebuilder validation. This is on the admin-controlled ReleaseServiceConfig CRD so risk is low, but adding a reasonable ceiling (e.g., 10) would be a good defense-in-depth measure.

  • [naming-convention] api/v1alpha1/release_types.go:20 — Type renamed from ManagedPipelineAttempt to PipelineAttempt (generic) but the field Release.Status.ManagedPipelineAttempts keeps the Managed prefix, creating a minor naming inconsistency.

  • [edge-case] api/v1alpha1/releaseplanadmission_types.go — IsRetryEnabled returns true when MaxRetries is non-nil, even if *MaxRetries == 0. The caller handles this correctly (retryCount < 0 is always false), but the semantics are confusing. Consider documenting that MaxRetries=0 effectively disables retries.

  • [architectural-coherence] controllers/release/adapter.go — The new EnsureManagedPipelineProcessingIsCompleted operation creates an asymmetry: managed pipelines now have Processed+Tracked+Completed, while tenant/final only have Processed+Tracked. Consider updating CLAUDE.md to reflect the new three-step pattern.

  • [architectural-coherence] controllers/release/adapter.go — MarkCurrentManagedPipelineAttemptFailed no longer sets the overall condition to Failed. This is the key behavioral change enabling retries, but it changes the error handling pattern described in CLAUDE.md. Consider updating the documentation to reflect that attempt failure no longer finalizes the phase.
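
The ceiling suggested in the resource-escalation finding above could be expressed both as a CRD validation marker and as a runtime clamp. The struct below is a trimmed, hypothetical version of RetryPolicy; the ceiling of 10 is the example value from the finding, and `effectiveMaxRetries` is an illustrative helper, not part of the PR.

```go
package main

import "fmt"

// assumedMaxRetriesCeiling uses the example ceiling from the review finding.
const assumedMaxRetriesCeiling = 10

// RetryPolicy is a trimmed, hypothetical version of the CRD type.
type RetryPolicy struct {
	// MaxRetries is the number of retries allowed; 0 effectively disables retries.
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=10
	MaxRetries *int `json:"maxRetries,omitempty"`
}

// effectiveMaxRetries clamps the configured value into [0, ceiling] so the
// controller stays bounded even if validation is bypassed.
func effectiveMaxRetries(p RetryPolicy) int {
	if p.MaxRetries == nil || *p.MaxRetries < 0 {
		return 0
	}
	if *p.MaxRetries > assumedMaxRetriesCeiling {
		return assumedMaxRetriesCeiling
	}
	return *p.MaxRetries
}

func main() {
	huge := 1000
	fmt.Println(effectiveMaxRetries(RetryPolicy{MaxRetries: &huge}))
}
```
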

Info

  • [error-handling-idiom] controllers/release/adapter.go:865 — Error message changed from "invalid type" to "unsupported pipeline type" with test updated to match.

  • [abstraction-alignment] controllers/release/adapter.go — Retry logic split between adapter (computeRetryOverrides) and new retry package follows the existing pattern where pure utility functions (like tekton/utils) live outside the adapter while domain orchestration stays in the adapter.

Previous run (6)

Review

Findings

High

  • [nil-deref] api/v1alpha1/release_types.go:470 — In MarkCurrentManagedPipelineAttemptProcessed, the diff replaces the if attempt == nil { return } guard with if r.IsCurrentManagedPipelineAttemptDone() { return }. IsCurrentManagedPipelineAttemptDone() returns false when there are no attempts (because GetCurrentManagedPipelineAttempt() returns nil and the attempt != nil check fails). This means if IsManagedPipelineProcessing() is true but no attempt has been appended yet, the code falls through to attempt := r.GetCurrentManagedPipelineAttempt() which returns nil, then attempt.Status = AttemptSucceededReason panics with a nil-pointer dereference. The same pattern exists in MarkCurrentManagedPipelineAttemptFailed (line ~697).
    Remediation: Re-add a nil check after obtaining the attempt pointer, or make IsCurrentManagedPipelineAttemptDone an additive guard rather than a replacement: attempt := r.GetCurrentManagedPipelineAttempt(); if attempt == nil || r.IsCurrentManagedPipelineAttemptDone() { return }
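
The remediation above can be sketched in isolation. This is a minimal illustration that assumes simplified stand-in types; Release, PipelineAttempt, the status strings, and the lower-case method names here are hypothetical and far smaller than the real API types. It shows only the combined guard: a nil check plus the "already done" check.

```go
package main

import "fmt"

// PipelineAttempt is a simplified stand-in; only Status is modelled.
type PipelineAttempt struct {
	Status string
}

// Release is a simplified stand-in holding the list of attempts.
type Release struct {
	Attempts []PipelineAttempt
}

// currentAttempt returns the latest attempt, or nil when none exist.
func (r *Release) currentAttempt() *PipelineAttempt {
	if len(r.Attempts) == 0 {
		return nil
	}
	return &r.Attempts[len(r.Attempts)-1]
}

// isCurrentAttemptDone reports false when there is no attempt at all,
// which is why it cannot replace the nil check on its own.
func (r *Release) isCurrentAttemptDone() bool {
	a := r.currentAttempt()
	return a != nil && (a.Status == "Succeeded" || a.Status == "Failed")
}

// markCurrentAttemptSucceeded applies both guards before writing to the
// attempt and reports whether it changed anything.
func (r *Release) markCurrentAttemptSucceeded() bool {
	attempt := r.currentAttempt()
	if attempt == nil || r.isCurrentAttemptDone() {
		return false
	}
	attempt.Status = "Succeeded"
	return true
}

func main() {
	empty := &Release{}
	fmt.Println(empty.markCurrentAttemptSucceeded()) // no attempts: a no-op rather than a panic

	running := &Release{Attempts: []PipelineAttempt{{Status: "Running"}}}
	fmt.Println(running.markCurrentAttemptSucceeded(), running.Attempts[0].Status)
}
```

Keeping the nil check as an additive guard means the function stays safe even if a future change lets it run before the first attempt is appended.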

Medium

  • [design-direction] api/v1alpha1/release_types.go:712 — MarkCurrentManagedPipelineAttemptFailed no longer sets the overall managed processed condition to Failed (the SetConditionWithMessage call was removed). This is intentional for retry support: the condition stays Progressing so HasManagedPipelineProcessingFinished() returns false, allowing EnsureManagedPipelineProcessingIsCompleted to decide whether to retry or finalize. Downstream operations gate on HasManagedPipelineProcessingFinished(), not IsFailed(), so they correctly skip during the retry window. However, this state-machine invariant is undocumented and could be broken by future changes.
    Remediation: Document the state-machine change in AGENTS.md — explain that managed pipeline failures now have an intermediate state where the attempt is failed but the phase is not finalized, pending retry evaluation.

Low

  • [api-contract] controllers/release/adapter.go:550 — EnsureManagedPipelineIsProcessed still uses GetReleasePipelineRun (selects by pipeline type label only, client.Limit(1)) instead of the new GetReleasePipelineRunAttempt. With retries creating multiple managed PipelineRuns, this may return a stale PipelineRun from a prior attempt. The current control flow prevents issues (the IsManagedPipelineProcessing() gate blocks re-creation), but switching to GetReleasePipelineRunAttempt would improve consistency.

  • [error handling gap] controllers/release/adapter.go:1008 — In computeRetryOverrides, when GetReleasePipelineRunAttempt returns an error, it is logged at Info level and the retry proceeds with original (unmitigated) specs. Transient API server errors are indistinguishable from NotFound. Consider using Warning/Error log level for non-NotFound errors.

  • [fail-open] api/v1alpha1/releaseplanadmission_types.go:96 — MaxRetries in RetryInfo has no kubebuilder validation constraint. While RetryInfo lives in the Status subresource (admin-controlled) and a negative value would actually disable retries (fail-closed), adding +kubebuilder:validation:Minimum=0 would be a defense-in-depth improvement.

  • [test-weakened] api/v1alpha1/release_types_test.go:1522 — The test renamed to "should not set the overall condition to Failed" correctly tests the new behavior but drops the message field assertion that was present in the original test.

  • [missing-test] controllers/release/adapter_test.go:2438 — Tests for EnsureManagedPipelineProcessingIsCompleted cover non-retry paths (generic error, retries not enabled) but do not test the successful retry path where the attempt is retriable, retries are enabled, and retry count is below max. retryManagedPipeline and computeRetryOverrides also lack adapter-level integration tests.

Info

  • [architectural-documentation-staleness] AGENTS.md:33 — AGENTS.md documents the reconciliation pipeline but does not mention retry logic or the new EnsureManagedPipelineProcessingIsCompleted operation.

  • [scope-creep] The PR bundles a type rename, retry logic, mitigation package, loader interface expansion, cleanup refactoring, and label addition. All pieces are needed for retry to work, but the cross-cutting scope is significant.

Previous run (7)

Review

Findings

Medium

  • [error-swallowing] controllers/release/adapter.go:957 — In cleanupManagedPipelineResources, when iterating over attempts, API errors from GetReleasePipelineRunAttempt are silently swallowed with continue (line 906-909: if err != nil || pipelineRun == nil { continue }). If the API server returns a transient error, the PipelineRun finalizer will never be cleaned up by this code path. While this aligns with the existing cleanup philosophy (orphaned cleanup handles failures), distinguishing NotFound errors from transient API errors would improve reliability.
    Remediation: Distinguish between NotFound errors (which should continue) and other API errors (which should be returned or accumulated).

Low

  • [nil-deref] api/v1alpha1/release_types.go:470 — In MarkCurrentManagedPipelineAttemptProcessed and MarkCurrentManagedPipelineAttemptFailed, the defensive nil-check on attempt was replaced by IsCurrentManagedPipelineAttemptDone(), but GetCurrentManagedPipelineAttempt() is then called without a nil check. If ManagedPipelineAttempts is empty while IsManagedPipelineProcessing() returns true, attempt.Status would panic. The preceding guards make this unlikely in normal reconciliation flow, but MarkCurrentManagedPipelineAttemptProcessing (line 588-590) does have a nil check, creating an inconsistency.

  • [test-inadequate] controllers/release/adapter_test.go — Tests for EnsureManagedPipelineProcessingIsCompleted cover the non-retriable and retry-disabled cases but do not test the successful retry path where retryManagedPipeline is invoked.

  • [test-inadequate] controllers/release/adapter_test.go — No tests for computeRetryOverrides, which contains non-trivial branching logic (switch on failure reason, nil checks, mitigation application). The underlying mitigation functions are tested in controllers/utils/retry/mitigations_test.go, but the orchestration logic itself is not.

  • [fail-open] controllers/release/adapter.go:1006 — In computeRetryOverrides, when the failed PipelineRun cannot be found, the function falls back to original specs — retrying without mitigations rather than failing. This is a deliberate design choice (logged at Info level) but means the mitigation benefit is silently lost.

  • [scope-creep] api/v1alpha1/release_types.go:18 — The PR renames ManagedPipelineAttempt to PipelineAttempt, suggesting generalization beyond managed pipelines, but the type is still only used for managed pipelines. The field name ManagedPipelineAttempts remains unchanged.

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label Jun 15, 2026
@seanconroy2021

Member Author

/retest

@seanconroy2021 seanconroy2021 force-pushed the RELEASE-2120 branch 2 times, most recently from a155f18 to 6c427aa Compare June 16, 2026 09:30
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 16, 2026

🤖 Finished Review · ✅ Success · Started 9:32 AM UTC · Completed 9:50 AM UTC
Commit: ffde3b2 · View workflow run →

fullsend-ai-review[bot]

This comment was marked as outdated.

@fullsend-ai-review fullsend-ai-review Bot removed the requires-manual-review Review requires human judgment label Jun 16, 2026
Comment thread api/v1alpha1/release_types.go
Comment thread controllers/release/adapter.go
@fullsend-ai-review

🤖 Review · Started 10:47 AM UTC
Commit: 218f229 · View workflow run →

@qodo-app-for-konflux-ci

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removal date not yet scheduled).

Inline suggestions were posted as code suggestions.

Comment thread controllers/release/adapter.go
Comment thread controllers/release/adapter.go
@fullsend-ai-review

🤖 Review · Started 10:59 AM UTC
Commit: 218f229 · View workflow run →

@qodo-app-for-konflux-ci

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removal date not yet scheduled).

No code suggestions found for the PR.

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label Jun 18, 2026
@fullsend-ai-review

🤖 Finished Review · ✅ Success · Started 10:59 AM UTC · Completed 11:13 AM UTC
Commit: 218f229 · View workflow run →

happybhati
happybhati previously approved these changes Jun 18, 2026

@happybhati happybhati left a comment

Collaborator

Great work, LGTM
e2e happypath failing seems unrelated.

@seanconroy2021

Member Author

/retest

@seanconroy2021

Member Author

Codecov Report

❌ Patch coverage is 63.60656% with 111 lines in your changes missing coverage. Please review. ✅ Project coverage is 85.50%. Comparing base (b3e813e) to head (ae49387). ⚠️ Report is 1 commits behind head on main.
Files with missing lines Patch % Lines
controllers/release/adapter.go 32.88% 88 Missing and 12 partials ⚠️
loader/loader_mock.go 50.00% 2 Missing and 2 partials ⚠️
retry/mitigations.go 96.70% 2 Missing and 1 partial ⚠️
api/v1alpha1/release_types.go 90.90% 1 Missing and 1 partial ⚠️
loader/loader.go 92.85% 1 Missing and 1 partial ⚠️
❌ Your patch check has failed because the patch coverage (63.60%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1683      +/-   ##
==========================================
- Coverage   87.43%   85.50%   -1.94%     
==========================================
  Files          34       35       +1     
  Lines        3566     3842     +276     
==========================================
+ Hits         3118     3285     +167     
- Misses        285      378      +93     
- Partials      163      179      +16     

Flag Coverage Δ
e2e-tests 50.28% <29.50%> (-2.21%) ⬇️
unit-tests 79.93% <62.29%> (-1.56%) ⬇️
Flags with carried forward coverage won't be shown. Click here to find out more.
Files with missing lines Coverage Δ
api/v1alpha1/releaseplanadmission_types.go 100.00% <100.00%> (ø)
controllers/release/controller.go 90.19% <100.00%> (+0.19%) ⬆️
controllers/releaseplanadmission/adapter.go 87.23% <ø> (ø)
metadata/labels.go 100.00% <ø> (ø)
retry/matcher.go 87.27% <ø> (ø)
api/v1alpha1/release_types.go 98.16% <90.90%> (-0.40%) ⬇️
loader/loader.go 88.35% <92.85%> (+0.57%) ⬆️
retry/mitigations.go 96.70% <96.70%> (ø)
loader/loader_mock.go 49.29% <50.00%> (+0.08%) ⬆️
controllers/release/adapter.go 78.92% <32.88%> (-5.60%) ⬇️
Continue to review full report in Codecov by Harness.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3e813e...ae49387. Read the comment docs.

If this is up to date, we will need more tests added @seanconroy2021

I don't think it has been updated. Have deleted the CodeCov comment (normally triggers a new one). Will fix anything I am missing.

@qodo-app-for-konflux-ci

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removal date not yet scheduled).

Inline suggestions were posted as code suggestions.

Comment thread controllers/release/adapter.go
Comment thread loader/loader.go
Comment thread retry/mitigations.go
Comment on lines +101 to +110
func ApplyPipelineTimeoutMitigation(current *tektonv1.TimeoutFields, mitigation *v1alpha1.TimeoutIncrement) *tektonv1.TimeoutFields {
    if mitigation == nil {
        return current
    }
    adjustedTimeouts := copyTimeoutFields(current)
    newPipelineTimeout := addCappedDuration(adjustedTimeouts.Pipeline, mitigation)
    adjustedTimeouts.Pipeline = &metav1.Duration{Duration: newPipelineTimeout}

    return adjustedTimeouts
}

Suggestion: Call enforcePipelineCeiling to ensure the mitigated pipeline timeout remains valid. [possible issue, importance: 8]

Suggested change
- func ApplyPipelineTimeoutMitigation(current *tektonv1.TimeoutFields, mitigation *v1alpha1.TimeoutIncrement) *tektonv1.TimeoutFields {
-     if mitigation == nil {
-         return current
-     }
-     adjustedTimeouts := copyTimeoutFields(current)
-     newPipelineTimeout := addCappedDuration(adjustedTimeouts.Pipeline, mitigation)
-     adjustedTimeouts.Pipeline = &metav1.Duration{Duration: newPipelineTimeout}
-     return adjustedTimeouts
- }
+ func ApplyPipelineTimeoutMitigation(current *tektonv1.TimeoutFields, mitigation *v1alpha1.TimeoutIncrement) *tektonv1.TimeoutFields {
+     if mitigation == nil {
+         return current
+     }
+     adjustedTimeouts := copyTimeoutFields(current)
+     newPipelineTimeout := addCappedDuration(adjustedTimeouts.Pipeline, mitigation)
+     adjustedTimeouts.Pipeline = &metav1.Duration{Duration: newPipelineTimeout}
+     enforcePipelineCeiling(adjustedTimeouts)
+     return adjustedTimeouts
+ }

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 30, 2026

🤖 Review · ❌ Terminated · Started 5:09 PM UTC · Ended 5:15 PM UTC
Commit: ec21706 · View workflow run →

@fullsend-ai-review

🤖 Finished Review · ❌ Failure · Started 5:09 PM UTC · Completed 5:15 PM UTC
Commit: ec21706 · View workflow run →

@johnbieren johnbieren requested a review from happybhati June 30, 2026 18:10

@happybhati happybhati left a comment

Collaborator

Hey @seanconroy2021 is the codecov report valid?
#1683 (comment)

@seanconroy2021

Member Author

Hey @seanconroy2021 is the codecov report valid? #1683 (comment)

I think it's correct. I am missing tests for computeRetryOverrides and retryManagedPipeline. Adding them back in now :)

@konflux-ci konflux-ci deleted a comment from codecov Bot Jul 1, 2026
@seanconroy2021

Member Author

Hey @seanconroy2021 is the codecov report valid? #1683 (comment)

I think it's correct. I am missing tests for computeRetryOverrides and retryManagedPipeline. Adding them back in now :)

Have gone through the codecov and added in any missing tests

@fullsend-ai-review

fullsend-ai-review Bot commented Jul 1, 2026

🤖 Review · ❌ Terminated · Started 7:55 AM UTC · Ended 8:11 AM UTC
Commit: ec21706 · View workflow run →

@qodo-app-for-konflux-ci

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removal date not yet scheduled).

Inline suggestions were posted as code suggestions.

Comment thread controllers/release/adapter_test.go
Comment on lines +3551 to +3555
cleaned, err := adapter.loader.GetReleasePipelineRunAttempt(adapter.ctx, adapter.client, adapter.release, 0)
Expect(err).NotTo(HaveOccurred())
Expect(cleaned).NotTo(BeNil())
Expect(cleaned.Finalizers).To(HaveLen(0))
Expect(k8sClient.Delete(ctx, firstPipelineRun)).To(Succeed())

Suggestion: Use the newly fetched cleaned object for deletion instead of the original pointer. [possible issue, importance: 6]

Suggested change
- cleaned, err := adapter.loader.GetReleasePipelineRunAttempt(adapter.ctx, adapter.client, adapter.release, 0)
- Expect(err).NotTo(HaveOccurred())
- Expect(cleaned).NotTo(BeNil())
- Expect(cleaned.Finalizers).To(HaveLen(0))
- Expect(k8sClient.Delete(ctx, firstPipelineRun)).To(Succeed())
+ cleaned, err := adapter.loader.GetReleasePipelineRunAttempt(adapter.ctx, adapter.client, adapter.release, 0)
+ Expect(err).NotTo(HaveOccurred())
+ Expect(cleaned).NotTo(BeNil())
+ Expect(cleaned.Finalizers).To(HaveLen(0))
+ Expect(k8sClient.Delete(ctx, cleaned)).To(Succeed())

Comment on lines +1039 to +1048
pipelineRun, err := a.createManagedPipelineRun(resources, taskRunSpecs, timeouts)
if err != nil {
    return err
}

return a.cleanupPipelineResources(pipelineRun, roleBindings...)
a.logger.Info(fmt.Sprintf("Created %s Release PipelineRun (retry)", metadata.ManagedPipelineType),
    "PipelineRun.Name", pipelineRun.Name, "PipelineRun.Namespace", pipelineRun.Namespace,
    "Attempt", len(a.release.Status.ManagedPipelineAttempts))

return a.registerManagedProcessingData(pipelineRun, tenantRoleBinding)

Suggestion: Add a check to reuse an existing PipelineRun for the next attempt before creating a new one in retryManagedPipeline(). [possible issue, importance: 9]

Suggested change
- pipelineRun, err := a.createManagedPipelineRun(resources, taskRunSpecs, timeouts)
- if err != nil {
-     return err
- }
- return a.cleanupPipelineResources(pipelineRun, roleBindings...)
- a.logger.Info(fmt.Sprintf("Created %s Release PipelineRun (retry)", metadata.ManagedPipelineType),
-     "PipelineRun.Name", pipelineRun.Name, "PipelineRun.Namespace", pipelineRun.Namespace,
-     "Attempt", len(a.release.Status.ManagedPipelineAttempts))
- return a.registerManagedProcessingData(pipelineRun, tenantRoleBinding)
+ nextAttempt := len(a.release.Status.ManagedPipelineAttempts)
+ existing, getErr := a.loader.GetReleasePipelineRunAttempt(a.ctx, a.client, a.release, nextAttempt)
+ if getErr != nil && !errors.IsNotFound(getErr) {
+     return getErr
+ }
+ if existing != nil {
+     a.logger.Info(fmt.Sprintf("Reusing %s Release PipelineRun (retry)", metadata.ManagedPipelineType),
+         "PipelineRun.Name", existing.Name, "PipelineRun.Namespace", existing.Namespace,
+         "Attempt", nextAttempt)
+     return a.registerManagedProcessingData(existing, tenantRoleBinding)
+ }
+ pipelineRun, err := a.createManagedPipelineRun(resources, taskRunSpecs, timeouts)
+ if err != nil {
+     return err
+ }
+ a.logger.Info(fmt.Sprintf("Created %s Release PipelineRun (retry)", metadata.ManagedPipelineType),
+     "PipelineRun.Name", pipelineRun.Name, "PipelineRun.Namespace", pipelineRun.Namespace,
+     "Attempt", nextAttempt)
+ return a.registerManagedProcessingData(pipelineRun, tenantRoleBinding)
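
The idea behind this suggestion is a get-or-create step keyed by attempt number, so that a reconcile which re-runs after a partial failure does not create a second PipelineRun for the same attempt. A minimal sketch of that pattern follows. The store map and ensureAttemptRun are hypothetical stand-ins written for illustration: the map plays the role of the cluster and the attempt label lookup, and no real client or loader calls are modelled.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the cluster, keyed by
// attempt number in the way the attempt label would key PipelineRuns.
type store map[int]string

// ensureAttemptRun returns the run already recorded for the attempt, and
// only calls create when none exists. created reports which path was taken.
func ensureAttemptRun(s store, attempt int, create func() string) (name string, created bool) {
	if existing, ok := s[attempt]; ok {
		return existing, false
	}
	name = create()
	s[attempt] = name
	return name, true
}

func main() {
	s := store{}
	n := 0
	create := func() string {
		n++
		return fmt.Sprintf("managed-run-%d", n)
	}

	first, created := ensureAttemptRun(s, 1, create)
	fmt.Println(first, created)

	// A second reconcile for the same attempt reuses the existing run.
	again, created := ensureAttemptRun(s, 1, create)
	fmt.Println(again, created)
}
```

Because the second call finds the recorded run, create is invoked exactly once per attempt no matter how often the reconcile loop repeats.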

@fullsend-ai-review fullsend-ai-review Bot dismissed stale reviews from themself July 1, 2026 08:10

Superseded by updated review

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label Jul 1, 2026
@fullsend-ai-review

🤖 Finished Review · ✅ Success · Started 7:55 AM UTC · Completed 8:11 AM UTC
Commit: ec21706 · View workflow run →

@konflux-ci konflux-ci deleted a comment from codecov Bot Jul 1, 2026
@fullsend-ai-review

fullsend-ai-review Bot commented Jul 1, 2026

🤖 Review · ❌ Terminated · Started 8:29 AM UTC · Ended 8:43 AM UTC
Commit: ec21706 · View workflow run →

@fullsend-ai-review

🤖 Finished Review · ❌ Failure · Started 8:29 AM UTC · Completed 8:43 AM UTC
Commit: ec21706 · View workflow run →

@konflux-ci-qe-bot

Scenario: konflux-e2e-tests
@seanconroy2021: The following test has Failed, say /retest to rerun failed tests.

PipelineRun Name Status Rerun command Build Log Test Log
konflux-e2e-tests-zc87p Failed /retest View Pipeline Log View Test Logs

Inspecting Test Artifacts

To inspect your test artifacts, follow these steps:

  1. Install ORAS (see the ORAS installation guide).
  2. Download artifacts with the following commands:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/konflux-test-storage/konflux-team/release-service:konflux-e2e-tests-zc87p

Test results analysis

🚨 Failed to provision a cluster, see the log for more details:

Click to view logs
DEBU running 'mapt aws kind create'
DEBU context initialized for mapte90bd553
DEBU checking stack spotOption-kind-konflux-e2e-tests-zc87p
DEBU managing stack spotOption-kind-konflux-e2e-tests-zc87p
INFO Updating (spotOption-kind-konflux-e2e-tests-zc87p):
INFO
INFO  +  pulumi:pulumi:Stack kind-konflux-e2e-tests-zc87p-spotOption-kind-konflux-e2e-tests-zc87p creating (0s)
DEBU Based on prices for instance types [m5a.4xlarge m6in.4xlarge m6a.4xlarge m6i.4xlarge m6id.4xlarge m5.4xlarge m5ad.4xlarge m4.4xlarge] is az ap-south-1a, current price is 0.27 with a score of 9
DEBU Spot data: {[m5a.4xlarge m6in.4xlarge m6a.4xlarge m6i.4xlarge m6id.4xlarge m5.4xlarge m5ad.4xlarge m4.4xlarge] 0.4932 ap-south-1 ap-south-1a 0}
INFO @ updating.....
INFO  +  rh:qe:aws:bso bso-bso creating (0s)
INFO @ updating....
INFO  +  pulumi:pulumi:Stack kind-konflux-e2e-tests-zc87p-spotOption-kind-konflux-e2e-tests-zc87p created (3s)
INFO  +  rh:qe:aws:bso bso-bso created
INFO Outputs:
INFO     az           : "ap-south-1a"
INFO     instancetypes: [
INFO         [0]: "m5a.4xlarge"
INFO         [1]: "m6in.4xlarge"
INFO         [2]: "m6a.4xlarge"
INFO         [3]: "m6i.4xlarge"
INFO         [4]: "m6id.4xlarge"
INFO         [5]: "m5.4xlarge"
INFO         [6]: "m5ad.4xlarge"
INFO         [7]: "m4.4xlarge"
INFO     ]
INFO     max          : 0.4932
INFO     region       : "ap-south-1"
INFO     score        : 0
INFO
INFO Resources:
INFO     + 2 created
INFO
INFO Duration: 4s
INFO
DEBU managing stack stackKind-kind-konflux-e2e-tests-zc87p
INFO Updating (stackKind-kind-konflux-e2e-tests-zc87p):
INFO
INFO  +  pulumi:pulumi:Stack kind-konflux-e2e-tests-zc87p-stackKind-kind-konflux-e2e-tests-zc87p creating (0s)
INFO @ updating..............
INFO  +  aws:ec2:Vpc vpc-main-akd-net creating (0s)
INFO  +  aws:ec2:Eip main-akd-lbeip creating (0s)
INFO @ updating.....
INFO  +  tls:index:PrivateKey main-akd-pk creating (0s)
INFO @ updating....
INFO  +  tls:index:PrivateKey main-akd-pk created (0.87s)
INFO  +  aws:ec2:Eip main-akd-lbeip created (2s)
INFO  +  aws:ec2:KeyPair main-akd-pk creating (0s)
INFO @ updating....
INFO  +  aws:ec2:Vpc vpc-main-akd-net created (3s)
INFO @ updating....
INFO  +  aws:lb:TargetGroup main-akd-tg-9443 creating (0s)
INFO  +  aws:lb:TargetGroup main-akd-tg-6443 creating (0s)
INFO  +  aws:ec2:SecurityGroup main-akd-sg creating (0s)
INFO  +  aws:ec2:KeyPair main-akd-pk created (1s)
INFO @ updating.....
INFO  +  aws:lb:TargetGroup main-akd-tg-22 creating (0s)
INFO @ updating....
INFO  +  aws:lb:TargetGroup main-akd-tg-9443 created (3s)
INFO  +  aws:lb:TargetGroup main-akd-tg-6443 created (3s)
INFO  +  aws:ec2:Subnet subnet-publicmain-akd-net0 creating (0s)
INFO @ updating....
INFO  +  aws:lb:TargetGroup main-akd-tg-22 created (2s)
INFO  +  aws:ec2:InternetGateway igw-main-akd-net creating (0s)
INFO @ updating....
INFO  +  aws:ec2:SecurityGroup default-main-akd-net-main-akd-net creating (0s)
INFO  +  aws:ec2:SecurityGroup main-akd-sg created (5s)
INFO @ updating....
INFO  +  aws:ec2:Subnet subnet-publicmain-akd-net0 created (2s)
INFO @ updating....
INFO  +  aws:lb:TargetGroup main-akd-tg-8888 creating (0s)
INFO  +  aws:ec2:InternetGateway igw-main-akd-net created (2s)
DEBU Requesting a spot instance of types: m5a.4xlarge, m6in.4xlarge, m6a.4xlarge, m6i.4xlarge, m6id.4xlarge, m5.4xlarge, m5ad.4xlarge, m4.4xlarge at ap-south-1a paying: 0.493200
INFO  +  aws:lb:LoadBalancer main-akd-lb creating (0s)
INFO @ updating....
INFO  +  aws:ec2:LaunchTemplate main-akd-lt creating (0s)
INFO @ updating....
INFO  +  aws:lb:TargetGroup main-akd-tg-8888 created (2s)
INFO @ updating....
INFO  +  aws:ec2:RouteTable routeTable-publicmain-akd-net0 creating (0s)
INFO  +  aws:ec2:SecurityGroup default-main-akd-net-main-akd-net created (5s)
INFO @ updating......
INFO  +  aws:ec2:RouteTable routeTable-publicmain-akd-net0 created (2s)
INFO  +  aws:ec2:RouteTableAssociation routeTableAssociation-publicmain-akd-net0 creating (0s)
INFO @ updating....
INFO  +  aws:ec2:RouteTableAssociation routeTableAssociation-publicmain-akd-net0 created (1s)
INFO @ updating.....
INFO  +  aws:ec2:LaunchTemplate main-akd-lt created (7s)
INFO @ updating....
INFO  +  aws:autoscaling:Group main-akd-asg creating (0s)
INFO @ updating................
INFO  +  aws:autoscaling:Group main-akd-asg created (13s)
INFO @ updating...................................................................................................................................................
INFO  +  aws:lb:LoadBalancer main-akd-lb created (167s)
INFO @ updating.....
INFO  +  aws:lb:Listener main-akd-listener-8888 creating (0s)
INFO  +  aws:lb:Listener main-akd-listener-6443 creating (0s)
INFO @ updating.....
INFO  +  aws:lb:Listener main-akd-listener-8888 created (2s)
INFO  +  aws:lb:Listener main-akd-listener-22 creating (0s)
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (0s)
INFO  +  aws:lb:Listener main-akd-listener-6443 created (2s)
INFO  +  aws:lb:Listener main-akd-listener-9443 creating (0s)
INFO @ updating.....
INFO  +  aws:lb:Listener main-akd-listener-22 created (2s)
INFO @ updating.....
INFO  +  aws:lb:Listener main-akd-listener-9443 created (3s)
INFO @ updating...............
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (16s) Dial 1/inf failed: retrying
INFO @ updating...................
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (31s) Dial 2/inf failed: retrying
INFO @ updating..................
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (46s) Dial 3/inf failed: retrying
INFO @ updating..................
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (62s) Dial 4/inf failed: retrying
INFO @ updating..................
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (77s) Dial 5/inf failed: retrying
INFO @ updating.......
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s)
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s) status: done
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s) extended_status: done
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s) boot_status_code: enabled-by-generator
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s) last_update: Thu, 01 Jan 1970 00:00:53 +0000
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s) detail: DataSourceEc2Local
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s) errors: []
INFO  +  command:remote:Command main-kind-readiness-akd-cmd creating (81s) recoverable_errors: {}
INFO  +  command:remote:Command main-kind-readiness-akd-cmd created (81s) recoverable_errors: {}
INFO @ updating....
INFO  +  command:remote:Command main-kubeconfig-akd-cmd creating (0s)
INFO @ updating......
INFO  +  command:remote:Command main-kubeconfig-akd-cmd created (2s)
INFO @ updating.....
INFO  +  pulumi:pulumi:Stack kind-konflux-e2e-tests-zc87p-stackKind-kind-konflux-e... the content is too long - please download the artifact to see the full content

OCI Artifact Browser URL

View in Artifact Browser

Retry managed PipelineRuns that fail with OOM or timeout.
Generic errors are not retried.

* Add EnsureManagedPipelineProcessingIsCompleted operation
  for retry or mark release as failed
* Keep managed condition as Progressing during retries
* Apply memory and timeout mitigations on each retry
  using the failed PipelineRun specs as the base
* Bump pipeline timeouts when task timeout grows
* Add attempt label for PipelineRun lookup
* Handle cleanup across multiple PipelineRun attempts
* Rename ManagedPipelineAttempt to PipelineAttempt
* Add GetRoleBindingFromPipelineAttempt and
  GetReleasePipelineRunAttempt to loader

Assisted-by: Claude

Signed-off-by: Sean Conroy <sconroy@redhat.com>
@konflux-ci konflux-ci deleted a comment from codecov Bot Jul 1, 2026
@codecov

codecov Bot commented Jul 1, 2026

Codecov Report

❌ Patch coverage is 87.70227% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.46%. Comparing base (3e0d009) to head (98c225b).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
controllers/release/adapter.go 79.86% 16 Missing and 14 partials ⚠️
loader/loader_mock.go 50.00% 2 Missing and 2 partials ⚠️
api/v1alpha1/release_types.go 90.90% 1 Missing and 1 partial ⚠️
retry/mitigations.go 97.84% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1683      +/-   ##
==========================================
+ Coverage   87.43%   87.46%   +0.03%     
==========================================
  Files          34       35       +1     
  Lines        3566     3846     +280     
==========================================
+ Hits         3118     3364     +246     
- Misses        285      303      +18     
- Partials      163      179      +16     
Flag Coverage Δ
e2e-tests 50.02% <29.12%> (-2.47%) ⬇️
unit-tests 82.03% <87.37%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
api/v1alpha1/releaseplanadmission_types.go 100.00% <100.00%> (ø)
controllers/release/controller.go 90.19% <100.00%> (+0.19%) ⬆️
controllers/releaseplanadmission/adapter.go 87.23% <ø> (ø)
loader/loader.go 89.24% <100.00%> (+1.46%) ⬆️
metadata/labels.go 100.00% <ø> (ø)
retry/matcher.go 87.27% <ø> (ø)
api/v1alpha1/release_types.go 99.08% <90.90%> (+0.51%) ⬆️
retry/mitigations.go 97.84% <97.84%> (ø)
loader/loader_mock.go 49.29% <50.00%> (+0.08%) ⬆️
controllers/release/adapter.go 83.74% <79.86%> (-0.77%) ⬇️

Continue to review full report in Codecov by Harness.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e0d009...98c225b. Read the comment docs.


@fullsend-ai-review

fullsend-ai-review Bot commented Jul 1, 2026

🤖 Finished Review · ❌ Failure · Started 9:50 AM UTC · Completed 10:07 AM UTC
Commit: ec21706 · View workflow run →

Labels

ai-assisted Contains 'Assisted-By' enhancement New feature or request prod/needs-approval requires-manual-review Review requires human judgment Review effort 4/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants