Skip to content

ROSARoleConfig Controller Crashes with Nil Pointer Dereference #5860

@tinaafitz

Description

@tinaafitz

/kind bug
ROSARoleConfig controller panic: nil pointer dereference in reconcileAccountRoles()

The ROSARoleConfig controller intermittently crashes with a nil pointer dereference panic during the first reconciliation of new ROSARoleConfig resources. The crash occurs in the reconcileAccountRoles() function when attempting to access r.Runtime.AWSClient.

Impact:

  • Some ROSARoleConfig resources successfully create AWS IAM roles (14s-2m37s)
  • Others fail and remain stuck for 15+ minutes before timeout
  • The crash is non-deterministic, making ROSA HCP cluster provisioning unreliable
Full Panic Stack Trace:
E0205 09:57:27.912082       1 signal_unix.go:925] "Observed a panic" 
  controller="rosaroleconfig" 
  controllerGroup="infrastructure.cluster.x-k8s.io" 
  controllerKind="ROSARoleConfig" 
  ROSARoleConfig="ns-rosa-hcp/abc-ui-rosa-hcp-test-roles" 
  namespace="ns-rosa-hcp" 
  name="abc-ui-rosa-hcp-test-roles" 
  reconcileID="3872e30c-f6c7-404d-89f0-4d9ad903821d" 
  panic="runtime error: invalid memory address or nil pointer dereference" 
  panicGoValue="\"invalid memory address or nil pointer dereference\"" 
  stacktrace=<
  	panic({0x5357620?, 0x8a61240?})
  		/usr/lib/golang/src/runtime/panic.go:792 +0x132
Controller Error:
  E0205 09:57:27.912147       1 controller.go:347] "Reconciler error"
  err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]"
  controller="rosaroleconfig"
  controllerGroup="infrastructure.cluster.x-k8s.io"
  controllerKind="ROSARoleConfig"
  ROSARoleConfig="ns-rosa-hcp/abc-ui-rosa-hcp-test-roles"
  namespace="ns-rosa-hcp"
  name="abc-ui-rosa-hcp-test-roles"
  reconcileID="3872e30c-f6c7-404d-89f0-4d9ad903821d"

Additional Context
The crash is recoverable at times:

Successful ROSARoleConfigs (recovered from crash or avoided it):

  • test-rosa-hcp-roles: 14 seconds
  • new-rosa-hcp-test-roles: 2 minutes 37 seconds

Failed ROSARoleConfigs (stuck in crash loop):

  • tst-rosa-hcp-roles: 15+ minutes before manual deletion
  • abc-ui-rosa-hcp-test-roles: Crashed on first reconcile

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/bugCategorizes issue or PR as related to a bug.needs-priorityneeds-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions