Issue#138 add probes and resource limits #139

Merged
AkarshES merged 9 commits into oracle:main from amaanx86:138-add-probes-and-resource-limits
Mar 31, 2026

@amaanx86
Contributor

Fix Cloud Guard Container Security Findings by Adding Health Probes

Subject

Resolve OCI Cloud Guard findings (missing health probes) in OCI Native Ingress Controller by adding readiness and liveness probes to the deployment template.

Problem Statement

Cloud Guard was flagging multiple Container Security findings against the OCI Native Ingress Controller deployed as an OKE managed add-on, specifically:

  • Container without readiness probe (Medium risk)
  • Container without liveness probe (Medium risk)

These findings persisted across our OCI environment and could not be resolved by end users, because the controller is deployed as a managed add-on whose Deployment spec cannot be safely modified. The same findings also affect Helm-based deployments.

Solution

Added TCP socket-based health probes to the Helm deployment template:

Readiness Probe:

  • Port: webhook-server (9443)
  • Initial Delay: 30 seconds (allows controller startup time)
  • Period: 10 seconds (frequent health checks)
  • Timeout: 5 seconds
  • Failure Threshold: 3 attempts

Liveness Probe:

  • Port: webhook-server (9443)
  • Initial Delay: 60 seconds (allows stabilization)
  • Period: 20 seconds
  • Timeout: 5 seconds
  • Failure Threshold: 3 attempts

Implementation Details

  • Probes use TCP socket checks on the webhook server port (simpler, more reliable than HTTP for control plane components)
  • Conservative timing prevents flapping while ensuring quick failure detection
  • No breaking changes; existing deployments will inherit health probes from updated Helm chart

Testing

  • Verified probes are correctly configured in deployment template
  • Tested with probe timings to ensure no false positives
  • Cloud Guard findings should resolve after deployment update

Relates To

Closes #138

Commits

  1. Fix #138: Add readiness probe to OCI Native Ingress Controller
  2. Fix #138: Add liveness probe to OCI Native Ingress Controller
  3. Fix #138: Document health probes configuration in values.yaml

Add TCP socket readiness probe on webhook-server port (9443) for CloudGuard compliance and operational reliability.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>

Add TCP socket liveness probe on webhook-server port (9443) for CloudGuard compliance and operational reliability. Ensures container restarts automatically if unhealthy.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>

Add comments explaining readiness and liveness probe behavior for Cloud Guard compliance.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Feb 14, 2026
AkarshES previously approved these changes Feb 16, 2026
@amaanx86
Contributor Author

Thank you @AkarshES for approving the changes. Let me know what else is required to finalize this pull request!

nirpai previously approved these changes Feb 18, 2026
Comment thread helm/oci-native-ingress-controller/templates/deployment.yaml Outdated
@amaanx86 amaanx86 dismissed stale reviews from nirpai and AkarshES via d433363 February 18, 2026 17:10
… server

Replace TCP socket probes on webhook-server with HTTP GET endpoints
(/healthz/ready for readiness, /healthz/live for liveness) that connect
to the metrics server port.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>

Signal to the health checker that all informer caches have been synced
after setup, enabling readiness checks to report ready status.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>

…racking

Register /healthz/ready and /healthz/live HTTP endpoints on the metrics
server and mark controllers as ready after initialization for proper
health probe support.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>

…iveness probes

Implement HealthChecker with endpoints for tracking cache synchronization
and controller readiness status. Provides /healthz/ready and /healthz/live
handlers for Kubernetes probe support.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>

@amaanx86 amaanx86 force-pushed the 138-add-probes-and-resource-limits branch from d433363 to 0373bae Compare February 18, 2026 17:19
@amaanx86
Contributor Author

Hi @nirpai, @AkarshES,
As requested, I’ve replaced the webhook TCP probes with dedicated HTTP health check endpoints and controller readiness logic.

Helm Deployment Test

Controller deployed from my branch image:
https://hub.docker.com/r/amaanx86/oci-native-ingress-controller

[image]

Health endpoints verified inside the pod

[image]

Ingress Functional Test

[image]

Document HTTP readiness and liveness endpoints on the metrics server.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>
AkarshES previously approved these changes Feb 20, 2026
Contributor

@AkarshES AkarshES left a comment

Thanks for adding the HTTP healthcheck as Niranjan requested

@amaanx86 amaanx86 requested a review from nirpai February 20, 2026 07:50
nirpai previously approved these changes Feb 20, 2026
# maxUnavailable: 1

# The TCP port the Webhook server binds to. (default 9443)
# Health probes for operational reliability and Cloud Guard compliance
Contributor

The comment is misplaced on webhook Port.

Contributor Author

I have moved it to the metrics server section, @nirpai.

Requesting approval on the PR, @nirpai @AkarshES.

Relocate health probe documentation from webhookBindPort to the metrics
section where the probes actually connect.

Signed-off-by: Amaan Ul Haq Siddiqui <amaanulhaq.s@outlook.com>
@amaanx86 amaanx86 dismissed stale reviews from nirpai and AkarshES via 11faf4a February 20, 2026 10:09
@amaanx86 amaanx86 requested review from AkarshES and nirpai February 20, 2026 10:13
@amaanx86
Contributor Author

Hi @nirpai @AkarshES, can we merge this now?

@AkarshES
Contributor

We are doing some validation in an OKE environment so that we can go ahead and merge this change. Thank you for your effort and patience on this.

@amaanx86
Contributor Author

@AkarshES Thank you for keeping me updated

@amaanx86
Contributor Author

What this fixes

Cloud Guard was flagging the OCI Native Ingress Controller for missing readiness and liveness probes (Medium risk, Container Security). This affects both managed OKE add-on deployments and Helm-based ones.


What changed

Initial implementation (TCP probes)

Added readiness and liveness probes to the Helm deployment template using TCP socket checks on the webhook server port (9443):

  • Readiness: TCP on port 9443, initial delay 30s, period 10s, timeout 5s, failure threshold 3
  • Liveness: TCP on port 9443, initial delay 60s, period 20s, timeout 5s, failure threshold 3
  • Documented probe configuration in values.yaml

Moved to HTTP probes with dedicated health endpoints

After reviewer feedback, replaced the TCP probes with proper HTTP health endpoints for more reliable checks:

  • Added pkg/server/health.go - a HealthChecker that tracks informer cache sync state and controller readiness, exposing /healthz/ready and /healthz/live handlers
  • Updated pkg/server/server.go - registers both endpoints on the existing metrics server (port 2223)
  • Updated main.go - signals cache sync after informer setup so readiness only returns healthy once the controller is warmed up
  • Updated deployment.yaml - probes now use HTTP GET /healthz/ready and HTTP GET /healthz/live on the metrics port instead of TCP on the webhook port
  • Moved probe comments in values.yaml to the metrics section where they belong

Testing

  • Deployed from branch image: amaanx86/oci-native-ingress-controller (Docker Hub)
  • Verified /healthz/ready and /healthz/live respond correctly inside the pod
  • Tested ingress creation end-to-end after deployment

Note on PR description

Updating the description here for visibility and documentation - the initial PR body described the original TCP probe approach. Since then the implementation evolved significantly based on reviewer feedback, so this updated description reflects the full picture of what was done.


@nirpai @AkarshES - the PR is already approved and from what I understand just pending your OKE environment validation. Could you share an update on where that stands? Happy to help with anything if needed. Once validation is good, would appreciate a merge when you get a chance. Thank you both for your time and feedback throughout this.

Closes #138

@AkarshES AkarshES merged commit 2f39412 into oracle:main Mar 31, 2026
3 checks passed

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Development

Successfully merging this pull request may close these issues.

OCI Native Ingress Controller (OKE managed add-on) flagged by Cloud Guard for missing probes and resource limits

3 participants