Skip to content

Multicluster Support for Clowder#1773

Open
JuanmaBM wants to merge 11 commits into
RedHatInsights:masterfrom
JuanmaBM:feat/configurable-certificate-authorities
Open

Multicluster Support for Clowder#1773
JuanmaBM wants to merge 11 commits into
RedHatInsights:masterfrom
JuanmaBM:feat/configurable-certificate-authorities

Conversation

@JuanmaBM

@JuanmaBM JuanmaBM commented May 12, 2026

Copy link
Copy Markdown
Contributor

This PR implements comprehensive multicluster support for Clowder, enabling applications to seamlessly connect to dependencies across different clusters with proper authentication and TLS configuration.

Motivation

The Multicluster Challenge

Modern cloud-native applications need to connect to services across multiple clusters:

  • In-cluster services (ClowdApp) - Same cluster, direct k8s Service access, network isolation provides security
  • Cross-cluster services (ClowdAppRef) - Different clusters, routed through API gateways, require authentication

The existing V1 endpoint system doesn't distinguish between these scenarios, forcing applications to implement custom logic for:

  • Determining if authentication is needed
  • Selecting the appropriate protocol and port
  • Configuring TLS certificates correctly
  • Handling different connection patterns

V1 Limitations

Example V1 configuration:

{
  "endpoints": [
    {
      "app": "rbac",
      "hostname": "rbac-service.env.svc",
      "port": 8000,
      "tlsPort": 8443,
      "h2cPort": 0,
      "h2cTLSPort": 0
    }
  ]
}

Problems:

  • Applications must decide which port/protocol to use
  • No indication if authentication is required
  • Inconsistent choices across different apps
  • CA certificate configuration is global, not per-endpoint
  • Error-prone (easy to use wrong port or forget TLS)

Solution: V2 Opinionated Endpoints

Clean, Simple Configuration

Example V2 configuration (in-cluster):

{
  "dependencyEndpoints": {
    "v2": {
      "rbac": {
        "service": {
          "uri": "https://rbac-service.env.svc:8443",
          "ca_certificate": "/cdapp/certs/service-ca.crt",
          "authenticated": false
        }
      }
    }
  }
}

Example V2 configuration (cross-cluster):

{
  "dependencyEndpoints": {
    "v2": {
      "external-service": {
        "api": {
          "uri": "https://external-api.gateway.example.com:443",
          "authenticated": true
        }
      }
    }
  }
}

Key Features

Single URI - Applications use one field, no selection logic needed
Automatic Protocol Selection - TLS > H2C TLS > H2C > plaintext priority
Per-Endpoint CA Certificates - Only present when needed for in-cluster TLS
Authenticated Field - Explicit signal for authentication requirements
Backward Compatible - V1 endpoints still generated alongside V2

Endpoint Selection Priority

V2 endpoints follow this priority order (highest to lowest):

  1. TLS Port (tlsPort) - HTTPS with TLS encryption
  2. H2C TLS Port (h2cTLSPort) - HTTP/2 cleartext over TLS
  3. H2C Port (h2cPort) - HTTP/2 cleartext without TLS
  4. Standard Port (port) - HTTP plaintext

Example: If a service exposes port 8000 (HTTP), tlsPort 8443 (HTTPS), and h2cPort 8080, V2 will always select the HTTPS endpoint at port 8443.

Configuration Structure

Complete V2 Configuration Example

{
  "dependencyEndpoints": {
    "v2": {
      "rbac": {
        "service": {
          "uri": "https://rbac-service.env.svc:8443",
          "ca_certificate": "/cdapp/certs/service-ca.crt",
          "authenticated": false
        }
      },
      "external-api": {
        "gateway": {
          "uri": "https://api.external.example.com:443",
          "authenticated": true
        }
      }
    }
  },
  "privateDependencyEndpoints": {
    "v2": {
      "internal-service": {
        "admin": {
          "uri": "https://internal-admin.env.svc:10443",
          "ca_certificate": "/cdapp/certs/service-ca.crt",
          "authenticated": false
        }
      }
    }
  },
  "tlsCAPath": "/cdapp/certs/service-ca.crt",  // Root-level for backward compatibility
  "endpoints": [...],  // V1 format still present
  "privateEndpoints": [...]  // V1 private format still present
}

Backward Compatibility

Fully backward compatible

  • V1 endpoints (endpoints[] and privateEndpoints[]) continue to be generated
  • Root-level tlsCAPath maintained for existing applications
  • V2 is additive - both formats coexist in the same configuration
  • No CRD changes required for basic usage
  • Existing applications are completely unaffected

Migration Path

  1. Deploy this Clowder version - V2 endpoints appear in config alongside V1
  2. Update applications incrementally - Read from dependencyEndpoints.v2
  3. Remove custom logic - Delete endpoint selection and CA configuration code
  4. Use authenticated field - Add authentication only when authenticated: true
  5. Simplify applications - One line of code instead of complex selection logic

Related Issues

Closes: https://redhat.atlassian.net/browse/ENGPROD-9704
Closes: https://redhat.atlassian.net/browse/ENGPROD-9960


Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com
Signed-off-by: JuanmaBM juanmabareamartinez@gmail.com

@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch 5 times, most recently from 78f08eb to 48d9a9c Compare May 13, 2026 12:44
@JuanmaBM

Copy link
Copy Markdown
Contributor Author

/test-e2e

1 similar comment
@JuanmaBM

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch 3 times, most recently from 51ab74d to c3e250b Compare May 18, 2026 15:11
JuanmaBM and others added 4 commits May 20, 2026 10:11
Add support for ClowdApps to configure custom CA certificates for TLS
connections to dependencies. This enables multi-environment support
(non-OpenShift), custom PKI infrastructure, and granular security control.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add clowder/ca-secret-hash annotation to deployments that mount CA
secrets. When a CA certificate is rotated, the hash changes and
Kubernetes automatically restarts pods with the new certificate.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch from c3e250b to 2a10caf Compare May 20, 2026 08:11
@JuanmaBM

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch 2 times, most recently from 77b9ff2 to a9f0d3e Compare June 3, 2026 10:04
@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch from a9f0d3e to f4650ff Compare June 3, 2026 10:45
@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

2 similar comments
@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch 2 times, most recently from b63e387 to ff52637 Compare June 3, 2026 14:54
@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch from ff52637 to 5c66073 Compare June 3, 2026 15:30
@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch from 5c66073 to 7a4f8d4 Compare June 3, 2026 16:03
@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

1 similar comment
@JuanmaBM

JuanmaBM commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

After extensive discussions with HCC migration team and platform architects,
refined implementation to focus on minimum viable feature set for multi-cluster
migration use case.

Changes:
- Remove consumer-level CA configuration (tlsCertificateAuthorityName field)
- Remove certificateauthority provider (no longer needed)
- Implement automatic per-dependency CA assignment:
  * ClowdApp (in-cluster) → service-ca.crt
  * ClowdAppRef (cross-cluster) → system trust store
- Refactor V2 endpoint generation to emit single opinionated entry per service
  (eliminates duplicate protocol variants: service, service_tls, service_h2c, service_h2c_tls)
- Update KUTTL tests to verify V2 endpoint structure for all 4 scenarios

Key decisions from team discussions:
- CA determined automatically by dependency type, not consumer configuration
- Simpler mental model: "local services use service-ca, remote services use system trust"
- Custom CA support deferred until there is actual demand (YAGNI principle)

Backward compatibility:
- V1 endpoints unchanged (ClowdApp behavior identical)
- V2 is new feature (no existing consumers)
- ClowdAppRef is new resource (PR RedHatInsights#1769, no production usage)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@JuanmaBM JuanmaBM force-pushed the feat/configurable-certificate-authorities branch from 7a4f8d4 to 2fb2f07 Compare June 9, 2026 10:28
@JuanmaBM

JuanmaBM commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

/test-e2e

Comment thread controllers/cloud.redhat.com/providers/dependencies/impl.go Outdated

if envTLSConfig.Enabled {
// if TLS is enabled environment-wide, set 'tlsCAPath' in the root level of cdappconfig
web.Config.TlsCAPath = provutils.GetServiceCACertPath()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will applications consuming this root-level field will silently get null now?. Any app that reads the root-level field (as a default/fallback) will break. Is this removal intentional?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code still removes the root-level tlsCAPath from both web/default.go and web/local.go. Any app reading cdappconfig.json at .tlsCAPath (the root-level field) will now silently get null when they upgrade to this version of Clowder. Can we fix this to not introduce a breaking change?

}
}

if envTLSConfig.Enabled {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code still removes the root-level tlsCAPath from both web/default.go and web/local.go. Any app reading cdappconfig.json at .tlsCAPath (the root-level field) will now silently get null when they upgrade to this version of Clowder. Can we fix this to not introduce a breaking change?

}
v2Map[ep.App][ep.Name+"_tls"] = endpoint
switch {
case ep.TlsPort != nil && *ep.TlsPort > 0:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if an app provides both REST API and a H2C service (like grpc) from the same application? We won't broadcast both of them anymore right? If I recall, some apps wanted to offer both of these protocols from the same app/container.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you take a look at the document written by Jozef here, one of the key points is: "One entry per dependency endpoint, not per protocol"

Of course, you can keep using V1 endpoints to define multiple protocols. Although it's a good point that you mentioned, if there are apps that have different protocols, we definitely should provide a mechanism to do that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsquizz I think that will still be possible - a service can expose REST as the public endpoint and GRPC as the private endpoint. The configuration (uri, ca_certificate) is specific for each endpoint.

As for which port to use, the H2C port is very specific (only needed by Gateways). GRPC clients initiate HTTP 2 connections right away and don't need the hint from Kubernetes API. I.e. I think this can be simplified to just

  • TLS if TLS port is set
  • plaintext port otherwise

@JuanmaBM JuanmaBM Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jharting mentioned simplifying the V2 endpoint selection logic to just:

  • TLS if TLS port is set
  • plaintext port otherwise

However, I'm concerned about breaking apps that expose both REST and gRPC on different ports within the same endpoint (public or private). For example:

Scenario:

  # ClowdEnvironment
  spec:                                                     
    providers:
      web:                                                                                                              
        port: 8000      # REST HTTP                                                                                     
        h2cPort: 9800   # gRPC HTTP/2                                                                                   
                                                            
  # ClowdApp
  spec:                                                                                                                 
    deployments:
      - name: service                                                                                                   
        webServices:                                        
          public:
            enabled: true
            h2cEnabled: true  # Exposes BOTH protocols on same endpoint

Current implementation (TLS > H2C TLS > H2C > plaintext):

  • V2 would expose: http://service:9800 (H2C has priority)
  • gRPC client connects to port 9800 (OK)

Simplified version (TLS > plaintext):

  • V2 would expose: http://service:8000 (plaintext)
  • gRPC client would try to connect to port 8000 (BROKEN)
  • Port 9800 wouldn't be exposed in V2 at all

Questions:

  1. Are there real use cases where apps use h2cPort != port for gRPC within the same endpoint?
  2. Or do gRPC apps typically:
    - Use the same port for both REST and gRPC (gRPC clients initiate HTTP/2 on any port)?
    - Or use public for REST and private for gRPC (separate endpoints)?

If there are no real cases where h2cPort needs to be different from port, then the simplification makes sense. Otherwise, I should keep the full priority logic.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I am wrong, but I think that kessel enables both from the same deployment ... https://github.com/project-kessel/inventory-api/blob/main/deploy/kessel-inventory.yaml#L101-L105

@jharting jharting Jun 19, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JuanmaBM v2 dependency endpoint should either use TLSPort, or Port. Never h2c. More context in https://redhat-internal.slack.com/archives/C08QV47C53J/p1781607554558099

Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>
@JuanmaBM

Copy link
Copy Markdown
Contributor Author

/test-e2e

jq -r '.data["cdappconfig.json"]' < "${TMP_DIR}/${TEST_NAME}" | base64 -d > "${TMP_DIR}/${TEST_NAME}-json"

# Run assertions - verify v2 endpoint structure (plaintext, in-cluster)
jq -r '.dependencyEndpoints.v2.rbac.service.uri == "http://rbac-service.test-v2-clowdapp-plaintext.svc:8000"' -e < "${TMP_DIR}/${TEST_NAME}-json"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider also validating that ca_certificate is undefined in this case

jq -r '.data["cdappconfig.json"]' < "${TMP_DIR}/${TEST_NAME}" | base64 -d > "${TMP_DIR}/${TEST_NAME}-json"

# Run assertions - verify v2 endpoint structure
jq -r '.dependencyEndpoints.v2.rbac.service.uri == "http://rbac-dev.local:8000"' -e < "${TMP_DIR}/${TEST_NAME}-json"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider also validating that ca_certificate is undefined in this case

Comment thread tests/kuttl/test-v2-endpoints-clowdappref-tls/02-json-asserts.sh
Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>
@JuanmaBM JuanmaBM requested a review from bsquizz June 11, 2026 17:15
@JuanmaBM

Copy link
Copy Markdown
Contributor Author

/test-e2e

@bsquizz bsquizz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my newest replies, plus one new issue found during another Claude review.

Comment thread controllers/cloud.redhat.com/providers/dependencies/impl.go Outdated
JuanmaBM and others added 2 commits June 17, 2026 14:13
… to signal whether clients should authenticate when connecting to endpoints.

Key changes:
- Add 'authenticated' field to DependencyEndpointV2 schema
- ClowdApp (in-cluster) dependencies set authenticated=false
- ClowdAppRef (cross-cluster) dependencies set authenticated=true
- Refactor endpoint builders to eliminate code duplication (~29% reduction)
- Rewrite V2 unit tests to use CRDs instead of V1 conversion
- Add KUTTL test assertions for 'authenticated' field

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>
The condition here only checks depAppRef.Spec.RemoteEnvironment.TLS.Enabled, but TLS can also be enabled via the env default (envWebConfig.TLS.Enabled).

Co-authored-by: Brandon Squizzato <35474886+bsquizz@users.noreply.github.com>
@JuanmaBM

Copy link
Copy Markdown
Contributor Author

/test-e2e

@JuanmaBM JuanmaBM changed the title Feat/configurable certificate authorities Multicluster Support for Clowder Jun 18, 2026
appMap[iapp.Name] = *iapp
}

v2Map := buildV2EndpointMap(publicFormat)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: with these changes, buildV2EndpointMap() seems no longer used, i.e. dead code that should be removed.

JuanmaBM added 2 commits June 23, 2026 10:06
Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>
Signed-off-by: JuanmaBM <juanmabareamartinez@gmail.com>
@JuanmaBM

Copy link
Copy Markdown
Contributor Author

/test-e2e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants