- 
                Notifications
    
You must be signed in to change notification settings  - Fork 775
 
feat: Add package registry to eck #8800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
          ✅ Snyk checks have passed. No issues have been found so far.
 💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.  | 
    
          🔍 Preview links for changed docs | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a quick look from the side of the team maintaining Package Registry.
It looks great, thanks for adding support for package registry in ECK, this will help many users.
Added some comments, please let us know if you need a more in-depth review from our side.
| AgentImage Image = "elastic-agent/elastic-agent" | ||
| MapsImage Image = "elastic-maps-service/elastic-maps-server" | ||
| LogstashImage Image = "logstash/logstash" | ||
| PackageRegistryImage Image = "package-registry/distribution" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an image used by default, or setting the image is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the default version is package-registry/distribution:<version> where the user specifies the version. Version is a required field so if they do not specify one it will fail. Something I thought about was adding a epr_type or something like that where the user could specify the different EPR versions we publish. Like 9.1.2, lite-9.1.2, production, and lite. However I think its just as easy to specify the image in the template if you want something other than package-registry/distribution:<version>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing a first test I just realized that the default image is around ~5Gb 😅 (I was wondering why my container was not starting 🙃 ). I'm also wondering if it would not make sense to have short flag to select the image "type".
Edit: I just had a Pod that failed to start with the following error on GKE:
  Warning  Failed               90s    kubelet            Failed to pull image "docker.elastic.co/package-registry/distribution:9.1.0": failed to pull and unpack image "docker.elastic.co/package-registry/distribution:9.1.0": failed to extract layer sha256:0f0888ef6ac576c67e3a9acf8ec7216533b7f3144aeb14c9b93d0db9469830cd: write /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/889/fs/packages/package-storage/security_detection_engine-9.0.9-beta.1.zip: no space left on device: unknown
I also had disk pressure conditions. I think the image size should be highlighted in the documentation so that K8s nodes can handle it.
Edit 2: This maybe also means that we need to check the disk size on the nodes used for our e2e tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsoriano do you think we should change the default image to be something smaller? I selected this image because it is what we recommend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the lite images were added as smaller images for this kind of use cases, but even this image is starting to be too big.
There are vanilla images that don't contain any package. They fail to start if no directory with packages is configured, but they can also be configured in proxy mode, to forward requests for example to the public EPR.
Latest of these images is docker.elastic.co/package-registry/package-registry:v1.31.1.
There is an open issue about allowing to start even if no package is available yet: elastic/package-registry#1351.
More about the proxy mode in https://github.com/elastic/package-registry/?tab=readme-ov-file#proxy-mode.
We also have a WIP to create custom distributions, and smaller images elastic/package-registry#1335.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds Elastic Package Registry (EPR) support to ECK, introducing a new CRD for deploying EPR instances and enabling Kibana to reference EPR instances for Fleet package management.
- Adds 
ElasticPackageRegistryCRD with controller to manage EPR deployments - Enables Kibana to associate with EPR instances via 
packageRegistryReffield - Implements TLS certificate handling and CA mounting for secure communication between Kibana and EPR
 
Reviewed Changes
Copilot reviewed 60 out of 61 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description | 
|---|---|
pkg/apis/epr/v1alpha1/ | 
New API definitions for ElasticPackageRegistry CRD | 
pkg/controller/packageregistry/ | 
Controller implementation for managing EPR resources | 
pkg/controller/association/controller/kibana_epr.go | 
Association controller for Kibana-EPR relationships | 
pkg/apis/kibana/v1/kibana_types.go | 
Adds packageRegistryRef field and EPR association support | 
pkg/controller/kibana/ | 
Updates Kibana controller to handle EPR associations and CA certificates | 
test/e2e/ | 
E2E tests for EPR functionality and associations | 
Comments suppressed due to low confidence (1)
pkg/controller/kibana/pod_test.go:1
- The comment on line 67 says 'readinessProbe is the readiness probe for the maps container' but this function is in the packageregistry controller and should refer to the package registry container.
 
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
          
 Added EPR to ECK diagnostics here. I still need to update our documentation. I know we changed how we do documentation so if you could please point me in the right direction where ECK docs live that would be great. This PR is ready for another review. Thanks!  | 
    
          
 Documentation has moved here: https://github.com/elastic/docs-content Please note that the  I'll try to take another look to your PR this week, I'm struggling to keep up with the pace of PRs opened in this repo 😅  | 
    
| 
           buildkite test this -f p=gke,E2E_TAGS=epr  | 
    
| 
           buildkite test this -f p=gke,E2E_TAGS=epr  | 
    
| 
           E2E tests are still failing with: We also have to find a solution for this. IIUC elastic/package-registry#1335 would be the way to go? I'll also try to check if we can increase the disk on GKE nodes, but it would still mean that we may have to skip other providers (AWS, Azure, Kind...). IIUC the image currently requires ~14Gi:  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pods are being recreated in my dev env because of changes in packageregistry.k8s.elastic.co/config-hash while I do nothing:
Edit: Nevermind, it's because I used short lived certificates to test certificate rotation, and it seems it requires the Pod to be recreated. @tehbooom Could you confirm that certificates are not hot reloaded?
| if [[ -f "` + volume.EPRCACertPath + `" ]]; then | ||
| echo "EPR CA certificate found, checking for user-provided CA bundle..." | ||
| # Check if user provided their own NODE_EXTRA_CA_CERTS (from init container env) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make things easier to implement and debug if we ask the users to not use NODE_EXTRA_CA_CERTS but instead allow them to add custom certificates using an alternate env. variable, for example KIBANA_EXTRA_CA_CERTS (I'm asking because the way we handle NODE_EXTRA_CA_CERTS feels a bit "hacky" to me tbh)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined NODE_EXTRA_CA_CERTS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined
NODE_EXTRA_CA_CERTS
Agreed. However, if we did use KIBANA_EXTRA_CA_CERTS what would it make it easier to debug and implement? You are still checking if the user provided extra CAs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial look at this I failed to submit previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks reasonable with one exception: I am also not a big fan of of the NODE_EXTRA_CA_CERTS init container hackery but don't see a good alternative right now if we want to be backwards compatible.
Main blocker to merge this imo is the lack of UBI images for the package registry.
I could not get the recommended default image to work, as the k8s nodes in my environment ran out of ephemeral storage while trying to pull the image. I wonder how viable this image is for any environment? I was successful with the lite image though.
        
          
                deploy/eck-stack/charts/eck-package-registry/templates/epr.yaml
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | if [[ -f "` + volume.EPRCACertPath + `" ]]; then | ||
| echo "EPR CA certificate found, checking for user-provided CA bundle..." | ||
| # Check if user provided their own NODE_EXTRA_CA_CERTS (from init container env) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined NODE_EXTRA_CA_CERTS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ive update the test image to be the lite image since others have stated it works with that image
| ############################# | ||
| # Handle EPR CA certificate consolidation with user-provided NODE_EXTRA_CA_CERTS | ||
| # This must run before the early exit to ensure certificates are always processed | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the scenario as its been sometime. But putting this after the check to see if it was already initialized would give me x509 errors. I think it was if I added EPR after Kibana was already deployed it wouldnt pick up the new certificate.
| if [[ -f "` + volume.EPRCACertPath + `" ]]; then | ||
| echo "EPR CA certificate found, checking for user-provided CA bundle..." | ||
| # Check if user provided their own NODE_EXTRA_CA_CERTS (from init container env) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one issue with that would be that there might be already users that we would break if we stopped supporting user defined
NODE_EXTRA_CA_CERTS
Agreed. However, if we did use KIBANA_EXTRA_CA_CERTS what would it make it easier to debug and implement? You are still checking if the user provided extra CAs.
| 
           buildkite test this -f p=gke,E2E_TAGS=epr  | 
    
| 
           buildkite test this -f p=gke,E2E_TAGS=epr  | 
    
Elastic Package Registry (EPR) has been highly requested to be added to ECK.
EPR does not have any references since it does not require a license nor any other application.
The following was implemented for EPR
xpack.fleet.registryUrland set the environment variableNODE_EXTRA_CA_CERTSto the path of EPR's CA which is mountedNODE_EXTRA_CA_CERTSwith a mount the controller will combine the certs appending the EPR's CA to the users specified CAThis was tested with and without setting
NODE_EXTRA_CA_CERTSusing the below manifest