
feat: Add clustering for loki.source.kubernetes_events #6027

Open
petewall wants to merge 5 commits into main from petewall/k8s-events-clustering


@petewall
Contributor

@petewall petewall commented Apr 9, 2026

Brief description of Pull Request

This change adds a `clustering` option for `loki.source.kubernetes_events` that distributes the work according to the list. If no namespaces are specified, only a single instance runs. This is useful because it makes the component safe to run on Alloy deployments with multiple replicas without duplicating events.
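To illustrate, here is a minimal configuration sketch that combines an explicit namespace list with the new `clustering` block. Several details are assumptions: the namespace names are hypothetical, the `loki.write.loki` component is assumed to be defined elsewhere (as in the test config later in this thread), and Alloy itself is assumed to be started with clustering enabled (for example, `alloy run --cluster.enabled=true ...`). Under the behavior described above, each listed namespace would be handled by one cluster member. Leaving `namespaces` unset would fall back to a single instance watching events.

```alloy
// Sketch only: namespace names below are hypothetical examples.
loki.source.kubernetes_events "cluster_events" {
  job_name   = "integrations/kubernetes/eventhandler"
  log_format = "logfmt"

  // With clustering enabled, the work for these namespaces is spread
  // across the Alloy cluster instead of every replica watching all of them.
  namespaces = ["default", "kube-system", "monitoring"]

  // Assumes a loki.write component named "loki" exists elsewhere.
  forward_to = [loki.write.loki.receiver]

  clustering {
    enabled = true
  }
}
```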

Pull Request Details

Issue(s) fixed by this Pull Request

Fixes #401

Notes to the Reviewer

PR Checklist

  • Documentation added
  • Tests updated
  • Config converters updated


Signed-off-by: Pete Wall <pete.wall@grafana.com>
@github-actions
Contributor

github-actions bot commented Apr 9, 2026

💻 Deploy preview available (feat: Add clustering for loki.source.kubernetes_events):

@kalleep
Contributor

kalleep commented Apr 9, 2026

Cool, but I do think we will have at least one problem, one that loki.source.kubernetes is also affected by: we use a positions file to track where we last read from.

The problem here, and with that other component, is that if a target moves to another instance, we lose track of where we should start reading from and would start from the beginning.

Not really a blocker in itself, but we should look into how we can solve this for both this component and loki.source.kubernetes.

@petewall
Contributor Author

petewall commented Apr 9, 2026

A bit of testing:

  1. Created a kind cluster
  2. Deployed two Alloy instances, each as a 2-replica Deployment, both using the same config:
          loki.source.kubernetes_events "cluster_events" {
            job_name   = "integrations/kubernetes/eventhandler"
            log_format = "logfmt"
            forward_to = [loki.write.loki.receiver]
            // These lines are only on the test alloy
            clustering {
              enabled = true
            }
          }

          loki.write "loki" {
            endpoint {
              url = "http://loki.loki.svc:3100/loki/api/v1/push"
              tenant_id = "1"
            }
            external_labels = {
              "source" = "control",
              "collector" = constants.hostname,
            }
          }
  3. The first Alloy used the latest published version; the second used my source-built image.

Both control-group Alloy replicas reported the same events (duplication), while only one of the test-group Alloy replicas sent events (deduplication).

@petewall petewall marked this pull request as ready for review April 9, 2026 16:47
@petewall petewall requested review from a team and clayton-cornell as code owners April 9, 2026 16:47
@petewall
Contributor Author

petewall commented Apr 9, 2026

> Cool, but I do think we will have at least one problem, one that loki.source.kubernetes is also affected by: we use a positions file to track where we last read from.
>
> The problem here, and with that other component, is that if a target moves to another instance, we lose track of where we should start reading from and would start from the beginning.
>
> Not really a blocker in itself, but we should look into how we can solve this for both this component and loki.source.kubernetes.

There are some good points here, but I'm not sure it's a big enough deal to hold this up. By default, the TTL for Kubernetes events is 1 hour, so any duplication would be bounded by that. I feel the benefits (no more requirement for a singleton deployment) outweigh the concerns (potential short-term log duplication).

Is there an existing issue or TODO for this, so we can try to fix it in another PR?

@kalleep
Contributor

kalleep commented Apr 13, 2026

> Is there an existing issue or a TODO for this, and we can try and fix that in another PR?

Yeah, there is an issue for loki.source.kubernetes, as mentioned in Slack. But I saw that you added a note about this, and I think that is enough.

Development

Successfully merging this pull request may close these issues.

Make loki.source.kubernetes_events clusterable in flow-mode