Skip to content

Fix Loki to work with Flux grafana/prometheus examples again#47

Merged
matheuscscp merged 3 commits intofluxcd:mainfrom
kingdonb:fix-loki
Feb 12, 2025
Merged

Fix Loki to work with Flux grafana/prometheus examples again#47
matheuscscp merged 3 commits intofluxcd:mainfrom
kingdonb:fix-loki

Conversation

@kingdonb
Copy link
Copy Markdown
Member

@kingdonb kingdonb commented Feb 12, 2025

Fix #19, Close #23

I got this working locally, running on Kind

In the grafana dashboard, there is a loki data source configured with address loki-stack:3100

I was unable to find this string in the kube-prometheus-stack chart, so I'm guessing it has been inferred somehow by Grafana's chart? The old loki-stack chart that provided promtail is obsolete.

I updated this to use loki chart instead, that is maintained, there is an OCI chart but it isn't kept up-to-date (the tag is labeled "weekly" but is several months out-of-date, and the latest release is in the legacy helm repository for grafana charts)

The loki chart is kept current. There's a promtail chart in the grafana helm charts repo as well.

I disabled authentication, because I believe that's how kube-prometheus-stack did it. In no way is this a sensible Loki configuration, but it works in kind for demonstrative purposes. I will personally be disabling loki on my clusters, not because this is too hard, but because we're using cloudtrail for logs (which I think has Grafana integration too)

Anyway, we can't demo cloudtrail in a kind cluster so this is better.

It still needs some documentation updates, the README for example shows the loki and promtail pods deployed from loki-stack chart, but mine looks a bit more like this now:

monitoring           kube-prometheus-stack-grafana-58977b8976-vw65n              3/3     Running   0          78m
monitoring           kube-prometheus-stack-kube-state-metrics-676657b78f-sfknq   1/1     Running   0          78m
monitoring           kube-prometheus-stack-operator-7467d457b7-x2jwc             1/1     Running   0          78m
monitoring           kube-prometheus-stack-prometheus-node-exporter-cq2nn        1/1     Running   0          78m
monitoring           loki-canary-w6c8r                                           1/1     Running   0          21m
monitoring           loki-stack-0                                                2/2     Running   0          21m
monitoring           loki-stack-gateway-589dcdd94f-qljpd                         1/1     Running   0          23m
monitoring           loki-stack-minio-0                                          1/1     Running   0          23m
monitoring           prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          77m
monitoring           promtail-m4snt                                              1/1     Running   0          10m

I also disabled the two redis instances that loki installs in the default monolithic version, and verified that I can see some logs in the "Flux Logs" dashboard. I am not certain about this config but I will come back and review it ASAP so we can close out long-standing issues like #19 and hopefully avoid removing Loki from the example.

Some of these logs do not look like they are from Flux:

Screenshot 2025-02-12 at 12 44 20 PM

I have a feeling if I look more closely at the config for promtail that I discarded, I'll find that it was meant to select the flux pods somehow. Unsure. I can't spend more time on this right now, but will try to come back with another try before the dev meeting next week. (At a minimum I'll clean up the bootstrap artifacts and other noise from this PR so we can merge it.)

@kingdonb
Copy link
Copy Markdown
Member Author

rebasing from babf064 to squash commits and fix the DCO

@kingdonb kingdonb marked this pull request as ready for review February 12, 2025 18:00
@kingdonb
Copy link
Copy Markdown
Member Author

This should be basically in shape to merge. Worth a second test, and someone could possibly look at that Loki Flux Logs dashboard and tell whether it's working properly or not, I am not sure the Flux Events checkbox is performing its function.

@kingdonb
Copy link
Copy Markdown
Member Author

There are a couple of changes I'd like to borrow from #19 and #23 before this merges.

It looks like we don't really need this canary service, and we could use the configmap for data sources to ensure that loki is picked up at the proper address. Also, promtail may have some helm tests that we want to be sure we're not invoking. (??)

I built this PR in isolation without looking at #23 but I should probably go back and check that one out before we call it good here.

@kingdonb
Copy link
Copy Markdown
Member Author

I reviewed the changes in #23 and the only one I can't make sense of is compaction.

Setting the compaction as it was set there results in:

% k -n monitoring logs -f loki-stack-0 -c loki
failed parsing config: /etc/loki/config/config.yaml: yaml: unmarshal errors:
  line 25: field shared_store not found in type compactor.Config. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file

which indicates that shared_store: filesystem is no longer a valid setting.

I'm not a Grafana or Loki expert, but I think that setting compactor: replicas: 0 probably pre-empts this compaction config and renders it inert, anyway. I'll delete it, since I'm sure we are not thinking about retention for this example.

@kingdonb
Copy link
Copy Markdown
Member Author

I did not end up borrowing this section:

https://github.com/fluxcd/flux2-monitoring-example/pull/23/files#diff-5e041afacf25eb055565b4a1c32d5b81201ddce29c84adf13a6ae88463e0832bR62-R82

which would have allowed us to call the release loki instead of loki-stack

It may be worth adopting in a later follow-on PR 👍

@kingdonb
Copy link
Copy Markdown
Member Author

I think that loki must depend on minio now, the error:

caller=log.go:216 msg="error running loki" err="at least one bucket name must be specified

I had it working with minio earlier, but I was trying to remove unnecessary components that might have licensing issues. I don't think this really does, because we aren't distributing a modified minio, just using it in an example, but I don't see the point of deploying a minio when we're trying to use local filesystem for storage. Maybe that just isn't supported anymore.

I'm not sure how to solve this, will come back to it later.

@kingdonb
Copy link
Copy Markdown
Member Author

It does look like the filesystem store just is not supported anymore:

https://grafana.com/docs/loki/latest/operations/storage/filesystem/

@kingdonb
Copy link
Copy Markdown
Member Author

I did wind up getting everything working. Two things I am not sure about:

  • The dependency order. Does promtail need to go after loki? If it does not, installation could be much faster.
  • The dashboard connection to the data source. When you go to the Flux Logs page, the Loki datasource is usually selected. But depending on the order, it might need to be selected and you see a data source not found error.

It is working on a kind cluster without issue, I don't think either of those issues are show-stoppers. I did wind up needing that config from #23, in df39529. Without it, the data source was not being created. Maybe there is some auto-detection that happens in some odd cases, or on grafana boot, but since grafana is in kube-prometheus-stack it boots first.

So, upon repeated testing I found that without this config section I would not get a data source for loki at all.

I think this is good to merge 👍

@kingdonb kingdonb marked this pull request as ready for review February 12, 2025 19:42
@kingdonb
Copy link
Copy Markdown
Member Author

I would like to resolve the few remaining references to loki-stack out before this merges, sec... (retesting)

kingdonb and others added 3 commits February 12, 2025 15:32
* Select latest version of kube-prometheus-stack (69.x.x)
* remove loki-stack parts from chart
* Grafana loki chart is at 6.26.0
* Disable auth: loki (this is not for production!)
* Add back promtail manually

(this was previously included in the loki-stack chart)

* Continue using the name loki-stack

Something in grafana is preconfigured to pull from the data source
loki-stack:3100

> Provisioned data source
> This data source was added by config and cannot be modified using the UI. Please contact your server admin to update this data source.

I was unable to find the location where this string comes from!
Perhaps the use of a misleading name is not needed, but this worked?

* try out later monolithic Loki config from the docs, with these
  divergences:

* Disable chunksCache that requests 8GB+ memory by default
* resultsCache:enabled: false as well

Signed-off-by: Kingdon B <kingdon@urmanac.com>

fix README for loki-stack change

Signed-off-by: Kingdon B <kingdon@urmanac.com>

disable minio, use filesystem for logging instead

Signed-off-by: Kingdon B <kingdon@urmanac.com>

clean up duplicate yaml

Signed-off-by: Kingdon B <kingdon@urmanac.com>

disable test (loki)

Signed-off-by: Kingdon B <kingdon@urmanac.com>

remove compactor config

Signed-off-by: Kingdon B <kingdon@urmanac.com>

just use minio

Signed-off-by: Kingdon B <kingdon@urmanac.com>

turns out we need this after all

Signed-off-by: Kingdon B <kingdon@urmanac.com>

update README to match

Signed-off-by: Kingdon B <kingdon@urmanac.com>

update loki-stack -> loki (strings)

Signed-off-by: Kingdon B <kingdon@urmanac.com>

depend on kube-prometheus-stack instead

don't depend on loki here

Signed-off-by: Kingdon B <kingdon@urmanac.com>

update README

Signed-off-by: Kingdon B <kingdon@urmanac.com>
Signed-off-by: Kingdon B <kingdon@urmanac.com>
Signed-off-by: Kingdon B <kingdon@urmanac.com>
@kingdonb
Copy link
Copy Markdown
Member Author

LGTM now 👍

@matheuscscp matheuscscp merged commit 2703382 into fluxcd:main Feb 12, 2025
2 checks passed
@kingdonb kingdonb deleted the fix-loki branch February 12, 2025 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

loki-stack helm charts is deprecated

2 participants