
Add prometheus metrics at /api/v1/metrics #3576

Closed

Wint3rmute wants to merge 7 commits into iv-org:master from
Wint3rmute:add-prometheus-metrics-endpoint

Conversation

@Wint3rmute
Contributor

@Wint3rmute Wint3rmute commented Jan 18, 2023

Hello,

Please note that this MR is purely a suggestion for a feature. If you don't think that such a feature is needed, feel free to close it :)


This MR adds a Prometheus-compatible /api/v1/metrics endpoint, which serves the subset of metrics (only the numeric ones) contained in /api/v1/stats, but in the Prometheus text format. The endpoint only works if statistics_enabled: true is set in the configuration.
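To illustrate the idea, here is a minimal sketch (in Python rather than the project's Crystal) of what "expose only the numeric stats in Prometheus text format" amounts to; the field names and the invidious_ prefix are illustrative assumptions, not the actual /api/v1/stats fields:

```python
# Minimal sketch: render the numeric fields of a stats payload in the
# Prometheus text exposition format (one "name value" sample per line).
def to_prometheus(stats: dict) -> str:
    lines = []
    for name, value in stats.items():
        # Booleans are ints in Python, but aren't meaningful Prometheus samples.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"invidious_{name} {value}")
    return "\n".join(lines) + "\n"

# Example payload; keys are hypothetical, not the real /api/v1/stats schema.
sample = {"open_registrations": True, "total_users": 1234, "activity_week": 56}
print(to_prometheus(sample), end="")
# invidious_total_users 1234
# invidious_activity_week 56
```

Non-numeric fields (like the boolean above) are simply skipped, which is why only a subset of /api/v1/stats can be exposed this way.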

Motivation

Having a prometheus-compatible endpoint could make it easier for instance maintainers to monitor their instance. I can only speak for myself, but I've found Prometheus to be quite useful both for monitoring and for setting alerts when one of my services goes down. There's not much useful information in the metrics added by this MR; however, I can imagine that more useful metrics could be extracted from Invidious. Unfortunately, I'm not familiar with the project's code at the moment, so I'm trying to take small steps here :)

Possible paths for further development

  1. Extracting more useful metrics. If you know any places in the code that could emit metrics that would be useful for instance maintainers, please point me there 😄
  2. Separate configuration option for enabling the metrics, instead of relying on statistics_enabled
  3. Serving the metrics on a separate port and address, so they can be isolated to a network accessible only to the instance maintainer.

@Wint3rmute Wint3rmute requested a review from a team as a code owner January 18, 2023 00:03
@Wint3rmute Wint3rmute requested review from unixfox and removed request for a team January 18, 2023 00:03
@Wint3rmute Wint3rmute force-pushed the add-prometheus-metrics-endpoint branch 3 times, most recently from 8e938bc to c57505a on January 18, 2023 00:20
@unixfox
Member

unixfox commented Jan 20, 2023

Not a bad idea and without any external dependency, so it's great.

@SamantazFox
Member

  1. Extracting more useful metrics. If you know any places in the code that could emit metrics that would be useful for instance maintainers, please point me there 😄

I've had the idea of more metrics in mind for quite a while.
The main problem is that almost no place in the code emits them, and the current storage (Postgres DB) is not made for that purpose.

I think that once I'm done with #3628, we'll be able to add various metrics.

A non-exhaustive list of (imo) useful metrics:

  • number of queries per major endpoint (channel, watch page, videoplayback, images proxy, static files)
  • amount of data (MiB) returned per major endpoint
  • average response latency (per 1/6/12/24h)
  • number of DB queries
  2. Separate configuration option for enabling the metrics, instead of relying on statistics_enabled

I'm not sure about that one. Statistics are metrics.

  3. Serving the metrics on a separate port and address, so they can be isolated to a network accessible only to the instance maintainer.

In my opinion, this should be handled in the reverse proxy config. We need to inform the maintainers in the documentation, but I don't think we should add all that complexity.
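As one possible sketch of that reverse-proxy approach (nginx shown purely as an example; the network range and upstream port are assumptions, adjust them for your deployment):

```nginx
# Hypothetical nginx snippet: restrict metrics scraping to an internal network.
location /api/v1/metrics {
    allow 10.0.0.0/8;                  # example internal monitoring network
    deny  all;
    proxy_pass http://127.0.0.1:3000;  # local Invidious instance (assumed port)
}
```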

@alx-alexpark

Seems like a nice feature, just make sure it doesn't compromise the privacy of users.

@Wint3rmute Wint3rmute force-pushed the add-prometheus-metrics-endpoint branch 2 times, most recently from 92e08af to 81f0bc4 on March 19, 2023 16:24
@Wint3rmute
Contributor Author

Wint3rmute commented Mar 19, 2023

I've learned more about the Kemal framework and rewrote the metrics collection, trying to fulfill the ideas described by @SamantazFox:

  • number of queries per major endpoint (channel, watch page, videoplayback, images proxy, static files)
  • average response latency (per 1/6/12/24h)

*Response latency per X hours can be calculated by a monitoring system such as Prometheus; imo such aggregation should not be performed by Invidious itself. The metrics this MR provides expose the total number of seconds spent handling a particular route, from which the latency over any window can be derived. Afaik this is also what Prometheus recommends in this scenario.
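For example (a PromQL sketch using the metric names from the sample below, and assuming both counters carry the same label set), the average latency over a 6-hour window can be computed at query time:

```promql
# Average request latency per route over the last 6 hours,
# derived from the two counters this endpoint exposes:
rate(http_request_duration_seconds_sum[6h])
  / rate(http_requests_total[6h])
```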

Here's a sample of the new metrics, running on a local instance of Invidious:

http_requests_total{method="/api/v1/metrics" route="GET" response_code="200"} 66
http_requests_total{method="/" route="GET" response_code="302"} 2
http_requests_total{method="/feed/popular" route="GET" response_code="200"} 2
http_requests_total{method="/search" route="GET" response_code="200"} 2
http_requests_total{method="/vi/:id/:name" route="GET" response_code="200"} 43
http_requests_total{method="/watch" route="GET" response_code="200"} 3
http_requests_total{method="/ggpht/*" route="GET" response_code="200"} 26
http_requests_total{method="/latest_version" route="GET" response_code="302"} 6
http_requests_total{method="/api/v1/storyboards/:id" route="GET" response_code="200"} 3
http_requests_total{method="/api/v1/comments/:id" route="GET" response_code="200"} 3
http_request_duration_seconds_sum{method="/api/v1/metrics" route="GET" response_code="200"} 0.05781634
http_request_duration_seconds_sum{method="/" route="GET" response_code="302"} 0.002360641
http_request_duration_seconds_sum{method="/feed/popular" route="GET" response_code="200"} 0.001779746
http_request_duration_seconds_sum{method="/search" route="GET" response_code="200"} 1.5741987
http_request_duration_seconds_sum{method="/vi/:id/:name" route="GET" response_code="200"} 12.719181
http_request_duration_seconds_sum{method="/watch" route="GET" response_code="200"} 2.0207345
http_request_duration_seconds_sum{method="/ggpht/*" route="GET" response_code="200"} 5.050008
http_request_duration_seconds_sum{method="/latest_version" route="GET" response_code="302"} 0.12236025
http_request_duration_seconds_sum{method="/api/v1/storyboards/:id" route="GET" response_code="200"} 0.02403159
http_request_duration_seconds_sum{method="/api/v1/comments/:id" route="GET" response_code="200"} 1.0594385

@alx-alexpark as you can see, there are no metrics which would compromise user privacy :)

Implementation notes

  1. The names of the metrics are inspired by the Prometheus FastAPI Instrumentator.
  2. The middleware for Kemal is taken from the Crystal prometheus exporter project. I wasn't able to reuse more of that project's code, as its approach to metrics is incompatible with Prometheus's pull model (instead of exposing an HTTP endpoint for Prometheus to scrape, it pushes metrics out through a raw TCP socket).
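As a rough illustration of the middleware approach (a Python analogue, not the actual Crystal/Kemal code; the handler and key names are hypothetical, chosen to mirror the labels in the sample above):

```python
import time
from collections import Counter

# Counters keyed by (method, route, status), mirroring the exposed labels.
request_count = Counter()
duration_sum = Counter()   # Counter accumulates float durations just as well


def instrument(handler):
    """Wrap a request handler, recording call count and total handling time."""
    def wrapped(method, route):
        start = time.monotonic()
        status, body = handler(method, route)
        key = (method, route, status)
        request_count[key] += 1
        duration_sum[key] += time.monotonic() - start
        return status, body
    return wrapped


@instrument
def handle(method, route):
    # Stand-in for a real route handler.
    return 200, "ok"


handle("GET", "/watch")
handle("GET", "/watch")
print(request_count[("GET", "/watch", 200)])  # 2
```

Each request only bumps two counters, so the overhead per request stays constant regardless of traffic volume; all time-window aggregation is left to the scraper.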

Problems encountered

For some reason, the metrics collection handler is not called when the /videoplayback/ endpoint is handled. I suspect this is due to the before_all configuration; unfortunately, I'm not very familiar with either Kemal or Invidious, so I'm having trouble debugging this.

Other notes

Imo adding database metrics to this PR would make it too large, so I'd prefer to split it into another PR :)

@Wint3rmute Wint3rmute force-pushed the add-prometheus-metrics-endpoint branch 4 times, most recently from 40a7381 to 7cbf78a on March 19, 2023 17:18
@github-actions github-actions bot added the stale label May 4, 2023
@SamantazFox SamantazFox removed the stale label May 7, 2023
@SamantazFox
Member

Oh, sorry, I didn't notice that you updated your code! That's looking great :D
I'll review that as soon as possible!

@github-actions github-actions bot added the stale label Jul 8, 2023
@iv-org iv-org deleted a comment from github-actions bot Jul 14, 2023
@SamantazFox SamantazFox removed the stale label Jul 14, 2023
@iv-org iv-org deleted a comment from github-actions bot Jul 15, 2023
@Wint3rmute Wint3rmute force-pushed the add-prometheus-metrics-endpoint branch 2 times, most recently from 7ef3b00 to bc826b5 on July 20, 2023 18:16
@Wint3rmute Wint3rmute requested a review from SamantazFox July 20, 2023 18:19
@github-actions github-actions bot added the stale label Sep 4, 2023
@SamantazFox SamantazFox removed the stale label Sep 15, 2023
@iv-org iv-org deleted a comment from github-actions bot Oct 8, 2023
@Wint3rmute Wint3rmute force-pushed the add-prometheus-metrics-endpoint branch 4 times, most recently from 05294ab to 7075465 on January 23, 2024 20:33
@Wint3rmute Wint3rmute changed the title from "Suggestion: Add prometheus metrics at /api/v1/metrics" to "Add prometheus metrics at /api/v1/metrics" Jan 23, 2024
@github-actions github-actions bot added the stale label Apr 23, 2024
@SamantazFox SamantazFox removed the stale label Apr 25, 2024
@iv-org iv-org deleted a comment from github-actions bot Apr 25, 2024
@SamantazFox SamantazFox added the need-testing This feature needs to be deployed and tested to see if it's working, and doesn't break something label Apr 25, 2024
@github-actions github-actions bot added the stale label Jul 25, 2024
@SamantazFox SamantazFox removed the stale label Jul 25, 2024
@iv-org iv-org deleted a comment from github-actions bot Jul 30, 2024
@Soblow

Soblow commented Aug 8, 2024

I've tried this PR on my personal private instance (running this PR rebased on the latest master commit (iv-org:master 90e94d4) as a Docker container).

It works fine, thank you for the PR!

Extra info, if needed:

To add it on a Prometheus server:

  - job_name: invidious
    static_configs:
      - targets: ['localhost:3088']
    metrics_path: /api/v1/metrics

Here is a prototype dashboard you may use in Grafana for testing: grafana_invidious_panel.json

[Screenshot of the Grafana panel]

(sorry units are wrong on the HTTP rates, feel free to correct!)

@SamantazFox
Member

@Soblow Thanks for trying it out :D

@Fijxu
Member

Fijxu commented Mar 7, 2025

I know that this pull request is really old, but it works fine with a single instance of Invidious running! With replicas it doesn't work so well: the metrics look mixed up when two Invidious processes are running under the same domain, but that's not a problem that should be addressed in this PR anyway. What is keeping this from getting merged?


@Fijxu
Member

Fijxu commented Mar 17, 2025

It has a bug! When using any URL that isn't a route registered by Invidious, like https://inv.nadeko.net/health, one would expect to be redirected to https://inv.nadeko.net/channel/UCF11Yrx9bAy2Q6j6lgo0JwQ, but with this PR you get a "500 Internal Server Error" instead, with this backtrace:

Exception: Radix::Result(T)#payload cannot be nil (NilAssertionError)
  from lib/radix/src/radix/result.cr:19:5 in 'payload'
  from lib/kemal/src/kemal/ext/context.cr:27:7 in 'route'
  from src/invidious/metrics.cr:26:24 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from src/ext/kemal_static_file_handler.cr:162:16 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from lib/kemal/src/kemal/exception_handler.cr:8:7 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from src/invidious/helpers/logger.cr:21:35 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from /usr/lib/crystal/http/server/request_processor.cr:51:11 in 'process'
  from /usr/lib/crystal/http/server.cr:521:5 in 'handle_client'
  from /usr/lib/crystal/http/server.cr:451:5 in '->'
  from /usr/lib/crystal/fiber.cr:148:11 in 'run'
  from /usr/lib/crystal/fiber.cr:100:34 in '->'
  from ???

2025-03-17T02:34:00.894358Z  ERROR - http.server: Unhandled exception on HTTP::Handler
Missing hash key: "preferences" (KeyError)
  from /usr/lib/crystal/hash.cr:1193:11 in '[]'
  from lib/kemal/src/kemal/ext/context.cr:51:7 in 'get'
  from src/invidious/helpers/errors.cr:26:12 in 'error_template_helper'
  from src/invidious.cr:251:3 in '->'
  from lib/kemal/src/kemal/config.cr:96:102 in '->'
  from src/invidious/helpers/handlers.cr:58:71 in 'call_exception_with_status_code'
  from lib/kemal/src/kemal/exception_handler.cr:15:14 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from src/invidious/helpers/logger.cr:21:35 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from /usr/lib/crystal/http/server/request_processor.cr:51:11 in 'process'
  from /usr/lib/crystal/http/server.cr:521:5 in 'handle_client'
  from /usr/lib/crystal/http/server.cr:451:5 in '->'
  from /usr/lib/crystal/fiber.cr:148:11 in 'run'
  from /usr/lib/crystal/fiber.cr:100:34 in '->'
  from ???

Maybe because it's too old?

@Wint3rmute Wint3rmute force-pushed the add-prometheus-metrics-endpoint branch from 7075465 to 5999dc3 on March 23, 2025 10:55
@Wint3rmute
Contributor Author

Woah, I had almost forgotten about this MR by now :)

I rebased it onto the master branch, which unfortunately broke the code; there's a 500 error with the following message:

Exception: Radix::Result(T)#payload cannot be nil (NilAssertionError)
  from lib/radix/src/radix/result.cr:19:5 in 'payload'
  from lib/kemal/src/kemal/ext/context.cr:27:7 in 'route'
  from src/invidious/metrics.cr:26:24 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from src/ext/kemal_static_file_handler.cr:162:16 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from lib/kemal/src/kemal/exception_handler.cr:8:7 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from src/invidious/helpers/logger.cr:21:35 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from /usr/lib/crystal/http/server/request_processor.cr:51:11 in 'process'
  from /usr/lib/crystal/http/server.cr:521:5 in 'handle_client'
  from /usr/lib/crystal/http/server.cr:451:5 in '->'
  from /usr/lib/crystal/fiber.cr:148:11 in 'run'
  from /usr/lib/crystal/fiber.cr:100:34 in '->'
  from ???

2025-03-23T11:21:40.533488Z  ERROR - http.server: Unhandled exception on HTTP::Handler
Missing hash key: "preferences" (KeyError)
  from /usr/lib/crystal/hash.cr:1193:11 in '[]'
  from lib/kemal/src/kemal/ext/context.cr:51:7 in 'get'
  from src/invidious/helpers/errors.cr:26:12 in 'error_template_helper'
  from src/invidious.cr:227:3 in '->'
  from lib/kemal/src/kemal/config.cr:96:102 in '->'
  from src/invidious/helpers/handlers.cr:58:7 in 'call_exception_with_status_code'
  from lib/kemal/src/kemal/exception_handler.cr:15:14 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from src/invidious/helpers/logger.cr:21:35 in 'call'
  from /usr/lib/crystal/http/server/handler.cr:30:7 in 'call_next'
  from lib/kemal/src/kemal/init_handler.cr:12:7 in 'call'
  from /usr/lib/crystal/http/server/request_processor.cr:51:11 in 'process'
  from /usr/lib/crystal/http/server.cr:521:5 in 'handle_client'
  from /usr/lib/crystal/http/server.cr:451:5 in '->'
  from /usr/lib/crystal/fiber.cr:148:11 in 'run'
  from /usr/lib/crystal/fiber.cr:100:34 in '->'
  from ???

I don't have the resources or the required Crystal knowledge to debug this. If anyone is willing to continue the work on adding metrics to Invidious, feel free to use this MR as a starting point.

@Wint3rmute Wint3rmute closed this Mar 23, 2025
R0gu3h3h3h3 pushed a commit to R0gu3h3h3h3/oiia that referenced this pull request May 16, 2025