heussd/nats-news-analysis

Personal News Analysis
Automatically find relevant news from the Web.

Systematically retrieves online news articles, enriches them, scans them for keywords, and sends hits to raindrop.io. All analysis components are loosely coupled through NATS.io queues, which also makes it easy to scale components that are CPU-intensive and limited to a single core.

The system has the following NATS queues:

  1. feed-urls - URLs of RSS feeds.
  2. article-urls - URLs of individual articles of RSS feeds.
  3. news - News texts and their metadata.
  4. match-urls - URLs of positive matching articles.
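
The flow through the four queues can be sketched as follows. This is a minimal illustration, not the repository's code: in-memory deques stand in for the NATS queues, and `run_pipeline`, `fetch_feed`, and `fetch_text` are hypothetical names introduced here. A plain substring check stands in for the real keyword matching.

```python
from collections import deque

# Queue names as listed above; everything else in this sketch is an assumption.
QUEUE_NAMES = ("feed-urls", "article-urls", "news", "match-urls")


def run_pipeline(feed_urls, fetch_feed, fetch_text, keywords):
    """Push feed URLs through all four stages and return the matching URLs.

    fetch_feed(feed_url) returns the article URLs of one feed (hypothetical).
    fetch_text(article_url) returns the text of one article (hypothetical).
    """
    queues = {name: deque() for name in QUEUE_NAMES}
    queues["feed-urls"].extend(feed_urls)

    # feed-urls -> article-urls: one message per article in each feed.
    while queues["feed-urls"]:
        feed_url = queues["feed-urls"].popleft()
        queues["article-urls"].extend(fetch_feed(feed_url))

    # article-urls -> news: the article text together with its metadata.
    while queues["article-urls"]:
        url = queues["article-urls"].popleft()
        queues["news"].append({"url": url, "text": fetch_text(url)})

    # news -> match-urls: forward the URL only if a keyword occurs in the text.
    lowered = [keyword.lower() for keyword in keywords]
    while queues["news"]:
        item = queues["news"].popleft()
        text = item["text"].lower()
        if any(keyword in text for keyword in lowered):
            queues["match-urls"].append(item["url"])

    return list(queues["match-urls"])
```

In the real system each stage is a separate service, so the stages run concurrently rather than one after another as in this sketch.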

The outcomes of this system are:

  1. A Raindrop.io collection with articles that match the keywords.
  2. An Azure AI Search index prepared for hybrid search on news.
  3. A Grafana dashboard with metrics.
  4. An MCP server that lets an agentic LLM (such as Roo Code) consume outcomes 1 and 2.

Involved services

All services are orchestrated and scaled with compose.yml.

Custom services

Third party services

Message queue for scaling

Instead of blocking the application with a single-core keyword matching operation, or building complex multi-threaded keyword matching, we use the scale option of docker compose to run multiple single-core keyword matching components in parallel, wired together by the message queue. This keeps each component simple and easy to maintain.
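
A rough sketch of this idea, assuming each worker is a plain single-threaded handler and the queue hands every message to exactly one worker. The names `compile_keywords`, `make_worker`, and `dispatch` are hypothetical, and the round-robin loop only stands in for the message queue; it does not reproduce how NATS distributes messages.

```python
import re
from itertools import cycle


def compile_keywords(keywords):
    """Build one case-insensitive, whole-word pattern covering all keywords."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def make_worker(pattern, matches):
    """Return a single-threaded handler: one message in, at most one hit out."""
    def handle(message):
        if pattern.search(message["text"]):
            matches.append(message["url"])
    return handle


def dispatch(messages, workers):
    """Give each message to exactly one worker (round-robin stand-in for the queue)."""
    for message, worker in zip(messages, cycle(workers)):
        worker(message)
```

Because a worker holds no shared state beyond its output, adding capacity means starting more identical workers. With docker compose this would be something like `docker compose up --scale keyword-matcher-go=4`, where the service name is inferred from the container names listed further down and may differ in compose.yml.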

Keyword matching containers, scaled up

One core per keyword matching

Observability

A typical Prometheus-Loki-Grafana stack is used to monitor application metrics and statistics.

NATS server stats are made available to Prometheus via Prometheus NATS Exporter.

The keyword-matcher containers log with zerolog and expose their logs to Loki using the Docker Loki logging driver.

A Grafana dashboard ships with the source of the repository.

Comparing Python with Golang

As one of the core components responsible for the main analysis task, keyword-matcher has been ported from Python to Golang, for fun and research purposes. Both implementations of keyword-matcher can run alongside or even compete with each other:

NAME                                                 CPU %     MEM USAGE / LIMIT
loki                                                 1.33%     74.55MiB / 7.667GiB
nats-news-analysis_fullfeedrss_1                     0.00%     76.68MiB / 7.667GiB
nats-news-analysis_fullfeedrss_2                     0.01%     70.62MiB / 7.667GiB
nats-news-analysis_grafana_1                         0.17%     35.95MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_1              0.00%     8.051MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_2              0.00%     8.422MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_3              0.00%     8.781MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_4              0.00%     8.059MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_1          0.00%     22.64MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_2          0.00%     23.21MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_3          0.00%     24.23MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_4          0.00%     23.8MiB / 7.667GiB
nats-news-analysis_loadbalancer_1                    0.00%     2.316MiB / 7.667GiB
nats-news-analysis_nats-server_1                     1.34%     92.97MiB / 7.667GiB
nats-news-analysis_natsexporter_1                    0.03%     7.41MiB / 7.667GiB
nats-news-analysis_pocket-integration_1              0.00%     18.41MiB / 7.667GiB
nats-news-analysis_prometheus_1                      0.00%     37.22MiB / 7.667GiB
nats-news-analysis_rss-article-url-feeder-go-1st_1   0.05%     15.32MiB / 7.667GiB
nats-news-analysis_rss-article-url-feeder-go-2nd_1   11.46%    12.95MiB / 7.667GiB
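
To summarize the snapshot above per implementation, the memory column of the keyword-matcher rows can be averaged. This is a small sketch; `mean_memory_mib` is a hypothetical helper, and it assumes the column layout shown above with all values given in MiB. For this snapshot it yields roughly 8.33 MiB for the Go containers and roughly 23.47 MiB for the Python containers.

```python
import re
from collections import defaultdict

# The eight keyword-matcher rows, copied from the snapshot above.
SNAPSHOT = """\
nats-news-analysis_keyword-matcher-go_1              0.00%     8.051MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_2              0.00%     8.422MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_3              0.00%     8.781MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_4              0.00%     8.059MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_1          0.00%     22.64MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_2          0.00%     23.21MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_3          0.00%     24.23MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_4          0.00%     23.8MiB / 7.667GiB
"""

# Container name, CPU column, then the memory usage in MiB.
ROW = re.compile(r"\S*keyword-matcher-(go|python)_\d+\s+\S+\s+([\d.]+)MiB")


def mean_memory_mib(stats_text):
    """Average MEM USAGE in MiB per keyword-matcher implementation."""
    usage = defaultdict(list)
    for line in stats_text.splitlines():
        match = ROW.match(line)
        if match:
            usage[match.group(1)].append(float(match.group(2)))
    return {impl: sum(values) / len(values) for impl, values in usage.items()}
```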

Here are some interesting stats from Docker and Loki, collected during regular operation:

Metric               Python      Golang      Comparison
Docker image size    424MB       6.09MB      Go impl. is ~70x smaller
Memory consumption   23.8MiB     8.33MiB     Go impl. needs ~3x less memory
LoC                  447         485         Python impl. has ~8% fewer lines
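
The comparison column follows directly from the two measured columns. The arithmetic, using only the figures from the table (the variable names are illustrative):

```python
# Measured values from the comparison table.
python_image_mb, go_image_mb = 424, 6.09
python_mem_mib, go_mem_mib = 23.8, 8.33
python_loc, go_loc = 447, 485

# How many times smaller the Go image is.
image_ratio = python_image_mb / go_image_mb

# How many times less memory the Go implementation uses.
memory_ratio = python_mem_mib / go_mem_mib

# Fraction of lines the Python implementation saves relative to Go.
loc_saving = (go_loc - python_loc) / go_loc

print(f"image: ~{image_ratio:.0f}x, memory: ~{memory_ratio:.1f}x, LoC: ~{loc_saving:.0%}")
```

The memory ratio comes out slightly below 3, which the table rounds up.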
