CrowdSec exits if the VictoriaLogs data source goes down temporarily #3653

Open
@thebondo

What happened?

I use fluent-bit to collect logs and ship them to a central VictoriaLogs instance, and CrowdSec ingests the logs from VictoriaLogs. Everything works when the services are started in the right order: CrowdSec will only start if VictoriaLogs is already running.

However, if VictoriaLogs goes down temporarily, CrowdSec dies.

What did you expect to happen?

I expected that CrowdSec would notice the lost connection, and at least retry a few times before dying. I hoped it would just keep retrying with some kind of back-off strategy.

How can we reproduce it (as minimally and precisely as possible)?

Requires a working docker-compose setup.

  1. Create a working folder.
  2. Create a file acquis.yaml (compose.yaml mounts it into the CrowdSec container as vlogs.yaml) with these contents:
source: victorialogs
mode: tail
log_level: info
url: http://vlogs:9428/
limit: 10
query: '*'
labels:
  type: other
  3. Create a file compose.yaml with these contents:
services:
  vlogs:
    image: victoriametrics/victoria-logs:v1.23.2-victorialogs
  fluentbit:
    image: fluent/fluent-bit:4.0.2
    depends_on:
      - "vlogs"
    command:
      - "-i dummy"
      - "-p dummy='{_msg=\"This is a test message\"}'"
      - "-o http"
      - "-p host=vlogs"
      - "-p port=9428"
      - "-p uri=/insert/jsonline?_stream_fields=host,service&_msg_field=log&_time_field=date"
      - "-p format=json_lines"
      - "-p json_date_format=iso8601"
  crowdsec:
    image: crowdsecurity/crowdsec:v1.6.8
    depends_on:
      - "vlogs"
    volumes:
      - ./acquis.yaml:/etc/crowdsec/acquis.d/vlogs.yaml
  4. Run docker-compose up

This should start three containers, with fluent-bit generating test log messages, VictoriaLogs collecting them, and CrowdSec ingesting them. CrowdSec takes a short while to initialize, but eventually you should see only lines from the fluent-bit container, one for each record.

  5. At that point, in another terminal, run docker stop to stop the VictoriaLogs container.

You should see the VictoriaLogs container stop after a short delay (it tries to quit gracefully, but the CrowdSec request is still open, so it waits for a timeout). Just after VictoriaLogs exits, the CrowdSec container will also exit with an error. You will see fluent-bit chugging along, trying to send data but failing.

  6. Clean up by running docker-compose down

Anything else we need to know?

I have forked crowdsecurity/crowdsec and created a branch, victorialogs-retry-bug, with a fix that is working for me.

Crowdsec version

$ cscli version
version: v1.6.8-f209766e
Codename: alphaga
BuildDate: 2025-03-25_15:56:53
GoVersion: 1.24.1
Platform: docker
libre2: C++
User-Agent: crowdsec/v1.6.8-f209766e-docker
Constraint_parser: >= 1.0, <= 3.0
Constraint_scenario: >= 1.0, <= 3.0
Constraint_api: v1
Constraint_acquis: >= 1.0, < 2.0
Built-in optional components: cscli_setup, datasource_appsec, datasource_cloudwatch, datasource_docker, datasource_file, datasource_http, datasource_journalctl, datasource_k8s-audit, datasource_kafka, datasource_kinesis, datasource_loki, datasource_s3, datasource_syslog, datasource_victorialogs, datasource_wineventlog

OS version

These details are from the host running Docker. My actual setup does not use Docker; all the programs run natively, and the same thing happens there as well.
# On Linux:
$ cat /etc/os-release
NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os
$ uname -a
Linux holmes 6.12.10-76061203-generic #202412060638~1743109366~22.04~1fce33b SMP PREEMPT_DYNAMIC Thu M x86_64 x86_64 x86_64 GNU/Linux

Enabled collections and parsers

Acquisition config

No response

Config show

No response

Prometheus metrics

No response

Related custom configs versions (if applicable) : notification plugins, custom scenarios, parsers etc.

No response
