Description
What happened?
I have a setup using fluent-bit to collect logs, ship them to a central VictoriaLogs instance, and then CrowdSec is ingesting logs from VictoriaLogs. Everything works great when started in the right order. CrowdSec will only start if VictoriaLogs is already running.
However, if VictoriaLogs goes down temporarily, CrowdSec dies.
What did you expect to happen?
I expected that CrowdSec would notice the lost connection, and at least retry a few times before dying. I hoped it would just keep retrying with some kind of back-off strategy.
How can we reproduce it (as minimally and precisely as possible)?
Requires a working docker-compose setup.
- Create a working folder.
- Create a file acquis.yaml with these contents (compose.yaml mounts it into the CrowdSec container as /etc/crowdsec/acquis.d/vlogs.yaml)
    source: victorialogs
    mode: tail
    log_level: info
    url: http://vlogs:9428/
    limit: 10
    query: '*'
    labels:
      type: other
- Create a file compose.yaml with these contents
    services:
      vlogs:
        image: victoriametrics/victoria-logs:v1.23.2-victorialogs
      fluentbit:
        image: fluent/fluent-bit:4.0.2
        depends_on:
          - "vlogs"
        command:
          - "-i dummy"
          - "-p dummy='{_msg=\"This is a test message\"}'"
          - "-o http"
          - "-p host=vlogs"
          - "-p port=9428"
          - "-p uri=/insert/jsonline?_stream_fields=host,service&_msg_field=log&_time_field=date"
          - "-p format=json_lines"
          - "-p json_date_format=iso8601"
      crowdsec:
        image: crowdsecurity/crowdsec:v1.6.8
        depends_on:
          - "vlogs"
        volumes:
          - ./acquis.yaml:/etc/crowdsec/acquis.d/vlogs.yaml
- Run docker-compose up
This should start three containers: fluent-bit generating test log messages, VictoriaLogs collecting them, and CrowdSec ingesting them. CrowdSec takes a short while to initialize, but eventually the output should settle down to one line from the fluent-bit container per record.
- At that point, in another terminal, stop the VictoriaLogs container, either with docker stop and the container's name or with docker-compose stop vlogs from the working folder.
You should see the VictoriaLogs container stop after a short delay (it tries to quit gracefully, but the CrowdSec request is still open, so it waits for a timeout). Just after VictoriaLogs exits, the CrowdSec container also exits with an error. fluent-bit keeps running, trying to send data but failing.
- Cleanup by running docker-compose down
Anything else we need to know?
I have forked crowdsecurity/crowdsec and created a branch, victorialogs-retry-bug, with a fix that is working for me.
Crowdsec version
$ cscli version
version: v1.6.8-f209766e
Codename: alphaga
BuildDate: 2025-03-25_15:56:53
GoVersion: 1.24.1
Platform: docker
libre2: C++
User-Agent: crowdsec/v1.6.8-f209766e-docker
Constraint_parser: >= 1.0, <= 3.0
Constraint_scenario: >= 1.0, <= 3.0
Constraint_api: v1
Constraint_acquis: >= 1.0, < 2.0
Built-in optional components: cscli_setup, datasource_appsec, datasource_cloudwatch, datasource_docker, datasource_file, datasource_http, datasource_journalctl, datasource_k8s-audit, datasource_kafka, datasource_kinesis, datasource_loki, datasource_s3, datasource_syslog, datasource_victorialogs, datasource_wineventlog
OS version
# On Linux:
$ cat /etc/os-release
NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os
$ uname -a
Linux holmes 6.12.10-76061203-generic #202412060638~1743109366~22.04~1fce33b SMP PREEMPT_DYNAMIC Thu M x86_64 x86_64 x86_64 GNU/Linux
Enabled collections and parsers
Acquisition config
No response
Config show
No response
Prometheus metrics
No response
Related custom configs versions (if applicable) : notification plugins, custom scenarios, parsers etc.
No response