This guide outlines the process for deploying a Fluent Bit log forwarder on the Codon HPC cluster. The goal is to collect logs generated by SLURM jobs and forward them to a centralized EFK (Elasticsearch, Fluentd, Kibana) stack.
This setup uses a periodic SLURM job, managed by scrontab, to collect and ship logs. This approach is necessary in HPC environments where users typically lack permissions to run persistent daemons on compute nodes, which aligns with the best practice of keeping those nodes dedicated to scheduled jobs.
The log forwarding process works as follows:
- Log Generation: SLURM jobs are configured to write their logs to a designated directory on a shared filesystem (/homes/bia_svc/log-forwarder/logs/).
- Scheduled Collection: A scrontab job runs periodically (e.g., every 30 minutes), submitting a SLURM batch script (run_demo_cycle.sbatch).
- Log Forwarding: This script executes Fluent Bit within a Singularity container. Fluent Bit reads new log entries, tracks its progress to avoid duplication, and sends them to the central Fluentd service.
- Centralized Storage & Visualization: The logs are received by the EFK stack, indexed in Elasticsearch, and become available for searching and visualization in Kibana.
This entire process is automated and designed to run with minimal user privileges on the HPC cluster.
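The flow above maps onto a Fluent Bit configuration roughly like the following. This is an illustrative sketch, not the repository's actual fluent-bit.conf: the Lua entry-point name add_tag, the Path_Key field, and the endpoint placeholders are assumptions.

```ini
[SERVICE]
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name         tail
    Path         /homes/bia_svc/log-forwarder/logs/*/*/*.log
    Path_Key     filepath
    DB           /homes/bia_svc/log-forwarder/fluentbit_data/offsets.db
    Exit_On_Eof  true

# add_tag.lua presumably derives an event_tag field (e.g. codon.bia.dev)
# from the file path; "add_tag" as the entry-point name is an assumption.
[FILTER]
    Name         lua
    Match        *
    script       add_tag.lua
    call         add_tag

[OUTPUT]
    Name         http
    Match        *
    Host         <NODE_IP>
    Port         <NODE_PORT>
    Format       json
```

The DB option is what lets Fluent Bit remember file offsets between scrontab runs, and Exit_On_Eof makes the batch job terminate once all pending log lines have been read.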
Before you begin, ensure you have the following:
- Singularity/Apptainer: Codon, like many HPC clusters, uses Singularity as a container runtime instead of Docker. You will need to build or pull a Singularity image for Fluent Bit.
- Centralized Log Storage: Your SLURM jobs should be configured to write logs to a shared filesystem accessible by your forwarding job (e.g., /hpsor/nfs).
- EFK Cluster Endpoint: You need the hostname and port for the HTTP input of your centralized Fluentd/Elasticsearch instance.
Run the following two commands and combine the node's internal IP with the service's port number:

kubectl --kubeconfig=k8s-hh-wp-webadmin-32-conf.yaml -n logging-test get service fluentd-http
kubectl --kubeconfig=k8s-hh-wp-webadmin-32-conf.yaml -n logging-test get nodes -o wide
- To view the logs in Kibana's Discover tab, run the following two commands and combine the node's internal IP with the service's port number:

kubectl --kubeconfig=k8s-hh-wp-webadmin-32-conf.yaml -n logging-test get service kibana
kubectl --kubeconfig=k8s-hh-wp-webadmin-32-conf.yaml -n logging-test get nodes -o wide

Make sure to update the resulting endpoint in the config file.
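For example, if `get nodes -o wide` reports an INTERNAL-IP of 10.0.0.12 and the service exposes NodePort 31880 (both placeholder values, not the real cluster addresses), the endpoint is composed as:

```shell
# Placeholder values -- substitute the INTERNAL-IP from `get nodes -o wide`
# and the port shown by `get service`.
NODE_IP="10.0.0.12"
NODE_PORT="31880"
FLUENTD_URL="http://${NODE_IP}:${NODE_PORT}"
echo "$FLUENTD_URL"   # -> http://10.0.0.12:31880
```

The same pattern applies to both the fluentd-http and kibana services.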
Organize your files on the shared filesystem as follows. Currently files are in /homes/bia_svc/log-forwarder for testing purposes.
This structure keeps your configuration, scripts, and log data organized.
/homes/bia_svc/log-forwarder/
├── add_tag.lua             # Lua file for tagging the logs
├── fluent-bit.conf         # Fluent Bit configuration
├── generate_log.sh         # Log generator for testing
├── parsers.conf            # Parser configuration
├── run_demo_cycle.sbatch   # The SLURM submission script
├── fluentbit_data/         # Stores state database for log file offsets (separated by env)
└── logs/                   # Directory where your jobs write their logs
    ├── bia/
    │   ├── dev/
    │   │   └── ingest_run.log
    │   ├── staging/
    │   │   └── ingest_run.log
    │   └── prod/
    │       └── ingest_run.log
    └── empiar/
        ├── dev/
        │   └── some_job.log
        ├── staging/
        │   └── some_job.log
        └── prod/
            └── some_job.log
- See the fluent-bit.conf file.
- See the parsers.conf file.
Before submitting the job to SLURM, you must build a Singularity image for Fluent Bit and test it to ensure it functions correctly.
singularity pull docker://fluent/fluent-bit:latest

This command downloads the latest Fluent Bit image from Docker Hub and converts it into a Singularity Image File (.sif) named fluent-bit_latest.sif.
Note that the deployment is handled by GitLab CI under the section deploy-hpc.
To test the forwarder, you must first generate some log data. You can do this by running the generate_log.sh script, which will create or append to a log file in the logs/ directory. After generating logs, submit the sbatch job to send them.
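The generate_log.sh script's exact contents are not reproduced in this guide; a minimal sketch of the kind of logic it implements (the function name, base path argument, and log-line format are assumptions) could look like:

```shell
#!/bin/bash
# Hypothetical sketch of generate_log.sh's core logic; the real script
# (and its log-line format) may differ.
set -euo pipefail

# Append one timestamped demo line under <base>/<app>/<env>/ingest_run.log.
generate_log() {
  local base="$1" app="$2" env_name="$3"
  mkdir -p "${base}/${app}/${env_name}"
  printf '%s INFO demo log line from %s/%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$app" "$env_name" \
    >> "${base}/${app}/${env_name}/ingest_run.log"
}

# Roughly equivalent to: ./generate_log.sh -a bia -e dev
generate_log /tmp/log-forwarder-demo bia dev
```

The per-application, per-environment subdirectories mirror the logs/ layout shown above, which is what the forwarder's tail pattern relies on.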
To run the job with the default application (bia) and environment (dev), simply submit the script:
# 1. Generate sample logs
./generate_log.sh -a bia -e dev
# 2. Submit the job to forward the logs
sbatch run_demo_cycle.sbatch

This section provides a quick guide to setting up the bia-ingest tool from the fluent-logging branch and performing a test run.
# 1. Clone the repository
git clone https://github.com/BioImage-Archive/bia-integrator.git
cd bia-integrator
# 2. Switch to the feature branch
git checkout fluent-logging
# 3. Set up a Python environment and install dependencies
python -m venv venv
source venv/bin/activate
pip install poetry
poetry --directory=bia-ingest install
# If pyproject.toml has changed, you may need to update the lock file first:
poetry --directory=bia-ingest lock

This example executes the ingest logic without saving any data (--dryrun) and directs all detailed logs to a file (--logfile).
poetry --directory=bia-ingest run biaingest ingest --dryrun S-BIAD1285 --logfile /homes/bia_svc/log-forwarder/logs/ingest_run.log

To run your log forwarding job periodically, use scrontab, SLURM's built-in cron utility.
- Edit your scrontab:
scrontab -e
- Add a new cron job: Add a line to the file that defines the schedule and the script to run. The format is similar to a standard Linux crontab. For example, to run the job every 30 minutes:
# Log forwarder (namespace=logging-test)
#SCRON --time=15
#SCRON --mem=500M
*/30 * * * * sbatch /homes/bia_svc/log-forwarder/run_demo_cycle.sbatch
- Save and exit the editor. Verify the job is scheduled:
scrontab -l
This will list your currently scheduled jobs.
- Configure Your Jobs: Ensure your primary SLURM jobs write their application logs to the /homes/bia_svc/log-forwarder/logs/ directory.
- Let it Run: The scrontab scheduler will automatically submit the run_demo_cycle.sbatch job at the interval specified in scrontab.
Each time it runs, it will:
- Scan the logs/ directory for new or updated log files.
- Send any new log entries since its last run to your EFK cluster.
- Exit gracefully thanks to the Exit_On_Eof flag.
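The run_demo_cycle.sbatch script itself is not reproduced in this guide; a sketch of what such a submission script might contain, assuming the image name fluent-bit_latest.sif, the paths used above, and the binary location in the official Fluent Bit image:

```shell
#!/bin/bash
#SBATCH --job-name=log-forwarder
#SBATCH --time=00:15:00
#SBATCH --mem=500M
#SBATCH --output=/homes/bia_svc/logs/log-forwarder-%j.out
#SBATCH --error=/homes/bia_svc/logs/log-forwarder-%j.err

# Hypothetical sketch; the real run_demo_cycle.sbatch may differ.
BASE=/homes/bia_svc/log-forwarder

# Run Fluent Bit inside the Singularity image; --bind makes the shared
# directory (config, logs, offset database) visible inside the container.
singularity exec \
  --bind "${BASE}:${BASE}" \
  "${BASE}/fluent-bit_latest.sif" \
  /fluent-bit/bin/fluent-bit -c "${BASE}/fluent-bit.conf"
```

Because the tail input uses Exit_On_Eof, this job reads everything pending and then exits, so each scrontab invocation is a short, bounded batch run rather than a daemon.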
Then one can see the logs in Kibana's Discover tab.
Once logs are forwarded to the EFK stack, they are indexed in Elasticsearch with a tag that identifies their application and environment. This allows for powerful filtering in Kibana.
- Navigate to Kibana: Open your Kibana dashboard using the URL you found in the previous step (http://<NODE_IP>:<NODE_PORT>).
- Go to the Discover Tab: This is the primary interface for exploring log data.
- Set the Time Range: Make sure the time filter in the top-right corner is set to a range that includes when your logs were generated (e.g., "Last 15 minutes").
The logs forwarded from the HPC contain an event_tag field that identifies their application and environment (e.g., codon.bia.staging). You can use this field in Kibana's Query Language (KQL) to filter your logs.
- To see all logs from the bia application: event_tag : codon.bia.*
- To see all logs from the staging environment across all applications: event_tag : codon.*.staging
- To see logs for a specific combination, like empiar in prod: event_tag : codon.empiar.prod
- Check SLURM Output: The /homes/bia_svc/logs/log-forwarder-%j.out and /homes/bia_svc/logs/log-forwarder-%j.err files will contain the output from your submission script, including any errors from SLURM or the singularity command.
- Fluent Bit Verbosity: If logs are not being sent, temporarily increase the Log_Level in fluent-bit.conf to debug to get more detailed output.
- Path Mismatches: The most common issue is incorrect paths. Double-check all paths in fluent-bit.conf and run_demo_cycle.sbatch, ensuring the --bind paths for Singularity are correct.