Problem

A worker keeps archives, and these archives are kept indefinitely. When an agent attached to the worker sends a large volume of logs (because of a problem or some other unforeseen reason), the archives can grow quite big. After a few days (or maybe hours, depending on the size of the worker's PersistentVolumeClaim) of big archives, the disk fills up.
When the disk is full, the worker reports errors such as:
wazuh-db: ERROR: SQLite: database or disk is full
wazuh-db: ERROR: Cannot set connection_status for agent 3
These errors cause all agents connected to that specific worker to show a status of "disconnected", which in turn means Wazuh no longer receives any logs from those agents.
To summarize the problem: if one of the agents sends so many logs that the disk fills up quickly, the other agents can no longer send logs at all. In effect, archives (kept for retention purposes) are treated as more important than receiving logs in general.
Current Solution
Wazuh's current workaround is to run CronJob resources in Kubernetes that delete archives older than X days. This solution is:
Not in the repository by default
A bit cumbersome to set up
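For reference, the cleanup such a CronJob performs usually boils down to a find-and-delete pass over the archive directory. A minimal sketch, assuming the default Wazuh archive path /var/ossec/logs/archives (verify the path for your deployment):

```shell
#!/bin/sh
# Sketch of the cleanup step a Kubernetes CronJob could run in the worker pod.
# Assumption: archives live under /var/ossec/logs/archives (the default
# Wazuh path); adjust for your deployment.

prune_archives() {
  archive_dir="$1"     # directory holding the archive files
  retention_days="$2"  # delete files older than this many days

  # -mtime +N matches files last modified more than N days ago
  find "$archive_dir" -type f -mtime "+$retention_days" -delete
}

# Example invocation (hypothetical 7-day retention):
# prune_archives /var/ossec/logs/archives 7
```

Running this on a schedule keeps the disk bounded, but only approximately: retention is enforced per run, not continuously, which is part of why a built-in mechanism would be preferable.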
Proposed Solution
Introduce a built-in solution, either in the repository or, even better, in the Wazuh GUI. It should offer an option to set retention times, and a garbage-collection process should run at fixed or user-defined intervals to keep disks clean while keeping archives for the specified retention time.
In addition, it should not be possible to "hang up" a worker with a log flood. The worker should always have some disk space reserved so that agents stay connected and logs are delivered upstream. The worker should raise an alert when the specified retention time cannot be met due to low disk space.
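Until something like this is built in, the reserved-space alert described above can be approximated with a simple disk-usage check alongside the cleanup job. A hedged sketch, where the threshold and mount point are illustrative assumptions, not Wazuh defaults:

```shell
#!/bin/sh
# Sketch of a reserved-space check: warn when free space on the archive
# volume drops below a threshold, i.e. when the retention target is at risk.
# The mount point and threshold below are illustrative assumptions.

check_reserved_space() {
  mount_point="$1"   # volume holding the archives, e.g. /var/ossec
  min_free_pct="$2"  # minimum free space to keep, in percent

  # Parse the "Use%"/"Capacity" column from POSIX df for the given mount point
  used_pct=$(df -P "$mount_point" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  free_pct=$((100 - used_pct))

  if [ "$free_pct" -lt "$min_free_pct" ]; then
    echo "WARNING: only ${free_pct}% free on ${mount_point};" \
         "retention time may not be met" >&2
    return 1
  fi
  return 0
}

# Example invocation (hypothetical 10% reserve):
# check_reserved_space /var/ossec 10
```

A real implementation inside Wazuh would presumably refuse to write new archive data past the reserve instead of merely warning, so that agent connections and upstream delivery keep working during a log flood.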