Skip to content

Possible race condition with API log rotation and API calls #485

Open
@davidjiglesias

Description

@davidjiglesias

Hello team,

I have been studying a possible race condition occurring in the API when some API calls are made at the exact same time the API is rotating its log at midnight. When that happens, the API receives a SIGTERM and terminates.

Based on these logs provided by one of our users:

root@wazuh-manager:/var/ossec/logs# cat api.log
WazuhAPI 2020-05-06 00:00:00 foo: Rotated: /var/ossec/logs/api/2020/May/api-06-1.gz
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: Exiting... (SIGTERM)
WazuhAPI 2020-05-06 06:08:27 : Listening on: https://:::55000
WazuhAPI 2020-05-06 06:08:35 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:36 foo: [::ffff:192.168.0.4] GET /version? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:45 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/?limit=1&sort=-dateAdd&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/summary? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/groups? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=version&select=version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=node_name&select=node_name - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=os.name%2Cos.platform%2Cos.version&select=os.name%2Cos.platform%2Cos.version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:48 foo: [::ffff:192.168.0.4] GET /agents/?limit=500&offset=0 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.

We can see there are three different calls (GET /agents?limit=1, GET /cluster/status and GET /agents) scheduled by the Kibana cronjob every 15 minutes, and they are occurring at exact hours in this case (07:00, 07:15, 07:30, etc). They do not always occur at exact hours, it depends on first execution.

The API log rotation occurs exactly at 00:00 and will concur in this case with some of those calls.

We should further investigate the issue based on these premises.

Best regards,

David J. Iglesias

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions