
Conversation

Contributor
@Revolyssup Revolyssup commented Jun 17, 2025

Fixes #
When the Kafka server is slow or unavailable, logger objects accumulate faster than they are released, causing a memory spike.

The solution is to implement a limit in the batch processor manager so that it drops newly pushed objects once a given number of callbacks remain unprocessed.
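The mechanism can be sketched as follows. This is an illustrative Python analogue, not APISIX's actual Lua code; the name `max_pending_entries` mirrors the option added in this PR, while the class and method names are hypothetical:

```python
# Sketch: a batch queue that drops new entries once the backlog of
# unprocessed entries (pushed minus processed) exceeds a limit, which
# bounds memory when the downstream sink (e.g. Kafka) is slow.

class BatchProcessor:
    def __init__(self, max_pending_entries):
        self.max_pending_entries = max_pending_entries
        self.total_pushed = 0      # entries accepted so far
        self.total_processed = 0   # entries flushed (or discarded after retries)
        self.entries = []

    def push(self, entry):
        pending = self.total_pushed - self.total_processed
        if pending >= self.max_pending_entries:
            # Drop instead of buffering: the caller loses this log entry,
            # but the worker's memory stays bounded.
            return False
        self.entries.append(entry)
        self.total_pushed += 1
        return True

    def flush(self):
        # Pretend the sink consumed everything currently queued.
        self.total_processed += len(self.entries)
        self.entries.clear()


bp = BatchProcessor(max_pending_entries=3)
accepted = [bp.push(i) for i in range(5)]
print(accepted)  # [True, True, True, False, False]
bp.flush()
print(bp.push("x"))  # True: accepted again once the backlog is cleared
```

The key design point is that the limit compares two monotonic counters rather than inspecting queue sizes, so discarded batches only need to bump the processed counter to free up capacity.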

Reproduction steps

  1. Start APISIX (1 worker process), Kafka Server
  2. Create a route and enable kafka-logger. Instance configuration:

     kafka-logger:
       batch_max_size: 80
       brokers:
         - host: 127.0.0.1
           port: 9092
       buffer_duration: 180
       cluster_name: 1
       inactive_timeout: 5
       kafka_topic: test2
       max_retry_count: 1
       meta_format: default
       meta_refresh_interval: 30
       name: service kafka logger
       producer_batch_num: 200
       producer_batch_size: 104857600
       producer_max_buffering: 50000
       producer_time_linger: 1
       producer_type: async
       required_acks: 1
       retry_delay: 1
       timeout: 5
  3. Test the request with the curl command and make sure the request information is sent to the kafka server
  4. Limit the CPU available to the Kafka server (simulating a slow consumer), for example, limit it to 0.1 CPU:
    sudo docker update --cpus 0.1 apisix_kafka
  5. Use wrk/wrk2 to generate load on the route; the memory usage of the APISIX worker will quickly rise. Once it grows large enough, the worker is killed by the system (OOM), causing the worker process to crash and exit, after which APISIX starts a new worker process.
    wrk2 -c 200 -d 60 -t 4 -R 50000 http://xxxxx/test

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jun 17, 2025
@Revolyssup Revolyssup marked this pull request as draft June 17, 2025 04:52
@dosubot dosubot bot added the enhancement New feature or request label Jun 17, 2025
@Revolyssup Revolyssup marked this pull request as ready for review June 17, 2025 19:41
        return
    end

    self.processed_entries = self.processed_entries + #batch.entries
Member

@nic-6443 nic-6443 commented Jun 19, 2025
Please read the code above: in the whole-batch-failed case, the failed batch is discarded after exceeding max_retry_count. The batch is freed in that case too, so it should be counted in processed_entries as well.
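The reviewer's point can be illustrated with a small sketch. This is hedged Python, not APISIX's Lua; the function and field names here are hypothetical stand-ins for the real batch-processor code:

```python
# Sketch: a batch that exhausts max_retry_count is discarded and freed,
# so its entries must still be counted as "processed". Otherwise the
# pending count (pushed - processed) never shrinks and new entries are
# rejected forever once the limit is hit.

def handle_batch_result(state, batch, ok):
    if ok:
        state["processed_entries"] += len(batch["entries"])
        return "delivered"
    batch["retries"] += 1
    if batch["retries"] > batch["max_retry_count"]:
        # Discarded batch: its memory is freed, so count it as processed
        # to keep the backlog accounting accurate.
        state["processed_entries"] += len(batch["entries"])
        return "discarded"
    return "retry"


state = {"processed_entries": 0}
batch = {"entries": [1, 2, 3], "retries": 0, "max_retry_count": 1}
print(handle_batch_result(state, batch, ok=False))  # retry
print(handle_batch_result(state, batch, ok=False))  # discarded
print(state["processed_entries"])  # 3
```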

Member

@nic-6443 nic-6443 left a comment

Can we add test cases for this feature? Start a mock TCP server that does not read data from the connected socket; this will accumulate pending log entries in APISIX, and we can set max_pending_entries to a small value to confirm the discard code works.
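The suggested test fixture could look like the sketch below. It is a generic Python illustration of a "black hole" TCP server (the actual APISIX tests are written in the test-nginx framework); host, port, and function names are arbitrary:

```python
# Sketch: a TCP server that accepts a connection but never calls recv(),
# so the sender's socket buffers fill up and pending entries accumulate
# on the sending side, which is the condition the test needs to create.

import socket
import threading
import time


def start_blackhole_server(host="127.0.0.1"):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0: let the OS pick a free port
    srv.listen(1)

    def accept_loop():
        conn, _ = srv.accept()
        # Hold the connection open without ever reading from it.
        time.sleep(60)
        conn.close()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


srv, port = start_blackhole_server()
cli = socket.create_connection(("127.0.0.1", port))
cli.setblocking(False)
sent = 0
try:
    while True:
        # Once the kernel send buffer (and the peer's unread receive
        # buffer) are full, a non-blocking send raises BlockingIOError.
        sent += cli.send(b"x" * 4096)
except BlockingIOError:
    pass
print(sent > 0)  # some bytes were buffered before the socket backed up
```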

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jun 19, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jun 19, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jun 19, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 19, 2025
Member

@membphis membphis left a comment

In this PR, I think we only need to update one plugin, kafka-logger; that should be enough.

We can update the other plugins in separate PRs later.

function _M:add_entry_to_new_processor(conf, entry, ctx, func, max_pending_entries)
    if max_pending_entries and
       self.total_pushed_entries - total_processed_entries(self) > max_pending_entries then
        core.log.error("max pending entries limit exceeded. discarding entry")
Member

Need to print more useful information, e.g. max_pending_entries and self.total_pushed_entries - total_processed_entries(self).

if max_pending_entries then
local total_processed_entries = total_processed_entries(self)
if self.total_pushed_entries - total_processed_entries > max_pending_entries then
core.log.error("max pending entries limit exceeded. discarding entry.",
Member
bad indentation


if max_pending_entries then
local total_processed_entries = total_processed_entries(self)
if self.total_pushed_entries - total_processed_entries > max_pending_entries then
core.log.error("max pending entries limit exceeded. discarding entry.",
Member

ditto

max_pending_entries = {
    type = "integer",
    description = "maximum number of pending entries in the batch processor",
    minimum = 0,
Member
The minimum should be 1 here: 0 is meaningless for APISIX users.

membphis previously approved these changes Jun 20, 2025
Member

@membphis membphis left a comment

LGTM, except one minor issue

@Revolyssup Revolyssup merged commit 7d5aeaf into apache:master Jun 23, 2025
47 of 49 checks passed

Labels

enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.

5 participants