Optimize JsonSerializer and Flusher_file #2184

Merged (9 commits) on Apr 27, 2025

Takuka0311
Collaborator

@Takuka0311 Takuka0311 commented Apr 15, 2025

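This PR optimizes the performance of JsonSerializer and Flusher_file. Main changes:

  • Replaced the JSON library with rapidjson
  • Used a buffer that is never released inside JsonSerializer to cut CPU overhead (the largest CPU gain)
  • Removed the unnecessary Batcher from Flusher_file
  • Moved spdlog in Flusher_file from a thread shared with the loongcollector logging module to its own dedicated thread (the largest memory gain)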
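
The "buffer that is never released" idea can be illustrated with a minimal sketch. It is not the code from this PR and does not use rapidjson, so that it stays dependency-free: `ReusableJsonLineWriter` and its members are hypothetical names, the escaping covers only quotes, backslashes, and newlines, and it relies on the usual behavior of `std::string::clear()` keeping the allocated capacity.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serializer; NOT the loongcollector JsonSerializer.
class ReusableJsonLineWriter {
public:
    // Serializes one flat event as a JSON object followed by '\n'.
    // The returned view is only valid until the next call to Serialize().
    std::string_view Serialize(
        const std::vector<std::pair<std::string, std::string>>& fields) {
        // Drop the old contents but keep the allocation, so steady-state
        // serialization does not allocate a fresh buffer for every event.
        mBuffer.clear();
        mBuffer.push_back('{');
        bool first = true;
        for (const auto& [key, value] : fields) {
            if (!first) {
                mBuffer.push_back(',');
            }
            first = false;
            AppendQuoted(key);
            mBuffer.push_back(':');
            AppendQuoted(value);
        }
        mBuffer.append("}\n");
        return mBuffer;
    }

private:
    // Minimal escaping for illustration only; a real JSON writer must also
    // escape control characters.
    void AppendQuoted(std::string_view s) {
        mBuffer.push_back('"');
        for (char c : s) {
            switch (c) {
                case '"':  mBuffer.append("\\\""); break;
                case '\\': mBuffer.append("\\\\"); break;
                case '\n': mBuffer.append("\\n");  break;
                default:   mBuffer.push_back(c);
            }
        }
        mBuffer.push_back('"');
    }

    std::string mBuffer;  // reused across calls, never shrunk
};
```

Calling `Serialize` repeatedly on one long-lived writer lets the buffer grow to the largest event seen and then stay there. The trade-off is that this memory is held for the lifetime of the writer.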

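Test scenario: regex parsing at 10M/s. The configuration and a data sample are listed at the end; see CI for the results. The performance comparison is shown in the figures below.

Average CPU usage dropped from 1.43 cores to 0.24 cores, an 83% reduction:

[image: CPU usage comparison]

Average memory usage dropped from 136M to 30M, a 78% reduction:

[image: memory usage comparison]

Appendix:

Configuration: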

enable: true
inputs:
  - Type: input_file
    FilePaths: 
      - /home/loongcollector/*.log
processors:
  - Type: processor_parse_regex_native
    SourceKey: content
    Regex: ^([^ ]*) ([^ ]*) ([^ ]*) \[([^\]]*)\] "(\S+) ([^\"]*) (\S*)" ([^ ]*) ([^ ]*) "([^\"]*)" "([^\"]*)"
    Keys:
      - ip
      - ident
      - auth
      - timestamp
      - method
      - request
      - http_version
      - response_code
      - bytes
      - referrer
      - user_agent
flushers:
  - Type: flusher_file
    FilePath: /home/loongcollector/test.out
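
To make the `Regex` and `Keys` pairing in the configuration concrete, here is a minimal sketch that applies the same pattern to a log line using `std::regex`. The function name `ParseAccessLog` and the use of `std::regex_search` are assumptions made for illustration; the regex engine and matching semantics of `processor_parse_regex_native` are not shown in this PR.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <optional>
#include <regex>
#include <string>

// Keys in capture-group order, copied from the configuration above.
static const std::array<const char*, 11> kKeys = {
    "ip", "ident", "auth", "timestamp", "method", "request",
    "http_version", "response_code", "bytes", "referrer", "user_agent"};

// Hypothetical helper: returns key/value pairs on a match, std::nullopt otherwise.
std::optional<std::map<std::string, std::string>> ParseAccessLog(
    const std::string& line) {
    // Same pattern as the Regex field, written as a raw string literal.
    static const std::regex kPattern(
        R"re(^([^ ]*) ([^ ]*) ([^ ]*) \[([^\]]*)\] "(\S+) ([^\"]*) (\S*)" ([^ ]*) ([^ ]*) "([^\"]*)" "([^\"]*)")re");
    std::smatch match;
    if (!std::regex_search(line, match, kPattern)) {
        return std::nullopt;
    }
    std::map<std::string, std::string> fields;
    for (std::size_t i = 0; i < kKeys.size(); ++i) {
        // Group 0 is the whole match, so the i-th key maps to group i + 1.
        fields[kKeys[i]] = match[i + 1].str();
    }
    return fields;
}
```

Given a line in the common Apache combined format, each of the 11 capture groups is assigned to the key at the same position, which is why the order of `Keys` in the configuration matters.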

@yyuuttaaoo yyuuttaaoo requested a review from Copilot April 24, 2025 07:56
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR optimizes performance by replacing the JSON library with rapidjson and refactoring file flushing. The key changes include:

  • Replacing the json library with rapidjson in JsonSerializer for faster serialization.
  • Removing unused overloads and the Batcher from FlusherFile to reduce CPU overhead.
  • Adjusting spdlog configuration to use an independent thread for logging.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| core/plugin/flusher/file/FlusherFile.h | Removed unused SerializeAndPush overload and obsolete variables. |
| core/plugin/flusher/file/FlusherFile.cpp | Refactored flush logic and updated spdlog thread pool creation; potential misassignment in config parameter. |
| core/collection_pipeline/serializer/JsonSerializer.cpp | Migrated serialization to rapidjson with helper functions for common fields. |

@Takuka0311 Takuka0311 merged commit 009dfb2 into alibaba:main Apr 27, 2025
15 checks passed
@Takuka0311 Takuka0311 deleted the flusher-file-dev branch April 28, 2025 07:45