Skip to content

Comments

fix: Enhance logging and error handling in eBPF components#2465

Open
yyuuttaaoo wants to merge 1 commit intoalibaba:mainfrom
yyuuttaaoo:fix/ebpf_resource_clean
Open

fix: Enhance logging and error handling in eBPF components#2465
yyuuttaaoo wants to merge 1 commit intoalibaba:mainfrom
yyuuttaaoo:fix/ebpf_resource_clean

Conversation

@yyuuttaaoo
Copy link
Collaborator

  • Added warning logs for skipping operations when the dynamic library is not ready in EBPFAdapter.
  • Improved error handling in EBPFServer for consuming performance buffer data, logging warnings on failures.
  • Updated BPFWrapper to ensure proper initialization checks before performing operations.
  • Refined ProcessCacheManager to return the result of consuming performance buffer data.

These changes improve the robustness and debuggability of the eBPF components.

Change-Id: I9331a84baf69e5e1ecd1508094377a4f33a55755
Co-developed-by: Cursor noreply@cursor.com

- Added warning logs for skipping operations when the dynamic library is not ready in EBPFAdapter.
- Improved error handling in EBPFServer for consuming performance buffer data, logging warnings on failures.
- Updated BPFWrapper to ensure proper initialization checks before performing operations.
- Refined ProcessCacheManager to return the result of consuming performance buffer data.

These changes improve the robustness and debuggability of the eBPF components.

Change-Id: I9331a84baf69e5e1ecd1508094377a4f33a55755
Co-developed-by: Cursor <noreply@cursor.com>
Copilot AI review requested due to automatic review settings November 21, 2025 05:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

这个PR旨在增强eBPF组件的日志记录和错误处理能力,主要改进包括:

  • 在EBPFAdapter中为跳过操作添加警告日志(当动态库未就绪时)
  • 改进EBPFServer中消费性能缓冲区数据的错误处理和日志记录
  • 更新BPFWrapper以确保操作前进行正确的初始化检查
  • 优化ProcessCacheManager以返回消费性能缓冲区数据的结果

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
core/ebpf/plugin/ProcessCacheManager.cpp 修复了未捕获返回值的bug,现在正确捕获ConsumePerfBufferData的返回值
core/ebpf/driver/eBPFDriver.cpp 在DynamicAttachBPFObject失败时添加DeletePerfBuffers调用,改进资源清理
core/ebpf/driver/BPFWrapper.h 将mInited设置延后到Init完成后,为所有perf buffer操作添加初始化检查,移除未使用的mFlag和DetachAllPerfBuffers,将NULL改为nullptr
core/ebpf/EBPFServer.cpp 将mEBPFAdapter->Init()移到线程启动前,为ConsumePerfBufferData添加错误检查和相应的WARNING/DEBUG日志
core/ebpf/EBPFAdapter.cpp 在PollPerfBuffers和ConsumePerfBufferData中添加动态库未就绪时的警告日志

const char* name = bpf_program__name(prog);
mBpfProgs[name] = prog;
}
mInited = true;
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mInited的赋值时机延后到Init()函数末尾,虽然提高了正确性,但引入了并发安全问题。如果多个线程同时调用Init(),可能导致:

  1. 两个线程都通过if (mInited)检查
  2. 两个线程都执行T::open_and_load(),导致资源重复创建

建议使用std::call_once或双重检查锁定模式,例如:

if (mInited.load(std::memory_order_acquire)) {
    return 0;
}
static std::mutex initMutex;
std::lock_guard<std::mutex> lock(initMutex);
if (mInited.load(std::memory_order_acquire)) {
    return 0;
}
// ... initialization code ...
mInited.store(true, std::memory_order_release);

Copilot uses AI. Check for mistakes.
Comment on lines +355 to +357
if (!mInited) {
return -1;
}
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在多线程环境中,对mInited的读取应使用std::memory_order_acquire以确保内存可见性。当前使用默认的seq_cst虽然安全但性能略差。建议:

if (!mInited.load(std::memory_order_acquire)) {
    return -1;
}

这需要配合Init()中使用mInited.store(true, std::memory_order_release)。这个问题同样影响DeletePerfBuffer、ConsumePerfBuffer、GetPerfBufferEpollFd、CreatePerfBuffer等方法。

Copilot uses AI. Check for mistakes.
// stop perf threads ...
mFlag = false;
DetachAllPerfBuffers();
mInited = false;
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destroy()中设置mInited = false存在并发安全问题。如果其他线程正在调用PollPerfBuffer、ConsumePerfBuffer等方法,它们可能在检查mInited之后、实际使用资源之前,资源就被Destroy()释放了,导致use-after-free。建议在Destroy()中先设置mInited为false(使用memory_order_release),然后等待所有正在进行的操作完成后再释放资源,或者使用引用计数/智能指针管理资源生命周期。

Copilot uses AI. Check for mistakes.
Comment on lines 282 to +285
if (!dynamicLibSuccess()) {
LOG_WARNING(sLogger,
("poll perf buffer skip, dynamic lib not ready",
magic_enum::enum_name(pluginType)));
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些警告日志可能导致日志刷屏问题。PollPerfBuffers在事件驱动循环中被频繁调用(每200ms一次epoll_wait),如果动态库加载失败,将持续输出警告日志。建议:

  1. 使用日志限流机制(如每分钟只记录一次)
  2. 或者在初始化时就检查并报警,后续仅返回错误码不打印日志
  3. 或者将日志级别降低为DEBUG
Suggested change
if (!dynamicLibSuccess()) {
LOG_WARNING(sLogger,
("poll perf buffer skip, dynamic lib not ready",
magic_enum::enum_name(pluginType)));
static bool sWarnedDynamicLibNotReady = false;
if (!dynamicLibSuccess()) {
if (!sWarnedDynamicLibNotReady) {
LOG_WARNING(sLogger,
("poll perf buffer skip, dynamic lib not ready",
magic_enum::enum_name(pluginType)));
sWarnedDynamicLibNotReady = true;
}

Copilot uses AI. Check for mistakes.
Comment on lines +479 to +481
LOG_WARNING(sLogger,
("consume perf buffer data skip, dynamic lib not ready",
magic_enum::enum_name(pluginType)));
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

此处同样存在日志刷屏风险。ConsumePerfBufferData在事件驱动循环中被频繁调用(handleEpollEvents中处理epoll事件时调用),如果动态库未就绪会持续输出警告。建议采用与PollPerfBuffers相同的日志限流策略。

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant