fix: Enhance logging and error handling in eBPF components#2465
fix: Enhance logging and error handling in eBPF components#2465yyuuttaaoo wants to merge 1 commit intoalibaba:mainfrom
Conversation
- Added warning logs for skipping operations when the dynamic library is not ready in EBPFAdapter. - Improved error handling in EBPFServer for consuming performance buffer data, logging warnings on failures. - Updated BPFWrapper to ensure proper initialization checks before performing operations. - Refined ProcessCacheManager to return the result of consuming performance buffer data. These changes improve the robustness and debuggability of the eBPF components. Change-Id: I9331a84baf69e5e1ecd1508094377a4f33a55755 Co-developed-by: Cursor <noreply@cursor.com>
There was a problem hiding this comment.
Pull Request Overview
这个PR旨在增强eBPF组件的日志记录和错误处理能力,主要改进包括:
- 在EBPFAdapter中为跳过操作添加警告日志(当动态库未就绪时)
- 改进EBPFServer中消费性能缓冲区数据的错误处理和日志记录
- 更新BPFWrapper以确保操作前进行正确的初始化检查
- 优化ProcessCacheManager以返回消费性能缓冲区数据的结果
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| core/ebpf/plugin/ProcessCacheManager.cpp | 修复了未捕获返回值的bug,现在正确捕获ConsumePerfBufferData的返回值 |
| core/ebpf/driver/eBPFDriver.cpp | 在DynamicAttachBPFObject失败时添加DeletePerfBuffers调用,改进资源清理 |
| core/ebpf/driver/BPFWrapper.h | 将mInited设置延后到Init完成后,为所有perf buffer操作添加初始化检查,移除未使用的mFlag和DetachAllPerfBuffers,将NULL改为nullptr |
| core/ebpf/EBPFServer.cpp | 将mEBPFAdapter->Init()移到线程启动前,为ConsumePerfBufferData添加错误检查和相应的WARNING/DEBUG日志 |
| core/ebpf/EBPFAdapter.cpp | 在PollPerfBuffers和ConsumePerfBufferData中添加动态库未就绪时的警告日志 |
| const char* name = bpf_program__name(prog); | ||
| mBpfProgs[name] = prog; | ||
| } | ||
| mInited = true; |
There was a problem hiding this comment.
mInited的赋值时机延后到Init()函数末尾,虽然提高了正确性,但引入了并发安全问题。如果多个线程同时调用Init(),可能导致:
- 两个线程都通过
if (mInited)检查 - 两个线程都执行
T::open_and_load(),导致资源重复创建
建议使用std::call_once或双重检查锁定模式,例如:
if (mInited.load(std::memory_order_acquire)) {
return 0;
}
static std::mutex initMutex;
std::lock_guard<std::mutex> lock(initMutex);
if (mInited.load(std::memory_order_acquire)) {
return 0;
}
// ... initialization code ...
mInited.store(true, std::memory_order_release);| if (!mInited) { | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
在多线程环境中,对mInited的读取应使用std::memory_order_acquire以确保内存可见性。当前使用默认的seq_cst虽然安全但性能略差。建议:
if (!mInited.load(std::memory_order_acquire)) {
return -1;
}这需要配合Init()中使用mInited.store(true, std::memory_order_release)。这个问题同样影响DeletePerfBuffer、ConsumePerfBuffer、GetPerfBufferEpollFd、CreatePerfBuffer等方法。
| // stop perf threads ... | ||
| mFlag = false; | ||
| DetachAllPerfBuffers(); | ||
| mInited = false; |
There was a problem hiding this comment.
Destroy()中设置mInited = false存在并发安全问题。如果其他线程正在调用PollPerfBuffer、ConsumePerfBuffer等方法,它们可能在检查mInited之后、实际使用资源之前,资源就被Destroy()释放了,导致use-after-free。建议在Destroy()中先设置mInited为false(使用memory_order_release),然后等待所有正在进行的操作完成后再释放资源,或者使用引用计数/智能指针管理资源生命周期。
| if (!dynamicLibSuccess()) { | ||
| LOG_WARNING(sLogger, | ||
| ("poll perf buffer skip, dynamic lib not ready", | ||
| magic_enum::enum_name(pluginType))); |
There was a problem hiding this comment.
这些警告日志可能导致日志刷屏问题。PollPerfBuffers在事件驱动循环中被频繁调用(每200ms一次epoll_wait),如果动态库加载失败,将持续输出警告日志。建议:
- 使用日志限流机制(如每分钟只记录一次)
- 或者在初始化时就检查并报警,后续仅返回错误码不打印日志
- 或者将日志级别降低为DEBUG
| if (!dynamicLibSuccess()) { | |
| LOG_WARNING(sLogger, | |
| ("poll perf buffer skip, dynamic lib not ready", | |
| magic_enum::enum_name(pluginType))); | |
| static bool sWarnedDynamicLibNotReady = false; | |
| if (!dynamicLibSuccess()) { | |
| if (!sWarnedDynamicLibNotReady) { | |
| LOG_WARNING(sLogger, | |
| ("poll perf buffer skip, dynamic lib not ready", | |
| magic_enum::enum_name(pluginType))); | |
| sWarnedDynamicLibNotReady = true; | |
| } |
| LOG_WARNING(sLogger, | ||
| ("consume perf buffer data skip, dynamic lib not ready", | ||
| magic_enum::enum_name(pluginType))); |
There was a problem hiding this comment.
此处同样存在日志刷屏风险。ConsumePerfBufferData在事件驱动循环中被频繁调用(handleEpollEvents中处理epoll事件时调用),如果动态库未就绪会持续输出警告。建议采用与PollPerfBuffers相同的日志限流策略。
These changes improve the robustness and debuggability of the eBPF components.
Change-Id: I9331a84baf69e5e1ecd1508094377a4f33a55755
Co-developed-by: Cursor noreply@cursor.com