feat: Integrate coolbpf cpu profiling feature in loongcollector #2391
wokron wants to merge 57 commits into alibaba:main
Conversation
Pull Request Overview
This PR integrates coolbpf CPU profiling capabilities into loongcollector by adding a new input_cpu_profiling plugin. The implementation enables continuous CPU profiling of specified processes through command-line pattern matching and container discovery.
Key changes:
- New CPU profiling plugin with process discovery mechanism
- Integration with coolbpf profiler library
- Plugin registration and lifecycle management
Reviewed Changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| core/plugin/input/InputCpuProfiling.{h,cpp} | New plugin implementation for CPU profiling input |
| core/ebpf/plugin/cpu_profiling/* | Core CPU profiling manager and process discovery logic |
| core/ebpf/driver/CpuProfiler.h | Wrapper for coolbpf profiler library integration |
| core/ebpf/Config.{h,cpp} | CPU profiling configuration option handling |
| core/ebpf/include/export.h | Type definitions for CPU profiling |
| core/unittest/input/InputCpuProfilingUnittest.cpp | Unit tests for the input plugin |
| core/unittest/ebpf/*Unittest.cpp | Unit tests for CPU profiling components |
| Various CMakeLists.txt | Build system updates for new components |
Comments suppressed due to low confidence (1)
core/unittest/input/InputCpuProfilingUnittest.cpp:1
- Corrected spelling of 'CommandLines' to 'CommandLines' in comment context.
```cpp
static void handler_without_ctx(uint32_t pid, const char* comm, const char* stack, uint32_t cnt) {
    mHandler(pid, comm, stack, cnt, mCtx);
}
```
Static members mHandler and mCtx are accessed without synchronization in handler_without_ctx. It is called from Poll() while holding a lock, but can still race with Start() and Stop(), which modify these members, so the callback could be invoked with stale or null pointer values.
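A minimal sketch of one way to close this race: every read and write of the static handler/ctx pair goes through the same mutex, and the pair is copied under the lock before invocation. The type and class names below are illustrative, not the PR's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical callback signature mirroring the shape used in the PR.
using profile_cb_t = void (*)(uint32_t pid, const char* comm,
                              const char* stack, uint32_t cnt, void* ctx);

class ProfilerCallbackState {
public:
    static void Set(profile_cb_t handler, void* ctx) {
        std::lock_guard<std::mutex> lock(sMutex);
        sHandler = handler;
        sCtx = ctx;
    }

    // Same role as handler_without_ctx: copy the pair under the lock, then
    // invoke, so Start()/Stop() can never hand us a half-updated pair.
    static void Invoke(uint32_t pid, const char* comm, const char* stack, uint32_t cnt) {
        profile_cb_t handler = nullptr;
        void* ctx = nullptr;
        {
            std::lock_guard<std::mutex> lock(sMutex);
            handler = sHandler;
            ctx = sCtx;
        }
        if (handler != nullptr) {
            handler(pid, comm, stack, cnt, ctx);
        }
    }

private:
    inline static std::mutex sMutex;
    inline static profile_cb_t sHandler = nullptr;
    inline static void* sCtx = nullptr;
};

// Demo callback for the usage below (hypothetical).
static int gCalls = 0;
static void DemoCb(uint32_t, const char*, const char*, uint32_t, void*) { ++gCalls; }
```

Copying the pair under the lock also guarantees the null check and the call see the same handler value, so Stop() cannot null it out between the two.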
```cpp
mEBPFAdapter->UpdatePlugin(PluginType::CPU_PROFILING,
                           buildCpuProfilingConfig(std::move(totalPids), std::nullopt, nullptr, nullptr));
```
Passing a null handler and context to buildCpuProfilingConfig during an update overwrites the valid handler set during initialization, breaking profiling event handling.
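One possible fix, sketched with assumed names (CpuProfilingConfig, Update): have the update path fall back to the previously registered handler/ctx whenever the caller passes nullptr, so a PID-set refresh cannot clobber the live callback.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the config built by buildCpuProfilingConfig.
struct CpuProfilingConfig {
    void (*handler)(uint32_t pid, void* ctx) = nullptr;
    void* ctx = nullptr;
};

// Illustrative stand-in for the plugin-side state touched by UpdatePlugin.
struct CpuProfilingPluginState {
    CpuProfilingConfig current;

    void Update(CpuProfilingConfig next) {
        if (next.handler == nullptr) {
            // Preserve the handler installed at initialization time.
            next.handler = current.handler;
            next.ctx = current.ctx;
        }
        current = next;
    }
};

// Hypothetical handler registered at init.
static void InitHandler(uint32_t, void*) {}
```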
```cpp
        mCallback(std::move(result));
    }
    // ...
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
```
[nitpick] The fixed 100ms sleep interval for process discovery polling is aggressive and could cause unnecessary CPU usage. Consider making it configurable or using exponential backoff when no changes are detected.
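The suggested backoff could look like the sketch below, with hypothetical constants and helper name: double the polling interval while discovery finds no changes, cap it at a maximum, and snap back to the minimum as soon as the process set changes.

```cpp
#include <algorithm>
#include <cassert>

// Assumed tuning constants, not values from the PR.
constexpr int kMinPollIntervalMs = 100;
constexpr int kMaxPollIntervalMs = 5000;

// Returns the next sleep interval for the discovery loop.
int NextPollIntervalMs(int currentMs, bool processSetChanged) {
    if (processSetChanged) {
        return kMinPollIntervalMs;  // activity: return to fast polling
    }
    return std::min(currentMs * 2, kMaxPollIntervalMs);  // idle: back off
}
```

The discovery thread would then sleep for `NextPollIntervalMs(...)` milliseconds each iteration instead of a constant 100ms.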
```cpp
int maxRetry = 5;
for (int retry = 0; retry < maxRetry; ++retry) {
    if (QueueStatus::OK == ProcessQueueManager::GetInstance()->PushQueue(info.mQueueKey, std::move(item))) {
        break;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    if (retry == maxRetry - 1) {
```
[nitpick] Magic numbers 5 and 100 for retry attempts and sleep duration should be extracted as named constants or made configurable to improve maintainability and allow tuning.
```diff
-int maxRetry = 5;
-for (int retry = 0; retry < maxRetry; ++retry) {
-    if (QueueStatus::OK == ProcessQueueManager::GetInstance()->PushQueue(info.mQueueKey, std::move(item))) {
-        break;
-    }
-    std::this_thread::sleep_for(std::chrono::milliseconds(100));
-    if (retry == maxRetry - 1) {
+for (int retry = 0; retry < kMaxQueuePushRetry; ++retry) {
+    if (QueueStatus::OK == ProcessQueueManager::GetInstance()->PushQueue(info.mQueueKey, std::move(item))) {
+        break;
+    }
+    std::this_thread::sleep_for(std::chrono::milliseconds(kQueuePushRetrySleepMs));
+    if (retry == kMaxQueuePushRetry - 1) {
```
```cpp
std::lock_guard guard(mMutex);
mRouter.clear();
for (auto& [configKey, pids] : result) {
    for (auto& pid : pids) {
        totalPids.insert(pid);
        auto it = mRouter.emplace(pid, std::unordered_set<ConfigKey>{}).first;
        auto& configSet = it->second;
        configSet.insert(configKey);
    }
}
```
Clearing mRouter in HandleProcessDiscoveryEvent creates a race condition with HandleCpuProfilingEvent, which reads from mRouter. Events arriving between the clear and the rebuild could be lost or routed incorrectly.
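One way to shrink that window, sketched under assumed types (ConfigKey as a string-like key, an illustrative wrapper class): build the new routing table off to the side, then swap it in under the lock, so readers never observe the cleared-but-not-yet-rebuilt state.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>

using ConfigKey = std::string;  // assumption: string-like config key
using Router = std::unordered_map<uint32_t, std::unordered_set<ConfigKey>>;

class PidRouter {
public:
    // Rebuilds pid -> {config} routing from config -> {pids} discovery output.
    void Rebuild(const std::unordered_map<ConfigKey, std::unordered_set<uint32_t>>& result) {
        Router fresh;
        for (const auto& [configKey, pids] : result) {
            for (uint32_t pid : pids) {
                fresh[pid].insert(configKey);
            }
        }
        std::lock_guard<std::mutex> lock(mMutex);
        mRouter.swap(fresh);  // replace in one step under the lock, no empty window
    }

    // Lookup used by the profiling-event path.
    std::unordered_set<ConfigKey> Route(uint32_t pid) {
        std::lock_guard<std::mutex> lock(mMutex);
        auto it = mRouter.find(pid);
        return it == mRouter.end() ? std::unordered_set<ConfigKey>{} : it->second;
    }

private:
    std::mutex mMutex;
    Router mRouter;
};
```

If the event handler takes the same mutex for its reads, the swap makes the table transition atomic from the reader's point of view; the old table is also destroyed outside the lock, keeping the critical section short.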
```cpp
// TODO: make this non-static
inline static livetrace_profiler_read_cb_ctx_t mHandler = nullptr;
inline static void* mCtx = nullptr;
```
Static member variables for instance-specific handler and context violate encapsulation and prevent multiple CpuProfiler instances from working correctly. This TODO should be addressed before production use.
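The usual way to address this TODO is to keep a single static trampoline with the C-callback signature and smuggle the instance through the opaque ctx pointer, so per-instance state no longer needs static members. The class and callback shape below are illustrative, not coolbpf's actual API.

```cpp
#include <cassert>
#include <cstdint>

class CpuProfilerInstance {
public:
    // Signature a C library could register; the last parameter is opaque ctx,
    // which carries `this` back to the instance.
    static void Trampoline(uint32_t pid, const char* comm, const char* stack,
                           uint32_t cnt, void* ctx) {
        static_cast<CpuProfilerInstance*>(ctx)->OnSample(pid, comm, stack, cnt);
    }

    void OnSample(uint32_t pid, const char*, const char*, uint32_t cnt) {
        mSampleCount += cnt;
        mLastPid = pid;
    }

    uint64_t mSampleCount = 0;
    uint32_t mLastPid = 0;
};
```

Registration would pass `&instance` as the ctx argument alongside `Trampoline`, which is exactly what the existing `mCtx` parameter is for; with that in place, multiple CpuProfiler instances can coexist.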