[WIP] Fix threading performance issue in EManager #2
Merged
Agent-Logs-Url: https://github.com/famiji/maxbin2/sessions/af3c2213-d9a8-4ea9-8a91-9018784b20e8 Co-authored-by: famiji <63293851+famiji@users.noreply.github.com>
Copilot stopped work on behalf of famiji due to an error (May 1, 2026 02:56)
Copilot stopped work on behalf of famiji due to an error (May 1, 2026 03:05)
Copilot stopped work on behalf of famiji due to an error (May 1, 2026 03:14)
- src/main.cpp (lines 105-108)
- threadfunc_Seq_E and threadfunc_Seq_C signatures to src/EManager.h, add #include <atomic>
- threadfunc_Seq_E implementation in src/EManager.cpp:
  - dist_prob, dist_prob2, abund_prob arrays
  - EucDist and SpearmanDist per-thread
  - k < ab_num thread pool with inline computation
  - diff_count
- for (i < seqnum) E-step in run_EM with for (i) enqueue threadfunc_Seq_E
- threadfunc_Seq_C implementation and refactor classify similarly
- thread_mutex around get_prob_abund in threadfunc_E
- thread_mutex around seq->getSeqLenByNum(j) in threadfunc_M

Original prompt
The previous agent session failed and created an EMPTY pull request without actually modifying any files. The performance issue still exists.
You MUST ACTUALLY EDIT the following files:
src/main.cpp: Comment out the lines that limit threadnum to abund_count (around lines 105-108).
src/EManager.h: Add a new thread function signature: void threadfunc_Seq_E(int i, int* diff_count); or similar, and remove/modify what's needed.
src/EManager.cpp:
- i < seqnum loop in run_EM (Expectation step) and classify. Instead of enqueuing k < ab_num, enqueue the sequence index i.
- dist_prob, abund_prob, etc., must be made LOCAL to the sequence thread function (or allocated as a 2D array per sequence) to avoid race conditions.
- thread_mutex.lock() around get_prob_abund and seq->getSeqLenByNum(j) because they are read-only and cause severe contention.
Make sure you use the create_or_update_file tool to ACTUALLY SAVE your changes before completing the task. The PR must have a valid diff showing these optimizations.
The following is the prior conversation context from the user's chat exploration (may be truncated):
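User: Look at the code. Can you find out why this software cannot make full use of multiple threads?
Assistant: Looking at the code, the main reason MaxBin2 cannot make full use of multiple threads (CPU cores) is that its parallelization is based on the number of abundance files, not on the much larger sets of sequences or seeds.
This shows up in the code as follows:
In the argument-parsing and initial setup stage of src/main.cpp, there is the following hard-coded logic:
    if (threadnum > abund_count) { threadnum = abund_count; }
    em.setThreadNum(threadnum);
This means that if you supply only one abundance file (the single-sample case, which is very common), then even if you request 32 or 64 threads with the -thread parameter, the program forcibly overwrites threadnum with 1. A single-sample run therefore degrades to a purely single-threaded program.
In the core module src/EManager.cpp, inside the Expectation-Maximization (EM) loop (the task dispatch for threadfunc_E), tasks are likewise enqueued into the thread pool per ab_num (the number of abundance files). The parallel granularity here is only the iteration over abundance samples. Even with 4 abundance files and 4 threads, the program can use at most 4 CPU cores at once, and the remaining cores sit idle. On modern servers with a dozen, several dozen, or more than a hundred cores, the number of abundance files is usually small, so a large share of the compute resources is wasted and the program falls far short of making full use of multiple threads.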
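To make the effect of the clamp quoted above concrete, here is a minimal C++ sketch. The helper name effective_threads is hypothetical and does not exist in MaxBin2; it only reproduces the arithmetic of the quoted if statement, assuming both inputs are at least 1.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of MaxBin2): reproduces the clamp
//   if (threadnum > abund_count) { threadnum = abund_count; }
// so the effective thread count can be inspected in isolation.
int effective_threads(int requested, int abund_count) {
    // Assumption for this sketch: both values are positive.
    assert(requested >= 1 && abund_count >= 1);
    return std::min(requested, abund_count);
}
```

Under this sketch, effective_threads(32, 1) evaluates to 1, which matches the single-sample behaviour described above, and effective_threads(32, 4) evaluates to 4.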
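Summary: Because the program caps concurrency at the number of input abundance files (often just one, or a single-digit number), the software delivers only single-core or few-core performance in most common scenarios. To really benefit from multiple cores, the parallel logic should be changed to chunk the work at the level of the much larger seqnum (number of sequences) or seed_num (number of seeds).
User: So how should it be modified?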
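Assistant: To fix this performance bottleneck at its root and let the software use a multi-core CPU on large data sets, the code needs to be refactored. The main idea is to change the parallel granularity from abundance files (ab_num) to sequences (seqnum) or seeds (seed_num).
The concrete steps are:
1. Remove the hard thread-count limit in src/main.cpp
Find and delete (or comment out) the logic that caps the maximum thread count at the number of abundance files.
2. Refactor the parallel logic in src/EManager.cpp (Expectation stage)
In the original code, the outer loop is for (i = 0; i < seqnum; i++) (executed on a single thread), and the inner k < ab_num iterations are put into the thread pool. We want to invert this and submit each sequence's computation from the outer loop to the thread pool as an independent task.
Key changes:
- dist_prob, dist_prob2, and abund_prob are shared as class member variables. If different threads process different sequences i at the same time, these global buffers overwrite one another. They need to become local variables inside the thread function, or two-dimensional arrays indexed by sequence i.
- Extract the body of the large for (i = 0; i < seqnum; i++) loop in run_EM into a new member function (for example threadfunc_Seq(int i)).
- Change the outer loop so that it submits tasks to ThreadPool.
3. Remove unnecessary global locks
The original threadfunc_E contains a particularly poor piece of code. Reading seed_abundance does not change its value, so it is essentially a read-only operation. Once abund_prob has been made local, this mutex (thread_mutex) can be removed entirely. Removing this pointless lock brings a very large performance improvement.
Summary
You do not need to rewrite the tool completely. You only need to package the per-sequence E-step and M-step operations, hand for(i=0; i<seqnum; i++) to ThreadPool::enqueue for scheduling, and make sure each thread allocates its own local arrays for the intermediate probabilities. That way, however many sample abundance files you pass in, all CPU cores can be kept busy as long as there are enough contig sequences.
User: Can you make these changes and then compile?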
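Assistant: I have started the code modification task for you. It will create a Pull Request (PR) containing the multithreading refactor described above.
About compiling: as an AI assistant on GitHub, I do not have a command-line environment in which to run make and send you a binary executable. However, once the Pull Request has been created, running make directly in the src/ directory will build a new MaxBin executable that makes full use of multiple cores. Please wait for the tool above to finish and confirm the changes, or check the progress on the repository's Pull Requests page. If you need help compiling locally, just let me know.
User: This is someone else's repository. You should pull it into mine and then work on it.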
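Assistant: I just checked, and your account famiji does not yet have a maxbin2 repository.
As an AI, I do not have permission to perform the Fork operation on GitHub on your behalf, so please fork the repository manually first, so that it exists as famiji/maxbin2.
Once the fork is complete, please reply to me. I can then work directly on the famiji/maxbin2 repository: automatically modify the multithreading code in main.cpp and EManager.cpp, generate the correct logic, and commit it to your repository. After that, you only need to git pull your repository's code locally and run make to build the high-performance version.
User: I've forked it @famiji/maxbin2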
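Assistant: I have started the coding agent. It is now automatically refactoring the code in your repository famiji/maxbin2 and will create a Pull Request (PR). The agent will automatically do the following:
the logic in src/main.cpp that forces the thread count to equal the number of abundance files...
This pull request was created from Copilot chat.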
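The per-sequence parallelization discussed in the conversation can be sketched roughly as below. This is not the MaxBin2 code and not the contents of this PR: EStepSketch, process_sequence, and run are hypothetical names, the absolute-difference distance is a placeholder for the real probability model, and plain std::thread workers pulling indices from a shared atomic counter stand in for the project's ThreadPool, whose API is not shown in the conversation. The sketch only illustrates the three ideas named above: one task per sequence index, scratch buffers local to the task, and lock-free reads of data that is read-only during the E-step.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the E-step; none of these names are MaxBin2's.
struct EStepSketch {
    std::vector<double> seq_value;   // read-only during the E-step
    std::vector<double> seed_value;  // read-only during the E-step
    std::vector<int> assignment;     // element i is written only by task i

    // Work for one sequence. The scratch buffer is local, so concurrent
    // tasks cannot overwrite each other's intermediate values.
    void process_sequence(std::size_t i, std::atomic<int>& diff_count) {
        std::vector<double> dist(seed_value.size());
        for (std::size_t k = 0; k < seed_value.size(); ++k)
            dist[k] = std::fabs(seq_value[i] - seed_value[k]);  // placeholder metric
        std::size_t best = 0;
        for (std::size_t k = 1; k < dist.size(); ++k)
            if (dist[k] < dist[best]) best = k;
        if (assignment[i] != static_cast<int>(best)) {
            assignment[i] = static_cast<int>(best);
            diff_count.fetch_add(1, std::memory_order_relaxed);  // no mutex needed
        }
    }

    // Runs one pass over all sequences and returns how many assignments changed.
    int run(unsigned thread_num) {
        assert(thread_num >= 1 && !seed_value.empty());
        assert(assignment.size() == seq_value.size());
        std::atomic<int> diff_count{0};
        std::atomic<std::size_t> next{0};  // next sequence index to hand out
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < thread_num; ++t) {
            workers.emplace_back([this, &diff_count, &next] {
                for (std::size_t i = next.fetch_add(1); i < seq_value.size();
                     i = next.fetch_add(1))
                    process_sequence(i, diff_count);
            });
        }
        for (auto& w : workers) w.join();  // join makes all writes visible
        return diff_count.load();
    }
};
```

In this sketch the number of busy workers depends on the number of sequences rather than on the number of abundance files, which is the property the conversation is after. Whether the real dist_prob, dist_prob2, and abund_prob buffers can be made local this simply depends on code that is not shown here.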