Skip to content

[WIP] Refactor multithreading to parallelize across sequences#1

Merged
famiji merged 1 commit into
mainfrom
copilot/refactor-multithreading-implementation
May 1, 2026
Merged

[WIP] Refactor multithreading to parallelize across sequences#1
famiji merged 1 commit into
mainfrom
copilot/refactor-multithreading-implementation

Conversation

Copy link
Copy Markdown

Copilot AI commented May 1, 2026

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original prompt

The MaxBin2 software currently has a severe performance bottleneck when running with multi-threading. It parallelizes the Expectation (E) step based on the number of abundance files (ab_num), which is often just 1 or a very small number, leaving most CPU cores idle even when a large -thread value is provided.

Please refactor the multithreading implementation to parallelize across sequences (seqnum) rather than abundance files.

Specific requirements:

  1. Remove thread limits: In src/main.cpp, around line 112, remove or comment out the logic that forces the thread count to not exceed abund_count:
    if (threadnum > abund_count)
    {
        threadnum = abund_count;
    }
  2. Refactor E-step parallelization: In src/EManager.cpp, inside the Expectation step (where threadfunc_E is called within the i < seqnum loop), change the logic so that the ThreadPool enqueues tasks per sequence (i) rather than per abundance file. You might need to create a new thread function (e.g., threadfunc_Seq(int i)) that handles the loop body for a single sequence.
  3. Ensure Thread Safety:
    • The original code shares arrays like dist_prob, dist_prob2, and abund_prob as class members. If multiple threads process different sequences simultaneously, these arrays will suffer from race conditions. Make these arrays local to the thread function, or allocate them per-thread/per-sequence.
    • Remove unnecessary mutex locks. For instance, in threadfunc_E or the new sequence function, reading from seed_abundance is a read-only operation and does not need to be wrapped in thread_mutex.lock(). Remove these unnecessary locks to improve performance.
  4. Update src/EManager.h to reflect any new or modified function signatures or removed class variables.

Make sure the code compiles cleanly (make in src/) after the modifications.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: 查看代码 你能找到这个软件不能充分利用多线程的原因吗

Assistant: 通过查看代码可以发现,MaxBin2 不能充分利用多线程(CPU 核心)的主要原因在于:它的多线程并行化是基于“丰度文件(Abundance Files)的数量”来进行的,而不是基于数据量更大的序列(Sequences)或种子(Seeds)进行切分。

以下是代码中的具体体现:

  1. 强行限制最大线程数等于丰度文件数量
    src/main.cpp 的参数解析和初始设置阶段,有如下硬编码的逻辑:
	if (threadnum > abund_count)
	{
		threadnum = abund_count;
	}
	em.setThreadNum(threadnum);

这意味着:如果你只输入了 1 个丰度文件(单样本的情况,这非常常见),即便你通过 -thread 参数设置了 32 或 64 个线程,程序也会强制将线程数 threadnum 重新覆盖为 1。因此,单样本运行时它退化成了纯单线程程序。

  1. 线程池的任务分配也受限于丰度文件数
    在核心模块 src/EManager.cpp 中的 Expectation-Maximization (EM) 算法循环中(即 threadfunc_E 的任务分发),任务也是按照 ab_num(丰度文件数量)进入线程池队列的:
				for (k = 0; k < ab_num; k++)
				{
					thread_pool->enqueue(std::bind(&EManager::threadfunc_E, this, seq_abundance[k][i], k));
				}
				thread_pool->waitFinished();

这里的并行粒度仅仅是遍历不同的丰度样本。即使你有 4 个丰度文件并使用了 4 个线程,程序也最多只能利用 4 个 CPU 核心同时计算,剩余的核心将全部处于空闲状态。对于现代动辄拥有十几、几十甚至上百个核心的服务器来说,由于丰度文件数量通常很少,导致大量的计算资源被浪费,远远达不到“充分利用多线程”的水平。

总结:由于程序强行规定并发的最高上限不能超过输入丰度文件的数量(往往只有1个或个位数),导致该软件在大部分常见场景下只能发挥单核或少数几核的性能。如果想要真正发挥多核优势,应该将并行逻辑修改为针对巨大的 seqnum (序列数量) 或 seed_num (种子数量) 层面去进行分块化(Chunking)并发。

User: 那应该如何修改呢

Assistant: 要从根本上修复这个性能瓶颈,让软件能够真正充分利用多核 CPU 处理大量数据,需要对代码进行重构。主要思路是:将并行的粒度从“丰度文件(ab_num)”改为“序列(seqnum)”或“种子(seed_num)”级别

具体修改步骤如下:

1. 移除 src/main.cpp 中的线程数硬性限制

找到并删除(或注释掉)将最大线程数限制为丰度文件数的逻辑。

// 在 src/main.cpp 约第 112 行附近
// 删掉或者注释以下代码:
/*
if (threadnum > abund_count)
{
	threadnum = abund_count;
}
*/

2. 重构 src/EManager.cpp 中的并行逻辑 (Expectation 阶段)

在原代码中,外层是 for (i = 0; i < seqnum; i++) 循环(单线程执行),内部把 k < ab_num 放进线程池。我们要反转这个逻辑,把外层的每条序列计算作为独立任务扔进线程池。

核心改动点:

  • 私有化共享变量以消除数据竞争 (Race Condition):程序原有的 dist_probdist_prob2abund_prob 是作为类成员变量共享的,如果在不同线程中同时处理不同的序列 i,这些全局缓存会被互相覆盖。需要将它们改为线程函数内的局部变量(Local variables)或者改成受序列 i 索引的二维数组。
  • 将序列计算变为多线程函数
    将原本在 run_EM 函数内的大循环 for (i = 0; i < seqnum; i++) 的代码体抽取成一个新的成员函数(例如 threadfunc_Seq(int i))。
  • 分配任务池
    将外层循环改为向 ThreadPool 下发任务:
    for (int i = 0; i < seqnum; i++) {
        if (seq->getSeqLenByNum(i) >= min_seq_length && is_profile_N[i] == false) {
            thread_pool->enqueue(std::bind(&EManager::threadfunc_Seq, this, i));
        }
    }
    thread_pool->waitFinished();

3. 解除不必要的全局锁

在原来的 threadfunc_E 中,有一段很糟糕的代码:

thread_mutex.lock();
abund_prob[j][k] = get_prob_abund(abund, seed_abundance[j][k]);
thread_mutex.unlock();

读取 seed_abundance 并不改变其数值,本质上是一个只读操作,如果前面把 abund_prob 局部化了,这个互斥锁 (thread_mutex) 就可以完全去掉。去除这把无意义的锁会带来极大的性能提升。

总结

你不需要完全重写这个工具。只需要:把作用于每个序列(sequence)的 E 步和 M 步操作打包封装,将 for(i=0; i<seqnum; i++) 交给 ThreadPool::enqueue 来调度,同时确保线程内申请自己的局部数组来存放中间概率即可。 这样,不管你传入多少个样本丰度文件,只要 contig 序列足够多,就可以跑满所有 CPU 核心。

User: 那你能完成这些修改然后编译吗

Assistant: 我已经为您启动了代码修改任务,将为您创建一个包含上述多线程重构的 Pull Request (PR)。

关于编译:作为一个在 GitHub 上的 AI 助手,我没有直接的命令行环境来为您运行 make 并直接生成二进制可执行文件发送给您。但是,一旦 Pull Request 创建完成:

  1. 如果该仓库配置了 GitHub Actions CI,代码会在 PR 中自动触发云端编译和测试。
  2. 您可以将这个 P...

This pull request was created from Copilot chat.

@famiji famiji marked this pull request as ready for review May 1, 2026 02:23
@famiji famiji merged commit 620b9b4 into main May 1, 2026
1 check failed
Copilot AI requested a review from famiji May 1, 2026 02:23
Copilot stopped work on behalf of famiji due to an error May 1, 2026 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants