Description
Hi,
I am using your tool for somatic SNV calling and have encountered a fatal error during the cellScan step. The issue appears to be a scalability problem when processing chromosomes that are rich in both cell count and candidate SNVs.
When running the cellScan module, the program crashes with a ValueError: Unstacked DataFrame is too big, causing int32 overflow. This happens after the log prints "Collect single cell level information from sequencing data...".
Could you please advise on a potential workaround, or let me know whether a fix for this scalability issue is planned? Thank you for your work on this great tool.
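For context, my understanding is that pandas raises this error when the number of rows times the number of columns in the unstacked table no longer fits in a signed 32-bit integer. A minimal sketch of that check follows; the helper name `exceeds_int32` and the counts in the usage line are hypothetical and are not taken from my data:

```python
import numpy as np

# Largest value a signed 32-bit integer can hold (2147483647).
INT32_MAX = np.iinfo(np.int32).max


def exceeds_int32(n_snvs, n_cells):
    """Return True if an n_snvs x n_cells table has more entries than int32 can count.

    Python ints do not overflow, so the product is computed exactly here.
    """
    return int(n_snvs) * int(n_cells) > INT32_MAX


# Hypothetical counts, for illustration only.
print(exceeds_int32(300_000, 10_000))
```

If that reading is right, any chromosome where (candidate SNVs) x (cells) passes roughly 2.1 billion would trigger the crash regardless of available memory.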
Here is the relevant portion of my log file, including the full traceback:
[2025-07-07 11:54:28,864] INFO Monopogen.py Collect single cell level information from sequencing data...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/kng01/long/monopogen/src/somatic.py", line 337, in bam2mat
mat_merge =mat_merge.pivot_table(index='snvIndex', columns='cellIndex', values='flag', aggfunc='first', fill_value='')
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/frame.py", line 7031, in pivot_table
return pivot_table(
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/pivot.py", line 146, in pivot_table
table = agged.unstack(to_unstack)
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/frame.py", line 7352, in unstack
result = unstack(self, level, fill_value)
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 417, in unstack
return _unstack_frame(obj, level, fill_value=fill_value)
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 444, in _unstack_frame
return _Unstacker(
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
raise ValueError("Unstacked DataFrame is too big, causing int32 overflow")
ValueError: Unstacked DataFrame is too big, causing int32 overflow
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/kng01/long/monopogen/src/Monopogen.py", line 340, in <module>
main()
File "/kng01/long/monopogen/src/Monopogen.py", line 333, in main
args.func(args)
File "/kng01/long/monopogen/src/Monopogen.py", line 172, in somatic
result = pool.map(bam2mat, joblst)
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
ValueError: Unstacked DataFrame is too big, causing int32 overflow
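In case it is useful, here is a rough sketch of a workaround I have been considering: pivoting the SNVs in chunks so that each individual `pivot_table` call stays under the int32 limit, then concatenating the pieces. The column names (`snvIndex`, `cellIndex`, `flag`) and the `pivot_table` arguments come from the traceback above; the function `pivot_in_chunks` and its `max_cells` parameter are my own assumptions, and I have not tested this inside `bam2mat` or on the full data. The final table is still dense, so memory may become the next bottleneck.

```python
import numpy as np
import pandas as pd

INT32_MAX = np.iinfo(np.int32).max


def pivot_in_chunks(mat_merge, max_cells=INT32_MAX):
    """Pivot snvIndex x cellIndex in row chunks (hypothetical helper).

    Each chunk holds at most max_cells // n_cells SNVs, so no single
    pivot_table call produces more than max_cells entries.
    """
    cells = np.sort(mat_merge["cellIndex"].unique())
    snvs = np.sort(mat_merge["snvIndex"].unique())
    rows_per_chunk = max(1, max_cells // max(1, len(cells)))

    pieces = []
    for start in range(0, len(snvs), rows_per_chunk):
        chunk_snvs = snvs[start:start + rows_per_chunk]
        sub = mat_merge[mat_merge["snvIndex"].isin(chunk_snvs)]
        piece = sub.pivot_table(index="snvIndex", columns="cellIndex",
                                values="flag", aggfunc="first", fill_value="")
        # A chunk may not contain every cell; align columns across chunks.
        pieces.append(piece.reindex(columns=cells, fill_value=""))
    return pd.concat(pieces)
```

I am not sure whether the downstream code in `somatic.py` relies on the dense layout, so a sparse representation might be a better long-term fix.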