
ValueError in cellScan: "Unstacked DataFrame is too big, causing int32 overflow" #118

@lhy329


Hi,

I am using your tool for somatic SNV calling and have hit a fatal error in the cellScan step. It looks like a scalability problem that shows up on chromosomes with many candidate SNVs across many cells.

When running the cellScan module, the program crashes with a ValueError: Unstacked DataFrame is too big, causing int32 overflow. This happens after the log prints "Collect single cell level information from sequencing data...".
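For reference, a minimal sketch of the size check involved, assuming the installed pandas compares the number of cells in the unstacked result (rows × columns) against the int32 maximum before building it. `pivot_would_overflow` is a hypothetical helper, and the sizes in the example are made up, not taken from the log below.

```python
# Largest value an int32 can hold; in pandas versions that raise this error,
# the unstack size check compares the pivoted cell count against it.
INT32_MAX = 2**31 - 1  # 2_147_483_647


def pivot_would_overflow(n_snvs: int, n_cells: int) -> bool:
    """Return True if a dense snvIndex x cellIndex pivot exceeds the int32 limit.

    Hypothetical helper: n_snvs and n_cells would correspond to
    mat_merge["snvIndex"].nunique() and mat_merge["cellIndex"].nunique()
    for the pivot_table call shown in the traceback.
    """
    # Python ints do not overflow, so this product is exact.
    return n_snvs * n_cells > INT32_MAX


# Hypothetical sizes, not taken from the log:
print(pivot_would_overflow(100_000, 20_000))  # 2.0e9 cells
print(pivot_would_overflow(150_000, 20_000))  # 3.0e9 cells
```

Because the limit applies to the product, a chromosome with a moderate number of candidate SNVs can still trip it once the cell count is large.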

Could you please advise on a workaround, or let me know whether a fix for this scalability issue is planned? Thank you for your work on this great tool.

Here is the relevant portion of my log file, including the full traceback:

[2025-07-07 11:54:28,864] INFO Monopogen.py Collect single cell level information from sequencing data...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/kng01/long/monopogen/src/somatic.py", line 337, in bam2mat
    mat_merge =mat_merge.pivot_table(index='snvIndex', columns='cellIndex', values='flag', aggfunc='first', fill_value='')
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/frame.py", line 7031, in pivot_table
    return pivot_table(
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/pivot.py", line 146, in pivot_table
    table = agged.unstack(to_unstack)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/frame.py", line 7352, in unstack
    result = unstack(self, level, fill_value)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 417, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 444, in _unstack_frame
    return _Unstacker(
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
    raise ValueError("Unstacked DataFrame is too big, causing int32 overflow")
ValueError: Unstacked DataFrame is too big, causing int32 overflow
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/kng01/long/monopogen/src/Monopogen.py", line 340, in <module>
    main()
  File "/kng01/long/monopogen/src/Monopogen.py", line 333, in main
    args.func(args)
  File "/kng01/long/monopogen/src/Monopogen.py", line 172, in somatic
    result = pool.map(bam2mat, joblst)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: Unstacked DataFrame is too big, causing int32 overflow
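One possible workaround, sketched here under assumptions: pivot the long table in blocks of SNVs so that each block stays well below the limit. `pivot_in_chunks` and its `max_cells` parameter are hypothetical and not part of Monopogen; only the column names and `pivot_table` arguments come from the `somatic.py` line 337 call in the traceback. How `bam2mat` uses `mat_merge` afterwards is not shown in this issue, so downstream code would need to consume the blocks one at a time (or store them sparsely, e.g. with `scipy.sparse`), because concatenating them would rebuild the full dense matrix in memory.

```python
import pandas as pd


def pivot_in_chunks(mat_merge: pd.DataFrame, max_cells: int = 50_000_000):
    """Yield snvIndex x cellIndex pivots, one block of SNVs at a time.

    Hypothetical helper: column names follow the pivot_table call in
    somatic.py, but max_cells and the chunking itself are assumptions.
    """
    # Full, sorted set of cells, so every block ends up with identical columns.
    all_cells = pd.Index(mat_merge["cellIndex"].unique(), name="cellIndex").sort_values()
    snvs = pd.Index(mat_merge["snvIndex"].unique(), name="snvIndex").sort_values()

    # SNV rows per block so rows x cells stays <= max_cells (at least one row).
    rows_per_block = max(1, max_cells // max(len(all_cells), 1))

    for start in range(0, len(snvs), rows_per_block):
        block_snvs = snvs[start:start + rows_per_block]
        block = mat_merge[mat_merge["snvIndex"].isin(block_snvs)]
        # Same arguments as the original call; every (snv, cell) pair lives in
        # exactly one block, so aggfunc="first" keeps its meaning.
        piv = block.pivot_table(index="snvIndex", columns="cellIndex",
                                values="flag", aggfunc="first", fill_value="")
        # Cells absent from this block get the same '' fill value.
        yield piv.reindex(columns=all_cells, fill_value="")


# Toy example with made-up data: 3 SNVs x 2 cells, at most 2 cells per block.
toy = pd.DataFrame({
    "snvIndex": [0, 0, 1, 2, 2],
    "cellIndex": [10, 11, 10, 11, 11],
    "flag": ["A", "B", "C", "D", "E"],
})
for piv in pivot_in_chunks(toy, max_cells=2):
    print(piv.values.tolist())
```

Chunking keeps the original `pivot_table` semantics per block; a sparse matrix would save more memory but would require changing how the flags are encoded downstream.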
