ValueError in cellScan: "Unstacked DataFrame is too big, causing int32 overflow"

Hi,

I am using your tool for somatic SNV calling and have hit a fatal error during the cellScan step. It looks like a scalability problem that appears on chromosomes with both a large number of cells and a large number of candidate SNVs.

When I run the cellScan module, the program crashes with ValueError: "Unstacked DataFrame is too big, causing int32 overflow". The crash occurs right after the log prints "Collect single cell level information from sequencing data...", inside the pivot_table call in bam2mat (src/somatic.py, line 337).
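
For context, here is a rough way to estimate whether a chromosome is at risk. This is only a sketch under assumptions: I assume the pivot in bam2mat produces one row per unique snvIndex and one column per unique cellIndex, and that pandas rejects the result roughly when rows times columns no longer fits in a signed 32-bit integer. The function names and the example counts are hypothetical; they are not part of Monopogen and are not my real data.

```python
# Largest value a signed 32-bit integer can hold (2,147,483,647).
INT32_MAX = 2**31 - 1


def pivot_would_overflow(n_snvs, n_cells):
    """Return True if an n_snvs x n_cells dense table exceeds the int32 range."""
    return n_snvs * n_cells > INT32_MAX


def max_snvs_per_chunk(n_cells):
    """Largest number of SNV rows that keeps rows * n_cells within int32."""
    return max(1, INT32_MAX // n_cells)


# Hypothetical counts, for illustration only.
print(pivot_would_overflow(300_000, 10_000))  # 3.0e9 entries
print(pivot_would_overflow(100_000, 10_000))  # 1.0e9 entries
print(max_snvs_per_chunk(10_000))
```

If this reasoning is right, the limit depends on the product of the two counts, so a chromosome can fail even when neither the cell count nor the SNV count looks extreme on its own.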

Could you please advise on a potential workaround, or let me know whether a fix for this scalability issue is planned? Thank you for your work on this great tool.

Here is the relevant portion of my log file, including the full traceback:

[2025-07-07 11:54:28,864] INFO       Monopogen.py Collect single cell level information from sequencing data...
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/kng01/long/monopogen/src/somatic.py", line 337, in bam2mat
    mat_merge =mat_merge.pivot_table(index='snvIndex', columns='cellIndex', values='flag', aggfunc='first', fill_value='')
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/frame.py", line 7031, in pivot_table
    return pivot_table(
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/pivot.py", line 146, in pivot_table
    table = agged.unstack(to_unstack)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/frame.py", line 7352, in unstack
    result = unstack(self, level, fill_value)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 417, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 444, in _unstack_frame
    return _Unstacker(
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
    raise ValueError("Unstacked DataFrame is too big, causing int32 overflow")
ValueError: Unstacked DataFrame is too big, causing int32 overflow
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/kng01/long/monopogen/src/Monopogen.py", line 340, in <module>
    main()
  File "/kng01/long/monopogen/src/Monopogen.py", line 333, in main
    args.func(args)
  File "/kng01/long/monopogen/src/Monopogen.py", line 172, in somatic
    result = pool.map(bam2mat, joblst)
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/long/miniconda3/envs/monopogen_env/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: Unstacked DataFrame is too big, causing int32 overflow
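
One workaround I am considering is to pivot the SNVs in chunks so that each sub-table stays below the int32 limit. The sketch below is untested against Monopogen itself. It assumes that mat_merge is a long-format DataFrame with the snvIndex, cellIndex and flag columns seen in the traceback; pivot_in_chunks and max_entries are hypothetical names that I made up for illustration.

```python
import pandas as pd

INT32_MAX = 2**31 - 1


def pivot_in_chunks(mat_merge, max_entries=INT32_MAX):
    """Yield SNV-by-cell pivot tables, one per block of SNVs.

    Each yielded table has at most max_entries entries, and all tables
    share the same cell columns so they can be processed uniformly.
    """
    snv_ids = mat_merge["snvIndex"].unique()
    all_cells = sorted(mat_merge["cellIndex"].unique())
    # Number of SNV rows that fit in one chunk for this many cells.
    chunk_size = max(1, max_entries // max(1, len(all_cells)))

    for start in range(0, len(snv_ids), chunk_size):
        chunk_ids = snv_ids[start:start + chunk_size]
        sub = mat_merge[mat_merge["snvIndex"].isin(chunk_ids)]
        table = sub.pivot_table(
            index="snvIndex",
            columns="cellIndex",
            values="flag",
            aggfunc="first",
            fill_value="",
        )
        # Cells absent from this chunk still get an (empty) column.
        yield table.reindex(columns=all_cells, fill_value="")


# Tiny made-up example: 3 SNVs, 3 cells, at most 6 entries per chunk.
demo = pd.DataFrame({
    "snvIndex": [0, 0, 1, 2],
    "cellIndex": [0, 1, 1, 2],
    "flag": ["a", "b", "c", "d"],
})
for part in pivot_in_chunks(demo, max_entries=6):
    print(part.shape)
```

This only avoids the overflow check. The chunks together are still as large as the full dense matrix, so each one would need to be written out or processed before the next is built, rather than concatenated back in memory.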





ValueError in cellScan: "Unstacked DataFrame is too big, causing int32 overflow" #118
