Fix pandas 2.x ValueError in monomer SDF loading #20

Open
anagnorisis2peripeteia wants to merge 1 commit into Boehringer-Ingelheim:master from anagnorisis2peripeteia:fix/pandas2-monomer-loading
@anagnorisis2peripeteia
Summary

Fixes #18

Pandas 2.x introduced Arrow-backed storage for string columns. When
get_monomer_info() and _load_monomer_sdf() attempt to write a parsed
Python list into these columns via df.loc[idx, col] = [...], pandas 2.x
raises:

ValueError: Must have equal len keys and value when setting with an iterable

because it interprets the list as multiple row values rather than a single
list object for one cell.
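
The row-wise reading of a list can be seen in a case where it is the intended behaviour. The snippet below is a minimal sketch on a toy DataFrame (the values are made up; only the column name comes from this PR). It does not reproduce the ValueError itself, which depends on the pandas version and column dtype.

```python
import pandas as pd

# Toy frame; the cell values are placeholders, not real monomer data.
df = pd.DataFrame({"m_Rgroups": ["a", "b", "c"]})

# With two target rows, .loc spreads the list over the rows:
# one list element per row, not one list per cell.
df.loc[[0, 1], "m_Rgroups"] = ["x", "y"]

rows_after_loc = df["m_Rgroups"].tolist()
```

Given that convention, a list assigned to a single row is treated as a sequence of per-row values whose length does not match, rather than as one object to store.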

Fix

Two changes, applied in both sequence.py and monomerlib.py:

  1. Cast the three list-valued columns (m_Rgroups, m_RgroupIdx,
    m_attachmentPointIdx) to object dtype before writing into them.
    Only these three columns are cast — the rest of the DataFrame retains
    its pandas 2.x type optimisations.

  2. Use df.at[idx, col] (single-cell assignment) instead of
    df.loc[idx, col] (which pandas 2.x misinterprets as a broadcast
    when given an iterable).
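
A minimal sketch of the two steps on a toy DataFrame, assuming a default integer index. The `m_name` column and all cell values are hypothetical and only serve to show that the other columns are left alone.

```python
import pandas as pd

list_cols = ["m_Rgroups", "m_RgroupIdx", "m_attachmentPointIdx"]

# Toy stand-in for the loaded monomer table (hypothetical values).
df = pd.DataFrame({
    "m_name": ["Ala", "Gly"],
    "m_Rgroups": ["", ""],
    "m_RgroupIdx": ["", ""],
    "m_attachmentPointIdx": ["", ""],
})

# Step 1: cast only the list-valued columns to object dtype.
for col in list_cols:
    df[col] = df[col].astype(object)

# Step 2: write each list into exactly one cell with .at.
df.at[0, "m_Rgroups"] = ["H", "OH"]
df.at[0, "m_RgroupIdx"] = [1, 2]

stored = df.at[0, "m_Rgroups"]
```

`.at` addresses a single row/column pair, so the list is stored as one object instead of being matched against rows.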

Note on duplication

get_monomer_info() in sequence.py and _load_monomer_sdf() in
monomerlib.py implement the same SDF loading and list-parsing logic
independently. This fix is applied to both. A follow-up refactor could
consolidate them into a single shared loader to avoid this kind of
divergence in future — happy to open a separate PR for that if useful.
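
One possible shape for such a shared helper is sketched below. The function name `parse_list_columns` is hypothetical, and the sketch assumes that the raw cells hold Python list literals as strings (for example `"['H', 'OH']"`) and that the index is unique. Neither assumption is confirmed by the existing code.

```python
import ast

import pandas as pd

LIST_COLUMNS = ("m_Rgroups", "m_RgroupIdx", "m_attachmentPointIdx")


def parse_list_columns(df, columns=LIST_COLUMNS):
    """Return a copy of df whose list-valued columns hold Python lists.

    Hypothetical helper: assumes string cells contain list literals.
    """
    out = df.copy()
    for col in columns:
        # Object dtype lets a cell hold an arbitrary Python object.
        out[col] = out[col].astype(object)
        for idx in out.index:
            raw = out.at[idx, col]
            # Skip empty strings and cells that are not strings.
            if isinstance(raw, str) and raw.strip():
                out.at[idx, col] = ast.literal_eval(raw)
    return out


raw_df = pd.DataFrame({
    "m_Rgroups": ["['H', 'OH']"],
    "m_RgroupIdx": ["[1, 2]"],
    "m_attachmentPointIdx": ["[0, 3]"],
})
parsed = parse_list_columns(raw_df)
```

Both get_monomer_info() and _load_monomer_sdf() could then call the same function after loading the SDF, so a future pandas change would only need to be handled in one place.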

Pandas 2.x introduced Arrow-backed string columns that reject list
assignment via loc/at, raising:
  ValueError: Must have equal len keys and value when setting with an iterable

Fix: cast the three list-valued columns (m_Rgroups, m_RgroupIdx,
m_attachmentPointIdx) to object dtype before writing parsed list values
into them, and use df.at[] (single-cell scalar assignment) instead of
df.loc[] (which pandas 2.x interprets as a multi-row broadcast when
given an iterable).

Applied in both get_monomer_info() (sequence.py) and
_load_monomer_sdf() (monomerlib.py), which duplicate the same loading
pattern.

Closes Boehringer-Ingelheim#18
anagnorisis2peripeteia deleted the fix/pandas2-monomer-loading branch May 1, 2026 13:53
anagnorisis2peripeteia restored the fix/pandas2-monomer-loading branch May 1, 2026 21:23
Development

Successfully merging this pull request may close these issues.

new pandas breaks function "get_monomer_info"; fix provided

1 participant