Update for v3 data #142
Pull Request Overview
This PR updates the codebase to use the new v3 data column naming conventions (e.g. "parent_heavy"/"parent_light" instead of "parent_h"/"parent_l") throughout the tests and core modules. Key changes include renaming columns in test files, updating dataset preparation functions, and aligning the neutral model outputs with the new naming conventions.
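For readers migrating their own data, the rename can be sketched as a simple key mapping. This is a minimal illustration and not code from this PR: the `LEGACY_TO_V3` table and the `rename_legacy_keys` helper are hypothetical names, and the `child_l` to `child_light` entry is assumed by analogy with the renames named in the summary.

```python
import csv
import io

# Hypothetical mapping from pre-v3 column names to v3 names.
# "child_l" -> "child_light" is an assumption made by analogy.
LEGACY_TO_V3 = {
    "parent_h": "parent_heavy",
    "parent_l": "parent_light",
    "child_h": "child_heavy",
    "child_l": "child_light",
}


def rename_legacy_keys(row):
    """Return a copy of `row` with legacy column names replaced by v3 names.

    Keys that are not in the mapping are passed through unchanged.
    """
    return {LEGACY_TO_V3.get(key, key): value for key, value in row.items()}


# A tiny in-memory CSV standing in for a legacy PCP file.
legacy_csv = "parent_h,child_h,parent_l,child_l\nACGT,ACGA,,\n"
rows = [rename_legacy_keys(row) for row in csv.DictReader(io.StringIO(legacy_csv))]
```

Passing unknown keys through unchanged means the same helper can be applied to files that already use the v3 names.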
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_tokens.py | Updated tuple assignment to use "parent_heavy" and "parent_light" |
| tests/test_simulation.py | Renamed column accesses from "parent_h"/"child_h" to "parent_heavy"/"child_heavy" |
| tests/test_netam.py | Adapted DataFrame column keys to new naming convention |
| tests/test_multihit.py | Renamed DataFrame columns and tensor keys to reflect heavy chain naming |
| tests/test_molevol.py | Updated tensor column access for neutral model outputs |
| netam/sequences.py | Renamed parameter usage in sequence preparation |
| netam/multihit.py | Updated references to DataFrame column keys and neutral model outputs |
| netam/framework.py | Adjusted CSV loading and heavy/light chain column processing |
| netam/dxsm.py, dnsm.py, ddsm.py | Updated error messages and log probability calculations using new column keys |
| netam/data_format.py | Added new column definitions for heavy/light data |
| netam/dasm.py | Updated log probability stacking to use heavy/light nomenclature |
ecea364 to 853cb6f
Addresses https://github.com/matsengrp/dnsm-experiments-1/issues/139
Also changes all anarci and plotter dict keys from "h" and "l" to "heavy" and "light", and introduces `netam.data_format.py`, which specifies pcp_df column types. The most notable change is that `family` is now always a string; previously it was sometimes an int.
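The effect of a fixed column-type specification can be sketched as follows. The actual contents of `netam.data_format` are not shown in this thread, so `PCP_COLUMN_TYPES` and `coerce_row` below are hypothetical names used only to illustrate why pinning `family` to a string avoids mixed int/str values.

```python
# Hypothetical column-type table; the real definitions live in
# netam/data_format.py and may differ.
PCP_COLUMN_TYPES = {
    "family": str,
    "parent_heavy": str,
    "parent_light": str,
    "child_heavy": str,
    "child_light": str,
}


def coerce_row(row):
    """Cast each known column to its declared type; leave other columns as-is."""
    return {
        key: PCP_COLUMN_TYPES[key](value) if key in PCP_COLUMN_TYPES else value
        for key, value in row.items()
    }


# One row has an int family, the other a string family, as could happen before.
mixed_rows = [
    {"family": 12, "parent_heavy": "ACG", "child_heavy": "ACT"},
    {"family": "clone_7", "parent_heavy": "GGA", "child_heavy": "GGC"},
]
coerced = [coerce_row(row) for row in mixed_rows]
```

When loading with pandas, the same idea is usually expressed by passing a dtype mapping to `pd.read_csv`, for example `dtype={"family": str}`.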