Add model metadata #135

willdumm · 2025-05-02T18:55:19Z

This PR adds the following values to metadata of saved models:

multihit_model_name: expected to be a key in netam.pretrained.PRETRAINED_MULTIHIT_MODELS. Defaults to netam.models.DEFAULT_MULTIHIT_MODEL. For crepes saved without this data, defaults to None.
neutral_model_name: expected to be a named pretrained neutral model. Defaults to netam.models.DEFAULT_NEUTRAL_MODEL. For crepes saved without this data, defaults to ThriftyHumV0.2-59.
train_timestamp: a UTC timestamp (e.g. 2025-05-01T22:05), taken at model initialization unless provided explicitly. For crepes saved without this data, defaults to old.
model_type: one of dnsm, dasm, or ddsm; must be provided at model instantiation. For crepes saved without this data, defaults to unknown and triggers warnings.
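
The legacy defaults listed above can be summarized in a small sketch. This is not netam's loading code: `LEGACY_METADATA_DEFAULTS`, `fill_legacy_metadata`, and `utc_timestamp` are hypothetical names, and treating metadata as a plain dict is an assumption made for illustration. Only the default values and the timestamp format come from the description.

```python
import warnings
from datetime import datetime, timezone

# Fallback values for crepes saved before these keys existed,
# taken from the PR description (hypothetical container name).
LEGACY_METADATA_DEFAULTS = {
    "multihit_model_name": None,
    "neutral_model_name": "ThriftyHumV0.2-59",
    "train_timestamp": "old",
    "model_type": "unknown",
}


def utc_timestamp(now=None):
    """Format a UTC time like the example in the description (2025-05-01T22:05)."""
    now = now if now is not None else datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")


def fill_legacy_metadata(metadata):
    """Return a copy of `metadata` with missing keys set to the legacy defaults."""
    filled = dict(LEGACY_METADATA_DEFAULTS)
    filled.update(metadata)
    if filled["model_type"] == "unknown":
        # The description says an unknown model_type produces warnings, not errors.
        warnings.warn("model_type is unknown; this crepe predates model metadata")
    return filled
```

A newly trained model would instead record `utc_timestamp()` and an explicit `model_type`, so none of the fallbacks apply.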

As hinted at above, I added a dictionary containing pretrained multihit models to netam.pretrained. These models can be accessed by name using netam.pretrained.load_multihit.
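
A name-keyed registry of this kind might look like the following sketch. The names `MULTIHIT_REGISTRY`, `load_multihit_sketch`, and the placeholder entry are hypothetical; the real `netam.pretrained.PRETRAINED_MULTIHIT_MODELS` and `load_multihit` may differ in signature, in how models are stored, and in how a None name is handled.

```python
# Hypothetical registry: maps a model name to a zero-argument loader.
# In a real package the loaders would read saved weights from disk.
MULTIHIT_REGISTRY = {
    "example-multihit-v1": lambda: {"log_hit_class_factors": [0.1, -0.2, -0.5]},
}


def load_multihit_sketch(name):
    """Look up a multihit model by name; None is assumed to mean 'no multihit model'."""
    if name is None:
        return None
    try:
        loader = MULTIHIT_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MULTIHIT_REGISTRY))
        raise ValueError(
            f"Unknown multihit model {name!r}; known names: {known}"
        ) from None
    return loader()
```

Storing only the name in model metadata keeps saved crepes small, at the cost of requiring the named model to remain available in the registry.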

Requires companion PR https://github.com/matsengrp/dnsm-experiments-1/pull/132

Copilot

Pull Request Overview

This PR extends model metadata to include multihit and neutral model settings and integrates these changes across tests and core model functions.

Updates tests to load and use multihit models via load_multihit.
Extends AbstractBinarySelectionModel and SingleValueBinarySelectionModel with new metadata (including model_type, train_timestamp, neutral_model_name, and multihit_model_name) and adjusts hyperparameter defaults.
Enhances framework functions (including add_shm_model_outputs_to_pcp_df and DXSMBurrito initialization) to verify model metadata consistency.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/test_simulation.py | Uses load_multihit to retrieve multihit model and adds tolerance in allclose check; reassigns train_dataset to val_dataset. |
| tests/test_multihit.py | Updates model instantiation to pass model_type and generate multihit_model_name from model weights. |
| tests/test_dnsm.py, test_ddsm.py, test_dasm.py, test_ambiguous.py | Integrates new parameter model_type and multihit_model into model/dataset creation. |
| netam/pretrained.py | Introduces load_multihit and name_and_multihit_model_match for multihit model handling. |
| netam/models.py | Extends metadata in model constructors and updates reinitialize_weights, to_weights, and from_weights methods. |
| netam/framework.py | Adds default hyperparameter values for legacy models and filters sequences in add_shm_model_outputs_to_pcp_df. |
| netam/dxsm.py | Implements metadata validation with warnings regarding model_type and multihit model consistency. |
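
The commit list includes "convert checks to warnings", and the summary above describes metadata validation in netam/dxsm.py. A minimal sketch of that warn-instead-of-raise pattern follows; `check_metadata_consistency` and its arguments are hypothetical, and the actual checks in DXSMBurrito may compare different fields.

```python
import warnings

KNOWN_MODEL_TYPES = ("dnsm", "dasm", "ddsm")


def check_metadata_consistency(model_metadata, expected_model_type, dataset_multihit_name):
    """Warn (rather than raise) on metadata mismatches and return the messages."""
    problems = []
    model_type = model_metadata.get("model_type", "unknown")
    if model_type not in KNOWN_MODEL_TYPES:
        problems.append(f"model_type {model_type!r} is not one of {KNOWN_MODEL_TYPES}")
    elif model_type != expected_model_type:
        problems.append(
            f"model_type {model_type!r} does not match expected {expected_model_type!r}"
        )
    if model_metadata.get("multihit_model_name") != dataset_multihit_name:
        problems.append(
            "multihit model used by the dataset differs from the one in the model metadata"
        )
    # Emitting warnings keeps legacy crepes usable while still flagging the mismatch.
    for message in problems:
        warnings.warn(message)
    return problems
```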

matsen

Just a few final todos 👍

willdumm · 2025-05-28T22:21:26Z

netam/hit_class.py

@@ -66,13 +63,7 @@ def apply_multihit_correction(
    per_parent_hit_class = parent_specific_hit_classes(parent_codon_idxs)
    corrections = torch.cat([torch.tensor([0.0]), log_hit_class_factors]).exp()
    reshaped_corrections = corrections[per_parent_hit_class]
-    unnormalized_corrected_probs = clamp_probability(codon_probs * reshaped_corrections)


This is just a refactor: the multihit model's forward method still sets the parent codon probability, but the model can now also expose a method that adjusts codon probs without setting the parent codon probability.
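
To illustrate the split described here, the sketch below separates the hit-class adjustment from the step that fixes the parent codon probability. It uses plain Python dicts instead of torch tensors, works on a single parent codon, and assumes that "setting the parent codon probability" means assigning the parent whatever mass remains after the other codons are summed. The function names are hypothetical and this is not the code in netam/hit_class.py.

```python
import math


def hit_class(parent, child):
    """Number of nucleotide positions at which two codons differ (0 to 3)."""
    return sum(p != c for p, c in zip(parent, child))


def adjust_codon_probs(parent, codon_probs, log_hit_class_factors):
    """Scale each child codon's probability by its hit-class factor.

    Hit class 0 (the parent itself) gets factor exp(0.0) = 1, like the
    leading zero concatenated onto log_hit_class_factors in the diff.
    The parent entry is not renormalized here.
    """
    factors = [math.exp(x) for x in [0.0, *log_hit_class_factors]]
    return {
        child: prob * factors[hit_class(parent, child)]
        for child, prob in codon_probs.items()
    }


def set_parent_codon_prob(parent, codon_probs):
    """Assumed normalization: the parent takes the remaining probability mass."""
    others = sum(p for c, p in codon_probs.items() if c != parent)
    result = dict(codon_probs)
    result[parent] = 1.0 - others
    return result


def forward(parent, codon_probs, log_hit_class_factors):
    """Adjust first, then set the parent probability."""
    adjusted = adjust_codon_probs(parent, codon_probs, log_hit_class_factors)
    return set_parent_codon_prob(parent, adjusted)
```

Callers that need the corrected but unnormalized child probabilities can call `adjust_codon_probs` directly, while `forward` keeps the original end-to-end behavior.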

willdumm added 7 commits April 30, 2025 14:00

neutral model and timestamp

69dcfb0

clearer metadata naming

62b2ae6

WIP start on multihit in metadata

1ae30c3

purge multihit crepe prefix

fbcd604

fix tests

9994454

convert checks to warnings

c782487

new warning

ecc55bf

willdumm requested a review from Copilot May 2, 2025 18:55

Copilot AI reviewed May 2, 2025

netam/models.py (outdated, resolved)

tests/test_simulation.py (resolved)

willdumm requested a review from matsen May 2, 2025 19:15

willdumm commented May 2, 2025

netam/dxsm.py (outdated, resolved)

matsen approved these changes May 2, 2025

netam/dxsm.py (outdated, resolved)

netam/pretrained.py (resolved)

willdumm and others added 18 commits May 2, 2025 14:46

immutable default arguments

46e6e4a

fix integer casting bug

1335c77

add new pretrained model

a3b4fb7

format

a0fb63c

update backward compat test for correct neutral model

20d55ab

enable simulation on ambiguous sequences

6086c06

Path management better practice

a701e00

fix zero branch length sampling

a14d1b1

tweaks for numerical stability

d31ba6f

format

d1845dc

slight multihit refactor

99aef17

incremental updates

ff1acdb

Woohoo

f513941

failing test

3210387

a failing test

7756ac3

more comprehensive test

5fe9f6b

better tests and fix sim masking

a2dd2a5

experiment with codon masking

f96d0f9

dnsm works as well as dasm

cebb386

willdumm commented May 28, 2025

testing ambiguities...

0bcda22
