Traits by petrelharp · Pull Request #1792 · popsim-consortium/stdpopsim

petrelharp · 2026-01-21T18:07:01Z

This started with the code from @jeffspence's master branch that he and @roshnipatel got together.

Proposed rough low-level API:

# set up traits
traits = stdpopsim.TraitsModel(
    [(name, link_function), (name, link_function), ... ],
    default_environmental_sigma = 0,  # if no environments are added, they are all Gaussian with this sigma
    default_fitness_sigma = Inf, # similarly, default Gaussian
)
traits.add_environment( ... ) # optionally
traits.add_fitness_function(...) # optionally
# maybe add more of each

# set up "distribution of mutational effects" which is like DFE:
esd = stdpopsim.traits.DistributionOfMutationEffects(  # maybe? they will be (mostly?) generic
   mutation_types = [ some mutation types ],
   proportions = [ should add to 1 ],
   ]
)
contig.add_dme(esd, ...) # like add_dfe

# use them
engine.simulate(contig, traits)

Note: we could alternatively instantiate Environment and FitnessFunctions to pass into the Trait constructor,
but these have to refer to "which traits do I work with", so it's cleaner to have the traits set up
before we make them.

Notes:

terminology: "trait" is the concept; "phenotype" is the actual measured value (or maybe "trait value"?)

`TraitsModel`:

Encompasses all the stuff: how they are constructed, how they change with time. (Thus, "model".)

Has

a list of Traits, each of which has a link function
a list of Environments (each of which can affect a collection of the traits)
a list of FitnessFunctions (each of which operate on a collection of the traits)

`FitnessFunction`

Has:

an id (important for debugging)
trait type: quantitative, binary, fitness, or neutral
- this is just basically "which fitness function"
when and where this applies

`Environment`

This affects how the genetic value maps to trait phenotypic values,
including the part that changes with time/space.
Environments can affect more than one trait (so you have correlated noise).
Possibly not public?

Adds "noise" to the genetic value. Has:

and id (important for debugging)
distribution and parameters
when and where this applies

Trait

This describes how to produce the observed biological trait value (ie, the phenotype):
for instance, describes how disease liability translates to disease incidence,
so it should not change with time or location.
Link functions map only a single trait, independently of others.

GeneticValueTransform? PhenotypeTransform? Well, this applies to "genetic value plus environment" so it's not a "genetic value transform"? Something else?

This transforms the underlying (genetic value plus environemtn) to the "observed trait". Has:

an id
whether it's multiplicative or additive
a function
maybe some randomness (but NOT what could be put in with environment)

Examples:

identity
threshold
liability: trait equals 1 with probability logit(value)

`DistributionOfMutationEffects`

Maps mutations to traits, so has

a list of MutationTypes
and corresponding "what proportion of mutations"

`MutationType`

A (biologically motivated, hopefully) class of mutations. As before, but now there's more than one trait, so also needs to know:

which traits this affects (defaults to "fitness")

List of standard use cases:

See below

We should make it very easy to produce one of each of these,
something like stdpopsim.traits.StabilizingSelectionOnTrait(n, sigma)

Miscellaneous TODOs that may require getting into some weeds, and should turn into issues when we merge this:

gregorgorjanc · 2026-01-21T19:59:45Z

WIP ...

Some thoughts on the lower part of #1792 (comment), as in these are some standard use cases and an attempt to produce each of these with ease.

While thinking about these I struck me that the TraitsModel specifies properties of traits, while these "use cases" below are talking about how selection operates on these, which is another "thing". Do we need to think how to encode Selection? When thinking about Fitness this comes up naturally, but we might also want to do selection on non-fitness traits. Will the Fitness component of the TraitsModel capture all of these different types of selection on all the traits that the user would like to select on?

Stabilizing selection

stdpopsim.traits.StabilizingSelection(trait, sigma)
# trait: trait name or index
# sigma: scale of the Gaussian fitness function
* this will call fitness function
* what else?

Directional selection

stdpopsim.traits.DirectionalSelection(trait, ???, direction = "high", )
# trait: trait name or index
# ??? TODO: Peter mentioned exp(x) type thing so we increase probability of selection for individuals with high/low trait - do we need a parameter beta to do smth like exp(\beta x) 
# direction: "high" or "low"

TODO: Discuss how to handle multiple traits here.
We could add a linear combination argument (could also be non-linear function!?),
possibly trait name/index could take in a function too?
This function would combine multiple traits into a single score, then do the exp(x)

Truncation selection

stdpopsim.traits.TruncationSelection(trait, proportion_selected, direction = "high")
# trait: trait name or index
# propotion_selected: proportion of individuals selected each generation [0, 1]
# direction: "high" or "low"

TODO: Discuss how to handle multiple traits here - the above assumes a single trait.
We can not have proportion_selected across multiple traits.
We could add a linear combination argument (could also be non-linear function!?),
possibly trait name/index could take in a function too?
This function would combine multiple traits into a single score,
then we take the top/bottom prop of individuals based on the score.

Truncation selection (on a binary trait)

The same as directional selection above, but applied to 0/1 scores?
Say, we have proportion_selected and if Pr(1) is lower than that, we select all 1s,
and then fill the rest with 0s picked at random?

Direct effects on fitness (previous behavior)

TODO: This is just a special case of directional selection on a trait,
but the trait is fitness. Where is it stored? How to refer to it?

Neutral

Do nothing?

petrelharp · 2026-01-22T23:12:23Z

Outline:

Note: unlike what's shown in this picture, DistributionOfMutationEffects is not an attribute of TraitsModel; that gets added to a contig instead.

petrelharp · 2026-01-23T17:35:41Z

Okay - by my reading, MutationType and MultivariateMutationType are (under the current proposal) identical, except that MultivariateMutationType has an argument phenotype_ids (ie, which phenotypes does it affect). That means we can just use MutationType and keep backwards compatibility if we make the default be phenotype_ids = "fitness". How's that sound? For reference, the MutationType properties:

petrelharp · 2026-01-23T17:46:33Z

Okay, so we might also decide that DFE and DME (or whatever we are calling it) are the same thing, but I'm voting not to do this, because:

the name "DFE" doesn't describe what we want to use DME for
and DFE has a few things we don't think we'll have in DME (citation, maybe QC)

and so this points towards DFE being a sub-class of DME.

For reference, DFE:

petrelharp · 2026-01-23T18:17:34Z

Having consulted with @roshnipatel, I've gone ahead and done those last two things (we can un-do them, however).

And, I've moved everything to traits.py; for backwards compatibility dfe.py is still there, but just importing things from traits.py. (I think however we can just delete it...)

petrelharp · 2026-01-25T05:14:20Z

Discussing with @gregorgorjanc today, we realized that in practice the "threshold" fitness function should not take a fixed value, it should take a fixed quantile (e.g., "fitness 1 to the top 20% of the population; fitness 0 to the rest").

petrelharp · 2026-01-25T05:30:49Z

And consulting with @roshnipatel we realized that basically everything needs IDs, so that when we put together a model we can properly debug it (ie check things went where we thought they should).

codecov · 2026-01-25T05:45:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.82%. Comparing base (e23d255) to head (67c7fc3).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #1792    +/-   ##
========================================
  Coverage   99.81%   99.82%            
========================================
  Files         142      143     +1     
  Lines        4873     5012   +139     
  Branches      472      513    +41     
========================================
+ Hits         4864     5003   +139     
  Misses          6        6            
  Partials        3        3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

petrelharp · 2026-01-25T05:47:04Z

petrelharp · 2026-02-08T05:59:12Z

I've added @gregorgorjanc's edits from petrelharp#4 and realized I forgot to commit test_traits.py; now committed.

petrelharp · 2026-02-12T19:31:00Z

Okay, consulting with @roshnipatel, here's a proposal for the high-level API:

tm = stdpopsim.StabilizingSelection(...) # returns a Traits Model
dme = stdpopsim.dme_for_traits_model(tm, "gaussian", ...)  # returns a DME that fits with `tm`
contig.add_dme(dme, ...)
engine.simulate(model, contig, tm, samples, ...)

(dme_for_traits_model may not be the best name.)

Rationale:

we need the DME to match the TraitsModel
however, each TraitsModel doesn't have, like, a specific DME that best fits with it: like maybe there's only one or two natural DMEs, so it makes sense to have this choice orthogonal.

If this sounds good, more TODOs:

write dme_for_traits_model
enumerate the standard DMEs we want to provide

roshnipatel · 2026-02-12T19:43:56Z

seems like a starter list of very-basic use cases might be something like:

StabilizingSelection(num_traits, optima, sigma) - where maybe there are defaults for optima and sigma
DirectionalSelection(num_traits, ...) - @gregorgorjanc maybe you can advise here?
TruncationSelection(num_traits, proportion)
Neutral(num_traits)
StabilizingSelectionOptimaShift(num_traits, optima=[old, new], sigma, time, pops)
EnvironmentMeanShift(num_traits, mean=[old, new], sigma, time, pops) - seems like this should return an Environment, and not a TraitsModel? Another way to formulate this could be to let users provide a dict {"pop1": (env_params)} describing environmental params for each pop in the demographic model

petrelharp · 2026-02-13T04:01:48Z

Just to be clear here, this is the same list that @gregorgorjanc started above, right? So, this is a proposal for what the methods and arguments are?

gregorgorjanc · 2026-02-13T22:58:47Z

DirectionalSelection(num_traits, ...) - @gregorgorjanc maybe you can advise here?

@roshnipatel: after talking with @petrelharp, I think we should separate two mechanisms that are often conflated in breeding contexts:

Truncation selection: pick the top (or bottom) proportion of individuals, then generate offspring from that selected set (with a specified mating scheme).
Directional selection: assign fitness directly as a monotonic function of trait value (no explicit truncation step), so higher-trait (or lower-trait) individuals contribute more offspring.

We could parameterise directional selection like w(z) = a + b z or w(z) = exp(a + b z) (Malthusian-style).

So maybe API-wise:

DirectionalSelection(num_traits, a, b, fitness_model="linear"|"exponential", direction="top"|"bottom")

@petrelharp if you recall any canonical references for this formulation, that would help us pick default forms, language etc.

Thoughts?

gregorgorjanc · 2026-02-13T22:59:24Z

I guess we should also add direction to TruncationSelection() too!?

petrelharp · 2026-02-14T16:54:00Z

I'd have to go look at the papers that are out there using "directional selection", but my memory is that "linear" is most usual? Note that "linear with a multiplicative trait" and "exponential with an attidive trait" are the same thing, and "linear with an additive trait" runs the risk of negative fitnesses, so my vote is just to have one of these.

roshnipatel

@jeffspence and I did a code review together and had some comments:

There's some ambiguity around whether multivariate fixed effects are allowed in MutationType -- given lines 740-744, it seems they ought to be, and we should provide that functionality/make it explicit in the docstring
_check_args_list is pretty inflexible and we might eventually want to provide distribution params that do not match the total number of dimensions (e.g. weights for a mixture of Gaussians) -- I think we ought to remove this function and outsource args-checking to the relevant functions (i.e. _check_distribution and the FitnessFunction initialization)
- Also, the dim argument is not being used at the moment
A longer conversation for later: MutationType allows sampling from a distribution that could give you negative fitness (which might have been less of an issue when DFEs were largely catalog-provided, but feels like something to address in our new era of user-generated distributions)

Minor notes:

We ought to check for nonempty trait/environment/fitness IDs
Re: line 469 I think I agree this should happen in slim_engine.py
We might want to expand the list of neutral DFEs in MutationType.is_neutral( )
Are we requiring that a multivariate normal distribution has 2+ dimensions? (We don't seem to explicitly check this anywhere...)
Line 351 has an incomplete docstring (but I think we actually have decided what it's supposed to be?)
test_traits.py requires function arguments are provided as lists, but that seems rather strict -- maybe we should just check that things can be cast to a numpy array?

petrelharp · 2026-03-03T05:40:54Z

@jeffspence and I did a code review together and had some comments:

Thanks!

* There's some ambiguity around whether multivariate fixed effects are allowed in `MutationType` -- given [lines 740-744](https://github.com/popsim-consortium/stdpopsim/pull/1792/changes#diff-3e1c6050c108f23af4908d64fb80d312896c6fd1caecfc9145a938908697c536R740-R744), it seems they ought to be, and we should provide that functionality/make it explicit in the docstring

Good point; I adjusted the docstring. However, I wasn't thinking of this as final, as a TODO is to look at making other types multivariate, and resolving how the gaussians are parameterized (see lists above).

* `_check_args_list` is pretty inflexible and we might eventually want to provide distribution params that do not match the total number of dimensions (e.g. weights for a mixture of Gaussians) -- I think we ought to remove this function and outsource args-checking to the relevant functions (i.e. `_check_distribution` and the `FitnessFunction` initialization)

Agree! tidying up the "checking" code is a TODO.

  * Also, the `dim` argument is not being used at the moment

Good catch. I'm leaving that as part of the "tidy the checking code" TODO.

* A longer conversation for later: `MutationType` allows sampling from a distribution that could give you negative fitness (which might have been less of an issue when DFEs were largely catalog-provided, but feels like something to address in our new era of user-generated distributions)

Ah yep. Will add to the TODO list.

Minor notes:

* We ought to check for nonempty trait/environment/fitness IDs

Ah, sure. Done.

* Re: [line 469](https://github.com/popsim-consortium/stdpopsim/pull/1792/changes#diff-3e1c6050c108f23af4908d64fb80d312896c6fd1caecfc9145a938908697c536R469) I think I agree this should happen in `slim_engine.py`

Great, that's in the TODO list.

* We might want to expand the list of neutral DFEs in `MutationType.is_neutral( )`

Also in the TODO list.

* Are we requiring that a multivariate normal distribution has 2+ dimensions? (We don't seem to explicitly check this anywhere...)

No? I don't see why we would?

* [Line 351](https://github.com/popsim-consortium/stdpopsim/pull/1792/changes#diff-3e1c6050c108f23af4908d64fb80d312896c6fd1caecfc9145a938908697c536R351) has an incomplete docstring (but I think we actually have decided what it's supposed to be?)

Maybe? See the TODO above pointing out that we parameterize the MVN with variance and the standard Gaussian with the SD; maybe this is fine, but it seems awkward.

* `test_traits.py` requires function arguments are provided as lists, but that seems rather strict -- maybe we should just check that things can be cast to a numpy array?

Agree; this is in the TODO list.

petrelharp · 2026-03-18T04:17:30Z

I'm going to merge this so it doesn't get in the way of others, but still TODO for me is to make issues for the things above.

petrelharp mentioned this pull request Feb 8, 2026

Minor edits to traits petrelharp/stdpopsim#4

Closed

petrelharp marked this pull request as ready for review February 17, 2026 23:13

roshnipatel reviewed Mar 2, 2026

View reviewed changes

Python scaffolding for traits

67c7fc3

petrelharp force-pushed the traits branch from cf39a40 to 67c7fc3 Compare March 18, 2026 15:55

petrelharp merged commit c96504f into popsim-consortium:main Mar 18, 2026
11 checks passed

petrelharp mentioned this pull request Mar 18, 2026

Mechanism for adding DMEs to a contig #1822

Open

Conversation

petrelharp commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TraitsModel:

FitnessFunction

Environment

Trait

DistributionOfMutationEffects

MutationType

List of standard use cases:

Uh oh!

gregorgorjanc commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stabilizing selection

Directional selection

Truncation selection

Truncation selection (on a binary trait)

Direct effects on fitness (previous behavior)

Neutral

Uh oh!

petrelharp commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petrelharp commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petrelharp commented Jan 23, 2026

Uh oh!

petrelharp commented Jan 23, 2026

Uh oh!

petrelharp commented Jan 25, 2026

Uh oh!

petrelharp commented Jan 25, 2026

Uh oh!

codecov bot commented Jan 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

petrelharp commented Jan 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petrelharp commented Feb 8, 2026

Uh oh!

petrelharp commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roshnipatel commented Feb 12, 2026

Uh oh!

petrelharp commented Feb 13, 2026

Uh oh!

gregorgorjanc commented Feb 13, 2026

Uh oh!

gregorgorjanc commented Feb 13, 2026

Uh oh!

petrelharp commented Feb 14, 2026

Uh oh!

roshnipatel left a comment • edited by petrelharp Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

petrelharp commented Mar 3, 2026

Uh oh!

petrelharp commented Mar 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

petrelharp commented Jan 21, 2026 •

edited

Loading

`TraitsModel`:

`FitnessFunction`

`Environment`

`DistributionOfMutationEffects`

`MutationType`

gregorgorjanc commented Jan 21, 2026 •

edited

Loading

petrelharp commented Jan 22, 2026 •

edited

Loading

petrelharp commented Jan 23, 2026 •

edited

Loading

codecov bot commented Jan 25, 2026 •

edited

Loading

petrelharp commented Jan 25, 2026 •

edited

Loading

petrelharp commented Feb 12, 2026 •

edited

Loading

roshnipatel left a comment •

edited by petrelharp

Loading