Skip to content

Traits#1792

Merged
petrelharp merged 1 commit intopopsim-consortium:mainfrom
petrelharp:traits
Mar 18, 2026
Merged

Traits#1792
petrelharp merged 1 commit intopopsim-consortium:mainfrom
petrelharp:traits

Conversation

@petrelharp
Copy link
Copy Markdown
Contributor

@petrelharp petrelharp commented Jan 21, 2026

This started with the code from @jeffspence's master branch that he and @roshnipatel got together.

Proposed rough low-level API:

# set up traits
traits = stdpopsim.TraitsModel(
    [(name, link_function), (name, link_function), ... ],
    default_environmental_sigma = 0,  # if no environments are added, they are all Gaussian with this sigma
    default_fitness_sigma = Inf, # similarly, default Gaussian
)
traits.add_environment( ... ) # optionally
traits.add_fitness_function(...) # optionally
# maybe add more of each

# set up "distribution of mutational effects" which is like DFE:
esd = stdpopsim.traits.DistributionOfMutationEffects(  # maybe? they will be (mostly?) generic
   mutation_types = [ some mutation types ],
   proportions = [ should add to 1 ],
   ]
)
contig.add_dme(esd, ...) # like add_dfe

# use them
engine.simulate(contig, traits)

Note: we could alternatively instantiate Environment and FitnessFunctions to pass into the Trait constructor,
but these have to refer to "which traits do I work with", so it's cleaner to have the traits set up
before we make them.

Notes:

  • terminology: "trait" is the concept; "phenotype" is the actual measured value (or maybe "trait value"?)

TraitsModel:

Encompasses all the stuff: how they are constructed, how they change with time. (Thus, "model".)

Has

  • a list of Traits, each of which has a link function
  • a list of Environments (each of which can affect a collection of the traits)
  • a list of FitnessFunctions (each of which operate on a collection of the traits)

FitnessFunction

Has:

  • an id (important for debugging)
  • trait type: quantitative, binary, fitness, or neutral
    • this is just basically "which fitness function"
  • when and where this applies

Environment

This affects how the genetic value maps to trait phenotypic values,
including the part that changes with time/space.
Environments can affect more than one trait (so you have correlated noise).
Possibly not public?

Adds "noise" to the genetic value. Has:

  • and id (important for debugging)
  • distribution and parameters
  • when and where this applies

Trait

This describes how to produce the observed biological trait value (ie, the phenotype):
for instance, describes how disease liability translates to disease incidence,
so it should not change with time or location.
Link functions map only a single trait, independently of others.

GeneticValueTransform? PhenotypeTransform? Well, this applies to "genetic value plus environment" so it's not a "genetic value transform"? Something else?

This transforms the underlying (genetic value plus environemtn) to the "observed trait". Has:

  • an id
  • whether it's multiplicative or additive
  • a function
  • maybe some randomness (but NOT what could be put in with environment)

Examples:

  • identity
  • threshold
  • liability: trait equals 1 with probability logit(value)

DistributionOfMutationEffects

Maps mutations to traits, so has

  • a list of MutationTypes
  • and corresponding "what proportion of mutations"

MutationType

A (biologically motivated, hopefully) class of mutations. As before, but now there's more than one trait, so also needs to know:

  • which traits this affects (defaults to "fitness")

List of standard use cases:

See below

We should make it very easy to produce one of each of these,
something like stdpopsim.traits.StabilizingSelectionOnTrait(n, sigma)

Miscellaneous TODOs that may require getting into some weeds, and should turn into issues when we merge this:

  • a special trait is named "fitness" currently; deal with this fact or change it
  • figure out about convert_to_substitution edit: I don't think we need an issue for this
  • switch dominance over to being "truly additive" and explain Change additive models to independent dominance #1808
  • refactor the "check that the arguments and stuff related to this distribution or functional form are right" code (eg, determine if code should be shared between the FitnessFunction and the MutationType paths) (_check_distribution and related code)
  • do scaling (with Q) only for fitness-affecting mutation effects disallow rescaling for traits #1818
  • move code that changes lognormal and uniform to Eidos to slim_engine.py move code that changes lognormal and uniform to Eidos to slim_engine.py #1819
  • ideally we'd not need to define a new mvn distribution type since we already have Gaussian; but the existing distribution_type="n" uses a (mean, sd) parameterization, while the mvn type uses (mean, (co)variance); maybe this is just how things has to be but this needs to be super clear if so
  • do we need to keep the stdpopsim.dfe module? edit: probably not but let's leave it for backwards compatibility
  • revisit the contig.is_neutral() method; this previously just looked at MutationTypes but now needs to include the TraitsModel. revsiit contig.is_neutral() method #1820
  • make other distributions multivariate (by scale transformation of normal?) implement more multivariate distribution of trait effects #1821

@gregorgorjanc
Copy link
Copy Markdown
Contributor

gregorgorjanc commented Jan 21, 2026

WIP ...

Some thoughts on the lower part of #1792 (comment), as in these are some standard use cases and an attempt to produce each of these with ease.

While thinking about these I struck me that the TraitsModel specifies properties of traits, while these "use cases" below are talking about how selection operates on these, which is another "thing". Do we need to think how to encode Selection? When thinking about Fitness this comes up naturally, but we might also want to do selection on non-fitness traits. Will the Fitness component of the TraitsModel capture all of these different types of selection on all the traits that the user would like to select on?

Stabilizing selection

stdpopsim.traits.StabilizingSelection(trait, sigma)
# trait: trait name or index
# sigma: scale of the Gaussian fitness function
* this will call fitness function
* what else?

Directional selection

stdpopsim.traits.DirectionalSelection(trait, ???, direction = "high", )
# trait: trait name or index
# ??? TODO: Peter mentioned exp(x) type thing so we increase probability of selection for individuals with high/low trait - do we need a parameter beta to do smth like exp(\beta x) 
# direction: "high" or "low"

TODO: Discuss how to handle multiple traits here.
We could add a linear combination argument (could also be non-linear function!?),
possibly trait name/index could take in a function too?
This function would combine multiple traits into a single score, then do the exp(x)

Truncation selection

stdpopsim.traits.TruncationSelection(trait, proportion_selected, direction = "high")
# trait: trait name or index
# propotion_selected: proportion of individuals selected each generation [0, 1]
# direction: "high" or "low"

TODO: Discuss how to handle multiple traits here - the above assumes a single trait.
We can not have proportion_selected across multiple traits.
We could add a linear combination argument (could also be non-linear function!?),
possibly trait name/index could take in a function too?
This function would combine multiple traits into a single score,
then we take the top/bottom prop of individuals based on the score.

Truncation selection (on a binary trait)

The same as directional selection above, but applied to 0/1 scores?
Say, we have proportion_selected and if Pr(1) is lower than that, we select all 1s,
and then fill the rest with 0s picked at random?

Direct effects on fitness (previous behavior)

TODO: This is just a special case of directional selection on a trait,
but the trait is fitness. Where is it stored? How to refer to it?

Neutral

Do nothing?

@petrelharp
Copy link
Copy Markdown
Contributor Author

petrelharp commented Jan 22, 2026

Outline:
board1
board2
Note: unlike what's shown in this picture, DistributionOfMutationEffects is not an attribute of TraitsModel; that gets added to a contig instead.

@petrelharp
Copy link
Copy Markdown
Contributor Author

petrelharp commented Jan 23, 2026

Okay - by my reading, MutationType and MultivariateMutationType are (under the current proposal) identical, except that MultivariateMutationType has an argument phenotype_ids (ie, which phenotypes does it affect). That means we can just use MutationType and keep backwards compatibility if we make the default be phenotype_ids = "fitness". How's that sound? For reference, the MutationType properties:
Screenshot From 2026-01-23 09-44-04

@petrelharp
Copy link
Copy Markdown
Contributor Author

Okay, so we might also decide that DFE and DME (or whatever we are calling it) are the same thing, but I'm voting not to do this, because:

  • the name "DFE" doesn't describe what we want to use DME for
  • and DFE has a few things we don't think we'll have in DME (citation, maybe QC)

and so this points towards DFE being a sub-class of DME.

For reference, DFE:
Screenshot From 2026-01-23 09-46-20

@petrelharp
Copy link
Copy Markdown
Contributor Author

Having consulted with @roshnipatel, I've gone ahead and done those last two things (we can un-do them, however).

And, I've moved everything to traits.py; for backwards compatibility dfe.py is still there, but just importing things from traits.py. (I think however we can just delete it...)

@petrelharp
Copy link
Copy Markdown
Contributor Author

Discussing with @gregorgorjanc today, we realized that in practice the "threshold" fitness function should not take a fixed value, it should take a fixed quantile (e.g., "fitness 1 to the top 20% of the population; fitness 0 to the rest").

@petrelharp
Copy link
Copy Markdown
Contributor Author

And consulting with @roshnipatel we realized that basically everything needs IDs, so that when we put together a model we can properly debug it (ie check things went where we thought they should).

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.82%. Comparing base (e23d255) to head (67c7fc3).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1792    +/-   ##
========================================
  Coverage   99.81%   99.82%            
========================================
  Files         142      143     +1     
  Lines        4873     5012   +139     
  Branches      472      513    +41     
========================================
+ Hits         4864     5003   +139     
  Misses          6        6            
  Partials        3        3            

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@petrelharp
Copy link
Copy Markdown
Contributor Author

petrelharp commented Jan 25, 2026

I'm thinking that the plan here should be:

  1. sketch out the higher-level API (like "give me two traits under stabilizing selection");
  2. make sure the conceptual framework here works good for those;
  3. get the tests to pass;
  4. merge this and open a bunch of issues for future work.

Here's brainstorming of those issues, besides the checklist in the original post here:

So, this might be close, but we need (a) some eyes on it, and (b) to sketch out that higher-level API, to make sure?

@petrelharp
Copy link
Copy Markdown
Contributor Author

I've added @gregorgorjanc's edits from petrelharp#4 and realized I forgot to commit test_traits.py; now committed.

@petrelharp
Copy link
Copy Markdown
Contributor Author

petrelharp commented Feb 12, 2026

Okay, consulting with @roshnipatel, here's a proposal for the high-level API:

tm = stdpopsim.StabilizingSelection(...) # returns a Traits Model
dme = stdpopsim.dme_for_traits_model(tm, "gaussian", ...)  # returns a DME that fits with `tm`
contig.add_dme(dme, ...)
engine.simulate(model, contig, tm, samples, ...)

(dme_for_traits_model may not be the best name.)

Rationale:

  • we need the DME to match the TraitsModel
  • however, each TraitsModel doesn't have, like, a specific DME that best fits with it: like maybe there's only one or two natural DMEs, so it makes sense to have this choice orthogonal.

If this sounds good, more TODOs:

  • write dme_for_traits_model
  • enumerate the standard DMEs we want to provide

@roshnipatel
Copy link
Copy Markdown

seems like a starter list of very-basic use cases might be something like:

  • StabilizingSelection(num_traits, optima, sigma) - where maybe there are defaults for optima and sigma
  • DirectionalSelection(num_traits, ...) - @gregorgorjanc maybe you can advise here?
  • TruncationSelection(num_traits, proportion)
  • Neutral(num_traits)
  • StabilizingSelectionOptimaShift(num_traits, optima=[old, new], sigma, time, pops)
  • EnvironmentMeanShift(num_traits, mean=[old, new], sigma, time, pops) - seems like this should return an Environment, and not a TraitsModel? Another way to formulate this could be to let users provide a dict {"pop1": (env_params)} describing environmental params for each pop in the demographic model

@petrelharp
Copy link
Copy Markdown
Contributor Author

Just to be clear here, this is the same list that @gregorgorjanc started above, right? So, this is a proposal for what the methods and arguments are?

@gregorgorjanc
Copy link
Copy Markdown
Contributor

  • DirectionalSelection(num_traits, ...) - @gregorgorjanc maybe you can advise here?

@roshnipatel: after talking with @petrelharp, I think we should separate two mechanisms that are often conflated in breeding contexts:

  • Truncation selection: pick the top (or bottom) proportion of individuals, then generate offspring from that selected set (with a specified mating scheme).

  • Directional selection: assign fitness directly as a monotonic function of trait value (no explicit truncation step), so higher-trait (or lower-trait) individuals contribute more offspring.

We could parameterise directional selection like w(z) = a + b z or w(z) = exp(a + b z) (Malthusian-style).

So maybe API-wise:

DirectionalSelection(num_traits, a, b, fitness_model="linear"|"exponential", direction="top"|"bottom")

@petrelharp if you recall any canonical references for this formulation, that would help us pick default forms, language etc.

Thoughts?

@gregorgorjanc
Copy link
Copy Markdown
Contributor

I guess we should also add direction to TruncationSelection() too!?

@petrelharp
Copy link
Copy Markdown
Contributor Author

I'd have to go look at the papers that are out there using "directional selection", but my memory is that "linear" is most usual? Note that "linear with a multiplicative trait" and "exponential with an attidive trait" are the same thing, and "linear with an additive trait" runs the risk of negative fitnesses, so my vote is just to have one of these.

@petrelharp petrelharp marked this pull request as ready for review February 17, 2026 23:13
Copy link
Copy Markdown

@roshnipatel roshnipatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeffspence and I did a code review together and had some comments:

  • There's some ambiguity around whether multivariate fixed effects are allowed in MutationType -- given lines 740-744, it seems they ought to be, and we should provide that functionality/make it explicit in the docstring
  • _check_args_list is pretty inflexible and we might eventually want to provide distribution params that do not match the total number of dimensions (e.g. weights for a mixture of Gaussians) -- I think we ought to remove this function and outsource args-checking to the relevant functions (i.e. _check_distribution and the FitnessFunction initialization)
    • Also, the dim argument is not being used at the moment
  • A longer conversation for later: MutationType allows sampling from a distribution that could give you negative fitness (which might have been less of an issue when DFEs were largely catalog-provided, but feels like something to address in our new era of user-generated distributions)

Minor notes:

  • We ought to check for nonempty trait/environment/fitness IDs
  • Re: line 469 I think I agree this should happen in slim_engine.py
  • We might want to expand the list of neutral DFEs in MutationType.is_neutral( )
  • Are we requiring that a multivariate normal distribution has 2+ dimensions? (We don't seem to explicitly check this anywhere...)
  • Line 351 has an incomplete docstring (but I think we actually have decided what it's supposed to be?)
  • test_traits.py requires function arguments are provided as lists, but that seems rather strict -- maybe we should just check that things can be cast to a numpy array?

@petrelharp
Copy link
Copy Markdown
Contributor Author

@jeffspence and I did a code review together and had some comments:

Thanks!

* There's some ambiguity around whether multivariate fixed effects are allowed in `MutationType` -- given [lines 740-744](https://github.com/popsim-consortium/stdpopsim/pull/1792/changes#diff-3e1c6050c108f23af4908d64fb80d312896c6fd1caecfc9145a938908697c536R740-R744), it seems they ought to be, and we should provide that functionality/make it explicit in the docstring

Good point; I adjusted the docstring. However, I wasn't thinking of this as final, as a TODO is to look at making other types multivariate, and resolving how the gaussians are parameterized (see lists above).

* `_check_args_list` is pretty inflexible and we might eventually want to provide distribution params that do not match the total number of dimensions (e.g. weights for a mixture of Gaussians) -- I think we ought to remove this function and outsource args-checking to the relevant functions (i.e. `_check_distribution` and the `FitnessFunction` initialization)

Agree! tidying up the "checking" code is a TODO.

  * Also, the `dim` argument is not being used at the moment

Good catch. I'm leaving that as part of the "tidy the checking code" TODO.

* A longer conversation for later: `MutationType` allows sampling from a distribution that could give you negative fitness (which might have been less of an issue when DFEs were largely catalog-provided, but feels like something to address in our new era of user-generated distributions)

Ah yep. Will add to the TODO list.

Minor notes:

* We ought to check for nonempty trait/environment/fitness IDs

Ah, sure. Done.

* Re: [line 469](https://github.com/popsim-consortium/stdpopsim/pull/1792/changes#diff-3e1c6050c108f23af4908d64fb80d312896c6fd1caecfc9145a938908697c536R469) I think I agree this should happen in `slim_engine.py`

Great, that's in the TODO list.

* We might want to expand the list of neutral DFEs in `MutationType.is_neutral( )`

Also in the TODO list.

* Are we requiring that a multivariate normal distribution has 2+ dimensions? (We don't seem to explicitly check this anywhere...)

No? I don't see why we would?

* [Line 351](https://github.com/popsim-consortium/stdpopsim/pull/1792/changes#diff-3e1c6050c108f23af4908d64fb80d312896c6fd1caecfc9145a938908697c536R351) has an incomplete docstring (but I think we actually have decided what it's supposed to be?)

Maybe? See the TODO above pointing out that we parameterize the MVN with variance and the standard Gaussian with the SD; maybe this is fine, but it seems awkward.

* `test_traits.py` requires function arguments are provided as lists, but that seems rather strict -- maybe we should just check that things can be cast to a numpy array?

Agree; this is in the TODO list.

@petrelharp
Copy link
Copy Markdown
Contributor Author

I'm going to merge this so it doesn't get in the way of others, but still TODO for me is to make issues for the things above.

@petrelharp petrelharp merged commit c96504f into popsim-consortium:main Mar 18, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants