How do we count "times seen" with non-identical encodings and parameter transformation? #162

@jgallowa07

Adding this as a placeholder: it occurs to me that in the final push we lost track of a discussion @Haddox and I were having about the correct way to calculate the "times seen" column of the mutations df. Note that this is only relevant to multi-condition training sets that include non-identical wildtype protein sequences.

background

Right now, we simply sum the columns of the transformed binary matrix to get times seen, so times seen is essentially the number of times the model sees a "1" for a given mutation. As discussed, this may not be the correct way to do things, and we should rethink how this column is calculated.
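For reference, here is a minimal sketch of the column-sum counting described above. The mutation names, the toy matrix, and the `count_times_seen` helper are all hypothetical and only illustrate the idea. This is not the package's actual code, and it does not model the transformation applied for non-identical wildtypes.

```python
# Hypothetical toy data: each row is one variant, each column one mutation.
# A 1 means the (already transformed) encoding has that mutation switched on.
mutations = ["M1A", "G2C", "K3R"]
binary_matrix = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]


def count_times_seen(matrix, names):
    """Sum each column, i.e. count how many rows carry a 1 for each mutation."""
    column_sums = [sum(column) for column in zip(*matrix)]
    return dict(zip(names, column_sums))


times_seen = count_times_seen(binary_matrix, mutations)
print(times_seen)  # {'M1A': 2, 'G2C': 2, 'K3R': 1}
```

With a NumPy array the same count would be `matrix.sum(axis=0)`. Either way, the count reflects the 1s in the encoded matrix, which is not necessarily the number of variants that literally carry the mutation once the encoding has been transformed.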

How the binarymaps are encoded across non-identical proteins for joint modeling

To describe how we encode the variants into binarymaps, let's consider the example in the unit tests.

TODO: Finish the description and add the discussion between Hugh and me (which currently exists mainly on Slack).
