ANI2x subset #276

@chrisiacovella

@wiederm mentioned being interested in a smaller ANI2x dataset (larger than our testing set) for examining training.
@jchodera suggested limiting it to molecules containing only C, H, and O, which I think is a good idea. This would allow us to compare more directly with PhAlkEthOH.

PhAlkEthOH has 12,271 unique molecules; ANI2x has 16,514 unique molecules. I'm not sure how many molecules in ANI2x contain only C, H, and O, but if that number is smaller than the PhAlkEthOH count, we can create a smaller PhAlkEthOH subset to match.
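To get that count, something like the following minimal sketch could work. It assumes each molecule's atomic numbers have already been read out of the dataset into a dictionary; the function names and the toy `molecules` dictionary are hypothetical and are not part of any existing loader.

```python
# Atomic numbers allowed in the subset: H (1), C (6), O (8).
ALLOWED = frozenset({1, 6, 8})


def is_cho_only(atomic_numbers):
    """Return True if every atom in the molecule is H, C, or O."""
    return {int(z) for z in atomic_numbers} <= ALLOWED


def select_cho_subset(molecules):
    """Keep only entries whose atoms are all H, C, or O.

    `molecules` is assumed to map a molecule key to its atomic numbers.
    """
    return {key: zs for key, zs in molecules.items() if is_cho_only(zs)}


# Toy stand-in for the real dataset (hypothetical keys).
molecules = {
    "water": [8, 1, 1],
    "methanol": [6, 8, 1, 1, 1, 1],
    "ammonia": [7, 1, 1, 1],
    "methane": [6, 1, 1, 1, 1],
}
subset = select_cho_subset(molecules)
n_cho = len(subset)
```

`n_cho` is then the number to compare against the PhAlkEthOH count.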

It might be interesting to see the overlap between these datasets. The ANI2x dataset does not contain SMILES strings for the molecules, but we could probably do some other relevant comparisons. I think something as simple as comparing the molecular weight distributions (since we are limited to CHO) would probably be good. We could also do this as two plots: one for molecules with O and one for molecules without O.
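A rough sketch of the molecular weight comparison, again assuming the atomic numbers for each molecule are available as plain sequences. The helper names are hypothetical, and the masses are standard atomic weights rounded to three decimals.

```python
# Standard atomic weights (g/mol) for the three allowed elements.
ATOMIC_MASS = {1: 1.008, 6: 12.011, 8: 15.999}


def molecular_weight(atomic_numbers):
    """Sum the atomic weights of a CHO-only molecule."""
    return sum(ATOMIC_MASS[int(z)] for z in atomic_numbers)


def split_weights_by_oxygen(molecules):
    """Return (weights_with_O, weights_without_O) for an iterable of molecules."""
    with_o, without_o = [], []
    for zs in molecules:
        has_oxygen = any(int(z) == 8 for z in zs)
        target = with_o if has_oxygen else without_o
        target.append(molecular_weight(zs))
    return with_o, without_o


# Toy input: water, methanol, methane.
toy = [[8, 1, 1], [6, 8, 1, 1, 1, 1], [6, 1, 1, 1, 1]]
with_o, without_o = split_weights_by_oxygen(toy)
```

Running this once per dataset gives four lists of weights, which could then be histogrammed on shared bin edges to produce the two plots.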
