Reproducibility and reliability of ECFP descriptors #136

@subercui

Hi, when I run the same model on the same molecule multiple times, I get different results. Please see the following:

Code snippets:

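import exmol
import skunk

# config and model_pred are defined elsewhere in my script
smiles_ = config["highlight_smiles"]
# sample chemical space around the molecule, then fit the LIME explanation
space = exmol.sample_space(smiles_, model_pred, batched=True, num_samples=1000)
exmol.lime_explain(space, descriptor_type="ECFP")
svg = exmol.plot_descriptors(space, return_svg=True)
skunk.display(svg)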

Results of three runs:
[three images, one per run]

I think this may be caused by the randomness in how the space is sampled. Would setting a random seed somewhere make the results reproducible? Beyond that, my main concern is how to interpret the results. Is there a way to make them more reliable?
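To make the question concrete, here is a minimal sketch of the two things I have in mind. It assumes the sampler draws from Python's global `random` module and/or NumPy's global RNG, which I have not confirmed for exmol. `seed_everything` and `top_k_stability` are hypothetical helpers of my own, not part of exmol, and the weight dictionaries in the usage note are made-up placeholders rather than real output.

```python
import random
from collections import Counter


def seed_everything(seed):
    """Seed the global RNGs (hypothetical helper, not an exmol function).

    Assumption: the sampling code uses Python's `random` and/or NumPy's
    global RNG. If it uses its own generator, this has no effect.
    """
    random.seed(seed)
    try:
        import numpy as np  # optional; skipped if NumPy is not installed
        np.random.seed(seed)
    except ImportError:
        pass


def top_k_stability(runs, k=5):
    """For each descriptor, return the fraction of runs in which it ranks
    among the k largest weights by absolute value.

    `runs` is a list of dicts mapping descriptor name -> weight, one dict
    per run. How to extract such a dict from an explained space is not
    shown here.
    """
    counts = Counter()
    for weights in runs:
        ranked = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
        counts.update(ranked[:k])
    return {name: counts[name] / len(runs) for name in counts}
```

The idea would be to call `seed_everything(0)` right before `exmol.sample_space(...)` to check whether a single run becomes repeatable, and then to repeat the whole pipeline over several different seeds and pass the per-run weights to `top_k_stability`. Descriptors with a fraction close to 1.0 show up in nearly every run, so I would trust those more than ones that appear only occasionally. For example, with placeholder weights `[{"a": 1.0, "b": -2.0, "c": 0.1}, {"a": 0.9, "b": -0.1, "c": 2.0}]` and `k=2`, only `"a"` is in the top two in both runs.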
