Non-deterministic model parameters across training runs #2882

@leostimpfle

Description

What happens?

Training the same model on the same data (using the DuckDB backend) produces slightly different model parameters (m/u probabilities) on each run. The differences are very small (< 1e-10), so this is likely a floating-point issue.

Although the differences are small, they make developing linkage models more cumbersome: the parameters change slightly on every run, so it is not immediately clear whether differences between runs are meaningful or just noise.

I currently often resort to rounding the model parameters in splink.Linker._settings_obj after training (to 10 decimals, which seems to work reliably), but this feels like a hack.
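As an illustration of the comparison problem (not Splink's API; the parameter dicts below are hypothetical stand-ins for extracted m/u probabilities), comparing parameters with an absolute tolerance instead of exact equality avoids flagging sub-1e-10 floating-point drift between runs:

```python
import math

def params_equal(run_a, run_b, abs_tol=1e-10):
    """Compare two dicts of m/u probabilities, tolerating tiny float drift."""
    if run_a.keys() != run_b.keys():
        return False
    return all(
        math.isclose(run_a[k], run_b[k], rel_tol=0.0, abs_tol=abs_tol)
        for k in run_a
    )

# Two hypothetical training runs whose parameters differ below 1e-10
run_1 = {"first_name_m": 0.874213554301, "first_name_u": 0.012345678901}
run_2 = {"first_name_m": 0.874213554355, "first_name_u": 0.012345678899}

print(params_equal(run_1, run_2))  # True: differences are below 1e-10
```

This sidesteps the rounding hack for comparison purposes, though it does not make the training itself deterministic.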

To Reproduce

reproduce_em_nondeterminism.py

OS:

MacOS

Splink version:

4.0.12

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Metadata


    Labels

    bug (Something isn't working)
