
256 fix seed #261

Open
marianovitasari20 wants to merge 14 commits into main from 256-fix-seed

Conversation

@marianovitasari20
Contributor

No description provided.

@marianovitasari20
Contributor Author

I’ve merged the latest changes from main and accepted the main version due to conflicts. Since the function I modified in this branch was changed or deleted in main, I now need to reapply my own changes, so I’m working on it again.

@github-actions

github-actions Bot commented Apr 29, 2026

Coverage report

File                                    Lines missing coverage (new statements)
icenet_mp/model_service.py              28-32, 194-208
icenet_mp/losses/weighted_mse_loss.py   23-26, 44-48

This report was generated by python-coverage-comment-action

@marianovitasari20
Contributor Author

It’s becoming nondeterministic again after I merge main into this branch. I’ve been investigating it since yesterday. I’ve only run it locally and haven’t tried it on Isambard yet.

Member

@jemrobinson left a comment

I'm not totally convinced that this is a thing we want to be doing.

If we believe that the real performance of a model training has quite a wide distribution of performance (i.e. a large spread of values across any particular metric) then what do we achieve if we manage to make this deterministic? If we want to compare "model configuration A" to "model configuration B" we would still need to run each of these 10-20 times with different seeds in order to convince ourselves that we understand the model uncertainty.

Would we be better working on running model ensembles instead?

Comment thread icenet_mp/model_service.py Outdated


class _DeterministicInterpolate:
    """Monkey-patch F.interpolate to strip antialias=True for deterministic CUDA backward."""
Member

I'm not sure this is something we want to do. We previously needed the antialias argument to avoid artifacts when resizing.

seed = int(seed)
seed_everything(seed, workers=True)
os.environ["PYTHONHASHSEED"] = str(seed)
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
Member

Reference https://docs.nvidia.com/cuda/cublas/ and explain what this means
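For reference, a minimal sketch of what a documented version of this setup could look like. The helper name and Lightning import path are assumptions rather than code from this branch; the key point is that PyTorch only allows torch.use_deterministic_algorithms(True) on CUDA 10.2+ when CUBLAS_WORKSPACE_CONFIG is ":4096:8" or ":16:8", because this fixes the cuBLAS workspace configuration so that reductions become reproducible.

import os

import torch
from pytorch_lightning import seed_everything  # or lightning.pytorch, depending on version


def enable_deterministic_mode(seed: int) -> None:
    """Sketch of a fully seeded, deterministic training setup (hypothetical helper)."""
    # Fix the cuBLAS workspace configuration so matrix reductions are bitwise
    # reproducible; required by torch.use_deterministic_algorithms(True) on
    # CUDA >= 10.2. See https://docs.nvidia.com/cuda/cublas/.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Only affects subprocesses (e.g. spawned dataloader workers); the hash seed
    # of the current interpreter is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Seed Python, NumPy, torch and (with workers=True) dataloader workers.
    seed_everything(seed, workers=True)
    # Raise an error whenever a nondeterministic CUDA kernel would be used.
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning, which can select different kernels per run.
    torch.backends.cudnn.benchmark = False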

@IFenton
Contributor

IFenton commented Apr 30, 2026

If we want to compare "model configuration A" to "model configuration B" we would still need to run each of these 10-20 times with different seeds in order to convince ourselves that we understand the model uncertainty.

My understanding was that it would allow us to compare two model configurations better, e.g. if model config A always has a lower error than model config B when they are run with the same seed then we can be more confident that model config A is a sensible thing to do. You'd still need to run each pairing a number of times with different seeds to get more of an idea of spread (/do model ensembles), but it seems like it allows a more like-for-like comparison of two configurations. Is that logic wrong @jemrobinson?

Would we be better working on running model ensembles instead?
This feels like something we should be doing going forward, but as well as, rather than instead of, the seeded comparisons.

@marianovitasari20
Contributor Author

marianovitasari20 commented Apr 30, 2026

I'm not totally convinced that this is a thing we want to be doing.

If we believe that the real performance of a model training has quite a wide distribution of performance (i.e. a large spread of values across any particular metric) then what do we achieve if we manage to make this deterministic? If we want to compare "model configuration A" to "model configuration B" we would still need to run each of these 10-20 times with different seeds in order to convince ourselves that we understand the model uncertainty.

Would we be better working on running model ensembles instead?

I think these are two separate things, @jemrobinson. First, we want to fairly investigate why a particular method behaves the way it does, and for that we need a fair, controlled comparison to draw meaningful conclusions. The deterministic mode with a fixed seed is just a tool for that; it doesn't remove uncertainty from the model itself.
Regarding ensembles, we can absolutely still do that later. Adding a seed here doesn't prevent us from running ensemble experiments; the seed is just for controlled comparisons during development. Under normal settings, we wouldn't use a seed at all.

Yes, I’m aware of the effect of removing antialiasing. I thought we could still use this to compare runs in a fully deterministic mode, and we wouldn't need to use a seed under normal settings. I take your point that ensembles could also be used for uncertainty quantification, and that they can be applied under normal settings without seeds later. However, I’m still inclined to keep the deterministic approach, as it makes it easier to compare different configurations, such as methods or loss functions. The purpose is to ensure a fair comparison. We will revert to running without a seed once we have decided which methods (e.g., loss functions or approaches) are better.

@jemrobinson
Member

Hi @marianovitasari20. I guess my underlying point is this: what does comparing two different configurations with the same seed actually tell us? Each one is a single draw from an underlying distribution with large variance, so we can't simply compare e.g. MAE for the two configurations and say that one is performing better than the other.

What's the thing you want to compare that is made easier by this?

@jemrobinson
Member

My understanding was that it would allow us to compare two model configurations better, e.g. if model config A always has a lower error than model config B when they are run with the same seed then we can be more confident that model config A is a sensible thing to do. You'd still need to run each pairing a number of times with different seeds to get more of an idea of spread (/do model ensembles), but it seems like it allows a more like-for-like comparison of two configurations. Is that logic wrong @jemrobinson?

Yes, if model A always has lower error than model B across N runs, then we can conclude that it's probably performing better. However, I think this applies whether or not you use the same seed.

@marianovitasari20
Contributor Author

My understanding was that it would allow us to compare two model configurations better, e.g. if model config A always has a lower error than model config B when they are run with the same seed then we can be more confident that model config A is a sensible thing to do. You'd still need to run each pairing a number of times with different seeds to get more of an idea of spread (/do model ensembles), but it seems like it allows a more like-for-like comparison of two configurations. Is that logic wrong @jemrobinson?

Yes, if model A always has lower error than model B across N runs, then we can conclude that it's probably performing better. However, I think this applies whether or not you use the same seed.

I understand your point; that’s why we want to run A vs. B with the same seed, and then repeat with different seeds. This still gives multiple runs, but in a fairer way.
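Concretely, the paired comparison I have in mind looks something like the sketch below; the configuration dictionaries and the train_and_evaluate helper are placeholders for illustration, not functions in this repo.

config_a = {"loss": "l1"}            # placeholder configuration A
config_b = {"loss": "weighted_mse"}  # placeholder configuration B

seeds = [0, 1, 2, 3, 4]
results = {"A": [], "B": []}
for seed in seeds:
    for name, config in [("A", config_a), ("B", config_b)]:
        # Both configurations are trained under the same seed in each repeat, so a
        # per-seed difference in the metric reflects the configuration rather than
        # the random draw; the spread across seeds still shows model uncertainty.
        results[name].append(train_and_evaluate(config, seed=seed))  # hypothetical helper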

@marianovitasari20
Contributor Author

Just as an example, I ran these experiments using the same seed (while still in fully deterministic mode, before merging main into this branch) to compare the effect of using L1 loss vs. weighted MSE loss. I still need to run multiple experiments with different seeds before drawing conclusions later, but this is a fairer way.
#258

@jemrobinson
Member

I don't quite see why it's fairer to compare different models with the same seed rather than different seeds. They're going to be drawing a different number of random numbers and using them for different purposes, so I think you're still comparing a single sample from a distribution of trained models against a single sample from a different distribution of trained models.

@marianovitasari20
Contributor Author

Discussed further during the coworking meeting with @IFenton and @jemrobinson today.

@marianovitasari20
Contributor Author

Removing _DeterministicInterpolate, the team agreed not to go with this approach. Keeping a note here for reference: if anyone wants fully deterministic behaviour in future, this function shows one way to do it. It does come with a performance cost, though personally I think it's acceptable for comparison purposes.
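A minimal sketch of the idea for future reference. This is not the exact code that was removed, just the shape of the monkey-patch; it could be used as a context manager around training, e.g. with _DeterministicInterpolate(): trainer.fit(model).

import functools

import torch.nn.functional as F


class _DeterministicInterpolate:
    """Sketch: patch F.interpolate to drop antialias=True for deterministic CUDA backward."""

    def __enter__(self):
        self._original = F.interpolate

        @functools.wraps(self._original)
        def _patched(*args, **kwargs):
            # Remove the antialias flag so the default (no antialiasing) is used;
            # the antialiased backward pass has no deterministic CUDA implementation.
            kwargs.pop("antialias", None)
            return self._original(*args, **kwargs)

        # Rebind the module attribute, so callers using torch.nn.functional.interpolate
        # (or "import torch.nn.functional as F") pick up the patched version.
        F.interpolate = _patched
        return self

    def __exit__(self, exc_type, exc, tb):
        F.interpolate = self._original  # always restore the original implementation
        return False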

@jemrobinson
Member

Removing _DeterministicInterpolate, the team agreed not to go with this approach. Keeping a note here for reference: if anyone wants fully deterministic behaviour in future, this function shows one way to do it. It does come with a performance cost, though personally I think it's acceptable for comparison purposes.

To be clear - performance cost isn't a problem, but removing the "antialiasing" argument means that the behaviour of the model is changed.

@marianovitasari20
Contributor Author

marianovitasari20 commented Apr 30, 2026

Removing _DeterministicInterpolate, the team agreed not to go with this approach. Keeping a note here for reference: if anyone wants fully deterministic behaviour in future, this function shows one way to do it. It does come with a performance cost, though personally I think it's acceptable for comparison purposes.

To be clear - performance cost isn't a problem, but removing the "antialiasing" argument means that the behaviour of the model is changed.

Sorry for the confusion: by "performance cost" I meant model quality, not compute. Stripping antialias changes the interpolation behaviour, which could affect results. So yes, we're agreeing on the same thing.

@marianovitasari20
Contributor Author

I've removed it now.

antialias=True → better signal processing, less aliasing, but nondeterministic
antialias=False → slightly worse interpolation quality, but deterministic

In most ML cases, the difference in quality is negligible compared to reproducibility and stable debugging.
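As an illustration of the trade-off (assuming bilinear downsampling on a CUDA device; not the exact call used in model_service.py):

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64, device="cuda", requires_grad=True)

# antialias=True: better signal processing when downsampling (less aliasing),
# but the CUDA backward pass is nondeterministic, so repeated runs with the
# same seed can still diverge.
y_aa = F.interpolate(x, scale_factor=0.5, mode="bilinear", antialias=True)

# antialias=False (the default): slightly worse resizing quality, but compatible
# with torch.use_deterministic_algorithms(True).
y_det = F.interpolate(x, scale_factor=0.5, mode="bilinear", antialias=False)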

After more testing, I’m actually still not fully convinced we should remove this, but we can merge it for now rather than delay things further due to disagreement.

@jemrobinson
Member

Can you add some plots/metrics here for discussion?
