Link `ComponentJobUtilization` and `Invoice` by chrisvanrun · Pull Request #4718 · DIAGNijmegen/rse-grand-challenge

chrisvanrun · 2026-05-15T14:30:32Z

Part of pitch:

https://github.com/DIAGNijmegen/rse-roadmap/issues/475#issuecomment-4175366634

This changeset introduces a link between ComponentJobUtilization and a challenge's Invoice, enabling compute costs to be tracked and summed on a per-invoice basis.

Invoice Linking

Active invoices are resolved via Challenge.active_invoice, which raises an error if the challenge has insufficient funds.

EvaluationUtilization is linked at evaluation creation time (create_evaluation), which is triggered by:

Creating a submission
Re-evaluating via Django admin (site admin)
Re-evaluating via view (editor)

In all three cases, the user receives an appropriate error message if the challenge has insufficient funds.
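For readers who want the resolution step spelled out, here is a minimal sketch in plain Python rather than Django. Only the `active_invoice` name and its raise-on-insufficient-funds behaviour come from the description. The dataclasses, the `InsufficientFundsError` name, the balance fields, and the choice of "earliest-expiring invoice with remaining balance" are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


class InsufficientFundsError(Exception):
    """Hypothetical error type; the PR only says that an error is raised."""


@dataclass
class Invoice:
    expires_on: date
    budget_millicents: int
    utilized_millicents: int = 0

    @property
    def balance(self) -> int:
        return self.budget_millicents - self.utilized_millicents


@dataclass
class Challenge:
    invoices: list = field(default_factory=list)

    @property
    def active_invoice(self) -> Invoice:
        # Assumption: the earliest-expiring invoice with balance left wins.
        for invoice in sorted(self.invoices, key=lambda i: i.expires_on):
            if invoice.balance > 0:
                return invoice
        raise InsufficientFundsError("The challenge has insufficient funds.")


def create_evaluation(challenge: Challenge) -> dict:
    # The invoice is resolved first, so an unfunded challenge raises
    # before any evaluation or utilization record exists.
    invoice = challenge.active_invoice
    return {"utilization_invoice": invoice}
```

Because the lookup happens before anything is created, each of the three entry points can catch the error and turn it into a user-facing message.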

JobUtilization is linked when creating jobs for an evaluation. A fresh active invoice is fetched per job — if none is found, the Job is marked FAILURE with a descriptive error message.

WarmJobUtilization is linked after Job completion.

Compute Cost Caching

update_challenge_compute_costs now sums linked utilizations on a per-invoice basis. The existing Challenge-level compute_cost_euro_millicents field is kept intact for now — removing it and its related paths will be addressed in a follow-up PR.
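The per-invoice summation amounts to a group-by over linked utilizations. A rough sketch in plain Python, assuming each utilization is reduced to an `(invoice_id, cost_millicents)` pair and that unlinked rows are skipped; both the pair representation and the helper name are hypothetical and not taken from the changeset:

```python
from collections import defaultdict


def sum_costs_per_invoice(utilizations):
    """Sum compute costs per invoice id; unlinked rows (None) are skipped."""
    totals = defaultdict(int)
    for invoice_id, cost_millicents in utilizations:
        if invoice_id is None:
            # Assumption: rows not yet linked to an invoice are not counted.
            continue
        totals[invoice_id] += cost_millicents
    return dict(totals)
```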

Management Command: `link_challenge_utilization_to_invoice`

Since existing challenges have no invoice-utilization linkage, a management command is introduced to backfill this (rather than a one-off migration), with the longer-term goal of enforcing a constraint that phase, challenge, and invoice are always set together on utilizations.

The command reports cleanly which challenges lack invoices.

amickan · 2026-05-20T10:55:21Z

+            Q(job_utilizations__invoice__isnull=True)
+            | Q(evaluation_utilizations__invoice__isnull=True)
+            | Q(job_warm_pool_utilizations__invoice__isnull=True)
+        ).distinct()


This is going to be a pretty expensive query. Unsure how bad it really is, but maybe replacing it with 3 simple indexed scans on individual tables would be better? Something like this:

challenge_ids = set()
for utilization_model in CHALLENGE_UTILIZATION_MODELS:
    ids = (
        utilization_model.objects
        .filter(invoice__isnull=True)
        .values_list("challenge_id", flat=True)
        .distinct()
    )
    challenge_ids.update(ids)

challenges = Challenge.objects.filter(pk__in=challenge_ids)
for challenge in challenges:
    ...  # existing per-challenge loop body

To make this more efficient, you could pre-compute the challenge to invoice mapping:

invoice_by_challenge = {
    inv.challenge_id: inv
    for inv in Invoice.objects.order_by("challenge_id", "expires_on", "created")
}

That saves you from repeating the same invoice lookup 300+ times.

I'm also still wondering if looping over utilizations would be better. It would drastically reduce the number of DB queries to execute, but the downside is that it would require batching to work, which is less easy to read and follow.

But actually, if you precompute the challenge-to-invoice mapping, you don't need your expensive initial query. You can just loop over all challenges regardless. So I think that is the most efficient way of going about this after all.

for challenge in Challenge.objects.all():
    invoice = invoice_by_challenge.get(challenge.pk)
    if not invoice:
        missing_invoice_challenges.append(challenge.short_name)
        continue
    for model in CHALLENGE_UTILIZATION_MODELS:
        model.objects.filter(
            challenge=challenge,
            invoice__isnull=True,
        ).update(invoice=invoice)

Updated, and I think resolved.

I assume you have tested the command, but of course those tests will never be a true representation of the production DB, so it's hard to test the efficiency of the command.

Indeed. Tbh, I did not bother adding 100k worth of utilizations as the local dev instance does not match the production in terms of performance. Rather I focused on variants of having invoices and missing invoices.

I think we can reduce the number of round trips to the DB on subsequent calls if we first get a cheap set of challenge_ids that have empty invoice FKs in the 3 utilization tables. The 1248 round trips on every call seem like a lot to me.

I don't think we can do a 'quick' scan to gather all challenge_ids. The first time around it will be a near-full table scan =/ per utilization table.

Exactly, I don't think that is possible.

I think 1248 DB round trips is not too bad, as long as each call is manageable. I believe it is ok given that the filter can use indexes, but I don't know for sure.

It might be nice to print out how many rows the update call touched. Then we know that it did something or (on subsequent calls of the command) that it didn't.

Smart. Done.

I discussed the migration approach synchronously with James. Looping over utilizations is the better approach here since the filter on invoice__isnull=True is very cheap and allows us to skip already processed rows on subsequent runs. When looping over challenges, we filter utilizations by invoice and challenge. Even though both fields are indexed, they are not indexed together (there is no composite index for these two fields), and so the filter is much less efficient. That combined with the fact that we'd redo these queries unnecessarily on subsequent runs, makes this approach less efficient.
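To make the "loop over utilizations" idea concrete, here is a rough, database-free sketch. Rows are plain dicts standing in for utilization records, `invoice_by_challenge` maps a challenge id to an invoice id, and `batch_size` is an arbitrary illustrative value; none of these names or shapes are taken from the actual command.

```python
from itertools import islice


def link_utilizations(rows, invoice_by_challenge, batch_size=1000):
    """Link unlinked rows in batches; returns the number of rows updated."""
    updated = 0
    # Only rows without an invoice are visited, so rows linked on an
    # earlier run are skipped on later runs.
    pending = (row for row in rows if row["invoice_id"] is None)
    while batch := list(islice(pending, batch_size)):
        for row in batch:
            invoice_id = invoice_by_challenge.get(row["challenge_id"])
            if invoice_id is not None:
                row["invoice_id"] = invoice_id
                updated += 1
    return updated
```

Returning the count gives the command something to report, which also shows whether a repeat run changed anything.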

amickan · 2026-05-20T12:02:29Z

I still have some comments about the management call, but otherwise this looks good to me now. Would still be good if @koopmant could take a look as well.

koopmant · 2026-05-22T14:16:48Z

Would still be good if @koopmant could take a look as well.

Still waiting for @koopmant's 2 cents though. Good to have someone else look at this with a fresh perspective.

Sorry, I thought I would be able to look at this sooner, but kept getting stuck in the weeds in my own pitch. I can look at this next week.

koopmant

It would have been better if this had been split up into separate PRs. The three headings in the description nicely describe three possible PRs. There is a lot going on here and it was a bit much for me.

Some thoughts:
Do we have to assign the invoice on creation? What if we set the invoice when we set the duration on the utilization; if there is no available budget at that time, we set it at the last invoice. (I don't assume we can quickly update the budget for an invoice at that time?) It might still be a small improvement over assigning early?
If we do assign it on creation, I think adding it as an argument for create() would be nicer, see this next comment.

the intent here is to create a Job and mark it as a failure if no invoice can be found

Maybe this is discussed on the pitch, but I don't fully understand why this is necessary. I.e., why not let job creation fail and update the evaluation with the error message? I like the implementation of getting the active invoice or an error if there are no funds. By adding the invoice to the Job create call, we'd prevent being able to create jobs when there are no funds. Now you have to check afterwards and not forget it... Once a job is created we will not check the funds again, will we?

Regarding management commands in general. When do we use a command and when a data migration? Data migrations can be run multiple times as well, can't they?

koopmant · 2026-05-26T13:59:24Z

            )
+            jobs.append(
+                job
+            )  # Keep track of created jobs, even if utilization setup has failed


Suggested change

-            )  # Keep track of created jobs, even if utilization setup has failed
+            )

The comment might be confusing. I started rewriting it until I realized this comment is not really necessary at all; the code explains itself.

This code expects that the returned jobs are the ones that have been scheduled, so the utilisation set up must not fail.

amickan · 2026-05-27T08:37:56Z

+                        status=Job.FAILURE,
+                        error_message="Job cannot be executed. The challenge has insufficient budget to run this job.",
+                    )
+                    break


With James' comment here, this will need to be changed. The invoice will need to be passed along as a kwarg to this function.

amickan · 2026-05-27T08:44:06Z

Do we have to assign the invoice on creation? What if we set the invoice when we set the duration on the utilization; if there is no available budget at that time, we set it at the last invoice. (I don't assume we can quickly update the budget for an invoice at that time?) It might still be a small improvement over assigning early?

Setting it after the fact would potentially allow for more unbudgeted submissions (e.g. on deadline day, parallel submissions), right?

amickan · 2026-05-27T08:45:58Z

Regarding management commands in general. When do we use a command and when a data migration? Data migrations can be run multiple times as well, can't they?

A data migration would only run once. They are not meant to be run multiple times.

jmsmkn · 2026-05-27T13:30:53Z

 def update_compute_cost_euro_millicents(
    *,
    obj,
+    obj_field,


Why is this new kwarg necessary?

I see now. The construction indicates a design problem. Please rename compute_costs_utilized_euros_millicents to compute_cost_euro_millicents on invoices.

This is for several reasons:

This is the name suggested by the method being used here: update_compute_cost_euro_millicents

Internal consistency: attributes that represent the same thing should be called the same thing

Historical consistency: we already have an existing name for this property

Simplified implementation: reduced complexity in passing and setting these attributes, no need for extra kwargs or setattr

Reduced cognitive load: developers do not need to think about what objects they are dealing with when accessing this attribute
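A small sketch of the "simplified implementation" point, using stand-in dataclasses instead of the real models. The `utilization_costs` parameter and the function body are assumptions for illustration; only the function name, the keyword-only `obj` argument, and the attribute name come from the review.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    compute_cost_euro_millicents: int = 0


@dataclass
class Invoice:
    compute_cost_euro_millicents: int = 0


def update_compute_cost_euro_millicents(*, obj, utilization_costs):
    # With one shared attribute name there is no need for an extra
    # obj_field kwarg or a setattr() call to pick the target field.
    obj.compute_cost_euro_millicents = sum(utilization_costs)
```

The same call then works for either object type, for example `update_compute_cost_euro_millicents(obj=Invoice(), utilization_costs=[1, 2])`.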

The name was chosen deliberately to contrast it with the existing Invoice.compute_costs_euros. Having these side by side is easy to confuse:

Invoice.compute_costs_euros

Invoice.compute_cost_euro_millicents

Challenge.compute_cost_euro_millicents will eventually be removed as it can be replaced by a simple query on the Invoice table.

Rather than renaming the Invoice.compute_costs_utilized_euros_millicents, would it make more sense to rename the Phase.compute_cost_euro_millicents to align with it (i.e. Phase.compute_costs_utilized_euros_millicents)?

That cannot be done without downtime and/or a data migration, so rename the new one.

koopmant · 2026-05-28T07:31:17Z

Do we have to assign the invoice on creation? What if we set the invoice when we set the duration on the utilization; if there is no available budget at that time, we set it at the last invoice. (I don't assume we can quickly update the budget for an invoice at that time?) It might still be a small improvement over assigning early?

Setting it after the fact would potentially allow for more unbudgeted submissions (e.g. on deadline day, parallel submissions), right?

Not really, because just linking the utilization does not change the budget. Only once the utilization has a value for what's been utilized (i.e. when we set the duration) and only after the invoice has been updated will the budget change.

chrisvanrun · 2026-05-28T10:52:33Z

It would have been better if this had been split up into separate PRs. The three headings in the description nicely describe three possible PRs. There is a lot going on here and it was a bit much for me.

Maybe a feature branch then? The coupling could have gone in a different (tiny) PR, but the rest would need to go in in one go.

Not really, because just linking the utilization does not change the budget. Only once the utilization has a value for what's been utilized (i.e. when we set the duration) and only after the invoice has been updated will the budget change.

I agree with that note. We could add an increment of the overall utilization at the time of setting the duration to the utilized invoice. That would not absolve you from doing a periodic consolidation as that is far simpler than covering all corner cases where the overall sum should be adjusted.

With the change that everything involving a single evaluation will be booked on the same invoice, it's really a lot easier to set it at the time of creation.

chrisvanrun · 2026-05-28T10:54:17Z

Quick to-do here:

Rename cache value.
Use evaluation invoice for all jobs. Don't have jobs fail on account of insufficient budget.
(optional) update management command to loop over utilizations.

chrisvanrun added 30 commits May 15, 2026 13:10

Move compute_balance calcuation to python

42bfa7e

Refactor computation for readability

55df239

Add test for multiple challenges

da01369

Return invoices

e1cff9f

Push invoice compute balance to model

dbdf0b9

Move compute_balance calcuation to python

481410b

Refactor computation for readability

a3e05c2

Add test for multiple challenges

e30357f

Add FK from utilizations to (Chalenge) invoices

4fd8c03

Add utils invoice to the admin views

a1f804c

Add management command for linking utilizations

00f2518

WIP available compute

52a9119

Fix get_invoice_for_utilization

eb4fcc8

Add utilization computation

67b7e65

Re-introduce general challenge-level costs cache

ea2ee18

Add challenge-level with compute_balance

aeded35

Add invoice argument to create_evaluation

8a0e697

Add invoice handling to reevaluate_submissions

e9cdee8

Refactor get_invoice_for_utilization

28dbd6c

Handle ripple from refactor

46bc8e5

Add invoice handling to reevaluate view

717ed43

Random refactor

616630d

Fix rebase misses

86a7c3b

Missed rebase change

3dc5607

Refactor

2d716c0

Change when utilization is created

100bcb6

Rename method into active_invoice

26dbb58

Add test for SubmissionForm invoice usage

b05492a

Make active invoice uncached

d702e8a

Assign active invoice to evaluation related jobs

7a12dbf

chrisvanrun added 7 commits May 19, 2026 13:19

Make error message clear so admins contact support

ada1dfa

Remove arguments and invoice creations

738f5e1

Revert manager creation adding of invoices

aae7895

Remove kwargs only

ea9c4f9

Correctly calculate utlization

77601c6

Remove cached property, for now

10a52df

Fix check on error message

f3c78fd

amickan reviewed May 20, 2026

koopmant self-assigned this May 20, 2026

chrisvanrun added 3 commits May 21, 2026 11:27

Remove cached-property tests (for now)

734e091

Rewrite command

2e74309

Add counter

5b63538

chrisvanrun removed their assignment May 21, 2026

chrisvanrun added 4 commits May 21, 2026 12:24

Minor improve reporting on progress

1fd0a1f

Report updated rows.

b2aa2e9

Minor reporting update

6d85474

Merge branch 'main' into link-utilization-and-challenge-invoices

bec29cf

koopmant reviewed May 26, 2026

amickan reviewed May 27, 2026

jmsmkn reviewed May 27, 2026

chrisvanrun marked this pull request as draft May 28, 2026 10:52

chrisvanrun assigned chrisvanrun and unassigned koopmant May 28, 2026


4 participants
