Fix group-wise evaluation #12

naga-karthik · 2024-06-10T16:00:51Z

Currently, the wrapper script compute_metrics_reloaded.py averages the results over all cases (i.e. prediction and GT both empty, prediction empty but GT not empty, etc.).

When pred and GT are both empty, the DSC is set to 1 automatically (which is correct, as the model has rightly learned not to output a false positive). BUT a large number of these cases skews the average DSC in such a way that we don't know how the model performs in cases where there is a lesion (i.e. does it predict the whole lesion, does it predict it only partially, etc.).

SO, for this, we want to separate the evaluation of results into two cases: (1) when the GT is not empty (and then average the results), and (2) when the GT is empty, maybe compute the false positive rate.
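A minimal sketch of the proposed split, for illustration only. It is not the code in compute_metrics_reloaded.py: the function names `dice` and `groupwise_summary` are hypothetical, and the false positive rate is assumed here to be case-level (the fraction of empty-GT cases where the prediction is non-empty), which is one possible reading of case (2).

```python
import numpy as np


def dice(pred, ref):
    """Dice score between two binary masks; 1.0 when both are empty."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Both masks empty: the model correctly predicted nothing.
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def groupwise_summary(pairs):
    """Summarize (pred, ref) pairs separately for empty and non-empty GT.

    Hypothetical helper: averages DSC only over cases with a non-empty GT,
    and reports a case-level false positive rate over cases with an empty GT.
    """
    dsc_ref_non_empty = []
    n_ref_empty = 0
    n_false_positive = 0
    for pred, ref in pairs:
        pred = np.asarray(pred, dtype=bool)
        ref = np.asarray(ref, dtype=bool)
        if ref.any():
            dsc_ref_non_empty.append(dice(pred, ref))
        else:
            n_ref_empty += 1
            # Any predicted voxel on an empty GT counts as a false positive case.
            n_false_positive += int(pred.any())
    nan = float("nan")
    return {
        "n_ref_non_empty": len(dsc_ref_non_empty),
        "n_ref_empty": n_ref_empty,
        "mean_dsc_ref_non_empty": float(np.mean(dsc_ref_non_empty)) if dsc_ref_non_empty else nan,
        "false_positive_rate_ref_empty": n_false_positive / n_ref_empty if n_ref_empty else nan,
    }


# Toy example: one lesion case, one correctly empty case, one false positive case.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([0, 0, 0, 0], [0, 0, 0, 0]),
    ([0, 1, 0, 0], [0, 0, 0, 0]),
]
summary = groupwise_summary(pairs)
naive_mean_dsc = float(np.mean([dice(p, r) for p, r in pairs]))
```

In the toy example the naive mean mixes the automatic DSC of 1 from the empty/empty case with the lesion case, whereas `summary` keeps the lesion-only average and the empty-GT behaviour apart.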

credit to Julian McGinnis who started this discussion!


naga-karthik added 10 commits June 10, 2024 11:49

parallelize metrics computation across subjects using

c37a128

add function to find sub, ses, chunk

edc4e9c

add condition depending on chunks/stitched masks

9f681ee

minor modifications

6645b34

fix duplication of for loop by Julian

e9793c0

do not compute the average metrics for subjects where preds and refs are empty

0939a62

add tqdm progress bar; remove print statements

32c2939

fix inputs in case where ref and pred are individual files

29545a7

convert multi-index columns to flat index

26e3629

fix bugs in inputs for stitched images

afd94d5
