Commit 8a67cf6
Feature(Next-Gen): Patch filtering rework (#832)
## Disclaimer
- [ ] I am an AI agent.
- [ ] I have used AI and I thoroughly reviewed every line.
- [x] I have not used AI extensively.
## Description
> [!NOTE]
> **tldr**: This makes the patch filtering in the NG-Dataset more
efficient by using the `StratifiedPatchingStrategy` to give selected
sampling regions in the data a reduced probability of being sampled in
each epoch.
### Background - why do we need this PR?
The old patch filtering implementation used the `RandomPatchingStrategy`
to keep resampling patches until a patch contained signal. This is
inefficient when the data contains a large amount of background. In the
new implementation the filtering happens once at the start of training,
and there is no need to define a "patience".
### Overview - what changed?
Quite a lot...
- The attributes `patch_filter` and `coord_filter` have been removed
from the `CAREamicsDataset`, and it is also no longer initialised with
masks. This also means all functionality for filtering within the
dataset has been removed.
- Instead, the filtering process happens in `dataset_ng/filter_bg.py`,
which has two functions: `filter_background` for filtering with a filter
and `filter_background_with_mask` for filtering with a mask.
- The `create_train_val_datasets` and `create_val_split_dataset` factory
functions have been refactored slightly to reduce code duplication. The
`train_dataset` is first created with the new `create_train_dataset`,
within which the data is filtered.
- The `StratifiedPatchingStrategy` has a new method, `set_region_probs`,
that sets the probability that a sampling region will be sampled from
during an epoch. Reducing the probability of any region also requires
reducing the number of patches per epoch.
- `filtered_patch_prob` and `filter_ref_channel` have been added to the
NG data config.
- The ref channel was added for the case of multiple channels.
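As a rough illustration of how a filter, a reference channel, and a reduced probability could fit together, here is a minimal sketch. It is not the code in this PR: the function `region_probs_from_filter`, the mean-intensity criterion, the `threshold` argument, and the nested-list data layout are all assumptions made for the example. Only the names `filtered_patch_prob` and `filter_ref_channel` come from the description above.

```python
from __future__ import annotations

from statistics import fmean


def region_probs_from_filter(
    regions: list[list[list[float]]],
    threshold: float,
    filtered_patch_prob: float,
    filter_ref_channel: int = 0,
) -> list[float]:
    """Return one sampling probability per region (hypothetical helper).

    Each region is a list of channels, and each channel is a flat list of
    pixel values. This layout is an assumption made to keep the sketch small.
    """
    probs = []
    for region in regions:
        # Only the reference channel decides whether a region holds signal.
        reference = region[filter_ref_channel]
        has_signal = fmean(reference) >= threshold
        # Regions that fail the filter are kept, but sampled less often.
        probs.append(1.0 if has_signal else filtered_patch_prob)
    return probs


regions = [
    [[0.0, 0.1, 0.0, 0.1], [5.0, 5.0, 5.0, 5.0]],  # channel 0 is background
    [[0.9, 0.8, 1.0, 0.7], [0.0, 0.0, 0.0, 0.0]],  # channel 0 has signal
]
probs = region_probs_from_filter(regions, threshold=0.5, filtered_patch_prob=0.1)
print(probs)  # [0.1, 1.0]
```

Switching `filter_ref_channel` to `1` reverses the result for this toy data, which is the multi-channel case the reference channel is meant to handle.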
### Implementation - how did you implement the changes?
- The idea is that a filtering function can be applied to each sampling
region in the `StratifiedPatchingStrategy`; if a region doesn't pass
some threshold, the probability of it being sampled from in an epoch can
be reduced.
- The patching strategy instance only needs to store the probability of
each region, which doesn't take much memory.
- When calculating the total probabilities and the sampling bins, the
adjusted areas of the regions can be used. The adjusted area of a region
is its area multiplied by its new probability.
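The adjusted-area idea can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code in `stratified_patching.py`: the function names, the use of `math.ceil` for the patch count, and the flat list of region areas are assumptions.

```python
from __future__ import annotations

import bisect
import math
import random
from itertools import accumulate


def adjusted_areas(areas: list[float], probs: list[float]) -> list[float]:
    """Area of each region scaled by its sampling probability."""
    return [area * prob for area, prob in zip(areas, probs)]


def n_patches(areas: list[float], probs: list[float], patch_area: float) -> int:
    """Patches per epoch; rounding up is an assumption of this sketch."""
    return math.ceil(sum(adjusted_areas(areas, probs)) / patch_area)


def sampling_bins(areas: list[float], probs: list[float]) -> list[float]:
    """Cumulative, normalised adjusted areas; the last bin edge is 1.0.

    Assumes at least one region has a non-zero probability.
    """
    cumulative = list(accumulate(adjusted_areas(areas, probs)))
    return [edge / cumulative[-1] for edge in cumulative]


def sample_region(bins: list[float], rng: random.Random) -> int:
    """Pick a region index with probability proportional to its adjusted area."""
    return bisect.bisect_right(bins, rng.random())


areas = [64 * 64] * 4            # four equally sized regions
probs = [1.0, 0.25, 1.0, 0.25]   # regions 1 and 3 failed the filter
print(n_patches(areas, [1.0] * 4, patch_area=32 * 32))  # 16 without filtering
print(n_patches(areas, probs, patch_area=32 * 32))      # 10 with filtering

rng = random.Random(0)
bins = sampling_bins(areas, probs)
counts = [0] * len(areas)
for _ in range(10_000):
    counts[sample_region(bins, rng)] += 1
print(counts)  # regions 1 and 3 are drawn roughly 4x less often
```

Only one probability per region is stored, and the bins are recomputed from areas and probabilities, which matches the memory argument made in the list above.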
## Changes Made
### New features or files
- `src/careamics/dataset_ng/filter_bg.py`
  - `filter_background`
  - `filter_background_with_mask`
### Modified features or files
- `src/careamics/dataset_ng/factory.py`
- `src/careamics/dataset_ng/patching_strategies/stratified_patching.py`
### Removed features or files
- filtering from `CAREamicsDataset`
## How has this been tested?
- Added unit tests that check the number of patches is reduced in the
`StratifiedPatchingStrategy` when the method `set_region_probs` is called.
- Added functional tests for filtering background at a few different
layers:
  - The patching strategy layer
  - The `filter_background` and `filter_background_with_mask` layer
  - The `create_train_dataset` layer

Still missing is a test for the lightning module layer or an e2e test
with filtering.
## Related Issues
- Resolves #741
## Breaking changes
- The data config has changed: `filter_patience` has been removed
- Masks are no longer passed to the dataset
## Additional Notes and Examples
### Still TODO:
- Clean up patch filtering configs and filtering classes; we no longer
need filtering patience or the probability.
- I am considering removing the option to have different "coord filters"
(we don't have any others) because a mask filter config is required to
pass masks to `create_train_dataset`.
---
**Please ensure your PR meets the following requirements:**
- [x] Code builds and passes tests locally, including doctests
- [x] New tests have been added (for bug fixes/features)
- [x] Pre-commit passes
- [ ] PR to the documentation exists (for bug fixes / features)
---------
Co-authored-by: Joran Deschamps <6367888+jdeschamps@users.noreply.github.com>
1 parent 2770149 commit 8a67cf6
29 files changed
Lines changed: 951 additions & 437 deletions
File tree
- src/careamics
- config/data
- patch_filter
- dataset_ng
- patch_filter
- patching_strategies
- tests
- dataset_ng
- dataset
- patching_strategies
- functional/dataset_ng
- patching_strategies
- lightning/dataset_ng
- unit
- config
- data
- dataset_ng
- patching_strategies