Firstly, thank you for your incredible work in the creation of PanopTILs (and NuCLS/BCSS). I really appreciate the effort you've put into capturing these datasets and making them publicly available.
Following investigation of the provided images in the PanopTILs dataset, I believe that both the manual and bootstrapped region images (and hence annotations) are not captured at 0.25MPP (as indicated on the website and in the filenames), and are closer to 0.3MPP.
To illustrate this, take the WSI: TCGA-A1-A0SP-DX1.
For one of the "Manual Labels" regions: TCGA-A1-A0SP-DX1_left-7582_top-55032_bottom-56251_right-8801
The left/top/bottom/right coordinates correspond to the pixel location in the original WSI (which I've verified by loading the original SVS into QuPath). The MPP of this WSI is 0.2521.
Hence, both the width and height of this region are 1219px (8801 - 7582 and 56251 - 55032) at 0.2521MPP.
Converting this to microns: 1219px x 0.2521MPP ≈ 307.31µm.
Given the images in the dataset are captured at a fixed size (1024px), the MPP of the image is actually 307.31µm / 1024px ≈ 0.3001MPP.
If the images were truly captured at 0.25MPP, then we would expect their dimensions to be 307.31µm / 0.25MPP ≈ 1229px x 1229px.
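The arithmetic above can be reproduced with a short script. This is only a sketch: `manual_region_mpp` is a hypothetical helper name, and it assumes the filename always carries `left-`, `top-`, `bottom-` and `right-` fields in WSI pixel coordinates and that the provided image is 1024px per side.

```python
import re

def manual_region_mpp(region_name, wsi_mpp, image_px=1024):
    """Return (width_px, height_px, mpp_x, mpp_y) for a manual-label region.

    Assumes the name encodes left/top/bottom/right in WSI pixels and that
    the dataset image is image_px wide and high (an assumption, not an API).
    """
    # Pull the four WSI pixel coordinates out of the region name.
    coords = {key: int(val) for key, val in
              re.findall(r"(left|top|bottom|right)-(\d+)", region_name)}
    width_px = coords["right"] - coords["left"]
    height_px = coords["bottom"] - coords["top"]
    # Physical size in microns divided by the provided image size in pixels.
    mpp_x = width_px * wsi_mpp / image_px
    mpp_y = height_px * wsi_mpp / image_px
    return width_px, height_px, mpp_x, mpp_y

name = "TCGA-A1-A0SP-DX1_left-7582_top-55032_bottom-56251_right-8801"
w, h, mpp_x, mpp_y = manual_region_mpp(name, wsi_mpp=0.2521)
print(w, h, round(mpp_x, 4), round(mpp_y, 4))
```

The WSI MPP (0.2521 here) is passed in by hand because it comes from the slide metadata, not from the filename.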
Similarly for the "Bootstrap Labels" (where the naming convention is a little different: TCGA-A1-A0SP-DX1_xmin6798_ymin53719_MPP-0.2500_xmin-0_ymin-0_xmax-1024_ymax-1024)
I've confirmed (in QuPath), that xmin and ymin correspond to the pixel location in the original WSI (hence, captured at the WSI MPP).
The filename suggests the region has a height and width of 1024px (matching the size of the provided 'rgbs') at 0.25MPP. Converting this to the original WSI MPP (0.2521), we would expect the height and width to be: 1024px x 0.25 / 0.2521 ≈ 1015px.
Looking at this in QuPath (with [xmin, ymin, xmax, ymax] WSI coordinates of: [6798, 53719, 7822, 54743], i.e. taking 1024px as an upper bound), you can see that the region captured (inner square) contains much less area than what is actually provided in the dataset image.
To investigate this further, I've estimated where the actual region captured is (outer square), and noted its pixel coordinates (at the original WSI MPP) as [6798, 53719, 8022, 54943].
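This gives that area a width and height of 1224px (8022 - 6798 and 54943 - 53719) at 0.2521MPP. Equivalent to 1224px x 0.2521MPP ≈ 308.57µm.
Given the width/height (in pixels) of the provided image is 1024px, the MPP of the image is approximately 308.57µm / 1024px ≈ 0.301MPP, again closer to 0.3MPP than 0.25MPP.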
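The same check for a bootstrap region can be sketched as follows. The helper names and the regular expression are my own assumptions based on the single filename pattern shown above; the "observed" box is the manually estimated outer square, so the result is only as accurate as that estimate.

```python
import re

# Assumed naming convention, inferred from the example filename only.
BOOTSTRAP_PATTERN = re.compile(
    r"_xmin(?P<x0>\d+)_ymin(?P<y0>\d+)_MPP-(?P<mpp>[\d.]+)"
    r"_xmin-(?P<tx0>\d+)_ymin-(?P<ty0>\d+)_xmax-(?P<tx1>\d+)_ymax-(?P<ty1>\d+)"
)

def expected_wsi_box(name, wsi_mpp):
    """WSI box [xmin, ymin, xmax, ymax] implied by the MPP stated in the name."""
    m = BOOTSTRAP_PATTERN.search(name)
    if m is None:
        raise ValueError(f"unrecognised bootstrap region name: {name}")
    x0, y0 = int(m["x0"]), int(m["y0"])
    tile_w = int(m["tx1"]) - int(m["tx0"])
    tile_h = int(m["ty1"]) - int(m["ty0"])
    # Convert the tile size from the claimed MPP to the native WSI MPP.
    scale = float(m["mpp"]) / wsi_mpp
    return x0, y0, x0 + round(tile_w * scale), y0 + round(tile_h * scale)

def observed_mpp(box, wsi_mpp, image_px=1024):
    """Effective MPP if the WSI box was resampled to image_px pixels wide."""
    xmin, _, xmax, _ = box
    return (xmax - xmin) * wsi_mpp / image_px

name = ("TCGA-A1-A0SP-DX1_xmin6798_ymin53719_MPP-0.2500"
        "_xmin-0_ymin-0_xmax-1024_ymax-1024")
print(expected_wsi_box(name, wsi_mpp=0.2521))
print(round(observed_mpp((6798, 53719, 8022, 54943), wsi_mpp=0.2521), 4))
```

Comparing the expected box against the region actually visible in the provided image is what exposes the mismatch.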
I've also redone this analysis on a separate slide to check the issue isn't isolated (TCGA-A2-A0SX-DX1). QuPath says the MPP is 0.2480.
Manual region: TCGA-A2-A0SX-DX1_left-53824_top-56708_bottom-57947_right-55063
Width x height = 1239px x 1239px (@ 0.2480MPP). Equivalent to: 307.272µm x 307.272µm
PanopTILs region: 1024px x 1024px. Physical area captured: 307.272µm x 307.272µm, i.e. 307.272µm / 1024px ≈ 0.3001MPP
If MPP was 0.25MPP, then PanopTILs region should be: 1229px x 1229px
Bootstrap region: TCGA-A2-A0SX-DX1_xmin53791_ymin56683_MPP-0.2500_xmin-0_ymin-0_xmax-1024_ymax-1024
Assuming largest height/width (i.e. 1024px specified at 0.25MPP, so converted to 0.248MPP = 1032px), we get the inner rectangle (below). This does not match what is captured in the dataset (above).
Manually estimating the actual location on the slide (larger box), gives xmax/ymax coordinates of: (55030, 57922)
This box has width x height of 1239 x 1239 (@ 0.2480MPP). Equivalent to: 307.272µm x 307.272µm
The equation then becomes exactly as above:
PanopTILs region: 1024px x 1024px. Physical area captured: 307.272µm x 307.272µm, i.e. 307.272µm / 1024px ≈ 0.3001MPP
If MPP was 0.25MPP, then PanopTILs region should be: 1229px x 1229px
Do you see the same discrepancy on your side? My suspicion is that something may have occurred during the region selection/preprocessing stage. If this finding is correct, then the main implication is that users training models on the assumption that the data is captured at 0.25MPP are actually training models that expect data closer to 0.3MPP (with potential implications when applying those models to data scaled to 0.25MPP).
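As a possible workaround, if the effective MPP really is about 0.3, the provided 1024px images would need to be upsampled to roughly 1229px to match a true 0.25MPP. A minimal sketch of that size calculation, where `size_at_target_mpp` is a hypothetical helper and the effective MPP is taken from the estimates above:

```python
def size_at_target_mpp(image_px, effective_mpp, target_mpp=0.25):
    """Pixel size an image needs so that it covers the same area at target_mpp."""
    # Physical extent stays fixed; only the sampling density changes.
    return round(image_px * effective_mpp / target_mpp)

# Effective MPP estimated from the TCGA-A2-A0SX-DX1 manual region.
effective_mpp = 1239 * 0.2480 / 1024
print(size_at_target_mpp(1024, effective_mpp))
```

Any masks would need the same resize, using nearest-neighbour interpolation so that label values are not blended.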