PanopTILs not captured at 0.25MPP #18

@Mward94

Firstly, thank you for your incredible work in the creation of PanopTILs (and NuCLS/BCSS). I really appreciate the effort you've put into capturing these datasets and making them publicly available.

Following investigation of the provided images in the PanopTILs dataset, I believe that both the manual and bootstrapped region images (and hence annotations) are not captured at 0.25MPP (as indicated on the website and in the filenames), and are closer to 0.3MPP.

To illustrate this, take the WSI: TCGA-A1-A0SP-DX1.

For one of the "Manual Labels" regions: TCGA-A1-A0SP-DX1_left-7582_top-55032_bottom-56251_right-8801

The left/top/bottom/right coordinates correspond to the pixel location in the original WSI (which I've verified by loading the original SVS into QuPath). The MPP of this WSI is 0.2521.

Hence, both the width and height of this region are 1219px (8801-7582, 56251-55032) at 0.2521MPP.
Converting this to $\mu m$, we would expect each side of the region to be $\approx307.31\mu m$ ($1219 \times 0.2521$).

Given the images in the dataset are captured at a fixed size (1024px), the MPP of the image is actually $\approx 0.3$ MPP $\left(\frac{307.31}{1024}\right)$.

If the images were truly captured at 0.25MPP, then we would expect their dimensions to be $\approx 1229 \times 1229$ px $\left(\frac{307.31}{0.25}\right)$.
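The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the helper name `manual_region_effective_mpp` and the regular expression are my own, written against the "Manual Labels" filename shown above, and the WSI MPP is passed in by hand from the QuPath reading.

```python
import re

# Hypothetical pattern for the "Manual Labels" naming scheme quoted above.
_MANUAL = re.compile(r"_left-(\d+)_top-(\d+)_bottom-(\d+)_right-(\d+)$")


def manual_region_effective_mpp(name, wsi_mpp, tile_px=1024):
    """Return (width_px, height_px, side_um, effective_mpp) for a manual region.

    width/height are in WSI pixels, side_um is the physical width, and
    effective_mpp is the MPP implied by storing that width in tile_px pixels.
    """
    match = _MANUAL.search(name)
    if match is None:
        raise ValueError(f"unrecognised region name: {name}")
    left, top, bottom, right = map(int, match.groups())
    width_px = right - left
    height_px = bottom - top
    side_um = width_px * wsi_mpp
    return width_px, height_px, side_um, side_um / tile_px


w, h, um, mpp = manual_region_effective_mpp(
    "TCGA-A1-A0SP-DX1_left-7582_top-55032_bottom-56251_right-8801",
    wsi_mpp=0.2521,  # read from QuPath, not from the filename
)
print(w, h, round(um, 2), round(mpp, 4))
```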

The same holds for the "Bootstrap Labels", where the naming convention is a little different, e.g. TCGA-A1-A0SP-DX1_xmin6798_ymin53719_MPP-0.2500_xmin-0_ymin-0_xmax-1024_ymax-1024.

Image

I've confirmed in QuPath that the first xmin and ymin in the filename correspond to the pixel location in the original WSI (hence, they are expressed at the WSI MPP).

The filename suggests the region has a height and width of 1024px (matching the size of the provided 'rgbs') at 0.25MPP. Converting this to the original WSI MPP (0.2521), we would expect the height and width to be $\approx 1015$ px $\left( 1024 \times \frac{0.25}{0.2521}\right)$, which gives WSI-relative xmax/ymax coordinates of (7813, 54734). Alternatively, assuming the height and width are already at the WSI MPP (0.2521) gives WSI-relative xmax/ymax coordinates of (7822, 54743).
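For reference, the two candidate corners can be computed as follows. `expected_corner` is a hypothetical helper of mine, not something from the PanopTILs code, and it assumes square tiles and rounding to the nearest WSI pixel.

```python
def expected_corner(xmin, ymin, tile_px, tile_mpp, wsi_mpp):
    """WSI-relative (xmax, ymax) of a square tile of tile_px pixels at tile_mpp."""
    side_wsi_px = round(tile_px * tile_mpp / wsi_mpp)
    return xmin + side_wsi_px, ymin + side_wsi_px


# Assumption 1: the 1024px tile really is at the nominal 0.25 MPP.
nominal = expected_corner(6798, 53719, 1024, tile_mpp=0.25, wsi_mpp=0.2521)

# Assumption 2: the 1024px tile is at the native WSI MPP.
native = expected_corner(6798, 53719, 1024, tile_mpp=0.2521, wsi_mpp=0.2521)

print(nominal, native)
```

Neither corner reaches the region that is actually present in the 1024px image, which is what the QuPath overlay shows.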

Looking at this in QuPath (with [xmin, ymin, xmax, ymax] WSI coordinates of: [6798, 53719, 7822, 54743]), you can see that the region captured (inner square) contains much less area than what is actually provided in the $1024 \times 1024$ px image.

Image

To investigate this further, I've estimated where the actual captured region is (outer square), and noted its pixel coordinates (at the original WSI MPP) as [6798, 53719, 8022, 54943].

This gives that area a width and height of 1224px (8022-6798, 54943-53719) at 0.2521MPP, equivalent to $\approx 308.5 \mu m$.

Given the width/height (in $\mu m$) is $308.5 \mu m$ (verified in QuPath), and the width/height of the region in PanopTILs is 1024px, then the MPP of the 'rgbs' image is actually $\approx 0.3$ MPP. This closely aligns with what I found the expected MPP to be for the "Manual" label areas.

I've also redone this analysis on a separate slide to check the issue isn't isolated (TCGA-A2-A0SX-DX1). QuPath says the MPP is 0.2480.

Manual region: TCGA-A2-A0SX-DX1_left-53824_top-56708_bottom-57947_right-55063
Width x height = 1239px x 1239px (@ 0.2480MPP). Equivalent to: 307.272 $\mu m$
PanopTILs region: 1024px x 1024px. Physical area captured: 307.272 $\mu m$ x 307.272 $\mu m$. Hence, actual MPP: $\approx 0.3$ MPP
If MPP was 0.25MPP, then PanopTILs region should be: 1229px x 1229px $\left(\frac{307.272}{0.25}\right)$

Bootstrap region: TCGA-A2-A0SX-DX1_xmin53791_ymin56683_MPP-0.2500_xmin-0_ymin-0_xmax-1024_ymax-1024

Image

Assuming largest height/width (i.e. 1024px specified at 0.25MPP, so converted to 0.248MPP = 1032px), we get the inner rectangle (below). This does not match what is captured in the dataset (above).

Image

Manually estimating the actual location on the slide (larger box), gives xmax/ymax coordinates of: (55030, 57922)
This box has width x height of 1239 x 1239 (@ 0.2480MPP). Equivalent to: 307.272 $\mu m$
The calculation is then exactly as above:
PanopTILs region: 1024px x 1024px. Physical area captured: 307.272 $\mu m$ x 307.272 $\mu m$. Hence, actual MPP: $\approx 0.3$ MPP
If MPP was 0.25MPP, then PanopTILs region should be: 1229px x 1229px $\left(\frac{307.272}{0.25}\right)$.
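To put the four measurements side by side, here is a short summary script. The box coordinates are the ones quoted in this issue (the two bootstrap boxes are my manual estimates from QuPath, so they are approximate), and the labels are just shorthand I've made up for this sketch.

```python
TILE_PX = 1024  # side length of the provided 'rgbs' images

# (label, wsi_mpp, xmin, ymin, xmax, ymax), all in WSI pixel coordinates
regions = [
    ("A0SP manual", 0.2521, 7582, 55032, 8801, 56251),
    ("A0SP bootstrap (estimated)", 0.2521, 6798, 53719, 8022, 54943),
    ("A0SX manual", 0.2480, 53824, 56708, 55063, 57947),
    ("A0SX bootstrap (estimated)", 0.2480, 53791, 56683, 55030, 57922),
]

effective = {}
for label, wsi_mpp, xmin, ymin, xmax, ymax in regions:
    side_um = (xmax - xmin) * wsi_mpp      # physical width of the box
    effective[label] = side_um / TILE_PX   # MPP implied by a 1024px tile
    print(f"{label:28s} {xmax - xmin:5d}px  {side_um:8.3f}um  {effective[label]:.4f} MPP")
```

All four boxes come out within a couple of thousandths of 0.3 MPP rather than 0.25 MPP.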

Can you reproduce this discrepancy on your side? My suspicion is that something went wrong during the region selection/preprocessing stage. If this finding is correct, the main implication is that users who train models on the assumption that the data is at 0.25MPP are actually training models that expect data closer to 0.3MPP, which may matter when those models are applied to data scaled to 0.25MPP.
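To illustrate the size of the mismatch, this sketch computes how large a tile would need to be after resampling to reach a target MPP. It assumes my $\approx 0.3$ MPP estimate is right, and `size_at_target_mpp` is a hypothetical helper for illustration only, not a proposed fix for the dataset.

```python
def size_at_target_mpp(tile_px, actual_mpp, target_mpp):
    """Side length (px) a tile needs after resampling from actual_mpp to target_mpp."""
    return round(tile_px * actual_mpp / target_mpp)


# A 1024px tile at ~0.3 MPP would need upsampling to ~1229px to be at 0.25 MPP.
resampled_px = size_at_target_mpp(1024, actual_mpp=0.30, target_mpp=0.25)
print(resampled_px, round(resampled_px / 1024, 2))
```

In other words, the provided tiles would be roughly 1.2x smaller in each dimension than true 0.25MPP tiles covering the same tissue.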
