Hi, thank you for your great work on FoundationStereo — the results are very impressive, especially the zero-shot generalization across challenging real-world datasets.
While reading the paper and using the released model, I had a few clarifying questions:
⸻
- Training datasets used for the zero-shot foundation model
In Section 4.1, it is mentioned that the foundation model was trained on:
“a mixed dataset consisting of our proposed FSD, together with Scene Flow, Sintel, CREStereo, FallingThings, InStereo2K, and Virtual KITTI 2.”
However, Table 1 also summarizes and compares additional datasets such as TartanAir and IRS. Could you please confirm:
• Were TartanAir or IRS used in any version of the model training?
• If not, is there a reason for excluding them (e.g., limited benefit, domain mismatch, or data quality concerns)?
⸻
- Released model: is it the zero-shot foundation model in the paper?
You have kindly released a pretrained FoundationStereo model. Could you please clarify:
• Does this released checkpoint correspond to the zero-shot foundation model described in the paper?
• Was it trained only on the datasets listed in Section 4.1, without using any of the evaluation/test sets (Middlebury, ETH3D, KITTI 2012/2015)?
⸻
- Table 2 results: which training data was used?
In Table 2, zero-shot generalization results are reported across four datasets (Middlebury, ETH3D, KITTI-12, KITTI-15).
• Can you confirm that the results in the second block of Table 2 (i.e., your strongest results) are based only on training with the datasets listed in Section 4.1, excluding any test-domain-specific data?
This clarification would help ensure reproducibility and give confidence to others using or fine-tuning the released model.