CropHarvest dataset discrepancy #19

@jleben

I am trying to reproduce the Galileo results on the CropHarvest benchmarks and am seeing discrepancies in both the training and test data.

Specifically, I am using the latest cropharvest Python package, version 0.7.0.
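
For reference, the installed version can be confirmed with something like the sketch below. It uses only the standard library's `importlib.metadata`; the helper name `installed_version` is my own, and I am assuming the distribution name is `cropharvest`, matching the package name above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is published under the name "cropharvest".
print("cropharvest version:", installed_version("cropharvest"))
```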

I retrieve data like this:

datasets = CropHarvest.create_benchmark_datasets("/mnt/scratch/home/jakob.leben/workspace/cropharvest/data",

For each dataset in datasets, I then retrieve the train set with dataset.as_array() and the test set with dataset.test_data.

I get the following:

  • Togo train: 1290 samples, 581 negative, 709 positive. This differs from the 1319 samples mentioned in the Galileo and CropHarvest papers.
  • Togo test: matches papers
  • Brazil train: 4253 samples, 4223 negative, 30 positive. This differs from the 794 samples (773 negative, 21 positive) mentioned in the papers.
  • Brazil test: 537454 samples, 363428 negative, 174026 positive. The positive count matches the CropHarvest paper, but the negative count does not.
  • Kenya train: 6605 samples, 6341 negative, 264 positive. This differs from the 1345 samples (1079 negative, 266 positive) in the papers.
  • Kenya test: matches papers
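
The counts above come from tallying the binary labels of each split. Below is a minimal sketch of that tally and of the comparison against the paper figures. The helper names `summarize_labels` and `diff_against_reference` are hypothetical, the labels are assumed to be 0 (negative) and 1 (positive), and the example uses synthetic labels sized to the Brazil train counts reported above rather than real CropHarvest data.

```python
from collections import Counter


def summarize_labels(labels):
    """Count total, negative (0) and positive (1) labels in an iterable."""
    counts = Counter(int(v) for v in labels)
    return {
        "total": sum(counts.values()),
        "negative": counts[0],
        "positive": counts[1],
    }


def diff_against_reference(observed, reference):
    """Return observed minus reference for every key in the reference."""
    return {key: observed[key] - reference[key] for key in reference}


# Synthetic stand-in for the Brazil train labels I observe (not real data).
brazil_train_labels = [0] * 4223 + [1] * 30
observed = summarize_labels(brazil_train_labels)

# Brazil train counts as stated in the papers.
paper = {"total": 794, "negative": 773, "positive": 21}

print(observed)
print(diff_against_reference(observed, paper))
```

With real data, the label array of each split would be passed to `summarize_labels` in place of the synthetic list.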

I have two questions:

  • Could you please clarify the discrepancy? Do the papers reflect an older version of CropHarvest?
  • I would also like to confirm the following: based on the context, I am assuming that Table 6 in the Galileo paper reports % accuracy, although the paper does not state explicitly what those values are. Is my assumption correct?
