CropHarvest dataset discrepancy

I am trying to reproduce the Galileo results on the CropHarvest benchmarks and am seeing discrepancies in the training and test data.

Specifically, I am using the latest cropharvest Python package, version 0.7.0.

I retrieve data like this:

```
datasets = CropHarvest.create_benchmark_datasets("/mnt/scratch/home/jakob.leben/workspace/cropharvest/data",
```

And for each dataset in datasets, I retrieve data with `dataset.as_array()` (train set) and `dataset.test_data` (test set).
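Per-class counts can be tallied with something like the sketch below. `summarize_labels` is a hypothetical helper, not part of cropharvest, and the commented usage assumes that `as_array()` returns an `(X, y)` pair with binary labels (1 = crop, 0 = non-crop).

```python
from collections import Counter


def summarize_labels(labels):
    """Return (total, negatives, positives) for an iterable of binary labels."""
    # int() lets this accept plain Python numbers as well as NumPy scalars.
    counts = Counter(int(label) for label in labels)
    negatives = counts.get(0, 0)
    positives = counts.get(1, 0)
    return negatives + positives, negatives, positives


# Hypothetical usage with cropharvest (assumes as_array() returns (X, y)):
#   _, y = dataset.as_array()
#   print(dataset, summarize_labels(y))
print(summarize_labels([0, 1, 1, 0, 1]))
```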

I get the following:

- Togo train: 1290 samples (581 negative, 709 positive). This differs from the 1319 samples mentioned in the Galileo and CropHarvest papers.
- Togo test: matches the papers.
- Brazil train: 4253 samples (4223 negative, 30 positive). This differs from the 794 samples (773 negative, 21 positive) mentioned in the papers.
- Brazil test: 537454 samples (363428 negative, 174026 positive). The positive count matches the CropHarvest paper, but the negative count does not.
- Kenya train: 6605 samples (6341 negative, 264 positive). This differs from the 1345 samples (1079 negative, 266 positive) in the papers.
- Kenya test: matches the papers.
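To make the size of the gap explicit, here is a small sketch that subtracts the paper counts from the observed train-set counts for Brazil and Kenya, using only the numbers listed above. Togo is left out because only its total is given for the papers. `count_deltas` is a hypothetical helper name.

```python
# Train-set counts as (negative, positive), taken from the list above.
observed = {"Brazil": (4223, 30), "Kenya": (6341, 264)}
papers = {"Brazil": (773, 21), "Kenya": (1079, 266)}


def count_deltas(observed, papers):
    """Return observed minus paper counts per benchmark as (negatives, positives)."""
    deltas = {}
    for name, (obs_neg, obs_pos) in observed.items():
        ref_neg, ref_pos = papers[name]
        deltas[name] = (obs_neg - ref_neg, obs_pos - ref_pos)
    return deltas


for name, (d_neg, d_pos) in count_deltas(observed, papers).items():
    print(f"{name}: negatives {d_neg:+d}, positives {d_pos:+d}")
```

In both cases almost all of the difference is in the negative class, while the positive counts are close to the papers.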

I have two questions:

- Could you please clarify the discrepancy? Do the papers reflect an older version of CropHarvest?
- I would also like to confirm the following: based on the context, I am assuming that Table 6 in the Galileo paper reports accuracy in percent, although the paper does not explicitly state what those values are. Is my assumption correct?

