
Commit 7389d33

fix(root): project level stats to root
1 parent 5dc3e8e commit 7389d33

2 files changed: 19 additions, 4 deletions

dataset_card.md

Lines changed: 4 additions & 2 deletions
@@ -103,6 +103,7 @@ dataset/
 data.parquet # HuggingFace Parquet with embedded images and masks
 projects_summary.json
 projects_map.geojson
+dataset_stats.json
 ```
 
 ### Data Fields
@@ -123,8 +124,9 @@ dataset/
 **Metadata:**
 - `metadata.json`: Project-level information (TM project ID, name, imagery URL, country, validation status)
 - `aoi.geojson`: Project area of interest boundary
-- `projects_summary.json`: Summary of all included projects
-- `projects_map.geojson`: Map of all project areas
+- [`projects_summary.json`](https://huggingface.co/datasets/hotosm/vhr-building-segmentation/blob/main/projects_summary.json): Summary of all included projects
+- [`projects_map.geojson`](https://huggingface.co/datasets/hotosm/vhr-building-segmentation/blob/main/projects_map.geojson): Map of all project areas
+- [`dataset_stats.json`](https://huggingface.co/datasets/hotosm/vhr-building-segmentation/blob/main/dataset_stats.json): Aggregate dataset statistics
 
 ### Data Splits
 

src/hot_oam_dataset/hf_publisher.py

Lines changed: 15 additions & 2 deletions
@@ -202,11 +202,12 @@ def push_raw_to_hub(
     api = HfApi()
     api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
 
-    ignore = ["parquet/*", ".cache/*"]
+    root_level_files = ["projects_map.geojson", "dataset_stats.json", "projects_summary.json"]
+    ignore = ["parquet/*", ".cache/*"] + root_level_files
     if compress:
         ignore.extend(["*/chips/*", "*/masks/*", "*/labels/*"])
 
-    logger.info("Uploading raw dataset folder to %s (raw/ on main)", repo_id)
+    logger.info("Uploading raw project data to %s (raw/ on main)", repo_id)
     api.upload_folder(
         folder_path=str(dataset_dir),
         path_in_repo="raw",
@@ -215,6 +216,18 @@ def push_raw_to_hub(
         ignore_patterns=ignore,
         commit_message=f"Upload raw dataset v{dataset_version}",
     )
+
+    for filename in root_level_files:
+        filepath = dataset_dir / filename
+        if filepath.exists():
+            api.upload_file(
+                path_or_fileobj=str(filepath),
+                path_in_repo=filename,
+                repo_id=repo_id,
+                repo_type="dataset",
+                commit_message=f"Upload {filename} v{dataset_version}",
+            )
+
     logger.info("Raw dataset uploaded to https://huggingface.co/datasets/%s/tree/main/raw", repo_id)
 
 
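
The change to `push_raw_to_hub` has two effects: the three root-level files are excluded from the `raw/` folder upload, and they are then uploaded individually to the repository root if they exist locally. The sketch below illustrates the resulting layout without touching the network. It is an approximation, not the publisher's code: `plan_repo_layout` is a hypothetical helper, the `project_1/...` paths are made up for illustration, and the standard library's `fnmatch` stands in for however `huggingface_hub` evaluates `ignore_patterns`.

```python
from fnmatch import fnmatch

# Same three names as root_level_files in the diff.
ROOT_LEVEL_FILES = ["projects_map.geojson", "dataset_stats.json", "projects_summary.json"]


def plan_repo_layout(relative_paths, compress=False):
    """Map each local path (relative to dataset_dir) to its path in the repo.

    Hypothetical helper: paths matched by an ignore pattern are left out of
    the raw/ upload, and root-level files go to the repo root instead.
    """
    ignore = ["parquet/*", ".cache/*"] + ROOT_LEVEL_FILES
    if compress:
        ignore.extend(["*/chips/*", "*/masks/*", "*/labels/*"])

    layout = {}
    for path in relative_paths:
        if path in ROOT_LEVEL_FILES:
            # Uploaded individually, directly under the repo root.
            layout[path] = path
        elif not any(fnmatch(path, pattern) for pattern in ignore):
            # Everything else that is not ignored lands under raw/.
            layout[path] = f"raw/{path}"
    return layout


# Illustrative file list; the project_1 paths are assumptions.
files = [
    "dataset_stats.json",
    "projects_summary.json",
    "project_1/metadata.json",
    "project_1/chips/0001.tif",
    "parquet/data.parquet",
]

for local, remote in plan_repo_layout(files, compress=True).items():
    print(f"{local} -> {remote}")
```

With `compress=True`, the chip image and the Parquet file are skipped, the two stats files map to the repository root, and `project_1/metadata.json` maps to `raw/project_1/metadata.json`. A root-level file that is missing locally never appears in the plan, mirroring the `filepath.exists()` guard in the diff.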
