Commit 7557c28

release oc20 slabs data (#1843)
We got some interest in this data and are finally getting around to releasing it. Resolves #1842
1 parent e468b64 commit 7557c28

1 file changed

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
# Open Catalyst 2020 Surfaces Dataset

:::{card} Dataset Overview

| Property | Value |
|----------|-------|
| **Size** | 18.3M structures |
| **Purpose** | Clean catalyst surfaces |
| **Paper** | [UMA Paper](https://arxiv.org/pdf/2506.23971) |
| **License** | [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode) |
:::

## Overview

This dataset contains the clean OC20 catalyst surfaces used to train UMA. Surfaces that appear in any OC20 test split were excluded to avoid data leakage. The filtering is strict: even surfaces in the in-domain test split are excluded, so that S2EF evaluations remain fair.

## File Contents and Download

| Splits | Size (structures) | MD5 Checksum (Download Link) |
|-----------|------------|------------------------------|
| Train+Val | 18,323,074 | [f047cbd515f213d0b0925704abbb7ae5](https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20/oc20_slabs.tar.gz) |

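
The checksum in the table can be used to confirm that the archive downloaded intact before unpacking it. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_and_extract` are hypothetical, as is any local path you pass in; the layout of the files inside the archive is not described here, so the sketch simply extracts everything.

```python
import hashlib
import tarfile
from pathlib import Path

# MD5 value copied from the table above.
EXPECTED_MD5 = "f047cbd515f213d0b0925704abbb7ae5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive_path, expected_md5, out_dir):
    """Extract a .tar.gz archive only if its MD5 matches; raise ValueError otherwise."""
    actual = md5_of_file(archive_path)
    if actual != expected_md5.lower():
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(out_dir)
    return out_dir


# Hypothetical usage, assuming the archive was saved as ./oc20_slabs.tar.gz:
# verify_and_extract("oc20_slabs.tar.gz", EXPECTED_MD5, "oc20_slabs")
```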

The following metadata can be accessed in the `atoms.info` entry:

- `sid`: Unique surface system identifier
- `fid`: Frame index along the relaxation trajectory
- `fmax`: Maximum per-atom force
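
As an illustration of how these fields might be used, the sketch below groups frames by surface and filters them by force. It assumes the structures have already been loaded as objects exposing an `info` dictionary with the three keys above (for example ASE `Atoms` objects); the loading step is omitted because the on-disk format is not specified here. The function names are hypothetical, and the threshold must be given in whatever units `fmax` is stored in.

```python
from collections import defaultdict


def group_frames_by_surface(frames):
    """Group frames by `sid` and order each group by `fid`."""
    groups = defaultdict(list)
    for atoms in frames:
        groups[atoms.info["sid"]].append(atoms)
    for sid in groups:
        # Sort in place so frames follow the relaxation trajectory order.
        groups[sid].sort(key=lambda atoms: atoms.info["fid"])
    return dict(groups)


def frames_below_fmax(frames, threshold):
    """Keep only frames whose recorded `fmax` is at or below `threshold`."""
    return [atoms for atoms in frames if atoms.info["fmax"] <= threshold]
```

Grouping by `sid` first makes it straightforward to pick, say, the last frame of each relaxation, while the `fmax` filter selects frames that are close to a relaxed geometry.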
