docs: Improve supraseal setup docs (#409)
* docs: Improve supraseal setup docs

* Update documentation/en/supraseal.md

---------

Co-authored-by: LexLuthr <[email protected]>
magik6k and LexLuthr authored Feb 12, 2025
1 parent f6dec91 commit 3fce199
Showing 1 changed file with 113 additions and 85 deletions.
198 changes: 113 additions & 85 deletions documentation/en/supraseal.md
@@ -43,6 +43,22 @@ You need 2 sets of NVMe drives:
* Fast with sufficient capacity (\~70G x batchSize x pipelines)
* Can be remote storage if fast enough (\~500MiB/s/GPU)

The following table shows the number of NVMe drives required for different batch sizes. The drive count is given as `N + M`, where `N` is the number of drives for layer data (SPDK) and `M` is the number of drives for P2 output (filesystem).
The `^ iops/drive` rows show the minimum IOPS **per drive** required for the batch size above them. A batch size marked `2x` means a dual-pipeline drive setup.
The IOPS requirements are calculated simply by dividing the total target of 10M IOPS by the number of drives; in reality, depending on CPU core speed, this may be too low or higher than necessary (a worked example follows the table). When ordering a system with barely enough IOPS, plan for free drive slots in case you need to add more drives later.

| Batch Size | 3.84TB | 7.68TB | 12.8TB | 15.36TB | 30.72TB |
|--------------|--------|--------|--------|---------|---------|
| 32 | 4 + 1 | 2 + 1 | 1 + 1 | 1 + 1 | 1 + 1 |
| ^ iops/drive | 2500K | 5000K | 10000K | 10000K | 10000K |
| 64 (2x 32) | 7 + 2 | 4 + 1 | 2 + 1 | 2 + 1 | 1 + 1 |
| ^ iops/drive | 1429K | 2500K | 5000K | 5000K | 10000K |
| 128 (2x 64) | 13 + 3 | 7 + 2 | 4 + 1 | 4 + 1 | 2 + 1 |
| ^ iops/drive | 770K | 1429K | 2500K | 2500K | 5000K |
| 2x 128 | 26 + 6 | 13 + 3 | 8 + 2 | 7 + 2 | 4 + 1 |
| ^ iops/drive | 385K | 770K | 1250K | 1429K | 2500K |
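
As a worked example of how the per-drive figures above are derived (using the 128-sector batch on 7.68TB drives, i.e. 7 SPDK drives sharing the \~10M total IOPS target):

```bash
# Rough per-drive IOPS requirement = total IOPS target / number of SPDK drives
TOTAL_IOPS=10000000   # ~10M IOPS overall target
SPDK_DRIVES=7         # layer-data (SPDK) drives for a 128 batch on 7.68TB drives
echo $(( TOTAL_IOPS / SPDK_DRIVES ))   # prints 1428571, i.e. the ~1429K shown in the table
```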


## Hardware Recommendations

Currently, the community is trying to determine the best hardware configurations for batch sealing. Some general observations are:
@@ -59,6 +75,92 @@ Currently, the community is trying to determine the best hardware configurations
Please consider contributing to the [SupraSeal hardware examples](https://github.com/filecoin-project/curio/discussions/140).
{% endhint %}

## Setup

### Check NUMA setup:

```bash
numactl --hardware
```

You should expect to see `available: 1 nodes (0)`. If you see more than one node you need to go into your UEFI and set `NUMA Per Socket` (or a similar setting) to 1.
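
If you want a second confirmation (assuming `lscpu` is installed, as it is on most distributions):

```bash
lscpu | grep -i "numa node(s)"   # should report: NUMA node(s): 1
```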

### Configure hugepages:

The batch sealer needs 36 1G hugepages. Configure them by adding the following to `/etc/default/grub`:

```bash
GRUB_CMDLINE_LINUX_DEFAULT="hugepages=36 default_hugepagesz=1G hugepagesz=1G"
```

Then run `sudo update-grub` and reboot the machine.

Or at runtime:

```bash
sudo sysctl -w vm.nr_hugepages=36
```
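
Note that the `sysctl` setting does not persist across reboots. A sketch for persisting it without editing GRUB (the GRUB method above is generally preferable for 1G pages, since they are easiest to allocate early, before memory fragments):

```bash
# Arbitrary file name under sysctl.d; any *.conf file there is read at boot
echo "vm.nr_hugepages=36" | sudo tee /etc/sysctl.d/90-hugepages.conf
sudo sysctl --system   # reload sysctl settings now
```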

Then check `/proc/meminfo` to verify the hugepages are available:

```bash
cat /proc/meminfo | grep Huge
```

Expect output like:

```
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 36
HugePages_Free: 36
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
```

Check that `HugePages_Free` is equal to 36; the kernel can sometimes use some of the hugepages for other purposes.

### Dependencies

CUDA 12.x is required; 11.x won't work. The build process depends on GCC 11.x system-wide or gcc-11/g++-11 installed locally.

* On Arch, install https://aur.archlinux.org/packages/gcc11
* Ubuntu 22.04 has GCC 11.x by default
* On newer Ubuntu, install the `gcc-11` and `g++-11` packages
* In addition to the general build dependencies (listed on the [installation page](installation.md)), you need `libgmp-dev` and `libconfig++-dev` (see the example `apt` command below)
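
On newer Ubuntu, the packages from this list can be installed in one go (package names as found in the standard Ubuntu archives; on 22.04 the default `gcc`/`g++` already satisfy the GCC 11 requirement, so `gcc-11`/`g++-11` can be dropped):

```bash
sudo apt update
sudo apt install -y gcc-11 g++-11 libgmp-dev libconfig++-dev
```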


### Building

Build the batch-capable Curio binary:

```bash
make batch
```

For calibnet:

```bash
make batch-calibnet
```

{% hint style="warning" %}
The build should be run on the target machine. Binaries won't be portable between CPU generations due to different AVX512 support.
{% endhint %}
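
Since AVX512 support determines portability, it can be useful to check which AVX512 extensions the build host's CPU advertises (a generic Linux check, not Curio-specific):

```bash
grep -o 'avx512[a-z_]*' /proc/cpuinfo | sort -u
```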

### Setup NVMe devices for SPDK:

{% hint style="info" %}
This is only needed while batch sealing is in beta, future versions of Curio will handle this automatically.
{% endhint %}

```bash
cd extern/supraseal/deps/spdk-v24.05/
env NRHUGE=36 ./scripts/setup.sh
```
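
The same script can show the current state and can hand the drives back to the kernel NVMe driver if you need to undo the binding (for example to reformat or benchmark a drive through the filesystem again):

```bash
sudo ./scripts/setup.sh status   # show hugepage allocation and device driver bindings
sudo ./scripts/setup.sh reset    # rebind NVMe devices to their kernel drivers
```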

### Benchmark NVMe IOPS

Please make sure to benchmark the raw NVMe IOPS before proceeding with further configuration, to verify that the IOPS requirements are fulfilled.
@@ -92,32 +194,28 @@ Total : 8006785.90 31276.51 71.91 1

Ideally the total across all devices should be >10M IOPS.
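
If you want a rough cross-check with standard tooling before the drives are bound to SPDK, a 4K random-read `fio` run against the raw block device gives a comparable figure (a sketch only; `/dev/nvme0n1`, the queue depth, and the job count are placeholders to adjust for your hardware):

```bash
sudo fio --name=iops-check --filename=/dev/nvme0n1 --readonly --direct=1 \
  --rw=randread --bs=4k --ioengine=libaio --iodepth=128 --numjobs=4 \
  --group_reporting --runtime=60 --time_based
```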

### PC2 output storage

Attach scratch space storage for the PC2 output. The batch sealer needs \~70GB per sector in the batch: 32GiB for the sealed sector and 36GiB for the cache directory with TreeC/TreeR and aux files. For a 128-sector batch this is roughly 128 × 70 GB ≈ 9 TB of scratch space.
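
A minimal sketch of attaching such a scratch path, assuming the `curio cli storage attach` command from the Curio storage documentation applies here (the path is hypothetical; check `curio cli storage attach --help` for the exact flags in your version):

```bash
# --seal marks the path as sealing scratch space rather than long-term storage
curio cli storage attach --init --seal /fast-scratch/batch-pc2
```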
## Usage

1. Start the Curio node with the batch sealer layer:

```bash
curio run --layers batch-machine1
```

2. Add a batch of CC sectors:

```bash
curio seal start --now --cc --count 32 --actor f01234 --duration-days 365
```

3. Monitor progress - you should see a "Batch..." task running in the [Curio GUI](curio-gui.md)
4. PC1 will take 3.5-5 hours, followed by PC2 on GPU
5. After batch completion, the storage will be released for the next batch

## Configuration

@@ -253,76 +351,6 @@ BatchSealPipelines = 2
SingleHasherPerThread = false
```


## Optimization

* Balance batch size, CPU cores, and NVMe drives to keep PC1 running constantly
