spn-node calibrate fails because the background moongate-server crashes on startup with an RTX 2080 Ti #205

@echowill

Description

Environment:

- GPU: NVIDIA GeForce RTX 2080 Ti
- Operating System: Windows 11 with Docker Desktop
- NVIDIA Driver Version: 571.96
- spn-node Image: public.ecr.aws/succinct-labs/spn-node:latest-gpu
- moongate Image: public.ecr.aws/succinct-labs/moongate:v5.0.8 (the image used by the sp1-gpu container)

Bug Description:
When attempting to run the benchmark using the spn-node calibrate command, the process consistently fails. The client (spn-node) hangs for approximately 60 seconds before panicking with a timeout or connection error.

The root cause has been identified as the immediate crash of the background proving server container (sp1-gpu, which runs the moongate-server image) upon startup.

Steps to Reproduce:
The crash of the moongate-server can be isolated and reproduced directly with the following command, without needing the spn-node client:

```bash
docker run -it --rm --gpus all public.ecr.aws/succinct-labs/moongate:v5.0.8 moongate-server
```

Expected Behavior:
The moongate-server container should start successfully, initialize the GPU, and begin listening for RPC connections from the client.

Actual Behavior:
The moongate-server container crashes immediately upon launch. The only output to the console is a fatal runtime error:

```text
fatal runtime error: Rust cannot catch foreign exceptions, aborting
```

This suggests that a non-Rust exception (most likely a C++ exception thrown from the CUDA/GPU code) is raised during the GPU initialization phase and propagates into Rust frames. Rust cannot catch or unwind through foreign exceptions, so the runtime aborts the process immediately.
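One way to confirm that the container is aborting (rather than, for example, being killed for exceeding Docker Desktop's memory limit) is to inspect the exit status returned by `docker run`. By convention, a process terminated by signal N yields status 128 + N, so an abort would typically show up as 134 (SIGABRT), whereas an out-of-memory kill shows up as 137 (SIGKILL). The helper below is a minimal sketch; `describe_exit` is a hypothetical name, not part of spn-node or moongate, and the exact status this image returns has not been verified.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of spn-node/moongate): translate an exit
# status, e.g. from `docker run`, into a readable description.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_exit() {
  local code=$1
  if [ "$code" -gt 128 ]; then
    local sig=$((code - 128))
    # `kill -l N` prints the signal name without the SIG prefix, e.g. ABRT.
    local name
    name=$(kill -l "$sig" 2>/dev/null || echo "unknown")
    echo "killed by signal $sig ($name)"
  else
    echo "exited with status $code"
  fi
}
```

For example, run `docker run --rm --init --gpus all public.ecr.aws/succinct-labs/moongate:v5.0.8 moongate-server` and then pass `$?` to `describe_exit` immediately afterward. The `--init` flag keeps the server from running as PID 1 inside the container, which avoids PID-1 signal-handling quirks that could otherwise make an abort surface as a different signal.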

Additional Context & Diagnostics Performed:

- The issue is not related to networking: the crash occurs even when both the client and server containers are run with --network="host".
- The issue is not with the spn-node client: the server container fails even when run standalone.
- The issue appears to be a hardware-specific incompatibility between the moongate-server's GPU initialization code and the NVIDIA RTX 2080 Ti in a Windows + Docker Desktop environment.
