Description
I tried to run this project on an RTX 5070 Ti under Ubuntu 22.04, and the self-play worker crashed with a floating point exception. I suspect this is due to a version mismatch between the CUDA and PyTorch builds in the image and the new Blackwell GPU (sm_120).
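One way to test the mismatch hypothesis would be to compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. The sketch below is only an assumption about how to check this: `gpu_arch_supported` is a hypothetical helper, it looks only for an exact `sm_XY` match (it ignores PTX forward compatibility), and it inspects the Python PyTorch install, which may differ from the libtorch that the C++ self-play binary links against.

```python
def gpu_arch_supported(capability, arch_list):
    """Return True if arch_list contains native kernels (sm_XY) for capability (X, Y).

    Hypothetical helper: exact match only, PTX/JIT fallback is not considered.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)   # e.g. (12, 0) for sm_120
            archs = torch.cuda.get_arch_list()          # e.g. ['sm_80', 'sm_90', ...]
            print("PyTorch:", torch.__version__, "built for CUDA:", torch.version.cuda)
            print("device capability:", cap, "compiled archs:", archs)
            print("native kernels for this GPU:", gpu_arch_supported(cap, archs))
        else:
            print("CUDA is not available to PyTorch")
```

If this prints `False` for the last line inside the container, the image's PyTorch build has no native sm_120 kernels, which would be consistent with the suspicion above, though it would not by itself prove that this is the cause of the crash.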
I also tried modifying the Dockerfile to build the environment with newer package versions that support this GPU, but ran into other issues, such as the build not finding the Atari ROMs that are included in the pre-built image (Step 7/11 : COPY atari57/ /opt/atari57 failed with "COPY failed: file not found in build context or excluded by .dockerignore: stat atari57/: file does not exist").
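The COPY failure means the source path is absent from the directory passed to docker build. As a rough pre-build check, a sketch like the following could list COPY sources that do not exist in the build context. `missing_copy_sources` is a hypothetical helper; it assumes one COPY instruction per line, skips `COPY --from=...` stages, and does not read .dockerignore.

```python
from pathlib import Path


def missing_copy_sources(dockerfile_text, context_dir):
    """Return COPY source paths from dockerfile_text that are missing under context_dir."""
    context = Path(context_dir)
    missing = []
    for raw in dockerfile_text.splitlines():
        parts = raw.split()
        if len(parts) < 3 or parts[0].upper() != "COPY":
            continue
        if any(p.startswith("--from") for p in parts[1:]):
            continue  # copied from another build stage, not from the context
        args = [p for p in parts[1:] if not p.startswith("--")]
        for src in args[:-1]:  # the last argument is the destination
            rel = src.lstrip("/")
            if any(ch in rel for ch in "*?["):
                found = any(context.glob(rel))  # wildcard source
            else:
                found = (context / rel).exists()
            if not found:
                missing.append(src)
    return missing


if __name__ == "__main__":
    dockerfile = Path("Dockerfile")
    if dockerfile.exists():
        print(missing_copy_sources(dockerfile.read_text(), "."))
```

For the build above, this would be expected to report atari57/ when run from a checkout that does not contain that directory.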
Below is the error message of the floating point exception:
[2025/09/12_01:21:49.578] [Version] 04a589
[2025/09/12_01:21:49.578] Server initialize over.
[2025/09/12_01:21:49.578] [Iteration] =====1=====
[2025/09/12_01:21:49.578] [SelfPlay] Start 0
connect success
Info docker-desktop_0 sp
[2025/09/12_01:21:51.083] [Worker Connection] docker-desktop_0 sp
CUDA_VISIBLE_DEVICES=0 build/tictactoe/minizero_tictactoe -mode sp -conf_file tictactoe_az_1bx256_n50-04a589/tictactoe_az_1bx256_n50-04a589.cfg -conf_str "nn_file_name=tictactoe_az_1bx256_n50-04a589/model/weight_iter_0.pt:program_auto_seed=false:program_seed=581869302:zero_training_directory=tictactoe_az_1bx256_n50-04a589:zero_num_threads=4:zero_num_parallel_games=64:program_quiet=true"
[2025/09/12_01:21:51.087] [Log] docker-desktop_0 sp: CUDA_VISIBLE_DEVICES=0 build/tictactoe/minizero_tictactoe -mode sp -conf_file tictactoe_az_1bx256_n50-04a589/tictactoe_az_1bx256_n50-04a589.cfg -conf_str "nn_file_name=tictactoe_az_1bx256_n50-04a589/model/weight_iter_0.pt:program_auto_seed=false:program_seed=581869302:zero_training_directory=tictactoe_az_1bx256_n50-04a589:zero_num_threads=4:zero_num_parallel_games=64:program_quiet=true"
connect success
Info docker-desktop_0 op
[2025/09/12_01:21:52.071] [Worker Connection] docker-desktop_0 op
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python minizero/learner/train.py tictactoe tictactoe_az_1bx256_n50-04a589 tictactoe_az_1bx256_n50-04a589/tictactoe_az_1bx256_n50-04a589.cfg
[2025/09/12_01:21:52.073] [Log] docker-desktop_0 op: CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python minizero/learner/train.py tictactoe tictactoe_az_1bx256_n50-04a589 tictactoe_az_1bx256_n50-04a589/tictactoe_az_1bx256_n50-04a589.cfg
scripts/zero-worker.sh: line 192: 995 Floating point exception(core dumped) CUDA_VISIBLE_DEVICES=${cuda_devices} ${sp_executable_file} -conf_file ${CONF_FILE} -conf_str "${CONF_STR}" -mode sp 0<&$broker_fd 1>&$broker_fd