Description
❓ Question
I am trying to run the GPU Docker image using the provided scripts. Everything appears to be set up correctly, but I still get the "Using cpu device" message during training.
I've downloaded the rl-baselines3-zoo repo.
I've run the script both with --device cuda and without it (which falls back to "auto"):
./scripts/run_docker_gpu.sh python train.py --algo ppo --env CartPole-v1 --device cuda
I've checked nvidia-smi on my host machine:
NVIDIA-SMI 570.124.04 Driver Version: 570.124.04 CUDA Version: 12.8
NVIDIA GeForce GTX 1080 Ti
I've also run nvidia-smi inside the container and got the same output, so my GPU is visible from the image.
To do that, I prepared a Docker Compose file, which I run like this:
docker compose run rl-baselines3-zoo
The file itself:
version: "3.8"
services:
  rl-baselines3-zoo:
    image: stablebaselines/rl-baselines3-zoo
    volumes:
      - ./:/rl-baselines3-zoo
    runtime: nvidia  # Add this line
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    working_dir: /rl-baselines3-zoo
    command: /bin/bash
When I run nvidia-smi from the container, I can see my GPU as pasted above. The command:
(base) mambauser@c20a8aaf94f7:/rl-baselines3-zoo$ nvidia-smi
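Since nvidia-smi only shows that the driver is exposed to the container, a further check I can think of is whether the PyTorch build inside the image can actually see CUDA. Below is a minimal sketch, assuming torch is importable in the container's Python environment. The helper name cuda_report is only for illustration and is not part of SB3 or the RL Zoo.

```python
import importlib.util


def cuda_report():
    """Return a dict describing whether PyTorch can see a CUDA device."""
    report = {
        "torch_installed": False,
        "torch_cuda_build": None,
        "cuda_available": False,
        "device_name": None,
    }
    # Avoid an ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    # torch.version.cuda is None for CPU-only PyTorch builds.
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If torch_cuda_build comes back as None, the image would be shipping a CPU-only PyTorch build. If it is set but cuda_available is False, the problem is more likely between the CUDA runtime in the image and the driver or GPU.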
Whenever I run an example training script, it uses my CPU, which I can see in System Monitor.
I've tried to run the command:
python train.py --algo ppo --env CartPole-v1
It doesn't matter whether I run this from within the container, use ./scripts/run_docker_gpu.sh, or try to force --device cuda. It still outputs:
========== CartPole-v1 ==========
Seed: 1578490461
Loading hyperparameters from: /rl-baselines3-zoo/hyperparams/ppo.yml
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 256),
             ('clip_range', 'lin_0.2'),
             ('ent_coef', 0.0),
             ('gae_lambda', 0.8),
             ('gamma', 0.98),
             ('learning_rate', 'lin_0.001'),
             ('n_envs', 8),
             ('n_epochs', 20),
             ('n_steps', 32),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy')])
Using 8 environments
Creating test environment
Using cpu device
Log path: logs/ppo/CartPole-v1_12
I know that PPO on CartPole may not benefit from a GPU and is probably better run on the CPU, but it still bothers me that the GPU is not picked up even when I request it explicitly. What am I missing?
What else do I need to check?
I thought the device might be hardcoded to CPU for this particular environment and algorithm, but that doesn't seem to be the case.
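For reference, my understanding is that SB3 resolves the device roughly as modeled below: "auto" is treated as a request for CUDA, and any CUDA request silently falls back to CPU when torch.cuda.is_available() returns False. This is a simplified stand-in I wrote to illustrate that assumption, not SB3's actual code, and resolve_device is a hypothetical name.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Simplified model of the assumed device fallback (not SB3's real code)."""
    # Assumption: "auto" means "use CUDA if possible".
    if requested == "auto":
        requested = "cuda"
    # Assumption: a CUDA request falls back to CPU without raising an error
    # when PyTorch reports that CUDA is unavailable.
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


if __name__ == "__main__":
    for requested in ("auto", "cuda", "cpu"):
        for available in (True, False):
            resolved = resolve_device(requested, available)
            print(f"requested={requested!r} cuda_available={available} -> {resolved}")
```

If this model is right, it would explain why both --device cuda and the default print "Using cpu device" without any error, and it would point at PyTorch's CUDA detection inside the container rather than at the RL Zoo arguments.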
Checklist
- I have checked that there is no similar issue in the repo
- I have read the SB3 documentation
- I have read the RL Zoo documentation
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.