After enduring a torturous morning, I've had it with the garbage Python scripts cobbled together by certain AI research nerds. Their creations are practically useless, especially for someone like me with multiple GPUs: I need these scripts to run on specific GPUs. In my opinion, any decent AI-related script, particularly one involving CUDA, should offer two critical features: listing all your CUDA devices and letting you select which CUDA devices to use. Sadly, many AI scripts are developed with a "just get it working" mentality that completely neglects these aspects. To put it in terms from "The Mythical Man-Month," they have never evolved their "programs" into a "programming product."

Then I remembered that the CUDA_VISIBLE_DEVICES environment variable controls which CUDA devices PyTorch can see, and that the same variable works for scripts written in TensorFlow. With the help of Qwen over several hours this morning, I wrote a simple Python script that, when launching other people's Python scripts, lets you list all CUDA devices on the current machine and select which ones to use via the CUDA_VISIBLE_DEVICES environment variable. Once it was done, I realized it could also alleviate the suffering of others dealing with those awful scripts, so I decided to make it public.
This script helps you launch a Python script on specific CUDA devices.
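The whole trick is the CUDA_VISIBLE_DEVICES environment variable: if it is set before a process initializes CUDA, that process only sees the listed GPUs. A minimal standalone illustration with PyTorch (the indices 0,1 are just an example):

```python
import os

# Must be set before CUDA is initialized, i.e. before the first CUDA call.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch

# Only GPUs 0 and 1 are visible now, re-indexed as cuda:0 and cuda:1.
print(torch.cuda.device_count())
```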
Usage:
$ python3 ./launch.py
usage: launch.py [-h] [--list] [--device DEVICE] [--raw-py RAW_PY]
Encapsulated Script with GPU Options
options:
-h, --help show this help message and exit
--list List available GPUs and exit.
--device DEVICE Specify the CUDA device(s) to use (e.g., 0,1). If the value is invalid and below the range of available device IDs (e.g., -1), the raw Python script
will not detect any CUDA device and may fall back to running on the CPU (if supported). If the value is invalid and exceeds the last available
device ID (e.g., 65536), a warning is issued and nothing is run. If this option is not provided, the raw Python script detects
available CUDA devices as though the CUDA_VISIBLE_DEVICES environment variable were not set.
--raw-py RAW_PY Specify the raw Python script to run.
Example:
$ python3 ./launch.py --device 0,1 --raw-py test.py --other-args-for-test.py
0,1
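The trailing 0,1 is the output of this example run. I assume test.py here is just a probe that prints whatever the launcher exposed; a hypothetical test.py that produces that output could be as small as:

```python
# test.py (hypothetical stand-in): print the devices the launcher exposed to us.
import os

print(os.environ.get("CUDA_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES is not set"))
```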
Install the version of torch that matches your CUDA version. (It's recommended to install it in a Conda environment.)
Then, download this script and run it according to the example provided above.
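If you are curious what the launcher does under the hood, the core idea fits in a few lines. The following is only a rough sketch of the approach, not the actual contents of launch.py: it omits the device-ID validation described in the help text above, and listing GPUs through nvidia-smi is my own assumption.

```python
import argparse
import os
import subprocess
import sys

parser = argparse.ArgumentParser(description="Encapsulated Script with GPU Options")
parser.add_argument("--list", action="store_true", help="List available GPUs and exit.")
parser.add_argument("--device", help="CUDA device(s) to use, e.g. 0,1.")
parser.add_argument("--raw-py", help="The raw Python script to run.")
args, passthrough = parser.parse_known_args()  # unknown args are forwarded to the raw script

if args.list:
    # Assumption: enumerate GPUs with nvidia-smi; torch.cuda could be used instead.
    subprocess.run(["nvidia-smi", "--list-gpus"], check=True)
    sys.exit(0)

env = os.environ.copy()
if args.device is not None:
    env["CUDA_VISIBLE_DEVICES"] = args.device  # restrict the child process's GPUs

# Launch the raw script with the chosen environment, forwarding its own arguments.
subprocess.run([sys.executable, args.raw_py] + passthrough, env=env)
```

Because parse_known_args is used, anything the launcher does not recognize (like --other-args-for-test.py in the example) is passed through unchanged to the raw script.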
Sorry for my poor English.