
Commit 6706586

Merge pull request #259 from BQSKit/gpuGuide
GPU guide
2 parents 14eab3c + 33091aa commit 6706586

File tree

3 files changed: +354, -1 lines changed


docs/guides/usegpus.md

+353
@@ -0,0 +1,353 @@
# Using BQSKit on a GPU Cluster

This guide explains how to use BQSKit with GPUs by leveraging the `bqskit-qfactor-jax` package, which provides GPU implementations of the [QFactor](https://ieeexplore.ieee.org/abstract/document/10313638) and [QFactor-Sample](https://arxiv.org/abs/2405.12866) instantiation algorithms. For more detailed information and advanced configurations of the BQSKit runtime, refer to the [BQSKit distribution guide](https://bqskit.readthedocs.io/en/latest/guides/distributing.html).

We will guide you through the installation, setup, and execution process for BQSKit on a GPU cluster.

## bqskit-qfactor-jax Package Installation

First, you will need to install `bqskit-qfactor-jax`. This can easily be done using pip:

```sh
pip install bqskit-qfactor-jax
```

This command will also install all the dependencies, including BQSKit and JAX with GPU support.
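
As a quick sanity check, you can verify that JAX detects your GPUs before running any BQSKit flows. This is an optional, minimal snippet that only assumes the installation above succeeded.

```python
import jax

# Expect one or more CUDA/GPU devices in this list; if only CPU
# devices appear, JAX does not see the GPU.
print(jax.devices())
```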

## Optimizing a Circuit Using QFactor-Sample and the Gate Deletion Flow

This section explains how to optimize a quantum circuit using QFactor-Sample and the gate deletion flow.

First, we load the circuit to be optimized using the `Circuit` class.

```python
from bqskit import Circuit

# Load a circuit from QASM
in_circuit = Circuit.from_file("circuit_to_opt.qasm")
```

Then we create the instantiator instance and set the number of multistarts to 32.

```python
from qfactorjax.qfactor_sample_jax import QFactorSampleJax

num_multistarts = 32

qfactor_sample_gpu_instantiator = QFactorSampleJax()

instantiate_options = {
    'method': qfactor_sample_gpu_instantiator,
    'multistarts': num_multistarts,
}
```

Next, generate the optimization flow.

```python
from bqskit.passes import *

# Partition size (in qubits); adjust it to fit your circuit and GPUs
partition_size = 3

# Prepare the compilation passes
passes = [
    # Convert U3s to VariableUnitaryGates
    ToVariablePass(),

    # Split the circuit into partitions
    QuickPartitioner(partition_size),

    # For each partition, perform scanning gate removal using QFactor-Sample
    ForEachBlockPass([
        ScanningGateRemovalPass(
            instantiate_options=instantiate_options,
        ),
    ]),

    # Combine the partitions back into a circuit
    UnfoldPass(),

    # Convert the VariableUnitaryGates back into U3s
    ToU3Pass(),
]
```

Finally, use a compiler instance to execute the passes, and then print the statistics. If your system has more than a single GPU, you should start a detached server and connect to it; a detailed explanation of how to set up the BQSKit runtime is given in the following sections of this guide.

```python
from bqskit.compiler import Compiler

with Compiler(num_workers=1) as compiler:
    out_circuit = compiler.compile(in_circuit, passes)
    print(
        f'Circuit finished with gates: {out_circuit.gate_counts}, '
        f'while started with {in_circuit.gate_counts}',
    )
```

## QFactor-JAX and QFactor-Sample-JAX Usage Examples

For other usage examples, please refer to the [examples directory](https://github.com/BQSKit/bqskit-qfactor-jax/tree/main/examples) in the `bqskit-qfactor-jax` package. There, you will find two Toffoli instantiation examples using QFactor and QFactor-Sample, as well as two different synthesis flows that also utilize these algorithms.

## Setting Up a Multi-GPU Environment

To run BQSKit with multiple GPUs, you need to set up the BQSKit runtime properly. Each worker should be assigned to a specific GPU by leveraging NVIDIA's `CUDA_VISIBLE_DEVICES` environment variable. Several workers can share the same GPU by utilizing [NVIDIA's MPS](https://docs.nvidia.com/deploy/mps/). You can set up the runtime on a single server (or an interactive node on a cluster) or across several nodes using SBATCH. You can find scripts to help you set up the runtime at this [link](https://github.com/BQSKit/bqskit-qfactor-jax/tree/main/examples/bqskit_env_scripts).

You may configure the number of GPUs to use on each server and the number of workers on each GPU. If you run too many workers on the same GPU, it will run out of memory and you will see an out-of-memory exception. If you are using QFactor, you may use the following table as a starting configuration and adjust the number of workers according to your specific circuit, unitary size, and GPU performance. If you are using QFactor-Sample, start with a single worker and increase the count if memory permits. You can use the `nvidia-smi` command to check GPU usage during execution; it reports both memory and execution-unit utilization.

| Unitary Size | Workers per GPU |
|--------------|-----------------|
| 3, 4         | 10              |
| 5            | 8               |
| 6            | 4               |
| 7            | 2               |
| 8 and more   | 1               |
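
If you prefer to keep this starting point in code, the small helper below simply mirrors the QFactor table above; the function name is ours and the values are only suggested starting points, so tune the result for your circuit and GPUs.

```python
def suggested_qfactor_workers_per_gpu(unitary_size: int) -> int:
    """Suggested initial number of QFactor workers per GPU, per the table above."""
    if unitary_size <= 4:
        return 10
    if unitary_size == 5:
        return 8
    if unitary_size == 6:
        return 4
    if unitary_size == 7:
        return 2
    return 1  # 8 qubits and more
```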

Make sure that in your Python script, you are creating the compiler object with the appropriate IP address. When running on the same node as the server, you can use `localhost` as the IP address.

```python
with Compiler('localhost') as compiler:
    out_circuit = compiler.compile(in_circuit, passes)
```

### Single-Server Multi-GPU Setup

This section of the guide explains the main concepts in the [single_server_env.sh](https://github.com/BQSKit/bqskit-qfactor-jax/blob/main/examples/bqskit_env_scripts/single_server_env.sh) script template and how to use it. The script creates a GPU-enabled BQSKit runtime and is easily configured for any system.

After you configure the template (replacing every <> with an appropriate value), run it, and then, in a separate shell, execute your Python script that uses this runtime environment.

The environment script has the following parts:

1. Variable configuration - choose the number of GPUs to use and the number of workers per GPU. The scratch directory path is also configured here and later used for logging.
```bash
#!/bin/bash
hostname=$(uname -n)
unique_id=bqskit_${RANDOM}
amount_of_gpus=<Number of GPUs to use in the node>
amount_of_workers_per_gpu=<Number of workers per GPU>
total_amount_of_workers=$(($amount_of_gpus * $amount_of_workers_per_gpu))
scratch_dir=<temp_dir>
```

2. Log-file monitoring functions that track the startup of the BQSKit manager and server.
```bash
wait_for_outgoing_thread_in_manager_log() {
    while [[ ! -f "$manager_log_file" ]]
    do
        sleep 0.5
    done

    while ! grep -q "Started outgoing thread." $manager_log_file; do
        sleep 1
    done
}

wait_for_server_to_connect(){
    while [[ ! -f "$server_log_file" ]]
    do
        sleep 0.5
    done

    while ! grep -q "Connected to manager" $server_log_file; do
        sleep 1
    done
}
```

3. Creating the log directory and deleting any old log files that conflict with the current run's logs.
```bash
mkdir -p $scratch_dir/bqskit_logs

manager_log_file=$scratch_dir/bqskit_logs/manager_${unique_id}.log
server_log_file=$scratch_dir/bqskit_logs/server_${unique_id}.log

echo "Will start bqskit runtime with id $unique_id gpus = $amount_of_gpus and workers per gpu = $amount_of_workers_per_gpu"

# Clean old server and manager logs, if they exist
rm -f $manager_log_file
rm -f $server_log_file
```

4. Starting NVIDIA MPS to allow efficient execution of multiple workers on a single GPU.
```bash
echo "Starting MPS server"
nvidia-cuda-mps-control -d
```

5. Starting the BQSKit manager, instructing it (via the `-x` flag) to wait for workers to connect, and then waiting for the manager to start listening for a connection from a server. This is important because the server might time out if the manager isn't ready for the connection.
```bash
echo "starting BQSKit managers"

bqskit-manager -x -n$total_amount_of_workers -vvv &> $manager_log_file &
manager_pid=$!
wait_for_outgoing_thread_in_manager_log
```

6. Starting the BQSKit server, indicating that there is a single manager on the current host, and waiting until the server connects to the manager before continuing to start the workers.
```bash
echo "starting BQSKit server"
bqskit-server $hostname -vvv &>> $server_log_file &
server_pid=$!

wait_for_server_to_connect
```

7. Starting the workers, each seeing only a specific GPU.
```bash
echo "Starting $total_amount_of_workers workers on $amount_of_gpus gpus"
for (( gpu_id=0; gpu_id<$amount_of_gpus; gpu_id++ ))
do
    XLA_PYTHON_CLIENT_PREALLOCATE=false CUDA_VISIBLE_DEVICES=$gpu_id bqskit-worker $amount_of_workers_per_gpu > $scratch_dir/bqskit_logs/workers_${SLURM_JOB_ID}_${hostname}_${gpu_id}.log &
done
```

8. After all the processes have finished, stop the MPS server.
```bash
wait

echo "Stop MPS on $hostname"
echo quit | nvidia-cuda-mps-control
```

### Multi-Server Multi-GPU Environment Setup

This section of the guide explains the main concepts in the [init_multi_node_multi_gpu_slurm_run.sh](https://github.com/BQSKit/bqskit-qfactor-jax/blob/main/examples/bqskit_env_scripts/init_multi_node_multi_gpu_slurm_run.sh) and [run_workers_and_managers.sh](https://github.com/BQSKit/bqskit-qfactor-jax/blob/main/examples/bqskit_env_scripts/run_workers_and_managers.sh) scripts and how to use them. After configuring the scripts (updating every <>), place both of them in the same directory and initiate an SBATCH command. These scripts assume a SLURM environment but can be easily ported to other distribution systems.

```bash
sbatch init_multi_node_multi_gpu_slurm_run.sh
```

The rest of this section explains both of the scripts in detail.

#### init_multi_node_multi_gpu_slurm_run.sh

This is a SLURM batch script for running a multi-node BQSKit task across multiple GPUs. It manages job submission, environment setup, launching the BQSKit server and workers on the different nodes, and the execution of the main application.

1. Job configuration and logging - this is a standard SLURM SBATCH header.
```bash
#!/bin/bash
#SBATCH --job-name=<job_name>
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH -t <time_to_run>
#SBATCH -n <number_of_nodes>
#SBATCH --gpus=<total number of GPUs, not nodes>
#SBATCH --output=<full_path_to_log_file>

scratch_dir=<temp_dir>
```

2. Shell environment setup - please consult your HPC system administrator to choose the appropriate modules to load so that you can run JAX on NVIDIA's GPUs. You may use NERSC's Perlmutter [documentation](https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#jax) as a reference.
```bash
### load any modules needed and activate the conda environment
module load <module1>
module load <module2>
conda activate <conda-env-name>
```

3. Starting the managers on all of the nodes by using SLURM's `srun` command to launch the run_workers_and_managers.sh script across all nodes; that script handles starting the manager and workers on each node.
```bash
echo "starting BQSKit managers on all nodes"
srun run_workers_and_managers.sh <number_of_gpus_per_node> <number_of_workers_per_gpu> &
managers_pid=$!

managers_started_file=$scratch_dir/managers_${SLURM_JOB_ID}_started
n=<number_of_nodes>
```

4. Waiting for all of the managers to start by tracking the number of lines in the managers-started file, to which each manager appends a line once it is up.
```bash
while [[ ! -f "$managers_started_file" ]]
do
    sleep 0.5
done

while [ "$(cat "$managers_started_file" | wc -l)" -lt "$n" ]; do
    sleep 1
done
```

5. Starting the BQSKit server on the main node, using SLURM's `SLURM_JOB_NODELIST` environment variable to pass the managers' hostnames to the BQSKit server.
```bash
echo "starting BQSKit server on the main node"
bqskit-server $(scontrol show hostnames "$SLURM_JOB_NODELIST" | tr '\n' ' ') &> $scratch_dir/bqskit_logs/server_${SLURM_JOB_ID}.log &
server_pid=$!

uname -a >> $scratch_dir/server_${SLURM_JOB_ID}_started
```

6. Executing the main application, which will connect to the BQSKit runtime.
```bash
python <Your command>
```
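
For reference, below is a minimal sketch of what `<Your command>` could run. It assumes the script executes on the main node (so `localhost` reaches the BQSKit server started above); the file name and pass list are placeholders, and in practice you would reuse the gate-deletion flow built earlier in this guide.

```python
from bqskit import Circuit
from bqskit.compiler import Compiler
from bqskit.passes import QuickPartitioner, UnfoldPass

# Placeholder circuit file; replace with your own.
in_circuit = Circuit.from_file("circuit_to_opt.qasm")

# Placeholder pass list; substitute the full gate-deletion flow from above.
passes = [QuickPartitioner(3), UnfoldPass()]

# Connect to the detached runtime started by this batch script.
with Compiler('localhost') as compiler:
    out_circuit = compiler.compile(in_circuit, passes)
    print(f'Finished with gate counts: {out_circuit.gate_counts}')
```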

7. After the run is over, closing the BQSKit server.
```bash
echo "Killing the server"
kill -2 $server_pid
```

#### run_workers_and_managers.sh

This script is executed by each node to start the manager and workers on that specific node. It interacts with `init_multi_node_multi_gpu_slurm_run.sh`, the SBATCH script. If GPUs are requested, the workers are spawned separately from the manager, allowing for better configuration of each worker.

The script starts with argument parsing and some variable configuration.

```bash
#!/bin/bash

node_id=$(uname -n)
amount_of_gpus=$1
amount_of_workers_per_gpu=$2
total_amount_of_workers=$(($amount_of_gpus * $amount_of_workers_per_gpu))

scratch_dir=<temp_dir>
manager_log_file="$scratch_dir/bqskit_logs/manager_${SLURM_JOB_ID}_${node_id}.log"
server_started_file="$scratch_dir/server_${SLURM_JOB_ID}_started"
managers_started_file="$scratch_dir/managers_${SLURM_JOB_ID}_started"

touch $managers_started_file
```

Then the script declares a few utility functions.

```bash
wait_for_outgoing_thread_in_manager_log() {
    while ! grep -q "Started outgoing thread." $manager_log_file; do
        sleep 1
    done
    uname -a >> $managers_started_file
}

start_mps_servers() {
    echo "Starting MPS servers on node $node_id with CUDA $CUDA_VISIBLE_DEVICES"
    nvidia-cuda-mps-control -d
}

wait_for_bqskit_server() {
    i=0
    while [[ ! -f $server_started_file && $i -lt 10 ]]; do
        sleep 1
        i=$((i+1))
    done
}

start_workers() {
    echo "Starting $total_amount_of_workers workers on $amount_of_gpus gpus"
    for (( gpu_id=0; gpu_id<$amount_of_gpus; gpu_id++ )); do
        XLA_PYTHON_CLIENT_PREALLOCATE=false CUDA_VISIBLE_DEVICES=$gpu_id bqskit-worker $amount_of_workers_per_gpu &> $scratch_dir/bqskit_logs/workers_${SLURM_JOB_ID}_${node_id}_${gpu_id}.log &
    done
    wait
}

stop_mps_servers() {
    echo "Stop MPS servers on node $node_id"
    echo quit | nvidia-cuda-mps-control
}
```

Finally, the script checks whether GPUs are needed. If they are not, it spawns the manager with its default behavior; otherwise, the `-x` argument instructs the manager to wait for workers to connect to it.

```bash
if [ $amount_of_gpus -eq 0 ]; then
    echo "Will run manager on node $node_id with n args of $amount_of_workers_per_gpu"
    bqskit-manager -n $amount_of_workers_per_gpu -v &> $manager_log_file
    echo "Manager finished on node $node_id"
else
    echo "Will run manager on node $node_id"
    bqskit-manager -x -n$total_amount_of_workers -vvv &> $manager_log_file &
    wait_for_outgoing_thread_in_manager_log
    start_mps_servers
    wait_for_bqskit_server
    start_workers
    echo "Manager and workers finished on node $node_id" >> $manager_log_file
    stop_mps_servers
fi
```

docs/index.rst

+1
@@ -28,6 +28,7 @@ our `tutorial series. <https://github.com/BQSKit/bqskit-tutorial/>`_
 guides/customgate.md
 guides/custompass.md
 guides/distributing.md
+guides/usegpus.md

 .. toctree::
    :caption: API Reference

docs/requirements.txt

-1
@@ -1,5 +1,4 @@
 Sphinx>=4.5.0
-sphinx-autodoc-typehints>=1.12.0
 sphinx-rtd-theme>=1.0.0
 sphinx-togglebutton>=0.2.3
 sphinx-autodoc-typehints>=2.3.0
