Description
Here's a code snippet in pygsti/optimize/customsolve.py on the head of master.
pyGSTi/pygsti/optimize/customsolve.py
Lines 157 to 168 in c74fcb2
```python
smbuf1b = _np.zeros(1, _np.int64)
smbuf2 = _np.zeros(2, _np.int64)
smbuf3 = _np.zeros(3, _np.int64)
for icol in range(a.shape[1]):
    # Step 1: find the index of the row that is the best pivot.
    # each proc looks for its best pivot (Note: it should not consider rows already pivoted on)
    potential_pivot_indices = all_row_indices[potential_pivot_mask]
    ibest_global, ibest_local, h, k = _find_pivot(a, b, icol, potential_pivot_indices, my_row_slice,
                                                  shared_floats, shared_ints, resource_alloc, comm, host_comm,
                                                  smbuf1, smbuf2, smbuf3, host_index_buf, host_val_buf)
```
The smbuf1b variable is never used. It looks like it should be passed to _find_pivot immediately after smbuf1, per the signature below (in the signature these parameters are just named buf1 and buf1b):
pyGSTi/pygsti/optimize/customsolve.py
Lines 213 to 214 in c74fcb2
```python
def _find_pivot(a, b, icol, potential_pivot_inds, my_row_slice, shared_floats, shared_ints,
                resource_alloc, comm, host_comm, buf1, buf1b, buf2, buf3, best_host_indices, best_host_vals):
```
As a consequence of smbuf1b not being passed to _find_pivot, we're calling _find_pivot with only 15 positional arguments when its signature requires 16, so the call raises a TypeError.
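A minimal standalone reproduction of the failure mode (the stub below mirrors _find_pivot's parameter list but is otherwise hypothetical):

```python
# Stub with the same 16-parameter signature as _find_pivot; the body is irrelevant,
# since Python rejects the call before the function runs.
def _find_pivot_stub(a, b, icol, potential_pivot_inds, my_row_slice, shared_floats, shared_ints,
                     resource_alloc, comm, host_comm, buf1, buf1b, buf2, buf3,
                     best_host_indices, best_host_vals):
    return 0

try:
    # 15 arguments -- buf1b omitted, mirroring the call site on master
    _find_pivot_stub(*range(15))
except TypeError as e:
    print(e)  # missing 1 required positional argument: 'best_host_vals'
```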
This bug rarely surfaces because there's a dedicated code path (below) for handling linear solves with matrices of order < 10,000, and that path gets triggered in our testing and even in most large GST runs:
pyGSTi/pygsti/optimize/customsolve.py
Lines 98 to 126 in c74fcb2
```python
if comm.size < proc_threshold and a.shape[1] < 10000:
    # We're not exactly sure where scipy is better, but until we speed up / change gaussian-elim
    # alg the scipy alg is much faster for small numbers of procs and so should be used unless
    # A is too large to be gathered to the root proc.
    global_a, a_shm = ari.gather_jtj(a, return_shared=True)
    global_b, b_shm = ari.gather_jtf(b, return_shared=True)
    #global_a = ari.gather_jtj(a)
    #global_b = ari.gather_jtf(b)
    if comm.rank == 0:
        try:
            global_x = _scipy.linalg.solve(global_a, global_b, assume_a='pos')
            ok_buf[0] = 1  # ok
        except _scipy.linalg.LinAlgError as e:
            ok_buf[0] = 0  # failure!
            err = e
    else:
        global_x = None
        err = _scipy.linalg.LinAlgError("Linear solver fail on root proc!")  # just in case...
    comm.Bcast(ok_buf, root=0)
    if ok_buf[0] == 0:
        _smt.cleanup_shared_ndarray(a_shm)
        _smt.cleanup_shared_ndarray(b_shm)
        raise err  # all procs must raise in sync
    ari.scatter_x(global_x, x)
    _smt.cleanup_shared_ndarray(a_shm)
    _smt.cleanup_shared_ndarray(b_shm)
    return
```
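For reference, the scipy call this fast path relies on can be exercised standalone. A minimal sketch without the MPI gather/scatter, using a synthetic symmetric positive-definite system standing in for the J^T J matrices this solver sees:

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
j = rng.standard_normal((50, 10))
a = j.T @ j + 10 * np.eye(10)   # SPD by construction, like a regularized J^T J
b = rng.standard_normal(10)

# assume_a='pos' tells scipy the matrix is positive definite, so it can use
# a Cholesky-based solver; an ill-conditioned input raises LinAlgError.
x = scipy.linalg.solve(a, b, assume_a='pos')
print(np.allclose(a @ x, b))  # True
```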
Possible ways to move forward:
- Temporary. Just raise the dimension threshold from 10,000 to 100,000. Storing one double-precision matrix of this size requires ~80 GB, and copy overhead in the Python call means the solve will need roughly 160 GB of "free" RAM to run. That's a reasonable amount for modern compute servers.
- Try to fix the bug in the custom Gaussian elimination code. Have the matrix-size branch check a user-modifiable constant instead of a hard-coded value, so that some of our tests can run this code with a quite small threshold.
- Ambitious. Create a new ArrayInterface class that relies on Dask arrays. Replace the custom Gaussian elimination code with calls to Dask's distributed linear algebra functions: https://docs.dask.org/en/stable/_modules/dask/array/linalg.html.
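As a quick sanity check on the memory estimate in the first option (sizes below are for the hypothetical 100,000 x 100,000 case; the factor of 2 assumes one extra copy of the matrix during the solve):

```python
n = 100_000
bytes_per_double = 8

# One dense double-precision matrix of order n
matrix_gb = n * n * bytes_per_double / 1e9
print(matrix_gb)      # 80.0

# scipy.linalg.solve copies its input unless overwrite_a=True,
# so peak usage is roughly double the matrix size
print(2 * matrix_gb)  # 160.0
```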