src/rapids_singlecell/preprocessing/_neighbors/__init__.py
39 additions & 5 deletions
@@ -169,22 +169,40 @@ def neighbors(
 metric_kwds
 Options for the metric.
 algorithm_kwds
-Options for the algorithm. For 'ivfflat' and 'ivfpq' algorithms, the following
-parameters can be specified:
+Options for the algorithm.
+For 'ivfflat' and 'ivfpq' algorithms, the following parameters can be specified:
+
 * 'n_lists': Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)).
+
 * 'n_probes': Number of lists to probe during search. Default is 20. Higher values
 increase accuracy but reduce speed.
+
 For 'nn_descent' algorithm, the following parameters can be specified:
+
 * 'intermediate_graph_degree': The degree of the intermediate graph. Default is None.
 It is recommended to set it to `>= 1.5 * n_neighbors`.
+
 For 'all_neighbors' algorithm, the following parameters can be specified:
+
 * 'algo': The algorithm to use. Valid options are: 'ivf_pq' and 'nn_descent'. Default is 'nn_descent'.
-* 'n_lists': Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)).
+
+* 'n_clusters': Number of clusters/batches to partition the dataset into (> overlap_factor). Default is number of GPUs.
+
+* 'overlap_factor': Number of clusters each point is assigned to (must be < n_clusters). Default is 1.
+
+* 'n_lists': Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)). Only available for 'ivf_pq' algorithm.
+
 * 'n_probes': Number of lists to probe during search. Default is 20. Higher values
-increase accuracy but reduce speed.
+increase accuracy but reduce speed. Only available for 'ivf_pq' algorithm.
+
+* 'intermediate_graph_degree': The degree of the intermediate graph. Default is None. It is recommended to set it to `>= 1.5 * n_neighbors`. Only available for 'nn_descent' algorithm.
+
 For 'mg_ivfflat' and 'mg_ivfpq' algorithms, the following parameters can be specified:
-* 'distribution_mode': The distribution mode to use. Valid options are: 'replicated' and 'distributed'. Default is 'replicated'.
+
+* 'distribution_mode': The distribution mode to use. Valid options are: 'replicated' and 'shared'. Default is 'replicated'.
+
 * 'n_lists': Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)).
+
 * 'n_probes': Number of lists to probe during search. Default is 20. Higher values
 increase accuracy but reduce speed.
 
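The 'all_neighbors' options in this hunk are now tied to the chosen 'algo': 'n_lists' and 'n_probes' apply only to 'ivf_pq', and 'intermediate_graph_degree' applies only to 'nn_descent'. A minimal sketch of how a caller might check an `algorithm_kwds` dict against those documented rules before passing it on. The helper name `check_all_neighbors_kwds` is hypothetical and not part of rapids_singlecell; whether the library itself rejects misplaced keys is not stated in the diff.

```python
# Hypothetical helper, NOT part of rapids_singlecell. It only mirrors the
# constraints stated in the docstring for algorithm="all_neighbors".
ALGO_ONLY_KEYS = {
    "ivf_pq": {"n_lists", "n_probes"},
    "nn_descent": {"intermediate_graph_degree"},
}


def check_all_neighbors_kwds(algorithm_kwds):
    """Return a copy of the kwargs, raising ValueError on documented conflicts."""
    kwds = dict(algorithm_kwds)
    algo = kwds.get("algo", "nn_descent")  # documented default
    if algo not in ALGO_ONLY_KEYS:
        raise ValueError(f"'algo' must be 'ivf_pq' or 'nn_descent', got {algo!r}")

    # Keys documented as "only available for" the other algorithm.
    for other, keys in ALGO_ONLY_KEYS.items():
        if other == algo:
            continue
        misplaced = sorted(keys.intersection(kwds))
        if misplaced:
            raise ValueError(f"{misplaced} only apply when algo={other!r}")

    # The docstring asks for overlap_factor < n_clusters. This sketch checks it
    # only when both are given, since the defaults depend on the GPU count.
    if "n_clusters" in kwds and "overlap_factor" in kwds:
        if kwds["overlap_factor"] >= kwds["n_clusters"]:
            raise ValueError("overlap_factor must be smaller than n_clusters")
    return kwds


ok = check_all_neighbors_kwds({"algo": "ivf_pq", "n_lists": 256, "n_probes": 20})
```

Passing `{"n_lists": 256}` alone would raise here, because the default 'algo' is 'nn_descent' and 'n_lists' is documented as 'ivf_pq'-only.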
@@ -337,6 +355,12 @@ def bbknn(
 `'cagra'`
 Employs the Compressed, Accurate Graph-based search to quickly find nearest neighbors by traversing a graph structure.
 
+`'mg_ivfflat'`
+Uses the Multi-GPU inverted file indexing to partition the dataset into coarse quantizer cells and performs the search within the relevant cells.
+
+`'mg_ivfpq'`
+Combines Multi-GPU inverted file indexing with product quantization to encode sub-vectors of the dataset, facilitating faster distance computation.
+
 Please ensure that the chosen algorithm is compatible with your dataset and the specific requirements of your search problem.
 metric
 A known metric's name or a callable that returns a distance.
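The inverted file idea described for `'mg_ivfflat'` (partition the data into coarse cells, then search only the relevant cells) can be illustrated with a toy, single-process sketch in plain Python. This is an assumption-laden illustration, not the GPU implementation behind these options: the centroids are fixed by hand rather than learned, and the function names are made up for the example.

```python
import math


def nearest(point, candidates):
    """Index of the candidate closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def build_ivf(data, centroids):
    """Assign every vector to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(data):
        lists[nearest(vec, centroids)].append(idx)
    return lists


def ivf_search(query, data, centroids, lists, k, n_probes):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:n_probes] for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, data[i]))
    return candidates[:k]


data = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 1.1), (5.0, 5.0), (5.2, 4.9)]
# Hand-picked centroids for the toy example; a real index would learn them.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
lists = build_ivf(data, centroids)

query = (0.3, 0.3)
one_probe = ivf_search(query, data, centroids, lists, k=3, n_probes=1)
two_probes = ivf_search(query, data, centroids, lists, k=3, n_probes=2)
```

With one probe the search sees only the two points in the closest cell and cannot return a third neighbor; with two probes it recovers the same top three as a brute-force scan. That is the accuracy versus speed trade-off the 'n_probes' option controls.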
@@ -349,6 +373,16 @@ def bbknn(
 * 'n_lists': Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)).
 * 'nprobes': Number of lists to probe during search. Default is 1. Higher values
 increase accuracy but reduce speed.
+
+For 'mg_ivfflat' and 'mg_ivfpq' algorithms, the following parameters can be specified:
+
+* 'distribution_mode': The distribution mode to use. Valid options are: 'replicated' and 'shared'. Default is 'replicated'.
+
+* 'n_lists': Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)).
+
+* 'n_probes': Number of lists to probe during search. Default is 20. Higher values
+increase accuracy but reduce speed.
+
 trim
 Trim the neighbours of each cell to these many top connectivities.
 May help with population independence and improve the tidiness of clustering.
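The documented default `2 * next_power_of_2(sqrt(n_samples))` for 'n_lists' can be worked through in a few lines. The sketch below assumes `next_power_of_2(x)` means the smallest power of two greater than or equal to `x` (the diff does not define it), and `resolve_mg_kwds` is a hypothetical helper that simply overlays user options on the documented 'mg_ivfflat' / 'mg_ivfpq' defaults.

```python
import math


def next_power_of_2(x):
    """Smallest power of two >= x (assumed meaning; not defined in the diff)."""
    n = max(1, math.ceil(x))
    return 1 << (n - 1).bit_length()


def default_n_lists(n_samples):
    """Documented default: 2 * next_power_of_2(sqrt(n_samples))."""
    return 2 * next_power_of_2(math.sqrt(n_samples))


def resolve_mg_kwds(algorithm_kwds, n_samples):
    """Hypothetical helper: overlay user options on the documented defaults."""
    defaults = {
        "distribution_mode": "replicated",
        "n_lists": default_n_lists(n_samples),
        "n_probes": 20,
    }
    return {**defaults, **algorithm_kwds}


# 10,000 cells: sqrt = 100, next power of two = 128, so n_lists = 256.
resolved = resolve_mg_kwds({"n_probes": 40}, n_samples=10_000)
```

Under these assumptions a dataset of 10,000 cells gets 256 inverted lists by default, and one of 1,000,000 cells gets 2,048.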