
spatial_autocorr limit the numba threads as n_jobs temporarily #984


Open · wants to merge 1 commit into main

Conversation

selmanozleyen (Member)

Description

I am not sure if this is a bug, but it makes sense for a user to expect numba to use at most n_jobs of their cores. I made one solution like this, but I think any code that uses numba would have to be modified the same way if we treat this as a bug, right @ilan-gold? Or am I missing something?

Closes #957 (comment)
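
The patch itself isn't quoted in this thread; as a minimal sketch, assuming numba's documented get_num_threads / set_num_threads runtime API, "limiting the numba threads temporarily" could look roughly like this (the helper name is hypothetical, not necessarily what the PR does):

import numba

def _call_with_numba_thread_cap(func, n_jobs, *args, **kwargs):
    # hypothetical helper: cap numba at n_jobs threads for the duration
    # of the call, then restore the previous global setting
    original = numba.get_num_threads()
    numba.set_num_threads(n_jobs)
    try:
        return func(*args, **kwargs)
    finally:
        numba.set_num_threads(original)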

@codecov-commenter commented Apr 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.60%. Comparing base (4a632d6) to head (c653dc3).
Report is 189 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #984      +/-   ##
==========================================
- Coverage   69.99%   66.60%   -3.39%     
==========================================
  Files          39       40       +1     
  Lines        5532     6061     +529     
  Branches     1037     1014      -23     
==========================================
+ Hits         3872     4037     +165     
- Misses       1367     1663     +296     
- Partials      293      361      +68     
Files with missing lines          Coverage Δ
src/squidpy/gr/_ppatterns.py      80.78% <100.00%> (+1.81%) ⬆️

... and 12 files with indirect coverage changes


@ilan-gold (Contributor)

@selmanozleyen I think this issue is conflating two things. The function you're wrapping doesn't appear to have anything to do with numba, am I right? So why does setting numba's thread count help? If it does, could you explain? Would there be any way for you to confirm (if not by a test, then by posting results) that your fix works?

@selmanozleyen (Member, Author)

I assumed it is numba-related: it uses score_helper, which uses Moran's I, which is implemented with numba in scanpy.

func = _morans_i if mode == SpatialAutocorr.MORAN else _gearys_c

Moran's I helper in scanpy:
https://github.com/scverse/scanpy/blob/15c5434ad0382614a16df612745c183807675d04/src/scanpy/metrics/_morans_i.py#L131

I checked locally with htop, and without the changes I made, this runs on all cores:

import squidpy as sq

# load the pre-processed dataset
adata = sq.datasets.visium_hne_adata()

# build the spatial graph, then run Moran's I with many permutations
sq.gr.spatial_neighbors(adata)
sq.gr.spatial_autocorr(adata, n_jobs=1, n_perms=10000000, mode="moran")
@ilan-gold (Contributor)

Awesome, thanks! And with the change, it works? I wonder whether this problem applies everywhere parallelize appears, in which case it might make sense to make this a decorator on parallelize or the like.
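
A rough sketch of what such a decorator could look like, again using numba's get_num_threads / set_num_threads API (the name limit_numba_threads is illustrative, not squidpy's actual code):

from functools import wraps

import numba

def limit_numba_threads(func):
    # hypothetical decorator: temporarily cap numba's thread pool at the
    # caller's n_jobs, restoring the previous value afterwards
    @wraps(func)
    def wrapper(*args, n_jobs=None, **kwargs):
        if n_jobs is None:
            return func(*args, **kwargs)
        original = numba.get_num_threads()
        numba.set_num_threads(n_jobs)
        try:
            return func(*args, n_jobs=n_jobs, **kwargs)
        finally:
            numba.set_num_threads(original)
    return wrapper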

@selmanozleyen (Member, Author)

Yes, it works when I set it to 1, but it doesn't work for 2, because there is no guarantee that numba and joblib will use the same cores. So up to 2*n_jobs cores could be utilized. I couldn't observe this very clearly because I only have 8 cores locally atm.

But do you think this is a bug? I think n_jobs was only ever meant for the parallelize function, and setting a global variable like this doesn't feel right. What happens if a program runs this method and later expects more threads from numba? I think it's just a matter of communicating what n_jobs means; otherwise the user should set numba's global configuration themselves, imo.
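
To illustrate the global-state concern, a hypothetical sequence using numba's runtime API:

import numba

numba.set_num_threads(1)  # spatial_autocorr caps numba for its own call...
# ...but if nothing restores the old value, every later numba call in the
# same process is now also capped to 1 thread, which the user never asked for
print(numba.get_num_threads())  # 1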

@ilan-gold (Contributor) commented Apr 7, 2025

Right @selmanozleyen, yes, I got lost in the sauce. I understand better now, I think. So:

  1. The n_jobs parameter is meant for parallelize, not numba.
  2. Separately, numba has its own setting: the environment variable NUMBA_NUM_THREADS.
  3. Setting the former does not interact with the latter, so limiting n_jobs means numba may still max out your CPU (or similar behavior).

If so, then I think this issue is one of documentation, you're right.
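
If this becomes a documentation note, the user-side advice might be as simple as this sketch (numba.set_num_threads is numba's documented runtime API; the NUMBA_NUM_THREADS environment variable would alternatively have to be set before numba is first imported):

import numba
import squidpy as sq

adata = sq.datasets.visium_hne_adata()
sq.gr.spatial_neighbors(adata)

n_jobs = 2
numba.set_num_threads(n_jobs)  # cap numba's internal pool to match n_jobs
sq.gr.spatial_autocorr(adata, n_jobs=n_jobs, mode="moran")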
