
spatial_autocorr limit the numba threads as n_jobs temporarily #984


Open · wants to merge 1 commit into main

Conversation

selmanozleyen (Member)

Description

I am not sure if this is a bug, but it makes sense for a user to expect numba to use at most n_jobs of their cores. I made one solution like this, but I think any code that uses numba would have to be modified the same way if we treat this as a bug, right @ilan-gold? Or am I missing something?

Closes #957 (comment)
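
The patch itself isn't quoted in this thread; as a minimal sketch, assuming numba's documented get_num_threads / set_num_threads runtime API, "limiting the numba threads temporarily" could look roughly like this (the helper name is hypothetical, not necessarily what the PR does):

import numba

def _call_with_numba_thread_cap(func, n_jobs, *args, **kwargs):
    # hypothetical helper: cap numba at n_jobs threads for the duration
    # of the call, then restore the previous global setting
    original = numba.get_num_threads()
    numba.set_num_threads(n_jobs)
    try:
        return func(*args, **kwargs)
    finally:
        numba.set_num_threads(original)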

@codecov-commenter commented Apr 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.60%. Comparing base (4a632d6) to head (c653dc3).
Report is 189 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #984      +/-   ##
==========================================
- Coverage   69.99%   66.60%   -3.39%     
==========================================
  Files          39       40       +1     
  Lines        5532     6061     +529     
  Branches     1037     1014      -23     
==========================================
+ Hits         3872     4037     +165     
- Misses       1367     1663     +296     
- Partials      293      361      +68     
Files with missing lines          Coverage Δ
src/squidpy/gr/_ppatterns.py      80.78% <100.00%> (+1.81%) ⬆️

... and 12 files with indirect coverage changes


@ilan-gold (Contributor)

@selmanozleyen I think this issue is conflating two things. The function you're wrapping doesn't appear to have anything to do with numba, am I right? So why does setting numba's thread count help? If it does, could you explain? Would there be any way for you to confirm (if not by a test, then by posting results) that your fix works?

@selmanozleyen (Member, Author)

I assumed it is numba-related: it uses score_helper, which uses Moran's I, which is implemented with numba in scanpy.

func = _morans_i if mode == SpatialAutocorr.MORAN else _gearys_c

Moran's I helper in scanpy:
https://github.com/scverse/scanpy/blob/15c5434ad0382614a16df612745c183807675d04/src/scanpy/metrics/_morans_i.py#L131

I checked locally with htop, and without the changes I made, this runs on all cores:

import squidpy as sq

# load the pre-processed dataset
adata = sq.datasets.visium_hne_adata()

# build the spatial graph, then run Moran's I with many permutations
sq.gr.spatial_neighbors(adata)
sq.gr.spatial_autocorr(adata, n_jobs=1, n_perms=10000000, mode="moran")
@ilan-gold (Contributor)

Awesome, thanks! And with the change, it works? I wonder whether this problem applies everywhere parallelize appears, in which case it might make sense to make this a decorator on parallelize or the like.
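
A rough sketch of what such a decorator could look like, again using numba's get_num_threads / set_num_threads API (the name limit_numba_threads is illustrative, not squidpy's actual code):

from functools import wraps

import numba

def limit_numba_threads(func):
    # hypothetical decorator: temporarily cap numba's thread pool at the
    # caller's n_jobs, restoring the previous value afterwards
    @wraps(func)
    def wrapper(*args, n_jobs=None, **kwargs):
        if n_jobs is None:
            return func(*args, **kwargs)
        original = numba.get_num_threads()
        numba.set_num_threads(n_jobs)
        try:
            return func(*args, n_jobs=n_jobs, **kwargs)
        finally:
            numba.set_num_threads(original)
    return wrapper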

@selmanozleyen (Member, Author)

Yes, it works when I set it to 1, but it doesn't work for 2, because there is no guarantee that numba and joblib will use the same cores. So up to 2*n_jobs cores could be utilized. I couldn't observe this very clearly because I only have 8 cores locally atm.

But do you think this is a bug? I think n_jobs was only ever meant for the parallelize function, and setting a global variable like this doesn't feel right. What happens if a program runs this method and later expects more threads from numba? I think it's just a matter of communicating what n_jobs means; otherwise the user should set numba's global configuration themselves, imo.
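
To illustrate the global-state concern, a hypothetical sequence using numba's runtime API:

import numba

numba.set_num_threads(1)  # spatial_autocorr caps numba for its own call...
# ...but if nothing restores the old value, every later numba call in the
# same process is now also capped to 1 thread, which the user never asked for
print(numba.get_num_threads())  # 1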

@ilan-gold (Contributor) commented Apr 7, 2025

Right @selmanozleyen, yes, I got lost in the sauce. I understand better now, I think. So:

  1. The n_jobs parameter is meant for parallelize, not numba.
  2. Separately, numba has its own setting: the environment variable NUMBA_NUM_THREADS.
  3. Setting the former does not interact with the latter, so limiting n_jobs means numba may still max out your CPU (or similar behavior).

If so, then I think this issue is one of documentation, you're right.
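
If this becomes a documentation note, the user-side advice might be as simple as this sketch (numba.set_num_threads is numba's documented runtime API; the NUMBA_NUM_THREADS environment variable would alternatively have to be set before numba is first imported):

import numba
import squidpy as sq

adata = sq.datasets.visium_hne_adata()
sq.gr.spatial_neighbors(adata)

n_jobs = 2
numba.set_num_threads(n_jobs)  # cap numba's internal pool to match n_jobs
sq.gr.spatial_autocorr(adata, n_jobs=n_jobs, mode="moran")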
