Description
Describe the bug
I get an error when running the following code, which mirrors the transcription factor activity inference step in your pseudo-bulk functional analysis vignette:
```python
Mes1_tf_acts, Mes1_tf_pvals = dc.run_ulm(mat=Mes1_mat, net=collectri)
Mes1_tf_acts
```
```
TypeError Traceback (most recent call last)
Cell In[80], line 2
1 # Infer pathway activities with ulm
----> 2 Mes1_tf_acts, Mes1_tf_pvals = dc.run_ulm(mat=Mes1_mat, net=collectri)
3 Mes1_tf_acts
File ~/Documents/Projects/Bates/py_decoupler/.venv/lib/python3.9/site-packages/decoupler/method_ulm.py:109, in run_ulm(mat, net, source, target, weight, batch_size, min_n, verbose, use_raw)
107 net = rename_net(net, source=source, target=target, weight=weight)
108 net = filt_min_n(c, net, min_n=min_n)
--> 109 sources, targets, net = get_net_mat(net)
111 # Match arrays
112 net = match(c, targets, net)
File ~/Documents/Projects/Bates/py_decoupler/.venv/lib/python3.9/site-packages/decoupler/pre.py:258, in get_net_mat(net)
255 targets = X.index.values
256 X = X.values
--> 258 return sources.astype('U'), targets.astype('U'), X.astype(np.float32)
TypeError: float() argument must be a string or a number, not 'NAType'
```
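The final error above can be reproduced without decoupler: NumPy cannot cast pandas' missing-value sentinel `pd.NA` to a float. This is only a sketch, and it assumes (I have not confirmed it in decoupler's code) that the matrix reaching `X.astype(np.float32)` in `get_net_mat` is an object array containing `pd.NA`; the toy array is my own stand-in, not decoupler data.

```python
import numpy as np
import pandas as pd

# Toy stand-in (an assumption) for the pivoted network matrix in get_net_mat:
# an object array where absent source/target pairs hold pd.NA.
mat = np.array([[1, pd.NA], [pd.NA, 1]], dtype=object)

# Same cast as the last line of get_net_mat; float(pd.NA) is not allowed.
try:
    mat.astype(np.float32)
    error = None
except TypeError as exc:
    error = exc
print(repr(error))

# Replacing the missing entries with 0 first lets the cast succeed.
filled = np.where(pd.isna(mat), 0, mat).astype(np.float32)
print(filled)
```

Filling the `pd.NA` entries before casting avoids the error, which is why I suspect missing values in the pivoted network matrix rather than in my count matrix.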
To Reproduce
The error seems to originate in `get_net_mat()`, which only takes the network, so it can be reproduced without my count matrix:
```python
collectri = dc.get_collectri(organism='mouse', split_complexes=False)
collectri
```
| source | target | weight | pmid |
|---|---|---|---|
| Myc | Tert | 1 | 10022128;10491298;10606235;10637317;10723141;1... |
| AP1 | Jun | 1 | 10022869;10037172;10208431;10366004;11281649;1... |
| AP1 | Jun | 1 | 10022869;10037172;10208431;10366004;11281649;1... |
| AP1 | Jun | 1 | 10022869;10037172;10208431;10366004;11281649;1... |
| AP1 | Jun | 1 | 10022869;10037172;10208431;10366004;11281649;1... |
| ... | ... | ... | ... |
| Runx1 | Lcp2 | 1 | |
| Runx1 | Prr5l | 1 | |
| Twist1 | Gli1 | 1 | |
| Usf1 | Nup188 | 1 | 22951020 |
| Zfp148 | Rnls | 1 | 25295465 |

58549 rows × 4 columns

Because `collectri` contains duplicated source/target rows, I removed them, along with rows containing missing values and the `pmid` column:
```python
collectri = collectri.drop_duplicates(subset=['source', 'target'], keep=False)
collectri = collectri.dropna()
collectri = collectri.drop('pmid', axis=1)
collectri = collectri.reset_index(drop=True)
collectri
```
| source | target | weight |
|---|---|---|
| Myc | Tert | 1 |
| Smad3 | Jun | 1 |
| Smad4 | Jun | 1 |
| Stat5a | Il2 | 1 |
| Stat5b | Il2 | 1 |
| ... | ... | ... |
| Gata2 | Psd4 | 1 |
| Gata2 | Tnfaip8l1 | 1 |
| Max | Serf2 | 1 |
| Usf1 | Nup188 | 1 |
| Zfp148 | Rnls | 1 |

33717 rows × 3 columns

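A note on the deduplication step above: with `keep=False`, `drop_duplicates` removes every copy of a duplicated source/target pair, which is why the `AP1 | Jun` rows are missing from the cleaned table entirely instead of being reduced to one row. The sketch below uses toy rows modelled on the table above (pandas only, no decoupler) to contrast it with the default `keep='first'`.

```python
import pandas as pd

# Toy rows modelled on the collectri table above (no pmid column).
net = pd.DataFrame({
    "source": ["Myc", "AP1", "AP1", "AP1"],
    "target": ["Tert", "Jun", "Jun", "Jun"],
    "weight": [1, 1, 1, 1],
})

# keep=False drops every row whose (source, target) pair is duplicated.
all_dropped = net.drop_duplicates(subset=["source", "target"], keep=False)

# keep="first" (the default) keeps one copy of each duplicated pair.
one_kept = net.drop_duplicates(subset=["source", "target"], keep="first")

print(all_dropped)
print(one_kept)
```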
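Calling `get_net_mat` directly on the cleaned network still fails with the same error:
```python
dc.get_net_mat(collectri)
```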
```
TypeError Traceback (most recent call last)
Cell In[102], line 1
----> 1 dc.get_net_mat(collectri)
File ~/Documents/Projects/Bates/py_decoupler/.venv/lib/python3.9/site-packages/decoupler/pre.py:258, in get_net_mat(net)
255 targets = X.index.values
256 X = X.values
--> 258 return sources.astype('U'), targets.astype('U'), X.astype(np.float32)
TypeError: float() argument must be a string or a number, not 'NAType'
```
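A possible workaround, untested against decoupler itself: if the `weight` column returned by `dc.get_collectri` uses pandas' nullable `Int64` dtype (an assumption on my part; I have not checked `collectri.dtypes`), the pivoted matrix may hold `pd.NA` rather than `NaN` for absent source/target pairs. Casting the weights to plain `float64` first should avoid that. The sketch below mimics the pivot-and-cast in `get_net_mat` on a toy network; the `fillna(0)` step is my approximation of how absent edges are zeroed, not decoupler's actual code.

```python
import numpy as np
import pandas as pd

# Toy network; the nullable "Int64" weight dtype is an assumption about
# what dc.get_collectri returns, not something verified here.
net = pd.DataFrame({
    "source": ["Myc", "Smad3", "Smad4"],
    "target": ["Tert", "Jun", "Jun"],
    "weight": pd.array([1, 1, 1], dtype="Int64"),
})

# Workaround: cast weights to plain float64 so absent pairs become NaN.
net["weight"] = net["weight"].astype("float64")

# Target x source matrix, roughly what get_net_mat builds; absent edges -> 0.
wide = net.pivot(columns="source", index="target", values="weight").fillna(0)

sources = wide.columns.to_numpy(dtype=str)
targets = wide.index.to_numpy(dtype=str)
X = wide.to_numpy(dtype=np.float32)

print(sources, targets)
print(X)
```

On the real network, the equivalent (equally untested) step would be `collectri['weight'] = collectri['weight'].astype(float)` before calling `dc.run_ulm`.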
Expected behavior
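I expected `dc.run_ulm` to return the TF activity and p-value matrices, as it does in the vignette, and `dc.get_net_mat` to convert the CollecTRI network without raising a `TypeError`.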
System
- OS: macOS Sequoia (Apple M1)
- Python version: 3.9.1
- scanpy version: 1.10.3
- decoupler version: 1.9.0
- numpy version: 2.0.2
- pandas version: 2.2.3
Additional context
I opened a new issue instead of commenting on my previous one because the two problems seem unrelated. My earlier issue with `get_pseudobulk` was solved by installing decoupler 1.9.0. I was unable to install version 1.9.2 as you had suggested; if the fix for the `run_ulm` error is to install 1.9.2, I need some help doing so:
```
python3 -m pip install 'decoupler==1.9.2'
ERROR: Could not find a version that satisfies the requirement decoupler==1.9.2 (from versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0)
ERROR: No matching distribution found for decoupler==1.9.2
```