Sign disagreement between published QTL feathers and Supp. Table 4 for caqtl_microglia and dsqtl_yoruba
Hi! While reproducing the QTL coefficient evaluation against your paper's Suppl. Table 4 numbers, I ran into an inconsistency that I think is most likely a feather export bug, but it could also be that there's a documented sign-correction step I'm missing. Wanted to flag it either way.
Summary
For 3 of the 5 caQTL/dsQTL coefficient datasets you publish at gs://alphagenome/evals/, the signed pearsonr(prediction, target) computed directly on the feather matches the paper's Suppl. Table 4 value. For 2 datasets (caqtl_microglia and dsqtl_yoruba), the magnitude matches but the sign is opposite.
| Dataset | Direct from feather | Suppl. Table 4 |
| --- | --- | --- |
| caqtl_african | +0.7367 | +0.7368 ✅ |
| caqtl_european | +0.5914 | +0.5916 ✅ |
| caqtl_smc | +0.6870 | +0.6870 ✅ |
| caqtl_microglia | −0.6354 | +0.6357 ❌ |
| dsqtl_yoruba | −0.8323 | +0.8323 ❌ |
Reproducer (no install needed, runs on any machine with pandas + scipy)
```python
import os
from urllib.request import urlretrieve

import pandas as pd  # pd.read_feather also needs pyarrow installed
from scipy.stats import pearsonr

BASE = "https://storage.googleapis.com/alphagenome/evals/"

# (short label, feather file stem)
cases = [
    ("caqtl_african", "caqtl_african_variant_coefficient_human_predictions"),
    ("caqtl_european", "caqtl_european_variant_coefficient_human_predictions"),
    ("caqtl_smc", "caqtl_smc_variant_coefficient_human_predictions"),
    ("caqtl_microglia", "caqtl_microglia_variant_coefficient_human_predictions"),
    ("dsqtl_yoruba", "dsqtl_yoruba_variant_coefficient_human_predictions"),
]

for label, name in cases:
    local = f"/tmp/{name}.feather"
    # Download once and reuse the cached copy on later runs.
    if not os.path.exists(local):
        urlretrieve(BASE + name + ".feather", local)
    df = pd.read_feather(local)
    # Signed Pearson correlation between model prediction and QTL target.
    r, _ = pearsonr(df["prediction"], df["target"])
    print(f"{label:<20} pearsonr(prediction, target) = {r:+.4f}")
```
Output:
```
caqtl_african        pearsonr(prediction, target) = +0.7367
caqtl_european       pearsonr(prediction, target) = +0.5914
caqtl_smc            pearsonr(prediction, target) = +0.6870
caqtl_microglia      pearsonr(prediction, target) = -0.6354
dsqtl_yoruba         pearsonr(prediction, target) = -0.8323
```
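To make the "same magnitude, opposite sign" reading mechanical rather than eyeballed, here is a minimal sketch that labels each dataset from the numbers in the table above. The `classify` helper and the `1e-3` tolerance are my own assumptions for illustration, not anything from the paper or the SDK.

```python
import math

# (direct from feather, Suppl. Table 4), copied from the table in this issue.
RESULTS = {
    "caqtl_african": (0.7367, 0.7368),
    "caqtl_european": (0.5914, 0.5916),
    "caqtl_smc": (0.6870, 0.6870),
    "caqtl_microglia": (-0.6354, 0.6357),
    "dsqtl_yoruba": (-0.8323, 0.8323),
}


def classify(feather_r, paper_r, tol=1e-3):
    """Hypothetical helper: label a pair of correlations.

    `tol` is an assumed absolute tolerance chosen to absorb rounding
    differences in the fourth decimal place.
    """
    if math.isclose(feather_r, paper_r, abs_tol=tol):
        return "match"
    # Same magnitude but opposite sign: the value matches the negated paper number.
    if math.isclose(feather_r, -paper_r, abs_tol=tol):
        return "sign flipped"
    return "mismatch"


for label, (feather_r, paper_r) in RESULTS.items():
    print(f"{label:<20} {classify(feather_r, paper_r)}")
```

With that tolerance, the three caQTL datasets that agree come out as "match" and the other two as "sign flipped"; none falls into "mismatch", which is what points at a sign convention and not at a different set of variants or predictions.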
What I checked
- The paper distinguishes signed vs unsigned Pearson explicitly (e.g. caQTL Fig. 5d: "Signed Pearson r = 0.74; unsigned Pearson r = 0.45"), so Suppl. Table 4's `pearsonr` column is clearly the signed version. The magnitudes match to within 0.0003 across all 5 datasets; only the sign disagrees on 2.
- I separately verified by running `model.predict_variant` from this SDK on a few `caqtl_microglia` variants: the values reproduce the feather's `prediction` column to within bf16 numerics (r ≈ 0.999). So the `prediction` column is faithful to model output; the apparent disagreement is between `target` and the paper number.
- The other 3 datasets go through the same loading code I'm using (`target` → effect_size, `prediction` → score) and they line up with the paper to within rounding. So this isn't something on my end being applied inconsistently.
Question
Is there a per-dataset sign-correction step in your table-generation pipeline that the published feathers don't reflect (e.g. a polarity flip based on which allele was assigned REF in the upstream QTL study for those two)? Or is the target column for caqtl_microglia and dsqtl_yoruba simply exported with the wrong sign?
Either answer is fine, but it needs one of two fixes: the feathers should be re-exported, or the convention should be documented somewhere users can find it. Right now anyone naively running pearsonr(prediction, target) on the published artifacts gets the opposite sign from the paper for those two datasets.
Happy to send a PR with whatever fix you prefer (re-sign the feather column, or add a documentation note + a small apply_sign_convention() utility).
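For concreteness, here is a rough sketch of what the utility option could look like. Everything in it is an assumption on my side: the function name is just the one floated above, the `FLIPPED_DATASETS` set is taken from the two mismatching rows in the table, and flipping `target` (not `prediction`) follows from the check that `prediction` reproduces model output. The real fix depends on what your pipeline actually does.

```python
import pandas as pd

# Assumption: the two datasets whose sign disagrees with Suppl. Table 4.
FLIPPED_DATASETS = frozenset({"caqtl_microglia", "dsqtl_yoruba"})


def apply_sign_convention(df, dataset, column="target"):
    """Hypothetical utility: return a copy of `df` with `column` negated
    when `dataset` is listed in FLIPPED_DATASETS; otherwise an unchanged copy."""
    out = df.copy()
    if dataset in FLIPPED_DATASETS:
        out[column] = -out[column]
    return out


# Toy data (not real QTL values) with a negative prediction/target correlation.
demo = pd.DataFrame(
    {
        "prediction": [0.1, 0.2, 0.3, 0.4],
        "target": [-1.0, -2.1, -2.9, -4.2],
    }
)
fixed = apply_sign_convention(demo, "dsqtl_yoruba")

# Series.corr defaults to Pearson, so this mirrors pearsonr(prediction, target).
r_before = demo["prediction"].corr(demo["target"])
r_after = fixed["prediction"].corr(fixed["target"])
print(f"before: {r_before:+.4f}  after: {r_after:+.4f}")
```

Negating one column flips the sign of the Pearson correlation and leaves its magnitude unchanged, so this would reproduce the paper's values for the two affected datasets without touching the other three. Returning a copy keeps the loaded feather data intact for anyone who wants the raw column.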
Why this matters downstream
Anyone benchmarking against your numbers using gs://alphagenome/evals/ directly hits this — for instance our radical-eval pipeline cross-published baselines for all 5 datasets and the microglia/dsqtl_yoruba ones came out with the wrong sign, traceable entirely to this.
Thanks!