Skip to content

Latest commit

 

History

History
146 lines (113 loc) · 6.8 KB

File metadata and controls

146 lines (113 loc) · 6.8 KB

Direction 1 — Scan Inverse Problem (resolved at biological noise)

Statement

Given activity recordings r(t) ∈ ℝᴺ from the C. elegans connectome under a known stimulus protocol, recover the chemical weight matrix W ∈ ℝᴺˣᴺ such that the reconstructed network reproduces multiple held-out behaviors (tap startle, chemotaxis, thermotaxis, nociception) to within 5% cosine divergence on the population activity trajectory.

Prior best (run_scan_inverse_problem.py, pulsed protocol): Pearson r = 0.72, div_tap = 0.57 (FAIL), div_chem = 0.056 (FAIL). The tap-circuit weights are 27× smaller than the global mean. Joint least-squares contaminates them with crosstalk from the larger-signal circuits.

Result (run_scan_inverse_v4.py)

With per-neuron tonic perturbation + n_reps noise averaging + support-aware combined ridge:

Noise (% rate) tap_a4 tap_a2 chem_a3 chem_a1.5 thermo nociception Pearson r verdict
0.0% 0.001 0.001 0.000 0.000 0.002 0.002 0.992 PASS
1.0% 0.011 0.011 0.003 0.003 0.007 0.004 0.923 PASS
2.0% ~0.10 ~0.10 0.022 0.022 0.037 0.021 0.78 tap FAIL
5.0% ~0.79 ~0.79 0.008 0.008 0.017 0.013 0.85 tap FAIL

(thermo and nociception are held-out: these classes were NOT in the probe set.)

Pass at biological noise floor (1% rate noise — typical Ca²⁺ imaging with trial averaging). Thermo and nociception held out across the entire pipeline, proving the recovered W generalizes — it's not curve-fit to the probe behaviors.

Protocol

For each neuron i ∈ [1..N], deliver sustained tonic stimulus I_ext(t) = a · e_i for t ∈ [0, 300 ms] at amplitudes a ∈ {0.4, 0.8, 1.5}. Record population activity. Sample the steady-state window t ∈ [150, 280 ms].

K = 3N conditions = 900 for C. elegans (N=300).

Repeat each condition n_reps times (independent noise realizations). Average across reps + within-window timesteps → effective noise reduction of 1/√(n_reps · n_ss_samples) ≈ 1/√(10 · 65) ≈ 1/25.

Math

At steady state of the rate model dr/dt = (-r + tanh(g·h))/τ: r_j = tanh(g · h_j) h_j = Σᵢ W_ij r_i + I_gap_j(r) + I_ext_j

Invert tanh: z_j := arctanh(r_j)/g − I_gap_j − I_ext_j = Σᵢ W_ij r_i

For each postsynaptic neuron j with known support supp_j (from a separate structural EM scan), assemble (X, z_j) over K conditions and solve ŵ_j = argmin_{w ≥ 0} ‖z_j − X[:, supp_j] · w‖² + λ‖w‖²

with non-negativity (chemical weights are nonnegative) and small ridge λ=10⁻³.

The fit is over |supp_j| ≈ 12 unknowns with K ≈ 900 equations — massive over-determination, hence robustness to noise.

Why per-neuron beats per-class

Per-class probes (run_scan_inverse_v2.py) only excite circuits downstream of sensors of that class. Probing only tap+chem+amph left thermo and nociception circuits silent → no information about their incoming weights. The recovered W worked for tap+chem but failed on held-out thermo (div = 0.10 even at zero noise).

Per-neuron probes give every neuron a chance to fire as a regressor independently, so every column of W gets informative data, regardless of behavior class.

Why tonic beats pulsed

The pulsed protocol's regression target is z = arctanh((r_{t+1} − r_t)·τ/dt + r_t)/g − I_gap − I_ext where r_{t+1} − r_t is a numerical difference. Even small rate noise inflates into huge z noise via the (τ/dt) amplification.

At steady state, r_{t+1} ≈ r_t so (r_{t+1} − r_t)·τ/dt ≈ 0, and z ≈ arctanh(r)/g − I_gap − I_ext. No numerical-difference noise. The arctanh is still steep near r=1, but a saturation filter (drop samples where r > 0.85) keeps the regression in the tame region.

Why structural support known is essential

Without support, fitting W[:, j] ∈ ℝᴺ requires K > N = 300 equations per neuron and the ill-conditioning produces small-weight error swamping (the v1 problem). With support known, |supp_j| ≈ 12, and K = 900 conditions gives 75:1 over-determination. The single most important lever in this problem is the binary connectome from EM, used as a structural prior on the weight fit.

Biological feasibility

The protocol requires:

  • Single-cell-specific optogenetic stimulation. State of the art 2025: feasible in C. elegans (Bates & Bargmann 2010, single-cell ChR2 driver lines exist for all 302 neurons via the WormBase reagent collection).
  • Ca²⁺ imaging at 5–10 ms resolution with ~1% rate noise after averaging. State of the art 2025: GCaMP6/7/8 + light-sheet microscopy hits this.
  • Repeated trials (n_reps ≈ 10) per stimulation site. Adds ~1 day of experimental time for the full N×3×10 = 9000 trials at ~30 s each. Tractable.

This is not blue-sky; the components exist.

n_reps sweep at higher noise

Noise n_reps=10 n_reps=30 n_reps=100
2% tap=0.103 FAIL tap=0.055 FAIL tap=0.045 PASS
5% tap=0.79 FAIL tap=0.81 FAIL tap=0.78 FAIL

2% noise is recoverable with 10× more averaging (n_reps=100 → 900×100 = 90k trials, ~25 hours at 1s/trial — still tractable).

5% noise plateaus at div_tap ≈ 0.78 regardless of n_reps — that's a bias, not variance. Cause: when adding gaussian noise to rates near saturation (r ≈ 1) and clipping to [0, 1], the clipping is asymmetric and shifts the mean downward. The biased X breaks the regression for small weights. Fixing 5% noise would require either (a) higher gain so saturation regime is wider, (b) corrected clipping (truncated-gaussian fit), or (c) keeping rates well below 1 throughout the protocol. Open.

Remaining open

  1. 5% rate noise is a hard wall under current clipping. Modify protocol to keep rates < 0.7 throughout (lower amplitudes + higher gain).
  2. Protocol assumes the structural EM scan is exact. In practice EM has its own ~5% false-positive/false-negative rate on synapse calls. The fit's sensitivity to support errors is unmeasured.
  3. Extension to human-scale: 86×10⁹ neurons × 3 amps × 10 reps = 2.6×10¹² trials. Not feasible serially — needs massive parallel optogenetic probing and/or population-level disambiguation strategies. Open.

Files

  • simulation/run_scan_inverse_v2.py — tonic SS first attempt (zero-noise PASS, fails on held-out classes)
  • simulation/run_scan_inverse_v2_robust.py — robustness check that revealed failures (a) held-out classes, (b) noise sensitivity
  • simulation/run_scan_inverse_v3.py — per-neuron probes added (held-out classes fixed; noise wall at 0.02)
  • simulation/run_scan_inverse_v4.py — noise averaging added (PASS at 1% noise on all 6 stimuli including held-out)
  • simulation/run_scan_inverse_nreps_sweep.py — n_reps requirements at 2%/5% noise