Commit 8adbda4

Merge pull request #38 from patcon/feat/in-browser-dimensional-reduction
feat: in-browser dimensional reduction with KNN backend param controls
2 parents 0ce3e93 + b26a987 commit 8adbda4

20 files changed

Lines changed: 1304 additions & 403 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -4,6 +4,9 @@

 ### Added

+- Annoy KNN backend now exposes its parameters (`numTrees`, `maxPointsPerLeaf`, `seed`) in the recompute dialog, matching the HNSW pattern. Switching backends shows that backend's params immediately; values are forwarded as `knn_params` to PaCMAP and LocalMAP. HNSW params expanded to include `m` and `seed`.
+- HNSW `ef` and `ef_construction` inputs in the recompute dialog's Advanced section, shown only when the HNSW KNN backend is selected. Values are forwarded as `knn_params` to PaCMAP and LocalMAP, overriding the library defaults (`ef=50`, `ef_construction=200`).
+- In-browser dimensional reduction via DruidJS. After importing an `.h5ad` file, a "Recompute" button in the projection selector opens a dialog to pick a dense `layers/` matrix as the vote matrix, choose an algorithm (UMAP, PaCMAP, or LocalMAP) and its parameters, and run the reduction in a web worker. The result is added as a new selectable projection and auto-selected. Empty cells in the chosen layer are filled with the column mean before reduction (mean imputation). A progress bar shows iteration progress (0–100 %) while the reduction runs, preceded by a "Building KNN graph…" phase indicator.
 - `includeAvatars` prop/toggle for `RoutingExperiment` navigation mode to show DiceBear adventurer-neutral avatars (circular crop, radius 90% of pin head) in each pin head, keyed by point ID for stable identity. Toggle appears in the Controls sheet under Waypoint Distribution when navigation mode is active.
 - `waypointDensity` prop (0–1 slider, default 1.0 = all) for `RoutingExperiment` to sample intermediate waypoints evenly along the path; inactive waypoints remain visible as white dots while active ones stay orange.
 - `NavigationMode` story for `RoutingExperiment` with Google Maps-style 3D navigation: right-drag to tilt/orbit (heading + pitch), scroll to zoom, left-drag to pan, double-click to reset view. Adds `navigationMode` prop to the component.
@@ -23,6 +26,9 @@

 ### Changed

+- Removed `RepresentativeStatementsManager` class from `reddwarf-ts` and its app-layer wrapper; the class was never instantiated (App.tsx owns `isCalculatingRepStatements` state directly via `useState`).
+- Split `reddwarf-ts` stats module: pure statistical functions stay in `stats.ts`; DB-layer types (`VoteConnection`, `VoteQueryResult`) and `getGroupVoteMatrices` move to new `db.ts`. Unified `AnalysisOptions` (now includes `commentTextMap`) replaces two divergent options shapes. Collapsed `analyzeLabeledGroups` (was in `stats.ts`) and `calculateRepresentativeStatements` (was a thin wrapper in `representative-statements.ts`) into a single function in `representative-statements.ts`; app adapter updated to fold `commentTextMap` into options.
+- Extracted dimensional reduction logic (types, config, and a pure `runReducer()` generator) into `reddwarf-ts`. `src/lib/druid-reducer.ts` becomes a named re-export shim; `src/lib/druid-reducer.worker.ts` becomes a thin message-protocol shell. `runReducer()` yields `ReducerResponse` events and is usable outside the browser (e.g. in Node.js scripts) without the web worker infrastructure. `@saehrimnir/druidjs` added as a runtime dependency of `reddwarf-ts`.
 - Extracted core statistical functions (`stats.ts`, `representative-statements.ts`) into a standalone `reddwarf-ts` workspace package under `packages/`. The app now imports these from the package via a pnpm workspace link. `src/lib/stats.ts` becomes a re-export shim; `src/lib/representative-statements.ts` becomes a thin DuckDB adapter. No changes to the app's public API.
 - Renamed `analyzePaintedClusters` → `analyzeLabeledGroups` in the package (neutral terminology; the app adapter preserves backward compat internally).

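
The backend-switching behavior described in the changelog entries above (each KNN backend shows its own parameters, and the values are forwarded as `knn_params`) can be pictured as a lookup into a per-backend table of defaults. The sketch below is illustrative only: the table values are copied from `KNN_PARAM_DEFS` in this commit, while `KnnState` and `switchBackend` are hypothetical names and not the dialog's actual code.

```typescript
// Shape of one tunable parameter, mirroring the ParamDef type in this commit.
type ParamDef = { label: string; min: number; max: number; step: number; default: number };
type KnnBackend = "annoy" | "hnsw";

// Values copied from KNN_PARAM_DEFS in packages/reddwarf-ts.
const KNN_PARAM_DEFS: Record<KnnBackend, Record<string, ParamDef>> = {
  annoy: {
    numTrees: { label: "Num trees", min: 1, max: 200, step: 1, default: 10 },
    maxPointsPerLeaf: { label: "Max pts/leaf", min: 1, max: 200, step: 1, default: 10 },
    seed: { label: "Seed", min: 0, max: 99999, step: 1, default: 1212 },
  },
  hnsw: {
    ef: { label: "ef (search)", min: 10, max: 1000, step: 10, default: 10 },
    ef_construction: { label: "ef_construct", min: 10, max: 2000, step: 10, default: 200 },
    m: { label: "m", min: 2, max: 100, step: 1, default: 12 },
    seed: { label: "Seed", min: 0, max: 99999, step: 1, default: 1212 },
  },
};

// Same result as defaultKnnParamsFor() in the commit, written as a plain loop.
function defaultKnnParamsFor(backend: KnnBackend): Record<string, number> {
  const defs = KNN_PARAM_DEFS[backend];
  const out: Record<string, number> = {};
  for (const key of Object.keys(defs)) {
    out[key] = defs[key].default;
  }
  return out;
}

// Hypothetical dialog state: the selected backend plus the values forwarded as knn_params.
type KnnState = { backend: KnnBackend; params: Record<string, number> };

// Hypothetical handler: switching backends replaces the old backend's values
// with the new backend's defaults, so no stale keys are forwarded.
function switchBackend(next: KnnBackend): KnnState {
  return { backend: next, params: defaultKnnParamsFor(next) };
}

const state = switchBackend("hnsw");
console.log(state.params);
```

Resetting to defaults on every switch is only one possible design; a dialog could equally remember the last values entered per backend.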

package.json

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
   "dependencies": {
     "reddwarf-ts": "workspace:*",
     "@duckdb/duckdb-wasm": "^1.30.0",
+    "@saehrimnir/druidjs": "github:patcon/DruidJS#add-pacmap-support",
     "@radix-ui/react-accordion": "^1.2.12",
     "@radix-ui/react-alert-dialog": "^1.1.15",
     "@radix-ui/react-dialog": "^1.1.15",

packages/reddwarf-ts/CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -2,6 +2,15 @@

 ## [Unreleased]

+### Added
+
+- `imputeColumnMeans(matrix)`: in-place mean imputation for NaN cells per column, extracted from the main app.
+- `zeroMaskedColumns(matrix, mask)`: zeroes out columns in-place at indices where `mask[j]` is true.
+
+### Changed
+
+- `runReducer` now spreads params directly into DruidJS constructors using the library's own `ParametersUMAP`/`ParametersPaCMAP`/`ParametersLocalMAP` types, replacing manual field-by-field enumeration. The UMAP `spread` param key was renamed to `_spread` to match the DruidJS type.
+
 ## [0.1.0] - 2026-05-16

 ### Added
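
As a rough illustration of the two helpers named in the `[Unreleased]` entries above, the sketch below reproduces their bodies from this commit and applies them to a small made-up matrix. The `votes` data and the mask are hypothetical and chosen only to show the in-place behavior.

```typescript
/** Replaces NaN cells in-place with the column mean of observed values (0 for all-NaN columns). */
function imputeColumnMeans(matrix: number[][]): void {
  const nObs = matrix.length;
  const nVars = matrix[0]?.length ?? 0;
  for (let j = 0; j < nVars; j++) {
    let sum = 0;
    let count = 0;
    for (let i = 0; i < nObs; i++) {
      if (!isNaN(matrix[i][j])) { sum += matrix[i][j]; count++; }
    }
    const colMean = count > 0 ? sum / count : 0;
    for (let i = 0; i < nObs; i++) {
      if (isNaN(matrix[i][j])) matrix[i][j] = colMean;
    }
  }
}

/** Zeroes out columns in-place at indices where mask[j] is true. */
function zeroMaskedColumns(matrix: number[][], mask: boolean[]): void {
  for (let j = 0; j < mask.length; j++) {
    if (!mask[j]) continue;
    for (let i = 0; i < matrix.length; i++) matrix[i][j] = 0;
  }
}

// Hypothetical vote matrix: rows are participants, columns are statements, NaN means "no vote".
const votes: number[][] = [
  [1, NaN, NaN],
  [-1, 1, NaN],
  [NaN, 0, NaN],
];

imputeColumnMeans(votes);
// Column 0 mean is 0, column 1 mean is 0.5, column 2 has no observations and falls back to 0.
const afterImpute = votes.map((row) => row.slice());

zeroMaskedColumns(votes, [true, false, false]);
// Column 0 is now all zeros; the other columns are unchanged.
console.log(afterImpute, votes);
```

Both helpers mutate their argument, so a caller that still needs the raw matrix would copy it first.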

packages/reddwarf-ts/package.json

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@
   "scripts": {
     "typecheck": "tsc --noEmit"
   },
+  "dependencies": {
+    "@saehrimnir/druidjs": "github:patcon/DruidJS#add-pacmap-support"
+  },
   "devDependencies": {
     "typescript": "~5.8.3"
   }

packages/reddwarf-ts/src/db.ts

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+import type { GroupVoteMatrix } from './stats.js';
+
+/**
+ * Minimal structural interface for a DuckDB-WASM query result.
+ * Any object satisfying this shape can be passed to the DB-dependent functions.
+ */
+export interface VoteQueryResult {
+  numRows: number;
+  getChild(name: string): { get(i: number): unknown } | null | undefined;
+}
+
+export interface VoteConnection {
+  query(sql: string): Promise<VoteQueryResult>;
+}
+
+/**
+ * Query votes for each label group from a DuckDB-compatible connection.
+ * The caller is responsible for ensuring the votes table is loaded before calling this.
+ */
+export async function getGroupVoteMatrices(
+  conn: VoteConnection,
+  labelArray: (string | null)[],
+  participants?: string[],
+): Promise<Record<string, GroupVoteMatrix>> {
+  const groups: Record<string, string[]> = {};
+  labelArray.forEach((label, index) => {
+    if (label != null) {
+      const pid = participants?.[index];
+      if (pid !== undefined) {
+        if (!groups[label]) groups[label] = [];
+        groups[label].push(pid);
+      }
+    }
+  });
+
+  const groupVotes: Record<string, GroupVoteMatrix> = {};
+  for (const [label, indices] of Object.entries(groups)) {
+    const quotedIndices = indices.map((pid) => `'${pid}'`);
+    const result = await conn.query(`
+      SELECT participant_id, comment_id, vote
+      FROM votes
+      WHERE participant_id IN(${quotedIndices.join(",")})
+    `);
+
+    const voteMatrix: GroupVoteMatrix = {};
+    for (let i = 0; i < result.numRows; i++) {
+      const pid = result.getChild('participant_id')?.get(i)?.toString();
+      const cid = result.getChild('comment_id')?.get(i)?.toString();
+      const rawVote = result.getChild('vote')?.get(i);
+
+      const vote = typeof rawVote === 'bigint' ? Number(rawVote) : rawVote as number;
+
+      if (pid && cid && vote !== undefined) {
+        if (!voteMatrix[pid]) voteMatrix[pid] = {};
+        voteMatrix[pid][cid] = vote;
+      }
+    }
+
+    groupVotes[label] = voteMatrix;
+  }
+
+  return groupVotes;
+}
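
One point a reviewer might raise on `getGroupVoteMatrices` is that participant IDs are interpolated into the SQL text as `'${pid}'` without escaping, so an ID containing a single quote would break the query. The sketch below shows one possible way to quote such values by doubling embedded single quotes (standard SQL string-literal escaping). `quoteSqlString` and `buildGroupVotesQuery` are hypothetical helpers and are not part of this commit.

```typescript
// Hypothetical helper: render a value as a SQL string literal,
// doubling any embedded single quotes.
function quoteSqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Hypothetical helper: build a query equivalent to the one in getGroupVoteMatrices,
// but with each participant ID passed through quoteSqlString.
function buildGroupVotesQuery(participantIds: string[]): string {
  const inList = participantIds.map(quoteSqlString).join(",");
  return `SELECT participant_id, comment_id, vote FROM votes WHERE participant_id IN (${inList})`;
}

const sql = buildGroupVotesQuery(["p1", "o'brien"]);
console.log(sql);
// SELECT participant_id, comment_id, vote FROM votes WHERE participant_id IN ('p1','o''brien')
```

The `VoteConnection` interface in this diff exposes only `query(sql)`, which is why the sketch stays with literal quoting; where the underlying connection offers prepared statements, binding the IDs as parameters would likely be the more robust choice.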

packages/reddwarf-ts/src/druid-reducer.ts

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
+import { UMAP, PaCMAP, LocalMAP, type ParametersUMAP, type ParametersPaCMAP, type ParametersLocalMAP, type ParametersAnnoy, type ParametersHNSW } from "@saehrimnir/druidjs";
+
+export type ReducerAlgorithm = "umap" | "pacmap" | "localmap";
+
+export const REDUCER_LABELS: Record<ReducerAlgorithm, string> = {
+  umap: "UMAP",
+  pacmap: "PaCMAP",
+  localmap: "LocalMAP",
+};
+
+export type ParamDef = {
+  label: string;
+  min: number;
+  max: number;
+  step: number;
+  default: number;
+};
+
+export const REDUCER_PARAM_DEFS: Record<ReducerAlgorithm, Record<string, ParamDef>> = {
+  umap: {
+    n_neighbors: { label: "Neighbors", min: 2, max: 200, step: 1, default: 15 },
+    min_dist: { label: "Min dist", min: 0, max: 1, step: 0.01, default: 0.1 },
+    _spread: { label: "Spread", min: 0.1, max: 10, step: 0.1, default: 1.0 },
+  },
+  pacmap: {
+    n_neighbors: { label: "Neighbors", min: 2, max: 200, step: 1, default: 10 },
+    MN_ratio: { label: "MN ratio", min: 0.1, max: 5, step: 0.1, default: 0.5 },
+    FP_ratio: { label: "FP ratio", min: 0.5, max: 10, step: 0.5, default: 2.0 },
+  },
+  localmap: {
+    n_neighbors: { label: "Neighbors", min: 2, max: 200, step: 1, default: 10 },
+    MN_ratio: { label: "MN ratio", min: 0.1, max: 5, step: 0.1, default: 0.5 },
+    FP_ratio: { label: "FP ratio", min: 0.5, max: 10, step: 0.5, default: 2.0 },
+    low_dist_thres: { label: "Low dist thresh", min: 1, max: 50, step: 1, default: 10 },
+  },
+};
+
+export const REDUCER_ADVANCED_PARAM_DEFS: Record<ReducerAlgorithm, Record<string, ParamDef>> = {
+  umap: {
+    _n_epochs: { label: "Epochs", min: 50, max: 2000, step: 10, default: 350 },
+    seed: { label: "Seed", min: 0, max: 99999, step: 1, default: 1212 },
+    local_connectivity: { label: "Local connectivity", min: 1, max: 20, step: 1, default: 1 },
+    _initial_alpha: { label: "Initial LR", min: 0.01, max: 5, step: 0.01, default: 1 },
+    _repulsion_strength: { label: "Repulsion strength", min: 0, max: 5, step: 0.1, default: 1 },
+    _negative_sample_rate: { label: "Neg. sample rate", min: 1, max: 20, step: 1, default: 5 },
+    _set_op_mix_ratio: { label: "Set-op mix ratio", min: 0, max: 1, step: 0.01, default: 1 },
+  },
+  pacmap: {
+    seed: { label: "Seed", min: 0, max: 99999, step: 1, default: 1212 },
+    lr: { label: "Learning rate", min: 0.001, max: 10, step: 0.001, default: 1.0 },
+  },
+  localmap: {
+    seed: { label: "Seed", min: 0, max: 99999, step: 1, default: 1212 },
+    lr: { label: "Learning rate", min: 0.001, max: 10, step: 0.001, default: 1.0 },
+  },
+};
+
+export const KNN_BACKEND_ALGORITHMS: ReducerAlgorithm[] = ["pacmap", "localmap"];
+export type KnnBackend = "annoy" | "hnsw";
+export const KNN_BACKENDS: { value: KnnBackend; label: string }[] = [
+  { value: "annoy", label: "Annoy" },
+  { value: "hnsw", label: "HNSW (broken?)" },
+];
+
+export function defaultParamsFor(algorithm: ReducerAlgorithm): Record<string, number> {
+  return Object.fromEntries(
+    Object.entries(REDUCER_PARAM_DEFS[algorithm]).map(([key, def]) => [key, def.default])
+  );
+}
+
+export function defaultAdvancedParamsFor(algorithm: ReducerAlgorithm): Record<string, number> {
+  return Object.fromEntries(
+    Object.entries(REDUCER_ADVANCED_PARAM_DEFS[algorithm]).map(([key, def]) => [key, def.default])
+  );
+}
+
+export const KNN_PARAM_DEFS: Record<KnnBackend, Record<string, ParamDef>> = {
+  annoy: {
+    numTrees: { label: "Num trees", min: 1, max: 200, step: 1, default: 10 },
+    maxPointsPerLeaf: { label: "Max pts/leaf", min: 1, max: 200, step: 1, default: 10 },
+    seed: { label: "Seed", min: 0, max: 99999, step: 1, default: 1212 },
+  },
+  // Defaults match the voyager (Spotify) HNSW library used by pacmap-python:
+  // https://github.com/spotify/voyager/blob/main/cpp/src/TypedIndex.h#L127
+  // https://spotify.github.io/voyager/python/reference.html#voyager.Index
+  hnsw: {
+    ef: { label: "ef (search)", min: 10, max: 1000, step: 10, default: 10 },
+    ef_construction: { label: "ef_construct", min: 10, max: 2000, step: 10, default: 200 },
+    m: { label: "m", min: 2, max: 100, step: 1, default: 12 },
+    seed: { label: "Seed", min: 0, max: 99999, step: 1, default: 1212 },
+  },
+};
+
+export function defaultKnnParamsFor(backend: KnnBackend): Record<string, number> {
+  return Object.fromEntries(
+    Object.entries(KNN_PARAM_DEFS[backend]).map(([key, def]) => [key, def.default])
+  );
+}
+
+/** @deprecated Use KNN_PARAM_DEFS["hnsw"] */
+export const HNSW_PARAM_DEFS = KNN_PARAM_DEFS["hnsw"];
+/** @deprecated Use defaultKnnParamsFor("hnsw") */
+export function defaultHnswParams(): Record<string, number> { return defaultKnnParamsFor("hnsw"); }
+
+export type { ParametersAnnoy, ParametersHNSW };
+
+export type ReducerRequest = {
+  type: "reduce";
+  matrix: number[][];
+  algorithm: ReducerAlgorithm;
+  params: Record<string, number>;
+  knnBackend?: KnnBackend;
+  knnParams?: Record<string, number>;
+};
+
+export type ReducerResponse =
+  | { type: "done"; coords: [number, number][] }
+  | { type: "progress"; iteration: number; total: number }
+  | { type: "error"; message: string };
+
+export const REDUCER_DEFAULT_ITERATIONS: Record<ReducerAlgorithm, number> = {
+  umap: 350,
+  pacmap: 450,
+  localmap: 450,
+};
+
+export const PROGRESS_INTERVAL = 10;
+
+/** Zeroes out columns in-place at indices where mask[j] is true. */
+export function zeroMaskedColumns(matrix: number[][], mask: boolean[]): void {
+  const nObs = matrix.length;
+  for (let j = 0; j < mask.length; j++) {
+    if (!mask[j]) continue;
+    for (let i = 0; i < nObs; i++) matrix[i][j] = 0;
+  }
+}
+
+/** Replaces NaN cells in-place with the column mean of observed values. Falls back to 0 for all-NaN columns. */
+export function imputeColumnMeans(matrix: number[][]): void {
+  const nObs = matrix.length;
+  const nVars = matrix[0]?.length ?? 0;
+  for (let j = 0; j < nVars; j++) {
+    let sum = 0, count = 0;
+    for (let i = 0; i < nObs; i++) {
+      if (!isNaN(matrix[i][j])) { sum += matrix[i][j]; count++; }
+    }
+    const colMean = count > 0 ? sum / count : 0;
+    for (let i = 0; i < nObs; i++) {
+      if (isNaN(matrix[i][j])) matrix[i][j] = colMean;
+    }
+  }
+}
+
+/** Pure generator: yields progress ticks then a final done event. Usable in a web worker or directly in Node.js. */
+export function* runReducer(req: ReducerRequest): Generator<ReducerResponse> {
+  const { matrix, algorithm, params, knnBackend, knnParams } = req;
+  const n = matrix.length;
+  if (n < 3) {
+    throw new Error(`Need at least 3 rows to run dimensional reduction (got ${n}).`);
+  }
+  const nNeighbors = Math.max(2, Math.min(Math.round(params.n_neighbors), n - 1));
+  const total = params._n_epochs ?? REDUCER_DEFAULT_ITERATIONS[algorithm];
+
+  let gen: Generator<unknown, unknown, unknown>;
+  if (algorithm === "umap") {
+    const dr = new UMAP(matrix, {
+      d: 2,
+      ...(params as Partial<ParametersUMAP>),
+      n_neighbors: nNeighbors,
+      _n_epochs: total,
+    });
+    gen = dr.generator(total);
+  } else if (algorithm === "localmap") {
+    const dr = new LocalMAP(matrix, {
+      d: 2,
+      ...(params as Partial<ParametersLocalMAP>),
+      n_neighbors: nNeighbors,
+      // knn_backend/knn_params are not in DruidJS types but accepted at runtime
+      knn_backend: knnBackend ?? "annoy",
+      knn_params: (knnParams ?? defaultKnnParamsFor(knnBackend ?? "annoy")) as Partial<ParametersAnnoy> | Partial<ParametersHNSW>,
+    } as Partial<ParametersLocalMAP>);
+    gen = dr.generator();
+  } else {
+    const dr = new PaCMAP(matrix, {
+      d: 2,
+      ...(params as Partial<ParametersPaCMAP>),
+      n_neighbors: nNeighbors,
+      // knn_backend/knn_params are not in DruidJS types but accepted at runtime
+      knn_backend: knnBackend ?? "annoy",
+      knn_params: (knnParams ?? defaultKnnParamsFor(knnBackend ?? "annoy")) as Partial<ParametersAnnoy> | Partial<ParametersHNSW>,
+    } as Partial<ParametersPaCMAP>);
+    gen = dr.generator();
+  }
+
+  let iteration = 0;
+  let lastProjection: number[][] = [];
+  for (const projection of gen) {
+    iteration++;
+    lastProjection = projection as number[][];
+    if (iteration % PROGRESS_INTERVAL === 0) {
+      yield { type: "progress", iteration, total };
+    }
+  }
+
+  const coords = lastProjection.map((row) => [row[0], row[1]] as [number, number]);
+  yield { type: "done", coords };
+}
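
Because `runReducer()` is a plain generator of `ReducerResponse` events, a caller outside the web worker can drive it with an ordinary `for...of` loop. The sketch below shows one way a consumer might turn `progress` events into a 0–100 percentage and pick up the final coordinates. To stay runnable without the DruidJS fork, it uses `fakeReducer`, a hypothetical stand-in that follows the same event protocol; `drive` is likewise an illustrative name and not part of this commit.

```typescript
// Event protocol copied from ReducerResponse in this commit.
type ReducerResponse =
  | { type: "done"; coords: [number, number][] }
  | { type: "progress"; iteration: number; total: number }
  | { type: "error"; message: string };

const PROGRESS_INTERVAL = 10;

// Hypothetical stand-in for runReducer(): emits a progress tick every
// PROGRESS_INTERVAL iterations, then a final done event with fixed coordinates.
function* fakeReducer(total: number): Generator<ReducerResponse> {
  for (let iteration = 1; iteration <= total; iteration++) {
    if (iteration % PROGRESS_INTERVAL === 0) {
      yield { type: "progress", iteration, total };
    }
  }
  const coords: [number, number][] = [[0, 0], [1, 1], [2, 0]];
  yield { type: "done", coords };
}

// Hypothetical consumer: reports whole-number percentages and returns the final coords.
function drive(
  events: Iterable<ReducerResponse>,
  onPercent: (percent: number) => void,
): [number, number][] {
  for (const event of events) {
    switch (event.type) {
      case "progress":
        onPercent(Math.round((event.iteration / event.total) * 100));
        break;
      case "done":
        return event.coords;
      case "error":
        throw new Error(event.message);
    }
  }
  throw new Error("Reducer ended without a done event.");
}

const seen: number[] = [];
const coords = drive(fakeReducer(30), (percent) => { seen.push(percent); });
console.log(seen, coords);
```

Note that `runReducer()` in this commit throws when given fewer than 3 rows rather than yielding an `error` event, so a real caller would presumably wrap the loop in `try`/`catch` as well.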

packages/reddwarf-ts/src/index.ts

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
 export * from './stats.js';
+export * from './db.js';
 export * from './representative-statements.js';
+export * from './druid-reducer.js';
