feat: replace DuckDB with in-memory AnnDataStore; faithful h5ad round-trip export #39
Open
patcon wants to merge 6 commits into
All import modes (h5ad, Kedro, local files) now normalize into a singleton AnnDataStore at load time. Vote queries become synchronous typed-array operations over a dense Float32Array vote matrix.

Adds:
- src/lib/anndata-store.ts: central store with vote matrix + h5ad export
- src/lib/parquet-reader.ts: hyparquet-based Parquet loader for Kedro/local votes
- "Download h5ad" button (merges painted groups into obs/manual_painted)
- hyparquet + hyparquet-compressors dependencies

Removes DuckDB from the vote-query hot path; reddwarf-ts/db.ts and calculateRepresentativeStatements are now synchronous pure functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
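A minimal sketch of what a synchronous typed-array vote query could look like. The class name, method names, and the vote encoding (1 = agree, -1 = disagree, 0 = pass, NaN = no vote) are assumptions for illustration and are not taken from src/lib/anndata-store.ts.

```typescript
// Hypothetical sketch: names and vote encoding are illustrative, not the PR's API.
// Assumed encoding: 1 = agree, -1 = disagree, 0 = pass, NaN = no vote.
class VoteMatrix {
  readonly nVoters: number;
  readonly nStatements: number;
  readonly data: Float32Array;

  constructor(nVoters: number, nStatements: number) {
    this.nVoters = nVoters;
    this.nStatements = nStatements;
    // Row-major layout: one row per voter, one column per statement.
    this.data = new Float32Array(nVoters * nStatements).fill(NaN);
  }

  set(voter: number, statement: number, vote: number): void {
    this.data[voter * this.nStatements + statement] = vote;
  }

  // Synchronous tally for one statement: a strided walk down one column.
  statementStats(statement: number): { agree: number; disagree: number; pass: number } {
    const tally = { agree: 0, disagree: 0, pass: 0 };
    for (let v = 0; v < this.nVoters; v++) {
      const vote = this.data[v * this.nStatements + statement];
      if (vote === 1) tally.agree++;
      else if (vote === -1) tally.disagree++;
      else if (vote === 0) tally.pass++;
      // NaN (no vote) matches none of the branches and is skipped.
    }
    return tally;
  }
}

const votes = new VoteMatrix(3, 2);
votes.set(0, 0, 1);
votes.set(1, 0, -1);
votes.set(2, 1, 0);
const statsForFirst = votes.statementStats(0);
const statsForSecond = votes.statementStats(1);
```

Because the tally is a plain loop over memory, callers need no connection setup and no await, which is the property the commit message describes.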
Deletes duckdb.ts, the @duckdb/duckdb-wasm npm dep, the 100 MB public/duckdb/ WASM assets, and the now-unused kedroBaseUrl/pipelineId props in MapOverlay and StatementExplorerDrawer. All vote queries are now served by AnnDataStore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…round-trip

AnnDataStore now stores rawH5adBytes from the original import. toH5adBytes() uses them as the source: it opens the raw file and copies all var columns (not just content/moderation_state), all uns fields, full-dimensional obsm embeddings (PCA etc.), and HDF5 group attributes to a new output file. It then patches in the current manual_painted state and any new user-computed obsm projections.

Previously the download silently dropped extra var columns, uns metadata, and high-dimensional obsm; only the small subset that AnnDataStore explicitly tracked was written back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
X, varm, varp, obsp, and any other top-level groups not explicitly handled by the app are now copied from the source file. Previously only obs/var/obsm/layers/uns were written.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…export

Two bugs in the round-trip copy:
- copyGroupContents was calling create_dataset without shape, so any 2D dataset (notably X) was written as a flat 1D array. Now passes child.metadata.shape so dimensions are preserved.
- obs and var index datasets were always named '_index' regardless of the original column name (e.g. 'voter-id', 'comment-id'). Now reads the source group's _index attribute to use the original name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
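The two fixes can be sketched with plain in-memory stand-ins for HDF5 nodes. The types and function names below are hypothetical and deliberately avoid the h5wasm API; they only illustrate carrying shape through a recursive copy and resolving the index name from the _index attribute.

```typescript
// Illustrative in-memory stand-ins for HDF5 nodes; these are NOT h5wasm types.
interface DatasetNode {
  kind: "dataset";
  values: Float32Array;
  shape: number[]; // e.g. [nObs, nVars] for X
}
interface GroupNode {
  kind: "group";
  attrs: Record<string, string>;
  children: Map<string, DatasetNode | GroupNode>;
}

// Recursively copy a group, carrying each dataset's shape along with its values.
// Dropping `shape` here is the kind of bug that turns a 2D X into a flat 1D array.
function copyGroup(src: GroupNode): GroupNode {
  const out: GroupNode = { kind: "group", attrs: { ...src.attrs }, children: new Map() };
  for (const [name, child] of src.children) {
    if (child.kind === "group") {
      out.children.set(name, copyGroup(child));
    } else {
      out.children.set(name, {
        kind: "dataset",
        values: child.values.slice(),
        shape: [...child.shape],
      });
    }
  }
  return out;
}

// Resolve the index dataset name from the group's _index attribute,
// falling back to the literal "_index" only when the attribute is absent.
function indexDatasetName(group: GroupNode): string {
  return group.attrs["_index"] ?? "_index";
}

const sourceX: DatasetNode = {
  kind: "dataset",
  values: new Float32Array([1, 0, -1, 0, 1, 1]),
  shape: [2, 3],
};
const sourceObs: GroupNode = { kind: "group", attrs: { _index: "voter-id" }, children: new Map() };
const sourceRoot: GroupNode = {
  kind: "group",
  attrs: {},
  children: new Map<string, DatasetNode | GroupNode>([
    ["X", sourceX],
    ["obs", sourceObs],
  ]),
};

const copiedRoot = copyGroup(sourceRoot);
const copiedX = copiedRoot.children.get("X");
```

Here copiedX keeps the [2, 3] shape of the source, and indexDatasetName(sourceObs) resolves to "voter-id" rather than a hard-coded name.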
…suffix

toH5adBytes now accepts an extraObsm param for projections that live only in React state (not yet in AnnDataStore). handleDownloadH5ad passes recomputedProjections so any in-browser DruidJS run shows up as X_<algo>_recomputed in the downloaded file's obsm group.

Also renames the suffix separator from '-' to '_' so the h5wasm key is valid as an obsm name (X_umap_recomputed rather than X_umap-recomputed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
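A small sketch of the key naming and merge step described above. Only the X_<algo>_recomputed key format comes from the commit message; the helper names, the Map-based representation, and the choice to add recomputed entries alongside stored ones are assumptions.

```typescript
// Hypothetical helpers; only the X_<algo>_recomputed key format is from the commit message.
function recomputedObsmKey(algo: string): string {
  // Underscore separator, as in X_umap_recomputed (not X_umap-recomputed).
  return `X_${algo}_recomputed`;
}

// Merge projections held outside the store into the obsm map written on export.
// Assumption: recomputed entries are added alongside stored ones, never replacing them.
function mergeObsm(
  stored: Map<string, Float32Array>,
  extraByAlgo: Map<string, Float32Array>,
): Map<string, Float32Array> {
  const merged = new Map(stored);
  for (const [algo, coords] of extraByAlgo) {
    merged.set(recomputedObsmKey(algo), coords);
  }
  return merged;
}

const storedObsm = new Map<string, Float32Array>([
  ["X_umap", new Float32Array([0, 0, 1, 1])],
]);
const recomputedByAlgo = new Map<string, Float32Array>([
  ["umap", new Float32Array([0.5, 0.5, 2, 2])],
]);
const obsmForExport = mergeObsm(storedObsm, recomputedByAlgo);
const exportKeys = [...obsmForExport.keys()];
```

With this layout the original X_umap embedding and the in-browser recomputation sit side by side in obsm, so a downstream reader can compare them.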
Summary
- Replaces DuckDB with an in-memory Float32Array vote matrix (AnnDataStore)
- All import modes normalize into an AnnDataStore singleton, so the rest of the app is mode-agnostic
- h5ad export round-trips X, varm, varp, obsp, all var columns, all uns fields, full-dimensional obsm embeddings (e.g. PCA), and HDF5 group attributes
- Recomputed projections (X_umap_recomputed) are written into obsm on download
- calculateRepresentativeStatements and calculateStatementVoteStats are now synchronous; no async DuckDB connection needed

Key files
- src/lib/anndata-store.ts: … toH5adBytes() with raw-bytes round-trip
- src/lib/parquet-reader.ts: … read_parquet
- src/lib/h5ad-loader.ts: … rawBytes for lossless export
- src/lib/duckdb.ts
- public/duckdb/
- packages/reddwarf-ts/src/db.ts
- packages/reddwarf-ts/src/representative-statements.ts

Test plan
- … X, varm, varp, obsp, all original var columns, and obs index uses the original column name (e.g. voter-id)
- … obs/manual_painted reflects current painting
- … obsm/X_<algo>_recomputed is present
- … public/duckdb/ directory in build output

🤖 Generated with Claude Code (code and ~200 words of PR description from ~120 words of human prompts across this session)