After landscape architect Frederick Law Olmsted
Olmsted is an open-source tool for visualizing and exploring B cell lineages.
Most users should simply visit olmstedviz.org — no installation required. You can upload and explore your data directly in the browser.
- Getting Started
- Preparing Your Data
- Using the Visualization
- Local Deployment (for developers)
- Miscellany
In the human immune system, affinity maturation of B cell receptor sequences coding for immunoglobulins (i.e. antibodies) begins with a diverse pool of randomly generated naive sequences and leads to a collection of evolutionary histories. It is now common to apply high-throughput DNA sequencing to the B cell repertoire and then reconstruct these evolutionary histories using specialized algorithms. However, researchers often lack the tools to explore these reconstructions in the detail necessary to, for example, choose sequences for further functional, structural, or biochemical studies. We aim to address this need with Olmsted: a browser-based application for visually exploring B cell repertoires and clonal family tree data. Olmsted allows the user to scan across collections of clonal families at a high level using summary statistics, and then home in on individual families to visualize phylogenies and mutations. This will enable lab-based researchers to more quickly and intuitively identify lineages of interest among vast B cell sequencing datasets, and move forward with in-depth analyses and testing of individual antibodies.
Go to olmstedviz.org in your browser.
Use olmsted-cli to convert your data into Olmsted format (see Preparing Your Data below).
Once on the website:
- Upload data: Drag and drop your processed JSON file onto the upload area, or use the file browser
- Select datasets: Check the datasets you want to visualize in the table
- Start exploring: Click "Explore!" to begin
Note: You must select at least one dataset before clicking "Explore!" or you'll see an empty display. You can also select datasets from the table on the visualization page.
Use olmsted-cli to convert your data into the format required by Olmsted.
# Create a conda environment and install
conda create -n olmsted python=3.9
conda activate olmsted
git clone https://github.com/matsengrp/olmsted-cli.git
cd olmsted-cli
pip install .
olmsted process [OPTIONS]
Use olmsted process -h for all options.
Common options:
- --input, -i: Input file path (PCP CSV or AIRR JSON)
- --output, -o: Output file path (defaults to olmsted_output.json)
- --format, -f: Input format (airr, pcp); usually auto-detected
- --name, -n: Dataset name (recommended for navigation in Olmsted)
PCP-specific:
--input-trees, -t: Tree data CSV file path
AIRR-specific:
- --naive-name: Name of naive/root node for tree rooting
- --root-trees, -r: Root trees
Examples:
# PCP format with separate tree file
olmsted process -i data.csv -t trees.csv -o processed.json -n "My Dataset"
# AIRR format
olmsted process -i data.json -o processed.json -n "My Dataset"
Olmsted supports two primary input formats:
The PCP (Parent Child Pair) format consists of two CSV files:
- Parent Child Pair CSV: Contains data for each parent-child pair with columns:
  - sample_id: Sample identifier
  - family: Family identifier
  - parent_name: Parent node name
  - parent_heavy: Parent heavy chain sequence
  - child_name: Child node name
  - child_heavy: Child heavy chain sequence
  - branch_length: Branch length between parent and child
  - depth: Depth in tree structure
  - distance: Distance from root
  - v_gene_heavy: V gene assignment for heavy chain
  - j_gene_heavy: J gene assignment for heavy chain
  - cdr1_codon_start_heavy: CDR1 start position in heavy chain
  - cdr1_codon_end_heavy: CDR1 end position in heavy chain
  - cdr2_codon_start_heavy: CDR2 start position in heavy chain
  - cdr2_codon_end_heavy: CDR2 end position in heavy chain
  - cdr3_codon_start_heavy: CDR3 start position in heavy chain
  - cdr3_codon_end_heavy: CDR3 end position in heavy chain
  - parent_is_naive: Boolean indicating if parent is naive/root
  - child_is_leaf: Boolean indicating if child is leaf node
- Tree Data CSV: Contains Newick data with columns:
  - family_name: Family identifier matching the family column above
  - sample_id: Sample identifier
  - newick_tree: Newick format phylogenetic tree string
Example PCP data can be found in the olmsted-cli/example_data/ directory, demonstrating the expected structure and column formats.
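Before running olmsted process on PCP input, it can help to confirm that both CSV files carry the columns listed above and that every family in the pair file has a tree row. The following is a minimal sketch using only the Python standard library; the helper names (missing_columns, unmatched_families) are hypothetical and not part of olmsted-cli, and the check is no substitute for the tool's own validation.

```python
import csv
import io

# Column names copied from the PCP format description.
PCP_COLUMNS = [
    "sample_id", "family", "parent_name", "parent_heavy",
    "child_name", "child_heavy", "branch_length", "depth", "distance",
    "v_gene_heavy", "j_gene_heavy",
    "cdr1_codon_start_heavy", "cdr1_codon_end_heavy",
    "cdr2_codon_start_heavy", "cdr2_codon_end_heavy",
    "cdr3_codon_start_heavy", "cdr3_codon_end_heavy",
    "parent_is_naive", "child_is_leaf",
]
TREE_COLUMNS = ["family_name", "sample_id", "newick_tree"]


def missing_columns(csv_text, required):
    """Return the required column names absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in required if name not in header]


def unmatched_families(pcp_text, tree_text):
    """Return families named in the pair CSV that have no row in the tree CSV."""
    pcp_families = {row["family"] for row in csv.DictReader(io.StringIO(pcp_text))}
    tree_families = {
        row["family_name"] for row in csv.DictReader(io.StringIO(tree_text))
    }
    return sorted(pcp_families - tree_families)
```

Both helpers take the file contents as text, so they can be called as missing_columns(open("data.csv").read(), PCP_COLUMNS) on the files passed to -i and -t.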
The AIRR JSON format is described here. A list of tools that output this format can be found here. For a human-readable version of the schema, see olmstedviz.org/schema.html or view schema.html on htmlpreview.github.io.
olmsted-cli provides built-in validation:
# Validation only
olmsted validate --input your_data.json
# Validate during processing
olmsted process --input your_data.json --output processed.json --validate
Upon launching Olmsted and navigating in a browser to the appropriate address (or using the example at http://olmstedviz.org), you will find the home page with a table of the available datasets:
Olmsted uses a client-side database to manage your datasets within the browser. This database is managed from the splash page, where you can load new datasets and delete existing ones. You can upload datasets by clicking the "Upload Data" button and choosing a file in the file explorer, or by dropping a file onto the drag-and-drop box below it. This adds the dataset to the Available Datasets table. Checking the load icon for a row in the datasets table queues that dataset for visualization and adds it to the query string. Click Explore! to visually explore the selected datasets.
Once you're in the visualization interface, you can change your dataset selection on demand. Simply select the desired datasets in the dataset table on the visualization page, then click the "Update Visualization" button to refresh the view with your new selection. "Manage Datasets" will return you to the splash page.
The Clonal Families section represents each clonal family as a point in a scatterplot:
Choose an immunoglobulin locus to restrict the clonal families in the scatterplot to that locus - the default is igh, the immunoglobulin heavy chain locus.
In order to visualize all clonal families from all loci in the dataset at once, choose "All" in the locus selector.
By default, the scatterplot maps the number of unique members in a clonal family, unique_seqs_count, to the x-axis, and the average mutation frequency among members of that clonal family, mean_mut_freq, to the y-axis.
However, you may configure both axes as well as the color and shape of the points to map to a range of fields, including sequence sampling time (see below).
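To make the two default axis fields concrete, here is a small sketch of how such per-family summaries could be computed. It assumes that mutation frequency means the fraction of aligned positions at which a sequence differs from the family's naive sequence, and that the mean is taken over unique sequences; the schema is the authority on the actual definitions, and the function names are hypothetical.

```python
def mutation_frequency(sequence, naive):
    """Fraction of aligned positions at which `sequence` differs from `naive`."""
    if len(sequence) != len(naive):
        raise ValueError("sequences must be aligned to the same length")
    if not naive:
        return 0.0
    differences = sum(1 for a, b in zip(sequence, naive) if a != b)
    return differences / len(naive)


def family_summary(sequences, naive):
    """Compute illustrative values for the two default axis fields."""
    unique = sorted(set(sequences))  # deduplicate before averaging (assumption)
    freqs = [mutation_frequency(seq, naive) for seq in unique]
    return {
        "unique_seqs_count": len(unique),
        "mean_mut_freq": sum(freqs) / len(freqs) if freqs else 0.0,
    }
```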
For comparison of subsets, you may facet the plot into separated panels according to data values for a range of fields:
Interact with the plot by clicking and dragging across a subset of points or clicking individual points to filter the resulting clonal families in the Selected clonal families table below.
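Conceptually, a click-and-drag selection is a rectangle filter over whichever two fields are mapped to the axes. The sketch below illustrates that idea on plain dictionaries; it is not Olmsted's implementation (which is built on Vega), and the brush_select name is hypothetical.

```python
def brush_select(families, x_field, x_range, y_field, y_range):
    """Return the families whose (x, y) point lies inside the brushed rectangle."""
    x_lo, x_hi = sorted(x_range)  # accept a drag in either direction
    y_lo, y_hi = sorted(y_range)
    return [
        family for family in families
        if x_lo <= family[x_field] <= x_hi and y_lo <= family[y_field] <= y_hi
    ]
```

With the default axes, a call would look like brush_select(families, "unique_seqs_count", (10, 500), "mean_mut_freq", (0.02, 0.1)).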
Below the scatterplot, the full collection or selected subset of clonal families appears in a table including a visualization of the recombination event resulting in the naive antibody sequence and a subset of clonal family metadata:
Each row in the table represents one clonal family. The table automatically selects the top clonal family according to the sorting column. Click the checkbox in the "Select" column to select a clonal family for further visualization. The phylogenetic tree(s) for that clonal family (as specified in the input JSON) are then visualized below the table in the Clonal family details section.
For a selected clonal family, its phylogenetic tree is visualized below the table in the Clonal family details section:
Select among any alternate phylogenies using the Ancestral reconstruction method menu. The methods listed are those specified in the input data, and depend on the phylogenetic inference tool used to produce the trees. Alongside the tree is an alignment of the sequences at the tree's tips. Colors indicate amino acid mutations at each position that differs from the sequence at the root of the tree (typically the family's inferred naive antibody sequence). Scroll while hovering over the tree to zoom in and out. Click and drag the zoomed view to pan, as in a traditional map-style interface. The alignment view on the right zooms in the vertical dimension to match the zoom state of the tree. The tree's leaves use pie charts to show the multiplicity (i.e. the number of downsampled and deduplicated sequences) represented by a given sequence, colored according to sampling timepoint. See the schema for more detailed field descriptions.
Note that in example data the sequences in a clonal family have often been downsampled to build a tree (see downsampled_count, downsampling_strategy in the schema). This explains why a clonal family might be listed in the table as having a few thousand unique sequences while its tree visualization contains only tens or hundreds of sequences.
Use the interface below the tree to configure:
- Maximum width of the tree window with respect to the alignment window
- Field mapped to the size of tree leaves (pie charts)
- Maximum size of the tree leaves
- Tree tip labels
- Fields mapped to branch width and color
In order to get more details about a particular lineage in the tree, click on a leaf's label (or circle if the labels are hidden) - the Ancestral Sequences section will appear below the tree.
The Ancestral Sequences section displays an alignment of the selected sequence with its ancestral lineage starting from the naive sequence:
Mutations from the naive sequence are shown as in the Clonal Family Details section.
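The lineage shown in this section corresponds to the path from the root of the tree down to the selected leaf. As an illustration, that path can be recovered from parent-child pairs such as the parent_name and child_name columns of the PCP format. This is a sketch with a hypothetical function name, not the code Olmsted runs.

```python
def lineage_to_root(pairs, leaf_name):
    """Return node names ordered from the root down to `leaf_name`.

    `pairs` is an iterable of (parent_name, child_name) tuples.
    """
    parent_of = {child: parent for parent, child in pairs}
    path = [leaf_name]
    seen = {leaf_name}
    # Walk upward until we reach a node that has no recorded parent (the root).
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:
            raise ValueError("cycle detected in parent-child pairs")
        seen.add(parent)
        path.append(parent)
    path.reverse()
    return path
```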
For developers who want to run Olmsted locally or deploy their own instance.
# Clone the repository
git clone https://github.com/matsengrp/olmsted.git
cd olmsted
git submodule update --init
# Option 1: Use pre-built image from quay.io
docker run -p 8080:3999 quay.io/matsengrp/olmsted:latest
# Option 2: Build locally
docker build -t olmsted:latest .
./bin/olmsted-server.sh olmsted:latest 3999
Navigate to localhost:8080 (or localhost:3999 for local build) in your browser.
For reproducibility, use a specific version tag rather than latest.
git clone https://github.com/matsengrp/olmsted.git
cd olmsted
git submodule update --init
npm install --legacy-peer-deps
npm run start
The server's localData mode requires data in split format (separate files for datasets, clones, and trees), as opposed to the consolidated single-file format used for browser uploads. The split format allows the server to serve individual files on demand via the charon API.
Split format structure:
data/
datasets.json # List of available datasets
clones.{dataset_id}.json # Clonal families for each dataset
tree.{tree_id}.json # Individual tree files
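A quick way to sanity-check a data directory against this layout is to sort its file names by the three naming patterns above. The sketch below assumes only those patterns; the classify_split_files helper is hypothetical and does not inspect file contents.

```python
import re

# Patterns taken from the split format layout.
CLONES_RE = re.compile(r"^clones\.(.+)\.json$")
TREE_RE = re.compile(r"^tree\.(.+)\.json$")


def classify_split_files(filenames):
    """Sort file names from a split-format directory into their roles."""
    result = {"datasets": False, "dataset_ids": [], "tree_ids": [], "unrecognized": []}
    for name in sorted(filenames):
        clones = CLONES_RE.match(name)
        tree = TREE_RE.match(name)
        if name == "datasets.json":
            result["datasets"] = True
        elif clones:
            result["dataset_ids"].append(clones.group(1))
        elif tree:
            result["tree_ids"].append(tree.group(1))
        else:
            result["unrecognized"].append(name)
    return result
```

To run it on a real directory, pass the names from pathlib, for example classify_split_files(p.name for p in Path("data").iterdir()).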
To run with example data from olmsted-cli:
# Clone olmsted-cli if you haven't already
git clone https://github.com/matsengrp/olmsted-cli.git ../olmsted-cli
# Copy pre-split example data to data/ (gitignored)
cp ../olmsted-cli/example_data/pcp/split_golden_data/* data/
# Start server with local data
BABEL_ENV=dev ./node_modules/.bin/babel-node server.js dev localData data
The data/ directory is gitignored, so local data files won't be committed.
Olmsted can be compiled as a single-page app for CDN deployment:
npm run build
# Output is in the `deploy` directory
# Place your data at `deploy/data`
Test locally:
cd deploy
python -m http.server 4000
For AWS S3 deployment, see bin/deploy.py -h.
We use git tags to tag releases of Olmsted using the semver versioning strategy.
Tag messages, e.g. Olmsted version 2.0.1 ; uses schema version 2.0.0, contain the version of the input data schema with which a given version of Olmsted is compatible.
The tagged release's major version of Olmsted should always match that of its compatible schema version; should we need to make breaking changes to the schema, we will bump the major versions of both Olmsted and the input schema.
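The major-version rule can be checked mechanically from a tag message. The sketch below assumes tag messages follow the example wording shown above exactly ("Olmsted version X.Y.Z ; uses schema version X.Y.Z"); the function names are hypothetical.

```python
import re

# Matches the example tag message format; whitespace around ";" is flexible.
TAG_MESSAGE_RE = re.compile(
    r"Olmsted version (\d+)\.(\d+)\.(\d+)\s*;\s*uses schema version (\d+)\.(\d+)\.(\d+)"
)


def parse_tag_message(message):
    """Return ((olmsted major, minor, patch), (schema major, minor, patch))."""
    match = TAG_MESSAGE_RE.search(message)
    if match is None:
        raise ValueError(f"unrecognized tag message: {message!r}")
    numbers = tuple(int(group) for group in match.groups())
    return numbers[:3], numbers[3:]


def majors_match(message):
    """True when the Olmsted and schema major versions in the message agree."""
    olmsted_version, schema_version = parse_tag_message(message)
    return olmsted_version[0] == schema_version[0]
```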
This application relies on React.js and Redux for basic framework, and Vega and Vega-Lite for the interactive data visualizations.
Copyright 2025 Dave Rich, Christopher Small, Eli Harkins, and Erick Matsen. Originally forked from Auspice, copyright 2014-2018 Trevor Bedford and Richard Neher.
Source code to Olmsted is made available under the terms of the GNU Affero General Public License (AGPL). Olmsted is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.






