pixi install
pixi run post_install

> [!WARNING]
> Embedding extraction and LID training were run on a CUDA 12.6 GPU; higher CUDA versions are likely to work but are untested. A GPU is not required to install the environment or to run the downstream phylogenetic analyses: on CPU-only machines, `pixi install` will resolve the CPU build of PyTorch automatically.

Download pipeline inputs:
pixi run download_models # download pre-trained audio models
pixi run download_fleurs # download FLEURS-R audio dataset
pixi run download_glottolog # extract lineages from FLEURS-R
pixi run download_reference_trees # extract and process reference trees
pixi run download_geojson # download language polygon data (Glottography)

Download external data from Zenodo (see ZENODO.md for full manifest):
# From the repo root:
tar -xzf phylaudio_zenodo.tar.gz

This unpacks BEAST2 posteriors, XLS-R embeddings, and regression outputs into `data/`.
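
Before unpacking, it can be worth confirming that the archive only writes under `data/`. The sketch below lists the archive contents with `tar -tzf` and flags any entry outside that directory. The helper name `archive_stays_in_data` is hypothetical, and the check assumes entries are stored relative to the repo root (for example `data/...`), which the README implies but does not state.

```shell
#!/bin/sh
# Sketch only: archive_stays_in_data is a hypothetical helper, not part of the repo.
# Assumption: archive entries are stored relative to the repo root (data/...).
archive_stays_in_data() {
    archive="$1"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 2
    fi
    # tar -tzf lists the entries of a gzip-compressed archive without extracting.
    # Keep only entries that do NOT start with data/ (or ./data/).
    outside=$(tar -tzf "$archive" | grep -v -E '^(\./)?data(/|$)' || true)
    if [ -n "$outside" ]; then
        echo "entries outside data/:" >&2
        echo "$outside" >&2
        return 1
    fi
    return 0
}
```

From the repo root, this could be combined with the extraction step as `archive_stays_in_data phylaudio_zenodo.tar.gz && tar -xzf phylaudio_zenodo.tar.gz`.
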
By default, training metrics are written to a local CSV file via Lightning's `CSVLogger`. To use Weights & Biases instead, set up an account (see the Quickstart) and pass `--project <name>`.

# CSV logger (default, no account required)
pixi run lid --dataset fleurs-r --model_id NeMo_ambernet
# wandb logger
pixi run lid --dataset fleurs-r --model_id NeMo_ambernet --project phylaudio

pixi run sentence_distance --dataset fleurs-r --model_id NeMo_ambernet --ebs 1

pixi run sentence_discrete --dataset fleurs-r --model_id NeMo_ambernet

pixi run sentence_astral pdist
pixi run sentence_summary pdist

pixi run beast2 -beagle_SSE -threads 8 -seed 889 data/trees/beast/speech/0.01_brsupport/input.xml

pixi run beast2 -sampleFromPrior -beagle_SSE -threads 8 -seed 889 data/trees/beast/speech/0.01_brsupport/prior.xml

scripts/beast_combine_logs.sh data/trees/beast/speech/0.01_brsupport input_v12

pixi run treeannotator -topology CCD0 data/trees/beast/speech/0.01_brsupport/input_combined_resampled.trees input_combined_resampled.ccd0

pixi run network_analysis data/trees/beast/speech/0.01_brsupport/input.xml

Install the regression environment:
pixi install -e regression

Before running regression or plotting, the following files must be present:

| File | Source |
|---|---|
| `data/trees/beast/speech/0.01_brsupport/input_combined_resampled.mcc` | Zenodo (speech MCC tree) |
| `data/trees/beast/speech/0.01_brsupport/input_combined_resampled.log` | Zenodo (speech BEAST log) |
| `data/trees/beast/speech/0.01_brsupport/input_combined_resampled.trees` | Zenodo (speech posterior trees) |
| `data/trees/beast/speech/0.01_brsupport/prior_1.log` | Zenodo (speech prior log) |
| `data/trees/references/raw/iecor.nex` | `pixi run download_reference_trees` (IECoR MCC tree) |
| `data/trees/beast/iecor/raw.trees` | `pixi run download_reference_trees` (IECoR posterior) |
| `data/trees/beast/iecor/raw.log` | `pixi run download_reference_trees` (IECoR posterior log) |
| `data/trees/beast/iecor/prior/raw.log` | `pixi run download_reference_trees` (IECoR prior log) |
| `data/trees/beast/iecor/prunedtomodern.trees` | `pixi run download_reference_trees` (auto-pruned) |

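
The prerequisites in the table above can be checked before starting a long regression or plotting run. The following is a minimal sketch, not part of the repository: the function name `check_required_files` and the `REQUIRED_FILES` variable are hypothetical, and the paths are copied from the table and taken to be relative to the repo root.

```shell
#!/bin/sh
# Sketch only: check_required_files and REQUIRED_FILES are hypothetical names.
# Paths are copied from the prerequisites table, relative to the repo root.
REQUIRED_FILES="
data/trees/beast/speech/0.01_brsupport/input_combined_resampled.mcc
data/trees/beast/speech/0.01_brsupport/input_combined_resampled.log
data/trees/beast/speech/0.01_brsupport/input_combined_resampled.trees
data/trees/beast/speech/0.01_brsupport/prior_1.log
data/trees/references/raw/iecor.nex
data/trees/beast/iecor/raw.trees
data/trees/beast/iecor/raw.log
data/trees/beast/iecor/prior/raw.log
data/trees/beast/iecor/prunedtomodern.trees
"

# Prints each missing file to stderr; returns 0 only if all files exist.
check_required_files() {
    root="${1:-.}"
    missing=0
    # Unquoted on purpose: the list is split on whitespace (paths have no spaces).
    for f in $REQUIRED_FILES; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

if check_required_files "${1:-.}"; then
    echo "All required files are present."
else
    echo "Some required files are missing (see above)." >&2
fi
```

Run from the repo root, the script reports each missing path, which makes it easier to tell whether the Zenodo archive or `pixi run download_reference_trees` still needs to be run.
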
The `prepare_regression_data` task generates metadata CSVs (with and without phoneme inventory) for both the speech and cognate trees. It reads MCC trees from `data/trees/beast/`:

pixi run -e regression prepare_regression_data

This writes four files to `data/phyloregression/`.

pixi run -e regression beast_phylolm -- --model_type linear_geo --tree input_v12_combined_resampled --variant with_inventory
pixi run -e regression beast_phylolm -- --model_type linear_geo --tree heggarty2024_raw --variant with_inventory

pixi run -e regression beast_phylolm -- --model_type gp_geo --tree input_v12_combined_resampled --variant with_inventory
pixi run -e regression beast_phylolm -- --model_type gp_geo --tree heggarty2024_raw --variant with_inventory

Results are written to `data/phyloregression/<variant>/`.
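
The four regression commands above cover every combination of two model types and two trees, so they can also be driven by a loop. This is a sketch under that reading: the function name `run_phylolm_grid` and the `DRY_RUN` variable are hypothetical conveniences, and only the `pixi run` command line itself comes from this README. By default the function only prints the commands.

```shell
#!/bin/sh
# Sketch only: run_phylolm_grid and DRY_RUN are hypothetical, not part of the repo.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
run_phylolm_grid() {
    for model in linear_geo gp_geo; do
        for tree in input_v12_combined_resampled heggarty2024_raw; do
            cmd="pixi run -e regression beast_phylolm -- --model_type $model --tree $tree --variant with_inventory"
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "$cmd"
            else
                # Unquoted on purpose so the string splits into arguments;
                # stop at the first failing run.
                $cmd || return 1
            fi
        done
    done
}

run_phylolm_grid
```

Setting `DRY_RUN=0` before the call executes the runs in the same order as listed above and stops at the first failure.
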
Install visualization dependencies:
pixi install -e viz

# Figure 1
pixi run -e viz fig1_acc_vs_brsupport # Panel A: LID accuracy vs. bootstrap support
pixi run -e viz fig1_nmf # Panel B: NMF structure plot
pixi run -e viz fig1_delta # Panel D: per-language delta scores
pixi run -e viz fig1_pca # Extended: PCA of XLS-R embeddings
pixi run -e viz fig1_sqa # Extended: silhouette vs. SI-SDR + correlation
# Figures 2–3
pixi run -e viz fig2_rates # Figure 2 panel B: speech rate over time
pixi run -e viz fig2_rates_cognate # Cognate rate over time
pixi run -e viz fig3_geo # Figure 3: regression panels
# Extended
pixi run -e viz ext_rates_and_maps # rate scatter, GP maps, root age, rate-over-time

pixi run python -m src.tasks.phylo.compute_paper_stats