Rubin Sampling

Public baseline pipeline for variable-star period recovery and alias analysis under Rubin-like cadence constraints.

Overview

This repository tracks an end-to-end workflow for:

building Gaia DR3 truth-set samples
ingesting real public-survey light curves
standardizing raw photometry into a project schema
running baseline period search
evaluating recovery, alias modes, and failure cases

The current live-data workflow is centered on RR Lyrae objects cross-matched from Gaia DR3 to ZTF.

Current Status

As of 2026-03-09, the project has completed:

Gaia truth tables for the RR Lyrae, Cepheid, and eclipsing-binary classes
a successful live RR Lyrae pilot run on real ZTF data
a first larger RR Lyrae baseline batch with 30 usable objects
end-to-end baseline period search and evaluation on that batch

Current Baseline Snapshot

| Item | Value |
| --- | --- |
| Survey proxy | `ZTF` |
| Truth set | `Gaia DR3` |
| Current live class | `RR Lyrae` |
| Usable baseline objects | `30` |
| Ingest success rate | `30/43` (`69.8%`) |
| Period recovery | `24/30` (`80%`) |
| Median relative period error | `~1.4e-5` |
| Main failure modes | `P/2`, `2P`, `other` mismatches |
| Most stable live provider | `ALeRCE` |
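
The percentages in the snapshot follow directly from the quoted counts. A quick arithmetic check, with variable names that are illustrative and not taken from the repository:

```python
# Counts quoted in the baseline snapshot.
usable_objects = 30      # numerator of the ingest success rate
candidate_objects = 43   # denominator of the ingest success rate
recovered_periods = 24   # objects counted as period recoveries

ingest_rate = usable_objects / candidate_objects
recovery_rate = recovered_periods / usable_objects
missed_periods = usable_objects - recovered_periods

print(f"ingest success:  {ingest_rate:.1%}")    # 69.8%
print(f"period recovery: {recovery_rate:.1%}")  # 80.0%
print(f"period misses:   {missed_periods}")     # 6
```

The six misses are the objects listed for review under Next Steps.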

Repository Structure

Rubin Sampling/
  configs/
  data/
    catalogs/
    lc_raw/
    lc_resampled/
    truth/
  logs/
  notebooks/
  results/
    figures/
    tables/
  src/
    rubin_sampling/
  CHECKLIST.md
  CHECKLIST_NEXT.md

Tracked Public Artifacts

This public repository now includes generated science artifacts directly in git:

Gaia truth-set parquet files in data/truth/
pilot and baseline ZTF light-curve parquet files in data/lc_raw/ztf/
period-result tables in results/tables/
evaluation tables in results/**/tables/
figure bundles in results/**/figures/

Main Artifacts

Truth sets:

data/truth/truth_gaia_rrlyrae.parquet
data/truth/truth_gaia_cepheid.parquet
data/truth/truth_gaia_eb.parquet
data/truth/truth_gaia_summary.parquet

Pilot RR Lyrae:

standardized batch: data/lc_raw/ztf/pilot_standardized.parquet
ingest summary: data/lc_raw/ztf/pilot_summary.parquet
period results: results/tables/pilot_period_results.parquet
evaluation bundle: results/pilot/

Baseline RR Lyrae batch:

raw per-object files: data/lc_raw/ztf/rrlyrae_baseline/
standardized batch: data/lc_raw/ztf/rrlyrae_baseline_standardized.parquet
ingest summary: data/lc_raw/ztf/rrlyrae_baseline_summary.parquet
period results: results/tables/rrlyrae_baseline_period_results.parquet
evaluation bundle: results/rrlyrae_baseline/

Core Pipeline

The baseline pipeline currently covers:

Gaia truth-set download via rubin_sampling.download_gaia_truth
live ZTF ingest via rubin_sampling.ingest_ztf_pilot
period search via rubin_sampling.period_pipeline
feature extraction via rubin_sampling.features
evaluation and figure generation via rubin_sampling.evaluate
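
This README does not state which periodogram rubin_sampling.period_pipeline uses. As a rough illustration of what a baseline period search does, the sketch below evaluates a classical Lomb-Scargle periodogram on a uniform frequency grid using only the standard library. The function names, grid bounds, and synthetic light curve are all assumptions made for this example, not the project's implementation.

```python
import math
import random


def lomb_scargle_power(times, values, period):
    """Classical Lomb-Scargle power at one trial period (mean-subtracted data)."""
    omega = 2.0 * math.pi / period
    mean = sum(values) / len(values)
    resid = [v - mean for v in values]
    # The time offset tau makes the sine and cosine terms orthogonal.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(r * c for r, c in zip(resid, cos_terms))
    ys = sum(r * s for r, s in zip(resid, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return 0.5 * (yc * yc / cc + ys * ys / ss)


def best_period(times, values, p_min, p_max, n_trials=2000):
    """Scan a grid uniform in frequency and return the highest-power period."""
    f_min, f_max = 1.0 / p_max, 1.0 / p_min
    best_power, best = float("-inf"), None
    for i in range(n_trials):
        freq = f_min + (f_max - f_min) * i / (n_trials - 1)
        power = lomb_scargle_power(times, values, 1.0 / freq)
        if power > best_power:
            best_power, best = power, 1.0 / freq
    return best


# Synthetic, irregularly sampled sinusoid standing in for a real light curve.
random.seed(42)
true_period = 0.55  # days, chosen to be on an RR Lyrae-like scale
times = sorted(random.uniform(0.0, 50.0) for _ in range(120))
mags = [
    15.0 + 0.3 * math.sin(2.0 * math.pi * t / true_period) + random.gauss(0.0, 0.02)
    for t in times
]
recovered = best_period(times, mags, p_min=0.2, p_max=1.0)
print(f"true period {true_period} d, recovered {recovered:.4f} d")
```

The grid spacing limits the precision of this sketch. Reaching relative errors near the `~1.4e-5` reported in the snapshot would need a finer grid or a local refinement around the peak.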

Data Contract

Minimum light-curve schema:

object_id
time
mag_or_flux
err
band

Optional metadata:

class_label
period_ref
ra
dec
survey

Primary storage format: parquet
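
A minimal sketch of checking a table against this contract is shown below. The helper name check_columns is hypothetical and does not come from the repository. It takes any sequence of column names, so it could be fed the column list of a loaded parquet table.

```python
# Column names copied from the data contract above.
REQUIRED_COLUMNS = ("object_id", "time", "mag_or_flux", "err", "band")
OPTIONAL_COLUMNS = ("class_label", "period_ref", "ra", "dec", "survey")


def check_columns(columns):
    """Return (missing_required, unrecognized) for a sequence of column names."""
    present = set(columns)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    known = set(REQUIRED_COLUMNS) | set(OPTIONAL_COLUMNS)
    unrecognized = sorted(present - known)
    return missing, unrecognized


# Example: "err" is absent and "airmass" is not part of the contract.
missing, unrecognized = check_columns(
    ["object_id", "time", "mag_or_flux", "band", "survey", "airmass"]
)
print(missing)       # ['err']
print(unrecognized)  # ['airmass']
```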

Quickstart

Run from the project root:

python -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
export PYTHONPATH=src

Main Commands

Download Gaia truth sets:

.venv/bin/python -m rubin_sampling.download_gaia_truth \
  --output-dir data/truth \
  --limit-per-class 1000

Run a live ZTF ingest:

.venv/bin/python -m rubin_sampling.ingest_ztf_pilot \
  --truth data/truth/truth_gaia_rrlyrae.parquet \
  --target-count 30 \
  --candidate-pool 200

Run baseline period search:

.venv/bin/python -m rubin_sampling.period_pipeline \
  --input data/lc_raw/ztf/rrlyrae_baseline_standardized.parquet \
  --output results/tables/rrlyrae_baseline_period_results.parquet \
  --config configs/baseline.yaml

Run evaluation:

.venv/bin/python -m rubin_sampling.evaluate \
  --period-results results/tables/rrlyrae_baseline_period_results.parquet \
  --lightcurves data/lc_raw/ztf/rrlyrae_baseline_standardized.parquet \
  --ingest-summary data/lc_raw/ztf/rrlyrae_baseline_summary.parquet \
  --output-dir results/rrlyrae_baseline

Ingest Notes

ingest_ztf_pilot supports live providers auto|irsa|alerce.

auto tries IRSA first, then falls back to ALeRCE
alerce is currently the fastest stable option for scaling the real RR Lyrae batch
--resume reuses an existing summary file
--flush-every writes incremental progress during long runs
the standardized batch output can be rebuilt from successfully fetched raw per-object files without refetching
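
The auto fallback and resume behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in ingest_ztf_pilot: the function names, the stub fetchers, and the shape of the summary (a mapping from object ID to outcome) are all hypothetical. Whether the real --resume retries previously failed objects is not stated here, so this sketch simply skips any ID already present.

```python
def fetch_with_fallback(object_id, fetchers, order=("irsa", "alerce")):
    """Try each provider in order; return (provider_name, light_curve) or (None, None)."""
    for name in order:
        try:
            light_curve = fetchers[name](object_id)
        except Exception:  # a failing provider should not abort the batch
            continue
        if light_curve:  # treat an empty result as a miss
            return name, light_curve
    return None, None


def run_batch(object_ids, fetchers, summary=None):
    """Fetch each object once, skipping IDs already present in a prior summary."""
    summary = dict(summary or {})
    for object_id in object_ids:
        if object_id in summary:
            continue  # resume: keep the earlier outcome
        provider, _light_curve = fetch_with_fallback(object_id, fetchers)
        summary[object_id] = provider if provider else "failed"
    return summary


# Stub providers standing in for real network clients (hypothetical).
def irsa_stub(object_id):
    raise TimeoutError("simulated IRSA outage")


def alerce_stub(object_id):
    return [(59000.1, 15.2, 0.03, "r")]  # (time, mag, err, band)


fetchers = {"irsa": irsa_stub, "alerce": alerce_stub}
summary = run_batch(["obj_a", "obj_b"], fetchers, summary={"obj_a": "irsa"})
print(summary)  # {'obj_a': 'irsa', 'obj_b': 'alerce'}
```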

Evaluation Outputs

evaluate writes:

tables/period_evaluation.parquet
tables/metrics_summary.parquet
tables/ingest_status_counts.parquet
tables/ingest_failure_cases.parquet
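
The failure modes named in the snapshot (`P/2`, `2P`, `other`) can be assigned by comparing each recovered period with its reference period. The sketch below shows one way to do that. The function names and the 1% tolerance are assumptions for illustration, and the criteria actually used by evaluate may differ.

```python
import statistics


def classify_period(p_found, p_ref, tol=0.01):
    """Label a recovered period as a match, a P/2 or 2P alias, or other."""
    ratio = p_found / p_ref
    for label, target in (("match", 1.0), ("P/2", 0.5), ("2P", 2.0)):
        if abs(ratio - target) / target <= tol:
            return label
    return "other"


def relative_error(p_found, p_ref):
    """Relative period error |P_found - P_ref| / P_ref."""
    return abs(p_found - p_ref) / p_ref


# Made-up (found, reference) period pairs in days, not results from the batch.
pairs = [(0.55001, 0.55), (0.2751, 0.55), (1.1, 0.55), (0.40, 0.55)]
labels = [classify_period(found, ref) for found, ref in pairs]
print(labels)  # ['match', 'P/2', '2P', 'other']

# Median relative error over matched objects only (one possible convention).
matched = [relative_error(f, r) for f, r in pairs if classify_period(f, r) == "match"]
print(statistics.median(matched))
```

Whether the reported median relative period error is taken over all objects or only over matches is not stated in this README. The last lines show the matches-only convention purely as an example.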

Known Gaps

the real-data baseline sample is still below the 100-300 usable-object target
extreme photometric outlier filtering is not yet explicit in ingest validation
some period solutions still collapse to P/2 or 2P
failure cases still need manual review before cadence-emulation work begins
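
One common form for the missing outlier filter is a clip based on the median absolute deviation (MAD). The sketch below is only an illustration of that idea and not a planned implementation. The function name and the 5-sigma threshold are assumptions, and any real threshold would need to be loose enough to keep the genuine large-amplitude variability of RR Lyrae stars.

```python
import statistics


def mad_clip(values, n_sigma=5.0):
    """Return a keep-mask: True where a point lies within n_sigma robust deviations."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    robust_sigma = 1.4826 * mad  # scales MAD to a Gaussian standard deviation
    if robust_sigma == 0.0:
        return [True] * len(values)  # no measurable spread, keep everything
    return [abs(v - med) <= n_sigma * robust_sigma for v in values]


# Made-up magnitudes with one extreme point.
mags = [15.0, 15.1, 14.9, 15.05, 14.95, 15.02, 25.0]
keep = mad_clip(mags)
print(keep)  # [True, True, True, True, True, True, False]
```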

Next Steps

scale RR Lyrae from 30 usable objects to 100+
review the 6 current period misses in the baseline batch
add explicit outlier hardening before the next scaling run
extend the stabilized workflow to the next variable-star class

See CHECKLIST_NEXT.md for the active execution plan and CHECKLIST.md for the broader roadmap.
