Public baseline pipeline for variable-star period recovery and alias analysis under Rubin-like cadence constraints.
This repository tracks an end-to-end workflow for:
- building Gaia DR3 truth-set samples
- ingesting real public-survey light curves
- standardizing raw photometry into a project schema
- running baseline period search
- evaluating recovery, alias modes, and failure cases
The current live-data workflow is centered on RR Lyrae objects cross-matched from Gaia DR3 to ZTF.
As of 2026-03-09, the project has completed:
- Gaia truth tables for RR Lyrae, Cepheid, and Eclipsing Binary
- a successful live RR Lyrae pilot run on real ZTF data
- a first larger RR Lyrae baseline batch with 30 usable objects
- end-to-end baseline period search and evaluation on that batch
| Item | Value |
|---|---|
| Survey proxy | ZTF |
| Truth set | Gaia DR3 |
| Current live class | RR Lyrae |
| Usable baseline objects | 30 |
| Ingest success rate | 30/43 (69.8%) |
| Period recovery | 24/30 (80%) |
| Median relative period error | ~1.4e-5 |
| Main failure modes | P/2, 2P, other mismatches |
| Most stable live provider | ALeRCE |
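
The recovery and failure-mode rows above can be illustrated with a small sketch. It is not the logic in `rubin_sampling.evaluate`: the function names and the `rel_tol` tolerance are hypothetical, chosen only to show how a recovered period might be labeled as a match, a `P/2` or `2P` harmonic, or another mismatch.

```python
def classify_period(period_found, period_ref, rel_tol=1e-3):
    """Label a recovered period relative to the reference period.

    rel_tol is a hypothetical tolerance, not the project's configured value.
    """
    candidates = {
        "match": period_ref,
        "P/2": period_ref / 2.0,
        "2P": period_ref * 2.0,
    }
    for label, expected in candidates.items():
        # Relative error against each candidate period.
        if abs(period_found - expected) / expected <= rel_tol:
            return label
    return "other"


def recovery_fraction(labels):
    """Fraction of objects whose recovered period matches the reference."""
    return sum(label == "match" for label in labels) / len(labels)
```

With 24 of 30 labels equal to `match`, `recovery_fraction` gives the 80% figure reported in the table.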
Repository layout:

    Rubin Sampling/
      configs/
      data/
        catalogs/
        lc_raw/
        lc_resampled/
        truth/
      logs/
      notebooks/
      results/
        figures/
        tables/
      src/
        rubin_sampling/
      CHECKLIST.md
      CHECKLIST_NEXT.md

This public repository now includes generated science artifacts directly in git:
- Gaia truth-set parquet files in `data/truth/`
- pilot and baseline ZTF light-curve parquet files in `data/lc_raw/ztf/`
- period-result tables in `results/tables/`
- evaluation tables in `results/**/tables/`
- figure bundles in `results/**/figures/`
Truth sets:
- `data/truth/truth_gaia_rrlyrae.parquet`
- `data/truth/truth_gaia_cepheid.parquet`
- `data/truth/truth_gaia_eb.parquet`
- `data/truth/truth_gaia_summary.parquet`
Pilot RR Lyrae:
- standardized batch: `data/lc_raw/ztf/pilot_standardized.parquet`
- ingest summary: `data/lc_raw/ztf/pilot_summary.parquet`
- period results: `results/tables/pilot_period_results.parquet`
- evaluation bundle: `results/pilot/`
Baseline RR Lyrae batch:
- raw per-object files: `data/lc_raw/ztf/rrlyrae_baseline/`
- standardized batch: `data/lc_raw/ztf/rrlyrae_baseline_standardized.parquet`
- ingest summary: `data/lc_raw/ztf/rrlyrae_baseline_summary.parquet`
- period results: `results/tables/rrlyrae_baseline_period_results.parquet`
- evaluation bundle: `results/rrlyrae_baseline/`
The baseline pipeline currently covers:
- Gaia truth-set download via `rubin_sampling.download_gaia_truth`
- live ZTF ingest via `rubin_sampling.ingest_ztf_pilot`
- period search via `rubin_sampling.period_pipeline`
- feature extraction via `rubin_sampling.features`
- evaluation and figure generation via `rubin_sampling.evaluate`
Minimum light-curve schema:
- `object_id`
- `time`
- `mag_or_flux`
- `err`
- `band`
Optional metadata:
- `class_label`
- `period_ref`
- `ra`
- `dec`
- `survey`
Primary storage format: parquet
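
A minimal sketch of how the schema above could be checked before a batch is written. The column names come from this README; the `check_schema` helper is hypothetical and is not part of `rubin_sampling`.

```python
REQUIRED_COLUMNS = ("object_id", "time", "mag_or_flux", "err", "band")
OPTIONAL_COLUMNS = ("class_label", "period_ref", "ra", "dec", "survey")


def check_schema(columns):
    """Return (missing, extra) column names for a light-curve table.

    `columns` is any iterable of column names, for example the column
    list of a table loaded from a parquet file.
    """
    present = set(columns)
    # Required columns that are absent, in schema order.
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    # Columns that are neither required nor optional in this schema.
    known = set(REQUIRED_COLUMNS) | set(OPTIONAL_COLUMNS)
    extra = sorted(present - known)
    return missing, extra
```

An empty `missing` list means the table satisfies the minimum schema; `extra` only flags unrecognized columns and does not have to be treated as an error.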
Run from the project root:

    python -m venv .venv
    .venv/bin/python -m pip install -r requirements.txt
    export PYTHONPATH=src

Download Gaia truth sets:

    .venv/bin/python -m rubin_sampling.download_gaia_truth \
      --output-dir data/truth \
      --limit-per-class 1000

Run a live ZTF ingest:

    .venv/bin/python -m rubin_sampling.ingest_ztf_pilot \
      --truth data/truth/truth_gaia_rrlyrae.parquet \
      --target-count 30 \
      --candidate-pool 200

Run baseline period search:

    .venv/bin/python -m rubin_sampling.period_pipeline \
      --input data/lc_raw/ztf/rrlyrae_baseline_standardized.parquet \
      --output results/tables/rrlyrae_baseline_period_results.parquet \
      --config configs/baseline.yaml

Run evaluation:

    .venv/bin/python -m rubin_sampling.evaluate \
      --period-results results/tables/rrlyrae_baseline_period_results.parquet \
      --lightcurves data/lc_raw/ztf/rrlyrae_baseline_standardized.parquet \
      --ingest-summary data/lc_raw/ztf/rrlyrae_baseline_summary.parquet \
      --output-dir results/rrlyrae_baseline

`ingest_ztf_pilot` supports live providers `auto|irsa|alerce`.
- `auto` tries IRSA first, then falls back to ALeRCE
- `alerce` is currently the fastest stable option for scaling the real RR Lyrae batch
- `--resume` reuses an existing summary file
- `--flush-every` writes incremental progress during long runs
- successful raw per-object files can rebuild the standardized batch output without refetching
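
The `auto` provider order described above can be sketched as follows. This is an illustration under assumptions, not the code in `rubin_sampling.ingest_ztf_pilot`: `fetch_with_fallback` and the `fetchers` mapping are hypothetical, and treating an empty light curve as a failure is an assumption about the fallback behavior.

```python
def fetch_with_fallback(object_id, fetchers, provider="auto"):
    """Fetch one light curve, trying providers in order.

    `fetchers` maps a provider name ("irsa", "alerce") to a callable
    taking an object id and returning a list of photometry rows.
    Both the mapping and this function are hypothetical.
    """
    # "auto" means IRSA first, then ALeRCE; otherwise use the named provider only.
    order = ["irsa", "alerce"] if provider == "auto" else [provider]
    errors = {}
    for name in order:
        try:
            rows = fetchers[name](object_id)
        except Exception as exc:  # network errors, timeouts, bad responses
            errors[name] = str(exc)
            continue
        if rows:
            return name, rows
        # Assumption: an empty result also triggers the fallback.
        errors[name] = "empty light curve"
    raise RuntimeError(f"all providers failed for {object_id}: {errors}")
```

Returning the provider name alongside the rows makes it possible to record which service supplied each object in an ingest summary.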
evaluate writes:
- `tables/period_evaluation.parquet`
- `tables/metrics_summary.parquet`
- `tables/ingest_status_counts.parquet`
- `tables/ingest_failure_cases.parquet`
Current limitations:
- the real-data baseline sample is still below the 100-300 usable-object target
- extreme photometric outlier filtering is not yet explicit in ingest validation
- some period solutions still collapse to P/2 or 2P
- failure cases still need manual review before cadence-emulation work begins
Next steps:
- scale RR Lyrae from 30 usable objects to 100+
- review the 6 current period misses in the baseline batch
- add explicit outlier hardening before the next scaling run
- extend the stabilized workflow to the next variable-star class
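
One possible form for the outlier-hardening step is a median/MAD cut on each light curve. The sketch below is a suggestion only: `mad_clip` and its `n_sigma` default are hypothetical, and because RR Lyrae have large intrinsic amplitudes, any global cut like this would need a loose threshold so that real variability is not removed.

```python
import statistics


def mad_clip(values, n_sigma=5.0):
    """Return indices of points kept by a median/MAD cut.

    n_sigma is a hypothetical default, not a tuned project setting.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # 1.4826 converts MAD to a Gaussian-equivalent standard deviation.
    sigma = 1.4826 * mad
    if sigma == 0:
        # Degenerate case (more than half the points identical): keep everything.
        return list(range(len(values)))
    return [i for i, v in enumerate(values) if abs(v - med) <= n_sigma * sigma]
```

Applied to the magnitudes of a single object and band, the returned indices select the rows to keep.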
See CHECKLIST_NEXT.md for the active execution plan and CHECKLIST.md for the broader roadmap.