Local-first validator and packager for FAIR-compliant research datasets.
Why FAIRy
- Local-first: All processing is on your machine. Your raw and fixed data never leave without your consent.
- Extensible: Add new repository templates or contribute improved schemas/rules — keep up with evolving standards.
- Practical: Catch real submission pain points (dates, IDs, vocab, file names), export clean packages, avoid resubmission headaches.
What’s in this repo
This repo contains the core validation engine and CLI (e.g., fairy preflight, fairy validate).
- ✅ Validates tabular metadata against repository-specific rulepacks
- ✅ Emits machine-readable (JSON) and human-readable (Markdown) reports
- ✅ Writes attestation & provenance, with optional export bundle (zip)
- 🧪 Includes intentionally "failing" fixtures for smoketests
- 🚧 Early alpha; interfaces may change prior to v1.0
💡 Want the full UI experience? For project workspaces, guided fixes, visual workflows, and demo examples, see fairy-lab — a Streamlit-based demo tenant that uses this core engine.
The following commands produce a Markdown report for the penguins kata.
python -m venv .venv && source .venv/bin/activate
pip install -e .
mkdir -p .tmp
fairy validate \
--rulepack rulepacks/examples/penguins/rulepack.yml \
--inputs default=tests/fixtures/penguins_small.csv \
--report-json .tmp/penguins_report.json \
--report-md .tmp/penguins_report.md
less .tmp/penguins_report.md # press q to quit
# (Optional) Open in VS Code if you have it:
code .tmp/penguins_report.md
- UI (fairy-lab): local project workspaces, guided fixes, visual workflow, export bundles. → https://github.com/yuummmer/fairy-lab
- CLI (this repo): run fairy preflight (human-friendly default) or fairy validate (engine command) against a rulepack, review JSON/Markdown reports, iterate until submission-ready.
Command overview: validate = checks; preflight = checks + outputs + guidance. Most users should run preflight. See CLI usage for details.
# 0) Python 3.10+
python -m venv .venv && source .venv/bin/activate
pip install -e .
# 1) Check CLI
fairy --help
fairy --version
# 2) (Optional) Rulepack schema validation (example penguins rulepack)
python -m fairy.cli rulepack --rulepack rulepacks/examples/penguins/rulepack.yml
FAIRy-core is the engine. Rulepacks are versioned and released independently in separate CC0 repos.
- Rulepack registry (recommended start here): yuummmer/fairy-rulepack-registry. Machine-readable index of available rulepacks + versions.
- GEO rulepacks: yuummmer/fairy-rulepacks-geo
  - geo_bulk_seq (available)
  - geo_single_cell (planned)
This repo (fairy-core) includes only small example/kata rulepacks under rulepacks/examples/ for learning and tests.
Official (and community) rulepacks are tracked in fairy-rulepack-registry. Tools (and FAIRy Lab) can use this to discover rulepacks.
- 📌 Machine-readable list of rulepack repos + versions (so tools can discover them)
- ✅ Schema-validated via CI (prevents broken entries)
- 🧭 Use it to find the latest GEO/ENA/etc rulepacks without hunting across GitHub
# Get the GEO rulepack repo (CC0)
git clone https://github.com/yuummmer/fairy-rulepacks-geo.git .deps/fairy-rulepacks-geo
# Run preflight using its fixtures
mkdir -p .tmp
fairy preflight \
--rulepack .deps/fairy-rulepacks-geo/rulepacks/geo_bulk_seq/v0_2_0.json \
--samples .deps/fairy-rulepacks-geo/rulepacks/geo_bulk_seq/fixtures/samples.tsv \
--files .deps/fairy-rulepacks-geo/rulepacks/geo_bulk_seq/fixtures/files.tsv \
--out .tmp/geo_bulk_seq_report.json
# View outputs:
ls .tmp/
head -n 80 .tmp/geo_bulk_seq_report.md # or open in your editor
Available rule types: required, unique, enum, range, dup/no_duplicate_rows, foreign_key, url, non_empty_trimmed, regex
See Rule types reference for complete documentation on all rule types and their configuration options.
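To make a few of these rule types concrete, the sketch below shows what required, unique, enum, and range checks might look like over rows of tabular metadata. It is an illustration in plain Python, not FAIRy's implementation or its rulepack syntax; the function names, the row format (a list of dicts), and the sample columns are all assumptions made for this example.

```python
# Illustrative only: hypothetical helpers, not part of FAIRy's API.
from collections import Counter


def check_required(rows, column):
    """Return indexes of rows where `column` is missing or blank."""
    return [i for i, row in enumerate(rows) if not str(row.get(column) or "").strip()]


def check_unique(rows, column):
    """Return values of `column` that appear more than once."""
    counts = Counter(row.get(column) for row in rows)
    return sorted(v for v, n in counts.items() if n > 1 and v is not None)


def check_enum(rows, column, allowed):
    """Return indexes of rows whose value is not in the allowed set."""
    return [i for i, row in enumerate(rows) if row.get(column) not in allowed]


def check_range(rows, column, minimum, maximum):
    """Return indexes of rows whose numeric value falls outside [minimum, maximum]."""
    bad = []
    for i, row in enumerate(rows):
        try:
            value = float(row.get(column))
        except (TypeError, ValueError):
            bad.append(i)  # non-numeric values also count as failures here
            continue
        if not minimum <= value <= maximum:
            bad.append(i)
    return bad


# Hypothetical sample data, loosely modeled on a penguins-style table.
rows = [
    {"sample_id": "P1", "species": "Adelie", "body_mass_g": "3750"},
    {"sample_id": "P2", "species": "Gentoo", "body_mass_g": "-5"},
    {"sample_id": "P2", "species": "Emperor", "body_mass_g": ""},
]
```

Each helper returns the offending rows or values rather than raising, which mirrors the idea of collecting every finding into a report instead of stopping at the first problem.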
- JSON: Structured v1.0.0 schema reports with deterministic ordering (see schemas/preflight_report_v1.schema.json and docs/reporting.md)
- Markdown: Curator-friendly one-pager (generated alongside JSON)
- Exit code: 0 if no FAIL, else 1
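Because the exit code separates a clean run from one with FAIL findings, a script or CI job can gate on it. The sketch below is a minimal example under that assumption; run_and_gate is a hypothetical helper (not part of FAIRy), and the command list is whatever fairy invocation you already use.

```python
import subprocess
import sys


def run_and_gate(cmd):
    """Run a command and return True only if it exits with code 0.

    Under the convention above, 0 means no FAIL findings. Any other
    exit code is treated here as "not submission-ready".
    """
    result = subprocess.run(cmd, check=False)
    return result.returncode == 0


# Example usage (assumes the fairy CLI is installed and on PATH):
#   ready = run_and_gate(["fairy", "validate", "--rulepack", "...", "--inputs", "..."])
#   sys.exit(0 if ready else 1)
```

Passing check=False keeps a nonzero exit from raising an exception, so the caller decides how to react.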
For full documentation, see the docs/ folder.
pip install -e ".[dev]"
pytest -q
ruff check . --fix
ruff format .
- Source files use SPDX headers:
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright (c) 2025 Jennifer Slotnick
- We package rulepacks as package data:
[tool.setuptools.package-data]
fairy = ["rulepacks/**/*.json","rulepacks/**/*.yaml","rulepacks/**/*.yml"]
- Local artifacts to ignore are preconfigured (.tmp/, .venv/, __pycache__/).
- Coverage config: .coveragerc, pytest.ini
- Demo fixtures: tests/fixtures/*, demos/
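The SPDX header convention noted above can be checked mechanically. The following is a minimal sketch, assuming the header appears within the first few lines of each Python file; files_missing_spdx is a hypothetical helper and is not shipped with this repo.

```python
from pathlib import Path

# The identifier line used in this repo's source headers.
EXPECTED = "SPDX-License-Identifier: AGPL-3.0-only"


def files_missing_spdx(root, max_lines=5):
    """Return sorted paths of .py files under `root` whose first lines lack the header."""
    missing = []
    for path in sorted(Path(root).rglob("*.py")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            # Read only the top of the file; headers belong near the start.
            head = [handle.readline() for _ in range(max_lines)]
        if not any(EXPECTED in line for line in head):
            missing.append(path)
    return missing
```

Calling files_missing_spdx("src/fairy") would list any source files that still need the header.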
src/fairy/
cli/ # CLI entrypoints (validate, preflight, rulepack)
core/ # services/models/validators (evolving)
rulepack/ # loader + schema (YAML)
rulepacks/
examples/ # small kata/example rulepacks (CC0)
demos/ # demo rulepacks / scratch data (not shipped)
tests/ # unit + smoke tests
decisions/ # Architecture Decision Records (ADRs)
See Architecture Decision Records for major design decisions and rationale.
FAIRy-core uses a mixed licensing model:
- Core engine code (src/fairy/**): Licensed under AGPL-3.0-only. See LICENSE.
- Example/kata rulepacks (rulepacks/examples/**): Licensed under CC0-1.0. See rulepacks/examples/LICENSE.
- Official public rulepacks (separate repos, e.g. fairy-rulepacks-geo): Licensed under CC0-1.0 (each repo includes its own LICENSE).
- Samples and fixtures (e.g. samples/**, tests/fixtures/**): Licensed under CC BY-4.0 (documented per folder).
Commercial licensing for FAIRy-core is available for organizations that
cannot adopt AGPL. See COMMERCIAL.md or contact
[email protected].
We welcome contributions! See CONTRIBUTING.md for guidelines on:
- How to get started (forking, setup, making changes)
- Ways to contribute (rulepacks, tests, documentation, engine improvements)
- Specs and PRDs for major features
- Licensing of contributions
For information about maintainer roles and module stewards, see MAINTAINERS.md.
If you use FAIRy in a project, demo, or talk, please cite:
FAIRy (v0.2). Local-first validator for FAIR research data.
FAIRy-core (engine): https://github.com/yuummmer/fairy-core
FAIRy Lab (UI & labs): https://github.com/yuummmer/fairy-lab
- Profiles as workflow composition over rulepacks (see ADR-0007)
- Preflight evolving to universal operator mode (profile-based + output-dir oriented)
For the full roadmap, see ROADMAP.md.
- Researchers and lab teams tired of “submission rejected → fix → resubmit” loops.
- Anyone handling sensitive/pre-publication data who needs local-first validation.
- Curators and institutions who want consistent, transparent, hackable validation templates.
Institutions and labs interested in pilots or dashboards — we’d love to hear from you.
Email [email protected] or open an issue labeled pilot-inquiry.
