Most human data systems rely on post-hoc quality assurance.
This creates a structural problem: invalid data can be created freely and must be detected and corrected after the fact.
This repo demonstrates a different approach:
→ Prevent invalid data states from being created at all
→ Enforce admissibility at the point of contribution
In AI systems, human-generated data directly shapes model behavior.
If invalid states are allowed at the point of creation, they propagate into training, evaluation, and alignment layers.
This turns data quality into a probabilistic outcome.
This repo treats data quality as a system property instead.
Contributor → Constraint Layer → Accepted / Rejected → Downstream Systems
A minimal “Human Data Admissibility Layer”:
- Define inadmissible states (rather than enumerating all valid ones)
- Enforce constraints at submission time
- Reject invalid inputs immediately
- Track system behavior via rejection telemetry
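A minimal sketch of such a layer in Python. The rule names (`empty_submission`, `low_confidence_submission`), the submission fields, and the confidence threshold are illustrative assumptions, not necessarily the checks implemented in `run_demo.py`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    valid: bool
    reason: Optional[str] = None  # set only when the submission is rejected

MIN_CONFIDENCE = 0.7  # assumed threshold, for illustration

def check_admissible(submission: dict) -> Decision:
    """Enforce constraints at submission time: inadmissible states are
    rejected immediately, so they can never enter downstream systems."""
    if not submission.get("text", "").strip():
        return Decision(False, "empty_submission")
    if submission.get("confidence", 0.0) < MIN_CONFIDENCE:
        return Decision(False, "low_confidence_submission")
    return Decision(True)

d = check_admissible({"text": "label: positive", "confidence": 0.4})
print(f"Valid: {d.valid}, Reason: {d.reason}")
# Valid: False, Reason: low_confidence_submission
```

Note that the layer only names inadmissible states; anything that trips no rule is accepted, which keeps the constraint set small and auditable.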
Run:

    python run_demo.py

Output:

    Valid: False, Reason: low_confidence_submission
If invalid states cannot be created,
data quality becomes a property of the system — not the reviewer.