
Commit 7baad1a

Merge dev into main

2 parents: 13eedbe + d5daac0

18 files changed, +747 / -131 lines

.github/workflows/build.yml

Lines changed: 6 additions & 1 deletion
```diff
@@ -31,4 +31,9 @@ jobs:
 
       - name: Run tests with coverage
         run: |
-          pytest --cov --cov-report=term --cov-branch
+          pytest --cov --cov-report=term --cov-report=xml --cov-branch
+
+      - name: Upload coverage reports to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
```
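The added `--cov-report=xml` flag makes pytest-cov write `coverage.xml`, the report the Codecov step uploads. The CONTRIBUTING changes in this commit list coverage thresholds as a possible future CI gate; a minimal sketch of such a gate follows, assuming coverage.py's Cobertura-style XML, where the root `<coverage>` element carries overall line coverage in a `line-rate` attribute. The function name and the 80% threshold are hypothetical. pytest-cov's built-in `--cov-fail-under` option is a simpler alternative.

```python
import xml.etree.ElementTree as ET


def coverage_meets_threshold(xml_text: str, minimum: float) -> bool:
    """Return True if overall line coverage in a coverage.py XML report is >= minimum.

    `minimum` is a fraction between 0 and 1 (e.g. 0.80 for 80%).
    """
    root = ET.fromstring(xml_text)
    # coverage.py's Cobertura-style report stores overall line coverage
    # as a "line-rate" attribute on the root <coverage> element.
    line_rate = float(root.attrib["line-rate"])
    return line_rate >= minimum


# Usage in CI (hypothetical 80% gate), after pytest has written coverage.xml:
#   from pathlib import Path
#   ok = coverage_meets_threshold(Path("coverage.xml").read_text(), 0.80)
#   raise SystemExit(0 if ok else 1)
```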

.github/workflows/deploy.yml

Lines changed: 3 additions & 6 deletions
```diff
@@ -1,9 +1,6 @@
-name: deploy-test-pypi
+name: build-and-deploy
 
 on:
-  push:
-  pull_request:
-    branches: [ main ]
   workflow_dispatch:
 
 jobs:
@@ -17,7 +14,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.14'
+          python-version: '3.13'
 
       - name: Install dependencies
         run: |
@@ -43,7 +40,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.14'
+          python-version: '3.13'
 
       - name: Install hatch
         run: |
```

.github/workflows/docs-publish.yml

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,5 +1,4 @@
-name: 03b-quartodoc-publish
-
+name: Publish Documentation to GitHub Pages
 on:
   pull_request:
     branches: [ main ]
```

.github/workflows/release.yml

Lines changed: 46 additions & 0 deletions
```diff
@@ -0,0 +1,46 @@
+name: Release to TestPyPI
+
+on:
+  push:
+    tags:
+      - "v*" # Triggers when you push a tag like v0.1.4, v1.0.0, etc.
+  workflow_dispatch: # Allows manual triggering
+
+jobs:
+  build-and-publish:
+    name: Build and Publish to TestPyPI
+    runs-on: ubuntu-latest
+
+    steps:
+      # 1. Check out the code with full git history (required for hatch-vcs to see tags)
+      - name: Check out repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0 # CRITICAL: Fetch all history including tags
+
+      # 2. Set up Python
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.13"
+
+      # 3. Install dev dependencies including hatch
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+
+      # 4. Build the package
+      # hatch-vcs will automatically detect the git tag and use it as the version
+      - name: Build package
+        run: hatch build
+
+      # 5. Publish to TestPyPI
+      - name: Publish to TestPyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          user: __token__
+          password: ${{ secrets.TEST_PYPI_API_TOKEN }}
+          repository-url: https://test.pypi.org/legacy/
+          verbose: true # Shows detailed output for debugging
+          skip-existing: true # Skip if version already exists
```
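The release workflow fires only for tags matching `v*`, and its step comments say hatch-vcs turns the pushed tag into the package version. The Python sketch below illustrates that tag-to-version mapping for plain `vX.Y.Z` tags. The function name and the strict X.Y.Z check are assumptions for illustration; hatch-vcs (through setuptools-scm) accepts a wider range of tag formats, and `fnmatch` only approximates GitHub's glob matching.

```python
from __future__ import annotations

import re
from fnmatch import fnmatchcase

# Deliberately strict X.Y.Z check for this sketch; PEP 440 allows more forms.
_RELEASE = re.compile(r"\d+\.\d+\.\d+")


def release_version_from_tag(tag: str) -> str | None:
    """Return the version a pushed tag would release, or None if the tag
    would not trigger the workflow or is not a plain X.Y.Z release."""
    # Mirror the workflow filter `tags: - "v*"`. (GitHub's `*` does not
    # match "/", which fnmatch does not model; fine for simple tag names.)
    if not fnmatchcase(tag, "v*"):
        return None
    version = tag[1:]  # drop the leading "v"
    return version if _RELEASE.fullmatch(version) else None
```

For example, `release_version_from_tag("v0.1.4")` gives `"0.1.4"`, while an untagged-style name such as `"0.1.4"` does not match the trigger and gives `None`.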

.gitignore

Lines changed: 5 additions & 0 deletions
```diff
@@ -243,3 +243,8 @@ cython_debug/
 
 # Hatch-VCS
 _version.py
+
+/.quarto/
+
+# Hatch version file (auto-generated from git tags)
+src/pyos_data_validation/__version__.py
```
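Because `src/pyos_data_validation/__version__.py` is generated by the hatch-vcs build hook and ignored by git, a fresh clone may not contain it until the package has been built or installed. One option, sketched here under that assumption, is to read the version from installed package metadata with the standard library's `importlib.metadata` and fall back to a placeholder. The helper name and the `"0+unknown"` placeholder are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "pyos_data_validation", default: str = "0+unknown") -> str:
    """Return the installed version of `dist_name`, or `default` if the
    distribution is not installed (e.g. a source checkout that was never installed)."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return default
```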

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -16,6 +16,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Improved `DriftReport` to include missingness-related changes.
 - Updated documentation for `compare_contracts` with explicit drift definitions and directionality.
 - Renamed the package in documentation/metadata to `pyos_data_validation`.
+- Removed shell prompt and split commands inside README command snippets for easier copy/paste.
+- Standardized docstrings across public APIs (consistent Notes formatting, Raises sections).
+- Added hyperlinks to Pandera, Great Expectations, and Pydantic in README comparison section.
+- Added docs to the developer guide install list for quartodoc build to work.
 
 ### Tests
 - Added comprehensive unit tests for `compare_contracts`, covering schema drift, constraint drift, edge cases, and error handling.
```

CONTRIBUTING.md

Lines changed: 46 additions & 0 deletions
```diff
@@ -16,6 +16,11 @@ You can contribute in many ways, for example:
 - [Submit Feedback](#submit-feedback)
 - [Get Started!](#get-started)
 - [Pull Request Guidelines](#pull-request-guidelines)
+- [Development Tools, Infrastructure, and Practices](#development-tools-infrastructure-and-practices)
+- [Development Tools](#development-tools)
+- [GitHub Infrastructure](#github-infrastructure)
+- [Organizational and Collaboration Practices](#organizational-and-collaboration-practices)
+- [Scaling Considerations](#scaling-considerations)
 
 ### Report Bugs
 
@@ -114,3 +119,44 @@ Before you submit a pull request, check that it meets these guidelines:
 new functionality into a function with a docstring.
 3. Your pull request will automatically be checked by the full test suite.
 It needs to pass all of them before it can be considered for merging.
+
+---
+
+## Development Tools, Infrastructure, and Practices
+
+This project applies modern Python development workflows and collaborative practices learned in DSCI 524, with a strong emphasis on reproducibility, automation, and code quality.
+
+### Development Tools
+
+* **Hatch** is used for environment management, testing, and task execution. This ensures consistent developer environments and simplifies common workflows such as running tests and checks.
+* **Ruff** is used for formatting and linting to enforce PEP 8–compliant, readable code and to provide fast feedback during development.
+* **Pytest** is used for automated testing to validate correctness and prevent regressions as the codebase evolves.
+* **Quartodoc + Quarto** are used to generate API documentation directly from docstrings, ensuring documentation stays closely aligned with the code.
+
+### GitHub Infrastructure
+
+* **GitHub Issues** are used to track bugs, feature requests, and documentation improvements, with labels (`bug`, `enhancement`, `help wanted`) to organize work and encourage contributions.
+* **Pull Requests** are the primary mechanism for code review, discussion, and integration. All changes are reviewed before merging.
+* **GitHub Actions (CI)** automatically run tests, formatting checks, and build steps on every pull request to `main`, ensuring consistent quality standards and preventing broken code from being merged.
+* **Branch-based development** is used, with feature and fix branches (`feat/*`, `fix/*`) to keep the main branch stable.
+
+### Organizational and Collaboration Practices
+
+* **Semantic commit messages** (Conventional Commits) improve readability of the project history and support changelog generation.
+* **Consistent docstring standards** ensure functions are easy to understand and maintain, especially for new contributors.
+* **Clear contribution guidelines** lower the barrier to entry for contributors and help standardize collaboration across the team.
+
+---
+
+## Scaling Considerations
+
+If this project (or a similar one) were to scale to a larger user base or contributor community, the following tools and practices would be adopted or expanded:
+
+* **Stricter CI gates**, such as required test coverage thresholds and branch protection rules, to maintain code quality at scale.
+* **Dependency monitoring** tools (e.g., Dependabot) to keep dependencies secure and up to date.
+* **Pre-commit hooks** to catch formatting, linting, and documentation issues earlier in the development cycle.
+* **Expanded documentation and examples**, including tutorials and usage guides, to support a broader audience.
+* **Issue and PR templates refinement**, ensuring high-quality reports and consistent reviews as contribution volume grows.
+
+These tools and practices help ensure that the project remains maintainable, reliable, and welcoming as it scales in complexity and community size.
+
```
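The new CONTRIBUTING section recommends semantic (Conventional Commits) messages. As a sketch of what that header format looks like in practice, here is a small Python check for the first line of a commit message (`type(scope)!: description`). Restricting types to lowercase letters is a common convention rather than a spec requirement, and the helper name is hypothetical; a real project would more likely wire an existing checker into a commit-msg hook.

```python
from __future__ import annotations

import re

# Header shape from the Conventional Commits 1.0.0 spec: type(scope)!: description
_HEADER = re.compile(
    r"(?P<type>[a-z]+)"               # e.g. feat, fix, docs, ci (lowercase by convention)
    r"(?:\((?P<scope>[^()\s]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"               # optional breaking-change marker
    r": (?P<description>\S.*)"        # colon, one space, non-empty description
)


def parse_commit_header(message: str) -> dict[str, str | None] | None:
    """Parse a commit message's first line; return None if it is not a
    Conventional Commits header."""
    lines = message.splitlines()
    match = _HEADER.fullmatch(lines[0]) if lines else None
    return match.groupdict() if match else None
```

Note that a default merge message such as `Merge dev into main` (this commit's own title) does not match, so a hook built on this check would need to exempt merge commits.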

README.md

Lines changed: 30 additions & 24 deletions
````diff
@@ -2,20 +2,18 @@
 
 | | |
 |--------|--------|
-| Package | [![Latest PyPI Version](https://img.shields.io/pypi/v/pyos_data_validation.svg)](https://pypi.org/project/pyos_data_validation/) [![Supported Python Versions](https://img.shields.io/pypi/pyversions/pyos_data_validation.svg)](https://pypi.org/project/pyos_data_validation/) |
-| CI / Release | [![deploy-test-pypi](https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/actions/workflows/deploy.yml/badge.svg?branch=main)](https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/actions/workflows/deploy.yml) |
+| Package | [![TestPyPI](https://img.shields.io/badge/TestPyPI-0.1.3-blue)](https://test.pypi.org/project/pyos-data-validation/) [![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)](https://test.pypi.org/project/pyos-data-validation/) |
+| CI / Release | [![deploy-test-pypi](https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/actions/workflows/deploy.yml/badge.svg?branch=main)](https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/actions/workflows/deploy.yml) [![codecov](https://codecov.io/gh/UBC-MDS/DSCI_524_G26_Data_Validation/branch/main/graph/badge.svg)](https://codecov.io/gh/UBC-MDS/DSCI_524_G26_Data_Validation) |
 | Meta | [![Code of Conduct](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](CODE_OF_CONDUCT.md) |
 | Documentation | [View Full Documentation](https://ubc-mds.github.io/DSCI_524_G26_Data_Validation/) |
 
-
-
 `pyos_data_validation` is a lightweight Python package for defining, validating, and comparing **data contracts** for tabular datasets. It enables data scientists to formalize assumptions about their data—such as schema, missingness constraints, numeric ranges, and categorical domains—and to automatically validate new datasets against those expectations. The package supports reproducible workflows and CI-friendly automation by producing structured validation outputs and clear, actionable error messages suitable for use in unit tests and GitHub Actions.
 
 ---
 
 ## Table of Contents
-- [Quick Start](#get-started)
 - [Function Reference](#function-reference)
+- [Quick Start](#get-started)
 - [Usage Examples](#usage-examples)
 - [Developer Guide](#developer-guide)
 - [Contributors](#contributors)
@@ -468,12 +466,12 @@ print(f"Validation passed: {summary.ok}")
 print(f"Total issues: {len(result.issues)}")
 
 # Show top 5 most severe issues
-print("\n🔴 Top Issues to Fix:")
+print("\nTop Issues to Fix:")
 for i, issue in enumerate(summary.top_issues, 1):
     print(f"{i}. [{issue.kind}] {issue.column}: {issue.message}")
 
 # Example output:
-# 🔴 Top Issues to Fix:
+# Top Issues to Fix:
 # 1. [missing_column] user_id: Missing required column: user_id
 # 2. [extra_column] debug_flag: Unexpected extra column: debug_flag
 # 3. [dtype] age: age: expected int64, got object
@@ -501,16 +499,16 @@ for kind, count in summary.counts_by_kind.items():
 
 `pyos_data_validation` is inspired by production-grade data validation frameworks but serves a different purpose:
 
-| Feature | pyos_data_validation | Pandera | Great Expectations | Pydantic |
+| Feature | pyos_data_validation | [Pandera](https://pandera.readthedocs.io/) | [Great Expectations](https://greatexpectations.io/) | [Pydantic](https://docs.pydantic.dev/) |
 |---------|---------------------|---------|-------------------|----------|
 | **Target Use Case** | Educational, lightweight validation | Production data validation | Enterprise data quality | API input validation |
 | **Learning Curve** | Low | Medium | High | Low-Medium |
-| **Contract Inference** | Automatic | ⚠️ Limited | Profiling | Manual only |
-| **Drift Detection** | Built-in | No | Via profiling | No |
-| **Tabular Data Focus** | Yes | Yes | Yes | No (objects) |
-| **CI/CD Friendly** | Simple integration | Yes | ⚠️ Complex setup | Yes |
-| **Minimal Dependencies** | pandas only | ⚠️ Medium | Heavy | Minimal |
-| **Validation Customization** | ⚠️ Basic | Extensive | Extensive | Extensive |
+| **Contract Inference** | Automatic | Limited | Profiling | Manual only |
+| **Drift Detection** | Built-in | No | Via profiling | No |
+| **Tabular Data Focus** | Yes | Yes | Yes | No (objects) |
+| **CI/CD Friendly** | Simple integration | Yes | Complex setup | Yes |
+| **Minimal Dependencies** | pandas only | Medium | Heavy | Minimal |
+| **Validation Customization** | Basic | Extensive | Extensive | Extensive |
 
 **When to use pyos_data_validation:**
 - Small to medium projects
@@ -520,17 +518,19 @@ for kind, count in summary.counts_by_kind.items():
 - When you need simple drift detection out of the box
 
 **When to use alternatives:**
-- **Pandera**: Production ML pipelines with complex validation rules and custom checks
-- **Great Expectations**: Enterprise data quality monitoring with extensive reporting and data docs
-- **Pydantic**: API request/response validation or configuration management with type safety
+- **[Pandera](https://pandera.readthedocs.io/)**: Production ML pipelines with complex validation rules and custom checks
+- **[Great Expectations](https://greatexpectations.io/)**: Enterprise data quality monitoring with extensive reporting and data docs
+- **[Pydantic](https://docs.pydantic.dev/)**: API request/response validation or configuration management with type safety
 
 ---
 
 ## Get started
 
 You can install this package locally into your preferred Python environment using pip:
 
-$ pip install -e .
+```bash
+pip install -e .
+```
 
 ### Basic usage
 
@@ -574,20 +574,30 @@ Clone the repository:
 
 ```bash
 git clone https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation.git
+```
+
+Change the directory to the project:
+
+```bash
 cd DSCI_524_G26_Data_Validation
 ```
 
-Create and activate the conda environment:
+Create the conda environment:
 
 ```bash
 conda env create -f environment.yml
+```
+
+Activate the environment:
+
+```bash
 conda activate pyos_data_validation
 ```
 
 Install the package in editable mode with development dependencies:
 
 ```bash
-pip install -e ".[dev,tests]"
+pip install -e ".[dev,tests,docs]"
 ```
 
 ### Running tests
@@ -670,7 +680,3 @@ View the live documentation at: https://ubc-mds.github.io/DSCI_524_G26_Data_Vali
 - [Full Documentation](https://ubc-mds.github.io/DSCI_524_G26_Data_Validation/)
 - [Issue Tracker](https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/issues)
 - [Project Board](https://github.com/UBC-MDS/DSCI_524_G26_Data_Validation/projects?query=is%3Aopen)
-- [Pandera](https://github.com/unionai-oss/pandera)
-- [Great Expectations](https://github.com/great-expectations/great_expectations)
-- [Pydantic](https://github.com/pydantic/pydantic)
-
````
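The README and CONTRIBUTING tables of contents link to headings through GitHub's auto-generated anchors (for example, `[Quick Start](#get-started)` targets `## Get started`). GitHub does not publish a formal specification for these slugs, so the following Python sketch only approximates the common behaviour for ASCII headings (lowercase, drop punctuation other than hyphens and underscores, one hyphen per space) and can flag table-of-contents targets that match no heading. Both function names are hypothetical, and duplicate-heading suffixes such as `-1` are not modelled.

```python
from __future__ import annotations

import re


def github_anchor(heading: str) -> str:
    """Approximate GitHub's anchor slug for an ASCII Markdown heading."""
    slug = heading.strip().lower()
    # Keep letters, digits, underscores, hyphens and spaces; drop the rest.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each space becomes a hyphen (repeated spaces are not collapsed).
    return slug.replace(" ", "-")


def broken_toc_links(toc_targets: list[str], headings: list[str]) -> list[str]:
    """Return ToC targets (without the leading '#') that match no heading's anchor."""
    anchors = {github_anchor(h) for h in headings}
    return [t for t in toc_targets if t not in anchors]
```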

objects.json

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+{"project": "pyos_data_validation", "version": "0.0.9999", "count": 15, "items": [{"name": "pyos_data_validation.infer_contract.infer_contract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/infer_contract.html#pyos_data_validation.infer_contract.infer_contract", "dispname": "-"}, {"name": "pyos_data_validation.infer_contract", "domain": "py", "role": "module", "priority": "1", "uri": "reference/infer_contract.html#pyos_data_validation.infer_contract", "dispname": "-"}, {"name": "pyos_data_validation.validate_contract.validate_contract", "domain": "py", "role": "function", "priority": "1", "uri": "reference/validate_contract.html#pyos_data_validation.validate_contract.validate_contract", "dispname": "-"}, {"name": "pyos_data_validation.validate_contract", "domain": "py", "role": "module", "priority": "1", "uri": "reference/validate_contract.html#pyos_data_validation.validate_contract", "dispname": "-"}, {"name": "pyos_data_validation.compare_contracts.compare_contracts", "domain": "py", "role": "function", "priority": "1", "uri": "reference/compare_contracts.html#pyos_data_validation.compare_contracts.compare_contracts", "dispname": "-"}, {"name": "pyos_data_validation.compare_contracts", "domain": "py", "role": "module", "priority": "1", "uri": "reference/compare_contracts.html#pyos_data_validation.compare_contracts", "dispname": "-"}, {"name": "pyos_data_validation.summarize_violations.summarize_violations", "domain": "py", "role": "function", "priority": "1", "uri": "reference/summarize_violations.html#pyos_data_validation.summarize_violations.summarize_violations", "dispname": "-"}, {"name": "pyos_data_validation.summarize_violations", "domain": "py", "role": "module", "priority": "1", "uri": "reference/summarize_violations.html#pyos_data_validation.summarize_violations", "dispname": "-"}, {"name": "pyos_data_validation.types.ColumnRule", "domain": "py", "role": "class", "priority": "1", "uri": "reference/types.ColumnRule.html#pyos_data_validation.types.ColumnRule", "dispname": "-"}, {"name": "pyos_data_validation.types.Contract", "domain": "py", "role": "class", "priority": "1", "uri": "reference/types.Contract.html#pyos_data_validation.types.Contract", "dispname": "-"}, {"name": "pyos_data_validation.types.Issue", "domain": "py", "role": "class", "priority": "1", "uri": "reference/types.Issue.html#pyos_data_validation.types.Issue", "dispname": "-"}, {"name": "pyos_data_validation.types.ValidationResult", "domain": "py", "role": "class", "priority": "1", "uri": "reference/types.ValidationResult.html#pyos_data_validation.types.ValidationResult", "dispname": "-"}, {"name": "pyos_data_validation.types.DriftReport", "domain": "py", "role": "class", "priority": "1", "uri": "reference/types.DriftReport.html#pyos_data_validation.types.DriftReport", "dispname": "-"}, {"name": "pyos_data_validation.types.Summary", "domain": "py", "role": "class", "priority": "1", "uri": "reference/types.Summary.html#pyos_data_validation.types.Summary", "dispname": "-"}, {"name": "pyos_data_validation.types.ContractViolationError", "domain": "py", "role": "class", "priority": "1", "uri": "reference/types.ContractViolationError.html#pyos_data_validation.types.ContractViolationError", "dispname": "-"}]}
```
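`objects.json` appears to be the documentation inventory generated for the quartodoc API reference: each item maps a fully qualified name to a `role` (`function`, `module`, `class`) and a `uri`. A minimal Python sketch for grouping entries by role and resolving an absolute link, assuming the URIs are relative to the documentation site linked from the README; the helper names and the `DOCS_BASE` constant are hypothetical.

```python
from __future__ import annotations

import json
from collections import defaultdict
from urllib.parse import urljoin

# Assumed base URL: the documentation site linked from the README.
DOCS_BASE = "https://ubc-mds.github.io/DSCI_524_G26_Data_Validation/"


def names_by_role(inventory_json: str) -> dict[str, list[str]]:
    """Group the object names in an objects.json inventory by their role."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in json.loads(inventory_json)["items"]:
        grouped[item["role"]].append(item["name"])
    return dict(grouped)


def doc_url(inventory_json: str, name: str, base: str = DOCS_BASE) -> str | None:
    """Return the absolute documentation URL for `name`, or None if absent."""
    for item in json.loads(inventory_json)["items"]:
        if item["name"] == name:
            return urljoin(base, item["uri"])
    return None


# Two entries from the objects.json above, trimmed to the fields used here.
SAMPLE = json.dumps({
    "project": "pyos_data_validation",
    "items": [
        {"name": "pyos_data_validation.infer_contract.infer_contract", "role": "function",
         "uri": "reference/infer_contract.html#pyos_data_validation.infer_contract.infer_contract"},
        {"name": "pyos_data_validation.types.Contract", "role": "class",
         "uri": "reference/types.Contract.html#pyos_data_validation.types.Contract"},
    ],
})
```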

pyproject.toml

Lines changed: 12 additions & 3 deletions
```diff
@@ -4,16 +4,16 @@
 
 [build-system]
 build-backend = "hatchling.build"
-requires = ["hatchling"]
+requires = ["hatchling", "hatch-vcs"] ##release - Added hatch-vcs
 
 ################################################################################
 # Project Configuration
 ################################################################################
 
 [project]
 name = "pyos_data_validation"
-# You can chose to use dynamic versioning with hatch or static where you add it manually.
-version = "0.1.3"
+# Dynamic versioning from git tags
+dynamic = ["version"] ##release - Changed from static to dynamic
 
 description = """
 pyos_data_validation is a lightweight Python package for defining, validating, and comparing
@@ -95,6 +95,15 @@ only-packages = true
 packages = ["src/pyos_data_validation"]
 
 
+###### git tag-based version bump #####
+
+[tool.hatch.version] ##release
+source = "vcs" ##release
+raw-options = {version_scheme = "no-guess-dev", local_scheme = "no-local-version"} ##release
+
+[tool.hatch.build.hooks.vcs] ##release
+version-file = "src/pyos_data_validation/__version__.py" ##release
+
 
 ######## Configure pytest for your test suite ########
 [tool.pytest.ini_options]
```
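The `raw-options` line passes settings through hatch-vcs to setuptools-scm. A simplified Python re-implementation of what those two settings mean, offered as an illustration rather than the library's code: building exactly at a tag yields the tag's version; `no-guess-dev` appends `.post1.devN` (N = commits since the tag) instead of guessing the next release; `no-local-version` drops the `+g<hash>` local suffix, which PyPI and TestPyPI reject on upload. The function name is hypothetical, and dirty working trees are ignored.

```python
def sketch_vcs_version(last_tag: str, distance: int) -> str:
    """Approximate the version hatch-vcs reports with
    version_scheme="no-guess-dev" and local_scheme="no-local-version".

    last_tag: most recent reachable tag, e.g. "v0.1.4"
    distance: number of commits made since that tag
    (Dirty working trees and unusual tag formats are ignored in this sketch.)
    """
    base = last_tag[1:] if last_tag[:1] in ("v", "V") else last_tag
    if distance == 0:
        # Building exactly at the tag: the version is the tag itself.
        return base
    # no-guess-dev: no next-version guessing, just ".post1.devN" on the last tag.
    # no-local-version: no "+g<hash>" suffix is appended.
    return f"{base}.post1.dev{distance}"
```

Under this reading, a tagged release build gets a clean version such as `0.1.4`, and any manual run of the release workflow between tags produces an upload-safe dev version such as `0.1.4.post1.dev3`.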
