Commit 1d5b0a0

Merge pull request #48 from UBC-MDS/ss_quarto_docs
quarto documentation
2 parents 3152b0c + 25f8b2f commit 1d5b0a0

42 files changed: +8881 / -250 lines

.github/workflows/docs.yml

Lines changed: 12 additions & 7 deletions
@@ -11,17 +11,22 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v4.5.2
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v5.1.0
+      - name: Check-out repository
+        uses: actions/checkout@v2
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
         with:
-          python-version: ${{ matrix.python-version }}
+          python-version: '3.12'
+
+      - name: Set up Quarto
+        uses: quarto-dev/quarto-actions/setup@v2
 
       - name: Install hatch
         run: |
-          python -m pip install --upgrade pip
-          python -m pip install hatch
+          python -m pip install --upgrade pip
+          python -m pip install hatch
 
       - name: Build documentation using Hatch
         run: |
-          hatch run docs:build
+          hatch run docs:build

.github/workflows/publish-docs.yml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+name: Publish Docs
+
+on:
+  push:
+    branches: [main]
+  workflow_dispatch:
+
+jobs:
+  build-docs:
+    runs-on: ubuntu-latest
+
+    permissions:
+      contents: write
+
+    steps:
+      - name: Check-out repository
+        uses: actions/checkout@v2
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.12'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[docs]"
+
+      - name: Set up Quarto
+        uses: quarto-dev/quarto-actions/setup@v2
+
+      - name: Build quartodoc
+        run: |
+          quartodoc build --verbose
+
+      - name: Render Quarto site
+        run: |
+          quarto render .
+
+      - name: Publish to gh-pages branch
+        uses: quarto-dev/quarto-actions/publish@v2
+        with:
+          target: gh-pages
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

README.md

Lines changed: 35 additions & 18 deletions
@@ -47,36 +47,58 @@ Provides a set of exploratory data analysis (EDA) focused on numeric columns. Th
 
 
 **Numeric value checks**
+
 Numeric value checks ensure that numerical columns contain valid and meaningful values. These checks help detect outliers, impossible values, and violations of constraints that should logically apply to the data.
 - `validate_numeric_column(df, column, min_value,max_value,allow_negative)`
   - Verifies that numeric values fall within an expected range.
   - Detects negative values where they are not allowed.
   - Identifies values that violate domain-specific boundaries.
 
-**Column-level checks**
-Column-level checks inspect each column individually to understand data quality and readiness for cleaning or modeling. These checks evaluate the composition, completeness, and consistency of columns.
-- ` summarize_column_quality(df,target_column)`
-  - Confirms the required columns (e.g., the target column) are present.
-  - Checks data type consistency across each column
-  - Calculates the number and percentage of missing values
-  - Reports the number of unique values to identify high-cardinality or low-variance columns
-
 While standard libraries like Pandas provide tools to transform data, **Datacure** provides the rules to validate it. By focusing on data cleaning - structural integrity, column consistency, and value range constraints - it allows developers to build more resilient data pipelines with less boilerplate code.
 
-## Get started
+## Setting up the Development Environment
+1. Clone the repository to your local machine by opening a terminal and running the following commands:
+``` bash
+git clone https://github.com/UBC-MDS/DSCI_524_group20_datacure.git
+cd DSCI_524_group20_datacure
+```
+2. Create the conda environment from `environment.yml`:
+``` bash
+conda env create -f environment.yaml
+```
+3. Activate the environment:
+``` bash
+conda activate dsci_524_proj_env
+```
 
+## Installing the package
 You can install this package into your preferred Python environment using pip:
-
 ``` bash
-$ pip install datacure
+$ pip install -e .
 ```
-Inorder to run the tests successfully, use the command below:
 
+## Running Tests
+You can run tests to validate all functions in the package using pytest:
 ``` bash
 $ pytest -v
 ```
-To use datacure in your code:
+`-v` produces verbose output showing the name of each test and whether it passed.
+
+## Build Documentation
+### Option 1 (Recommended): Build using Hatch
+This option installs all required documentation dependencies automatically and builds the documentation:
+``` bash
+hatch run docs:build
+```
+### Option 2 (Optional): Live preview locally (requires Quarto installed)
+If Quarto is not already installed, you can install it from [here](https://quarto.org/docs/get-started/).
+To generate the API reference pages and preview the documentation website, run:
+``` bash
+quartodoc build --watch
+quarto preview
+```
 
+## Example use:
 ``` python
 import pandas as pd
 from datacure import validate_datetime_schema
@@ -95,16 +117,11 @@ result = validate_datetime_schema(
 ```
 
 ## Contributors
-
 - Jose Davila
-
 - Ssemakula Peter Wasswa
-
 - Yanxin Liang
-
 - Shruti Sasi
 
 ## Copyright
-
 - Copyright © 2026 Jose Davila, Ssemakula Peter Wasswa, Yanxin Liang, Shruti Sasi.
 - Free software distributed under the [MIT License](./LICENSE).
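
The README hunk above describes the numeric value checks only in prose: range bounds, disallowed negatives, and domain-specific limits. A minimal sketch of that kind of check follows. It is an illustration only, not datacure's implementation: `check_numeric_values` and its return keys are hypothetical names, and it works on a plain list of numbers rather than a DataFrame column so that it needs nothing beyond the standard library.

```python
from typing import Iterable, Optional


def check_numeric_values(
    values: Iterable[float],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    allow_negative: bool = True,
) -> dict:
    """Hypothetical helper: collect values that break simple numeric rules.

    This is NOT datacure's validate_numeric_column; it only mirrors the
    three checks listed in the README.
    """
    below_min, above_max, negative = [], [], []
    for value in values:
        # Range check: lower bound (skipped when no bound is given).
        if min_value is not None and value < min_value:
            below_min.append(value)
        # Range check: upper bound (skipped when no bound is given).
        if max_value is not None and value > max_value:
            above_max.append(value)
        # Sign check: only applied when negatives are disallowed.
        if not allow_negative and value < 0:
            negative.append(value)
    return {
        "below_min": below_min,
        "above_max": above_max,
        "negative": negative,
        "is_valid": not (below_min or above_max or negative),
    }


# Example: ages must lie in [0, 120] and may not be negative.
report = check_numeric_values(
    [25, -3, 140, 61], min_value=0, max_value=120, allow_negative=False
)
print(report)
```

Returning the offending values instead of raising on the first failure lets a caller report every problem in a column at once. Whether datacure's own function raises, returns a report, or does something else is not stated in this diff.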

_quarto.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+project:
+  type: website
+  output-dir: docs
+
+website:
+  title: "Datacure"
+  navbar:
+    search: true
+    left:
+      - text: Home
+        href: index.qmd
+      - text: References
+        href: reference/index.qmd
+
+
+format:
+  html:
+    theme:
+      - cosmo
+      - brand
+    css: styles.css
+    toc: true
+
+quartodoc:
+  package: datacure
+  dir: reference
+  sections:
+    - title: Datacure Functions
+      desc: Functions to help you with data validation, data cleaning and plotting.
+      contents:
+        - load_or_validate_source.load_or_validate_source
+        - validate_categorical_schema.validate_categorical_schema
+        - validate_datetime_schema.validate_datetime_schema
+        - validate_numeric_column.validate_numeric_column
+        - plots.plot_numeric_distributions
+
