All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.0.0] (Milestone 4) - 2026-02-02

### Fixed

- PR [#114](https://github.com/UBC-MDS/DSCI_524_group37_csvplus/pull/114): reorganized `README.md` for clarity and usability for both users and developers, addressing peer review Issue [#103](https://github.com/UBC-MDS/DSCI_524_group37_csvplus/issues/103)
- PR [#121](https://github.com/UBC-MDS/DSCI_524_group37_csvplus/pull/121): fixed the `resolve_string_value()` example in `README.md`, addressing peer review Issue [#100](https://github.com/UBC-MDS/DSCI_524_group37_csvplus/issues/100)
- Addressed inconsistencies in `test_generate_report.py` (#122)
- PR [#131](https://github.com/UBC-MDS/DSCI_524_group37_csvplus/pull/131): added author emails, addressing peer review Issue [#130](https://github.com/UBC-MDS/DSCI_524_group37_csvplus/issues/130)

### Added

- Retrospective and next steps in `CONTRIBUTING.md` (#126)

## [2.0.0] (Milestone 3) - 2026-01-25

### Added

- Additional unit tests for improved coverage (#95, #80)
- Flake8 linter to workflow (#83)
- Quartodoc YAML file for documentation (#91)
- Additional data validation and unit tests (#76)
- Deploy and build workflow files (#68, #71)

### Changed

- Bumped package version from 0.1.2 to 0.2.2 (#97)
- Updated README for milestone 3 (#94)
- Updated dependencies and removed commented-out code (#62)
- Installed necessary dev and test dependencies (#69)

### Fixed

- Linter issues (#98)
- Action version, and added the `skip-existing` option (#96)
- Style issues and Flake8 compliance (#89, #86)
- Docstring style errors (#73)
- pandas version so that all unit tests pass (#76)

## [1.0.0] (Milestone 2) - 2026-01-17

### Added

- Implemented `data_version_diff` function (#46)
- Created tests for `data_version_diff` function (#53)
- Improved test coverage for `data_version_diff` function (#56)
- Implemented `generate_report` function (#51)
- Implemented `load_optimized_csv` function with tests (#48)
- Implemented `resolve_string_value` function with unit tests (#40)
- Initial version of `environment.yml` (#32)

### Changed

- Updated README (#55)
- Updated docstring and function specs (#37)
- Renamed `data-correction.py` to `data_correction.py` and updated docstrings (#34)

## [0.0.1] (Milestone 1) - 2026-01-10

### Added

- Initial commit with project setup
- Function stub and docstring for `data_version_diff` (#17)
- Created `generate-report.py` with docstring (#15)
- Function definition and docstring for `load_optimized_csv` (#14)
- `resolve_string_value` function in `data-correction.py` (#13)
- Package details and contributors in README (#12)

### Changed

- Updated code of conduct to reflect group values (#11)
- Edited `CONTRIBUTING.md` (#18)
- Documented raised errors in docstring (#16)

### Fixed

- Inconsistent function names in `README.md` and `data-correction.py` (#21)

| Meta |[Code of Conduct](CODE_OF_CONDUCT.md)|

> **Note**: PyPI badges are included for completeness but may not reflect a published package.

This package addresses common data preprocessing and exploration tasks through the following functions:

|Function |Description |
|--------|--------|
|`load_optimized_csv`|Loads a CSV file and automatically downcasts data types to minimize its memory footprint.|
|`data_version_diff`|Compares two versions of a pandas DataFrame and returns a structured summary of schema, row count, missing values, numeric statistics, and data type changes.|
|`resolve_string_value`|Consolidates spelling variations of the same data value in a column.|
|`summary_report`|Produces a list of descriptive statistics of the data and information about missing values.|

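The kind of cleanup `resolve_string_value` targets (merging spelling variants of one value) can be illustrated with Python's standard-library `difflib`. This sketch is not the package's implementation: the helper name `consolidate_variants`, its `canonical` list argument, the case-insensitive comparison, and the `cutoff=0.8` similarity threshold are all assumptions chosen for illustration.

```python
import difflib


def consolidate_variants(values, canonical, cutoff=0.8):
    """Map each raw string to its closest canonical spelling.

    Hypothetical helper for illustration only; it is not part of csvplus.
    Values whose best similarity score is below ``cutoff`` are returned unchanged.
    """
    # Compare in lower case, but return the canonical spelling as written.
    lookup = {c.lower(): c for c in canonical}
    resolved = []
    for value in values:
        key = value.strip().lower()
        # get_close_matches ranks candidates by SequenceMatcher.ratio().
        matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        resolved.append(lookup[matches[0]] if matches else value)
    return resolved


column = ["Vancouver", "vancover", "VANCOUVER ", "Toronto", "Tornto", "Montreal"]
print(consolidate_variants(column, ["Vancouver", "Toronto"]))
# ['Vancouver', 'Vancouver', 'Vancouver', 'Toronto', 'Toronto', 'Montreal']
```

Raising `cutoff` makes matching stricter, so fewer variants are merged; values with no close canonical match (here `Montreal`) pass through unchanged instead of being forced onto a label.
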
Some functions operate on **CSV files**, while others work directly on **pandas DataFrames**, allowing users to integrate `csvplus` into existing pandas-based workflows.

Our package fits into the Python data preprocessing ecosystem. The [`pandas`](https://pandas.pydata.org/) package provides basic functionality for reading CSV files and producing summary statistics, and the [`pyjanitor`](https://pyjanitor-devs.github.io/pyjanitor/) package provides functions for sanitizing column names and converting column dtypes.

`csvplus` extends these tools with automated memory optimization, dataset version comparison, and high-level summaries useful for auditing and exploratory analysis.

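The memory optimization described for `load_optimized_csv` can be approximated in plain pandas by downcasting numeric columns after reading a CSV. This is a minimal sketch under stated assumptions, not the package's code: the helper `downcast_numeric` is hypothetical, and it only narrows integer and float columns with `pd.to_numeric(..., downcast=...)`, leaving every other column as it is.

```python
import io

import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` with numeric columns narrowed to smaller dtypes.

    Hypothetical helper for illustration only; it is not part of csvplus.
    Non-numeric columns are left unchanged.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Smallest signed integer dtype that holds every value (int8 here).
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # pandas narrows floats to float32 at most, never float16.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


csv_text = "id,score,city\n1,0.5,Vancouver\n2,1.25,Toronto\n3,2.0,Montreal\n"
raw = pd.read_csv(io.StringIO(csv_text))
small = downcast_numeric(raw)

print(raw.dtypes)    # dtypes as inferred by read_csv
print(small.dtypes)  # id -> int8, score -> float32
print(raw.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

`pd.to_numeric` only chooses a smaller integer dtype when every value fits in it, so integer values are preserved; `float32` keeps roughly 7 significant digits, which is worth checking before applying the same idea to high-precision data.
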
## Full API reference and examples are available at: https://ubc-mds.github.io/DSCI_524_group37_csvplus/reference/
### Install csvplus package (editable mode)

This allows you to edit the source code locally while using the package.

```bash
pip install -e ".[docs]"
```

### Run Tests and Coverage

All tests are written using `pytest`. To run the full test suite and generate a coverage report, execute:

```bash
pytest --cov=csvplus --cov-report=term-missing
```

### Build and Preview Documentation

```bash
quartodoc build
quarto render
quarto preview
```

### Deploy Documentation (automated)

Documentation is deployed automatically by the `build-docs` job in `.github/workflows/docs-publish.yml` when a pull request (PR) targets the main branch.