
Commit d7e2446

R7L208, kwang-databricks, BrianDeacon, and BrianDeaconBrightdrop authored
Resample refactor (#428)
* making contributing.md slightly more clear (#422)
* making contributing.md slightly more clear
* Remove `Analyze` job from test.yml (#423)
  - This job called CodeQL which is broken due to new firewall rules
---------
Co-authored-by: Lorin Dawson <[email protected]>
* created prelim makefile with tox commands, updated contributing.md (#424)
* created prelim makefile with tox commands, updated contributing.md
* adding 3.11 to docs, and updating create-env in makefile to install all necessary python versions
* removing 3.8
* Update Makefile to improve test and environment management
  Added commands for creating virtual environments, running tests, and generating coverage reports. Enhanced documentation for supported DBR versions and added new command options for better usability.
* Update Makefile to conditionally install Python versions
  Modified the `venv` target to check for existing Python versions before installing them. This prevents redundant installations and ensures only missing versions are installed via `pyenv`.
* Update Makefile to use dynamic virtualenv variable
  Replaced hardcoded `.venv` with the `$(VENV)` variable in the Makefile. This allows for greater flexibility and customization of the virtual environment directory name.
* Fix typo in Makefile target comment
  Corrected a misspelling in the comment for the `test` target. Changed "testtests" to "tests" for improved clarity and professionalism.
---------
Co-authored-by: Lorin <[email protected]>
* [Chore] Cicd updates 02 (#425)
* created prelim makefile with tox commands, updated contributing.md
* adding 3.11 to docs, and updating create-env in makefile to install all necessary python versions
* removing 3.8
* updating git workflows to use make
* create coverage report in test
* remove commented tox command
* update check_for_nan, force bool returns
* updates to io.py to pass tests
* testing out change to tsdf to account for new mypy restrcitions
* updating mypy error handling
* revert
* updating mypy.ini to ignore scipy
* remove edits to tox.ini to show versions
* - Fix `bool` conversions in `tempo/intervals.py` to ensure consistent type handling.
  - Correct `tox.ini` basepython to `py311` and manage additional dependencies.
  - Clarify complex number handling in `fourier_transform` and ensure float usage in `fftfreq`.
  - Expand `Makefile` functionality with environment checks for Python and Java setups.
* Add `_cleanup_delta_warehouse` method for Delta test environment cleanup.
  Integrate pre- and post-test cleanup of Delta warehouse directories and metastore files in `setUpClass` and `tearDownClass` to ensure a consistent test environment.
* Expand Makefile with enhanced Python environment checks and management.
  - Add `.claude` to `.gitignore`.
  - Replace `check-pyenv` with `check-python` to support system Python and automate `pyenv` installation.
  - Introduce `setup-python-versions` and `setup-all-python-versions` targets for flexible Python version setups.
  - Update `venv`, `test`, and `test-all` targets to utilize new utilities.
---------
Co-authored-by: Lorin <[email protected]>
* chore: code formatting and linting updates
* Upgrading Tox to Hatch, and Updating respective commands in Makefile (#426)
* created prelim makefile with tox commands, updated contributing.md
* adding 3.11 to docs, and updating create-env in makefile to install all necessary python versions
* removing 3.8
* updating git workflows to use make
* create coverage report in test
* initial hatch commit
* updates to makefile
* updating github workflow dependencies to install hatch instead of tox
* fixing posargs issue in lint
* fixing type checker
* adding version call so hatch knows what to pick up
* using correct method in version.py"
* adding get_version to version.py for hatch environment creation
* adding semver as dep in git
* using vcs as hatch version
* updating version.py to dynamically pull version, and semver as dep in all testenvs
* checking semver install
* updating semver
* fixing var ref before assignment in version.py
* fixing correct error type
* getting around coverage circ dep
* forgot to update makefile
* remove commented tox command
* update check_for_nan, force bool returns
* updates to io.py to pass tests
* testing out change to tsdf to account for new mypy restrcitions
* updating mypy error handling
* revert
* updating mypy.ini to ignore scipy
* remove edits to tox.ini to show versions
* linting fix
* refactor: simplify resampling and interpolation logic
  - Removed `_ResampledTSDF` class in favor of integrating resampling metadata (`_resample_freq`, `_resample_func`) into `TSDF`.
  - Improved error handling for `freq` and `func` parameters in resample and interpolation methods.
  - Updated interpolation logic to utilize a resampled TSDF object and mapped predefined fill methods.
  - Adjusted references to `partitionCols` by transitioning to `series_ids`.
  - Added `parameterized` dependency to `pyproject.toml` for enhanced test capabilities.
  - Reduced circular imports and restructured imports for maintainability.
  - Updated tests to align with changes in resample and interpolation workflows.
* refactor: extract resample helper functions to `resample_utils` and update tests
  - Moved reusable functions and constants (e.g., `validateFuncExists`, `checkAllowableFreq`) to `resample_utils` for better modularity.
  - Updated `resample`, `tsdf`, and associated tests to use the refactored helper functions.
  - Simplified test data construction by introducing `get_test_function_df_builder`.
* core refactor done: enhance resampling logic and consolidate column handling behavior
  - Introduced `AGG_KEY` constant for consistent grouping across resample functions.
  - Refined column handling logic in `aggregate` and `resample` to align with best practices, emphasizing explicit configurations and preserving observational columns by default.
  - Updated `calc_bars` to seamlessly integrate OHLC bar calculations with enhanced column handling.
  - Adjusted tests to use more descriptive data keys (`input_data`, `expected_data`) for better clarity.
  - Updated tests to align with modified column handling and aggregation behavior.
* refactor: introduce `resample_utils` module for shared resampling utilities
  - Added `resample_utils.py` to encapsulate utility functions (`checkAllowableFreq`, `validateFuncExists`) and constants for resampling.
  - Defined global frequency and aggregation options for modularity.
  - Improved frequency validation logic with `freq_dict` and `ALLOWED_FREQ_KEYS`.
* refactor: optimize time bounds calculation in resample logic. All resmaple tests pass
  - Replaced window function-based approach with `groupBy` for calculating time bounds, reducing duplicate computations.
  - Improved handling of `series_ids` for partitioning, ensuring accuracy in sequence generation.
  - Simplified logic for timestamp sequence generation using `time_bounds`.
* refactor: update test data construction and skip conditions for enhanced clarity and precision
  - Consolidated shared test data definitions and references using `$ref`.
  - Adjusted timestamp handling and sequence column logic in various tests.
  - Added skip conditions for tests involving composite timestamp indexes or complex timestamp structures.
  - Refactored utility tests to improve consistency in `get_test_function_df_builder` usage.
* refactor: replace `get_test_df_builder` with `get_test_function_df_builder` in tests for consistent data building
* refactor: add PySpark 3.5+ compatibility and provide fallbacks for older versions
  - Introduced version checks for PySpark features like `count_if` and `bool_or`.
  - Added compatibility wrapper `_bool_or_compat` for `bool_or` functionality.
  - Updated segment handling logic with fallbacks for PySpark < 3.5, ensuring robust interpolation behavior.
* Allow more column types to be interpolated (#421)
* Allow interpolation to only reject column types that cannot work for the specific method used
* Fix unit test incompatibility with dbr113
* Fix tests and clarify error message
* Fix incorrect column reference in test
---------
Co-authored-by: Brian Deacon <[email protected]>
* update: extend .gitignore to include `.venv-*` directory pattern
* chore linting: improve type annotations, imports, and formatting for code consistency
  - Replaced redundant `Any` in type hints with more precise alternatives (e.g., `object`, `TSDF`, `Collection`).
  - Converted `TimeUnitsType` from `NamedTuple` creation to a class for improved readability.
  - Consolidated and reorganized imports across modules for better clarity.
  - Removed unused imports and redundant `pass` statements in abstract methods.
  - Standardized and fixed minor formatting issues (consistency in blank lines, indentation, and trailing spaces).
* refactor: improve type annotations, reorganize imports, and enhance join logic across modules
  - Added precise type annotations (e.g., `Optional`, `Tuple`) to methods for better clarity and static analysis support.
  - Refactored imports to include `# type: ignore` directives where necessary for untyped packages or compatibility.
  - Enhanced `_toleranceFilter` and `_join` methods with placeholders and TODOs for future logic implementations.
  - Introduced stricter validation for parameters (e.g., `freq` checks) to avoid runtime errors.
  - Updated join constructors to handle default prefix values and improve readability.
* style: fix indentation to 4 spaces per PEP 8
* adding imports for IntervalsDF + pytest in dev.txt
* test: remove `intervals_tests.py` to clean up unused and redundant test cases
* test: remove `intervals_df_tests.json` to clean up deprecated and redundant test cases
* test: update exception type in `test_appendAggKey_freq_is_none`
  - Replaced `TypeError` with `ValueError` in assertion to align with updated `_appendAggKey` behavior.
---------
Co-authored-by: kwang-databricks <[email protected]>
Co-authored-by: Brian Deacon <[email protected]>
Co-authored-by: Brian Deacon <[email protected]>
1 parent 2cfe796 commit d7e2446


46 files changed, with 9767 additions and 3117 deletions

.github/workflows/docs-release.yml

Lines changed: 2 additions & 2 deletions
@@ -31,11 +31,11 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          python -m pip install tox
+          python -m pip install hatch

       - name: Build docs
         working-directory: ./python
-        run: tox -e build-docs
+        run: make build-docs

       - name: Upload artifacts
         uses: actions/upload-artifact@v4

.github/workflows/release.yml

Lines changed: 2 additions & 2 deletions
@@ -28,11 +28,11 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          python -m pip install tox
+          python -m pip install hatch

       - name: Build dist
         working-directory: ./python
-        run: tox -e build-dist
+        run: make build-dist

       - name: Publish a Python distribution to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1

.github/workflows/test.yml

Lines changed: 12 additions & 11 deletions
@@ -22,13 +22,13 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          python -m pip install tox tox-gh-actions
+          python -m pip install hatch
       - name: Lint check
         working-directory: ./python
-        run: tox -e lint -- --check --diff
+        run: make lint LINT_PARAMS="-- --check --diff"
       - name: Type check
         working-directory: ./python
-        run: tox -e type-check
+        run: make type-check

   test:
     needs: lint-and-check
@@ -37,15 +37,15 @@ jobs:
       matrix:
         config:
           - py: '3.9'
-            dbr: dbr113
+            dbr: 113
           - py: '3.9'
-            dbr: dbr122
+            dbr: 122
           - py: '3.10'
-            dbr: dbr133
+            dbr: 133
           - py: '3.10'
-            dbr: dbr143
+            dbr: 143
           - py: '3.11'
-            dbr: dbr154
+            dbr: 154
       fail-fast: false
     steps:
       - uses: actions/checkout@v4
@@ -59,10 +59,11 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          python -m pip install tox
-      - name: Execute tox envs
+          python -m pip install semver
+          python -m pip install hatch
+      - name: Execute hatch envs
         working-directory: ./python
-        run: tox -e ${{ matrix.config.dbr }},coverage-report
+        run: make test coverage-report DBR=${{ matrix.config.dbr }}
       - name: Publish test coverage
         uses: codecov/codecov-action@v4
         with:

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -38,6 +38,7 @@ python/.env
 venv
 .venv
 .env
+.venv-*

 # other misc ignore
 .DS_Store
@@ -49,6 +50,10 @@ docs/_build
 # ignore mypy cache
 .mypy_cache
 python/.mypy_cache
+.aider*
+
+# ignore Claude Code files
+.claude

 # ignore pycharm databricks plugin directory
 .databricks

CONTRIBUTING.md

Lines changed: 21 additions & 16 deletions
@@ -12,18 +12,18 @@ Be sure to carefully follow the instructions to configure your shell environment

 Use `pyenv` to install the following Python versions for testing.
 ```bash
-pyenv install 3.8 3.9 3.10
+pyenv install 3.9 3.10 3.11
 ```

 You will probably want to set one of these versions as your global Python version. This will be the version of Python that is used when you run `python` commands in your terminal.
 For example, to set Python 3.9 as your global Python version, run the following command:
 ```bash
-pyenv global 3.10
+pyenv global 3.9
 ```

 Within the `tempo/python` folder, run the below command to create a `.python-version` file that will tell `pyenv` which Python version to use when running commands in this directory:
 ```bash
-pyenv local 3.8 3.9 3.10
+pyenv local 3.9 3.10 3.11
 ```

 This allows `tox` to create virtual environments using any of the Python versions listed in the `.python-version` file.
@@ -38,28 +38,22 @@ pip install -U tox
 A brief description of each managed `tox` environment can be found by running `tox list` or in the `tox.ini` file.

 ## Create a development environment
-Run the following command in your terminal to create a virtual environment in the `.venv` folder:
+Each development environment is roughly associated with a DBR LTS version. A list of base environments can be found in your `tox.ini` file, but the number specified below is only the version number.
+Run the following command in your terminal to create a virtual environment (you can replace "your_dbr_number" with something like 154, 143, etc.):
 ```bash
-tox --devenv .venv -e {environment-name}
+make venv DBR={your_dbr_number}
 ```
-The `—devenv` flag tells `tox` to create a development environment, and `.venv` is the folder where the virtual environment will be created. The `environment-name` is a reference to the environments that are listed in the `tox.ini` file. In general, these reflect various LTS versions of DBR.

 ## Environments we test
-The environments we test against are defined within the `tox.ini` file, and the requirements for those environments are stored in `python/tests/requirements`. The makeup of these environments is inspired by the [Databricks Runtime](https://docs.databricks.com/en/release-notes/runtime/index.html#) (hence the naming convention), but it's important to note that developing Databricks is **not** a requirement. We're simply mimicking some of the different runtime versions because (a) we recognize that much of the user base uses `tempo` on Databricks and (b) it saves development time spent trying to build out test environments with different versions of Python and PySpark from scratch.
-
-## Run tests locally for one or more environments
-You can run tests locally for one or more environments defined enviornments without setting up a development environment first.
-
-### To run tests for a single environment, use the `-e` flag followed by the environment name:
+Use the following to run tests for a given environment.
 ```bash
-tox -e {environment-name}
+make test DBR={your_dbr_number}
 ```

-### To run tests for multiple environments, specify the environment names separated by commas:
+If you would like to run tests for all environments, use
 ```bash
-tox -e {environment-name1, environment-name2, etc.}
+make test-all
 ```
-This will run tests for all listed environments.

 ### Run additional checks locally
 `tox` has special environments for additional checks that must be performed as part of the PR process. These include formatting, linting, type checking, etc.
@@ -70,6 +64,17 @@ These environments are also defined in the `tox.ini`file and skip installing dep
 * build-docs
 * coverage-report

+To run an individual check, use
+```bash
+make lint
+make type-check
+make {name_of_environment}
+```
+
+To run all checks at once, run
+```bash
+make all-local
+```
 # Code style & Standards

 The tempo project abides by [`black`](https://black.readthedocs.io/en/stable/index.html) formatting standards,
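
The CONTRIBUTING.md changes above document `make test DBR={your_dbr_number}` for one environment and `make test-all` for every environment. For a subset of environments, a small wrapper along the following lines might be convenient. This is only a sketch and not part of the commit: the helper names `build_test_command` and `run_all` are hypothetical, the DBR numbers are copied from the `test.yml` matrix, and it assumes `make` is invoked from the `python/` folder as the workflows do. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from typing import List, Sequence

# DBR numbers copied from the CI matrix in .github/workflows/test.yml.
DBR_VERSIONS: Sequence[int] = (113, 122, 133, 143, 154)


def build_test_command(dbr: int) -> List[str]:
    """Return the argv for `make test DBR=<dbr>` (hypothetical helper)."""
    return ["make", "test", f"DBR={dbr}"]


def run_all(dbrs: Sequence[int] = DBR_VERSIONS, dry_run: bool = True) -> List[str]:
    """Print one `make test` command per DBR number and optionally run it."""
    rendered: List[str] = []
    for dbr in dbrs:
        argv = build_test_command(dbr)
        rendered.append(shlex.join(argv))
        print(rendered[-1])
        if not dry_run:
            # Assumption: the current directory is the repo's python/ folder.
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    run_all()  # dry run: prints the commands without executing them
```

Calling `run_all((143, 154), dry_run=False)` would execute the two test runs in sequence and stop at the first failure, since `check=True` raises on a non-zero exit code.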
