Track initialized and used columns in each processor
During 2026-03 invoicing, a bug was found where a column initialized
by the New-PI credit processor (i.e. the `PI Balance` column) was
being accessed by the PI-SU processor before it was
initialized, causing a KeyError.
To fix this, the codebase has been refactored so that each processor
explicitly documents which columns it initializes and uses, defined in two
new properties, `initializes_columns` and `operates_on_columns`. A helper
function `_init_columns()` is added to initialize columns.
A unit test, `tests/unit/processors/test_processor_list.py`, is added to check
that each processor only uses columns that it or a previous processor
initialized, and that no column is initialized more than once.
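
The ordering rule that the unit test enforces can be sketched roughly as follows. This is a minimal illustration, not the code in `test_processor_list.py`: `StubProcessor` and `check_column_order` are hypothetical names, and only the `initializes_columns` and `operates_on_columns` attributes come from this commit.

```python
from dataclasses import dataclass, field


@dataclass
class StubProcessor:
    """Hypothetical stand-in for a real processor; only the declared columns matter."""
    name: str
    initializes_columns: list = field(default_factory=list)
    operates_on_columns: list = field(default_factory=list)


def check_column_order(processors, input_columns=()):
    """Return a list of problems; an empty list means the ordering is consistent."""
    initialized_by = {}              # column name -> processor that initialized it
    available = set(input_columns)   # columns already present in the input data
    errors = []
    for proc in processors:
        # A processor's own initialized columns count as available to itself.
        for col in proc.initializes_columns:
            if col in initialized_by:
                errors.append(
                    f"{proc.name} re-initializes {col!r} "
                    f"(first initialized by {initialized_by[col]})"
                )
            else:
                initialized_by[col] = proc.name
            available.add(col)
        for col in proc.operates_on_columns:
            if col not in available:
                errors.append(f"{proc.name} uses {col!r} before it is initialized")
    return errors


# The ordering described above: PI-SU runs before New-PI credit.
new_pi = StubProcessor("New-PI credit", initializes_columns=["PI Balance"])
pi_su = StubProcessor("PI-SU", operates_on_columns=["PI Balance"])
bad_order = check_column_order([pi_su, new_pi])
good_order = check_column_order([new_pi, pi_su])
```

With the processors in the buggy order, `bad_order` holds one "used before it is initialized" message; in the corrected order, `good_order` is empty.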
Additionally, each column is now encapsulated as an `InvoiceColumn` instance.
`InvoiceColumn` contains the name, datatype, and default value for each column.
This also enables stricter and clearer type enforcement for data entering
and leaving the pipeline.
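
As a rough illustration of what such a column description might look like, here is a sketch using a frozen dataclass. The field names (`name`, `dtype`, `default`) and the `coerce` helper are assumptions made for this example; the actual `InvoiceColumn` definition may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class InvoiceColumn:
    """Sketch only: field names are assumed, not taken from the codebase."""
    name: str            # column label as it appears in the invoice
    dtype: type          # Python type that values are cast to
    default: Any = None  # value used when the column is first initialized

    def coerce(self, value):
        """Cast a raw value to the column's type, falling back to the default for None."""
        if value is None:
            return self.default
        return self.dtype(value)


# Hypothetical definition of the column mentioned above.
PI_BALANCE = InvoiceColumn(name="PI Balance", dtype=float, default=0.0)
```

Making the instance frozen keeps a column definition from being mutated by one processor and silently changing behavior in another.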
A new processor, `ValidateInputColumnsProcessor`, is added to check that the
input dataframe to the processing pipeline has the prerequisite columns, and
to cast them to the appropriate types.
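
The check-then-cast step can be sketched with pandas as below. This is not the `ValidateInputColumnsProcessor` implementation: the function name, the `REQUIRED_INPUT_COLUMNS` mapping, and the column names in it are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical prerequisite columns and target types; the real list lives in the codebase.
REQUIRED_INPUT_COLUMNS = {"PI": str, "Cost": float}


def validate_input_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on missing columns, then cast the required ones to their types."""
    missing = [name for name in REQUIRED_INPUT_COLUMNS if name not in df.columns]
    if missing:
        raise ValueError(f"Input dataframe is missing columns: {missing}")
    # DataFrame.astype accepts a column -> dtype mapping; other columns are left untouched.
    return df.astype(REQUIRED_INPUT_COLUMNS)


raw = pd.DataFrame({"PI": ["pi-1"], "Cost": ["1.5"]})
validated = validate_input_columns(raw)
```

Raising a single error that lists every missing column at the start of the pipeline avoids the kind of late KeyError described above.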
The e2e test data has been updated to surface the bug that was found.
The tests did not fail in the PR that introduced the bug [1] because
the test data didn't have the right conditions to trigger the PI-SU processor.
Refactored unit tests to accommodate the new
processor by adding a new base test class.
[1] #279