-
Notifications
You must be signed in to change notification settings - Fork 57
Allow more column types to be interpolated #421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…the specific method used
ace43dd
to
6021d6c
Compare
Hey @BrianDeacon - Given the Databricks License and that our Databricks Labs workflow broke to accept external contributions; I'm not sure if this can be merged. I am checking on options for you |
I mean, I'm pretty sure this just fell off a truck, and you wrote it @R7L208 ;) |
@BrianDeacon, if you're still interested in authoring this, could you email me the below info to [email protected] to get you whitelisted?
|
I've emailed Lorin. Is the static ip address a requirement? Lol, you're Lorin. :) |
#425 should fix the mypy errors. Once that's merged then you can pull the latest from |
@BrianDeacon - #425 was merged into master. Can you update your fork & branch? That PR should resolve the lint issues that are unrelated to your changes |
Done! |
@R7L208 Anything else? |
Looks like the pipeline uncovered that I had some unit test code that wasn't compatible with older versions of pyspark. That's fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a few small things but otherwise looks great!
python/tempo/interpol.py
Outdated
|
||
if not self._is_valid_method_for_column(series, method, target_col): | ||
raise ValueError( | ||
f"Interpolation method '{method}' is not supported for column '{target_col}' of type '{series.schema[target_col].dataType}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BrianDeacon - can you clarify the error message to indicate the column must be of NumericType but instead received a non-numeric type of {}
python/tests/interpol_tests.py
Outdated
ValueError, | ||
self.interpolate_helper.interpolate, | ||
simple_input_tsdf, | ||
freq="30 seconds", func="ceil", method="linear", ts_col="event_ts", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be method="zero"
IIUC. Can you update and make sure test still passes?
python/tests/base.py
Outdated
if "decimal_convert" in self.df: | ||
for decimal_col in self.df["decimal_convert"]: | ||
if "." in date_col: | ||
col, field = date_col.split(".") | ||
convert_field_expr = sfn.col(col).getField(field).cast("decimal") | ||
df = df.withColumn( | ||
col, sfn.col(col).withField(field, convert_field_expr) | ||
) | ||
else: | ||
df = df.withColumn(decimal_col, sfn.col(decimal_col).cast("decimal")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is a bug here and you need decimal_col
inside of the for loop instead of date_col
for the if
expression and call of split()
. Can you double check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Live by the clipboard, die by the clipboard. :) Fixed.
Can you sanity check me? As far as I can tell, none of the test configs pass in data that would hit these if "."
branches. I'm guessing these got pulled in from some other code base? I'm not suggesting to yank it out, but I just wanted to make sure I wasn't misunderstanding how this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in the interpol tests to manage decimal accuracy when converting data to DecimalType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do the other variations ever run into a column with a "." in the name? These _convert variations are all to process these entries, right? So I just got "lucky" that none of these bits in the config had a . in the name?
"date_convert": ["date_col"],
@R7L208 Ready for another look. 👍 |
python/tests/base.py
Outdated
@@ -176,7 +176,7 @@ def as_sdf(self) -> DataFrame: | |||
if "decimal_convert" in self.df: | |||
for decimal_col in self.df["decimal_convert"]: | |||
if "." in date_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if "." in date_col: | |
if "." in decimal_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:face_palm:
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final change i think as long as test pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @BrianDeacon!
linear
andzero
interpolation methods continue to require numeric column types, butffill
,bfill
, andnull
will work on any column type.Changes
Rather than a blanket rejection of all non-numeric column types, the requirements are applied on a per-column basis depending on the interpolation method required. ValueError is still thrown when the column type doesn't work, but the check is done at the time of attempting to interpolate the column.
Linked issues
Resolves #420
Functionality
...
...
...
...
...
Tests