Using TypeAdapter for validation#43
Conversation
Using TypeAdapter instead of the raw BaseModel to allow validating more complex types (e.g. discriminated union)
|
Thank for your PR! I will try to find some time upcoming days to look at it. Also looping in @xaviernogueira here. |
There was a problem hiding this comment.
Very nice @benbuc ! I like this feature especially as it supports a standard pydantic behavior we were previously missing.
I added some comments/requests. Mainly I think three things:
_validate_schema_typeshould be a decorator since it is used in the first line of the threepandasextension functions.- We should handle specific exceptions, not
except Exception. - I don't like moving to the
Anytype hint, as the function can't really take any input. If I setschema=1it would fail despite playing along with the type hints. Validation within the function body does not mean you should loosen the type hint to include types that will cause failure IMO.
But other than that this all looks great thank you for contributing and @wesselhuising let me know if I forgot anything.
| """Raise TypeError if schema is not a valid pydantic model, type, or Union.""" | ||
| try: | ||
| TypeAdapter(schema) | ||
| except Exception as e: |
There was a problem hiding this comment.
Let's try to handle the specific exception type you would expect for TypeAdapter(schema) failing in a normal way.
There was a problem hiding this comment.
Follow up: #43 (comment) the try/except logic can be removed in favor for a simple type check for PandasValidator inputs.
| def validate( | ||
| self, | ||
| schema: BaseModel, | ||
| schema: Any, |
There was a problem hiding this comment.
Is there a reason you went with Any as opposed to SchemeTypes here. My hunch is you did this because you use self._validate_schema_type(), but handling incorrect input types in the function body does not negate having the actual type hint you need.
There was a problem hiding this comment.
Thank you for your response and sorry for taking the discussion out of the PR comments. I thought of it being more a fundamental question, but I will answer now inside the comments. I hope its not too confusing.
Do I understand correctly that what you are proposing basically means that I should revert the typing changes to their original state while keeping the internal changes? This way the default use case of pandantic - validating against BaseModel types - stays fully intact and type checked. Under the hood though, through the use of the TypeAdapter, more complex validation will be possible.
For my use case, this would mean that if I want to leverage these more complex schemas, I would need to either use # type: ignore or cast, because there is no way to express all possible valid Pydantic schemas explicitly in static typing (hence the reliance on Any at runtime).
Thank you already 😃
There was a problem hiding this comment.
Btw on vacation right now will respond next week!
There was a problem hiding this comment.
Hi @xaviernogueira I hope you had a great vacation. I kinda forgot about the PR as well. Do you have any news or novel ideas though concerning the remaining points?
There was a problem hiding this comment.
@benbuc its all good, did slip thru.
For a data science oriented library, I stand by my view that :
- the function signature should be informative as possible such that someone using JupyterLab with lower software engineering skills (analyst, scientist, etc) can leverage the LSP to understand things reasonably well without docs.
- If it is included in the typehint, the function should work.
Anyis far to broad for this heuristic to work.
That said, I do see your concern, and I don't want people to have to use # type: ignore as it could lead to missing other errors on the same line. At first I thought maybe BaseModel | Any is a good compromise (maintains some LSP type hinting without forcing ignores).
However I think the better strategy is:
SchemaType = typing.TypeVar("SchemaType", bound=BaseModel)
# make the validation func return the TypeAdaptor (perhaps rename to "get_adapter"
def _validate_schema_type(self, schema: Any) -> TypeAdapter[SchemaType]:
try:
return TypeAdapter(schema)
except Exception:
...
# use the func to get the adaptor in other cases
valid_schema: ValidSchema = self._validate_schema_type(schema: Any)
schema_validator = CoreValidator(schema)
# where CoreValidator __init__ takes TypeAdapter[SchemaType]Haven't done all the linting here, but I do believe this should work fine, you may need to change things slightly, but I do think a TypeAdapter + TypeVar is the exact thing you are looking for.
| if not isinstance(schema, type(BaseModel)): | ||
| raise TypeError("Arg `schema` must be a pydantic.BaseModel subclass!") | ||
|
|
||
| self._validate_schema_type(schema) |
There was a problem hiding this comment.
Since you are using _validate_schema_type() in all these functions, consider using a decorator pattern. Would make things more clear and harder to mess up accidentally.
| errors = "log" | ||
| else: | ||
| errors = "skip" | ||
| errors = "log" if verbose else "skip" |
| for row in self.obj.itertuples(name=None): | ||
| try: | ||
| _ = schema(**dict(zip(self.obj.columns, row[1:]))) # type: ignore | ||
| _ = adapter.validate_python( |
There was a problem hiding this comment.
Since we are not using the variable for anything, maybe we just drop the _. Out of scope for this PR tho so don't worry about it if you don't want to.
There was a problem hiding this comment.
Personally would prefer to see it removed this PR.
There was a problem hiding this comment.
Makes sense to me since he's touching these lines anyways.
|
Looked into the comments and nothing really to add. Wondering about the performance difference in time to validate when including the TypeAdapter. The only thing; we will never know as we don't have any benchmark tests in place (at the time of writing). I do like the added functionality, I think this PR is very welcome. |
|
One thing; it does conflict with the last PR: https://github.com/wesselhuising/pandantic/pull/41/files. Not sure what the implication is here but the change was requested due to linting errors in pyright (#40). |
|
Thanks everyone for your responses! I am very unsure about the type hints and validation decorator though. I am wondering if we need the decorator at all if we are validating using So the official docs for TypeAdapter list the T = TypeVar('T')
@final
class TypeAdapter(Generic[T]):
@overload
def __init__(self, type: type[T]) -> None: ...
# This second overload is for unsupported special forms (such as Annotated, Union, etc.)
# Currently there is no way to type this correctly
# See https://github.com/python/typing/pull/1618
@overload
def __init__(self, type: Any) -> None: ...
def __init__(self, type: Any) -> None:
<actual implementation>I found the comment especially interesting. Now concerning the decorator and the I would be glad to hear your opinions. |
|
@benbuc side note but next time let's keep discussion attached to the PR comments so things can be dealt with / resolved one by one, but no biggie thanks for the detailed response! Point by point: On the
On the decorator -> I agree that in the context you describe removing it entirely makes sense. After reassessing, but considering we want to avoid unhandled errors that will be opaque to end-users (like a class attribute derived
|
| """An implementation of the Pydantic BaseValidator.""" | ||
|
|
||
| def __init__(self, schema: SchemaTypes): | ||
| def __init__(self, schema: Any): |
There was a problem hiding this comment.
Follow up #43 (comment) this change makes sense here for the library internal, pushing input specificity down to the actual implementation classes.
Hello everyone,
thanks for the great Plug-In. I've been using it for a while now but now ran into a use case which was currently not implemented. Therefore I made this fork.
I wanted to make use of Discriminated Unions to allow type narrowing of some fields of the schema based on other fields.
Pydantic exposes the TypeAdapter for such use cases.
As everything is handled by the TypeAdapter, I removed
SchemaTypes. I am not sure if this is ideal, but would be happy if you could take a look.Please let me know what you think about the changes.
Thanks a lot and all the best!
Edit PS: Oh and I also did not touch the docs yet as I was too unsure about how everything works. I never edited RTD myself before, but would happily do so if needed. I would appreciate your guidance.
Partly in reference to this GitHub Issue: pydantic/pydantic#8833