Skip to content

Using TypeAdapter for validation#43

Open
benbuc wants to merge 3 commits into
wesselhuising:mainfrom
benbuc:main
Open

Using TypeAdapter for validation#43
benbuc wants to merge 3 commits into
wesselhuising:mainfrom
benbuc:main

Conversation

@benbuc
Copy link
Copy Markdown

@benbuc benbuc commented Aug 22, 2025

Hello everyone,
thanks for the great Plug-In. I've been using it for a while now but now ran into a use case which was currently not implemented. Therefore I made this fork.

I wanted to make use of Discriminated Unions to allow type narrowing of some fields of the schema based on other fields.
Pydantic exposes the TypeAdapter for such use cases.

As everything is handled by the TypeAdapter, I removed SchemaTypes. I am not sure if this is ideal, but would be happy if you could take a look.

Please let me know what you think about the changes.
Thanks a lot and all the best!

Edit PS: Oh and I also did not touch the docs yet as I was too unsure about how everything works. I never edited RTD myself before, but would happily do so if needed. I would appreciate your guidance.

Partly in reference to this GitHub Issue: pydantic/pydantic#8833

Using TypeAdapter instead of the raw BaseModel to allow validating more complex types (e.g. discriminated union)
@wesselhuising
Copy link
Copy Markdown
Owner

wesselhuising commented Aug 29, 2025

Thank for your PR! I will try to find some time upcoming days to look at it. Also looping in @xaviernogueira here.

@xaviernogueira xaviernogueira added the enhancement New feature or request label Aug 29, 2025
Copy link
Copy Markdown
Collaborator

@xaviernogueira xaviernogueira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice @benbuc ! I like this feature especially as it supports a standard pydantic behavior we were previously missing.

I added some comments/requests. Mainly I think three things:

  • _validate_schema_type should be a decorator since it is used in the first line of the three pandas extension functions.
  • We should handle specific exceptions, not except Exception.
  • I don't like moving to the Any type hint, as the function can't really take any input. If I set schema=1 it would fail despite playing along with the type hints. Validation within the function body does not mean you should loosen the type hint to include types that will cause failure IMO.

But other than that this all looks great thank you for contributing and @wesselhuising let me know if I forgot anything.

"""Raise TypeError if schema is not a valid pydantic model, type, or Union."""
try:
TypeAdapter(schema)
except Exception as e:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to handle the specific exception type you would expect for TypeAdapter(schema) failing in a normal way.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up: #43 (comment) the try/except logic can be removed in favor for a simple type check for PandasValidator inputs.

def validate(
self,
schema: BaseModel,
schema: Any,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you went with Any as opposed to SchemeTypes here. My hunch is you did this because you use self._validate_schema_type(), but handling incorrect input types in the function body does not negate having the actual type hint you need.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your response and sorry for taking the discussion out of the PR comments. I thought of it being more a fundamental question, but I will answer now inside the comments. I hope its not too confusing.

Do I understand correctly that what you are proposing basically means that I should revert the typing changes to their original state while keeping the internal changes? This way the default use case of pandantic - validating against BaseModel types - stays fully intact and type checked. Under the hood though, through the use of the TypeAdapter, more complex validation will be possible.

For my use case, this would mean that if I want to leverage these more complex schemas, I would need to either use # type: ignore or cast, because there is no way to express all possible valid Pydantic schemas explicitly in static typing (hence the reliance on Any at runtime).

Thank you already 😃

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw on vacation right now will respond next week!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @xaviernogueira I hope you had a great vacation. I kinda forgot about the PR as well. Do you have any news or novel ideas though concerning the remaining points?

Copy link
Copy Markdown
Collaborator

@xaviernogueira xaviernogueira Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbuc its all good, did slip thru.

For a data science oriented library, I stand by my view that :

  • the function signature should be informative as possible such that someone using JupyterLab with lower software engineering skills (analyst, scientist, etc) can leverage the LSP to understand things reasonably well without docs.
  • If it is included in the typehint, the function should work. Any is far to broad for this heuristic to work.

That said, I do see your concern, and I don't want people to have to use # type: ignore as it could lead to missing other errors on the same line. At first I thought maybe BaseModel | Any is a good compromise (maintains some LSP type hinting without forcing ignores).

However I think the better strategy is:

SchemaType = typing.TypeVar("SchemaType", bound=BaseModel)

# make the validation func return the TypeAdaptor (perhaps rename to "get_adapter"
def _validate_schema_type(self, schema: Any) -> TypeAdapter[SchemaType]:
    try:
        return TypeAdapter(schema)
   except Exception:
       ...

# use the func to get the adaptor in other cases
valid_schema: ValidSchema = self._validate_schema_type(schema: Any)
schema_validator = CoreValidator(schema)

# where CoreValidator __init__ takes TypeAdapter[SchemaType]

Haven't done all the linting here, but I do believe this should work fine, you may need to change things slightly, but I do think a TypeAdapter + TypeVar is the exact thing you are looking for.

if not isinstance(schema, type(BaseModel)):
raise TypeError("Arg `schema` must be a pydantic.BaseModel subclass!")

self._validate_schema_type(schema)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are using _validate_schema_type() in all these functions, consider using a decorator pattern. Would make things more clear and harder to mess up accidentally.

errors = "log"
else:
errors = "skip"
errors = "log" if verbose else "skip"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread src/pandantic/plugins/pandas.py Outdated
for row in self.obj.itertuples(name=None):
try:
_ = schema(**dict(zip(self.obj.columns, row[1:]))) # type: ignore
_ = adapter.validate_python(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are not using the variable for anything, maybe we just drop the _. Out of scope for this PR tho so don't worry about it if you don't want to.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally would prefer to see it removed this PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me since he's touching these lines anyways.

@wesselhuising
Copy link
Copy Markdown
Owner

Looked into the comments and nothing really to add. Wondering about the performance difference in time to validate when including the TypeAdapter. The only thing; we will never know as we don't have any benchmark tests in place (at the time of writing). I do like the added functionality, I think this PR is very welcome.

@wesselhuising
Copy link
Copy Markdown
Owner

One thing; it does conflict with the last PR: https://github.com/wesselhuising/pandantic/pull/41/files. Not sure what the implication is here but the change was requested due to linting errors in pyright (#40).

@benbuc
Copy link
Copy Markdown
Author

benbuc commented Sep 3, 2025

Thanks everyone for your responses!
I looked into your comments and am in the process of incorporating the requested changes.

I am very unsure about the type hints and validation decorator though. I am wondering if we need the decorator at all if we are validating using TypeAdapter.

So the official docs for TypeAdapter list the Any type.
Inspecting the pydantic source code for the TypeAdapter, I found the following constructors (I removed other parameters for brevity):

T = TypeVar('T')

@final
class TypeAdapter(Generic[T]):
    @overload
    def __init__(self, type: type[T]) -> None: ...

    # This second overload is for unsupported special forms (such as Annotated, Union, etc.)
    # Currently there is no way to type this correctly
    # See https://github.com/python/typing/pull/1618
    @overload
    def __init__(self, type: Any) -> None: ...

    def __init__(self, type: Any) -> None:
        <actual implementation>

I found the comment especially interesting.
I removed the SchemaTypes because I felt like having to explicitly list all possible types that are able to be validated did not feel right and tried to imitate the pattern that TypeAdapter itself used. Therefore I stuck with Any (from the docs).

Now concerning the decorator and the Exception. As validate_schema_type only instantiates a TypeAdapter and checks for errors, maybe the decorator should be removed completely? It will get instantiated eventually in the constructor of the PandasValidator. Maybe potential exceptions should be handled by the user of the plug-in? This way the specific exceptions will pass through.

I would be glad to hear your opinions.

@xaviernogueira
Copy link
Copy Markdown
Collaborator

xaviernogueira commented Sep 9, 2025

@benbuc side note but next time let's keep discussion attached to the PR comments so things can be dealt with / resolved one by one, but no biggie thanks for the detailed response! Point by point:

On the Any typehint -> I see your point and think it applies for the program internals/base classes likeCoreValidator, which does not need to be opinionated about what is passed in as long as incorrect types are handled by the actual validator implementations. But for PandasValidator and PydanticAccessor I would like the type hint changed to back to SchemaTypes for the following reasons:

  • Philosophically/IMO whatTypeAdapter can accept is not 1-to-1 with the higher-level API we are providing, which should typehint the expected inputs the program is designed to handle in a deterministic way.
  • One this line of PandasValidator
    col for col in dataframe.columns if col not in self.schema.model_fields.keys()
    , we access the .model_fields attribute, which would throw an unhandled AttributeError if a non-SchemaTypes value is set to self.schema.
  • Finally, considering this will be used in data science / analyst contexts frequently (notebooks, lower technicality) providing useful typehints for LSPs is a legitimate value add to what imagine many users look like.
  • TLDR: I see your point but would like you to change it back for user-facing interfaces where unhandled exceptions will arise by passing in a non SchemaTypes arg.

On the decorator -> I agree that in the context you describe removing it entirely makes sense. After reassessing, but considering we want to avoid unhandled errors that will be opaque to end-users (like a class attribute derived AttributeError), I do think we should raise a handled exception if an incorrect input is passed in. IMO i like keeping the type checking logic at the "gate" of the component where the unhandle-able exception could be thrown, which in this case would be PandasValidator.__init__ with some isinstance(schema, SchemaTypes) type logic.

  • TLDR: do you mind just doing a SchemaTypes isinstance type check in PandasValidator.__init__ (I can do this too if you would like).

"""An implementation of the Pydantic BaseValidator."""

def __init__(self, schema: SchemaTypes):
def __init__(self, schema: Any):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up #43 (comment) this change makes sense here for the library internal, pushing input specificity down to the actual implementation classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants