Add understanding TypeVars page

delfick · delfick · commit a1c7bfb6e2f1 · 2025-08-01T14:43:40.000+10:00
diff --git a/docs/advice/advice/understanding_typevars.rst b/docs/advice/advice/understanding_typevars.rst
@@ -0,0 +1,354 @@
+.. _understanding_typevars:
+
+Understanding TypeVars
+======================
+
+Static type checking is by definition a static system of checks, which means
+anything that is determined at runtime is irrelevant to the type checker!
+However, it can be necessary for the types discovered by the type checker to
+depend on how a piece of code is called.
+
+For example, it is common to define code that supports many different types
+but only operates on one of those types at a time. This is the situation where
+a TypeVar becomes very useful.
+
+Essentially, a TypeVar is an annotation where the concrete type it represents
+is determined by the caller rather than by the implementation.
+
+For example:
+
+.. code-block:: python
+
+    import attrs
+
+    @attrs.frozen
+    class Container[T_Item]:
+        item: T_Item
+
+        def get_item_twice(self) -> tuple[T_Item, T_Item]:
+            return (self.item, self.item)
+
+    # Revealed type is "tuple[builtins.str, builtins.str]"
+    reveal_type(Container("asdf").get_item_twice())
+
+    # Revealed type is "tuple[builtins.int, builtins.int]"
+    reveal_type(Container(1).get_item_twice())
+
+    # Revealed type is "tuple[builtins.bool, builtins.bool]"
+    reveal_type(Container(True).get_item_twice())
+
+Think of it like simple algebra where we need to "solve for X".
+
+.. note::
+
+    ❗ A TypeVar needs to be **bound**, which means that any time a TypeVar
+    appears in an output, the caller needs a chance to provide the specific
+    type that the TypeVar represents in that context.
+
+    This happens either as part of the definition of the enclosing class:
+
+    .. code-block:: python
+
+        class MyProtocol[T_Item](Protocol):
+            @property
+            def item(self) -> T_Item: ...
+            
+            
+        @attrs.frozen
+        class MyBaseClass[T_Item]:
+            item: T_Item
+
+    Or as an input to the function returning the TypeVar:
+
+    .. code-block:: python
+
+        class MyProtocol(Protocol):
+            def process[T_Item](self, item: T_Item) -> T_Item: ...
+            
+            
+        def process[T_Item](item: T_Item) -> T_Item:
+            raise NotImplementedError()
+
+        def extract[T_Item](container: Container[T_Item]) -> T_Item:
+            raise NotImplementedError()
+
+TypeVars also have a concept called "variance". This is relevant when you have
+a bound type variable (like ``T_Vehicle``) and some specific implementations
+of it (like ``Car`` or ``Bicycle``), which, alongside the basic features
+guaranteed by ``Vehicle``, also have some "extra" attributes and methods;
+an extra "API surface", if you will.
+
+There are three kinds of variance:
+
+- **covariant** - extra API surface is kept
+- **contravariant** - extra API surface is forgotten
+- **invariant** - neither covariant nor contravariant
+
+When you aren't using `PEP 695 <https://peps.python.org/pep-0695/>`_
+(inline square brackets) syntax or ``infer_variance=True`` when creating the
+TypeVar, mypy will enforce the variance of the TypeVar based on:
+
+- Inputs are always contravariant
+- Outputs are always covariant
+
+This means:
+
+- A type var that's only ever an output must be defined with ``covariant=True``
+- A type var that's only ever an input must be defined with ``contravariant=True``
+- A type var that appears as an input and as an output cannot be covariant
+  or contravariant, so it must be invariant.
+
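The three rules above can be sketched with explicitly declared TypeVars. This is a minimal illustration rather than code from this project: ``Producer``, ``Consumer`` and ``Box`` are hypothetical names, and the sketch assumes the pre-PEP 695 style where variance must be spelled out by hand.

```python
from typing import Generic, Protocol, TypeVar

# Hypothetical names, chosen only for this illustration
T_CO_Item = TypeVar("T_CO_Item", covariant=True)
T_COT_Item = TypeVar("T_COT_Item", contravariant=True)
T_Item = TypeVar("T_Item")


class Producer(Protocol[T_CO_Item]):
    # The type var only ever appears as an output, so it is covariant
    def produce(self) -> T_CO_Item: ...


class Consumer(Protocol[T_COT_Item]):
    # The type var only ever appears as an input, so it is contravariant
    def consume(self, item: T_COT_Item) -> None: ...


class Box(Generic[T_Item]):
    # The type var appears as both an input and an output, so it is invariant
    def __init__(self, item: T_Item) -> None:
        self._item = item

    def get(self) -> T_Item:
        return self._item

    def set(self, item: T_Item) -> None:
        self._item = item


box = Box(1)
box.set(2)
```

Swapping the variance flags around (for example using ``T_CO_Item`` as a method parameter) is the kind of mistake mypy is expected to report.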
+.. note::
+
+  🤔 Note that a TypeVar is only necessary when the API surface of an object
+  changes based on the implementation. If a different implementation doesn't
+  change what attributes or methods are available, then it can be represented
+  as a Protocol.
+
+Giving an upper bound to a ``TypeVar``
+--------------------------------------
+
+By default, a value typed as a TypeVar statically has no methods or attributes
+on it (beyond those of ``object``) at the point where the type checker sees it
+as a TypeVar.
+
+.. code-block:: python
+
+    def pass_through[T_Item](item: T_Item) -> T_Item:
+        # at this point item statically has no attributes or methods on it
+        return item
+        
+    my_variable: int = 1
+    # my_variable has all the attributes/methods that an int has
+
+    after_pass_through = pass_through(my_variable)
+    # after_pass_through is typed as the type used for the input, so it also
+    # statically has all the attributes/methods that an int has
+
+It's possible to give the TypeVar an upper bound, so that code working with the
+TypeVar is statically guaranteed that specific attributes and methods are available:
+
+.. code-block:: python
+
+    from typing import Protocol
+
+    import attrs
+
+    class HasProcess(Protocol):
+        def process(self) -> None: ...
+        
+    def pass_through[T_Item: HasProcess](item: T_Item) -> T_Item:
+        # Our type var is bound to ``HasProcess``
+        # This means whilst the object passed in is allowed to have any number
+        # of additional attributes and methods, we are guaranteed that it at least
+        # has a "process" method that takes in no arguments and returns None
+        item.process()
+        return item
+        
+    # Error, int has no "process" on it!
+    pass_through(1)
+
+    @attrs.define
+    class MyProcessor:
+        number: int
+        
+        def process(self) -> None:
+            print(self.number)
+            
+    # Valid because an instance of MyProcessor has a method on it
+    # called "process" that can be called with no parameters and returns None
+    pass_through(MyProcessor(10))
+
+There is also the ability to constrain the TypeVar to exact types rather than
+subtypes of the upper bound (as long as you have at least two constraints):
+
+.. code-block:: python
+
+    from typing import TypeVar
+
+    T_Item = TypeVar("T_Item", str, int)
+
+    # Or with PEP 695
+    def item_twice[T_Item: (str, int)](item: T_Item) -> T_Item:
+        return item + item
+
+The reason to do this rather than just saying the function takes and returns
+``str | int`` is that this way we know that if we pass in e.g. a ``str`` we
+definitely get a ``str`` back.
+
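As a small sketch of that guarantee, the constrained TypeVar can be exercised with both constraints. The example uses the non-PEP 695 spelling so that it also runs on Python versions before 3.12; ``doubled_str`` and ``doubled_int`` are names introduced only for this illustration.

```python
from typing import TypeVar

T_Item = TypeVar("T_Item", str, int)


def item_twice(item: T_Item) -> T_Item:
    # "+" is valid for both constraints: str concatenation and int addition
    return item + item


# A type checker is expected to infer "str" here, not "str | int"
doubled_str = item_twice("ab")

# And "int" here
doubled_int = item_twice(2)
```

With a plain ``str | int`` annotation both results would be typed as ``str | int`` and the caller would need to narrow them again.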
+Prefer to not use contravariant type vars
+-----------------------------------------
+
+As a general rule, contravariant type vars impose restrictions when extending
+classes, and they can be avoided by replacing:
+
+.. code-block:: python
+
+    from typing import Protocol
+
+    class Thing[T_Item: Item](Protocol):
+        def do_something(self, item: T_Item) -> None: ...
+
+With:
+
+.. code-block:: python
+
+    from typing import Protocol
+
+    class Thing[T_Item: Item](Protocol):
+        @property
+        def item(self) -> T_Item: ...
+
+        def do_something(self) -> None: ...
+
+Such that the changeable API surface being acted on is separate from the
+signature of the function doing the action.
+
+As mentioned in :ref:`understanding_annotations`, a contravariant type var is
+used to represent a value where extra API surface is always dropped:
+
+.. code-block:: python
+
+    import dataclasses
+    from typing import TYPE_CHECKING, Protocol, TypeVar, cast
+
+    T_COT_Item = TypeVar("T_COT_Item", contravariant=True)
+
+
+    class Recorder(Protocol[T_COT_Item]):
+        def record(self, item: T_COT_Item) -> None: ...
+
+
+    @dataclasses.dataclass(frozen=True)
+    class ItemA:
+        a: int
+
+
+    @dataclasses.dataclass(frozen=True)
+    class ItemB(ItemA):
+        b: int
+
+
+    class RecorderA:
+        def record(self, item: ItemA) -> None:
+            print(item.a)
+
+
+    class RecorderB:
+        def record(self, item: ItemB) -> None:
+            print(item.a, item.b)
+
+
+    def record_things(recorder: Recorder[ItemA]) -> None:
+        recorder.record(ItemA(a=1))
+
+
+    # This fails because Recorder[ItemB] cannot be used where Recorder[ItemA] is required
+    record_things(RecorderB())
+
+    if TYPE_CHECKING:
+        _RA: Recorder[ItemA] = cast(RecorderA, None)
+        _RB: Recorder[ItemB] = cast(RecorderB, None)
+
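The opposite direction is accepted, which is what "extra API surface is dropped" means in practice. The following is a sketch under the same definitions as above, with one assumption added for demonstration: ``RecorderA`` stores what it sees in a hypothetical ``seen`` list instead of printing, and ``record_b_things`` is a hypothetical function that requires a ``Recorder[ItemB]``.

```python
import dataclasses
from typing import Protocol, TypeVar

T_COT_Item = TypeVar("T_COT_Item", contravariant=True)


class Recorder(Protocol[T_COT_Item]):
    def record(self, item: T_COT_Item) -> None: ...


@dataclasses.dataclass(frozen=True)
class ItemA:
    a: int


@dataclasses.dataclass(frozen=True)
class ItemB(ItemA):
    b: int


class RecorderA:
    def __init__(self) -> None:
        # Hypothetical attribute so the effect of record() can be observed
        self.seen: list[int] = []

    def record(self, item: ItemA) -> None:
        # Only knows about the ItemA surface; the extra "b" is ignored
        self.seen.append(item.a)


def record_b_things(recorder: Recorder[ItemB]) -> None:
    recorder.record(ItemB(a=1, b=5))


# Accepted: something that can record any ItemA can also record an ItemB
recorder_a = RecorderA()
record_b_things(recorder_a)
```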
+An alternative pattern is to create an intermediary object that is specific to
+what is being operated on. This is a bit of a subtle distinction in this example,
+but it would look like this:
+
+.. code-block:: python
+
+    import dataclasses
+    from typing import TYPE_CHECKING, Protocol, cast
+
+
+    class Recorder(Protocol):
+        def record(self) -> None: ...
+
+
+    @dataclasses.dataclass(frozen=True)
+    class ItemA:
+        a: int
+
+
+    @dataclasses.dataclass(frozen=True)
+    class ItemARecorder:
+        item: ItemA
+
+        def record(self) -> None:
+            print(self.item.a)
+
+
+    @dataclasses.dataclass(frozen=True)
+    class ItemB(ItemA):
+        b: int
+
+
+    @dataclasses.dataclass(frozen=True)
+    class ItemBRecorder:
+        item: ItemB
+
+        def record(self) -> None:
+            print(self.item.a, self.item.b)
+
+
+    def record_things(recorder: Recorder) -> None:
+        recorder.record()
+
+
+    record_things(ItemARecorder(item=ItemA(a=1)))
+    record_things(ItemBRecorder(item=ItemB(a=1, b=5)))
+
+    if TYPE_CHECKING:
+        _RA: Recorder = cast(ItemARecorder, None)
+        _RB: Recorder = cast(ItemBRecorder, None)
+
+If this is the full extent of the requirements, then a ``record`` method
+directly on the items themselves is likely enough. However, it's easy to
+imagine a scenario with a 1:n relationship between an item and its "recording"
+functionality. This pattern separates the act of calling ``record`` from what
+recording actually means, so that the caller controls that meaning rather than
+the orchestrator.
+
+.. note::
+
+    Note that in both these situations, we are able to represent the two sides
+    of the design coin such that the implementation is generic and the usage is
+    not:
+
+    .. code-block:: python
+
+        import dataclasses
+        from typing import TYPE_CHECKING, Protocol, TypeVar, cast
+
+        T_CO_Item = TypeVar("T_CO_Item", covariant=True)
+
+
+        class ForImplementation(Protocol[T_CO_Item]):
+            @property
+            def item(self) -> T_CO_Item: ...
+
+            def do_something(self) -> None: ...
+
+
+        class ForUse(Protocol):
+            def do_something(self) -> None: ...
+
+
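+        # Illustrative stand-in so that the example is self-contained
+        class MyItem:
+            def take_over_the_world(self) -> None: ...
+
+
+        @dataclasses.dataclass(frozen=True)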
+        class Implementation:
+            item: MyItem
+
+            def do_something(self) -> None:
+                self.item.take_over_the_world()
+
+
+        if TYPE_CHECKING:
+            _FI: ForImplementation[MyItem] = cast(Implementation, None)
+            _FU: ForUse = cast(Implementation, None)
+
+Sharing TypeVars
+----------------
+
+There are many cases where it's not necessary to share TypeVars, but there are
+two scenarios where it can be useful:
+
+- If the TypeVar is bound to a particular type or has a default
+- When the TypeVar is used to create a class that is intended to be subclassed
+
+In both of these scenarios, using a common definition of the TypeVar can reduce
+problems around drift, because shared uses update together when the shape of
+the TypeVar changes.
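
A minimal sketch of such sharing, assuming a hypothetical ``Item`` protocol and hypothetical ``Store`` and ``NamedStore`` classes. Here everything lives in one file, but the ``T_Item`` definition is imagined as living in a common module that both classes import.

```python
import dataclasses
from typing import Generic, Protocol, TypeVar


class Item(Protocol):
    @property
    def name(self) -> str: ...


# Defined once; if the bound changes, every use below changes with it
T_Item = TypeVar("T_Item", bound=Item)


class Store(Generic[T_Item]):
    def __init__(self) -> None:
        self.items: list[T_Item] = []

    def add(self, item: T_Item) -> T_Item:
        self.items.append(item)
        return item


class NamedStore(Store[T_Item]):
    # The subclass reuses the same TypeVar, so it can rely on the same bound
    def names(self) -> list[str]:
        return [item.name for item in self.items]


@dataclasses.dataclass(frozen=True)
class Fruit:
    name: str


store: NamedStore[Fruit] = NamedStore()
store.add(Fruit(name="apple"))
```

If ``NamedStore`` declared its own TypeVar with a separately written bound, the two bounds could drift apart without either definition looking wrong on its own.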
diff --git a/docs/advice/index.rst b/docs/advice/index.rst
@@ -5,3 +5,4 @@ Kraken Static Typing Advice
    :maxdepth: 1
 
    advice/understanding_annotations
+   advice/understanding_typevars
