Commit a7feadb

docs: improve readme with additional examples (#20)
1 parent 5760b1d commit a7feadb

6 files changed: +337 -92 lines changed

README.md

Lines changed: 291 additions & 21 deletions
@@ -1,11 +1,9 @@
1-
# Python Pattern Matching and Object Validation
2-
1+
# Performant Python Pattern Matching and Object Validation
32

43
Reusable pattern matching for Python, implemented in Cython.
54
I originally developed this system for the Ibis Project but
65
hopefully it can be useful for others as well.
76

8-
97
The implementation aims to be as quick as possible; the pure
108
python implementation is already quite fast but taking advantage
119
of Cython helps mitigate the overhead of the Python
@@ -14,7 +12,8 @@ I have also tried to use PyO3 but it had higher overhead than
1412
Cython. The current implementation uses the pure python mode
1513
of cython allowing quick iteration and testing, and then it
1614
can be cythonized and compiled to an extension module giving
17-
a significant speedup.
15+
a significant speedup. Benchmarks show more than 2x speedup
16+
over pydantic's model validation which is written in Rust.
1817

1918

2019
## Library components
@@ -48,10 +47,10 @@ natural way.
4847

4948
### 2. Pattern matchers which operate on various Python objects
5049

51-
Patterns are the heart of the library, they allow searching for
52-
specific structures in Python objects. The library provides an
53-
extensible yet simple way to define patterns and match values
54-
against them.
50+
Patterns are the heart of the library, they allow **searching**
51+
and **replacing** specific structures in Python objects. The
52+
library provides an extensible yet simple way to define patterns
53+
and match values against them.
5554

5655
```py
5756
In [1]: from koerce import match, NoMatch, Anything
@@ -66,26 +65,66 @@ Out[4]: {'a': 5}
6665
```
6766

6867
```py
69-
In [1]: from dataclasses import dataclass
70-
71-
In [2]: @dataclass
72-
...: class B:
73-
...: x: int
74-
...: y: int
75-
...: z: float
76-
...:
68+
from dataclasses import dataclass
69+
from koerce import Object, match
7770

78-
In [3]: match(Object(B, y=1, z=2), B(1, 1, 2))
79-
Out[3]: B(x=1, y=1, z=2)
71+
@dataclass
72+
class B:
73+
    x: int
74+
    y: int
75+
    z: float
8076

81-
In [4]: Object(B, y=1, z=2)
82-
Out[4]: ObjectOf2(<class '__main__.B'>, 'y'=EqValue(1), 'z'=EqValue(2))
77+
match(Object(B, y=1, z=2), B(1, 1, 2))
78+
# B(x=1, y=1, z=2)
8379
```
8480

8581
where the `Object` pattern checks whether the passed object is
8682
an instance of `B` and `value.y == 1` and `value.z == 2` ignoring
8783
the `x` field.
8884

85+
Patterns are also able to capture values as variables making the
86+
matching process more flexible:
87+
88+
```py
89+
from koerce import var
90+
91+
x = var("x")
92+
93+
# `+x` means to capture that object argument as variable `x`
94+
# then the `z` argument must match that captured value
95+
match(Object(B, +x, z=x), B(1, 2, 1))
96+
# it is a match because x and z are equal: B(x=1, y=2, z=1)
97+
98+
match(Object(B, +x, z=x), B(1, 2, 0))
99+
# it is a NoMatch because x and z are unequal
100+
```
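
To make the capture idea more concrete, here is a minimal standalone sketch of how a unary `+` can mark a variable for capture. It is not koerce's implementation: `Var`, `Capture`, `match_fields` and `NO_MATCH` are hypothetical stand-ins, and fields are passed by keyword rather than positionally. The only detail taken from this commit is that `__pos__` returns a capture wrapper.

```python
from dataclasses import dataclass

NO_MATCH = object()  # hypothetical sentinel standing in for a "no match" result


class Var:
    """Hypothetical named variable that can be captured and later referenced."""

    def __init__(self, name):
        self.name = name

    def __pos__(self):
        # Unary plus wraps the variable in a capture marker.
        return Capture(self)


class Capture:
    """Marks a position whose value should be stored under the variable's name."""

    def __init__(self, var):
        self.var = var


def match_fields(obj, **fields):
    """Check attributes of obj in keyword order.

    A Capture stores the attribute value, a bare Var must equal a previously
    stored value, anything else is compared by equality.
    """
    context = {}
    for attr, spec in fields.items():
        value = getattr(obj, attr)
        if isinstance(spec, Capture):
            context[spec.var.name] = value
        elif isinstance(spec, Var):
            if spec.name not in context or context[spec.name] != value:
                return NO_MATCH
        elif spec != value:
            return NO_MATCH
    return obj


@dataclass
class B:
    x: int
    y: int
    z: float


x = Var("x")
hit = match_fields(B(1, 2, 1), x=+x, z=x)   # x captured as 1, z equals 1
miss = match_fields(B(1, 2, 0), x=+x, z=x)  # x captured as 1, z is 0
```

In this sketch `hit` is the original `B(1, 2, 1)` instance and `miss` is the `NO_MATCH` sentinel, mirroring the two cases above.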
101+
102+
Patterns are also suitable for match and replace tasks because they
103+
can produce new values:
104+
105+
```py
106+
# >> operator constructs a `Replace` pattern where the right
107+
# hand side is a deferred object
108+
match(Object(B, +x, z=x) >> (x, x + 1), B(1, 2, 1))
109+
# result: (1, 2)
110+
```
111+
112+
Patterns are also composable and can be freely combined using
113+
overloaded operators:
114+
115+
```py
116+
In [1]: from koerce import match, Is, Eq, NoMatch
117+
118+
In [2]: pattern = Is(int) | Is(str)
119+
...: assert match(pattern, 1) == 1
120+
...: assert match(pattern, "1") == "1"
121+
...: assert match(pattern, 3.14) is NoMatch
122+
123+
In [3]: pattern = Is(int) | Eq(1)
124+
...: assert match(pattern, 1) == 1
125+
...: assert match(pattern, None) is NoMatch
126+
```
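
For readers curious how this kind of composition can work, below is a minimal sketch of operator overloading with `|`. It is an illustration under my own assumptions, not koerce's code: the `Pattern`, `Is`, `Eq` and `AnyOf` classes here are simplified stand-ins that only share names with the real ones, and `NO_MATCH` is a hypothetical sentinel.

```python
NO_MATCH = object()  # hypothetical sentinel standing in for a "no match" result


class Pattern:
    """Base class: `a | b` builds a pattern that accepts either side."""

    def __or__(self, other):
        return AnyOf(self, other)

    def match(self, value):
        raise NotImplementedError


class Is(Pattern):
    """Matches when the value is an instance of the given type."""

    def __init__(self, typ):
        self.typ = typ

    def match(self, value):
        return value if isinstance(value, self.typ) else NO_MATCH


class Eq(Pattern):
    """Matches when the value equals the expected one."""

    def __init__(self, expected):
        self.expected = expected

    def match(self, value):
        return value if value == self.expected else NO_MATCH


class AnyOf(Pattern):
    """Tries each sub-pattern in order and returns the first successful result."""

    def __init__(self, *patterns):
        self.patterns = patterns

    def match(self, value):
        for pattern in self.patterns:
            result = pattern.match(value)
            if result is not NO_MATCH:
                return result
        return NO_MATCH


int_or_str = Is(int) | Is(str)
results = [int_or_str.match(v) for v in (1, "1", 3.14)]
```

Here `results` holds `1`, `"1"` and the `NO_MATCH` sentinel, in that order. Returning a sentinel instead of `None` keeps `None` usable as a legitimate matched value.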
127+
89128
Patterns can also be constructed from python typehints:
90129

91130
```py
@@ -177,6 +216,237 @@ In [5]: {a: 1}
177216
Out[5]: {MyClass(x=1, y=2.0, z=('a', 'b')): 1}
178217
```
179218

219+
## Available Pattern matchers
220+
221+
This is an incomplete list of the matchers; for more details and
222+
examples see `koerce/patterns.py` and `koerce/tests/test_patterns.py`.
223+
224+
### `Anything` and `Nothing`
225+
226+
```py
227+
In [1]: from koerce import match, Anything, Nothing
228+
229+
In [2]: match(Anything(), "a")
230+
Out[2]: 'a'
231+
232+
In [3]: match(Anything(), 1)
233+
Out[3]: 1
234+
235+
In [4]: match(Nothing(), 1)
236+
Out[4]: koerce._internal.NoMatch
237+
```
238+
239+
### `Eq` for equality matching
240+
241+
```py
242+
In [1]: from koerce import Eq, match, var
243+
244+
In [2]: x = var("x")
245+
246+
In [3]: match(Eq(1), 1)
247+
Out[3]: 1
248+
249+
In [4]: match(Eq(1), 2)
250+
Out[4]: koerce._internal.NoMatch
251+
252+
In [5]: match(Eq(x), 2, context={"x": 2})
253+
Out[5]: 2
254+
255+
In [6]: match(Eq(x), 2, context={"x": 3})
256+
Out[6]: koerce._internal.NoMatch
257+
```
258+
259+
### `Is` for instance matching
260+
261+
A couple of simple cases are below:
262+
263+
```py
264+
In [1]: from koerce import match, Is
265+
266+
In [2]: class A: pass
267+
268+
In [3]: match(Is(A), A())
269+
Out[3]: <__main__.A at 0x1061070e0>
270+
271+
In [4]: match(Is(A), "A")
272+
Out[4]: koerce._internal.NoMatch
273+
274+
In [5]: match(Is(int), 1)
275+
Out[5]: 1
276+
277+
In [6]: match(Is(int), 3.14)
278+
Out[6]: koerce._internal.NoMatch
279+
280+
In [7]: from typing import Optional
281+
282+
In [8]: match(Is(Optional[int]), 1)
283+
Out[8]: 1
284+
285+
In [9]: match(Is(Optional[int]), None)
286+
```
287+
288+
Generic types are also supported by checking types of attributes / properties:
289+
290+
```py
291+
from koerce import match, Is, NoMatch
292+
from typing import Generic, TypeVar, Any
293+
from dataclasses import dataclass
294+
295+
296+
T = TypeVar("T", covariant=True)
297+
S = TypeVar("S", covariant=True)
298+
299+
@dataclass
300+
class My(Generic[T, S]):
301+
    a: T
302+
    b: S
303+
    c: str
304+
305+
306+
MyAlias = My[T, str]
307+
308+
b_int = My(1, 2, "3")
309+
b_float = My(1, 2.0, "3")
310+
b_str = My("1", "2", "3")
311+
312+
# b_int.a must be an instance of int
313+
# b_int.b must be an instance of Any
314+
assert match(My[int, Any], b_int) is b_int
315+
316+
# both b_int.a and b_int.b must be an instance of int
317+
assert match(My[int, int], b_int) is b_int
318+
319+
# b_int.b should be an instance of a float but it isn't
320+
assert match(My[int, float], b_int) is NoMatch
321+
322+
# now b_float.b is actually a float so it is a match
323+
assert match(My[int, float], b_float) is b_float
324+
325+
# type aliases are also supported
326+
assert match(MyAlias[str], b_str) is b_str
327+
```
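
As a rough illustration of what "checking types of attributes" can mean, the sketch below resolves the type parameters of a parametrized generic with the standard `typing` helpers and compares them with the instance's attributes. It is a simplified assumption of the approach, not koerce's implementation: `is_generic_instance` is a hypothetical helper and it only handles plain classes and `Any` as type arguments.

```python
from dataclasses import dataclass
from typing import Any, Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")
S = TypeVar("S")


@dataclass
class My(Generic[T, S]):
    a: T
    b: S
    c: str


def is_generic_instance(alias, obj):
    """Return True if obj is an instance of the parametrized generic `alias`."""
    origin = get_origin(alias)  # e.g. My for My[int, float]
    if not isinstance(obj, origin):
        return False
    # Map each type variable of the class to the argument given in the alias.
    bound = dict(zip(origin.__parameters__, get_args(alias)))
    for attr, hint in get_type_hints(origin).items():
        expected = bound.get(hint, hint)
        if expected is Any or isinstance(expected, TypeVar):
            continue  # unconstrained attribute, nothing to check
        if not isinstance(getattr(obj, attr), expected):
            return False
    return True


ok_any = is_generic_instance(My[int, Any], My(1, 2, "3"))
bad_float = is_generic_instance(My[int, float], My(1, 2, "3"))
ok_float = is_generic_instance(My[int, float], My(1, 2.0, "3"))
```

With these inputs `ok_any` and `ok_float` are `True` while `bad_float` is `False`, because `2` is not a `float`.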
328+
329+
### `As` patterns attempting to coerce the value as the given type
330+
331+
```py
332+
from koerce import match, As, NoMatch
333+
from typing import Generic, TypeVar, Any
334+
from dataclasses import dataclass
335+
T = TypeVar("T")  # type variable used by MyNumber below
336+
class MyClass:
337+
    pass
338+
339+
class MyInt(int):
340+
    @classmethod
341+
    def __coerce__(cls, other):
342+
        return MyInt(int(other))
343+
344+
345+
class MyNumber(Generic[T]):
346+
    value: T
347+
348+
    def __init__(self, value):
349+
        self.value = value
350+
351+
    @classmethod
352+
    def __coerce__(cls, other, T):
353+
        return cls(T(other))
354+
355+
356+
assert match(As(int), 1.0) == 1
357+
assert match(As(str), 1.0) == "1.0"
358+
assert match(As(float), 1.0) == 1.0
359+
assert match(As(MyClass), "myclass") is NoMatch
360+
361+
# by implementing the coercible protocol objects can be transparently
362+
# coerced to the given type
363+
assert match(As(MyInt), 3.14) == MyInt(3)
364+
365+
# coercible protocol also supports generic types where the `__coerce__`
366+
# method should be implemented on one of the base classes and the
367+
# type parameters are passed as keyword arguments to `cls.__coerce__()`
368+
assert match(As(MyNumber[float]), 8).value == 8.0
369+
```
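
The coercible protocol can be pictured with a small standalone sketch: prefer a `__coerce__` classmethod when the target type defines one, otherwise fall back to calling the type. This is my own simplified assumption of the idea, not how `As` is actually implemented; `coerce_as` and `NO_MATCH` are hypothetical names and generic type parameters are left out.

```python
NO_MATCH = object()  # hypothetical sentinel standing in for a "no match" result


def coerce_as(typ, value):
    """Try to turn value into an instance of typ.

    Uses `typ.__coerce__(value)` when the hook exists, else `typ(value)`.
    Any TypeError or ValueError raised on the way counts as a failed match.
    """
    hook = getattr(typ, "__coerce__", None)
    try:
        result = hook(value) if hook is not None else typ(value)
    except (TypeError, ValueError):
        return NO_MATCH
    return result if isinstance(result, typ) else NO_MATCH


class MyClass:
    pass  # no constructor arguments and no __coerce__ hook


class MyInt(int):
    @classmethod
    def __coerce__(cls, other):
        return cls(int(other))


as_int = coerce_as(int, 1.0)
as_str = coerce_as(str, 1.0)
as_myint = coerce_as(MyInt, 3.14)
as_myclass = coerce_as(MyClass, "myclass")
```

In this sketch `as_int` is `1`, `as_str` is `"1.0"`, `as_myint` is a `MyInt` equal to `3`, and `as_myclass` is the sentinel because `MyClass` cannot be constructed from a string.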
370+
371+
`As` and `Is` can be omitted because `match()` tries to convert its
372+
first argument to a pattern using the `koerce.pattern()` function:
373+
374+
```py
375+
from koerce import pattern
376+
377+
assert pattern(int, allow_coercion=False) == Is(int)
378+
assert pattern(int, allow_coercion=True) == As(int)
379+
380+
assert match(int, 1, allow_coercion=False) == 1
381+
assert match(int, 1.1, allow_coercion=False) is NoMatch
382+
assert match(int, 1.1, allow_coercion=True) == 1
383+
384+
# default is allow_coercion=False
385+
assert match(int, 1.1) is NoMatch
386+
```
387+
388+
### `If` patterns for conditionals
389+
390+
### `Custom`
391+
392+
### `Capture`
393+
394+
### `Replace`
395+
396+
### `SequenceOf` / `ListOf` / `TupleOf`
397+
398+
### `MappingOf` / `DictOf` / `FrozenDictOf`
399+
400+
### `PatternList`
401+
402+
### `PatternMap`
403+
404+
405+
## Performance
406+
407+
`koerce`'s performance is at least comparable to `pydantic`'s performance.
408+
`pydantic-core` is written in rust using the `PyO3` bindings making it
409+
a pretty performant library. There is a quicker validation / serialization
410+
library from `Jim Crist-Harif` called [msgspec](https://github.com/jcrist/msgspec)
411+
implemented in hand-crafted C directly using python's C API.
412+
413+
`koerce` is not exactly like `pydantic` or `msgspec` but they are good
414+
candidates to benchmark against:
415+
416+
```
417+
koerce/tests/test_y.py::test_pydantic PASSED
418+
koerce/tests/test_y.py::test_msgspec PASSED
419+
koerce/tests/test_y.py::test_annotated PASSED
420+
421+
422+
------------------------------------------------------------------------------- benchmark: 3 tests ------------------------------------------------------------------------------
Name (time in ns)                Min                Max               Mean           StdDev             Median              IQR   Outliers       OPS (Kops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_msgspec          230.2801 (1.0)  6,481.4200 (1.60)     252.1706 (1.0)    97.0572 (1.0)     238.1600 (1.0)     5.0002 (1.0)   485;1616   3,965.5694 (1.0)   20000          50
test_annotated       525.6401 (2.28)   4,038.5600 (1.0)    577.7090 (2.29)  132.9966 (1.37)    553.9799 (2.33)   34.9300 (6.99)    662;671  1,730.9752 (0.44)   20000          50
test_pydantic      1,185.0201 (5.15)  6,027.9400 (1.49)  1,349.1259 (5.35)  320.3790 (3.30)  1,278.5601 (5.37)  75.5100 (15.10)  1071;1424    741.2206 (0.19)   20000          50
428+
```
429+
430+
I tried to use the most performant API of both `msgspec` and `pydantic`
431+
receiving the arguments as a dictionary.
432+
433+
I am planning to make more thorough comparisons, but the model-like
434+
annotation API of `koerce` is roughly twice as fast as `pydantic` but
435+
half as fast as `msgspec`. Considering the implementations it also
436+
makes sense, `PyO3` possibly has a higher overhead than `Cython` has
437+
but neither of those can match the performance of hand crafted python
438+
`C-API` code.
439+
440+
This performance result could be slightly improved but has two huge
441+
advantages over the other two libraries:
442+
1. It is implemented in pure python with cython decorators, so it
443+
can be used even without compiling it. It could also enable
444+
JIT compilers like PyPy or the new copy and patch JIT compiler
445+
coming with CPython 3.13 to optimize hot paths better.
446+
2. Development can be done in pure python, making it much easier to
447+
contribute to. No one needs to learn Rust or python's C API
448+
in order to fix bugs or contribute new features.
449+
180450
## TODO:
181451

182452
The README is under construction, planning to improve it:
@@ -227,7 +497,7 @@ d = Namespace(builder, __name__)
227497
x = var("x")
228498
y = var("y")
229499

230-
assert match(p.A(~x, ~y) >> d.B(x=x, y=1, z=y), A(1, 2)) == B(x=1, y=1, z=2)
500+
assert match(p.A(+x, +y) >> d.B(x=x, y=1, z=y), A(1, 2)) == B(x=1, y=1, z=2)
231501
```
232502

233503
More examples and a comprehensive readme are on the way.

koerce/__init__.py

Lines changed: 5 additions & 2 deletions
@@ -10,9 +10,12 @@ def __init__(self, name: str):
1010
        builder = Var(name)
1111
        super().__init__(builder)
1212

13-
    def __invert__(self):
13+
    def __pos__(self):
1414
        return Capture(self)
1515

16+
    def __neg__(self):
17+
        return self
18+
1619

1720
class _Namespace:
1821
"""Convenience class for creating patterns for various types from a module.
@@ -47,7 +50,7 @@ def var(name):
4750

4851

4952
def match(
50-
    pat: Pattern, value: Any, context: Context = None, allow_coercion: bool = True
53+
    pat: Pattern, value: Any, context: Context = None, allow_coercion: bool = False
5154
) -> Any:
5255
"""Match a value against a pattern.
5356
