-
Notifications
You must be signed in to change notification settings - Fork 94
Refactor ModelBuilder
and RandomBuilder
#971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,12 @@ | ||
from __future__ import annotations | ||
|
||
import datetime | ||
import inspect | ||
import json | ||
import typing as t | ||
from decimal import Decimal | ||
from uuid import UUID | ||
from functools import partial | ||
from types import MappingProxyType | ||
|
||
from piccolo.columns import JSON, JSONB, Array, Column, ForeignKey | ||
from piccolo.custom_types import TableInstance | ||
|
@@ -13,18 +15,8 @@ | |
|
||
|
||
class ModelBuilder: | ||
__DEFAULT_MAPPER: t.Dict[t.Type, t.Callable] = { | ||
bool: RandomBuilder.next_bool, | ||
bytes: RandomBuilder.next_bytes, | ||
datetime.date: RandomBuilder.next_date, | ||
datetime.datetime: RandomBuilder.next_datetime, | ||
float: RandomBuilder.next_float, | ||
int: RandomBuilder.next_int, | ||
str: RandomBuilder.next_str, | ||
datetime.time: RandomBuilder.next_time, | ||
datetime.timedelta: RandomBuilder.next_timedelta, | ||
UUID: RandomBuilder.next_uuid, | ||
} | ||
__DEFAULT_MAPPER: t.Dict[t.Type, t.Callable] = {} | ||
__OTHER_MAPPER: t.Dict[t.Type, t.Callable] = {} | ||
|
||
@classmethod | ||
async def build( | ||
|
@@ -106,7 +98,7 @@ async def _build( | |
persist: bool = True, | ||
) -> TableInstance: | ||
model = table_class(_ignore_missing=True) | ||
defaults = {} if not defaults else defaults | ||
defaults = defaults or {} | ||
|
||
for column, value in defaults.items(): | ||
if isinstance(column, str): | ||
|
@@ -159,29 +151,110 @@ def _randomize_attribute(cls, column: Column) -> t.Any: | |
Column class to randomize. | ||
|
||
""" | ||
random_value: t.Any | ||
if column.value_type == Decimal: | ||
precision, scale = column._meta.params["digits"] or (4, 2) | ||
random_value = RandomBuilder.next_float( | ||
maximum=10 ** (precision - scale), scale=scale | ||
) | ||
elif column.value_type == datetime.datetime: | ||
tz_aware = getattr(column, "tz_aware", False) | ||
random_value = RandomBuilder.next_datetime(tz_aware=tz_aware) | ||
elif column.value_type == list: | ||
length = RandomBuilder.next_int(maximum=10) | ||
base_type = t.cast(Array, column).base_column.value_type | ||
random_value = [ | ||
cls.__DEFAULT_MAPPER[base_type]() for _ in range(length) | ||
] | ||
elif column._meta.choices: | ||
reg = cls.get_registry(column) | ||
if column._meta.choices: | ||
random_value = RandomBuilder.next_enum(column._meta.choices) | ||
else: | ||
random_value = cls.__DEFAULT_MAPPER[column.value_type]() | ||
random_value = reg[column.value_type]() | ||
|
||
if "length" in column._meta.params and isinstance(random_value, str): | ||
return random_value[: column._meta.params["length"]] | ||
elif isinstance(column, (JSON, JSONB)): | ||
if isinstance(column, (JSON, JSONB)): | ||
return json.dumps({"value": random_value}) | ||
|
||
return random_value | ||
|
||
@classmethod | ||
def get_registry( | ||
cls, column: Column | ||
) -> MappingProxyType[t.Type, t.Callable]: | ||
""" | ||
This serves as the public API allowing users to **view** | ||
the complete registry for the specified column. | ||
|
||
:param column: | ||
Column class to randomize. | ||
|
||
""" | ||
default_mapper = cls.__DEFAULT_MAPPER | ||
if not default_mapper: # execute once only | ||
for typ, callable_ in RandomBuilder.get_mapper().items(): | ||
default_mapper[typ] = callable_ | ||
dantownsend marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# order matters | ||
reg = { | ||
**default_mapper, | ||
**cls._get_local_mapper(column), | ||
**cls._get_other_mapper(column), | ||
} | ||
|
||
if column.value_type == list: | ||
reg[list] = partial( | ||
RandomBuilder.next_list, | ||
reg[t.cast(Array, column).base_column.value_type], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't want to make things too insane, but multidimensional arrays are possible: Array(Array(Integer()) I wonder what would happen here in that situation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for bringing this situation to my attention. I was curious about the behavior in our current codebase, so I conducted a quick test. It seems that the current implementation will throw a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This might be a bit off-topic, but I wanted to try out this behavior in the
In [1]: from piccolo.table import Table
In [2]: from piccolo.columns import Array, BigInt
In [3]: class MyTable(Table):
...: my_column = Array(Array(BigInt()))
...:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[3], line 1
----> 1 class MyTable(Table):
2 my_column = Array(Array(BigInt()))
Cell In[3], line 2, in MyTable()
1 class MyTable(Table):
----> 2 my_column = Array(Array(BigInt()))
NameError: name 'Array' is not defined There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jrycw This happens because the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sinisaos Thank you very much for identifying the source of the issue and providing the solution. It's working now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're right that the current implementation of |
||
) | ||
return MappingProxyType(reg) | ||
|
||
@classmethod | ||
def _get_local_mapper(cls, column: Column) -> t.Dict[t.Type, t.Callable]: | ||
""" | ||
This classmethod encapsulates the desired logic, utilizing information | ||
from the column. | ||
|
||
:param column: | ||
Column class to randomize. | ||
""" | ||
local_mapper: t.Dict[t.Type, t.Callable] = {} | ||
|
||
precision, scale = column._meta.params.get("digits") or (4, 2) | ||
local_mapper[Decimal] = partial( | ||
RandomBuilder.next_decimal, precision, scale | ||
) | ||
Comment on lines
+206
to
+209
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If if precision_scale := column._meta.params.get("digits"):
local_mapper[Decimal] = partial(
RandomBuilder.next_decimal, *precision_scale
) |
||
|
||
tz_aware = getattr(column, "tz_aware", False) | ||
local_mapper[datetime.datetime] = partial( | ||
RandomBuilder.next_datetime, tz_aware | ||
) | ||
|
||
if _length := column._meta.params.get("length"): | ||
local_mapper[str] = partial(RandomBuilder.next_str, _length) | ||
|
||
return local_mapper | ||
|
||
@classmethod | ||
def _get_other_mapper(cls, column: Column) -> t.Dict[t.Type, t.Callable]: | ||
""" | ||
This is a hook that allows users to register their own random type | ||
callable. If the callable has a parameter named `column`, we assist | ||
by injecting `column` using `partial`. | ||
|
||
:param column: | ||
Column class to randomize. | ||
|
||
Examples:: | ||
|
||
# a callable not utilizing column information | ||
ModelBuilder.register_random_type(str, lambda: "piccolo") | ||
|
||
# a callable utilizing the column information | ||
def next_str(column: Column) -> str: | ||
length = column._meta.params.get("length", 5) | ||
return "".join("a" for _ in range(length)) | ||
) | ||
ModelBuilder.register_random_type(str, next_str) | ||
|
||
""" | ||
other_mapper: t.Dict[t.Type, t.Callable] = {} | ||
for typ, callable_ in cls.__OTHER_MAPPER.items(): | ||
sig = inspect.signature(callable_) | ||
if sig.parameters.get("column"): | ||
other_mapper[typ] = partial(callable_, column) | ||
else: | ||
other_mapper[typ] = callable_ | ||
return other_mapper | ||
|
||
@classmethod | ||
def register_type(cls, typ: t.Type, callable_: t.Callable) -> None: | ||
cls.__OTHER_MAPPER[typ] = callable_ | ||
|
||
@classmethod | ||
def unregister_type(cls, typ: t.Type) -> None: | ||
Comment on lines
+253
to
+258
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Being able to register custom types is cool, but I wonder about the main use cases. You can specify defaults at the moment: await ModeBuilder.build(MyTable, defaults={MyTable.some_column: "foo"}) I'm not against being able to override how types are handled, but we just need to articulate to users when it's appropriate vs using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for bringing up this question, it prompted me to reflect on the code and its implications. The main difference between using
A draft test for situations (1) and (2) might look like this: class TableWithDecimal(Table):
numeric = Numeric()
numeric_with_digits = Numeric(digits=(4, 2))
decimal = Decimal()
decimal_with_digits = Decimal(digits=(4, 2))
class TestModelBuilder(unittest.TestCase):
...
def test_registry_overwritten1(self):
table = ModelBuilder.build_sync(TableWithDecimal)
for key, value in table.to_dict().items():
if key != "id":
self.assertIsInstance(value, decimal.Decimal)
def fake_next_decimal(column: Column) -> float:
"""will return `float` instead of `decimal.Decimal`"""
precision, scale = column._meta.params["digits"] or (4, 2)
return RandomBuilder.next_float(
maximum=10 ** (precision - scale), scale=scale
)
ModelBuilder.register_type(decimal.Decimal, fake_next_decimal)
overwritten_table = ModelBuilder.build_sync(TableWithDecimal)
for key, value in overwritten_table.to_dict().items():
if key != "id":
self.assertIsInstance(value, float) A draft test for situations (3) might look like this: class TestModelBuilder(unittest.TestCase):
...
def test_registry_overwritten2(self):
choices = "一二三" # Chinese characters
def next_str(length: int = 3) -> str:
# Chinese names often consist of three Chinese characters
return "".join(random.choice(choices) for _ in range(length))
ModelBuilder.register_type(str, next_str)
manager1 = ModelBuilder.build_sync(Manager)
self.assertTrue(all(char_ in choices for char_ in manager1.name))
poster1 = ModelBuilder.build_sync(Poster)
self.assertTrue(all(char_ in choices for char_ in poster1.content))
ModelBuilder.unregister_type(str)
manager2 = ModelBuilder.build_sync(Manager)
self.assertTrue(all(char_ not in choices for char_ in manager2.name))
poster2 = ModelBuilder.build_sync(Poster)
self.assertTrue(all(char_ not in choices for char_ in poster2.content)) The scenario is as follows: Finally, I realized I had overlooked the magic behavior of Python's name mangling rules. For instance: >>> class ModelBuilder:
... __OTHER_MAPPER = {}
...
>>> ModelBuilder.__OTHER_MAPPER
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'ModelBuilder' has no attribute '__OTHER_MAPPER'
>>> ModelBuilder._ModelBuilder__OTHER_MAPPER
{} As a result, the previous test code might be a bit off. I need to use the following code for the setup and teardown phases for each test: def setUp(self) -> None:
ModelBuilder._ModelBuilder__OTHER_MAPPER.clear() # type: ignore
def tearDown(self) -> None:
ModelBuilder._ModelBuilder__OTHER_MAPPER.clear() # type: ignore There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for explaining the rationale behind it - it makes sense. If I was to completely redesign await ModelBuilder.build(MyTable) I would have: await ModelBuilder(some_option=True).build(MyTable) So we can configure ModelBuilder's behaviour easier. For registering types we could have: custom_builder = ModelBuilder(types={...})
await custom_builder.build(MyTable) We could allow the types to be passed in via the await ModelBuilder.build(MyTable, types={...}) If What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for sharing your thoughts with me. Personally, I am inclined towards the instance method approach. However, implementing this change might break the current interface. I propose a three-stage transition plan:
During the first two stages, we'll keep the The concept of import asyncio
import inspect
import typing as t
from concurrent.futures import ThreadPoolExecutor
def run_sync(coroutine: t.Coroutine):
try:
# We try this first, as in most situations this will work.
return asyncio.run(coroutine)
except RuntimeError:
# An event loop already exists.
with ThreadPoolExecutor(max_workers=1) as executor:
future = executor.submit(asyncio.run, coroutine)
return future.result()
class dichotomy:
def __init__(self, f):
self.f = f
def __get__(self, instance, owner):
cls_or_inst = instance if instance is not None else owner
if inspect.iscoroutine(self.f):
async def newfunc(*args, **kwargs):
return await self.f(cls_or_inst, *args, **kwargs)
else:
def newfunc(*args, **kwargs):
return self.f(cls_or_inst, *args, **kwargs)
return newfunc
class ModelBuilder:
def __init__(self, *args, **kwargs):
self._types = "..." # Some information for instance method
@dichotomy
async def build(self_or_cls, *args, **kwargs):
if inspect.isclass(self_or_cls):
print("called as a class method from build")
cls = self_or_cls
await cls._build()
else:
print("called as an instance method from build")
self = self_or_cls
await self._build()
@dichotomy
def build_sync(self_or_cls, *args, **kwargs):
return run_sync(self_or_cls.build())
@dichotomy
async def _build(self_or_cls, *args, **kwargs):
if inspect.isclass(self_or_cls):
print("called as a class method from _build", end="\n"*2)
cls = self_or_cls # noqa: F841
# Current implementation remains here.
else:
print("called as an instance method from _build")
self = self_or_cls
# Some information can be retrieved.
print(f'{self._types=}', end="\n"*2)
# Our new logics
async def main():
print('Async ModelBuilder.build: ')
await ModelBuilder.build()
print('Async ModelBuilder().build: ')
await ModelBuilder().build()
print('Sync ModelBuilder.build: ')
ModelBuilder.build_sync()
print('Sync ModelBuilder().build: ')
ModelBuilder().build_sync()
if __name__ == '__main__':
asyncio.run(main())
Finally, I agree that making These are just rough ideas that came to mind. I'm open to further discussions and refinements. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using descriptors is an interesting idea. I've used them sparingly before - as you say, they're very powerful, but can be confusing. There's a lot of really good ideas in this PR, and I don't want to bog things down. I wonder if we could add this in a subsequent PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Certainly! Here are some options to consider for closing this PR:
I'm open to any of these choices. @dantownsend , what are your thoughts on this matter? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dantownsend no worries at all. It's great to see the project progressing on various fronts. I'll make an effort to review it and share any opinions or feedback I may have. |
||
if typ in cls.__OTHER_MAPPER: | ||
del cls.__OTHER_MAPPER[typ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably go into the
else
block, as we don't use it if the column has choices.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid observation. It seems that the presence of
reg
is a result of the requirements in the previous version, where it was needed for multipleelif-else
blocks.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is a good idea.