Skip to content

Various small improvements to recursive normalizers #2378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

poodlewars
Copy link
Collaborator

@poodlewars poodlewars commented May 30, 2025

The first commit 630de4 just shows how to fix an existing TODO in the code to make the Flattener non-recursive. It turns out that msgpack.packb which we call straight afterwards is itself recursive as it serializes nested objects, so this change doesn't add any value and I just revert it in 2b8db. The git history is like this so that the reverted commit is still available in the history for people.

b287372 introduces a clearer error message when we hit that msgpack recursion limit.

c872353 just fixes a comment that misunderstood what pickle_on_failure controls.

Recursive normalizers rely on serializing data using a separator of __ which meant that dictionaries with keys containing __ did not roundtrip correctly (typically the __ part of the keys would vanish upon read). 9ff3e94 validates against writing dictionaries with such a key. Alex and I discussed a lot about introducing a new serialization format that could handle these and decided against it - we can always do so in future.

ea0b066 widens the set of dict keys that we can write. We were previously applying our symbol name validation to them, which is stronger than necessary.

@poodlewars poodlewars added the patch Small change, should increase patch version label May 30, 2025
@poodlewars
Copy link
Collaborator Author

On reflection I will add a change to 9ff3e94 to give people a way to disable the validation against __ with an environment variable. This will give us a way out of trouble if the new validation causes an issue in prod (eg an irrelevant dict key with __ prevents a write for an otherwise fine dict).

@poodlewars
Copy link
Collaborator Author

Not ready for review - tests are broken


if depth > DEFAULT_RECURSE_LIMIT // 2:
raise DataTooNestedException(f"Symbol {original_symbol} cannot be recursively normalized as it contains more than "
f"{DEFAULT_RECURSE_LIMIT // 2} levels of nested dictionaries. This is a limitation of the msgpack serializer.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the factor of 2, so probably worth a comment

@@ -19,20 +19,20 @@ namespace stream {
void register_stream_bindings(py::module &m);

inline void register_bindings(py::module &m) {
auto arcticxx_types = m.def_submodule("types", R"pydoc(
auto arcticdb_ext_types = m.def_submodule("types", R"pydoc(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the bridge automagically handle these changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just renaming the variable - so no impact on our APIs

std::string msg_;
};

using CheckOutcome = std::variant<Error, std::monostate>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed now, but this would be more generally useful if templated on the second variant type, so it could be used as the return type from non-void functions as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Small change, should increase patch version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants