[FEATURE] Handling compact data representations for all connectors with minimum configurations by Bilpapster · Pull Request #22 · Bilpapster/stream-DaQ

Bilpapster · 2025-10-05T17:35:20Z

Description

Stream DaQ can now monitor data quality over sources that send data in compact format. The user only needs to specify the names of the columns that hold the field names and the values; the rest of the conversion to native format is handled internally. This feature should be especially valuable in IoT data quality monitoring scenarios.
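To make the conversion concrete, here is a minimal sketch in plain Python of what unpacking one compact record into native columns could look like. It does not use Stream DaQ's API: the record layout, the column names `fields` and `values`, and the helper `compact_to_native` are all assumptions made for illustration only.

```python
from typing import Any


def compact_to_native(
    record: dict[str, Any],
    fields_column: str = "fields",  # hypothetical name of the column holding field names
    values_column: str = "values",  # hypothetical name of the column holding the readings
) -> dict[str, Any]:
    """Expand one compact record into a native record with one key per field."""
    fields = record[fields_column]
    values = record[values_column]
    if len(fields) != len(values):
        raise ValueError("fields and values must have the same length")

    # Keep every other column (e.g. a timestamp) untouched.
    native = {
        key: value
        for key, value in record.items()
        if key not in (fields_column, values_column)
    }
    # Pair each field name with the value at the same position.
    native.update(zip(fields, values))
    return native


# Assumed shape of a compact IoT record: two parallel lists in one row.
compact = {
    "timestamp": 1,
    "fields": ["temperature", "humidity", "pressure"],
    "values": [21.5, 0.43, 1013.2],
}
native = compact_to_native(compact)
print(native)
```

After unpacking, each reading is addressable by its own name (for example `native["pressure"]`), which is the property the description relies on.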

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Testing

I have tested this change locally
I have added tests that prove my fix is effective or that my feature works
All new and existing tests pass

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation

Fixes #7

Gatmatz

Overall, great work again, Vassilis. The examples are clear and to the point, and the code is very well documented. We should discuss the rest of my comments before merging.

Gatmatz · 2025-10-06T12:15:49Z

+# Notice how we can directly reference individual fields (temperature, humidity, pressure)
+# even though they arrive in compact format - Stream DaQ handles the unpacking automatically!
+daq.add(dqm.count("pressure"), name="readings") \
+    .add(dqm.missing_count("temperature") 


Very good example. However, we should resolve issue #21 before presenting it.

Thanks for pointing that out! The fix for this bug is now merged.

Gatmatz · 2025-10-06T12:22:53Z

+        native_data_types = {
+            column_names[i]: values_dtype
+            for i in range(len(column_names))
+        }


I don't really like the assumption that every column will have the same type. An input data stream can include a mix of strings, integers, and floats. Instead of passing a single type, we should consider using an array of types, one for each column.
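As a rough illustration of the array-of-types idea, the sketch below pairs each column name with the type at the same position. The helper name `dtypes_from_list` and the sample columns are hypothetical and are not part of the PR.

```python
def dtypes_from_list(column_names: list[str], column_types: list[type]) -> dict[str, type]:
    """Build a column -> type mapping from two parallel, order-sensitive lists."""
    if len(column_names) != len(column_types):
        raise ValueError("expected exactly one type per column")
    # The i-th type is assigned to the i-th column name.
    return dict(zip(column_names, column_types))


mixed = dtypes_from_list(
    ["device_id", "temperature", "error_count"],
    [str, float, int],
)
print(mixed)
```

The mapping is only correct if both lists are kept in the same order, so a length check catches a missing entry but not two entries that were swapped.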

You are right, George! I see your point here.

I had IoT scenarios in mind, where typically all readings are floats, but we should definitely support the general case as well. My only concern is that we should make this as easy for the user as possible, and an array of types might not achieve that. The user would have to make sure that the order of the types matches the order of the column names, which becomes more cumbersome the more fields there are. For example, in the IoT scenario we discuss on Fridays we have 500 values in compact format (all floats). Things become even more complicated if we decide to support schema evolution in the future.

I think we can find a sweet spot between ease of use and complete control/expressiveness on the user's end. Here is my proposal: we keep the values_dtype parameter as is, and we add another, optional parameter, exceptions. It can be a dictionary of the form column_name: type, which removes the need for the user to memorize the exact column order and stays compatible with schema evolution, should we decide to support it.
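A minimal sketch of how this proposal could behave, assuming a standalone helper: the function name `build_native_dtypes` and the unknown-column check are illustrative additions, while `values_dtype`, `exceptions`, `column_names` and `native_data_types` follow the names used in this thread and in the diff.

```python
from typing import Optional


def build_native_dtypes(
    column_names: list[str],
    values_dtype: type,
    exceptions: Optional[dict[str, type]] = None,
) -> dict[str, type]:
    """Map every column to values_dtype, then apply per-column overrides."""
    # Default: every unpacked column gets the same type.
    native_data_types = {name: values_dtype for name in column_names}
    if exceptions:
        # Assumption: an override for a column that does not exist is a user error.
        unknown = set(exceptions) - set(column_names)
        if unknown:
            raise KeyError(f"exceptions refer to unknown columns: {sorted(unknown)}")
        # Overrides are keyed by name, so column order does not matter.
        native_data_types.update(exceptions)
    return native_data_types


dtypes = build_native_dtypes(
    ["temperature", "humidity", "status"],
    values_dtype=float,
    exceptions={"status": str},
)
print(dtypes)
```

With 500 float readings and a single string column, the user would pass one default type and one override instead of a 500-element list.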

I am going to implement it in the revised PR. What do you think about it?

I completely agree with your approach; I really like this implementation idea.

Keeping values_dtype as the main parameter while adding an optional exceptions dictionary feels like a great balance between usability and flexibility. It keeps things simple for the common IoT case, while still allowing users to handle specific type overrides without worrying about column order or future schema changes.

Bilpapster · 2025-10-06T20:51:25Z

+            for i in range(len(column_names))
+        }
+        # overwrite the default types with the user-specified exceptions, if any
+        if type_exceptions:


NEW to address the above comment

Bilpapster · 2025-10-06T20:52:09Z

+        Args:
+            values_dtype (Type): the (default) data type of the
+            individual values.
+            exceptions (Optional[dict[str, Type]], optional): A mapping 


NEW to address review comments

Bilpapster added 6 commits October 3, 2025 03:18

Implement first version of connector-agnostic, field names-agnostic solution

a618081

Add support for quality monitoring of compact data and enrich docs

d612f7c

Merge branch 'compact-to-native' of https://github.com/Bilpapster/stream-DaQ into compact-to-native

94bf85a

Add support for quality monitoring of compact data and enrich docs

aeb501b

Merge branch 'compact-to-native' of https://github.com/Bilpapster/stream-DaQ into compact-to-native

59b90f6

Remove old example on compact data representation

764ccd1

Bilpapster requested a review from Gatmatz October 5, 2025 17:35

Bilpapster had a problem deploying to github-pages October 5, 2025 17:35 — with GitHub Actions Failure

Update and enrich docs

8947dc5

Bilpapster had a problem deploying to github-pages October 5, 2025 17:39 — with GitHub Actions Failure

Bilpapster self-assigned this Oct 5, 2025

Gatmatz requested changes Oct 6, 2025

Address review comments about user control on data type handling

7cbfa12

Bilpapster had a problem deploying to github-pages October 6, 2025 20:50 — with GitHub Actions Failure

Bilpapster commented Oct 6, 2025

Merge remote-tracking branch 'upstream/main' into compact-to-native

3847883

Gatmatz had a problem deploying to github-pages October 9, 2025 12:49 — with GitHub Actions Failure

Gatmatz merged commit 657dab7 into main Oct 9, 2025
3 of 4 checks passed

Gatmatz deleted the compact-to-native branch October 9, 2025 12:52

Gatmatz merged 9 commits into main from compact-to-native

2 participants
