Merged
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -16,6 +16,8 @@ Change Log
 Unreleased
 __________
 
+* Added support for complex types in dictionaries and lists.
+
 [10.5.0] - 2025-08-19
 ---------------------
 
52 changes: 51 additions & 1 deletion docs/how-tos/add-event-bus-support-to-an-event.rst
@@ -51,11 +51,56 @@ Complex Data Types
 --------------------
 
 - Type-annotated Lists (e.g., ``List[int]``, ``List[str]``)
 - Type-annotated Dictionaries (e.g., ``Dict[str, int]``, ``Dict[str, str]``)
 - Attrs Classes (e.g., ``UserNonPersonalData``, ``UserPersonalData``, ``UserData``, ``CourseData``)
 - Types with Custom Serializers (e.g., ``CourseKey``, ``datetime``)
+- Nested Complex Types:
+
+  - Lists containing dictionaries (e.g., ``List[Dict[str, int]]``)
+  - Dictionaries containing lists (e.g., ``Dict[str, List[int]]``)
+  - Lists containing attrs classes (e.g., ``List[UserData]``)
+  - Dictionaries containing attrs classes (e.g., ``Dict[str, CourseData]``)
+
 Ensure that the :term:`Event Payload` is structured as `attrs data classes`_ and that the data types used in those classes align with the event bus schema format.
 
+Examples of Complex Data Types Usage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here are practical examples of how to use the supported complex data types in your event payloads:
+
+.. code-block:: python
+
+    # Example 1: Event with type-annotated dictionaries and lists
+    @attr.s(frozen=True)
+    class CourseMetricsData:
+        """
+        Course metrics with complex data structures.
+        """
+        # Simple dictionary with string keys and integer values
+        enrollment_counts = attr.ib(type=dict[str, int], factory=dict)
+
+        # List of dictionaries
+        grade_distributions = attr.ib(type=List[dict[str, float]], factory=list)
+
+        # Dictionary containing lists
+        student_groups = attr.ib(type=dict[str, List[str]], factory=dict)
+
+
+    # Example 2: Event with nested attrs classes
+    @attr.s(frozen=True)
+    class BatchOperationData:
+        """
+        Batch operation with collections of user data.
+        """
+        # List of attrs classes
+        affected_users = attr.ib(type=List[UserData], factory=list)
+
+        # Dictionary mapping course IDs to course data
+        courses_mapping = attr.ib(type=dict[str, CourseData], factory=dict)
+
+        # Complex nested structure
+        operation_results = attr.ib(type=dict[str, List[dict[str, bool]]], factory=dict)
+
 In the ``data.py`` files within each architectural subdomain, you can find examples of the :term:`Event Payload` structured as `attrs data classes`_ that align with the event bus schema format.
 
 Step 3: Ensure Serialization and Deserialization
@@ -103,7 +148,12 @@ After implementing the serializer, add it to ``DEFAULT_CUSTOM_SERIALIZERS`` at t
 Now, the :term:`Event Payload` can be serialized and deserialized correctly when sent across services.
 
 .. warning::
-    One of the known limitations of the current Open edX Event Bus is that it does not support dictionaries as data types. If the :term:`Event Payload` contains dictionaries, you may need to refactor the :term:`Event Payload` to use supported data types. When you know the structure of the dictionary, you can create an attrs class that represents the dictionary structure. If not, you can use a str type to represent the dictionary as a string and deserialize it on the consumer side using JSON deserialization.
+    The Open edX Event Bus supports type-annotated dictionaries (e.g., ``Dict[str, int]``) and complex nested types. However, dictionaries **without type annotations** are still not supported. Always use proper type annotations for dictionaries and lists in your :term:`Event Payload`. For example:
+
+    - ✅ Supported: ``Dict[str, int]``, ``List[Dict[str, str]]``, ``Dict[str, UserData]``
+    - ❌ Not supported: ``dict``, ``list``, ``Dict`` (without type parameters)
+
+    If you need to use unstructured data, consider creating an attrs class that represents the data structure.
 
 Step 4: Generate the Avro Schema
 ====================================
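The supported/not-supported rule in the warning above can be checked mechanically with `typing.get_origin` and `typing.get_args`. The sketch below is a hypothetical helper (`is_fully_annotated` and `SIMPLE_TYPES` are not part of openedx-events): it walks an annotation, accepts simple leaf types and anything exposing `__attrs_attrs__` (the attribute attrs adds to its classes), and rejects bare `dict`/`list` or unparameterized `Dict`/`List`. Custom-serializer types such as `CourseKey` are deliberately out of scope.

```python
from typing import Any, Dict, List, get_args, get_origin

# Leaf types treated as "simple" in this sketch (an assumed subset).
SIMPLE_TYPES = {str, int, float, bool}


def is_fully_annotated(annotation: Any) -> bool:
    """Return True if every dict/list inside `annotation` has type parameters."""
    if annotation in SIMPLE_TYPES:
        return True
    if hasattr(annotation, "__attrs_attrs__"):
        # attrs classes expose __attrs_attrs__; treat them as valid leaves here.
        return True
    origin = get_origin(annotation)
    args = get_args(annotation)
    if origin is list:
        # List[X]: the single item type must itself be fully annotated.
        return len(args) == 1 and is_fully_annotated(args[0])
    if origin is dict:
        # Dict[K, V]: Avro maps require str keys; the value type is checked recursively.
        return len(args) == 2 and args[0] is str and is_fully_annotated(args[1])
    # Bare dict/list, unparameterized Dict/List, or any other unsupported type.
    return False


print(is_fully_annotated(Dict[str, List[str]]))  # True
print(is_fully_annotated(Dict))  # False
print(is_fully_annotated(dict))  # False
```

Walking the annotation recursively mirrors how nested types such as `Dict[str, List[Dict[str, bool]]]` must be valid at every level, not just the outermost one.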
5 changes: 5 additions & 0 deletions docs/how-tos/create-a-new-event.rst
@@ -165,6 +165,11 @@ In our example, the event definition and payload for the enrollment event could
 - Try using nested data classes to group related data together. This will help maintain consistency and make the event more readable. For instance, in the above example, we have grouped the data into User, Course, and Enrollment data.
 - Try reusing existing data classes if possible to avoid duplicating data classes. This will help maintain consistency and reduce the chances of introducing errors. You can review the existing data classes in :ref:`Data Attributes` to see if there is a data class that fits your use case.
 - Each field in the payload should be documented with a description of what the field represents and the data type it should contain. This will help consumers understand the payload and react to the event. You should be able to justify why each field is included in the payload and how it relates to the event.
+- Use type-annotated complex data types when needed. The event bus supports dictionaries and lists with proper type annotations:
+
+  - ``Dict[str, int]`` for dictionaries with string keys and integer values.
+  - ``List[UserData]`` for lists containing attrs classes.
+  - ``Dict[str, List[str]]`` for nested complex structures.
 - Use defaults for optional fields in the payload to ensure its consistency in all cases.
 
 .. note:: When defining the payload, enforce :ref:`Event Bus` compatibility by ensuring that the data types used in the payload align with the event bus schema format. This will help ensure that the event can be sent by the producer and then be re-emitted by the same instance of `OpenEdxPublicSignal`_ on the consumer side, guaranteeing that the data sent and received is identical. For more information about adding event bus support to an event, refer to :ref:`Add Event Bus Support`.
19 changes: 17 additions & 2 deletions openedx_events/event_bus/avro/deserializer.py
@@ -41,7 +41,7 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer
         return deserializer(data)
     elif data_type in PYTHON_TYPE_TO_AVRO_MAPPING:
         return data
-    elif data_type_origin == list:
+    elif data_type_origin is list:
         # Returns types of list contents.
         # Example: if data_type == List[int], arg_data_type = (int,)
         arg_data_type = get_args(data_type)
@@ -52,7 +52,11 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer
         # Check whether list items type is in basic types.
         if arg_data_type[0] in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING:
             return data
-    elif data_type_origin == dict:
+
+        # Complex nested types like List[List[...]], List[Dict[...]], etc.
+        item_type = arg_data_type[0]
+        return [_deserialized_avro_record_dict_to_object(sub_data, item_type, deserializers) for sub_data in data]
+    elif data_type_origin is dict:
         # Returns types of dict contents.
         # Example: if data_type == Dict[str, int], arg_data_type = (str, int)
         arg_data_type = get_args(data_type)
@@ -63,6 +67,17 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer
         # Check whether dict items type is in basic types.
         if all(arg in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING for arg in arg_data_type):
             return data
+
+        # Complex dict values that need recursive deserialization
+        key_type, value_type = arg_data_type
+        if key_type is not str:
+            raise TypeError("Avro maps only support string keys. The key type must be 'str'.")
+
+        # Complex nested types like Dict[str, Dict[...]], Dict[str, List[...]], etc.
+        return {
+            key: _deserialized_avro_record_dict_to_object(value, value_type, deserializers)
+            for key, value in data.items()
+        }
     elif hasattr(data_type, "__attrs_attrs__"):
         transformed = {}
         for attribute in data_type.__attrs_attrs__:
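The recursion added to `_deserialized_avro_record_dict_to_object` can be illustrated in isolation. The sketch below is a simplified, hypothetical stand-in (`deserialize`, `SIMPLE_TYPES`, and `DESERIALIZERS` are illustrative names, not library code): it handles only simple types, `List[...]`, `Dict[str, ...]`, and a caller-supplied mapping of custom deserializers. Using `datetime.fromisoformat` for `datetime` values is an assumption for the example, not necessarily how openedx-events decodes them; attrs classes are not handled.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, get_args, get_origin

SIMPLE_TYPES = {str, int, float, bool}


def deserialize(data: Any, data_type: Any, deserializers: Dict[Any, Callable[[Any], Any]]) -> Any:
    """Rebuild Python values from Avro-decoded data, recursing into containers."""
    if data_type in deserializers:
        # A custom deserializer wins over any structural handling.
        return deserializers[data_type](data)
    if data_type in SIMPLE_TYPES:
        return data
    origin = get_origin(data_type)
    if origin is list:
        # Assumes the annotation was already validated as List[X].
        (item_type,) = get_args(data_type)
        return [deserialize(item, item_type, deserializers) for item in data]
    if origin is dict:
        key_type, value_type = get_args(data_type)
        if key_type is not str:
            raise TypeError("Avro maps only support string keys.")
        # Keys stay strings; only the values need recursive handling.
        return {key: deserialize(value, value_type, deserializers) for key, value in data.items()}
    raise TypeError(f"Unsupported type: {data_type}")


# Hypothetical custom deserializer: datetimes arrive as ISO 8601 strings.
DESERIALIZERS = {datetime: datetime.fromisoformat}

decoded = {"2025-08-19": ["2025-08-19T10:00:00"]}
result = deserialize(decoded, Dict[str, List[datetime]], DESERIALIZERS)
print(result)  # {'2025-08-19': [datetime.datetime(2025, 8, 19, 10, 0)]}
```

Without the recursive branches, a value like `Dict[str, List[datetime]]` would come back as raw strings, because only the outermost container type was inspected.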
173 changes: 140 additions & 33 deletions openedx_events/event_bus/avro/schema.py
@@ -4,8 +4,7 @@
 TODO: Handle optional parameters and allow for schema evolution. https://github.com/edx/edx-arch-experiments/issues/53
 """
 
-
-from typing import get_args, get_origin
+from typing import Any, Type, get_args, get_origin
 
 from .custom_serializers import DEFAULT_CUSTOM_SERIALIZERS
 from .types import PYTHON_TYPE_TO_AVRO_MAPPING, SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING
@@ -74,37 +73,19 @@ def _create_avro_field_definition(data_key, data_type, previously_seen_types,
             raise Exception("Unable to generate Avro schema for dict or array fields without annotation types.")
         avro_type = PYTHON_TYPE_TO_AVRO_MAPPING[data_type]
         field["type"] = avro_type
-    elif data_type_origin == list:
-        # Returns types of list contents.
-        # Example: if data_type == List[int], arg_data_type = (int,)
-        arg_data_type = get_args(data_type)
-        if not arg_data_type:
-            raise TypeError(
-                "List without annotation type is not supported. The argument should be a type, for eg., List[int]"
-            )
-        avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(arg_data_type[0])
-        if avro_type is None:
-            raise TypeError(
-                "Only following types are supported for list arguments:"
-                f" {set(SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.keys())}"
-            )
-        field["type"] = {"type": PYTHON_TYPE_TO_AVRO_MAPPING[data_type_origin], "items": avro_type}
-    elif data_type_origin == dict:
-        # Returns types of dict contents.
-        # Example: if data_type == Dict[str, int], arg_data_type = (str, int)
-        arg_data_type = get_args(data_type)
-        if not arg_data_type:
-            raise TypeError(
-                "Dict without annotation type is not supported. The argument should be a type, for eg., Dict[str, int]"
-            )
-        avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(arg_data_type[1])
-        if avro_type is None:
-            raise TypeError(
-                "Only following types are supported for dict arguments:"
-                f" {set(SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.keys())}"
-            )
-        field["type"] = {"type": PYTHON_TYPE_TO_AVRO_MAPPING[data_type_origin], "values": avro_type}
-    # Case 3: data_type is an attrs class
+    # Case 3: data_type is a list (possibly with complex items)
+    elif data_type_origin is list:
+        item_avro_type = _get_avro_type_for_list_item(
+            data_type, previously_seen_types, all_field_type_overrides
+        )
+        field["type"] = {"type": "array", "items": item_avro_type}
+    # Case 4: data_type is a dictionary (possibly with complex values)
+    elif data_type_origin is dict:
+        item_avro_type = _get_avro_type_for_dict_item(
+            data_type, previously_seen_types, all_field_type_overrides
+        )
+        field["type"] = {"type": "map", "values": item_avro_type}
+    # Case 5: data_type is an attrs class
     elif hasattr(data_type, "__attrs_attrs__"):
         # Inner Attrs Class
 
@@ -135,3 +116,129 @@ def _create_avro_field_definition(data_key, data_type, previously_seen_types,
         single_type = field["type"]
         field["type"] = ["null", single_type]
     return field
+
+
+def _get_avro_type_for_dict_item(
+    data_type: Type[dict], previously_seen_types: set, type_overrides: dict[Any, str]
+) -> str | dict[str, str]:
+    """
+    Determine the Avro type definition for a dictionary value based on its Python type.
+
+    This function converts Python dictionary value types to their corresponding
+    Avro type representations. It supports simple types, complex nested types (like
+    dictionaries and lists), and custom classes decorated with attrs.
+
+    Args:
+        data_type (Type[dict]): The Python dictionary type with its type annotation
+            (e.g., Dict[str, str], Dict[str, int], Dict[str, List[str]])
+        previously_seen_types (set): Set of type names that have already been
+            processed, used to prevent duplicate record definitions
+        type_overrides (dict[Any, str]): Dictionary mapping custom Python types to
+            their Avro type representations
+
+    Returns:
+        One of the following Avro type representations:
+        - A string (e.g., "string", "int", "boolean") for simple types
+        - A dictionary with a complex type definition for container types, such as:
+          - {"type": "array", "items": <avro_type>} for lists
+          - {"type": "map", "values": <avro_type>} for nested dictionaries
+          - {"name": "<TypeName>", "type": "record", "fields": [...]} for attrs classes
+        - A string with a record name for previously defined record types
+
+    Raises:
+        TypeError: If the dictionary has no type annotation, has non-string keys,
+            or contains unsupported value types
+    """
+    # Validate dict has type annotation
+    # Example: if data_type == Dict[str, int], arg_data_type = (str, int)
+    arg_data_type = get_args(data_type)
+    if not arg_data_type:
+        raise TypeError(
+            "Dict without annotation type is not supported. The argument should be a type, e.g. Dict[str, int]"
+        )
+
+    value_type = arg_data_type[1]
+
+    # Case 1: Simple types mapped in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING
+    avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(value_type)
+    if avro_type is not None:
+        return avro_type
+
+    # Case 2: Complex types (dict, list, or attrs class)
+    if get_origin(value_type) in (dict, list) or hasattr(value_type, "__attrs_attrs__"):
+        # Create a temporary field for the value type and extract its type definition
+        temp_field = _create_avro_field_definition("temp", value_type, previously_seen_types, type_overrides)
+        return temp_field["type"]
+
+    # Case 3: Unannotated containers (raise specific errors)
+    if value_type is dict:
+        raise TypeError("A Dictionary as a dictionary value should have a type annotation.")
+    if value_type is list:
+        raise TypeError("A List as a dictionary value should have a type annotation.")
+
+    # Case 4: Unsupported types
+    raise TypeError(f"Type {value_type} is not supported for dict values.")
+
+
+def _get_avro_type_for_list_item(
+    data_type: Type[list], previously_seen_types: set, type_overrides: dict[Any, str]
+) -> str | dict[str, str]:
+    """
+    Determine the Avro type definition for a list item based on its Python type.
+
+    This function handles conversion of various Python types that can be
+    contained within a list to their corresponding Avro type representations.
+    It supports simple types, complex nested types (like dictionaries and lists),
+    and custom classes decorated with attrs.
+
+    Args:
+        data_type (Type[list]): The Python list type with its type annotation
+            (e.g., List[str], List[int], List[Dict[str, str]], etc.)
+        previously_seen_types (set): Set of type names that have already been
+            processed, used to prevent duplicate record definitions
+        type_overrides (dict[Any, str]): Dictionary mapping custom Python types
+            to their Avro type representations
+
+    Returns:
+        One of the following Avro type representations:
+        - A string (e.g., "string", "long", "boolean") for simple types
+        - A dictionary with a complex type definition for container types, such as:
+          - {"type": "array", "items": <avro_type>} for lists
+          - {"type": "map", "values": <avro_type>} for dictionaries
+          - {"name": "<TypeName>", "type": "record", "fields": [...]} for attrs classes
+        - A string with a record name for previously defined record types
+
+    Raises:
+        TypeError: If the list has no type annotation, contains unsupported
+            types, or contains containers (dict, list) without proper type
+            annotations
+    """
+    # Validate list has type annotation
+    # Example: if data_type == List[int], arg_data_type = (int,)
+    arg_data_type = get_args(data_type)
+    if not arg_data_type:
+        raise TypeError(
+            "List without annotation type is not supported. The argument should be a type, e.g. List[int]"
+        )
+
+    item_type = arg_data_type[0]
+
+    # Case 1: Simple types mapped in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING
+    avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(item_type)
+    if avro_type is not None:
+        return avro_type
+
+    # Case 2: Complex types (dict, list, or attrs class)
+    if get_origin(item_type) in (dict, list) or hasattr(item_type, "__attrs_attrs__"):
+        # Create a temporary field for the value type and extract its type definition
+        temp_field = _create_avro_field_definition("temp", item_type, previously_seen_types, type_overrides)
+        return temp_field["type"]
+
+    # Case 3: Unannotated containers (raise specific errors)
+    if item_type is dict:
+        raise TypeError("A Dictionary as a list item should have a type annotation.")
+    if item_type is list:
+        raise TypeError("A List as a list item should have a type annotation.")
+
+    # Case 4: Unsupported types
+    raise TypeError(f"Type {item_type} is not supported for list items.")
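To see how the two new helpers compose with `_create_avro_field_definition`, the sketch below models the same recursion in a few lines. It is a simplified, hypothetical model: `to_avro_type` and the `SIMPLE_AVRO` table are assumptions, not the library's `SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING`, and attrs records, custom serializers, and optional fields are left out. Its output for `Dict[str, List[str]]` has the same shape as the `audience_filters` field in the generated schema included in this PR.

```python
from typing import Any, Dict, List, Union, get_args, get_origin

# Assumed simple-type table for this sketch only.
SIMPLE_AVRO = {bool: "boolean", int: "long", float: "double", bytes: "bytes", str: "string"}

AvroType = Union[str, Dict[str, Any]]


def to_avro_type(py_type: Any) -> AvroType:
    """Map a type-annotated Python type to an Avro type, recursing into containers."""
    if py_type in SIMPLE_AVRO:
        return SIMPLE_AVRO[py_type]
    origin, args = get_origin(py_type), get_args(py_type)
    if origin is list:
        if not args:
            raise TypeError("List without annotation type is not supported, e.g. use List[int]")
        # Lists become Avro arrays; the item type is resolved recursively.
        return {"type": "array", "items": to_avro_type(args[0])}
    if origin is dict:
        if not args:
            raise TypeError("Dict without annotation type is not supported, e.g. use Dict[str, int]")
        if args[0] is not str:
            raise TypeError("Avro maps only support string keys")
        # Dicts become Avro maps; only the value type appears in the schema.
        return {"type": "map", "values": to_avro_type(args[1])}
    raise TypeError(f"Type {py_type} is not supported in this sketch")


print(to_avro_type(Dict[str, List[str]]))
# {'type': 'map', 'values': {'type': 'array', 'items': 'string'}}
```

Recursing on the item or value type, instead of looking it up in a flat table as the removed code did, is what allows arbitrarily deep nesting such as `List[Dict[str, List[int]]]`.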
@@ -0,0 +1,50 @@
+{
+  "name": "CloudEvent",
+  "type": "record",
+  "doc": "Avro Event Format for CloudEvents created with openedx_events/schema",
+  "fields": [
+    {
+      "name": "course_notification_data",
+      "type": {
+        "name": "CourseNotificationData",
+        "type": "record",
+        "fields": [
+          {
+            "name": "course_key",
+            "type": "string"
+          },
+          {
+            "name": "app_name",
+            "type": "string"
+          },
+          {
+            "name": "notification_type",
+            "type": "string"
+          },
+          {
+            "name": "content_url",
+            "type": "string"
+          },
+          {
+            "name": "content_context",
+            "type": {
+              "type": "map",
+              "values": "string"
+            }
+          },
+          {
+            "name": "audience_filters",
+            "type": {
+              "type": "map",
+              "values": {
+                "type": "array",
+                "items": "string"
+              }
+            }
+          }
+        ]
+      }
+    }
+  ],
+  "namespace": "org.openedx.learning.course.notification.requested.v1"
+}
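The two complex fields in the generated `CourseNotificationData` schema accept nested JSON-like values. The sketch below is a hypothetical, minimal validator (`conforms` is an illustrative name) covering only the Avro constructs that appear in this schema (`string`, `array`, `map`); it is not Avro's full schema-resolution logic. The sample values for `content_context` and `audience_filters` are invented for illustration.

```python
from typing import Any, Dict, Union

Schema = Union[str, Dict[str, Any]]

# Field schemas copied from the generated CourseNotificationData record.
CONTENT_CONTEXT = {"type": "map", "values": "string"}
AUDIENCE_FILTERS = {"type": "map", "values": {"type": "array", "items": "string"}}


def conforms(datum: Any, schema: Schema) -> bool:
    """Check a datum against the string/array/map subset of Avro schemas."""
    if schema == "string":
        return isinstance(datum, str)
    if isinstance(schema, dict) and schema.get("type") == "array":
        # Every item must match the declared item schema.
        return isinstance(datum, list) and all(conforms(item, schema["items"]) for item in datum)
    if isinstance(schema, dict) and schema.get("type") == "map":
        # Avro map keys are always strings; values must match the value schema.
        return isinstance(datum, dict) and all(
            isinstance(key, str) and conforms(value, schema["values"]) for key, value in datum.items()
        )
    raise ValueError(f"Schema {schema!r} is outside this sketch's subset")


# Hypothetical sample values.
print(conforms({"course_name": "Demo"}, CONTENT_CONTEXT))  # True
print(conforms({"enrollment": ["verified", "audit"]}, AUDIENCE_FILTERS))  # True
print(conforms({"enrollment": "verified"}, AUDIENCE_FILTERS))  # False
```

The last call fails because `audience_filters` expects each map value to be a list of strings, which corresponds to the `Dict[str, List[str]]` annotation on the Python side.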