[wiz-cli] duplicate dataclass schemas should be replaced with just one #59

Open
@rnag

  • Dataclass Wizard version: 0.22.1
  • Python version: 3.10
  • Operating System: Mac OS

Description

In certain cases, especially in some API responses (most notably AWS Rekognition), the input JSON object can contain multiple occurrences of the same field, for example "element", all of which share an identical schema.

I'd like to eliminate those duplicate dataclass definitions in the output, so that the generated schema is a bit less verbose and we only have the data we care about.

For example, note the below sample input and output.

What I Did

I ran the following command from my macOS terminal:

echo '{
    "element": {
        "my_str": "string",
        "my_int": 3
    },
    "Elements": [
        {
            "my_str": "hello",
            "my_int": 5
        },
        {
            "myStr": "world",
            "MyInt": 7
        }
    ],
    "other_field": {
        "element": {
            "my_str": "other string",
            "my_int": 42
        }
    }
}' | wiz gs

The generated output is a bit noisy in this scenario, as it contains duplicate definitions of the dataclass Element:

from dataclasses import dataclass
from typing import List

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass

    """
    element: 'Element'
    elements: List['Element']
    other_field: 'OtherField'


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int


@dataclass
class OtherField:
    """
    OtherField dataclass

    """
    element: 'Element'


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int

I'd like to eliminate all the duplicate definitions, preferably by keeping the first dataclass schema for Element and trimming any duplicates after it.

Resolution

There are multiple ways to achieve this, but I think the easiest might be to store the generated string or __repr__ for each schema in a dict with the class name as the key, and then look up and compare whether those string definitions are the same. If so, we just continue and return an empty __repr__ for every occurrence after the first. If not, we generate all the field names and types for the dataclass as normal.
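
The idea above could be sketched roughly as follows. This is only an illustration, not the actual wiz-cli code: the function name `dedupe_definitions` and the `(class_name, rendered_source)` input shape are assumptions made for the example, and it keeps a set of rendered strings per class name so that a class with the same name but a different schema is still emitted.

```python
from typing import Dict, List, Set, Tuple


def dedupe_definitions(definitions: List[Tuple[str, str]]) -> List[str]:
    """Drop repeated class definitions whose rendered source is identical.

    `definitions` is a hypothetical list of (class_name, rendered_source)
    pairs, in the order they would be written to the output.
    """
    # Class name -> rendered sources already emitted under that name.
    seen: Dict[str, Set[str]] = {}
    kept: List[str] = []

    for name, source in definitions:
        emitted = seen.setdefault(name, set())
        if source in emitted:
            # Same name and same schema as an earlier class: skip it.
            continue
        # First occurrence, or same name with a different schema:
        # emit the definition as normal.
        emitted.add(source)
        kept.append(source)

    return kept
```

With the sample input from this issue, the three identical `Element` definitions would collapse to the first one, while `Data` and `OtherField` pass through unchanged. Comparing the full rendered string is deliberately strict: two classes that differ only in field order or docstring would still be treated as distinct.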
