
[wiz-cli] the wiz tool shouldn't use any duplicate dataclass names #58

Open
@rnag

  • Dataclass Wizard version: 0.22.1
  • Python version: 3.10
  • Operating System: Mac OS

Description

I found an interesting case where the wiz command-line tool generates the wrong dataclass schema. The schema is wrong in that the same dataclass name, Ability in this case, is reused for two different generated dataclasses in a nested dataclass structure. As a result, we can't actually load JSON data into the main dataclass Data, because the nested schema has a bug (a duplicate dataclass name).

See output below for more details.

What I Did

I ran wiz gs from my terminal, with the following JSON data as input:

echo '{"id": 1,
"height": 7,
"name": "bulbasur",
"abilities": [
    {
      "ability": {
        "name": "overgrow",
        "url": "https://pokeapi.co/api/v2/ability/65/"
      },
      "is_hidden": false,
      "slot": 1
    },
    {
      "ability": {
        "name": "chlorophyll",
        "url": "https://pokeapi.co/api/v2/ability/34/"
      },
      "is_hidden": true,
      "slot": 3
    }
  ]
}' | wiz gs -x

This generated the following nested dataclass schema:

from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass

    """
    id: int
    height: int
    name: str
    abilities: list[Ability]


@dataclass
class Ability:
    """
    Ability dataclass

    """
    ability: Ability
    is_hidden: bool
    slot: int


@dataclass
class Ability:
    """
    Ability dataclass

    """
    name: str
    url: str

As you can see, the Ability dataclass name is duplicated. The second class definition rebinds the name Ability and overwrites the first one, which is what we want to avoid here.
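The overwriting can be reproduced with plain dataclasses. This is a minimal sketch: the generated code uses `from __future__ import annotations`, which turns every annotation into a lazily resolved string, and the sketch gets the same effect with quoted annotations. Because the names are only resolved later, `list[Ability]` on Data ends up pointing at whichever Ability was defined last.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Data:
    # Quoted annotation: a lazily resolved string, like the generated code
    # gets from `from __future__ import annotations`.
    abilities: "list[Ability]"


@dataclass
class Ability:
    ability: "Ability"
    is_hidden: bool
    slot: int


@dataclass
class Ability:  # same name again: this rebinds `Ability` in the module
    name: str
    url: str


# Only the last definition survives under the name `Ability` ...
print([f.name for f in fields(Ability)])  # ['name', 'url']

# ... so the string "list[Ability]" on Data resolves to the *second* class.
hints = get_type_hints(Data, globalns=globals())
print(hints["abilities"] == list[Ability])  # True
```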

I then tried the following Python code with the generated class schema above:

j = '{"id":1,"height":7,"name":"bulbasur","abilities":[{"ability":{"name":"overgrow","url":"https://pokeapi.co/api/v2/ability/65/"},"is_hidden":false,"slot":1},{"ability":{"name":"chlorophyll","url":"https://pokeapi.co/api/v2/ability/34/"},"is_hidden":true,"slot":3}]}'
instance = Data.from_json(j)

This resulted in an error, as expected, since the second dataclass definition overwrites the first one:

  error: Ability.__init__() missing 2 required positional arguments: 'name' and 'url'

Renaming the second class to something else (for example, Ability2) and updating any references to it makes the JSON input data load into a dataclass instance as expected.
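Here is a sketch of the renamed schema. To keep it dependency-free, it uses plain dataclasses and a hypothetical hand-written `load_data` function in place of `JSONWizard` and `Data.from_json`; the classes are defined bottom-up so no forward references are needed. The point is only that once the inner class is called Ability2, the two shapes no longer clash.

```python
import json
from dataclasses import dataclass


@dataclass
class Ability2:
    """The inner `ability` object (the duplicate `Ability`, renamed)."""
    name: str
    url: str


@dataclass
class Ability:
    """One entry in the `abilities` list."""
    ability: Ability2  # reference updated to the renamed class
    is_hidden: bool
    slot: int


@dataclass
class Data:
    id: int
    height: int
    name: str
    abilities: list[Ability]


def load_data(raw: str) -> Data:
    """Hypothetical hand-written loader standing in for Data.from_json."""
    d = json.loads(raw)
    abilities = [
        Ability(
            ability=Ability2(**a["ability"]),
            is_hidden=a["is_hidden"],
            slot=a["slot"],
        )
        for a in d["abilities"]
    ]
    return Data(id=d["id"], height=d["height"], name=d["name"], abilities=abilities)


j = '{"id":1,"height":7,"name":"bulbasur","abilities":[{"ability":{"name":"overgrow","url":"https://pokeapi.co/api/v2/ability/65/"},"is_hidden":false,"slot":1},{"ability":{"name":"chlorophyll","url":"https://pokeapi.co/api/v2/ability/34/"},"is_hidden":true,"slot":3}]}'
instance = load_data(j)
print(instance.abilities[1].ability.name)  # chlorophyll
```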

Resolution

One possible approach that comes to mind is to keep a set of all generated class names, or better yet a dict mapping each class name to a count of how many times we've seen it. This mapping could live at module or global scope, or it could be passed through each function in the recursive generation process.

Then, check whether the class name we want to add is already in the mapping. If so, we add a suffix to the generated class name (for example, based on the count of how many times we've already seen that name); if not, we use the generated class name as-is. In either case, we then increment the count for that class name in the mapping, to record that we've seen it.
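The counting approach could be sketched roughly as below. `ClassNameRegistry` and `unique` are hypothetical names, not part of dataclass-wizard. The suffix is the 1-based occurrence count, which gives Ability2 for the second Ability. The sketch also remembers every name it has handed out, so a suffixed name can't collide with a class whose base name is literally, say, Ability2.

```python
from collections import defaultdict


class ClassNameRegistry:
    """Hypothetical helper: hands out unique class names during one generation run."""

    def __init__(self) -> None:
        # base class name -> how many times that name has been requested
        self._counts: defaultdict[str, int] = defaultdict(int)
        # every name already handed out, to guard against suffix collisions
        self._issued: set[str] = set()

    def unique(self, name: str) -> str:
        while True:
            self._counts[name] += 1
            n = self._counts[name]
            # 1st occurrence keeps its name; later ones get Ability2, Ability3, ...
            candidate = name if n == 1 else f"{name}{n}"
            if candidate not in self._issued:
                self._issued.add(candidate)
                return candidate


names = ClassNameRegistry()
print(names.unique("Data"))     # Data
print(names.unique("Ability"))  # Ability
print(names.unique("Ability"))  # Ability2
```

Creating one registry per `wiz gs` invocation and passing it down through the recursive generation functions avoids the module-level state leaking between runs, which a global mapping would risk.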
