
[Feature/Addition] Teach the Tables object about generators. #1529

@fgregg


petl has some support for generators. If Parsons took advantage of this when creating tables, we could often avoid memory problems: rows could be streamed from a generator instead of first being collected into one large list.
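A minimal, stdlib-only sketch of the memory argument follows. `fetch_rows` is a hypothetical stand-in for a connector that pages through an API; it is not a Parsons function. The sketch also shows the catch that any generator support has to work around: a generator can be consumed only once.

```python
import sys
from typing import Dict, Iterator, List


def fetch_rows(n: int) -> Iterator[Dict[str, int]]:
    """Hypothetical stand-in for a connector that pages through an API.

    Rows are produced one at a time, so only the current row needs to be
    held in memory by this function.
    """
    for i in range(n):
        yield {"id": i, "value": i * i}


# Materializing allocates storage for every row up front. Note that
# sys.getsizeof only measures the list container, not the dicts inside it,
# so the real difference is even larger than what is compared here.
as_list: List[Dict[str, int]] = list(fetch_rows(100_000))

# The generator object stays small no matter how many rows it will yield.
as_generator = fetch_rows(100_000)

print(sys.getsizeof(as_list) > sys.getsizeof(as_generator))  # True

# The catch: a generator can only be iterated once.
first_pass = sum(row["value"] for row in as_generator)
second_pass = sum(row["value"] for row in as_generator)
print(second_pass)  # 0
```

That one-shot behavior is why the proposal below materializes list-of-lists input before handing it to `petl.wrap`.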

Here's what the proposed change to the `Table` constructor could look like:

```python
import itertools
from typing import Iterable, Union

import petl

# Existing Parsons mixins. _EmptyDefault and the _EMPTYDEFAULT sentinel are
# defined alongside Table in the existing Parsons module.
from parsons.etl.etl import ETL
from parsons.etl.tofrom import ToFrom


class Table(ETL, ToFrom):
    """
    Create a Parsons Table. Accepts one of the following:
    - A list of lists, with list[0] holding field names, and the other lists holding data
    - A list of dicts
    - A generator (or other iterable) yielding lists or dicts, in the formats above
    - A petl table

    `Args:`
        lst: list
            See above for accepted list formats
    """

    def __init__(
        self,
        lst: Union[list, tuple, Iterable, petl.util.base.Table, _EmptyDefault] = _EMPTYDEFAULT,
    ):
        self.table = None

        # Normally we would use None as the default argument here.
        # Instead of using None, we use a sentinel.
        # This allows us to maintain the existing behavior:
        # This is allowed: Table()
        # This should fail: Table(None)
        if lst is _EMPTYDEFAULT:
            self.table = petl.fromdicts([])

        elif isinstance(lst, petl.util.base.Table):
            # Create from a petl table
            self.table = lst

        else:
            try:
                iterable_data = iter(lst)
            except TypeError:
                raise ValueError(
                    f"Could not initialize table from input type. "
                    f"Got {type(lst)}, expected an iterable of lists or dicts, or a petl Table"
                ) from None

            try:
                # Read the first row to decide which petl constructor to use
                peek = next(iterable_data)
            except StopIteration:
                # An empty iterable produces an empty table
                self.table = petl.fromdicts([])
            else:
                # Put the first row back in front of the remaining rows.
                # petl can handle generators but does an explicit
                # inspect.isgenerator check instead of duck typing, and an
                # itertools.chain object is not a generator, so we wrap it
                # in a generator expression.
                iterable_data = (each for each in itertools.chain([peek], iterable_data))

                row_type = type(peek)
                if row_type is dict:
                    # Check for list of dicts
                    self.table = petl.fromdicts(iterable_data)
                elif row_type in [list, tuple]:
                    # Check for list of lists.
                    # The wrap method does not support generators (or,
                    # more precisely, a table created from a generator can
                    # only be read once), so we materialize the rows.
                    self.table = petl.wrap(list(iterable_data))
                # Any other row type leaves self.table as None, which
                # is_valid_table() is expected to reject below.

        if not self.is_valid_table():
            raise ValueError("Could not create Table")

        # Count how many times someone is indexing directly into this
        # table, so we can warn against inefficient usage.
        self._index_count = 0
```
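The peek-and-rechain step is what lets the constructor accept generators: it reads one row to choose between `petl.fromdicts` and `petl.wrap` without losing that row, and wraps the result so that an `inspect.isgenerator` check passes. Below is a standalone, stdlib-only sketch of the same pattern; the helper `peek` and the `EmptyInputError` class are hypothetical names for illustration, not part of Parsons or petl.

```python
import inspect
import itertools
from typing import Any, Iterable, Iterator, Tuple


class EmptyInputError(ValueError):
    """Hypothetical error raised when there is no first row to inspect."""


def peek(data: Iterable[Any]) -> Tuple[Any, Iterator[Any]]:
    """Return the first item of ``data`` and a generator over *all* items.

    The first item is consumed with next(), then chained back in front of
    the remaining items so no rows are lost.
    """
    iterator = iter(data)
    try:
        first = next(iterator)
    except StopIteration:
        raise EmptyInputError("no rows to inspect") from None
    # itertools.chain returns a chain object, which inspect.isgenerator()
    # rejects; the generator expression wrapper makes it a real generator.
    everything = (item for item in itertools.chain([first], iterator))
    return first, everything


rows = ({"id": i} for i in range(3))
first, all_rows = peek(rows)
print(type(first).__name__)                               # dict
print(inspect.isgenerator(itertools.chain([first], [])))  # False
print(inspect.isgenerator(all_rows))                      # True
print([row["id"] for row in all_rows])                    # [0, 1, 2]
```

A caller that catches `EmptyInputError` would fall back to an empty table, which is what the proposal does in its `except StopIteration` branch.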

Labels: enhancement (Impact: something should be added to or changed about Parsons that isn't causing a current breakage)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions