feat: first-class support for enum datatype #10991

Open
@NickCrews

Description

Is your feature request related to a problem?

This would be quite a large change. I understand if you just want to squash the whole idea.

In both my postgres database and my motherduck databases that I am working with, I have some columns as enum datatypes. This helps with both performance and data integrity.

However, when I use these databases with ibis, I lose a lot of the type safety.

eg if I have an enum in the database with values 'LOW', 'MEDIUM', and 'HIGH', then I can write (and accidentally have written) something like t.filter(t.priority == 'L') when I really meant 'LOW'. This silently gives me the wrong result. It would be great if that errored at expression construction time. Not as good, but still an improvement, would be an error at execution time.
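To make the "error at expression construction time" idea concrete, here is a minimal sketch in plain Python. It does not use any ibis API: `check_enum_literal` is a hypothetical helper name, and the only assumption is that the comparison literal gets validated against the known enum members before an expression is built.

```python
import enum


class Priority(enum.Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


def check_enum_literal(value, enum_cls):
    """Hypothetical helper: return the matching enum member or fail early."""
    if isinstance(value, enum_cls):
        return value
    try:
        # Enum lookup by value raises ValueError for unknown values.
        return enum_cls(value)
    except ValueError:
        allowed = ", ".join(repr(member.value) for member in enum_cls)
        raise ValueError(
            f"{value!r} is not a valid {enum_cls.__name__}; expected one of {allowed}"
        ) from None


check_enum_literal("LOW", Priority)  # accepted, returns Priority.LOW
# check_enum_literal("L", Priority) would raise ValueError instead of silently matching nothing
```

With something like this in the comparison path, the typo 'L' would fail before any query runs.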

The other motivation is for actually creating tables, eg if I do conn.create_table("my_table", schema) then it would be nice if it actually used the underlying enum type.

What is the motivation behind your request?

No response

Describe the solution you'd like

This might require emitting some DDL statements like CREATE TYPE IF NOT EXISTS autogenerated_enum_name_123 AS ENUM ('val1', 'val2', 'val3'). I think we already do this for struct types on postgres, so maybe this wouldn't be a huge lift.
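A rough sketch of how such a statement could be generated, assuming the autogenerated name is derived from a hash of the ordered values so that the same enum always maps to the same type name. The helper names and the `ibis_enum_` prefix are hypothetical and not existing ibis code. Support for `IF NOT EXISTS` on `CREATE TYPE` varies by backend (PostgreSQL, as far as I know, does not accept it), so it is a flag here.

```python
import hashlib


def enum_type_name(values):
    """Hypothetical naming scheme: a stable name derived from the ordered values."""
    digest = hashlib.sha1("\x00".join(values).encode("utf-8")).hexdigest()[:12]
    return f"ibis_enum_{digest}"


def quote_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_enum_ddl(values, if_not_exists=True):
    """Build a CREATE TYPE ... AS ENUM statement for the given values."""
    clause = "IF NOT EXISTS " if if_not_exists else ""
    members = ", ".join(quote_literal(v) for v in values)
    return f"CREATE TYPE {clause}{enum_type_name(values)} AS ENUM ({members})"


print(create_enum_ddl(["val1", "val2", "val3"]))
```

Because the hash covers the values in order, two enums with the same members in a different order would get different type names, which matters if ordering is ever supported.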

These are some examples of what I'd want to be able to do:

import enum
import ibis
from ibis.expr import datatypes as dt

class Priority(enum.StrEnum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"

t = ibis.table(schema={"priority": Priority})
t = ibis.table(schema={"priority": ibis.dtype(Priority)})
t = ibis.table(schema={"priority": dt.Enum(Priority)})
t = ibis.table(schema={"priority": "enum<LOW, MEDIUM, HIGH>"})

t.filter(t.priority == Priority.LOW) # This is the encouraged way because of better IDE completion and refactoring
t.filter(t.priority == "LOW")
t.filter(t.priority == "L") # error
t.priority.lower()  # error, in general you can't treat them as strings

duckdb supports ordering enum values, which would allow comparisons like table.priority > 'LOW'. For now, I would say we should punt on actually implementing this. But if it's not too hard, we should leave the door open to implementing it later (eg by storing the enum members in order in the datatype instead of in something like a frozenset).
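As an illustration of why the storage choice matters, here is a small sketch of a datatype that keeps its members in an ordered tuple. `EnumType`, `rank`, and `greater_than` are hypothetical names for illustration only, not a proposed ibis API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumType:
    """Hypothetical datatype: members kept as an ordered tuple, not a frozenset."""

    members: tuple

    def __post_init__(self):
        if len(set(self.members)) != len(self.members):
            raise ValueError("enum members must be unique")

    def rank(self, value):
        """Position of value in declaration order; unknown values raise."""
        try:
            return self.members.index(value)
        except ValueError:
            raise ValueError(f"{value!r} is not a member of {self.members}") from None

    def greater_than(self, left, right):
        """Compare two members by declaration order."""
        return self.rank(left) > self.rank(right)


priority = EnumType(("LOW", "MEDIUM", "HIGH"))
priority.greater_than("HIGH", "LOW")  # compares by position, not alphabetically
```

A frozenset would make `EnumType(("LOW", "HIGH"))` and `EnumType(("HIGH", "LOW"))` indistinguishable, which would rule out ordered comparisons later. The tuple keeps them distinct while still being hashable.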

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    datatypes: Issues relating to ibis's datatypes (under `ibis.expr.datatypes`)
    feature: Features or general enhancements

    Type

    No type

    Projects

    • Status

      backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
