Description
Is your feature request related to a problem?
This would be quite a large change. I understand if you just want to squash the whole idea.
In both my postgres database and my motherduck databases that I am working with, I have some columns as enum datatypes. This helps with both performance and data integrity.
However, when I am using these databases with ibis, I lose a lot of the type safety.
eg if I have an enum in the database with values 'LOW', 'MEDIUM', and 'HIGH', then I can do (and accidentally have done) something like t.filter(t.priority == 'L') when I really meant 'LOW'. But this silently gives me the wrong result. It would be great if that errored at expression construction time. Not as good, but still an improvement, would be an error at execution time.
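To make the construction-time check concrete, here is a minimal sketch in plain Python. It is not ibis code: `validate_enum_literal` is a hypothetical helper, and where such a check would actually live inside ibis is an open question.

```python
import enum


class Priority(enum.Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


def validate_enum_literal(enum_cls, value):
    """Hypothetical helper: return the underlying value if `value` is a
    valid member (or member value) of `enum_cls`, else raise ValueError."""
    # Accept enum members directly, e.g. Priority.LOW.
    if isinstance(value, enum_cls):
        return value.value
    # Accept raw values that match a member exactly, e.g. "LOW".
    allowed = [member.value for member in enum_cls]
    if value in allowed:
        return value
    raise ValueError(
        f"{value!r} is not a valid {enum_cls.__name__}; expected one of {allowed}"
    )
```

With something like this, a comparison against 'L' would fail as soon as the expression is built, instead of silently matching no rows.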
The other motivation is for actually creating tables, eg if I do conn.create_table("my_table", schema) then it would be nice if it actually used the underlying enum type.
What is the motivation behind your request?
No response
Describe the solution you'd like
This might require emitting some DDL statements like CREATE TYPE IF NOT EXISTS autogenerated_enum_name_123 AS ENUM ('val1', 'val2', 'val3'). I think we already do this for struct types on postgres, so maybe this wouldn't be a huge lift.
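As a rough illustration of the DDL generation, the sketch below derives a deterministic type name from the enum values and renders the statement. Both function names and the naming scheme are made up for this example, not existing ibis internals. The IF NOT EXISTS clause is left out because support for it on CREATE TYPE may differ between backends.

```python
import hashlib


def enum_type_name(values):
    """Hypothetical naming scheme: a stable name derived from the ordered values."""
    joined = "\x00".join(values).encode("utf-8")
    digest = hashlib.sha1(joined).hexdigest()[:8]
    return f"autogenerated_enum_{digest}"


def create_enum_ddl(values):
    """Render a CREATE TYPE ... AS ENUM statement for the given values."""
    # Escape single quotes by doubling them, as in standard SQL string literals.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"CREATE TYPE {enum_type_name(values)} AS ENUM ({quoted})"


print(create_enum_ddl(["val1", "val2", "val3"]))
```

Hashing the ordered values means the same enum always maps to the same type name, so a backend could check whether the type already exists before creating it.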
These are some examples of what I'd want to be able to do:
import enum
import ibis
from ibis.expr import datatypes as dt
class Priority(enum.StrEnum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
t = ibis.table(schema={"priority": Priority})
t = ibis.table(schema={"priority": ibis.dtype(Priority)})
t = ibis.table(schema={"priority": dt.Enum(Priority)})
t = ibis.table(schema={"priority": "enum<LOW, MEDIUM, HIGH>"})
t.filter(t.priority == Priority.LOW) # This is the encouraged way because of better IDE completion and refactoring
t.filter(t.priority == "LOW")
t.filter(t.priority == "L") # error
t.priority.lower() # error, in general you can't treat them as strings
duckdb supports ordering enum values, which would allow us to do table.priority > 'LOW'. For now, I would say we should punt on actually implementing this. But if it's not too hard, we should leave the door open to implementing it later (eg by storing the enum members in order in the datatype instead of in, say, a frozenset).
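One way to keep that door open is sketched below: the members live in a tuple, so declaration order is preserved while the datatype stays hashable and immutable. `EnumType` and `position` are hypothetical names for illustration, not the proposed ibis API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumType:
    """Hypothetical datatype that stores enum members in declaration order."""

    members: tuple

    def __post_init__(self):
        # Duplicate members would make positions ambiguous.
        if len(set(self.members)) != len(self.members):
            raise ValueError("enum members must be unique")

    def position(self, value):
        """Return the ordinal of `value`; raises ValueError if it is not a member."""
        return self.members.index(value)


priority = EnumType(("LOW", "MEDIUM", "HIGH"))
# An ordered comparison could later be defined in terms of member positions.
assert priority.position("HIGH") > priority.position("LOW")
```

A frozenset would lose the order, and two enums with the same members in a different order would compare equal, which is not what an ordered backend type means.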
What version of ibis are you running?
main
What backend(s) are you using, if any?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct