Feature Request: Ducklake python API #419
Replies: 2 comments 2 replies
-
Totally need this feature!
-
I’d like to share another example that highlights why this feature is necessary. It took me a while to track this down:
→ Everything worked perfectly.
Second write: the next time I wrote to the same DuckLake table, the column order had changed slightly:
→ This failed because DuckDB tried to cast the values of the frame column to the type of the extracted_date column, which occupied that position in the table.
This happens because DuckDB aligns columns by position instead of name. For reference, Delta Lake supports column name-based matching: “When you write a DataFrame to a Delta table, Delta Lake primarily matches columns based on their names, not their ordinal position. If the column names in your DataFrame match the column names in the Delta table’s schema, Delta Lake will correctly map the data.”
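To make the difference concrete, here is a minimal pure-Python sketch of positional versus name-based alignment. It does not call DuckDB or DuckLake. Only frame and extracted_date come from my example above; the cik column, the sample values, and the align_by_name helper are made up for illustration.

```python
def align_by_name(rows, target_columns):
    """Reorder each row's values to follow the target table's column order."""
    aligned = []
    for row in rows:
        missing = [c for c in target_columns if c not in row]
        extra = [c for c in row if c not in target_columns]
        if missing or extra:
            raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
        aligned.append(tuple(row[c] for c in target_columns))
    return aligned


# Hypothetical table layout; only frame and extracted_date are from the post.
table_columns = ["cik", "extracted_date", "frame"]

# Second write: same columns, but frame now comes before extracted_date.
second_write = [
    {"cik": 320193, "frame": "CY2023Q4", "extracted_date": "2024-01-15"},
]

# Positional alignment: the frame value lands in the extracted_date slot.
positional = [tuple(row.values()) for row in second_write]

# Name-based alignment: every value lands in the column with the same name.
by_name = align_by_name(second_write, table_columns)
```

With positional alignment the string "CY2023Q4" ends up where a date is expected, which is the kind of cast failure I hit. DuckDB's SQL also has an INSERT INTO ... BY NAME form, but I have not confirmed whether it covers this DuckLake write path from the Python API.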
-
Why do you want this feature?
I’ve worked extensively with Parquet and Delta Lake over the past few years, and one of the biggest advantages I’ve found with Delta Lake is its native Python API and the schema flexibility it provides.
A great example is the schema_mode option in Delta Lake. It’s very intuitive: I can simply tell Delta Lake how to handle schema evolution and then focus on the real task of collecting and consolidating data, instead of spending time forcing the data into a rigid schema (which can be extremely painful when the schema is unknown, messy, or changes multiple times).
For context: in the attached example, I’m extracting messy data from the SEC’s EDGAR database, transforming it into a tabular format, and then writing it out. With DuckDB’s Python API today, I need to manually alter the table to fit the data. With Delta Lake, I just set schema_mode and it handles it seamlessly.
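As a rough sketch of the manual step I would like to avoid, the snippet below compares an existing table schema with the schema of incoming data and builds the ALTER TABLE statements needed to add the new columns. It is an assumption about what a merge-style schema mode would automate, not how Delta Lake or DuckLake implement it. The plan_schema_merge helper, the filings table name, and the column types are hypothetical.

```python
def plan_schema_merge(table_schema, incoming_schema, table_name):
    """Build ALTER TABLE statements for columns found only in the incoming data."""
    statements = []
    for column, sql_type in incoming_schema.items():
        if column not in table_schema:
            statements.append(
                f'ALTER TABLE {table_name} ADD COLUMN "{column}" {sql_type}'
            )
    return statements


# Hypothetical schemas: the incoming batch carries one column the table lacks.
table_schema = {"cik": "BIGINT", "extracted_date": "DATE"}
incoming_schema = {"cik": "BIGINT", "extracted_date": "DATE", "frame": "VARCHAR"}

statements = plan_schema_merge(table_schema, incoming_schema, "filings")
for statement in statements:
    print(statement)
```

Each generated statement would still have to be executed against the table before the write, and this sketch ignores type changes and dropped columns. Having a single option handle all of that is what I am asking for.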
I think adding this kind of capability would make DuckLake much more user-friendly for messy, real-world data ingestion scenarios.
I'd love to hear your thoughts!
schema-mode-example.docx