Skip to content

Type system & inlined-data parsing (merge with upstream overlap) #23

@zfarrell

Description

@zfarrell

Context

This is one ticket in a series carrying forward #12 foundation work. Read #12 first for repo context.

The fork expanded src/types.rs by ~1,000 LOC and added src/parse_values.rs to unify lenient (read-side) and strict (write-side) parsing of inlined values. Upstream has independently progressed (list/array support in PR datafusion-contrib#89, type alias normalization in datafusion-contrib#82, entity-name validation in datafusion-contrib#81) — so there is partial overlap here that requires careful merging, not blind porting.

Reference branch

ducklake-features/integration:

  • src/types.rs (+1001 lines vs upstream)
  • src/parse_values.rs — shared parser, ParseMode::Lenient vs Strict
  • Tests: inline #[cfg(test)] in parse_values.rs; cross-coverage in tests/cross_engine_*.rs and write tests

Scope

  1. Diff the fork's types.rs against the rebased base (post-Foundation: rebase integration onto upstream, drop pass-throughs, triage SLT failures #12). Bucket the changes into:
    • (a) functionality upstream now has independently — drop the fork's version
    • (b) functionality unique to the fork — port
    • (c) divergent implementations of the same idea — pick the better one (or merge), then drop the other
  2. Port parse_values.rs as-is — it's a clean extraction with no upstream conflict.
  3. Specific items confirmed by audit to exist in the fork but verify they're not already upstream:
    • VARCHAR(N) / CHAR(N) with explicit precision parsing
    • double precision synonym for DOUBLE
    • validate_no_trailing_after_paren check on parameterized types
    • Type promotion rules (is_type_promotion_allowed, used by ALTER TYPE in ALTER/DROP/CREATE schema evolution DDL #20)
    • Type normalization (case-folding) in is_type_promotion_allowed — R11-S-009 commit
    • NaN/Inf rejection in compaction delete_threshold range check — R11-S-010 (note: only relevant if compaction comes back later)
    • parse_decimal_string with checked arithmetic (R11-S-020)
    • i32::try_from for Date32 num_days to prevent truncation (R11-S-019)
    • checked_mul for timestamp unit conversions (R11-S-013)
    • saturating_add for null count accumulation (R11-S-014)
  4. normalize_value limitation: the fork documented a known type-confusion limitation (R11-S-032) — preserve the doc comment so future contributors don't re-discover it.

Acceptance criteria

  • cargo test passes including the parse_values inline tests and any cross-engine type round-trip tests
  • Type promotion rules cover all DuckLake-spec-allowed promotions (cross-check the spec table)
  • Decimal precision/scale round-trips through write → read without loss for at least: DECIMAL(38, 18), DECIMAL(10, 2), DECIMAL(1, 0)
  • Date32, Timestamp(s/ms/us/ns), and Time roundtrip without overflow
  • parse_values lenient mode degrades unknown types to UTF-8 (per fork's design); strict mode errors — both behaviors tested
  • No duckdb crate imports

Dependencies

Out of scope

Notes

  • The branch's docs/edge-case-findings.md is worth reading for the type-system gotchas the agent team identified.
  • The audit-flagged "documented limitation" in normalize_value is genuine and worth keeping — do not silently "fix" it without understanding why it's documented.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions