feat: per-variant constructors for data enums in the dynamic backends #148
Merged
Goldziher merged 7 commits on Jun 25, 2026
Emit one `#[staticmethod]` constructor per data-carrying struct variant of an
internally-tagged data enum, so callers write `EmbeddingModelType.preset("balanced")`
instead of building the value through the stringly-typed
`EmbeddingModelType(type="preset", ...)` form. The discriminator is carried by the
variant name, which is type-safe and discoverable.
Each constructor builds the core variant directly
(`Self { inner: <core_path>::Preset { name } }`) and reuses the existing
param / let-binding / call-arg machinery (and the `pyo3_factory_method.jinja`
template) for field conversion. Constructors always collide with the variant
accessor of the same snake_case name, so they use the `_factory_<name>` Rust ident
plus `#[pyo3(name = "<name>")]`.
Skips unit variants, tuple variants, and binding_excluded variants. A hand-written
`impl` method of the same name suppresses the generated constructor (consumer wins).
The mapper arg to `gen_pyo3_data_enum_with_mapper` now drives this; the dead
associated-function factory projection (only ever called with `None` in production,
and explicitly unwanted) is removed. The pyo3 backend now passes the real
`Pyo3Mapper`, so the constructors emit in generated output. The existing
`#[new]` dict/kwargs/string constructor stays as-is; the variant constructors are
additive.
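A minimal plain-Rust sketch of the shape described above, with the pyo3 attributes left out so it compiles without the crate. The `core_types` module, the `Option<&str>` accessor signature, and the `usize` type for `dimensions` are assumptions for illustration; only the variant names and the `_factory_<name>` convention come from the description.

```rust
// Stand-in for the core crate's enum (assumed field types).
mod core_types {
    #[derive(Debug, Clone, PartialEq)]
    pub enum EmbeddingModelType {
        Preset { name: String },
        Custom { model_id: String, dimensions: usize },
    }
}

// Binding-side wrapper around the core value (wrapper-convert model).
#[derive(Debug, Clone, PartialEq)]
pub struct EmbeddingModelType {
    inner: core_types::EmbeddingModelType,
}

impl EmbeddingModelType {
    // Hypothetical variant accessor: it already occupies the snake_case name
    // `preset`, which is why the constructor needs a different Rust ident.
    pub fn preset(&self) -> Option<&str> {
        match &self.inner {
            core_types::EmbeddingModelType::Preset { name } => Some(name.as_str()),
            _ => None,
        }
    }

    // Per-variant constructor: builds the core variant directly.
    pub fn _factory_preset(name: String) -> Self {
        Self { inner: core_types::EmbeddingModelType::Preset { name } }
    }

    // Same pattern for a multi-field variant.
    pub fn _factory_custom(model_id: String, dimensions: usize) -> Self {
        Self { inner: core_types::EmbeddingModelType::Custom { model_id, dimensions } }
    }
}
```

In generated output the `_factory_*` functions would additionally carry `#[staticmethod]` and `#[pyo3(name = "<name>")]`, so Python sees `preset` / `custom` while the Rust idents stay distinct from the accessor.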
Build the `(field, converted_expr)` pairs for the per-variant constructor struct literal directly from per-param expression vectors, instead of joining the exprs with `gen_call_args` and re-splitting the comma-joined string. The re-split could misalign field→expr if a converted expression ever carried a top-level comma or tripped the `<`/`>` depth tracking. Add `gen_call_args_vec` and `gen_call_args_with_let_bindings_json_str_vec` returning `Vec<String>`; the existing joined helpers delegate to them so there is one source of truth. Delete the now-unused `split_top_level_args`.
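The misalignment risk can be sketched as follows. `split_top_level` is a hypothetical stand-in for a depth-tracking comma splitter (it is not the deleted `split_top_level_args`), and the `if a < b` expression is an invented example of a converted expression whose unmatched `<` trips the depth counter. `pair_fields` shows the direct pairing from a per-param vector.

```rust
/// Hypothetical re-splitter: splits on commas at bracket depth zero,
/// counting `<` / `>` as brackets.
fn split_top_level(joined: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut depth: i32 = 0;
    let mut current = String::new();
    for ch in joined.chars() {
        match ch {
            '(' | '[' | '{' | '<' => {
                depth += 1;
                current.push(ch);
            }
            ')' | ']' | '}' | '>' => {
                depth -= 1;
                current.push(ch);
            }
            ',' if depth == 0 => {
                parts.push(current.trim().to_string());
                current.clear();
            }
            _ => current.push(ch),
        }
    }
    if !current.trim().is_empty() {
        parts.push(current.trim().to_string());
    }
    parts
}

/// Direct pairing: one expression per field, no string round trip.
fn pair_fields(fields: &[&str], exprs: &[String]) -> Vec<(String, String)> {
    fields
        .iter()
        .zip(exprs)
        .map(|(field, expr)| (field.to_string(), expr.clone()))
        .collect()
}
```

With two expressions where the first contains a less-than comparison, joining and re-splitting yields a single part, while pairing from the vector keeps both fields aligned with their expressions.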
…t-variant wrap Drop the opt-in `#[alef(string_shorthand(variant, field))]` data-variant bare-string shorthand. Per-variant constructors supersede it: they cover every data variant and keep the discriminator type-safe instead of stringly-typed. Removed: the `StringShorthand` IR type and `EnumDef::string_shorthand` field; `extract_string_shorthand`; `resolve_string_shorthand`; the `string_shorthand_diagnostics` / `StringShorthandInvalid` validation path; and the `shorthand_wire_variant`/`shorthand_field` template context for pyo3 and magnus. The internally-tagged UNIT-variant bare-string wrap (xberg-io#132) stays: pyo3 still emits `{"<tag>": s}` and magnus still emits the `{"<tag>": json_str}` TryConvert fallback. The pyo3/magnus templates collapse to the plain `serde_tag` branch, and regression tests assert the xberg-io#132 wrap survives in both backends.
Emit one singleton (class) constructor per data-carrying struct variant of a
magnus data enum, so Ruby callers write `Shape.circle(radius)` /
`Shape.rect(width, height)` instead of building a raw `{ "type" => "circle", ... }`
Hash. Each constructor builds the serde-shaped variant directly
(`Self::Circle { radius }`); the magnus data enum is binding-shaped, so the
parameters use the same types the generated enum declares and no core conversion
is needed.
The Rust function is `_factory_<name>` (registered under the bare snake_case name
via `define_singleton_method`) to avoid colliding with the variant accessor of
the same name. Data enums that gain constructors are now registered as a Ruby
class in `ruby_init`; enums with no qualifying struct variant stay unregistered
and keep round-tripping purely through serde IntoValue/TryConvert.
Unit, tuple, and `binding_excluded` variants are skipped, and a hand-written
`impl` method of the same name suppresses the generated constructor (consumer
wins). Variant selection is shared with the pyo3 path: `collect_variant_constructors`
and `VariantConstructor` in `src/codegen/generators/enums.rs` are lifted to
`pub(crate)` and re-exported (crate-internal) from the generators module as the
second consumer. The internally-tagged unit-variant bare-string fallback
(`{"<tag>": s}`) is untouched.
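The selection rules shared between backends can be sketched as below. This is not the signature of the real `collect_variant_constructors`; `VariantShape`, `VariantDef`, `select_constructors`, and `to_snake_case` are hypothetical simplifications that only encode the rules stated above (struct variants only, `binding_excluded` skipped, hand-written methods win).

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of one enum variant.
#[derive(Debug, Clone)]
pub enum VariantShape {
    Unit,
    Tuple(usize),
    Struct(Vec<String>),
}

#[derive(Debug, Clone)]
pub struct VariantDef {
    pub name: String,
    pub shape: VariantShape,
    pub binding_excluded: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct VariantConstructor {
    pub host_name: String,  // name the host language sees, e.g. "circle"
    pub rust_ident: String, // collision-free Rust ident, e.g. "_factory_circle"
    pub fields: Vec<String>,
}

/// Naive PascalCase -> snake_case conversion, enough for this sketch.
fn to_snake_case(name: &str) -> String {
    let mut out = String::new();
    for (i, ch) in name.chars().enumerate() {
        if ch.is_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.extend(ch.to_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

pub fn select_constructors(
    variants: &[VariantDef],
    hand_written: &HashSet<String>,
) -> Vec<VariantConstructor> {
    variants
        .iter()
        .filter_map(|v| {
            if v.binding_excluded {
                return None; // excluded from the binding entirely
            }
            let fields = match &v.shape {
                VariantShape::Struct(fields) if !fields.is_empty() => fields.clone(),
                _ => return None, // unit and tuple variants are skipped
            };
            let host_name = to_snake_case(&v.name);
            if hand_written.contains(&host_name) {
                return None; // consumer wins
            }
            Some(VariantConstructor {
                rust_ident: format!("_factory_{host_name}"),
                host_name,
                fields,
            })
        })
        .collect()
}
```

Keeping this step backend-neutral means each backend only decides how to render a `VariantConstructor`, not which variants qualify.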
A tagged data enum lowered to a flat PHP class now exposes a static method per data-carrying struct variant, so PHP callers write `Shape::circle($radius)` instead of hand-building a JSON blob for `from_json`. Each method sets the discriminator tag and the variant's flat field(s) directly, reusing the same flat-field naming, tag value, and param->field conversion the core->binding `From` impl uses; `..Default::default()` covers the remaining optional fields and is omitted when the variant covers every flat field. The Rust fn is `_factory_<snake>` (exposed to PHP under the camelCase form of the snake_case name) to avoid colliding with the `get_<field>` accessor. Unit, tuple, and `binding_excluded` variants are skipped, and a hand-written `impl` method of the same name suppresses the generated constructor. Reuses `collect_variant_constructors` shared with the pyo3/magnus paths.
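A sketch of the flat-class lowering, assuming a `Shape` enum with `Circle { radius }` and `Rect { width, height }` variants. The field name `tag`, the tag values, and the `f64` field types are illustrative assumptions, and the PHP binding attributes are omitted.

```rust
/// Flat representation: one discriminator plus every variant's fields as optionals.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Shape {
    pub tag: String,
    pub radius: Option<f64>,
    pub width: Option<f64>,
    pub height: Option<f64>,
}

impl Shape {
    // Sets the tag and this variant's field; the remaining optionals stay `None`.
    pub fn _factory_circle(radius: f64) -> Self {
        Self {
            tag: "circle".to_string(),
            radius: Some(radius),
            ..Default::default()
        }
    }

    // Two-field variant, same pattern.
    pub fn _factory_rect(width: f64, height: f64) -> Self {
        Self {
            tag: "rect".to_string(),
            width: Some(width),
            height: Some(height),
            ..Default::default()
        }
    }
}
```

A variant that set all of `radius`, `width`, and `height` would have nothing left for `..Default::default()` to fill, which is the case where the description says it is omitted.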
A tagged data enum with struct variants (the JSON-passthrough shape) now exposes a constructor per data-carrying variant on its R class env, so R callers write `EmbeddingModelType$preset(name)` alongside the existing `$default()` / `$from_json()` instead of hand-rolling a JSON string. Each constructor builds the core variant directly and `.into()`s it into the JSON-passthrough wrapper (wrapper-convert model). DTO fields convert via `<field>_core` let bindings and extendr-remapped numerics are cast back to the core type. The Rust fn is `_factory_<snake>`; the R wrapper binds it under the bare snake name. Unit, tuple, and `binding_excluded` variants are skipped, and a hand-written `impl` method of the same name suppresses the generated constructor. Reuses `collect_variant_constructors` shared with the pyo3/magnus/php paths, and adds the reusable `gen_call_args_with_let_bindings_json_str_cast_vec` per-param helper for numeric-remapping backends.
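A dependency-free sketch of the build-then-`.into()` flow with a numeric cast. Several details are assumptions made for illustration: the wrapper holding a `json: String`, the hand-rolled serialization (real code would presumably go through a serializer), the lowercase tag values, and `f64` as the remapped parameter type for a `usize` core field.

```rust
// Stand-in for the core enum (assumed field types).
mod core_types {
    pub enum EmbeddingModelType {
        Preset { name: String },
        Custom { model_id: String, dimensions: usize },
    }
}

/// JSON-passthrough wrapper: the binding only holds the serialized form.
#[derive(Debug, Clone, PartialEq)]
pub struct EmbeddingModelType {
    pub json: String,
}

impl From<core_types::EmbeddingModelType> for EmbeddingModelType {
    fn from(value: core_types::EmbeddingModelType) -> Self {
        // `{:?}` on a `String` yields a quoted literal, which is valid JSON for
        // plain ASCII text; good enough for a sketch, not for production.
        let json = match value {
            core_types::EmbeddingModelType::Preset { name } => {
                format!(r#"{{"type":"preset","name":{name:?}}}"#)
            }
            core_types::EmbeddingModelType::Custom { model_id, dimensions } => {
                format!(r#"{{"type":"custom","model_id":{model_id:?},"dimensions":{dimensions}}}"#)
            }
        };
        Self { json }
    }
}

impl EmbeddingModelType {
    // Build the core variant, then convert into the wrapper.
    pub fn _factory_preset(name: String) -> Self {
        core_types::EmbeddingModelType::Preset { name }.into()
    }

    // The host-facing parameter is a remapped numeric (assumed `f64` here),
    // cast back to the core field type before the variant is built.
    pub fn _factory_custom(model_id: String, dimensions: f64) -> Self {
        core_types::EmbeddingModelType::Custom {
            model_id,
            dimensions: dimensions as usize,
        }
        .into()
    }
}
```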
A tagged data enum with struct variants (the `NifTaggedEnum` shape) now exposes a
constructor per data-carrying variant in its generated Elixir module, so callers
write `Shape.circle(radius)` instead of hand-building the tagged tuple. Each
`def <snake>(<params>), do: {:<atom>, %{<field>: <param>, ...}}` builds the
`{:variant, %{field: value}}` form the `NifTaggedEnum` decoder consumes (the
plain-direct model: no NIF, no core conversion, matching what the existing
`encode_<snake>` param encoder accepts).
Reserved-word variant/param names are guarded via `elixir_safe_param_name` /
`elixir_safe_atom`. Unit, tuple, and `binding_excluded` variants are skipped, and a
hand-written `impl` method of the same name suppresses the generated constructor.
Reuses `collect_variant_constructors` shared with the pyo3/magnus/php/extendr
paths.
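Since this backend emits Elixir source rather than Rust glue, the generator side can be sketched as a string renderer. `render_elixir_constructor` is a hypothetical helper, not a function from the PR, and it leaves out the reserved-word guarding that `elixir_safe_param_name` / `elixir_safe_atom` provide.

```rust
/// Renders one Elixir constructor of the form
/// `def <snake>(<params>), do: {:<atom>, %{<field>: <param>, ...}}`.
/// Assumes each parameter is named after its field.
pub fn render_elixir_constructor(snake: &str, atom: &str, fields: &[&str]) -> String {
    let params = fields.join(", ");
    let map_entries = fields
        .iter()
        .map(|field| format!("{field}: {field}"))
        .collect::<Vec<_>>()
        .join(", ");
    // Doubled braces are literal braces in `format!`.
    format!("def {snake}({params}), do: {{:{atom}, %{{{map_entries}}}}}")
}
```

For a `Circle { radius }` variant this renders `def circle(radius), do: {:circle, %{radius: radius}}`, matching the tagged-tuple form described above.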
Goldziher added a commit that referenced this pull request on Jun 25, 2026
…kends Adds a per-variant factory constructor for every data-carrying variant of an internally-tagged data enum across the dynamic backends (pyo3, magnus, php, extendr, rustler), bringing them to parity with the statically-typed backends, and removes the superseded `string_shorthand` mechanism (#135). Closes #147. Reviewed: fmt, clippy, and the full test suite pass on the merge result.
Merged into Two trivial adjustments during merge (neither touches the PR's backend logic):
Thanks @tobocop2 — the draft body was stale; the work was complete across pyo3/magnus/php/extendr/rustler with the |
Goldziher added a commit that referenced this pull request on Jun 26, 2026
JetBrains Runtime's Panama linker casts every FunctionDescriptor layout to OfLong internally, so any sub-64-bit integer layout (JAVA_BYTE/SHORT/INT) threw `ClassCastException: OfIntImpl cannot be cast to OfLong` at NativeLib class load and corrupted TreeCursor FFM calls (tree-sitter-language-pack#146, #148). Promote bool, 8/16/32-bit ints, and enum discriminants to JAVA_LONG across java_ffi_type, service_api, the enum-discriminant layout, the LAST_ERROR_CODE descriptor, and the visitor/trait-bridge/registration callback descriptors. java_ffi_return_cast now emits compound narrowing casts ((int)(long), (short)(long), (byte)(long)) and the primitive-result templates no longer double-wrap them. Generated FunctionDescriptors contain zero sub-64-bit integer layouts; verified via the regenerated tree-sitter-language-pack bindings (mvn verify passes).
Closes #147.
What
Generate a per-variant factory constructor for every data-carrying variant of an internally-tagged data enum, in the dynamic backends. The statically-typed backends already derive per-variant constructors from the variant structure; this brings the dynamic backends to the same model instead of leaving them on stringly-typed `new(type=...)` / raw maps.
Why
For the same enum, construction diverged purely by target. Take `EmbeddingModelType` (`Preset { name }`, `Custom { model_id, dimensions }`, `Llm { llm }`, `Plugin { name }`):
- Static backends: already type-safe; the variant name is the constructor.
- Dynamic backends, before: stringly-typed, with no per-variant constructor (for example `EmbeddingModelType(type="preset", ...)` in Python).
- Dynamic backends, after: per-variant constructors, at parity with the static backends (for example `EmbeddingModelType.preset("balanced")` in Python).
Type-safe (no magic `"preset"` string), discoverable via autocomplete, and complete: it covers `Custom` / `Llm` / `Plugin`, which the data-variant bare-string shorthand never addressed.
`string_shorthand` was merged in #133 to address #135, but it isn't a good design and is reverted here. A bare string is stringly-typed, only ever worked for the single-field `Preset` case, and only on pyo3 and magnus; it doesn't extend to `Custom` / `Llm` / `Plugin` or to the other backends. This PR addresses the same goal properly: every variant gets a typed per-variant constructor, the same way across all backends. #135 is superseded.
The unit-variant string handling from #132 is preserved and unaffected: a fieldless variant name (`"disabled"`) is a fine string, and per-variant constructors don't replace it.
How
- Each constructor builds the core variant directly (`Self { inner: <core>::<Variant> { field, .. } }`), reusing the existing param / let-binding / conversion machinery.
- Collisions with the variant accessor are avoided with a `_factory_<name>` Rust ident + the host-facing `<name>` (pyo3 `#[pyo3(name=...)]`, etc.), per backend.
- A hand-written `impl` constructor of the same name suppresses synthesis (consumer wins). Unit, tuple, and `binding_excluded` variants are skipped.
Verification
Regenerated kreuzberg against this branch and checked every impacted binding:
- `EmbeddingModelType` / `RerankerModelType` (and other data enums): `EmbeddingModelType.preset(...)` / `.custom(...)` / `.llm(...)` / `.plugin(...)` in Python, `EmbeddingModelType.preset(...)` in Ruby, `::preset(...)` in PHP, `EmbeddingModelType$preset(...)` in R, `EmbeddingModelType.preset(...)` in Elixir.
- `cargo check` passes on the regenerated `kreuzberg-py`, `kreuzberg-php`, and `kreuzberg-node`. This surfaced (and the fix commits resolve) several real-world field-type cases the neutral unit fixtures missed: variants with sanitized / binding-excluded fields are skipped (they can't be built from the binding), promoted-optional params unwrap to non-optional core fields, return-only DTOs get a generated `From` impl, and field conversions are inlined so no non-re-exported core type path is named.
- The unit-variant bare-string handling from #132 (the `{"<tag>": s}` wrap) is preserved across pyo3 and magnus; `string_shorthand` is fully removed with no dangling references.
The shared `collect_variant_constructors` (variant selection) and `variant_field_init` (field conversion) are the single source of truth for the wrapper-convert backends; magnus builds the binding enum directly; rustler emits pure-Elixir constructors. Whole-crate `cargo test`, `clippy -D warnings`, and `fmt` are clean.