Skip to content

Commit 33c6a9f

Browse files
authored
fix(codegen): construct private-field core types via Default-seeded builder (#160)
* fix(codegen): construct private-field core types via Default-seeded builder Core structs with private (`pub(crate)`) fields cannot be built with struct-literal syntax from a foreign crate — neither by naming the private field (E0451) nor by patching it with `..Default::default()`. The generated binding->core `From` impls for such types failed to compile against cores that moved internal fields behind `pub(crate)` (e.g. extraction/OCR result types whose `internal_document` fields became private). Record a `has_private_fields` flag on struct IR during extraction, and centralize the construction strategy in a shared helper (`conversions::construction`): seed the core type's `Default` — built inside the defining crate, so it fills the private fields — and assign only the public fields onto it. When the core type has private fields but no `Default`, emit a `compile_error!` guiding the author to derive it. The helper is used by the shared pyo3/napi/wasm/extendr/rustler/magnus generator and the Dart mirror-crate generator, so every backend that builds core structs stays in lockstep. Also repair two pyo3 trait-bridge regressions surfaced by the same core migration: - Marshal owned (by-value) native-struct callback params into the host's native object via `From<core::T>`, as borrowed params already were. A trait method taking a serde struct by value (e.g. a URI-based extraction-input envelope) passed the raw `core::T` across the Python boundary, which has no `IntoPyObject` and failed to compile. - When a core `register_*` free function shares its name with a trait bridge's `register_fn`, emit only the bridge's duck-typed registration. The function loop also emitted the auto-wrapped core version, which collided (E0428) and no longer type-checks against a registry that takes `Arc<dyn Trait>`. Adds extractor and conversion-strategy unit tests; existing snapshots are unchanged since no neutral fixture has private fields. * fix(php): construct private-field core types and marshal owned bridge params Extends the private-field construction fix to PHP's ext-php-rs paths, the last backend still failing against the migrated core: - The enum-tainted `From<Binding>` emitter (`gen_enum_tainted_from_binding_to_core`) collected its conversions into a struct literal, which cannot build a core type with private fields. It now collects `(field, expr)` pairs and delegates to the shared `conversions::construction` helper — the same `Default`-seeded builder / `compile_error!` strategy the other backends use — for private-field types, keeping the struct literal for fully-public types. - Owned (by-value) native-struct callback params were dereferenced as if borrowed (`{Class}::from((*input).clone())`), which does not type-check on an owned `core::T` (`E0614`). Owned params now construct from the value directly. - The native-object return fast-path emitted `<Binding as FromZval>::from_zval(&val)`, but a PHP `#[php_class]` binding struct implements `FromZvalMut` (for `&mut T`), not `FromZval` (for `T`). The bridge now keeps the JSON/string return path for every type — the well-defined path for PHP — rather than a native fast-path that cannot compile. All seven generated binding crates (py/node/wasm/dart/ffi/swift/php) now compile against xberg 1.0.0-rc.1. * perf(trait-bridge): move owned native-struct params instead of cloning The bridge marshalling cloned the owned core value when handing it to the host (`T::from(owned.clone())`). The param — e.g. a by-value `ExtractInput` envelope carrying document bytes — is consumed exactly once, so it is moved in instead. Borrowed params still clone out of the `&`, which is unavoidable. Applies to the pyo3 (sync + async) and PHP bridge generators. * fix(pyo3): serialize dict/list JSON config fields in api.py converters PyO3 cannot expose a settable serde_json::Value field (unlike NAPI's json_as_value), so JSON config fields are stored as String in the binding while the public dataclass and .pyi stub type them as dict[str, Any]. The generated api.py converter forwarded the dict straight to _rust, raising 'TypeError: dict is not str' at runtime. Serialize a dict/list via json.dumps in the converter; pass str/None through unchanged. * fix(pyo3): enlarge async-runtime worker stack to prevent deep-future SIGBUS pyo3-async-runtimes' default multi-thread runtime gives worker threads a small (~2 MB) stack. A deep consumer future — e.g. a multi-stage PDF/OCR pipeline — overflows it and the process aborts with SIGBUS (EXC_BAD_ACCESS, KERN_PROTECTION_FAILURE on the worker stack guard page). The generated #[pymodule] init now installs a tokio runtime with a 16 MB thread_stack_size via pyo3_async_runtimes::tokio::init before the first future_into_py. Reproduced and verified fixed against a real consumer: an image-only PDF + tesseract OCR aborted with exit 138 before, runs to completion after. * fix(napi): enlarge streaming WORKER_POOL worker stack to prevent deep-future SIGBUS Mirrors the pyo3 worker-stack fix for node: the generated streaming WORKER_POOL tokio runtime used the default (~2 MB) worker stack, which a deep consumer future (e.g. an OCR pipeline) overflows, aborting the process with SIGBUS. Build it with a 16 MB thread_stack_size.
1 parent e31b5f1 commit 33c6a9f

87 files changed

Lines changed: 882 additions & 226 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

CHANGELOG.md

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,41 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
2828

2929
### Fixed
3030

31+
- **napi**: give the generated streaming `WORKER_POOL` tokio runtime a 16 MB worker stack, so a
32+
deep consumer future does not overflow the default (~2 MB) worker stack and abort with `SIGBUS`.
33+
- **pyo3**: provision an enlarged worker-thread stack on the generated module's async runtime.
34+
pyo3-async-runtimes' default multi-thread runtime gives workers a small (~2 MB) stack, which a
35+
deep consumer future (e.g. a multi-stage OCR pipeline) overflows — aborting the whole process
36+
with `SIGBUS`. The `#[pymodule]` init now installs a `tokio` runtime with a 16 MB
37+
`thread_stack_size` before the first `future_into_py`.
38+
- **pyo3**: serialize `dict`/`list` values for JSON (`serde_json::Value`) config fields in the
39+
generated `api.py` converters. PyO3 cannot expose a settable `serde_json::Value` field, so the
40+
binding stores such fields as `str`, while the public dataclass and `.pyi` stub type them as
41+
`dict[str, Any]`. The converter forwarded the dict straight through, so the documented dict form
42+
raised `TypeError: 'dict' object is not an instance of 'str'` at runtime; it now `json.dumps`es a
43+
dict/list (passing `str`/`None` through unchanged).
44+
- **codegen**: generate compiling binding→core conversions for core structs that have private
45+
(`pub(crate)`) fields. Such a struct cannot be built with struct-literal syntax from a foreign
46+
crate — neither by naming the private field nor by patching it with `..Default::default()` — so
47+
the conversion now seeds the core type's `Default` (which fills the private fields inside the
48+
defining crate) and assigns only the public fields onto it. The strategy is centralized in a
49+
shared helper used by the pyo3/napi/wasm/extendr/rustler/magnus generator, the Dart mirror crate
50+
generator, and the PHP enum-tainted conversion path; when the core type has private fields but no
51+
`Default`, a `compile_error!` guides the author to derive `Default`. A new `has_private_fields`
52+
flag on struct IR records the condition during extraction.
53+
- **php**: marshal owned (by-value) native-struct callback parameters by value rather than
54+
dereferencing them as a borrow (`(*input)` does not type-check on an owned `core::T`), and stop
55+
emitting the native-object return fast-path — a PHP `#[php_class]` binding struct implements
56+
`FromZvalMut` (for `&mut T`) but not `FromZval` (for `T`), so the bridge keeps the JSON return
57+
path that is well-defined for PHP.
58+
- **pyo3**: marshal owned (by-value) native-struct callback parameters into the host's native
59+
binding object via `From<core::T>`, the same way borrowed ones already were. A trait method that
60+
takes a serde struct by value (e.g. an extraction-input envelope) previously passed the raw
61+
`core::T` across the Python boundary, which has no `IntoPyObject` and failed to compile.
62+
- **pyo3**: when a core `register_*` free function shares its name with a trait bridge's
63+
`register_fn`, emit only the bridge's duck-typed registration. The function loop no longer also
64+
emits the auto-wrapped core version, which collided (`E0428`) with the bridge definition and no
65+
longer type-checks against a registry that takes `Arc<dyn Trait>`.
3166
- **pyo3**: the generated Python package now type-checks clean under `mypy`. Data-enum config fields
3267
are annotated against their public class (so `EmbeddingConfig(model=EmbeddingModelType.plugin(...))`
3368
is accepted) instead of a flattened union alias that shadowed the class; constructors accept the

src/backends/csharp/gen_bindings/types/tests.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ fn record_type(fields: Vec<FieldDef>) -> TypeDef {
4949
binding_exclusion_reason: None,
5050
is_variant_wrapper: false,
5151
has_lifetime_params: false,
52+
has_private_fields: false,
5253
version: Default::default(),
5354
}
5455
}

src/backends/csharp/gen_visitor.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -662,6 +662,7 @@ mod tests {
662662
binding_exclusion_reason: None,
663663
is_variant_wrapper: false,
664664
has_lifetime_params: false,
665+
has_private_fields: false,
665666
version: Default::default(),
666667
}
667668
}

src/backends/csharp/trait_bridge/tests.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ fn make_trait_def(name: &str) -> TypeDef {
2323
binding_exclusion_reason: None,
2424
is_variant_wrapper: false,
2525
has_lifetime_params: false,
26+
has_private_fields: false,
2627
version: Default::default(),
2728
}
2829
}

src/backends/dart/gen_bindings/service_api/tests.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -342,6 +342,7 @@ fn test_skip_sanitized_finalize() {
342342
binding_exclusion_reason: None,
343343
is_variant_wrapper: false,
344344
has_lifetime_params: false,
345+
has_private_fields: false,
345346
version: Default::default(),
346347
};
347348

src/backends/dart/gen_bindings/types.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -276,6 +276,7 @@ mod tests {
276276
binding_exclusion_reason: None,
277277
is_variant_wrapper: false,
278278
has_lifetime_params: false,
279+
has_private_fields: false,
279280
version: Default::default(),
280281
}
281282
}

src/backends/dart/gen_rust_crate/mirror_conversions.rs

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -465,6 +465,40 @@ pub(super) fn emit_from_mirror_to_core_struct(out: &mut String, ty: &TypeDef, so
465465
ty.rust_path.replace('-', "_")
466466
};
467467

468+
// Core types with private (non-`pub`) fields cannot be built with struct-literal syntax
469+
// from the mirror crate. Seed the core `Default` (which fills the private fields) and
470+
// assign the public fields onto it — via the shared construction strategy used by every
471+
// backend. Excluded / lossily-sanitized fields are left at their default on the base.
472+
if ty.has_private_fields {
473+
let mut assignments = Vec::new();
474+
for field in &ty.fields {
475+
if field.binding_excluded {
476+
continue;
477+
}
478+
let safe_sanitized_string = matches!(field.ty, TypeRef::String) && field.core_wrapper == CoreWrapper::Cow;
479+
if field.sanitized && !safe_sanitized_string {
480+
continue;
481+
}
482+
assignments.push(crate::codegen::conversions::construction::FieldAssign {
483+
core_field: field.name.clone(),
484+
expr: field_from_expr_to_core(field, source_crate_name),
485+
});
486+
}
487+
out.push_str(&crate::codegen::conversions::construction::gen_private_field_from_impl(
488+
&crate::codegen::conversions::construction::PrivateFieldImpl {
489+
core_path: &core_ty,
490+
binding_name: name,
491+
param: "v",
492+
has_default: ty.has_default,
493+
assignments: &assignments,
494+
allow_attrs: &[
495+
"clippy::field_reassign_with_default, clippy::let_and_return, clippy::useless_conversion",
496+
],
497+
},
498+
));
499+
return;
500+
}
501+
468502
// The generated literal ends with `..Default::default()` whenever some core
469503
// fields are intentionally omitted from the explicit field list AND the core
470504
// type derives Default (the spread itself requires Default — otherwise E0277).

src/backends/dart/gen_rust_crate/trait_bridge/tests.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@ fn empty_type_def(name: &str, is_trait: bool) -> TypeDef {
2525
binding_exclusion_reason: None,
2626
is_variant_wrapper: false,
2727
has_lifetime_params: false,
28+
has_private_fields: false,
2829
version: Default::default(),
2930
}
3031
}

src/backends/extendr/gen_bindings/tests.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,7 @@ fn make_api_surface() -> ApiSurface {
7575
is_variant_wrapper: false,
7676

7777
has_lifetime_params: false,
78+
has_private_fields: false,
7879
version: Default::default(),
7980
}],
8081
functions: vec![FunctionDef {

src/backends/extendr/gen_bindings/tests/trait_bridge.rs

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -162,6 +162,7 @@ fn regression_namespace_exports_functions_types_enums() {
162162
is_variant_wrapper: false,
163163

164164
has_lifetime_params: false,
165+
has_private_fields: false,
165166
version: Default::default(),
166167
});
167168
// Add a flat data enum (has variant with data, single field)
@@ -300,6 +301,7 @@ clear_fn = "clear_greeters"
300301
binding_exclusion_reason: None,
301302
is_variant_wrapper: false,
302303
has_lifetime_params: false,
304+
has_private_fields: false,
303305
version: Default::default(),
304306
};
305307
let doc_ty = TypeDef {
@@ -389,6 +391,7 @@ fn r_field_long_descriptions_are_truncated_to_fit_120_char_lines() {
389391
is_variant_wrapper: false,
390392

391393
has_lifetime_params: false,
394+
has_private_fields: false,
392395
version: Default::default(),
393396
}],
394397
functions: vec![],

0 commit comments

Comments
 (0)