Skip to content

Commit 6cde50e

Browse files
authored
fix: type-safe generated bindings — exhaustive type-check sweep across languages
* fix(pyo3): type data-enum config fields as the class, not a flattened union alias options.py typed config fields that reference a data enum (e.g. EmbeddingConfig.model) against a locally-synthesized flattened union alias (EmbeddingModelType = str | int | LlmConfig) instead of the data-enum class users actually construct. The exported symbol and the field type diverged, so the documented usage EmbeddingConfig(model=EmbeddingModelType.plugin("lilbee")) failed mypy even though it runs fine, forcing consumers to carry # type: ignore[arg-type]. Data enums now import from the native module at runtime (joining the existing native enum import block) and are referenced by their class name in field annotations. Enums with a unit (tag-only) variant additionally accept a bare string tag (<Class> | str) so string defaults like output_format="native" still type-check; payload-only enums such as EmbeddingModelType are class-only. The flattened union aliases and their now-unused templates are removed. Closes #157 * fix(pyo3): accept the public config dataclass in data-enum factory stubs The data-enum factory staticmethods (e.g. EmbeddingModelType.llm) typed their config-DTO param against the compiled _xberg class, but the public name users pass is the options.py @DataClass. So EmbeddingModelType.llm(LlmConfig(...)) — the documented #1165 usage — failed mypy ("expected xberg._xberg.LlmConfig, got xberg.options.LlmConfig") even though the runtime coerces it. A dataclass-backed config-DTO factory param is now typed `options.<Name> | dict[str, Any]` — the public dataclass or a dict, matching what the runtime __alef_coerce_dto accepts. The classifier is the existing coercible_dto_names (shared with the variant-constructor coercion), so the typed surface and the runtime coercion stay in lockstep; non-DTO params map exactly as before. The .pyi gains a stub-only `from . import options` (never executed, so no runtime cycle with options.py). With this and the preceding commit, EmbeddingConfig(model=EmbeddingModelType.llm(LlmConfig(...))) type-checks end to end. * refactor(pyo3): split types.rs unit tests into a sibling module The preceding field-typing change pushed gen_bindings/types.rs past the 1,000-line cap. Move its `mod tests` into `types/tests.rs` (mirroring `gen_stubs/enums/tests.rs`), bringing the file back to 800 lines. Pure move — generated output is byte-identical. * test(pyo3): update options.py data-enum test to the class-import behavior The integration test asserted the old design (data enums emitted as a `StructureKind = str` flattened alias, not runtime-imported). The field-typing change inverts that — data enums are imported from the native module as their class and referenced by name. Update the assertions (and rename the test) to the new behavior. Caught by CI's full `cargo test`, which runs the `tests/` integration suite that `cargo test --lib` skips. * fix(pyo3): make generated Python stubs/api type-check (stub + import fixes) Five generator fixes that clear mypy errors in the generated Python package (59 -> 34), none caught by e2e today (which runs pytest/ruff but not mypy): - Data-enum stub gains the `__init__(value, **kwargs)` the runtime `#[new]` actually exposes (gated on the same not-sanitized condition), so converters constructing `OutputFormat(value)` type-check instead of "Too many arguments". - Protocol stubs substitute excluded types (e.g. `InternalDocument`) with their JSON form, matching the go/zig/gleam backends and the runtime bridge — fixes undefined-name in the `.pyi` (the same leak also breaks the napi `.d.ts`). - Trait-bridge `clear_*` is no longer re-declared when it is already a plain registry function (`no-redef`). - TypedDict-style return types are only marked "local" when `has_default` (matching the emission loop), so a return-only class referenced in a TypedDict field (e.g. `ExtractionConfidence`) is imported instead of left undefined. - The dict-coercion helper uses a distinct loop variable for the data-enum pass so its classes are not flagged as an incompatible reassignment. * fix(pyo3): register-wrapper Protocol typing + JSON fields as dict - The api.py `register_*` pass-through wrappers typed `backend: object`, but the native `register_*` (and the `.pyi` stub) expect the host-implementable Protocol — so the wrapper failed to type-check. Type `backend` against the bridge's trait Protocol when resolvable. - `TypeRef::Json` mapped to `str` in the options module but `dict[str, Any]` in the native module, so JSON-valued config fields (`paddle_ocr_config`, `additional`, …) disagreed with the compiled config they convert into. Map Json to `dict[str, Any]` (default None) to match. * fix(pyo3): share constructor param resolution + guard list data-enum coercion - The `.pyi` `__init__` stub resolved param names via rename_fields only, while the runtime `#[new]` (and the converter) deliberately prefer the serde-rename wire name for cross-binding API parity — so the stub drifted (`max_characters` vs the constructor's `max_chars`). Extract the constructor's `resolve_param_ident` to a shared function and use it for the stub too, so the two share one source of truth and cannot diverge. - The Vec-of-data-enum coercion built each element with `_rust.Enum(v)` unconditionally; add the same isinstance guard the scalar path uses so an already-native element is passed through rather than re-wrapped (which a type checker rejects, since the element is `Enum | str`). * fix(pyo3): skip cfg-gated fields in api.py converters A cfg-gated field is conditionally compiled out of the native `#[new]` constructor and omitted from the `.pyi` stub (which cannot express `#[cfg]`), but the converter passed it as a keyword argument regardless — an unknown-kwarg error against the compiled module. Filter `f.cfg.is_none()` in the constructor field loop, mirroring the stub. * fix(pyo3): pass sanitized data enums through in converters (no construction) A data enum with a sanitized (unresolvable) variant field gets no serde-based `#[new]` — it is a return-only type with no `__init__`. The converter still tried to construct it from the field value, a "too many arguments" type error. Exclude sanitized data enums from the converter's data-enum set so an existing native instance is passed straight through. * fix(pyo3): never emit a TypedDict for a reexported-native type (single identity) A type listed in `reexported_types` is re-exported as a native pyclass in the public package, so it is native everywhere — api.py imports it native, results return it native. But options.py also emitted a parallel `TypedDict` for it and typed fields against that, giving the type a second, structurally-incompatible identity: a field `ocr_result: ExtractionResult` was the TypedDict while the converter expected the native class. Thread `reexported_types` into the options-module generator and exclude those types from TypedDict emission / local definition, so they are imported and referenced as the one native class. This is the architectural fix for the dual-representation class of converter type errors (not a cast/overload workaround). * fix(pyo3): reexported-native converters return the value, not a field rebuild Following the single-identity fix, a reexported-native type's `_to_rust_*` converter still rebuilt the result field-by-field — calling `_to_rust_metadata`/`_to_rust_document_structure` on fields that are already native (a type error). After the str/dict/None handling the value is already the native class, so the converter returns it directly. Native types deserialize themselves via serde; no Python-side field reconstruction is needed or correct for them. * fix(pyo3): type the dict-coercion helper as a dict identity for TypedDicts The `_coerce_dict_*` helper coerces a raw dict and returns the public type. For a TypedDict-backed config it constructed the result with `**value` (which mypy rejects for a TypedDict) and was called on a `TypedDict | dict` value (not assignable to its `dict[str, Any]` param). A TypedDict IS its dict at runtime, so for the TypedDict case the helper now returns `cast("X", value)` (the correct identity, not a `**` rebuild) and is called on a copied `dict(value)` so it never mutates the caller's mapping. Dataclass-backed configs keep genuine `X(**value)` construction. * fix(pyo3): Option<T> #[new] param for serde-default data-enum fields A non-optional data-enum field carried with a serde default (e.g. ChunkingConfig.sizing) is None-able in the public surface, but its `#[new]` param was non-optional, so the converter omitted it via a conditional `**{...}` keyword-spread that no type checker can verify. Extend `should_option_for_nested_default` to cover data enums (they live in api.enums, not api.types): the param becomes `Option<T>` (None -> core default via unwrap_or_else), and the converter passes the coerced value or None as a direct keyword argument. This is the right architecture — the public optionality is reflected in the native constructor — not a cast or a spread workaround. * fix(pyo3): function stubs preserve order + promote trailing optionals The `.pyi` function-stub generator partitioned params required-before-optional and reordered them, which (a) diverged from the runtime `#[pyo3(signature = ...)]` order and (b) dropped `| None` from a param promoted to optional only because a preceding param had a default — e.g. `resolve(preset, custom_schema=None, context=None)` where `context: Option<...>` was emitted as a required `context: dict[str, str]`. Emit params in declaration order with the same `is_promoted_optional` promotion the PyO3 binding and the api.py wrapper apply, so the stub matches the real signature. This completes the Python surface: `mypy -p xberg` over the full generated package is clean (0 errors, down from 66 on alef main). * refactor(pyo3): extract gen_typeddict to a sibling module Blast-radius review follow-up: the Python field-typing additions pushed types.rs over the 800-line split-before-adding guideline. Move the self-contained TypedDict generator into types/typeddict.rs (728 lines remaining). Pure move + comment tightening — generated output is byte-identical. * fix(napi): substitute excluded types in the .d.ts host interface The TypeScript host-implementable interface (DocumentExtractor, Renderer) typed method params and returns directly, so an excluded type that is never emitted as a `.d.ts` declaration (e.g. `InternalDocument`) leaked in as an undefined name — 3 `tsc --strict` errors. Substitute excluded types with their JSON marshaling form (`JsonValue`), the same fix applied to the pyo3 protocol stubs and mirrored from the go backend. `tsc --noEmit --strict` over the generated `index.d.ts` is now clean (0 errors, down from 3). * fix(magnus): dedup clear_* + substitute excluded types in the .rbs interface `rbs validate` over the generated Ruby signatures surfaced the same two root causes already fixed in the Python and TypeScript surfaces: (1) a trait-bridge `clear_*` re-declared when it is also a plain registry function (RBS DuplicatedMethodDefinitionError), and (2) an excluded type (`InternalDocument`) referenced in a host interface method but never emitted as an RBS declaration (NoTypeFoundError). Skip bridge functions whose name is already declared, and substitute excluded types with their JSON form (`json_value`). `rbs validate` is now clean (0 errors, down from 3). * refactor(codegen): centralize excluded-type substitution into codegen::shared The go, pyo3, napi, and magnus backends each carried a byte-identical substitute_excluded fn that rewrites Named references to binding-excluded types (e.g. InternalDocument) into TypeRef::Json for trait-bridge interface and stub signatures. Hoist the single implementation into codegen::shared::substitute_excluded_types and have all four backends call it; go re-exports it through its helpers module so existing tests and the method_with_excluded_substituted helper keep their paths. No generated-output change: the shared body matches the removed copies, and the full lib suite (4134 tests) including the go substitution and stub snapshot tests passes. * feat(snippets): add typecheck validation level Adds a TypeCheck level to the snippet validator, ordered between Compile and Run (the strongest static guarantee short of execution). For Python it runs `python -m mypy` against the snippet so it is checked with the installed package's type stubs — catching dual-representation bugs (a config field typed against a flattened union alias rejecting the documented data-enum constructor) that py_compile cannot see. mypy is optional: a missing module is reported Unavailable, not a spurious failure. Languages whose compiler already type-checks (Rust, Go, TS via tsc, the JVM and native toolchains) map TypeCheck onto their existing compile path, so the level is meaningful everywhere without per-language special-casing. Adds a matching `snippet:typecheck-only` ceiling annotation alongside syntax-only and compile-only. * docs(changelog): record type-safety sweep and typecheck snippet level under Unreleased * feat(snippets): strict per-language typecheck commands for compiled languages The typecheck level now runs each compiled language's strict static checker instead of aliasing to its lenient compile step, and none of them need the native library: - go: go vet ./... (type-checks + vet analyzers) - java: javac -Xlint:all -Werror - csharp dotnet build -warnaserror (nullable already enabled) - swift: swiftc -typecheck -warnings-as-errors (no link) - kotlin kotlinc -Werror - dart: dart analyze --fatal-infos - c: cc -fsyntax-only -Wall -Werror (no link) Java, Kotlin, and Swift raise max_level from Compile to TypeCheck so the strict level is actually reachable (a bare snippet cannot be executed, so the static type-check is the deepest reliable level for them). Elixir keeps its parse-level compile: there is no per-snippet strict checker without a mix project. Verified empirically: valid snippets pass and type-broken snippets fail at typecheck across c/go/java/swift/python with precise diagnostics (e.g. go vet 'cannot use "string" as int', javac 'incompatible types', swiftc 'cannot convert String to Int'). * docs(changelog): list strict per-language typecheck commands * fix(kotlin): move jni_emitter tests last in the flattened module jni_emitter.rs include!s its sub-files into one module. bridge_object.rs (included first) ended with the `#[cfg(test)] mod tests`, so every production item in the seven files included after it counted as appearing after a test module — tripping clippy::items_after_test_module under -D warnings. Extract the test module into jni_emitter/tests.rs and include! it last so the test module is the final item. No behavior change; the test still passes.
1 parent 52ad0a8 commit 6cde50e

56 files changed

Lines changed: 1070 additions & 575 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

CHANGELOG.md

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10-
### Fixed
10+
### Added
11+
12+
- **snippets**: `typecheck` validation level. Ordered between `compile` and `run`, it statically
13+
type-checks a snippet without executing it, and for compiled languages without needing the native
14+
library. Each language runs its strict static checker: `python -m mypy`, `tsc --noEmit`,
15+
`cargo check`, `go vet`, `javac -Xlint:all -Werror`, `dotnet build -warnaserror`,
16+
`swiftc -typecheck -warnings-as-errors`, `kotlinc -Werror`, `dart analyze --fatal-infos`, and
17+
`cc -fsyntax-only -Wall -Werror`. This catches dual-representation mistakes (a config field typed
18+
against a flattened union alias that rejects the documented data-enum constructor) that
19+
`py_compile` and a lenient compile cannot see. A matching `snippet:typecheck-only` ceiling
20+
annotation sits alongside `syntax-only` and `compile-only`. mypy is optional: when it is not
21+
installed the Python snippet is reported as unavailable rather than failing.
22+
23+
### Fixed
24+
25+
- **pyo3**: the generated Python package now type-checks clean under `mypy`. Data-enum config fields
26+
are annotated against their public class (so `EmbeddingConfig(model=EmbeddingModelType.plugin(...))`
27+
is accepted) instead of a flattened union alias that shadowed the class; constructors accept the
28+
public dataclass/dict for factory parameters; data-enum `__init__` signatures match the runtime
29+
`#[new]`; `Json` maps to `dict[str, Any]`; and the duplicate `clear_*` registry stub is no longer
30+
emitted twice.
31+
- **napi**: substitute binding-excluded types (e.g. `InternalDocument`) with `JsonValue` in the
32+
`.d.ts` host-interface signatures. Referencing a type that is never emitted produced an undefined
33+
TypeScript name; the runtime bridge marshals such values as JSON, so `JsonValue` is the faithful
34+
stand-in and `tsc --strict` is clean.
35+
- **magnus**: apply the same excluded-type substitution (to `json_value`) in generated `.rbs`
36+
interfaces and skip re-declaring a bridge `clear_*` function that is already exposed as a registry
37+
function, so `rbs validate` no longer reports an undefined type or a duplicated method definition.
1138

1239
- **go/java**: avoid callback return local-name collisions in generated trait
1340
bridges when a method parameter is named `result`.

src/backends/go/trait_bridge/helpers.rs

Lines changed: 6 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -2,24 +2,12 @@ use crate::backends::go::type_map::go_type;
22
use crate::core::ir::{MethodDef, TypeRef};
33
use std::collections::HashSet;
44

5-
/// Recursively substitute `TypeRef::Named(n)` references where `n` is explicitly excluded
6-
/// from the binding's public surface (collected from `ApiSurface::excluded_type_paths` and
7-
/// any `binding_excluded` types) with `TypeRef::Json`. This lets trait-bridge interface
8-
/// signatures and trampolines fall back to `json.RawMessage`, since the named Go type was
9-
/// never emitted into `binding.go` and would otherwise produce `undefined: <Name>` build
10-
/// errors.
11-
pub(super) fn substitute_excluded_types(ty: &TypeRef, excluded: &HashSet<&str>) -> TypeRef {
12-
match ty {
13-
TypeRef::Named(name) if excluded.contains(name.as_str()) => TypeRef::Json,
14-
TypeRef::Optional(inner) => TypeRef::Optional(Box::new(substitute_excluded_types(inner, excluded))),
15-
TypeRef::Vec(inner) => TypeRef::Vec(Box::new(substitute_excluded_types(inner, excluded))),
16-
TypeRef::Map(k, v) => TypeRef::Map(
17-
Box::new(substitute_excluded_types(k, excluded)),
18-
Box::new(substitute_excluded_types(v, excluded)),
19-
),
20-
other => other.clone(),
21-
}
22-
}
5+
/// Recursively substitute `TypeRef::Named(n)` references for types excluded from the binding's
6+
/// public surface with `TypeRef::Json`. For Go this lets trait-bridge interface signatures and
7+
/// trampolines fall back to `json.RawMessage`, since the named Go type was never emitted into
8+
/// `binding.go` and would otherwise produce `undefined: <Name>` build errors. Shared with the
9+
/// other backends — see [`crate::codegen::shared::substitute_excluded_types`].
10+
pub(super) use crate::codegen::shared::substitute_excluded_types;
2311

2412
/// Clone a `MethodDef`, substituting any excluded named-type references in its
2513
/// parameters and return type with `TypeRef::Json`. See [`substitute_excluded_types`].

src/backends/kotlin/gen_bindings/jni_emitter.rs

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,3 +32,6 @@ include!("jni_emitter/constructors.rs");
3232
include!("jni_emitter/binary_json.rs");
3333
include!("jni_emitter/trait_bridge.rs");
3434
include!("jni_emitter/paths.rs");
35+
// Included last so the `#[cfg(test)]` module is the final item in this flattened module
36+
// (`clippy::items_after_test_module`).
37+
include!("jni_emitter/tests.rs");

src/backends/kotlin/gen_bindings/jni_emitter/bridge_object.rs

Lines changed: 0 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -237,29 +237,3 @@ fn trait_bridge_manages_jni_function(func_name: &str, config: &ResolvedCrateConf
237237
|| bridge.clear_fn.as_deref() == Some(func_name))
238238
})
239239
}
240-
241-
#[cfg(test)]
242-
mod tests {
243-
use super::*;
244-
use crate::core::config::{KotlinAndroidConfig, TraitBridgeConfig};
245-
246-
#[test]
247-
fn jni_bridge_object_treats_android_trait_lifecycle_functions_as_managed() {
248-
let config = ResolvedCrateConfig {
249-
kotlin_android: Some(KotlinAndroidConfig::default()),
250-
trait_bridges: vec![TraitBridgeConfig {
251-
trait_name: "Renderer".to_string(),
252-
register_fn: Some("register_renderer".to_string()),
253-
unregister_fn: Some("unregister_renderer".to_string()),
254-
clear_fn: Some("clear_renderers".to_string()),
255-
..TraitBridgeConfig::default()
256-
}],
257-
..ResolvedCrateConfig::default()
258-
};
259-
260-
assert!(trait_bridge_manages_jni_function("register_renderer", &config));
261-
assert!(trait_bridge_manages_jni_function("unregister_renderer", &config));
262-
assert!(trait_bridge_manages_jni_function("clear_renderers", &config));
263-
assert!(!trait_bridge_manages_jni_function("list_renderers", &config));
264-
}
265-
}
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
// Tests for the JNI emitter. Kept in a separate file `include!`d last by `jni_emitter.rs` so the
2+
// `#[cfg(test)]` module is the final item in the flattened module (the other `include!`d files
3+
// contribute production items, which must not follow a test module — `clippy::items_after_test_module`).
4+
5+
#[cfg(test)]
6+
mod tests {
7+
use super::*;
8+
use crate::core::config::{KotlinAndroidConfig, TraitBridgeConfig};
9+
10+
#[test]
11+
fn jni_bridge_object_treats_android_trait_lifecycle_functions_as_managed() {
12+
let config = ResolvedCrateConfig {
13+
kotlin_android: Some(KotlinAndroidConfig::default()),
14+
trait_bridges: vec![TraitBridgeConfig {
15+
trait_name: "Renderer".to_string(),
16+
register_fn: Some("register_renderer".to_string()),
17+
unregister_fn: Some("unregister_renderer".to_string()),
18+
clear_fn: Some("clear_renderers".to_string()),
19+
..TraitBridgeConfig::default()
20+
}],
21+
..ResolvedCrateConfig::default()
22+
};
23+
24+
assert!(trait_bridge_manages_jni_function("register_renderer", &config));
25+
assert!(trait_bridge_manages_jni_function("unregister_renderer", &config));
26+
assert!(trait_bridge_manages_jni_function("clear_renderers", &config));
27+
assert!(!trait_bridge_manages_jni_function("list_renderers", &config));
28+
}
29+
}

src/backends/magnus/gen_stubs.rs

Lines changed: 40 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
use crate::backends::magnus::type_map::rbs_type;
2-
use crate::codegen::shared::binding_fields;
2+
use crate::codegen::shared::{binding_fields, substitute_excluded_types};
33
use crate::core::config::TraitBridgeConfig;
44
use crate::core::hash::{self, CommentStyle};
55
use crate::core::ir::{ApiSurface, EnumDef, FunctionDef, MethodDef, TypeDef};
@@ -69,27 +69,39 @@ pub fn gen_stubs(
6969
interface_trait_names.insert(bridge.trait_name.clone());
7070
}
7171
}
72+
// Names already emitted as module methods above (from `api.functions`). A bridge's `clear_fn`
73+
// is often also exposed as a regular registry function (`clear_embedding_backends`), so emitting
74+
// it again here would be a duplicate definition (RBS `DuplicatedMethodDefinitionError`). Skip any
75+
// bridge function whose name already has a declaration; register/unregister are bridge-only.
76+
let declared_function_names: std::collections::HashSet<&str> =
77+
api.functions.iter().map(|f| f.name.as_str()).collect();
7278
for bridge in trait_bridges {
7379
if let Some(register_fn) = bridge.register_fn.as_deref() {
74-
// Type the `backend` param against the host-implementable interface when one was
75-
// emitted for this bridge's trait; otherwise fall back to `untyped`.
76-
let backend_type = if interface_trait_names.contains(&bridge.trait_name) {
77-
plugin_interface_name(&bridge.trait_name)
78-
} else {
79-
"untyped".to_string()
80-
};
81-
lines.push(format!(
82-
" def self.{register_fn}: ({backend_type} backend, String name) -> nil"
83-
));
84-
lines.push("".to_string());
80+
if !declared_function_names.contains(register_fn) {
81+
// Type the `backend` param against the host-implementable interface when one was
82+
// emitted for this bridge's trait; otherwise fall back to `untyped`.
83+
let backend_type = if interface_trait_names.contains(&bridge.trait_name) {
84+
plugin_interface_name(&bridge.trait_name)
85+
} else {
86+
"untyped".to_string()
87+
};
88+
lines.push(format!(
89+
" def self.{register_fn}: ({backend_type} backend, String name) -> nil"
90+
));
91+
lines.push("".to_string());
92+
}
8593
}
8694
if let Some(unregister_fn) = bridge.unregister_fn.as_deref() {
87-
lines.push(format!(" def self.{unregister_fn}: (String name) -> nil"));
88-
lines.push("".to_string());
95+
if !declared_function_names.contains(unregister_fn) {
96+
lines.push(format!(" def self.{unregister_fn}: (String name) -> nil"));
97+
lines.push("".to_string());
98+
}
8999
}
90100
if let Some(clear_fn) = bridge.clear_fn.as_deref() {
91-
lines.push(format!(" def self.{clear_fn}: () -> nil"));
92-
lines.push("".to_string());
101+
if !declared_function_names.contains(clear_fn) {
102+
lines.push(format!(" def self.{clear_fn}: () -> nil"));
103+
lines.push("".to_string());
104+
}
93105
}
94106
}
95107

@@ -140,6 +152,16 @@ fn gen_plugin_interface_stub(bridge: &TraitBridgeConfig, api: &ApiSurface) -> Op
140152
}
141153
api.types.iter().find(|t| t.name == bridge.trait_name)?;
142154

155+
// Types excluded from the binding surface (e.g. `InternalDocument`) are not emitted as RBS
156+
// declarations, so an interface method referencing one would be an undefined RBS type. Substitute
157+
// them with their JSON marshaling form, matching the runtime bridge (and the go/pyo3/napi backends).
158+
let excluded: std::collections::HashSet<&str> = api
159+
.excluded_type_paths
160+
.keys()
161+
.map(String::as_str)
162+
.chain(api.types.iter().filter(|t| t.binding_excluded).map(|t| t.name.as_str()))
163+
.collect();
164+
143165
let interface_name = plugin_interface_name(&bridge.trait_name);
144166
let mut lines = vec![format!(" interface {interface_name}")];
145167

@@ -151,15 +173,15 @@ fn gen_plugin_interface_stub(bridge: &TraitBridgeConfig, api: &ApiSurface) -> Op
151173
.params
152174
.iter()
153175
.map(|p| {
154-
let param_type = rbs_type(&p.ty);
176+
let param_type = rbs_type(&substitute_excluded_types(&p.ty, &excluded));
155177
if p.optional {
156178
format!("?{} {}", param_type, p.name)
157179
} else {
158180
format!("{} {}", param_type, p.name)
159181
}
160182
})
161183
.collect();
162-
let return_type = rbs_type(&method.return_type);
184+
let return_type = rbs_type(&substitute_excluded_types(&method.return_type, &excluded));
163185
lines.push(format!(
164186
" def {}: ({}) -> {}",
165187
method.name,

src/backends/napi/gen_bindings/errors.rs

Lines changed: 25 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
//! TypeScript declaration file (`.d.ts`) generation for NAPI-RS bindings.
22
33
use crate::codegen::naming::{to_node_name, wire_variant_value};
4-
use crate::codegen::shared::binding_fields;
4+
use crate::codegen::shared::{binding_fields, substitute_excluded_types};
55
use crate::core::config::NodeCapsuleTypeConfig;
66
use crate::core::hash::{self, CommentStyle};
77
use crate::core::ir::{ApiSurface, EnumDef, FunctionDef, ParamDef, TypeDef, TypeRef};
@@ -268,6 +268,16 @@ pub(super) fn gen_dts(
268268
Decl::VisitorInterface(typ) => {
269269
// Emit visitor trait as a TypeScript interface with optional callback methods.
270270
// Each method becomes an optional property with a function signature.
271+
//
272+
// Types excluded from the binding surface (e.g. `InternalDocument`) are not emitted as
273+
// `.d.ts` declarations, so substitute them with their JSON marshaling form in method
274+
// signatures — otherwise the interface references an undefined TS name.
275+
let excluded: std::collections::HashSet<&str> = api
276+
.excluded_type_paths
277+
.keys()
278+
.map(String::as_str)
279+
.chain(api.types.iter().filter(|t| t.binding_excluded).map(|t| t.name.as_str()))
280+
.collect();
271281
lines.extend(format_jsdoc(&typ.doc, ""));
272282
lines.push(format!("export interface {} {{", typ.name));
273283
if trait_bridge_requires_plugin_name(typ, trait_bridges) {
@@ -278,8 +288,20 @@ pub(super) fn gen_dts(
278288
if trait_bridge_requires_plugin_name(typ, trait_bridges) && method.name == "name" {
279289
continue;
280290
}
281-
let params = dts_params(&method.params, no_prefix, default_types);
282-
let ret = trait_bridge_dts_return_type(&method.return_type, method.is_async, no_prefix);
291+
let sub_params: Vec<ParamDef> = method
292+
.params
293+
.iter()
294+
.map(|p| ParamDef {
295+
ty: substitute_excluded_types(&p.ty, &excluded),
296+
..p.clone()
297+
})
298+
.collect();
299+
let params = dts_params(&sub_params, no_prefix, default_types);
300+
let ret = trait_bridge_dts_return_type(
301+
&substitute_excluded_types(&method.return_type, &excluded),
302+
method.is_async,
303+
no_prefix,
304+
);
283305
lines.extend(format_jsdoc(&method.doc, " "));
284306
let optional_marker = if method.has_default_impl { "?" } else { "" };
285307
lines.push(format!(" {js_name}{optional_marker}({params}): {ret}"));

src/backends/pyo3/gen_bindings/constructors.rs

Lines changed: 41 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,31 @@ use crate::core::config::TraitBridgeConfig;
44
use crate::core::ir::{ApiSurface, FieldDef, TypeDef, TypeRef};
55
use std::collections::HashMap;
66

7+
/// Resolve the PyO3 constructor parameter identifier for a field — the single source of truth for
8+
/// the `#[new]` signature AND the `.pyi` `__init__` stub, so the two cannot drift apart.
9+
///
10+
/// Prefers the serde rename (the wire name, deliberately used for constructor params so the public
11+
/// surface matches the other language bindings — see [`replace_constructor_with_serde_rename`]),
12+
/// then a per-language `rename_fields` entry, then the bare field name. Only applies a resolved name
13+
/// when it is a syntactically valid Rust identifier (e.g. `"self-harm"` falls back to the field
14+
/// name); a valid name that is also a Rust keyword (e.g. `"type"`) is escaped as `r#type`.
15+
pub(in crate::backends::pyo3) fn resolve_param_ident<'a>(
16+
field_name: &'a str,
17+
serde_rename: Option<&'a String>,
18+
config_renames: Option<&HashMap<String, String>>,
19+
) -> String {
20+
use crate::core::keywords::{is_valid_rust_ident_chars, rust_raw_ident};
21+
let wire_name = serde_rename
22+
.map(|s| s.as_str())
23+
.or_else(|| config_renames.and_then(|r| r.get(field_name)).map(|s| s.as_str()))
24+
.unwrap_or(field_name);
25+
if is_valid_rust_ident_chars(wire_name) {
26+
rust_raw_ident(wire_name)
27+
} else {
28+
field_name.to_string()
29+
}
30+
}
31+
732
/// Replace the constructor in an impl block with one that honors serde_rename.
833
/// For has_default types, the constructor parameters should use serde_rename names
934
/// (the JSON wire names) to match other language bindings' public APIs.
@@ -20,8 +45,6 @@ pub(super) fn replace_constructor_with_serde_rename(
2045
never_skip_cfg_field_names: &[String],
2146
api: &ApiSurface,
2247
) -> String {
23-
use crate::core::keywords::{is_valid_rust_ident_chars, rust_raw_ident};
24-
2548
// When the type already has an explicit static `new()` method in its IR, do not
2649
// emit a second field-based `#[new]` constructor — the static method will be emitted
2750
// as `#[staticmethod] pub fn new(...)` and PyO3 forbids two `new` registrations in
@@ -41,43 +64,23 @@ pub(super) fn replace_constructor_with_serde_rename(
4164
if !typ.has_default || field.optional || matches!(&field.ty, TypeRef::Optional(_)) {
4265
return false;
4366
}
44-
if let TypeRef::Named(ref type_name) = field.ty {
45-
api.types
46-
.iter()
47-
.find(|t| t.name == *type_name)
48-
.map(|t| t.has_default)
49-
.unwrap_or(false)
50-
} else {
51-
false
52-
}
53-
}
54-
55-
/// Resolve the constructor parameter identifier for a field.
56-
///
57-
/// Prefers the serde rename (or config rename) over the bare Rust field name, but only
58-
/// when the resolved name is a syntactically valid Rust identifier (i.e. contains only
59-
/// `[A-Za-z0-9_]` and does not start with a digit). Names like `"self-harm"` or
60-
/// `"self-harm/intent"` (containing hyphens or slashes) are not valid Rust identifiers
61-
/// even with `r#` escaping, so the function falls back to the Rust field name in those
62-
/// cases. When the resolved name is valid but happens to be a Rust keyword (e.g. `"type"`),
63-
/// it is escaped as a raw identifier (`r#type`).
64-
fn resolve_param_ident<'a>(
65-
field_name: &'a str,
66-
serde_rename: Option<&'a String>,
67-
config_renames: Option<&HashMap<String, String>>,
68-
) -> String {
69-
let wire_name = serde_rename
70-
.map(|s| s.as_str())
71-
.or_else(|| config_renames.and_then(|r| r.get(field_name)).map(|s| s.as_str()))
72-
.unwrap_or(field_name);
73-
if is_valid_rust_ident_chars(wire_name) {
74-
rust_raw_ident(wire_name)
75-
} else {
76-
// Wire name contains characters that are not valid in a Rust identifier (e.g. hyphens,
77-
// slashes in serde renames like "self-harm" or "self-harm/intent"). Fall back to the
78-
// Rust field name, which is guaranteed to be a valid identifier.
79-
field_name.to_string()
67+
let TypeRef::Named(ref type_name) = field.ty else {
68+
return false;
69+
};
70+
// A nested struct with its own `Default`.
71+
if api.types.iter().any(|t| t.name == *type_name && t.has_default) {
72+
return true;
8073
}
74+
// A data enum (tagged union — it lives in `api.enums`, not `api.types`) carried with a serde
75+
// default. Such a field is None-able in the public surface, so its `#[new]` param is `Option<T>`
76+
// (None falls back to the core default via `unwrap_or_else`). This lets the converter pass the
77+
// coerced value or None directly rather than a conditional `**{...}` keyword-spread that no
78+
// type checker can verify.
79+
field.default.as_deref() == Some("/* serde(default) */")
80+
&& api
81+
.enums
82+
.iter()
83+
.any(|e| e.name == *type_name && crate::codegen::generators::enum_has_data_variants(e))
8184
}
8285

8386
// Check if this type has an options-field bridge (e.g., ParseOptions.visitor).

0 commit comments

Comments
 (0)