Commit 6cde50e
fix: type-safe generated bindings — exhaustive type-check sweep across languages
* fix(pyo3): type data-enum config fields as the class, not a flattened union alias
options.py typed config fields that reference a data enum (e.g. EmbeddingConfig.model)
against a locally-synthesized flattened union alias (EmbeddingModelType = str | int |
LlmConfig) instead of the data-enum class users actually construct. The exported symbol
and the field type diverged, so the documented usage
EmbeddingConfig(model=EmbeddingModelType.plugin("lilbee")) failed mypy even though it
runs fine, forcing consumers to carry # type: ignore[arg-type].
Data enums now import from the native module at runtime (joining the existing native
enum import block) and are referenced by their class name in field annotations. Enums
with a unit (tag-only) variant additionally accept a bare string tag (<Class> | str) so
string defaults like output_format="native" still type-check; payload-only enums such as
EmbeddingModelType are class-only. The flattened union aliases and their now-unused
templates are removed.
Closes #157
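
A minimal sketch of the field-typing change, using hypothetical pure-Python stand-ins for the native classes. The real `EmbeddingModelType` is imported from the compiled module; its internals, and whether `model` is optional, are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class EmbeddingModelType:
    """Hypothetical stand-in for the native data-enum class."""

    def __init__(self, tag: str, payload: object) -> None:
        self.tag = tag
        self.payload = payload

    @staticmethod
    def plugin(name: str) -> "EmbeddingModelType":
        # Payload-carrying variant constructor, as in the documented usage.
        return EmbeddingModelType("plugin", name)


@dataclass
class EmbeddingConfig:
    # Annotated with the class users actually construct,
    # not a flattened `str | int | LlmConfig` alias.
    model: Optional[EmbeddingModelType] = None


config = EmbeddingConfig(model=EmbeddingModelType.plugin("lilbee"))
```

Because the annotation and the exported symbol are now the same class, a type checker accepts the call without a `# type: ignore[arg-type]`.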
* fix(pyo3): accept the public config dataclass in data-enum factory stubs
The data-enum factory staticmethods (e.g. EmbeddingModelType.llm) typed their
config-DTO param against the compiled _xberg class, but the public name users pass is
the options.py @dataclass. So EmbeddingModelType.llm(LlmConfig(...)), the documented
#1165 usage, failed mypy ("expected xberg._xberg.LlmConfig, got xberg.options.LlmConfig")
even though the runtime coerces it.
A dataclass-backed config-DTO factory param is now typed `options.<Name> | dict[str, Any]`
— the public dataclass or a dict, matching what the runtime __alef_coerce_dto accepts.
The classifier is the existing coercible_dto_names (shared with the variant-constructor
coercion), so the typed surface and the runtime coercion stay in lockstep; non-DTO params
map exactly as before. The .pyi gains a stub-only `from . import options` (never executed,
so no runtime cycle with options.py).
With this and the preceding commit, EmbeddingConfig(model=EmbeddingModelType.llm(LlmConfig(...)))
type-checks end to end.
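
A sketch of what the widened factory signature might look like, assuming hypothetical stand-ins throughout: `coerce_dto` only imitates the idea of the runtime coercion (public dataclass or dict in, plain dict out) and is not the real `__alef_coerce_dto`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Union


@dataclass
class LlmConfig:
    """Hypothetical stand-in for the public options.py dataclass."""

    model: str


def coerce_dto(value: Union[LlmConfig, Dict[str, Any]]) -> Dict[str, Any]:
    # Illustrative coercion: accept the public dataclass or a dict.
    if isinstance(value, LlmConfig):
        return asdict(value)
    return dict(value)


class EmbeddingModelType:
    """Hypothetical stand-in for the native data enum."""

    def __init__(self, tag: str, payload: Dict[str, Any]) -> None:
        self.tag = tag
        self.payload = payload

    @staticmethod
    def llm(config: Union[LlmConfig, Dict[str, Any]]) -> "EmbeddingModelType":
        # The annotation matches exactly what the coercion accepts.
        return EmbeddingModelType("llm", coerce_dto(config))


from_dataclass = EmbeddingModelType.llm(LlmConfig(model="example-model"))
from_dict = EmbeddingModelType.llm({"model": "example-model"})
```

Both call forms produce the same payload, which is the property the typed surface is meant to describe.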
* refactor(pyo3): split types.rs unit tests into a sibling module
The preceding field-typing change pushed gen_bindings/types.rs past the 1,000-line cap.
Move its `mod tests` into `types/tests.rs` (mirroring `gen_stubs/enums/tests.rs`),
bringing the file back to 800 lines. Pure move — generated output is byte-identical.
* test(pyo3): update options.py data-enum test to the class-import behavior
The integration test asserted the old design (data enums emitted as a `StructureKind = str`
flattened alias, not runtime-imported). The field-typing change inverts that — data enums are
imported from the native module as their class and referenced by name. Update the assertions
(and rename the test) to the new behavior. Caught by CI's full `cargo test`, which runs the
`tests/` integration suite that `cargo test --lib` skips.
* fix(pyo3): make generated Python stubs/api type-check (stub + import fixes)
Five generator fixes that clear mypy errors in the generated Python package (59 -> 34),
none caught by e2e today (which runs pytest/ruff but not mypy):
- Data-enum stub gains the `__init__(value, **kwargs)` the runtime `#[new]` actually exposes
(gated on the same not-sanitized condition), so converters constructing `OutputFormat(value)`
type-check instead of "Too many arguments".
- Protocol stubs substitute excluded types (e.g. `InternalDocument`) with their JSON form, matching
the go/zig/gleam backends and the runtime bridge — fixes undefined-name in the `.pyi` (the same
leak also breaks the napi `.d.ts`).
- Trait-bridge `clear_*` is no longer re-declared when it is already a plain registry function
(`no-redef`).
- TypedDict-style return types are only marked "local" when `has_default` (matching the emission
loop), so a return-only class referenced in a TypedDict field (e.g. `ExtractionConfidence`) is
imported instead of left undefined.
- The dict-coercion helper uses a distinct loop variable for the data-enum pass so its classes are
not flagged as an incompatible reassignment.
* fix(pyo3): register-wrapper Protocol typing + JSON fields as dict
- The api.py `register_*` pass-through wrappers typed `backend: object`, but the native
`register_*` (and the `.pyi` stub) expect the host-implementable Protocol — so the wrapper
failed to type-check. Type `backend` against the bridge's trait Protocol when resolvable.
- `TypeRef::Json` mapped to `str` in the options module but `dict[str, Any]` in the native
module, so JSON-valued config fields (`paddle_ocr_config`, `additional`, …) disagreed with
the compiled config they convert into. Map Json to `dict[str, Any]` (default None) to match.
* fix(pyo3): share constructor param resolution + guard list data-enum coercion
- The `.pyi` `__init__` stub resolved param names via rename_fields only, while the runtime
`#[new]` (and the converter) deliberately prefer the serde-rename wire name for cross-binding
API parity — so the stub drifted (`max_characters` vs the constructor's `max_chars`). Extract
the constructor's `resolve_param_ident` to a shared function and use it for the stub too, so the
two share one source of truth and cannot diverge.
- The Vec-of-data-enum coercion built each element with `_rust.Enum(v)` unconditionally; add the
same isinstance guard the scalar path uses so an already-native element is passed through rather
than re-wrapped (which a type checker rejects, since the element is `Enum | str`).
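
The list guard can be sketched as follows. `OutputFormat` here is a hypothetical pure-Python stand-in for a native data enum, and `coerce_formats` is an illustrative name, not the generated helper.

```python
from typing import List, Union


class OutputFormat:
    """Hypothetical stand-in for a native data enum built from a string tag."""

    def __init__(self, value: str) -> None:
        self.value = value


def coerce_formats(values: List[Union[OutputFormat, str]]) -> List[OutputFormat]:
    # Same isinstance guard as the scalar path: pass native elements through,
    # construct only from bare string tags.
    return [v if isinstance(v, OutputFormat) else OutputFormat(v) for v in values]


native = OutputFormat("native")
coerced = coerce_formats([native, "markdown"])
```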
* fix(pyo3): skip cfg-gated fields in api.py converters
A cfg-gated field is conditionally compiled out of the native `#[new]` constructor and omitted
from the `.pyi` stub (which cannot express `#[cfg]`), but the converter passed it as a keyword
argument regardless — an unknown-kwarg error against the compiled module. Filter `f.cfg.is_none()`
in the constructor field loop, mirroring the stub.
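
As a rough Python model of that filter (the generator itself is Rust, and `FieldDef` and `constructor_kwargs` are hypothetical names), the converter would emit keyword arguments only for fields without a cfg gate:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldDef:
    name: str
    # Hypothetical: the cfg predicate text when the field is cfg-gated, else None.
    cfg: Optional[str] = None


def constructor_kwargs(fields: List[FieldDef]) -> List[str]:
    # Mirror the stub: a cfg-gated field never becomes a keyword argument.
    return [f.name for f in fields if f.cfg is None]


fields = [FieldDef("language"), FieldDef("gpu_backend", cfg='feature = "gpu"')]
emitted = constructor_kwargs(fields)
```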
* fix(pyo3): pass sanitized data enums through in converters (no construction)
A data enum with a sanitized (unresolvable) variant field gets no serde-based `#[new]` — it is a
return-only type with no `__init__`. The converter still tried to construct it from the field
value, a "too many arguments" type error. Exclude sanitized data enums from the converter's
data-enum set so an existing native instance is passed straight through.
* fix(pyo3): never emit a TypedDict for a reexported-native type (single identity)
A type listed in `reexported_types` is re-exported as a native pyclass in the public package, so
it is native everywhere — api.py imports it native, results return it native. But options.py also
emitted a parallel `TypedDict` for it and typed fields against that, giving the type a second,
structurally-incompatible identity: a field `ocr_result: ExtractionResult` was the TypedDict while
the converter expected the native class. Thread `reexported_types` into the options-module
generator and exclude those types from TypedDict emission / local definition, so they are imported
and referenced as the one native class. This is the architectural fix for the dual-representation
class of converter type errors (not a cast/overload workaround).
* fix(pyo3): reexported-native converters return the value, not a field rebuild
Following the single-identity fix, a reexported-native type's `_to_rust_*` converter still rebuilt
the result field-by-field — calling `_to_rust_metadata`/`_to_rust_document_structure` on fields that
are already native (a type error). After the str/dict/None handling the value is already the native
class, so the converter returns it directly. Native types deserialize themselves via serde; no
Python-side field reconstruction is needed or correct for them.
* fix(pyo3): type the dict-coercion helper as a dict identity for TypedDicts
The `_coerce_dict_*` helper coerces a raw dict and returns the public type. For a TypedDict-backed
config it constructed the result with `**value` (which mypy rejects for a TypedDict) and was called
on a `TypedDict | dict` value (not assignable to its `dict[str, Any]` param). A TypedDict IS its dict
at runtime, so for the TypedDict case the helper now returns `cast("X", value)` (the correct identity,
not a `**` rebuild) and is called on a copied `dict(value)` so it never mutates the caller's mapping.
Dataclass-backed configs keep genuine `X(**value)` construction.
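
The two cases can be contrasted in a short sketch. `ChunkOptions`, `LlmOptions`, and the helper names are hypothetical; only the `cast` identity versus `X(**value)` distinction comes from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict, TypedDict, cast


class ChunkOptions(TypedDict, total=False):
    """Hypothetical TypedDict-backed config."""

    max_chars: int


@dataclass
class LlmOptions:
    """Hypothetical dataclass-backed config."""

    model: str


def coerce_chunk_options(value: Dict[str, Any]) -> ChunkOptions:
    # A TypedDict is a plain dict at runtime, so the identity is a cast,
    # not a `ChunkOptions(**value)` rebuild.
    return cast("ChunkOptions", value)


def coerce_llm_options(value: Dict[str, Any]) -> LlmOptions:
    # Dataclasses still need genuine construction.
    return LlmOptions(**value)


caller_mapping = {"max_chars": 512}
# The call site copies first, so the caller's mapping is never mutated.
chunk = coerce_chunk_options(dict(caller_mapping))
llm = coerce_llm_options({"model": "example-model"})
```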
* fix(pyo3): Option<T> #[new] param for serde-default data-enum fields
A non-optional data-enum field with a serde default (e.g. ChunkingConfig.sizing) is
None-able in the public surface, but its `#[new]` param was non-optional, so the converter omitted
it via a conditional `**{...}` keyword-spread that no type checker can verify. Extend
`should_option_for_nested_default` to cover data enums (they live in api.enums, not api.types):
the param becomes `Option<T>` (None -> core default via unwrap_or_else), and the converter passes
the coerced value or None as a direct keyword argument. This is the right architecture — the public
optionality is reflected in the native constructor — not a cast or a spread workaround.
* fix(pyo3): function stubs preserve order + promote trailing optionals
The `.pyi` function-stub generator partitioned params required-before-optional and reordered them,
which (a) diverged from the runtime `#[pyo3(signature = ...)]` order and (b) dropped `| None` from a
param promoted to optional only because a preceding param had a default — e.g. `resolve(preset,
custom_schema=None, context=None)` where `context: Option<...>` was emitted as a required
`context: dict[str, str]`. Emit params in declaration order with the same `is_promoted_optional`
promotion the PyO3 binding and the api.py wrapper apply, so the stub matches the real signature.
This completes the Python surface: `mypy -p xberg` over the full generated package is clean (0 errors,
down from 66 on alef main).
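
One reading of the ordering and promotion rule, as a hypothetical Python model of the stub emitter (the real generator is Rust; `Param` and `render_stub_params` are illustrative, and the types of `preset` and `custom_schema` are assumed):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    name: str
    py_type: str
    optional: bool = False  # declared optional or carrying a default


def render_stub_params(params: List[Param]) -> str:
    rendered = []
    seen_optional = False
    for p in params:  # declaration order, never re-partitioned
        if p.optional or seen_optional:
            # Once one param is optional, every later param is promoted too,
            # and the promotion keeps `| None` in the annotation.
            seen_optional = True
            rendered.append(f"{p.name}: {p.py_type} | None = None")
        else:
            rendered.append(f"{p.name}: {p.py_type}")
    return ", ".join(rendered)


signature = render_stub_params(
    [
        Param("preset", "str"),
        Param("custom_schema", "dict[str, Any]", optional=True),
        Param("context", "dict[str, str]"),
    ]
)
```

Under these assumptions `context` is emitted as `dict[str, str] | None = None` rather than as a required `dict[str, str]`.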
* refactor(pyo3): extract gen_typeddict to a sibling module
Blast-radius review follow-up: the Python field-typing additions pushed types.rs over the 800-line
split-before-adding guideline. Move the self-contained TypedDict generator into types/typeddict.rs
(728 lines remaining). Pure move + comment tightening — generated output is byte-identical.
* fix(napi): substitute excluded types in the .d.ts host interface
The TypeScript host-implementable interface (DocumentExtractor, Renderer) typed method params and
returns directly, so an excluded type that is never emitted as a `.d.ts` declaration (e.g.
`InternalDocument`) leaked in as an undefined name — 3 `tsc --strict` errors. Substitute excluded
types with their JSON marshaling form (`JsonValue`), the same fix applied to the pyo3 protocol stubs
and mirrored from the go backend. `tsc --noEmit --strict` over the generated `index.d.ts` is now
clean (0 errors, down from 3).
* fix(magnus): dedup clear_* + substitute excluded types in the .rbs interface
`rbs validate` over the generated Ruby signatures surfaced the same two root causes already fixed in
the Python and TypeScript surfaces: (1) a trait-bridge `clear_*` re-declared when it is also a plain
registry function (RBS DuplicatedMethodDefinitionError), and (2) an excluded type (`InternalDocument`)
referenced in a host interface method but never emitted as an RBS declaration (NoTypeFoundError).
Skip bridge functions whose name is already declared, and substitute excluded types with their JSON
form (`json_value`). `rbs validate` is now clean (0 errors, down from 3).
* refactor(codegen): centralize excluded-type substitution into codegen::shared
The go, pyo3, napi, and magnus backends each carried a byte-identical
substitute_excluded fn that rewrites Named references to binding-excluded
types (e.g. InternalDocument) into TypeRef::Json for trait-bridge interface
and stub signatures. Hoist the single implementation into
codegen::shared::substitute_excluded_types and have all four backends call
it; go re-exports it through its helpers module so existing tests and the
method_with_excluded_substituted helper keep their paths.
No generated-output change: the shared body matches the removed copies, and
the full lib suite (4134 tests) including the go substitution and stub
snapshot tests passes.
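
The substitution itself can be modelled in a few lines of Python. The type model below is hypothetical and far smaller than the real `TypeRef`, and recursing into wrapper types is an assumption about how the shared function behaves.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Named:
    name: str


@dataclass(frozen=True)
class Json:
    pass


@dataclass(frozen=True)
class OptionalOf:
    inner: "TypeRef"


@dataclass(frozen=True)
class VecOf:
    inner: "TypeRef"


TypeRef = Union[Named, Json, OptionalOf, VecOf]


def substitute_excluded_types(ty: TypeRef, excluded: FrozenSet[str]) -> TypeRef:
    # A named reference to an excluded type becomes the JSON form.
    if isinstance(ty, Named):
        return Json() if ty.name in excluded else ty
    # Assumed: wrappers are rewritten recursively.
    if isinstance(ty, (OptionalOf, VecOf)):
        return type(ty)(substitute_excluded_types(ty.inner, excluded))
    return ty


excluded = frozenset({"InternalDocument"})
rewritten = substitute_excluded_types(VecOf(Named("InternalDocument")), excluded)
untouched = substitute_excluded_types(Named("ExtractionResult"), excluded)
```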
* feat(snippets): add typecheck validation level
Adds a TypeCheck level to the snippet validator, ordered between Compile and
Run (the strongest static guarantee short of execution). For Python it runs
`python -m mypy` against the snippet so it is checked with the installed
package's type stubs — catching dual-representation bugs (a config field typed
against a flattened union alias rejecting the documented data-enum
constructor) that py_compile cannot see. mypy is optional: a missing module is
reported Unavailable, not a spurious failure.
Languages whose compiler already type-checks (Rust, Go, TS via tsc, the JVM
and native toolchains) map TypeCheck onto their existing compile path, so the
level is meaningful everywhere without per-language special-casing. Adds a
matching `snippet:typecheck-only` ceiling annotation alongside syntax-only and
compile-only.
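
A minimal sketch of the level ordering and the optional-mypy behaviour, written in Python for illustration only. The validator itself lives in the Rust codebase; the names `Level` and `typecheck_python` and the status strings are assumptions.

```python
import importlib.util
import subprocess
import sys
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that TYPECHECK sits between COMPILE and RUN.
    SYNTAX = 1
    COMPILE = 2
    TYPECHECK = 3
    RUN = 4


def typecheck_python(path: str) -> str:
    # mypy is optional: a missing module is reported as unavailable,
    # not as a failure of the snippet.
    if importlib.util.find_spec("mypy") is None:
        return "unavailable"
    result = subprocess.run(
        [sys.executable, "-m", "mypy", path],
        capture_output=True,
        text=True,
    )
    return "passed" if result.returncode == 0 else "failed"
```

A ceiling annotation such as `snippet:typecheck-only` would then cap validation at `Level.TYPECHECK`.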
* docs(changelog): record type-safety sweep and typecheck snippet level under Unreleased
* feat(snippets): strict per-language typecheck commands for compiled languages
The typecheck level now runs each compiled language's strict static checker
instead of aliasing to its lenient compile step, and none of them need the
native library:
- go: go vet ./... (type-checks + vet analyzers)
- java: javac -Xlint:all -Werror
- csharp: dotnet build -warnaserror (nullable already enabled)
- swift: swiftc -typecheck -warnings-as-errors (no link)
- kotlin: kotlinc -Werror
- dart: dart analyze --fatal-infos
- c: cc -fsyntax-only -Wall -Werror (no link)
Java, Kotlin, and Swift raise max_level from Compile to TypeCheck so the
strict level is actually reachable (a bare snippet cannot be executed, so the
static type-check is the deepest reliable level for them). Elixir keeps its
parse-level compile: there is no per-snippet strict checker without a mix
project.
Verified empirically: valid snippets pass and type-broken snippets fail at
typecheck across c/go/java/swift/python with precise diagnostics (e.g. go vet
'cannot use "string" as int', javac 'incompatible types', swiftc 'cannot
convert String to Int').
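
For reference, the command list above can be expressed as a lookup table. This is an illustrative Python rendering, not the validator's actual data structure; how each snippet path is appended to the command is left out because the commit message does not specify it.

```python
from typing import Dict, List, Optional

# Base commands taken from the list above; file arguments are omitted.
STRICT_TYPECHECK: Dict[str, List[str]] = {
    "go": ["go", "vet", "./..."],
    "java": ["javac", "-Xlint:all", "-Werror"],
    "csharp": ["dotnet", "build", "-warnaserror"],
    "swift": ["swiftc", "-typecheck", "-warnings-as-errors"],
    "kotlin": ["kotlinc", "-Werror"],
    "dart": ["dart", "analyze", "--fatal-infos"],
    "c": ["cc", "-fsyntax-only", "-Wall", "-Werror"],
}


def strict_command(language: str) -> Optional[List[str]]:
    # Languages without a per-snippet strict checker (e.g. elixir) return None.
    return STRICT_TYPECHECK.get(language)
```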
* docs(changelog): list strict per-language typecheck commands
* fix(kotlin): move jni_emitter tests last in the flattened module
jni_emitter.rs include!s its sub-files into one module. bridge_object.rs
(included first) ended with the `#[cfg(test)] mod tests`, so every production
item in the seven files included after it counted as appearing after a test
module — tripping clippy::items_after_test_module under -D warnings. Extract
the test module into jni_emitter/tests.rs and include! it last so the test
module is the final item. No behavior change; the test still passes.
1 parent 52ad0a8 commit 6cde50e
56 files changed
Lines changed: 1070 additions & 575 deletions
File tree
- src
- backends
- go/trait_bridge
- kotlin/gen_bindings
- jni_emitter
- magnus
- napi/gen_bindings
- pyo3
- gen_bindings
- functions
- types
- gen_stubs
- enums
- templates
- converters
- codegen
- generators
- snippets
- validators
- tests/backends_pyo3_gen_bindings