v0.30.0

@Goldziher Goldziher released this 29 Jun 09:23
v0.30.0
886dacb

Added

  • docs: add a template-driven docs stage for API, CLI, MCP, llms.txt, agent skills, and
    snippet validation. Repos can configure generated reference output, required local templates for
    llms.txt and grouped skill files, static Clap/rmcp source extraction, and docs-specific snippet
    checks. Alef now warns on explicitly skipped docs inputs such as missing configured sources or
    unavailable snippet toolchains while avoiding noisy warnings for unset optional docs layers.

  • snippets: add a typecheck validation level. Ordered between compile and run, it statically
    type-checks a snippet without executing it, and for compiled languages without needing the native
    library. Each language runs its strict static checker: python -m mypy, tsc --noEmit,
    cargo check, go vet, javac -Xlint:all -Werror, dotnet build -warnaserror,
    swiftc -typecheck -warnings-as-errors, kotlinc -Werror, dart analyze --fatal-infos, and
    cc -fsyntax-only -Wall -Werror. This catches dual-representation mistakes (a config field typed
    against a flattened union alias that rejects the documented data-enum constructor) that
    py_compile and a lenient compile cannot see. A matching snippet:typecheck-only ceiling
    annotation sits alongside syntax-only and compile-only. mypy is optional: when it is not
    installed the Python snippet is reported as unavailable rather than failing.
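
As a rough illustration of the typecheck level described above, the sketch below models the ordering and the ceiling annotations in plain Python. It is not Alef's implementation: the `Level` enum, the `syntax` level name, the `CEILINGS` table, and both helper functions are hypothetical names chosen for this example. Only the ordering (typecheck between compile and run), the three ceiling names, and the "unavailable when mypy is missing" behavior come from the notes.

```python
import importlib.util
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Hypothetical ordering of snippet validation levels (names assumed)."""
    SYNTAX = 1
    COMPILE = 2
    TYPECHECK = 3  # the new level: after compile, before run
    RUN = 4


# Ceiling annotations named in the notes, mapped to the highest level allowed.
CEILINGS = {
    "syntax-only": Level.SYNTAX,
    "compile-only": Level.COMPILE,
    "typecheck-only": Level.TYPECHECK,
}


def effective_level(requested: Level, ceiling: Optional[str]) -> Level:
    """Clamp the requested level to the snippet's ceiling annotation, if any."""
    if ceiling is None:
        return requested
    return min(requested, CEILINGS[ceiling])


def python_typecheck_status() -> str:
    """Report 'unavailable' instead of failing when mypy is not importable."""
    if importlib.util.find_spec("mypy") is None:
        return "unavailable"
    return "available"
```

Under these assumptions, a snippet annotated `typecheck-only` that is validated at the run level would be type-checked but never executed, and a machine without mypy would report the Python snippet as unavailable rather than as a failure.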

Fixed

  • napi: give the generated streaming WORKER_POOL tokio runtime a 16 MB worker stack, so a
    deep consumer future does not overflow the default (~2 MB) worker stack and abort with SIGBUS.

  • pyo3: provision an enlarged worker-thread stack on the generated module's async runtime.
    pyo3-async-runtimes' default multi-thread runtime gives workers a small (~2 MB) stack, which a
    deep consumer future (e.g. a multi-stage OCR pipeline) overflows — aborting the whole process
    with SIGBUS. The #[pymodule] init now installs a tokio runtime with a 16 MB
    thread_stack_size before the first future_into_py.

  • pyo3: serialize dict/list values for JSON (serde_json::Value) config fields in the
    generated api.py converters. PyO3 cannot expose a settable serde_json::Value field, so the
    binding stores such fields as str, while the public dataclass and .pyi stub type them as
    dict[str, Any]. The converter forwarded the dict straight through, so the documented dict form
    raised TypeError: 'dict' object is not an instance of 'str' at runtime; it now serializes a
    dict/list with json.dumps (passing str/None through unchanged).

  • pyo3: re-point each re-exported exception's __module__ at the public package in the
    generated exceptions.py. The classes are the native ones (create_exception! sets their
    module to the compiled _native extension), so tracebacks and repr() previously read
    _native.DownloadError instead of the public name, and the exceptions were not picklable under
    their public path. exceptions.py now reassigns __module__ for every name in __all__
    (tree-sitter-language-pack issue #147).

  • codegen: generate compiling binding→core conversions for core structs that have private
    (pub(crate)) fields. Such a struct cannot be built with struct-literal syntax from a foreign
    crate — neither by naming the private field nor by patching it with ..Default::default() — so
    the conversion now seeds the core type's Default (which fills the private fields inside the
    defining crate) and assigns only the public fields onto it. The strategy is centralized in a
    shared helper used by the pyo3/napi/wasm/extendr/rustler/magnus generators, the Dart mirror crate
    generator, and the PHP enum-tainted conversion path; when the core type has private fields but no
    Default, a compile_error! guides the author to derive Default. A new has_private_fields
    flag on struct IR records the condition during extraction.

  • php: marshal owned (by-value) native-struct callback parameters by value rather than
    dereferencing them as a borrow ((*input) does not type-check on an owned core::T), and stop
    emitting the native-object return fast-path — a PHP #[php_class] binding struct implements
    FromZvalMut (for &mut T) but not FromZval (for T), so the bridge keeps the JSON return
    path that is well-defined for PHP.

  • pyo3: marshal owned (by-value) native-struct callback parameters into the host's native
    binding object via From<core::T>, the same way borrowed ones already were. A trait method that
    takes a serde struct by value (e.g. an extraction-input envelope) previously passed the raw
    core::T across the Python boundary, which has no IntoPyObject and failed to compile.

  • pyo3: when a core register_* free function shares its name with a trait bridge's
    register_fn, emit only the bridge's duck-typed registration. The function loop no longer also
    emits the auto-wrapped core version, which collided (E0428) with the bridge definition and did
    not type-check against a registry that takes Arc<dyn Trait>.

  • pyo3: the generated Python package now type-checks clean under mypy. Data-enum config fields
    are annotated against their public class (so EmbeddingConfig(model=EmbeddingModelType.plugin(...))
    is accepted) instead of a flattened union alias that shadowed the class; constructors accept the
    public dataclass/dict for factory parameters; data-enum __init__ signatures match the runtime
    #[new]; Json maps to dict[str, Any]; and the clear_* registry stub is no longer emitted
    twice.

  • napi: substitute binding-excluded types (e.g. InternalDocument) with JsonValue in the
    .d.ts host-interface signatures. Referencing a type that is never emitted produced an undefined
    TypeScript name; the runtime bridge marshals such values as JSON, so JsonValue is the faithful
    stand-in and tsc --strict is clean.

  • magnus: apply the same excluded-type substitution (to json_value) in generated .rbs
    interfaces and skip re-declaring a bridge clear_* function that is already exposed as a registry
    function, so rbs validate no longer reports an undefined type or a duplicated method definition.

  • node/wasm: require Node 22 or newer in generated npm package
    manifests, and keep Python package generation on Python 3.10 or newer.

  • e2e/dart: resolve config JSON object helper types from compatible
    call overrides so generated tests use concrete helpers such as
    createExtractionConfigFromJson.

  • wasm: filter cfg-gated struct fields with the WASM backend's active feature set so
    inactive fields are omitted and active fields are generated consistently across structs,
    constructors, accessors, and conversions.

  • r: keep cfg-gated struct fields when the R backend's configured feature set enables
    them, and align R wrapper exports with the classes registered in extendr_module!.

  • scaffold: let managed .cargo/config.toml render an explicit
    rustc-wrapper, and make the R Rust crate honor curated feature sets the
    same way as WASM by disabling core default features and declaring cfg
    passthrough features without enabling them by default.

  • r: merge crate-level extra_dependencies into the generated R Rust
    crate so external DTO conversion impls can depend on sibling Rust crates
    such as crawlberg.

  • elixir: render known generated public DTO fields in struct typespecs as
    their concrete module types instead of falling back to map().

  • swift: filter host Swift bindings with the same effective cfg feature set
    as the generated Rust bridge crate, including default cfg passthrough
    features.

  • swift: wrap method-shim DTO returns for Option<&T> and Vec<T>, and
    pass &Path method parameters as borrowed paths instead of owned PathBufs.

  • pyo3/magnus/wasm: delegate generated binding defaults for defaultable
    DTOs to the core Rust Default impl so omitted nested config fields keep
    semantic core defaults.

  • extract: support root-scoped external DTO source crates so host bindings
    can expand typed config graphs from sibling crates without exposing sibling
    functions or importing sibling language packages.

  • extract: preserve explicit field type_rust_path values and reject
    same-name types from different crates, while keeping binding-excluded fields
    out of include-list expansion.

  • go/java: avoid callback return local-name collisions in generated trait
    bridges when a method parameter is named result.

  • ffi: keep cbindgen forward declarations for live binding DTOs when cfg-gated
    skipped duplicates leave older entries in Alef's excluded type-path map.

  • dart: suppress ordinary trait-bridge lifecycle wrappers so FRB only sees the generated
    {Trait}DartImpl registration surface.

  • e2e: emit typed single-call json_object inputs for Dart, Swift, and R so unified
    extract(input, config) fixtures pass their ExtractInput payload instead of defaulting it away.

  • pyo3: include PyO3-present cfg-gated fields in generated .pyi constructor stubs so native
    signatures and type stubs agree for typed nested configs such as UrlExtractionConfig.crawl.

  • dart: normalize trailing whitespace in FRB-generated Dart files, including *.freezed.dart
    files that dart format leaves unchanged.

  • e2e: prefer configured config DTO types when rendering Dart config
    JSON objects, preventing fallback helpers such as createConfigFromJson.

  • e2e: include WASM nested DTO imports reached through json_object
    element types, such as per-input file configs nested under extract inputs.

  • elixir: JSON-encode default-typed single DTO parameters before calling
    Rustler NIFs, matching the NIF boundary used for unified extract inputs.
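
The pyo3 fix for JSON (serde_json::Value) config fields listed above can be pictured with a small converter. This is a minimal sketch, not the generated api.py code: the function name `to_native_json` is hypothetical, and raising TypeError for any other input type is an assumption made for the example.

```python
import json
from typing import Any, Optional


def to_native_json(value: Any) -> Optional[str]:
    """Convert a public dict/list config value to the str the binding stores.

    str and None pass through unchanged, as described in the notes.
    """
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, (dict, list)):
        # The native field is typed as str, so serialize structured input.
        return json.dumps(value)
    # Assumption for this sketch: reject anything else explicitly.
    raise TypeError(f"unsupported JSON config value: {type(value).__name__}")
```

With a converter of this shape, the documented dict form reaches the native field as a JSON string, while callers who already pass a string see no change.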
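
The exception `__module__` fix listed above can be sketched in the same way. The classes below are plain-Python stand-ins for the native exceptions (their `__module__` is set to `_native` by hand), and `my_package`, `ParseError`, and `repoint_modules` are hypothetical names. The real generated exceptions.py may differ in detail.

```python
# Stand-ins for classes that create_exception! would define in the compiled
# extension; here their __module__ is set to "_native" by hand.
DownloadError = type("DownloadError", (Exception,), {"__module__": "_native"})
ParseError = type("ParseError", (Exception,), {"__module__": "_native"})

__all__ = ["DownloadError", "ParseError"]

PUBLIC_PACKAGE = "my_package"  # hypothetical public package name


def repoint_modules(namespace: dict, names: list, public_module: str) -> None:
    """Reassign __module__ for every re-exported name to the public package."""
    for name in names:
        namespace[name].__module__ = public_module


repoint_modules(globals(), __all__, PUBLIC_PACKAGE)
```

After the reassignment, `repr(DownloadError)` shows the public package name instead of `_native`.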