baml_language(feat): implement generic parameter declarations by miguelcsx · Pull Request #3196 · BoundaryML/baml

miguelcsx · 2026-03-01T18:18:35Z

We already supports call-site generic typing for a few built-in functions (e.g., baml.fetch_as<Todo>(...) and baml.deep_copy<T>(x)), and the type checker actively resolves those substitutions.

However, BAML lacked support for declaration-site generics on user-defined items. If a user tried to declare function Foo<T>(...), class Box<T>, or enum Option<T>, the parser failed or ignored it. In the HIR layer, queries like class_generic_params were merely stubbed out, hard-coded to return empty parameter lists with a TODO: Extract generic parameters from CST when BAML adds generic syntax.

This PR bridges that gap by adding the ability to parse generic syntax in declarations and correctly lower those definitions into the HIR.

TODOs

Address generic parameters in TIR

Summary by CodeRabbit

New Features
- Generic-parameter lists are exposed and queryable on functions, classes, enums, type aliases, and methods.
- New public type metadata for generic parameters is available.
Improvements
- CST-driven extraction of generics with improved handling for duplicate names and hash collisions.
- Collision-aware local ID allocation to make local item identification more robust.
Tests
- Added end-to-end tests for generic extraction, duplicate-name resolution, and collision scenarios.

vercel · 2026-03-01T18:18:41Z

@miguelcsx is attempting to deploy a commit to the Boundary Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-03-01T18:19:01Z

📝 Walkthrough

Walkthrough

Adds end-to-end generic-parameter support: parser recognizes <...> generic lists, syntax/AST expose GENERIC_PARAM_LIST and GenericParamList, HIR extracts generic params from the CST via new scanners/helpers, and a collision-aware LocalIdAllocator is introduced for local item ID allocation.

Changes

Cohort / File(s)	Summary
Parser & CST/AST `baml_language/crates/baml_compiler_parser/src/parser.rs`, `baml_language/crates/baml_compiler_syntax/src/syntax_kind.rs`, `baml_language/crates/baml_compiler_syntax/src/ast.rs`	Adds `parse_generic_param_list` invocation in declaration parsers; introduces `GENERIC_PARAM_LIST` syntax kind and `GenericParamList` AST node with `params()` and `generic_param_list()` accessors on `FunctionDef`, `ClassDef`, `EnumDef`, `TypeAliasDef`. Tests for parsing and recovery added.
HIR: generics extraction `baml_language/crates/baml_compiler_hir/src/generics.rs`, `baml_language/crates/baml_compiler_hir/src/lib.rs`	Implements CST-driven extraction functions (`*_generic_params_from_cst`) to derive `GenericParams`/`TypeParam` from CST/AST; public HIR query entrypoints now delegate to these CST-based implementations; adds helpers to traverse CST and match LocalId allocation.
Local ID allocation `baml_language/crates/baml_compiler_hir/src/ids.rs`, `baml_language/crates/baml_compiler_hir/src/item_tree.rs`	Adds `LocalIdAllocator` with collision-aware allocation and `allocate_local_id` helper; replaces inline id-generation in ItemTree to use allocator and shared allocation logic.
Tests & utilities `baml_language/crates/baml_tests/src/incremental/scenarios.rs`	Adds name-based lookup helpers, collision finder, and extensive tests covering generic-parameter extraction for functions, methods, classes, enums, type aliases, duplicate generic names, and hash-collision scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Parser
    participant CST
    participant AST
    participant HIR_Query
    participant Extractor
    participant Allocator

    Parser->>CST: parse source (detect `<`)
    CST->>AST: attach GENERIC_PARAM_LIST node
    HIR_Query->>Extractor: request generic params for item
    Extractor->>AST: call generic_param_list() / iterate params()
    Extractor->>Allocator: alloc_id(kind, name)
    Allocator-->>Extractor: LocalItemId
    Extractor->>Extractor: build Arc<GenericParams>
    Extractor-->>HIR_Query: return Arc<GenericParams>

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 73.61% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'baml_language(feat): implement generic parameter declarations' accurately and specifically summarizes the main change: adding support for generic parameter declarations in BAML user-defined items.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

hellovai · 2026-03-01T18:26:45Z

@miguelcsx if you've got time, i'd love to hop on a call with you! been enjoying the PRs you're building out and i wonder if we can better support you. Either tag me on discord or want to grab a time on my calendar?

vbv@boundaryml.com

if the timeslot is open, you can book it! (I have open read access).

coderabbitai

Actionable comments posted: 6

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac50196 and d102c72.

📒 Files selected for processing (7)

baml_language/crates/baml_compiler_hir/src/generics.rs
baml_language/crates/baml_compiler_hir/src/ids.rs
baml_language/crates/baml_compiler_hir/src/lib.rs
baml_language/crates/baml_compiler_parser/src/parser.rs
baml_language/crates/baml_compiler_syntax/src/ast.rs
baml_language/crates/baml_compiler_syntax/src/syntax_kind.rs
baml_language/crates/baml_tests/src/incremental/scenarios.rs

baml_language/crates/baml_compiler_hir/src/generics.rs

coderabbitai · 2026-03-01T18:27:05Z

baml_language/crates/baml_compiler_hir/src/generics.rs

+pub(crate) fn function_generic_params_from_cst(
+    db: &dyn Db,
+    func: FunctionId<'_>,
+) -> Arc<GenericParams> {
+    let file = func.file(db);
+    let target_id = func.id(db).as_u32();
+
+    let result = with_source_file(db, file, |source_file| {
+        let mut allocator = LocalIdAllocator::new();
+        for item in source_file.items() {
+            match item {
+                ast::Item::Function(func_node) => {
+                    if let Some(params) =
+                        scan_function_item_for_generics(&mut allocator, &func_node, target_id)
+                    {
+                        return params;
+                    }
+                }
+                ast::Item::Class(class_node) => {
+                    if let Some(params) =
+                        scan_class_methods_for_generics(&mut allocator, &class_node, target_id)
+                    {
+                        return params;
+                    }
+                }
+                _ => {}
+            }
+        }
+        empty_generic_params()
+    });
+
+    match result {
+        Some(params) => params,
+        None => empty_generic_params(),
+    }
+}
+
+pub(crate) fn class_generic_params_from_cst(db: &dyn Db, class: ClassId<'_>) -> Arc<GenericParams> {
+    let file = class.file(db);
+    let target_id = class.id(db).as_u32();
+
+    let result = with_source_file(db, file, |source_file| {
+        let mut allocator = LocalIdAllocator::new();
+        find_top_level_generic_params(
+            &source_file,
+            &mut allocator,
+            target_id,
+            ItemKind::Class,
+            |item| {
+                if let ast::Item::Class(class_node) = item {
+                    class_node.name().map(|name_token| {
+                        (
+                            Name::new(name_token.text()),
+                            class_node.generic_param_list(),
+                        )
+                    })
+                } else {
+                    None
+                }
+            },
+        )
+    });
+
+    if let Some(Some(params)) = result {
+        params
+    } else {
+        empty_generic_params()
+    }
+}
+
+pub(crate) fn enum_generic_params_from_cst(
+    db: &dyn Db,
+    enum_def: EnumId<'_>,
+) -> Arc<GenericParams> {
+    let file = enum_def.file(db);
+    let target_id = enum_def.id(db).as_u32();
+
+    let result = with_source_file(db, file, |source_file| {
+        let mut allocator = LocalIdAllocator::new();
+        find_top_level_generic_params(
+            &source_file,
+            &mut allocator,
+            target_id,
+            ItemKind::Enum,
+            |item| {
+                if let ast::Item::Enum(enum_node) = item {
+                    enum_node.name().map(|name_token| {
+                        (Name::new(name_token.text()), enum_node.generic_param_list())
+                    })
+                } else {
+                    None
+                }
+            },
+        )
+    });
+
+    if let Some(Some(params)) = result {
+        params
+    } else {
+        empty_generic_params()
+    }
+}
+
+pub(crate) fn type_alias_generic_params_from_cst(
+    db: &dyn Db,
+    alias: TypeAliasId<'_>,
+) -> Arc<GenericParams> {
+    let file = alias.file(db);
+    let target_id = alias.id(db).as_u32();
+
+    let result = with_source_file(db, file, |source_file| {
+        let mut allocator = LocalIdAllocator::new();
+        find_top_level_generic_params(
+            &source_file,
+            &mut allocator,
+            target_id,
+            ItemKind::TypeAlias,
+            |item| {
+                if let ast::Item::TypeAlias(alias_node) = item {
+                    alias_node.name().map(|name_token| {
+                        (
+                            Name::new(name_token.text()),
+                            alias_node.generic_param_list(),
+                        )
+                    })
+                } else {
+                    None
+                }
+            },
+        )
+    });
+
+    if let Some(Some(params)) = result {
+        params
+    } else {
+        empty_generic_params()
+    }
+}


🧹 Nitpick | 🔵 Trivial

Add focused unit tests for CST generic-param extraction and share cargo test --lib results.

Please add unit tests in this crate for:

function ID alignment with compiler-generated <client>.resolve,

class-method scanning with malformed/unnamed methods,

LLM synthetic function variants inheriting the base generic list.

Please also include local cargo test --lib output for this change set.

As per coding guidelines, **/*.rs: Prefer writing Rust unit tests over integration tests where possible, and always run cargo test --lib if Rust code changes.

baml_language/crates/baml_compiler_hir/src/generics.rs

baml_language/crates/baml_compiler_hir/src/ids.rs

baml_language/crates/baml_compiler_parser/src/parser.rs

miguelcsx · 2026-03-04T06:45:03Z

Hey @hellovai! Thanks so much for reaching out.

I saw your calendar link. Let me check my schedule and pick a time that works.

I'll send you a quick message with some options. Looking forward to chatting!

- add support for generic parameter declarations on functions, classes, enums, and type aliases - Extend the parser to recognize `<...>` after declaration keywords and introduce AST-level `GenericParamList`

- fix CST generic param replay for class methods and client resolve

coderabbitai

Actionable comments posted: 5

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8e09698b-32cc-486d-b29c-f5cd3e5c4649

📥 Commits

Reviewing files that changed from the base of the PR and between d102c72 and 97e30fb.

📒 Files selected for processing (8)

baml_language/crates/baml_compiler_hir/src/generics.rs
baml_language/crates/baml_compiler_hir/src/ids.rs
baml_language/crates/baml_compiler_hir/src/item_tree.rs
baml_language/crates/baml_compiler_hir/src/lib.rs
baml_language/crates/baml_compiler_parser/src/parser.rs
baml_language/crates/baml_compiler_syntax/src/ast.rs
baml_language/crates/baml_compiler_syntax/src/syntax_kind.rs
baml_language/crates/baml_tests/src/incremental/scenarios.rs

baml_language/crates/baml_compiler_hir/src/generics.rs

baml_language/crates/baml_compiler_hir/src/ids.rs

baml_language/crates/baml_compiler_parser/src/parser.rs

baml_language/crates/baml_tests/src/incremental/scenarios.rs

- simple refactor to use helper functions

coderabbitai

♻️ Duplicate comments (1)

baml_language/crates/baml_compiler_hir/src/generics.rs (1)

292-296: ⚠️ Potential issue | 🟠 Major

Keep enum/type-alias ID allocation rules exactly aligned with lowering.

Line [293] and Line [324] currently drop nameless items, and the enum path at Line [292] does not filter bodyless recovery nodes. That can desynchronize LocalIdAllocator ordering and return generic params for the wrong EnumId/TypeAliasId.

🔧 Suggested fix

         find_top_level_generic_params(
             &source_file,
             &mut allocator,
             target_id,
             ItemKind::Enum,
             |item| {
                 if let ast::Item::Enum(enum_node) = item {
-                    item_name_and_list(
-                        enum_node.name().map(|token| Name::new(token.text())),
-                        enum_node.generic_param_list(),
-                    )
+                    if !enum_node.has_body() {
+                        return None;
+                    }
+                    let name = enum_node
+                        .name()
+                        .map(|token| Name::new(token.text()))
+                        .unwrap_or_else(|| Name::new("UnnamedEnum"));
+                    Some((name, enum_node.generic_param_list()))
                 } else {
                     None
                 }
             },
         )
@@
         find_top_level_generic_params(
             &source_file,
             &mut allocator,
             target_id,
             ItemKind::TypeAlias,
             |item| {
                 if let ast::Item::TypeAlias(alias_node) = item {
-                    item_name_and_list(
-                        alias_node.name().map(|token| Name::new(token.text())),
-                        alias_node.generic_param_list(),
-                    )
+                    let name = alias_node
+                        .name()
+                        .map(|token| Name::new(token.text()))
+                        .unwrap_or_else(|| Name::new("UnnamedTypeAlias"));
+                    Some((name, alias_node.generic_param_list()))
                 } else {
                     None
                 }
             },
         )

#!/bin/bash
set -euo pipefail

# Verify lowering and CST scanning use identical enum/type-alias allocation rules.
rg -n -C4 'ItemKind::Enum|ItemKind::TypeAlias|UnnamedEnum|UnnamedTypeAlias|has_body\(' \
  baml_language/crates/baml_compiler_hir/src/lib.rs \
  baml_language/crates/baml_compiler_hir/src/generics.rs

Also applies to: 322-326

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 538ab30f-24e7-42b3-b717-914e22d72d89

📥 Commits

Reviewing files that changed from the base of the PR and between 97e30fb and 2557934.

📒 Files selected for processing (1)

baml_language/crates/baml_compiler_hir/src/generics.rs

- minimal cleanup

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

baml_language/crates/baml_compiler_parser/src/parser.rs (1)

2107-2110: ⚠️ Potential issue | 🟠 Major

Declaration generic parsing can still leak pending_greaters across contexts.

When expect_greater() consumes >>, declaration parsing may leave pending_greaters > 0, which later interferes with unrelated type parsing. This is a parser-state corruption path for malformed declarations.

♻️ Suggested guard/flush after declaration generic close

             p.expect_greater();
+            if p.pending_greaters > 0 {
+                if let Some(span) = p.pending_greater_span {
+                    p.error(
+                        format!(
+                            "Unmatched '>' in generic parameter list (found {} extra)",
+                            p.pending_greaters
+                        ),
+                        span,
+                    );
+                }
+                for _ in 0..p.pending_greaters {
+                    p.events.push(Event::Token {
+                        kind: SyntaxKind::GREATER,
+                        text: ">".to_string(),
+                    });
+                }
+                p.pending_greaters = 0;
+                p.pending_greater_span = None;
+            }

Also applies to: 2136-2137

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 03e13c59-fa7f-4fae-bd92-4d4d07452605

📥 Commits

Reviewing files that changed from the base of the PR and between 2557934 and ede07d6.

📒 Files selected for processing (4)

baml_language/crates/baml_compiler_hir/src/ids.rs
baml_language/crates/baml_compiler_hir/src/lib.rs
baml_language/crates/baml_compiler_parser/src/parser.rs
baml_language/crates/baml_tests/src/incremental/scenarios.rs

coderabbitai · 2026-03-05T18:47:04Z

baml_language/crates/baml_compiler_hir/src/ids.rs

+pub(crate) fn allocate_local_id<T>(
+    next_index: &mut FxHashMap<(ItemKind, u16), u16>,
+    kind: ItemKind,
+    name: &Name,
+) -> LocalItemId<T> {
+    let hash = hash_name(name);
+    let index = next_index.entry((kind, hash)).or_insert(0);
+    let id = LocalItemId::new(hash, *index);
+    // Saturating add prevents silent wraparound in release builds.
+    // Reaching u16::MAX collisions for a single (kind, hash) bucket is
+    // practically impossible (requires 65 535 items with the same 16-bit
+    // name hash), so saturation is a safe sentinel rather than a recoverable
+    // error path.
+    debug_assert!(
+        *index < u16::MAX,
+        "LocalItemId collision index saturated for {name:?}"
+    );
+    *index = index.saturating_add(1);
+    id
+}
+
+/// Allocator for `LocalItemId`s with collision handling.
+///
+/// Replays the same hashing and collision-indexing logic as `ItemTree`.
+/// This allows queries to reproduce the same local IDs when scanning CST.
+pub struct LocalIdAllocator {
+    next_index: FxHashMap<(ItemKind, u16), u16>,
+}
+
+impl LocalIdAllocator {
+    /// Create a new allocator with empty collision state.
+    pub fn new() -> Self {
+        Self {
+            next_index: FxHashMap::default(),
+        }
+    }
+
+    /// Allocate a `LocalItemId` for a named item, updating collision state.
+    pub fn alloc_id<T>(&mut self, kind: ItemKind, name: &Name) -> LocalItemId<T> {
+        allocate_local_id(&mut self.next_index, kind, name)
+    }
+}
+
+impl Default for LocalIdAllocator {
+    fn default() -> Self {
+        Self::new()
+    }
+}


🧹 Nitpick | 🔵 Trivial

Add focused unit tests for allocator collision progression/overflow semantics.

allocate_local_id and LocalIdAllocator introduce critical ID-stability logic; this should be covered with unit tests in this crate (normal increment, hash collision bucket separation by ItemKind, overflow behavior).

As per coding guidelines: "Prefer writing Rust unit tests over integration tests where possible".

coderabbitai · 2026-03-05T18:47:04Z

baml_language/crates/baml_compiler_hir/src/ids.rs

+    // Saturating add prevents silent wraparound in release builds.
+    // Reaching u16::MAX collisions for a single (kind, hash) bucket is
+    // practically impossible (requires 65 535 items with the same 16-bit
+    // name hash), so saturation is a safe sentinel rather than a recoverable
+    // error path.
+    debug_assert!(
+        *index < u16::MAX,
+        "LocalItemId collision index saturated for {name:?}"
+    );
+    *index = index.saturating_add(1);
+    id


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n baml_language/crates/baml_compiler_hir/src/ids.rs | sed -n '190,220p'

Repository: BoundaryML/baml

Length of output: 1325

🏁 Script executed:

rg -A 20 "fn allocate_local_id" baml_language/crates/baml_compiler_hir/src/ids.rs

Repository: BoundaryML/baml

Length of output: 818

🏁 Script executed:

rg -B 5 -A 15 "struct LocalIdAllocator" baml_language/crates/baml_compiler_hir/src/ids.rs

Repository: BoundaryML/baml

Length of output: 765

🏁 Script executed:

rg "mod test|#\[test\]|#\[cfg.*test" baml_language/crates/baml_compiler_hir/src/ids.rs

Repository: BoundaryML/baml

Length of output: 41

🏁 Script executed:

find baml_language/crates/baml_compiler_hir -type f -name "*.rs" | xargs grep -l "LocalIdAllocator\|allocate_local_id" | head -10

Repository: BoundaryML/baml

Length of output: 219

🏁 Script executed:

rg -B 5 -A 10 "struct LocalItemId|impl LocalItemId" baml_language/crates/baml_compiler_hir/src/ids.rs | head -40

Repository: BoundaryML/baml

Length of output: 653

🏁 Script executed:

fd -e rs . baml_language/crates/baml_compiler_hir/tests/ 2>/dev/null | head -5

Repository: BoundaryML/baml

Length of output: 41

🏁 Script executed:

rg "test.*allocat|allocat.*test" baml_language/crates/baml_compiler_hir/ -i

Repository: BoundaryML/baml

Length of output: 41

🏁 Script executed:

rg -A 5 "LocalIdAllocator" baml_language/crates/baml_compiler_hir/src/item_tree.rs | head -20

Repository: BoundaryML/baml

Length of output: 41

🏁 Script executed:

rg "practically impossible|safe sentinel|u16::MAX" baml_language/crates/baml_compiler_hir/src/ids.rs

Repository: BoundaryML/baml

Length of output: 309

Missing unit test coverage for LocalIdAllocator and underdocumented saturation behavior.

The saturating_add on line 212 will cause ID aliasing when a single (kind, hash) bucket exceeds 65,535 collisions. While the comment acknowledges this is "practically impossible," the code provides only a debug-mode assertion and no test coverage. Per coding guidelines, Rust unit tests should be added for allocator edge cases.

Test coverage needed

Add unit tests for:

Allocator behavior at and beyond collision saturation

Multiple items with forced hash collisions

Verify debug_assert triggers (or add test for saturation behavior in release mode)

The proposed fix using checked_add with expect is valid but changes the design intent (panic vs. silent saturation); either approach should be explicitly tested and documented.

coderabbitai bot requested changes Mar 1, 2026

View reviewed changes

miguelcsx added 2 commits March 5, 2026 11:54

baml_language(feat): implement generic parameter declarations

96a5355

- add support for generic parameter declarations on functions, classes, enums, and type aliases - Extend the parser to recognize `<...>` after declaration keywords and introduce AST-level `GenericParamList`

baml_language(fix): align generic param replay + ID allocation

97e30fb

- fix CST generic param replay for class methods and client resolve

miguelcsx force-pushed the feat/generic-parameters branch from d102c72 to 97e30fb Compare March 5, 2026 17:07

coderabbitai bot requested changes Mar 5, 2026

View reviewed changes

baml_language(refactor): clean repeated patterns on generics hir

2557934

- simple refactor to use helper functions

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

baml_language(refactor): address hash colission limits

ede07d6

- minimal cleanup

coderabbitai bot requested changes Mar 5, 2026

View reviewed changes

Conversation

miguelcsx commented Mar 1, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TODOs

Summary by CodeRabbit

Uh oh!

vercel bot commented Mar 1, 2026

Uh oh!

coderabbitai bot commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

hellovai commented Mar 1, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

miguelcsx commented Mar 4, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

miguelcsx commented Mar 1, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 1, 2026 •

edited

Loading