baml_language(feat): implement generic parameter declarations#3196
baml_language(feat): implement generic parameter declarations#3196miguelcsx wants to merge 4 commits intoBoundaryML:canaryfrom
Conversation
|
@miguelcsx is attempting to deploy a commit to the Boundary Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughAdds end-to-end generic-parameter support: parser recognizes Changes
Sequence Diagram(s)sequenceDiagram
participant Parser
participant CST
participant AST
participant HIR_Query
participant Extractor
participant Allocator
Parser->>CST: parse source (detect `<`)
CST->>AST: attach GENERIC_PARAM_LIST node
HIR_Query->>Extractor: request generic params for item
Extractor->>AST: call generic_param_list() / iterate params()
Extractor->>Allocator: alloc_id(kind, name)
Allocator-->>Extractor: LocalItemId
Extractor->>Extractor: build Arc<GenericParams>
Extractor-->>HIR_Query: return Arc<GenericParams>
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@miguelcsx if you've got time, i'd love to hop on a call with you! been enjoying the PRs you're building out and i wonder if we can better support you. Either tag me on discord or want to grab a time on my calendar? if the timeslot is open, you can book it! (I have open read access). |
There was a problem hiding this comment.
Actionable comments posted: 6
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
baml_language/crates/baml_compiler_hir/src/generics.rsbaml_language/crates/baml_compiler_hir/src/ids.rsbaml_language/crates/baml_compiler_hir/src/lib.rsbaml_language/crates/baml_compiler_parser/src/parser.rsbaml_language/crates/baml_compiler_syntax/src/ast.rsbaml_language/crates/baml_compiler_syntax/src/syntax_kind.rsbaml_language/crates/baml_tests/src/incremental/scenarios.rs
| pub(crate) fn function_generic_params_from_cst( | ||
| db: &dyn Db, | ||
| func: FunctionId<'_>, | ||
| ) -> Arc<GenericParams> { | ||
| let file = func.file(db); | ||
| let target_id = func.id(db).as_u32(); | ||
|
|
||
| let result = with_source_file(db, file, |source_file| { | ||
| let mut allocator = LocalIdAllocator::new(); | ||
| for item in source_file.items() { | ||
| match item { | ||
| ast::Item::Function(func_node) => { | ||
| if let Some(params) = | ||
| scan_function_item_for_generics(&mut allocator, &func_node, target_id) | ||
| { | ||
| return params; | ||
| } | ||
| } | ||
| ast::Item::Class(class_node) => { | ||
| if let Some(params) = | ||
| scan_class_methods_for_generics(&mut allocator, &class_node, target_id) | ||
| { | ||
| return params; | ||
| } | ||
| } | ||
| _ => {} | ||
| } | ||
| } | ||
| empty_generic_params() | ||
| }); | ||
|
|
||
| match result { | ||
| Some(params) => params, | ||
| None => empty_generic_params(), | ||
| } | ||
| } | ||
|
|
||
| pub(crate) fn class_generic_params_from_cst(db: &dyn Db, class: ClassId<'_>) -> Arc<GenericParams> { | ||
| let file = class.file(db); | ||
| let target_id = class.id(db).as_u32(); | ||
|
|
||
| let result = with_source_file(db, file, |source_file| { | ||
| let mut allocator = LocalIdAllocator::new(); | ||
| find_top_level_generic_params( | ||
| &source_file, | ||
| &mut allocator, | ||
| target_id, | ||
| ItemKind::Class, | ||
| |item| { | ||
| if let ast::Item::Class(class_node) = item { | ||
| class_node.name().map(|name_token| { | ||
| ( | ||
| Name::new(name_token.text()), | ||
| class_node.generic_param_list(), | ||
| ) | ||
| }) | ||
| } else { | ||
| None | ||
| } | ||
| }, | ||
| ) | ||
| }); | ||
|
|
||
| if let Some(Some(params)) = result { | ||
| params | ||
| } else { | ||
| empty_generic_params() | ||
| } | ||
| } | ||
|
|
||
| pub(crate) fn enum_generic_params_from_cst( | ||
| db: &dyn Db, | ||
| enum_def: EnumId<'_>, | ||
| ) -> Arc<GenericParams> { | ||
| let file = enum_def.file(db); | ||
| let target_id = enum_def.id(db).as_u32(); | ||
|
|
||
| let result = with_source_file(db, file, |source_file| { | ||
| let mut allocator = LocalIdAllocator::new(); | ||
| find_top_level_generic_params( | ||
| &source_file, | ||
| &mut allocator, | ||
| target_id, | ||
| ItemKind::Enum, | ||
| |item| { | ||
| if let ast::Item::Enum(enum_node) = item { | ||
| enum_node.name().map(|name_token| { | ||
| (Name::new(name_token.text()), enum_node.generic_param_list()) | ||
| }) | ||
| } else { | ||
| None | ||
| } | ||
| }, | ||
| ) | ||
| }); | ||
|
|
||
| if let Some(Some(params)) = result { | ||
| params | ||
| } else { | ||
| empty_generic_params() | ||
| } | ||
| } | ||
|
|
||
| pub(crate) fn type_alias_generic_params_from_cst( | ||
| db: &dyn Db, | ||
| alias: TypeAliasId<'_>, | ||
| ) -> Arc<GenericParams> { | ||
| let file = alias.file(db); | ||
| let target_id = alias.id(db).as_u32(); | ||
|
|
||
| let result = with_source_file(db, file, |source_file| { | ||
| let mut allocator = LocalIdAllocator::new(); | ||
| find_top_level_generic_params( | ||
| &source_file, | ||
| &mut allocator, | ||
| target_id, | ||
| ItemKind::TypeAlias, | ||
| |item| { | ||
| if let ast::Item::TypeAlias(alias_node) = item { | ||
| alias_node.name().map(|name_token| { | ||
| ( | ||
| Name::new(name_token.text()), | ||
| alias_node.generic_param_list(), | ||
| ) | ||
| }) | ||
| } else { | ||
| None | ||
| } | ||
| }, | ||
| ) | ||
| }); | ||
|
|
||
| if let Some(Some(params)) = result { | ||
| params | ||
| } else { | ||
| empty_generic_params() | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add focused unit tests for CST generic-param extraction and share cargo test --lib results.
Please add unit tests in this crate for:
- function ID alignment with compiler-generated
<client>.resolve, - class-method scanning with malformed/unnamed methods,
- LLM synthetic function variants inheriting the base generic list.
Please also include local cargo test --lib output for this change set.
As per coding guidelines, **/*.rs: Prefer writing Rust unit tests over integration tests where possible, and always run cargo test --lib if Rust code changes.
|
Hey @hellovai! Thanks so much for reaching out. I saw your calendar link. Let me check my schedule and pick a time that works. I'll send you a quick message with some options. Looking forward to chatting! |
- add support for generic parameter declarations on functions, classes, enums, and type aliases - Extend the parser to recognize `<...>` after declaration keywords and introduce AST-level `GenericParamList`
- fix CST generic param replay for class methods and client resolve
d102c72 to
97e30fb
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8e09698b-32cc-486d-b29c-f5cd3e5c4649
📒 Files selected for processing (8)
baml_language/crates/baml_compiler_hir/src/generics.rsbaml_language/crates/baml_compiler_hir/src/ids.rsbaml_language/crates/baml_compiler_hir/src/item_tree.rsbaml_language/crates/baml_compiler_hir/src/lib.rsbaml_language/crates/baml_compiler_parser/src/parser.rsbaml_language/crates/baml_compiler_syntax/src/ast.rsbaml_language/crates/baml_compiler_syntax/src/syntax_kind.rsbaml_language/crates/baml_tests/src/incremental/scenarios.rs
- simple refactor to use helper functions
There was a problem hiding this comment.
♻️ Duplicate comments (1)
baml_language/crates/baml_compiler_hir/src/generics.rs (1)
292-296:⚠️ Potential issue | 🟠 MajorKeep enum/type-alias ID allocation rules exactly aligned with lowering.
Line [293] and Line [324] currently drop nameless items, and the enum path at Line [292] does not filter bodyless recovery nodes. That can desynchronize
LocalIdAllocatorordering and return generic params for the wrongEnumId/TypeAliasId.🔧 Suggested fix
find_top_level_generic_params( &source_file, &mut allocator, target_id, ItemKind::Enum, |item| { if let ast::Item::Enum(enum_node) = item { - item_name_and_list( - enum_node.name().map(|token| Name::new(token.text())), - enum_node.generic_param_list(), - ) + if !enum_node.has_body() { + return None; + } + let name = enum_node + .name() + .map(|token| Name::new(token.text())) + .unwrap_or_else(|| Name::new("UnnamedEnum")); + Some((name, enum_node.generic_param_list())) } else { None } }, ) @@ find_top_level_generic_params( &source_file, &mut allocator, target_id, ItemKind::TypeAlias, |item| { if let ast::Item::TypeAlias(alias_node) = item { - item_name_and_list( - alias_node.name().map(|token| Name::new(token.text())), - alias_node.generic_param_list(), - ) + let name = alias_node + .name() + .map(|token| Name::new(token.text())) + .unwrap_or_else(|| Name::new("UnnamedTypeAlias")); + Some((name, alias_node.generic_param_list())) } else { None } }, )#!/bin/bash set -euo pipefail # Verify lowering and CST scanning use identical enum/type-alias allocation rules. rg -n -C4 'ItemKind::Enum|ItemKind::TypeAlias|UnnamedEnum|UnnamedTypeAlias|has_body\(' \ baml_language/crates/baml_compiler_hir/src/lib.rs \ baml_language/crates/baml_compiler_hir/src/generics.rsAlso applies to: 322-326
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 538ab30f-24e7-42b3-b717-914e22d72d89
📒 Files selected for processing (1)
baml_language/crates/baml_compiler_hir/src/generics.rs
- minimal cleanup
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
baml_language/crates/baml_compiler_parser/src/parser.rs (1)
2107-2110:⚠️ Potential issue | 🟠 MajorDeclaration generic parsing can still leak
pending_greatersacross contexts.When
expect_greater()consumes>>, declaration parsing may leavepending_greaters > 0, which later interferes with unrelated type parsing. This is a parser-state corruption path for malformed declarations.♻️ Suggested guard/flush after declaration generic close
p.expect_greater(); + if p.pending_greaters > 0 { + if let Some(span) = p.pending_greater_span { + p.error( + format!( + "Unmatched '>' in generic parameter list (found {} extra)", + p.pending_greaters + ), + span, + ); + } + for _ in 0..p.pending_greaters { + p.events.push(Event::Token { + kind: SyntaxKind::GREATER, + text: ">".to_string(), + }); + } + p.pending_greaters = 0; + p.pending_greater_span = None; + }Also applies to: 2136-2137
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 03e13c59-fa7f-4fae-bd92-4d4d07452605
📒 Files selected for processing (4)
baml_language/crates/baml_compiler_hir/src/ids.rsbaml_language/crates/baml_compiler_hir/src/lib.rsbaml_language/crates/baml_compiler_parser/src/parser.rsbaml_language/crates/baml_tests/src/incremental/scenarios.rs
| pub(crate) fn allocate_local_id<T>( | ||
| next_index: &mut FxHashMap<(ItemKind, u16), u16>, | ||
| kind: ItemKind, | ||
| name: &Name, | ||
| ) -> LocalItemId<T> { | ||
| let hash = hash_name(name); | ||
| let index = next_index.entry((kind, hash)).or_insert(0); | ||
| let id = LocalItemId::new(hash, *index); | ||
| // Saturating add prevents silent wraparound in release builds. | ||
| // Reaching u16::MAX collisions for a single (kind, hash) bucket is | ||
| // practically impossible (requires 65 535 items with the same 16-bit | ||
| // name hash), so saturation is a safe sentinel rather than a recoverable | ||
| // error path. | ||
| debug_assert!( | ||
| *index < u16::MAX, | ||
| "LocalItemId collision index saturated for {name:?}" | ||
| ); | ||
| *index = index.saturating_add(1); | ||
| id | ||
| } | ||
|
|
||
| /// Allocator for `LocalItemId`s with collision handling. | ||
| /// | ||
| /// Replays the same hashing and collision-indexing logic as `ItemTree`. | ||
| /// This allows queries to reproduce the same local IDs when scanning CST. | ||
| pub struct LocalIdAllocator { | ||
| next_index: FxHashMap<(ItemKind, u16), u16>, | ||
| } | ||
|
|
||
| impl LocalIdAllocator { | ||
| /// Create a new allocator with empty collision state. | ||
| pub fn new() -> Self { | ||
| Self { | ||
| next_index: FxHashMap::default(), | ||
| } | ||
| } | ||
|
|
||
| /// Allocate a `LocalItemId` for a named item, updating collision state. | ||
| pub fn alloc_id<T>(&mut self, kind: ItemKind, name: &Name) -> LocalItemId<T> { | ||
| allocate_local_id(&mut self.next_index, kind, name) | ||
| } | ||
| } | ||
|
|
||
| impl Default for LocalIdAllocator { | ||
| fn default() -> Self { | ||
| Self::new() | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add focused unit tests for allocator collision progression/overflow semantics.
allocate_local_id and LocalIdAllocator introduce critical ID-stability logic; this should be covered with unit tests in this crate (normal increment, hash collision bucket separation by ItemKind, overflow behavior).
As per coding guidelines: "Prefer writing Rust unit tests over integration tests where possible".
| // Saturating add prevents silent wraparound in release builds. | ||
| // Reaching u16::MAX collisions for a single (kind, hash) bucket is | ||
| // practically impossible (requires 65 535 items with the same 16-bit | ||
| // name hash), so saturation is a safe sentinel rather than a recoverable | ||
| // error path. | ||
| debug_assert!( | ||
| *index < u16::MAX, | ||
| "LocalItemId collision index saturated for {name:?}" | ||
| ); | ||
| *index = index.saturating_add(1); | ||
| id |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n baml_language/crates/baml_compiler_hir/src/ids.rs | sed -n '190,220p'Repository: BoundaryML/baml
Length of output: 1325
🏁 Script executed:
rg -A 20 "fn allocate_local_id" baml_language/crates/baml_compiler_hir/src/ids.rsRepository: BoundaryML/baml
Length of output: 818
🏁 Script executed:
rg -B 5 -A 15 "struct LocalIdAllocator" baml_language/crates/baml_compiler_hir/src/ids.rsRepository: BoundaryML/baml
Length of output: 765
🏁 Script executed:
rg "mod test|#\[test\]|#\[cfg.*test" baml_language/crates/baml_compiler_hir/src/ids.rsRepository: BoundaryML/baml
Length of output: 41
🏁 Script executed:
find baml_language/crates/baml_compiler_hir -type f -name "*.rs" | xargs grep -l "LocalIdAllocator\|allocate_local_id" | head -10Repository: BoundaryML/baml
Length of output: 219
🏁 Script executed:
rg -B 5 -A 10 "struct LocalItemId|impl LocalItemId" baml_language/crates/baml_compiler_hir/src/ids.rs | head -40Repository: BoundaryML/baml
Length of output: 653
🏁 Script executed:
fd -e rs . baml_language/crates/baml_compiler_hir/tests/ 2>/dev/null | head -5Repository: BoundaryML/baml
Length of output: 41
🏁 Script executed:
rg "test.*allocat|allocat.*test" baml_language/crates/baml_compiler_hir/ -iRepository: BoundaryML/baml
Length of output: 41
🏁 Script executed:
rg -A 5 "LocalIdAllocator" baml_language/crates/baml_compiler_hir/src/item_tree.rs | head -20Repository: BoundaryML/baml
Length of output: 41
🏁 Script executed:
rg "practically impossible|safe sentinel|u16::MAX" baml_language/crates/baml_compiler_hir/src/ids.rsRepository: BoundaryML/baml
Length of output: 309
Missing unit test coverage for LocalIdAllocator and underdocumented saturation behavior.
The saturating_add on line 212 will cause ID aliasing when a single (kind, hash) bucket exceeds 65,535 collisions. While the comment acknowledges this is "practically impossible," the code provides only a debug-mode assertion and no test coverage. Per coding guidelines, Rust unit tests should be added for allocator edge cases.
Test coverage needed
Add unit tests for:
- Allocator behavior at and beyond collision saturation
- Multiple items with forced hash collisions
- Verify debug_assert triggers (or add test for saturation behavior in release mode)
The proposed fix using checked_add with expect is valid but changes the design intent (panic vs. silent saturation); either approach should be explicitly tested and documented.
We already supports call-site generic typing for a few built-in functions (e.g.,
baml.fetch_as<Todo>(...)andbaml.deep_copy<T>(x)), and the type checker actively resolves those substitutions.However, BAML lacked support for declaration-site generics on user-defined items. If a user tried to declare
function Foo<T>(...),class Box<T>, orenum Option<T>, the parser failed or ignored it. In the HIR layer, queries likeclass_generic_paramswere merely stubbed out, hard-coded to return empty parameter lists with aTODO: Extract generic parameters from CST when BAML adds generic syntax.This PR bridges that gap by adding the ability to parse generic syntax in declarations and correctly lower those definitions into the HIR.
TODOs
Summary by CodeRabbit
New Features
Improvements
Tests