Ruby-idiomatic conversion from Sorbet types to BAML (Boundary AI Markup Language) for efficient LLM prompting.
This gem provides a clean, Ruby-idiomatic API to convert your Sorbet type definitions (T::Struct, T::Enum) into BAML's concise format. BAML uses approximately 60% fewer tokens than JSON Schema while maintaining complete type information, making your LLM interactions more efficient and cost-effective.
When working with LLMs, token efficiency directly impacts:
- Cost: Fewer tokens = lower API costs
- Performance: Smaller prompts = faster responses
- Context: More room for actual content vs. type definitions
BAML provides the perfect balance: concise, readable, and LLM-friendly.
# Complex LLM workflow types for autonomous research
class ComplexityLevel < T::Enum
enums do
# Basic analysis requiring straightforward research
Basic = new('basic')
# Advanced analysis requiring deep domain expertise
Advanced = new('advanced')
end
end
class TaskDecomposition < T::Struct
# The main research topic being investigated
const :research_topic, String
# Target complexity level for the decomposition
const :complexity_level, ComplexityLevel
# Autonomously generated list of research subtasks
const :subtasks, T::Array[String]
# Strategic priority rankings for each subtask
const :priority_order, T::Array[Integer]
end
# Ruby-idiomatic conversion with dependencies
TaskDecomposition.to_baml
# =>
# enum ComplexityLevel {
# "basic" @description("Basic analysis requiring straightforward research")
# "advanced" @description("Advanced analysis requiring deep domain expertise")
# }
#
# class TaskDecomposition {
# research_topic string @description("The main research topic being investigated")
# complexity_level ComplexityLevel @description("Target complexity level for the decomposition")
# subtasks string[] @description("Autonomously generated list of research subtasks")
# priority_order int[] @description("Strategic priority rankings for each subtask")
# }Add to your Gemfile:
gem 'sorbet-baml'Or install directly:
gem install sorbet-bamlrequire 'sorbet-baml'
# π― Ruby-idiomatic API for complex LLM workflows
class ConfidenceLevel < T::Enum
enums do
# Low confidence, requires further verification
Low = new('low')
# High confidence, strongly supported by multiple sources
High = new('high')
end
end
class ResearchFindings < T::Struct
# Detailed findings and analysis results
const :findings, String
# Key actionable insights extracted
const :key_insights, T::Array[String]
# Assessment of evidence quality and reliability
const :evidence_quality, ConfidenceLevel
# Confidence score for the findings (1-10 scale)
const :confidence_score, Integer
end
class ResearchSynthesis < T::Struct
# High-level executive summary of all findings
const :executive_summary, String
# Primary conclusions drawn from the research
const :key_conclusions, T::Array[String]
# Collection of research findings
const :findings_collection, T::Array[ResearchFindings]
end
# Convert with smart defaults (dependencies + descriptions included!)
ResearchSynthesis.to_baml
# π Smart defaults include dependencies and descriptions automatically
# =>
# enum ConfidenceLevel {
# "low" @description("Low confidence, requires further verification")
# "high" @description("High confidence, strongly supported by multiple sources")
# }
#
# class ResearchFindings {
# findings string @description("Detailed findings and analysis results")
# key_insights string[] @description("Key actionable insights extracted")
# evidence_quality ConfidenceLevel @description("Assessment of evidence quality and reliability")
# confidence_score int @description("Confidence score for the findings (1-10 scale)")
# }
#
# class ResearchSynthesis {
# executive_summary string @description("High-level executive summary of all findings")
# key_conclusions string[] @description("Primary conclusions drawn from the research")
# findings_collection ResearchFindings[] @description("Collection of research findings")
# }
# π― Disable features if needed
ResearchSynthesis.to_baml(include_descriptions: false)
ResearchSynthesis.to_baml(include_dependencies: false)
# π Customize formatting (smart defaults still apply)
ResearchSynthesis.to_baml(indent_size: 4)
# Legacy API (no smart defaults, for backwards compatibility)
SorbetBaml.from_struct(ResearchSynthesis)
SorbetBaml.from_structs([ResearchSynthesis, ResearchFindings])Add crucial context to your BAML types by documenting fields - essential for autonomous agents and complex workflows. The gem supports two ways to provide field descriptions:
class ResearchTask < T::Struct
# Clear description of the research objective
const :objective, String
# Strategic priority ranking (1-5 scale)
const :priority, Integer
endclass ResearchTask < T::Struct
const :objective, String, description: "Clear description of the research objective"
const :priority, Integer, description: "Strategic priority ranking (1-5 scale)"
endBoth methods produce identical BAML output with @description annotations. The description: parameter provides a more explicit, Sorbet-native way to document fields.
class TaskType < T::Enum
enums do
# Literature review and information gathering
Research = new('research')
# Combining multiple sources into coherent insights
Synthesis = new('synthesis')
# Evaluating options or making recommendations
Evaluation = new('evaluation')
end
end
class ResearchSubtask < T::Struct
# Clear description of the specific research objective
const :objective, String
# Type of research task to be performed
const :task_type, TaskType
# Strategic priority ranking for task sequencing (1-5 scale)
const :priority, Integer
# Estimated effort required in hours
const :estimated_hours, Integer
# Suggested agent capabilities needed for optimal execution
const :required_capabilities, T::Array[String]
end
# Generate BAML (descriptions included by default!)
ResearchSubtask.to_baml
# =>
# enum TaskType {
# "research" @description("Literature review and information gathering")
# "synthesis" @description("Combining multiple sources into coherent insights")
# "evaluation" @description("Evaluating options or making recommendations")
# }
#
# class ResearchSubtask {
# objective string @description("Clear description of the specific research objective")
# task_type TaskType @description("Type of research task to be performed")
# priority int @description("Strategic priority ranking for task sequencing (1-5 scale)")
# estimated_hours int @description("Estimated effort required in hours")
# required_capabilities string[] @description("Suggested agent capabilities needed for optimal execution")
# }Why descriptions matter: LLMs use field descriptions to understand context and generate more accurate, meaningful data. This is crucial for complex domains where field names alone aren't sufficient.
Generate BAML tool specifications for agentic workflows, function calling, and structured LLM interactions:
class ReplyTool < T::Struct
# The response message to send back to the user
const :response, String
end
class SearchTool < T::Struct
# Using description: parameter (both methods work)
const :query, String, description: "The search query to execute"
const :limit, T.nilable(Integer), description: "Maximum number of results to return"
end
# Generate BAML tool definitions
ReplyTool.to_baml_tool
# =>
# class ReplyTool {
# response string @description("The response message to send back to the user")
# }
SearchTool.to_baml_tool
# =>
# class SearchTool {
# query string @description("The search query to execute")
# limit int? @description("Maximum number of results to return")
# }
# Module API also available
SorbetBaml.from_tool(ReplyTool)When dspy.rb is available, automatically convert DSPy tools with rich metadata:
class CalculatorTool < DSPy::Tools::Base
extend T::Sig
tool_name 'calculator'
tool_description 'Performs basic arithmetic operations'
sig { params(operation: String, num1: Float, num2: Float).returns(T.any(Float, String)) }
def call(operation:, num1:, num2:)
case operation.downcase
when 'add' then num1 + num2
when 'subtract' then num1 - num2
when 'multiply' then num1 * num2
when 'divide'
return "Error: Cannot divide by zero" if num2 == 0
num1 / num2
else
"Error: Unknown operation '#{operation}'. Use add, subtract, multiply, or divide"
end
end
end
# Automatic extraction of tool metadata and parameter types
CalculatorTool.to_baml
# =>
# // Performs basic arithmetic operations
# class calculator {
# operation string @description("Parameter operation")
# num1 float @description("Parameter num1")
# num2 float @description("Parameter num2")
# }
# Optional parameters handled correctly
class SearchTool < DSPy::Tools::Base
extend T::Sig
tool_name 'search'
tool_description 'Search for information'
sig { params(query: String, limit: T.nilable(Integer)).returns(T::Array[String]) }
def call(query:, limit: nil)
# Implementation...
end
end
SearchTool.to_baml
# =>
# // Search for information
# class search {
# query string @description("Parameter query")
# limit int? @description("Parameter limit (optional)")
# }
# Module API also available
SorbetBaml.from_dspy_tool(CalculatorTool)Tool Features:
- β T::Struct tools: Convert any struct to BAML tool definition
- β DSPy integration: Automatic extraction from DSPy::Tools::Base classes
- β Parameter types: Full Sorbet type support (string, int, float, arrays, maps, etc.)
- β
Optional parameters: Automatically detect and mark with
? - β
Descriptions: Extract from comments or
description:parameter (T::Struct) or automatic generation (DSPy) - β Tool metadata: Names, descriptions, and parameter documentation
- β
Ruby-idiomatic:
.to_baml_tool()and.to_baml()methods
Basic Types
StringβstringIntegerβintFloatβfloatT::BooleanβboolSymbolβstringDate/DateTime/Timeβstring
Complex Types
T.nilable(T)βT?(optional types)T::Array[T]βT[](arrays)T::Hash[K,V]βmap<K,V>(hash maps)T.any(T1, T2)βT1 | T2(union types)T.nilable(T.any(T1, T2))β(T1 | T2)?(optional unions)T::Array[T.any(T1, T2)]β(T1 | T2)[](union arrays)
Structured Types
T::Structβclass Name { ... }(classes with fields)T::Enumβenum Name { "value1" "value2" }(enums)- Nested structs with proper reference handling
- Automatic dependency resolution with topological sorting
- Ruby-idiomatic API: Every T::Struct and T::Enum gets
.to_bamlmethod - Tool definitions: Generate BAML tool specs for function calling and agentic workflows
- DSPy integration: Automatic tool conversion from DSPy::Tools::Base classes
- Smart defaults: Field descriptions and dependencies included automatically
- Field descriptions: Extracts from comments or
description:parameter for LLM context - Dependency management: Automatically includes all referenced types
- Proper ordering: Dependencies are sorted topologically (no forward references needed)
- Circular reference handling: Won't get stuck in infinite loops
- Customizable formatting: Control indentation and other output options
- Type-safe: Full Sorbet type checking throughout
| Sorbet Type | BAML Output | Example |
|---|---|---|
String |
string |
name string |
Integer |
int |
age int |
Float |
float |
price float |
T::Boolean |
bool |
active bool |
T.nilable(String) |
string? |
email string? |
T::Array[String] |
string[] |
tags string[] |
T::Hash[String, Integer] |
map<string, int> |
counts map<string, int> |
T.any(String, Integer) |
string | int |
value string | int |
T.nilable(T.any(String, Integer)) |
(string | int)? |
optional_value (string | int)? |
T::Array[T.any(String, Integer)] |
(string | int)[] |
mixed_array (string | int)[] |
MyStruct |
MyStruct |
user MyStruct |
MyEnum |
MyEnum |
status MyEnum |
This gem has reached feature completeness for core BAML conversion needs. The Ruby-idiomatic API is stable and thoroughly tested with 80 test cases covering all type combinations, tool definitions, and edge cases.
- β Comprehensive Test Coverage - All features thoroughly tested
- β Full Sorbet Type Safety - Zero type errors throughout codebase
- β 80 Test Cases - Covering basic types, complex combinations, tool definitions, and edge cases
- β TDD Development - All features built test-first
- β Field Descriptions - Automatic comment extraction for LLM context
- β Tool Definitions - BAML tool specifications for function calling and agentic workflows
- β DSPy Integration - Automatic tool conversion from DSPy::Tools::Base classes
- β Smart Defaults - Dependencies and descriptions included by default
- β Zero Breaking Changes - Maintains backward compatibility
- β
Ruby-idiomatic API: Every T::Struct and T::Enum gets
.to_bamlmethod - β Tool definitions: Generate BAML tool specifications from T::Struct classes
- β DSPy integration: Automatic tool conversion from DSPy::Tools::Base classes
- β Smart defaults: Field descriptions and dependencies included automatically
- β
Field descriptions: Extract documentation from comments or
description:parameter for LLM context - β Dependency management: Automatically includes all referenced types
- β Proper ordering: Dependencies are sorted topologically
- β Type safety: Full Sorbet type checking throughout
- DSPy-independent tool API: Tools shouldn't require DSPy, just follow the same API pattern
- Type aliases:
T.type_alias { String }βtype Alias = string - Custom naming: Convert between snake_case β camelCase
- CLI tool:
sorbet-baml convert MyStructcommand - Validation: Verify generated BAML syntax
- Self-referential types:
Employeewithmanager: T.nilable(Employee)
- v0.0.1 - Initial implementation with basic type support
- v0.1.0 - Complete type system + Ruby-idiomatic API + field descriptions + smart defaults
- v0.2.0 - Description parameter support and enhanced field extraction
- v0.3.0 - Tool type definitions + DSPy integration + 80+ test cases + comprehensive documentation
Perfect for agentic workflows, deep research systems, and complex LLM applications:
# Define your autonomous research workflow types
class TaskDecomposition < T::Struct
# Your complex research schema...
end
# Generate BAML for LLM agents
prompt = <<~PROMPT
You are an autonomous research agent. Analyze this topic and decompose it into strategic subtasks.
Schema for your output:
#{TaskDecomposition.to_baml}
Topic: "Impact of AI on healthcare delivery systems"
Provide a comprehensive task decomposition in JSON format.
PROMPT
# Use with OpenAI, Anthropic, or any LLM provider
response = llm_client.chat(prompt)
result = TaskDecomposition.from_json(response.content)With OpenAI structured outputs:
User.to_baml(include_dependencies: true)
# Use the generated BAML in your function calling schemasWith prompt engineering:
# More efficient than JSON Schema
schema = User.to_baml(include_dependencies: true)
prompt = "Generate data matching: #{schema}"With documentation generation:
# Auto-generate API docs
api_types = [User, Order, Product].map(&:to_baml).join("\n\n")Here's a real-world comparison using a complex agentic workflow from production DSPy.rb usage:
# Real autonomous research workflow from production DSPy.rb usage
class ComplexityLevel < T::Enum
enums do
# Basic analysis requiring straightforward research
Basic = new('basic')
# Intermediate analysis requiring synthesis of multiple sources
Intermediate = new('intermediate')
# Advanced analysis requiring deep domain expertise and complex reasoning
Advanced = new('advanced')
end
end
class TaskDecomposition < T::Struct
# The main research topic being investigated
const :research_topic, String
# Additional context or constraints for the research
const :context, String
# Target complexity level for the decomposition
const :complexity_level, ComplexityLevel
# Autonomously generated list of research subtasks
const :subtasks, T::Array[String]
# Type classification for each task (analysis, synthesis, investigation, etc.)
const :task_types, T::Array[String]
# Strategic priority rankings (1-5 scale) for each subtask
const :priority_order, T::Array[Integer]
# Effort estimates in hours for each subtask
const :estimated_effort, T::Array[Integer]
# Task dependency relationships for optimal sequencing
const :dependencies, T::Array[String]
# Suggested agent types/skills needed for each task
const :agent_requirements, T::Array[String]
end
class ResearchExecution < T::Struct
# The specific research subtask to execute
const :subtask, String
# Accumulated context from previous research steps
const :context, String
# Any specific constraints or focus areas for this research
const :constraints, String
# Detailed research findings and analysis
const :findings, String
# Key actionable insights extracted from the research
const :key_insights, T::Array[String]
# Confidence in findings quality (1-10 scale)
const :confidence_level, Integer
# Assessment of evidence quality and reliability
const :evidence_quality, String
# Recommended next steps based on these findings
const :next_steps, T::Array[String]
# Identified gaps in knowledge or areas needing further research
const :knowledge_gaps, T::Array[String]
end[ComplexityLevel, TaskDecomposition, ResearchExecution].map(&:to_baml).join("\n\n")enum ComplexityLevel {
"basic" @description("Basic analysis requiring straightforward research")
"intermediate" @description("Intermediate analysis requiring synthesis of multiple sources")
"advanced" @description("Advanced analysis requiring deep domain expertise and complex reasoning")
}
class TaskDecomposition {
research_topic string @description("The main research topic being investigated")
context string @description("Additional context or constraints for the research")
complexity_level ComplexityLevel @description("Target complexity level for the decomposition")
subtasks string[] @description("Autonomously generated list of research subtasks")
task_types string[] @description("Type classification for each task (analysis, synthesis, investigation, etc.)")
priority_order int[] @description("Strategic priority rankings (1-5 scale) for each subtask")
estimated_effort int[] @description("Effort estimates in hours for each subtask")
dependencies string[] @description("Task dependency relationships for optimal sequencing")
agent_requirements string[] @description("Suggested agent types/skills needed for each task")
}
class ResearchExecution {
subtask string @description("The specific research subtask to execute")
context string @description("Accumulated context from previous research steps")
constraints string @description("Any specific constraints or focus areas for this research")
findings string @description("Detailed research findings and analysis")
key_insights string[] @description("Key actionable insights extracted from the research")
confidence_level int @description("Confidence in findings quality (1-10 scale)")
evidence_quality string @description("Assessment of evidence quality and reliability")
next_steps string[] @description("Recommended next steps based on these findings")
knowledge_gaps string[] @description("Identified gaps in knowledge or areas needing further research")
}
BAML Token Count: ~320 tokens
{
"ComplexityLevel": {
"type": "string",
"enum": ["basic", "intermediate", "advanced"],
"description": "Complexity level enumeration"
},
"TaskDecomposition": {
"type": "object",
"properties": {
"topic": {"type": "string"},
"context": {"type": "string"},
"complexity_level": {"$ref": "#/definitions/ComplexityLevel"},
"subtasks": {
"type": "array",
"items": {"type": "string"}
},
"task_types": {
"type": "array",
"items": {"type": "string"}
},
"priority_order": {
"type": "array",
"items": {"type": "integer"}
},
"estimated_effort": {
"type": "array",
"items": {"type": "integer"}
},
"dependencies": {
"type": "array",
"items": {"type": "string"}
},
"agent_requirements": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["topic", "context", "complexity_level", "subtasks", "task_types", "priority_order", "estimated_effort", "dependencies", "agent_requirements"],
"additionalProperties": false
},
"ResearchExecution": {
"type": "object",
"properties": {
"subtask": {"type": "string"},
"context": {"type": "string"},
"constraints": {"type": "string"},
"findings": {"type": "string"},
"key_insights": {
"type": "array",
"items": {"type": "string"}
},
"confidence_level": {"type": "integer"},
"evidence_quality": {"type": "string"},
"next_steps": {
"type": "array",
"items": {"type": "string"}
},
"knowledge_gaps": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["subtask", "context", "constraints", "findings", "key_insights", "confidence_level", "evidence_quality", "next_steps", "knowledge_gaps"],
"additionalProperties": false
}
}JSON Schema Token Count: ~680 tokens
| Format | Tokens | Reduction |
|---|---|---|
| JSON Schema | ~680 | baseline |
| BAML | ~320 | π₯ 53% fewer |
Without descriptions:
| Format | Tokens | Reduction |
|---|---|---|
| JSON Schema | ~450 | baseline |
| BAML | ~180 | π₯ 60% fewer |
Real Impact:
- Cost Savings: 53-60% reduction in prompt tokens = significant LLM API cost savings
- Performance: Smaller prompts = faster LLM response times
- Context Efficiency: More room for actual content vs. type definitions
- LLM Understanding: Descriptions provide crucial context for autonomous agents
- Readability: BAML is human-readable and maintainable
This example represents actual agentic workflows from production DSPy.rb applications using complex nested types, enums, and arrays - exactly the scenarios where token efficiency and LLM understanding matter most.
This project was inspired by sorbet-schema which provides excellent Sorbet type introspection capabilities. While sorbet-schema focuses on serialization/deserialization, sorbet-baml focuses on generating efficient type definitions for LLM consumption.
Bug reports and pull requests are welcome at https://github.com/vicentereig/sorbet-baml
MIT License. See LICENSE.txt for details.