[ENH] Use tree-sitter for robust C API parsing and auto-generation #29

@tazarov

Description

Currently, the OrtApi struct generator (tools/gen_ortapi.go) uses regex-based parsing to extract function pointers from the ONNX Runtime C header file. While this works for the current API, it has limitations:

  • Fragile: Regex patterns may break with header formatting changes
  • Limited: Only extracts function names, not full signatures
  • Manual: Requires manual pattern updates for new C API patterns
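To make the fragility concrete, here is a minimal sketch of a line-oriented extractor. The patterns actually used in tools/gen_ortapi.go are not shown in this issue, so the regex, the `extractNames` helper, and the simplified parameter lists below are assumptions for illustration only. The sketch targets the `ORT_API2_STATUS(Name, ...)` macro form that appears in the ONNX Runtime C header.

```go
package main

import (
	"fmt"
	"regexp"
)

// apiMacro is a hypothetical line-oriented pattern: it matches entries
// written as ORT_API2_STATUS(Name, ...) and captures only Name.
var apiMacro = regexp.MustCompile(`(?m)^\s*ORT_API2_STATUS\((\w+),`)

// extractNames returns the captured function names in header order.
func extractNames(header string) []string {
	var names []string
	for _, m := range apiMacro.FindAllStringSubmatch(header, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	// Simplified declaration on a single line.
	compact := "ORT_API2_STATUS(CreateEnv, OrtLoggingLevel level, const char* logid, OrtEnv** out);\n"
	// Same declaration, still valid C, but wrapped across several lines.
	wrapped := "ORT_API2_STATUS(\n    CreateEnv,\n    OrtLoggingLevel level, const char* logid, OrtEnv** out);\n"

	fmt.Println(extractNames(compact)) // the name is captured
	fmt.Println(extractNames(wrapped)) // nothing is captured: the pattern expects the name on the macro line
}
```

The second call returns no names even though the header content is equivalent, and in neither case does the extractor see the return type or parameters.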

Proposal

Migrate the generator to tree-sitter-c for parsing the C header file. This would enable:

  1. Robust parsing: Proper C AST parsing handles all valid C syntax
  2. Full signature extraction: Parse return types, parameter types, and names
  3. Auto-generate purego bindings: Generate not just struct definitions but also Go wrapper functions
  4. Better validation: Detect breaking changes in C API structure
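As one possible reading of item 4, the sketch below treats OrtApi as an append-only table of function pointers and flags any change to an existing slot. That definition of "breaking", the `checkAppendOnly` helper, and the sample name lists are assumptions for illustration; the issue does not specify how validation should work.

```go
package main

import "fmt"

// checkAppendOnly reports an error if next does not begin with every entry
// of prev in the same order, i.e. if a slot was removed, renamed, or moved.
func checkAppendOnly(prev, next []string) error {
	if len(next) < len(prev) {
		return fmt.Errorf("struct shrank from %d to %d entries", len(prev), len(next))
	}
	for i, name := range prev {
		if next[i] != name {
			return fmt.Errorf("slot %d changed: %s -> %s", i, name, next[i])
		}
	}
	return nil
}

func main() {
	// Illustrative slot names as they might be extracted from two header versions.
	prev := []string{"CreateStatus", "GetErrorCode", "GetErrorMessage"}
	appended := []string{"CreateStatus", "GetErrorCode", "GetErrorMessage", "CreateEnv"}
	reordered := []string{"GetErrorCode", "CreateStatus", "GetErrorMessage"}

	fmt.Println(checkAppendOnly(prev, appended))  // nil: only new trailing slots
	fmt.Println(checkAppendOnly(prev, reordered)) // error: slot 0 differs
}
```

A check like this could run in CI against the previously generated struct, so an unexpected reorder fails the build instead of producing a silently misaligned binding.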

Potential Implementation

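// Sketch only. Assumes the github.com/smacker/go-tree-sitter bindings
// (imported as sitter, with its c subpackage) plus "context" and "os".
// FunctionDecl and extractDecls are hypothetical.
func parseWithTreeSitter(headerPath string) ([]FunctionDecl, error) {
    src, err := os.ReadFile(headerPath)
    if err != nil {
        return nil, err
    }

    // Use tree-sitter-c to parse the header
    parser := sitter.NewParser()
    parser.SetLanguage(c.GetLanguage())
    tree, err := parser.ParseCtx(context.Background(), nil, src)
    if err != nil {
        return nil, err
    }

    // Walk the AST to find the OrtApi struct, extract function pointer
    // declarations with full signatures, and hand them to the generator
    // that emits both the struct and the wrapper functions.
    return extractDecls(tree.RootNode(), src), nil
}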
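The FunctionDecl type is left undefined above. A minimal sketch of what it might hold, and how a full signature could be turned into a Go function type for a typed binding, follows. The field names, the `Param` type, the `GoFuncType` method, and the C-to-Go type mapping (every pointer and `size_t` becomes `uintptr`, everything else `int32`) are all assumptions made for illustration, not a proposed final design.

```go
package main

import (
	"fmt"
	"strings"
)

// Param is a hypothetical representation of one C parameter.
type Param struct {
	CType string // e.g. "const char*"
	Name  string // e.g. "logid"
}

// FunctionDecl is a hypothetical representation of one OrtApi entry.
type FunctionDecl struct {
	Name       string
	ReturnType string
	Params     []Param
}

// goType maps a C type to a Go type. Illustrative assumption: pointers and
// size_t become uintptr; anything else is treated as a 32-bit enum or int.
func goType(cType string) string {
	if strings.HasSuffix(cType, "*") || cType == "size_t" {
		return "uintptr"
	}
	return "int32"
}

// GoFuncType renders the Go function type a typed wrapper could use.
func (d FunctionDecl) GoFuncType() string {
	parts := make([]string, len(d.Params))
	for i, p := range d.Params {
		parts[i] = p.Name + " " + goType(p.CType)
	}
	sig := "func(" + strings.Join(parts, ", ") + ")"
	if d.ReturnType != "void" {
		sig += " " + goType(d.ReturnType)
	}
	return sig
}

func main() {
	decl := FunctionDecl{
		Name:       "CreateEnv",
		ReturnType: "OrtStatus*",
		Params: []Param{
			{CType: "OrtLoggingLevel", Name: "log_severity_level"},
			{CType: "const char*", Name: "logid"},
			{CType: "OrtEnv**", Name: "out"},
		},
	}
	// Emit a declaration the generator could write into the bindings file.
	fmt.Printf("var %s %s\n", decl.Name, decl.GoFuncType())
}
```

A real generator would likely need a richer type map (for example distinguishing `int64_t` or struct-by-value arguments), but this shows how signature data goes beyond the names-only output of the regex approach.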

Benefits

  • Type safety: Generate typed wrappers instead of manual purego.RegisterFunc calls
  • Documentation: Extract and preserve C API comments
  • Maintainability: Easier to update when ONNX Runtime evolves
  • Correctness: Eliminate regex parsing bugs

Related

Implementation Notes

This is a non-trivial enhancement that should be done incrementally:

  1. First, implement tree-sitter parsing alongside the existing regex approach
  2. Validate both produce identical results
  3. Switch to tree-sitter-only generation
  4. Add auto-generation of wrapper functions (optional)
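Step 2 could be automated with a small comparison helper run over the name lists from both parsers. The sketch below is one way to do that; `diffNames` and the sample inputs are hypothetical, and it compares names only, since the regex approach does not produce signatures.

```go
package main

import "fmt"

// diffNames compares two ordered name lists and returns human-readable
// mismatches; an empty result means the lists are identical.
func diffNames(regexNames, tsNames []string) []string {
	var diffs []string
	n := len(regexNames)
	if len(tsNames) < n {
		n = len(tsNames)
	}
	// Compare the slots both lists have.
	for i := 0; i < n; i++ {
		if regexNames[i] != tsNames[i] {
			diffs = append(diffs, fmt.Sprintf("slot %d: regex=%s tree-sitter=%s", i, regexNames[i], tsNames[i]))
		}
	}
	// Report trailing entries present in only one list.
	for _, name := range regexNames[n:] {
		diffs = append(diffs, "only in regex output: "+name)
	}
	for _, name := range tsNames[n:] {
		diffs = append(diffs, "only in tree-sitter output: "+name)
	}
	return diffs
}

func main() {
	regexNames := []string{"CreateStatus", "GetErrorCode"}
	tsNames := []string{"CreateStatus", "GetErrorCode", "GetErrorMessage"}
	for _, d := range diffNames(regexNames, tsNames) {
		fmt.Println(d)
	}
}
```

Running this as a test during the transition period would give a concrete signal for when it is safe to move to step 3.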

References
