PoC: Integrate with Apache Arrow C++ #112

@xmnlab

This is an initial idea from GPT. Please feel free to use your own approach (this idea may be wrong):


Since we don’t want any extra native compilation and we’re okay starting with literals, the cleanest “no external build” path is to lower Arrow values using the Arrow C Data Interface—i.e., model a scalar as an ArrowArray of length 1 (plus a schema later, if/when you need it). We can construct that entirely in LLVM IR with llvmlite (no Arrow C++ linking or shims).

Below is a new module llvmlite_arrow.py that:

  • Adds an Arrow-aware visitor LLVMLiteArrowIRVisitor (subclasses your existing visitor).

  • Defines the ArrowArray struct layout (C Data Interface) as an identified LLVM struct.

  • Implements visit(LiteralInt32) by materializing a length-1 Arrow array on the stack in the function’s entry block:

    • length = 1, null_count = 0, offset = 0
    • n_buffers = 2 (validity bitmap + data buffer)
    • buffers[0] → 1-byte bitmap with bit 0 set (value is valid)
    • buffers[1] → 4-byte i32 value
    • n_children = 0, children = null, dictionary = null, release = null, private_data = null
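  • Returns a %ArrowArray* (pointer to the stack-allocated ArrowArray). This is only suitable for immediate, intra-function use: the storage dies when the function returns, and because release is null, an Arrow consumer would treat the struct as already released. If the value needs to outlive the function or be exported, we can switch to malloc plus a release function (all still emitted as IR, no external build).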
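
To make the layout above concrete outside of LLVM IR, here is a minimal sketch that mirrors the same struct with Python's standard `ctypes` module and fills it for one int32 value. The helper name `int32_scalar` and the keep-alive tuple are assumptions made for illustration; they are not part of IRx or Arrow.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """ctypes mirror of the C Data Interface ArrowArray struct."""


# Fields are assigned after the class exists because the struct is
# self-referential (children and dictionary point to ArrowArray).
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),  # function pointer, kept opaque in this sketch
    ("private_data", ctypes.c_void_p),
]


def int32_scalar(value):
    """Build a length-1 int32 ArrowArray (hypothetical helper).

    Returns the struct plus the Python objects that back its buffers.
    """
    validity = (ctypes.c_uint8 * 1)(0b0000_0001)  # bit 0 set: slot 0 is valid
    data = (ctypes.c_int32 * 1)(value)            # the 4-byte value buffer
    buffers = (ctypes.c_void_p * 2)(
        ctypes.addressof(validity), ctypes.addressof(data)
    )
    arr = ArrowArray(
        length=1, null_count=0, offset=0, n_buffers=2, n_children=0,
        buffers=ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p)),
    )
    # children, dictionary, release and private_data default to NULL.
    # The struct does not own the buffer memory, so the caller must keep
    # the returned backing objects alive for as long as the struct is read.
    return arr, (validity, data, buffers)
```

Reading the value back goes through `buffers[1]`, for example `ctypes.c_int32.from_address(arr.buffers[1]).value`, which is the same indirection the emitted IR sets up with its two GEPs into the buffers array.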

You can keep your original LLVMLiteIR untouched; here we provide a parallel Arrow-flavored builder LLVMLiteArrowIR wired to the new visitor.


irx/builders/llvmlite_arrow.py

"""LLVM-IR builder with Arrow C Data Interface (experimental).

This backend lowers literals to Arrow-compatible shapes using ONLY emitted
LLVM IR (no external C/C++ shims). For now, we model a scalar as an ArrowArray
of length 1 (C Data Interface).

- LiteralInt32 -> ArrowArray(length=1, int32, 2 buffers: validity + values)

Notes:
- Objects are materialized in the current function's entry block using `alloca`.
  They are valid until the function returns (stack lifetime). If you later need
  heap allocation + release, we can add that (also via IR, still no external build).
"""

from __future__ import annotations

from typing import Optional

import astx
from llvmlite import binding as llvm
from llvmlite import ir
from plum import dispatch
from public import public

# Reuse your base builder & visitor.
from irx.builders.base import Builder
from irx.builders.llvmliteir import LLVMLiteIRVisitor


# -----------------------------
# Arrow-aware IR Visitor
# -----------------------------
class LLVMLiteArrowIRVisitor(LLVMLiteIRVisitor):
    """IR visitor that lowers literals to Arrow C Data Interface objects."""

    # Identified struct type for ArrowArray
    _arrow_array_ty: ir.IdentifiedStructType

    def __init__(self) -> None:
        super().__init__()
        self._init_arrow_types()

    # C Data Interface: ArrowArray
    #   struct ArrowArray {
    #     int64_t length;
    #     int64_t null_count;
    #     int64_t offset;
    #     int64_t n_buffers;
    #     int64_t n_children;
    #     const void** buffers;
    #     struct ArrowArray** children;
    #     struct ArrowArray* dictionary;   // modeled as i8* here (same size; always null)
    #     void (*release)(struct ArrowArray*);
    #     void* private_data;
    #   };
    def _init_arrow_types(self) -> None:
        ctx = ir.global_context
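        self._arrow_array_ty = ctx.get_identified_type("struct.ArrowArray")
        # The global context caches identified types, and set_body() raises if
        # the body is already defined (e.g. when a second visitor is created).
        if not self._arrow_array_ty.is_opaque:
            return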

        i64 = ir.IntType(64)
        i8p = ir.IntType(8).as_pointer()

        # We need a self-pointer type for fields.
        arr_ptr = self._arrow_array_ty.as_pointer()
        # Function pointer type: void (*release)(ArrowArray*)
        release_fn_ty = ir.FunctionType(ir.VoidType(), [arr_ptr]).as_pointer()

        # buffers: i8** (const void**)
        buffers_ptr_ty = i8p.as_pointer()
        # children: ArrowArray**  (we won't use it yet; set to null)
        children_ptr_ty = arr_ptr.as_pointer()

        self._arrow_array_ty.set_body(
            i64,              # length
            i64,              # null_count
            i64,              # offset
            i64,              # n_buffers
            i64,              # n_children
            buffers_ptr_ty,   # buffers
            children_ptr_ty,  # children
            i8p,              # dictionary (opaque)
            release_fn_ty,    # release
            i8p,              # private_data
        )

    def _entry_alloca(self, ty: ir.Type, name: str) -> ir.Instruction:
        """Allocate in the function entry block (mem2reg-friendly)."""
        ib = self._llvm.ir_builder
        cur = ib.block
        ib.position_at_start(ib.function.entry_basic_block)
        slot = ib.alloca(ty, name=name)
        ib.position_at_end(cur)
        return slot

    @dispatch  # type: ignore[no-redef]
    def visit(self, node: astx.LiteralInt32) -> None:
        """
        Lower LiteralInt32 to an ArrowArray(length=1) representing an int32 scalar.

        Layout (C Data Interface):
          - length      = 1
          - null_count  = 0
          - offset      = 0
          - n_buffers   = 2 (validity bitmap, values)
          - n_children  = 0
          - buffers[0]  = &validity_byte (i8*, bit 0 set to 1)
          - buffers[1]  = &value_i32     (i8* to 4-byte i32 storage)
          - children    = null
          - dictionary  = null
          - release     = null (stack lifetime only; a null release marks the
                          struct as already released, so never export it)
          - private_data= null
        """
        ib = self._llvm.ir_builder
        i8  = ir.IntType(8)
        i8p = i8.as_pointer()
        i32 = self._llvm.INT32_TYPE
        i64 = ir.IntType(64)

        # Allocate ArrowArray in entry block.
        arr_ptr = self._entry_alloca(self._arrow_array_ty, name="arrow.i32.scalar")

        # Allocate buffers array [2 x i8*] in entry block.
        buffers_arr_ty = ir.ArrayType(i8p, 2)
        buffers_slot = self._entry_alloca(buffers_arr_ty, name="arrow.buffers")

        # Allocate and initialize validity byte (bitmap) on stack: bit 0 = 1 (valid)
        valid_slot = self._entry_alloca(i8, name="arrow.valid")
        ib.store(ir.Constant(i8, 1), valid_slot)  # 0000_0001

        # Allocate and initialize 4-byte value on stack
        value_slot = self._entry_alloca(i32, name="arrow.i32.value")
        ib.store(ir.Constant(i32, node.value), value_slot)

        # Compute i8* pointers for buffers[0] and buffers[1]
        valid_i8p = ib.bitcast(valid_slot, i8p, name="valid_i8p")
        value_i8p = ib.bitcast(value_slot, i8p, name="value_i8p")

        # Fill buffers array
        i32_ty = ir.IntType(32)
        buf0_ptr = ib.gep(buffers_slot, [ir.Constant(i32_ty, 0), ir.Constant(i32_ty, 0)], inbounds=True)
        buf1_ptr = ib.gep(buffers_slot, [ir.Constant(i32_ty, 0), ir.Constant(i32_ty, 1)], inbounds=True)
        ib.store(valid_i8p, buf0_ptr)
        ib.store(value_i8p, buf1_ptr)

        # Pointer-to-first element: i8**  (const void**); same address as buf0_ptr
        buffers_i8pp = buf0_ptr

        # Set ArrowArray fields
        # GEP helpers for fields [0..9]
        def fld(idx: int):
            return ib.gep(arr_ptr, [ir.Constant(i32_ty, 0), ir.Constant(i32_ty, idx)], inbounds=True)

        ib.store(ir.Constant(i64, 1),  fld(0))  # length
        ib.store(ir.Constant(i64, 0),  fld(1))  # null_count
        ib.store(ir.Constant(i64, 0),  fld(2))  # offset
        ib.store(ir.Constant(i64, 2),  fld(3))  # n_buffers
        ib.store(ir.Constant(i64, 0),  fld(4))  # n_children
        ib.store(buffers_i8pp,         fld(5))  # buffers
        # children = null
        children_ty = self._arrow_array_ty.as_pointer().as_pointer()
        ib.store(ir.Constant(children_ty, None), fld(6))
        # dictionary = null
        ib.store(ir.Constant(i8p, None), fld(7))
        # release = null (stack lifetime; do not export)
        rel_fn_ptr_ty = ir.FunctionType(ir.VoidType(), [self._arrow_array_ty.as_pointer()]).as_pointer()
        ib.store(ir.Constant(rel_fn_ptr_ty, None), fld(8))
        # private_data = null
        ib.store(ir.Constant(i8p, None), fld(9))

        # Result: %ArrowArray* (stack-allocated)
        self.result_stack.append(arr_ptr)


# -----------------------------
# Arrow-aware Builder
# -----------------------------
@public
class LLVMLiteArrowIR(Builder):
    """LLVM-IR transpiler that uses LLVMLiteArrowIRVisitor."""

    def __init__(self) -> None:
        super().__init__()
        self.translator: LLVMLiteArrowIRVisitor = LLVMLiteArrowIRVisitor()
        self.output_file: Optional[str] = None
        self.tmp_path: Optional[str] = None

    def translate(self, node: astx.AST) -> str:
        return self.translator.translate(node)

    def build(self, node: astx.AST, output_file: str) -> None:
        """Transpile ASTx to LLVM-IR and build an executable via clang (no extra libs)."""
        # Fresh visitor per build (mirrors your LLVMLiteIR)
        self.translator = LLVMLiteArrowIRVisitor()
        ir_text = self.translator.translate(node)

        mod = llvm.parse_assembly(ir_text)
        obj = self.translator.target_machine.emit_object(mod)

        import os
        import tempfile

        with tempfile.NamedTemporaryFile(suffix="", delete=False) as temp_file:
            self.tmp_path = temp_file.name
        obj_path = f"{self.tmp_path}.o"
        with open(obj_path, "wb") as f:
            f.write(obj)

        self.output_file = output_file

        # Link only with libc/clang (no Arrow libs needed; we used the C Data Interface layout)
        from xh import clang  # keep parity with your existing builder
        clang(obj_path, "-o", self.output_file)

        os.chmod(self.output_file, 0o755)

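    def run(self) -> None:
        """Run the most recently built executable."""
        import os

        import sh

        if not self.output_file:
            raise RuntimeError("No built output to run.")
        # The `sh` module itself cannot be called with a path; wrap the
        # executable in sh.Command (absolute path, so PATH is not searched).
        sh.Command(os.path.abspath(self.output_file))()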

Why this approach?

  • No external compilation: we emit everything with llvmlite; we only link with clang (as you already do).
  • Arrow compatibility: the Arrow C Data Interface is the lingua franca. Modeling a scalar as a 1-element array is valid and easy to extend to lists, structs, etc. Later, we can add the ArrowSchema side (also a plain C struct) and heap allocation + release if you need cross-function lifetimes or FFI export.
  • Pragmatic lifetime: stack allocation keeps the first version simple and fast. If a value needs to escape, we can switch to a heap version with an emitted release function—still inside IR, still no external build.

If you like this, I can next add:

  • a heap-allocated variant with an internal release (also IR-emitted),
  • LiteralInt16 / LiteralInt64,
  • LiteralList[Int32] → ArrowArray,
  • an ArrowSchema builder for the literals.
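
As a rough idea of what the ArrowSchema side might look like for the integer literals listed above, the sketch below mirrors the struct with `ctypes` and maps each literal to its C Data Interface format string (`s`, `i`, `l` for int16, int32, int64). The `schema_for` helper and the `INT_FORMATS` table are hypothetical names, and the release callback is left null for brevity, so this is a layout illustration rather than an exportable schema.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of the C Data Interface ArrowSchema struct."""


ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.c_void_p),  # function pointer, kept opaque in this sketch
    ("private_data", ctypes.c_void_p),
]

# Format strings from the C Data Interface for signed integer types.
INT_FORMATS = {
    "LiteralInt16": b"s",
    "LiteralInt32": b"i",
    "LiteralInt64": b"l",
}

ARROW_FLAG_NULLABLE = 2  # flag bit defined by the C Data Interface


def schema_for(literal_name, field_name=b"", nullable=False):
    """Describe one integer literal type (hypothetical helper)."""
    return ArrowSchema(
        format=INT_FORMATS[literal_name],
        name=field_name,
        flags=ARROW_FLAG_NULLABLE if nullable else 0,
        n_children=0,
    )
```

For example, `schema_for("LiteralInt32", b"x").format` yields `b"i"`. A literal is never null, so the sketch defaults `flags` to 0.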
