21 changes: 21 additions & 0 deletions agent_notes/00_baseline.md
@@ -0,0 +1,21 @@
# Baseline Info

## Environment
- **rustc**: 1.88.0 (6b00bc388 2025-06-23) (Homebrew)
- **cargo**: 1.88.0 (Homebrew)
- **OS**: macOS Darwin 22.6.0 x86_64

## Repository
- **Repo**: spaceandtimefdn/sxt-proof-of-sql
- **Branch**: fix/nullable-columns-183
- **Issue**: #183 - Add nullable column support

## Test Commands (from README)
- CPU-only (no GPU): `cargo test --no-default-features --features="arrow cpu-perf"`
- With Blitzar CPU backend: `export BLITZAR_BACKEND=cpu && cargo test --all-features --all-targets`
- Example run: `cargo run --example hello_world --no-default-features --features="rayon test"`

## Notes
- Project requires lld and clang on Linux
- GPU acceleration via NVIDIA recommended but CPU workaround available
- Must use `--no-default-features` for non-GPU machines
76 changes: 76 additions & 0 deletions agent_notes/01_map.md
@@ -0,0 +1,76 @@
# Codebase Map for Nullable Column Support

## Key Files and Their Roles

### Type System
- `crates/proof-of-sql/src/base/database/column_type.rs` - `ColumnType` enum defines all supported types (Boolean, TinyInt, SmallInt, Int, BigInt, Int128, Decimal75, VarChar, TimestampTZ, Scalar, VarBinary)
- `crates/proof-of-sql/src/base/database/column_type_operation.rs` - Type coercion and operation result type inference
- `crates/proof-of-sql/src/base/database/literal_value.rs` - Literal values for SQL constants

### Column Storage
- `crates/proof-of-sql/src/base/database/column.rs` - `Column<'a, S>` - read-only view of column data (borrowed)
- `crates/proof-of-sql/src/base/database/owned_column.rs` - `OwnedColumn<S>` - owned column data with Vec storage
- `crates/proof-of-sql/src/base/database/owned_table.rs` - `OwnedTable<S>` - collection of named columns

### Column Operations
- `crates/proof-of-sql/src/base/database/column_arithmetic_operation.rs` - Add, subtract, multiply, divide
- `crates/proof-of-sql/src/base/database/column_comparison_operation.rs` - Equality, inequality
- `crates/proof-of-sql/src/base/database/owned_column_operation.rs` - Operations on OwnedColumn

### Arrow Conversions
- `crates/proof-of-sql/src/base/arrow/arrow_array_to_column_conversion.rs` - Arrow → Column (currently rejects nulls!)
- `crates/proof-of-sql/src/base/arrow/owned_and_arrow_conversions.rs` - OwnedColumn ↔ Arrow

### Commitment/Proof System
- `crates/proof-of-sql/src/base/commitment/committable_column.rs` - Column data in "committable form"
- `crates/proof-of-sql/src/base/commitment/column_commitments.rs` - Commitments for columns
- `crates/proof-of-sql/src/base/commitment/table_commitment.rs` - Table-level commitments
- `crates/proof-of-sql/src/sql/proof/query_proof.rs` - Query proof generation

### Proof Expressions
- `crates/proof-of-sql/src/sql/proof_exprs/add_expr.rs` - Add expression with proof
- `crates/proof-of-sql/src/sql/proof_exprs/subtract_expr.rs` - Subtract expression
- `crates/proof-of-sql/src/sql/proof_exprs/multiply_expr.rs` - Multiply expression
- `crates/proof-of-sql/src/sql/proof_exprs/equals_expr.rs` - Equality expression
- `crates/proof-of-sql/src/sql/proof_exprs/inequality_expr.rs` - Inequality expressions

## Current Null Handling

Currently in `arrow_array_to_column_conversion.rs:112-113`:
```rust
if self.null_count() != 0 {
    return Err(ArrowArrayToColumnConversionError::ArrayContainsNulls);
}
```
Nulls are explicitly rejected!

## Implementation Plan

### Phase 1: Type System
Add nullability tracking:
- Option 1: Add `nullable: bool` flag to ColumnType (changes enum size, affects many places)
- Option 2: Create wrapper `NullableColumnType { base: ColumnType, nullable: bool }`
- Option 3: Create nullable variants in OwnedColumn/Column with validity mask

**Decision**: Use Option 3 - add validity mask to column storage, least invasive to existing type system.

### Phase 2: Column Storage
Add validity mask support:
- `OwnedColumn` variants get `Option<Vec<bool>>` validity
- `Column` variants get `Option<&'a [bool]>` validity
- Helper methods: `is_valid(index)`, `validity_mask()`, `with_validity(mask)`
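
A minimal sketch of how the helper methods listed above might look, assuming a simplified single-type column. `MaskedBigInt` is a hypothetical name used only for illustration; it is not a type from the repository, and the real `OwnedColumn`/`Column` changes may be shaped differently.

```rust
/// Hypothetical wrapper pairing `i64` values with an optional validity mask.
/// `None` means "no nulls"; otherwise `true` = valid and `false` = NULL.
#[derive(Debug, Clone, PartialEq)]
pub struct MaskedBigInt {
    values: Vec<i64>,
    validity: Option<Vec<bool>>,
}

impl MaskedBigInt {
    /// Wraps values with no mask, i.e. every row is valid.
    pub fn new(values: Vec<i64>) -> Self {
        Self { values, validity: None }
    }

    /// Attaches a mask; returns `None` if the mask length does not match.
    pub fn with_validity(mut self, mask: Vec<bool>) -> Option<Self> {
        if mask.len() != self.values.len() {
            return None;
        }
        self.validity = Some(mask);
        Some(self)
    }

    /// Out-of-range rows are reported as not valid instead of panicking.
    pub fn is_valid(&self, index: usize) -> bool {
        index < self.values.len() && self.validity.as_ref().map_or(true, |m| m[index])
    }

    /// Borrowed view of the mask, mirroring the `Option<&'a [bool]>` idea.
    pub fn validity_mask(&self) -> Option<&[bool]> {
        self.validity.as_deref()
    }
}
```

Keeping the mask as `Option` lets non-nullable columns skip the extra allocation entirely.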

### Phase 3: Arrow Conversion
- Accept nullable arrays
- Extract validity bitmap
- Enforce canonical null values (0 for numeric, empty for strings)
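
The canonicalization step can be sketched without Arrow by using a slice of `Option<i64>` as a stand-in for a nullable array. `canonicalize_bigint` is a hypothetical helper for illustration, not a function from the repository.

```rust
/// Splits optional values into canonical data plus a validity mask.
/// NULL positions are stored as 0; the mask is omitted when nothing is NULL.
pub fn canonicalize_bigint(input: &[Option<i64>]) -> (Vec<i64>, Option<Vec<bool>>) {
    // Replace every NULL with the canonical value 0.
    let values: Vec<i64> = input.iter().map(|v| v.unwrap_or(0)).collect();
    // `true` marks a valid row, `false` marks a NULL row.
    let mask: Vec<bool> = input.iter().map(Option::is_some).collect();
    let validity = if mask.iter().all(|&b| b) { None } else { Some(mask) };
    (values, validity)
}
```

For example, `[Some(10), None, Some(30)]` becomes values `[10, 0, 30]` with mask `[true, false, true]`.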

### Phase 4: Operations
- Null propagation: `NULL op X = NULL`
- Combine validity masks: logical AND of both operands' masks for binary ops
- WHERE treats NULL as false
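
The three rules above can be sketched on plain slices. All function names here are hypothetical. The use of `wrapping_add` is an assumption made only to keep the sketch panic-free; the notes do not say how overflow should be handled. Both inputs are assumed to have the same length.

```rust
/// ANDs two optional validity masks; `None` means "all valid".
pub fn combine_validity(a: Option<&[bool]>, b: Option<&[bool]>) -> Option<Vec<bool>> {
    match (a, b) {
        (None, None) => None,
        (Some(m), None) | (None, Some(m)) => Some(m.to_vec()),
        (Some(x), Some(y)) => Some(x.iter().zip(y).map(|(&p, &q)| p && q).collect()),
    }
}

/// Adds two columns with NULL propagation: `NULL + X = NULL`.
/// NULL results are stored as the canonical value 0.
pub fn nullable_add(
    lhs: &[i64],
    lhs_valid: Option<&[bool]>,
    rhs: &[i64],
    rhs_valid: Option<&[bool]>,
) -> (Vec<i64>, Option<Vec<bool>>) {
    let validity = combine_validity(lhs_valid, rhs_valid);
    let values = lhs
        .iter()
        .zip(rhs)
        .enumerate()
        .map(|(i, (&a, &b))| {
            let valid = validity.as_ref().map_or(true, |m| m[i]);
            // Overflow behavior is an assumption of this sketch.
            if valid { a.wrapping_add(b) } else { 0 }
        })
        .collect();
    (values, validity)
}

/// WHERE semantics: a row is selected only if the predicate is valid and true.
pub fn where_selection(predicate: &[bool], validity: Option<&[bool]>) -> Vec<bool> {
    predicate
        .iter()
        .enumerate()
        .map(|(i, &p)| p && validity.map_or(true, |m| m[i]))
        .collect()
}
```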

### Phase 5: Proof Integration
- Commit validity mask
- Add constraints: `!valid[i] => value[i] == 0`
- Prove that null propagation is applied correctly
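
The constraint `!valid[i] => value[i] == 0` can be written arithmetically as `(1 - valid[i]) * value[i] == 0`, which is the form a proof system would typically enforce. The sketch below only checks that identity over plain `i64` values. It is not the proof-side implementation, which would work over the scalar field and the committed mask, and `first_canonical_violation` is a hypothetical name.

```rust
/// Returns the first row index violating `(1 - valid[i]) * value[i] == 0`, if any.
/// Assumes `values` and `validity` have the same length.
pub fn first_canonical_violation(values: &[i64], validity: &[bool]) -> Option<usize> {
    values.iter().zip(validity).position(|(&x, &v)| {
        // `not_valid` is 1 for NULL rows and 0 for valid rows.
        let not_valid = i64::from(!v);
        not_valid * x != 0
    })
}
```

A valid row always satisfies the identity because the first factor is zero; a NULL row satisfies it only when its stored value is the canonical 0.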
@@ -15,7 +15,7 @@ impl From<&ColumnType> for DataType {
             ColumnType::TinyInt => DataType::Int8,
             ColumnType::SmallInt => DataType::Int16,
             ColumnType::Int => DataType::Int32,
-            ColumnType::BigInt => DataType::Int64,
+            ColumnType::BigInt | ColumnType::NullableBigInt => DataType::Int64,
             ColumnType::Int128 => DataType::Decimal128(38, 0),
             ColumnType::Decimal75(precision, scale) => {
                 DataType::Decimal256(precision.value(), *scale)
@@ -74,10 +74,11 @@ impl TryFrom<DataType> for ColumnType {
 /// Convert [`ColumnField`] values to arrow Field
 impl From<&ColumnField> for Field {
     fn from(column_field: &ColumnField) -> Self {
+        let is_nullable = matches!(column_field.data_type(), ColumnType::NullableBigInt);
         Field::new(
             column_field.name().value.as_str(),
             (&column_field.data_type()).into(),
-            false,
+            is_nullable,
         )
     }
 }
6 changes: 6 additions & 0 deletions crates/proof-of-sql/src/base/arrow/mod.rs
@@ -21,3 +21,9 @@ pub mod scalar_and_i256_conversions;

/// Module for handling conversions between columns and Arrow arrays.
pub mod column_arrow_conversions;

/// Module for nullable Arrow array conversions.
///
/// Provides utilities to convert Arrow arrays with null values into
/// Proof of SQL nullable column types while preserving validity information.
pub mod nullable_conversion;
243 changes: 243 additions & 0 deletions crates/proof-of-sql/src/base/arrow/nullable_conversion.rs
@@ -0,0 +1,243 @@
//! Nullable Arrow array conversion utilities.
//!
//! This module provides functions to convert Arrow arrays with null values
//! into Proof of SQL nullable column types, preserving validity information.
//!
//! ## Key Features
//!
//! - Extracts validity bitmaps from Arrow arrays
//! - Enforces canonical null values (0 for numeric, empty for strings)
//! - Creates `NullableOwnedColumn` from Arrow arrays
//!
//! ## Usage
//!
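//! ```ignore
//! use arrow::array::Int64Array;
//!
//! let array = Int64Array::from(vec![Some(1), None, Some(3)]);
//! // The conversion is infallible and returns the column directly, so no `?` is needed.
//! let nullable_col: NullableOwnedColumn<TestScalar> = nullable_bigint_from_arrow(&array);
//! ```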

use crate::base::{
database::{NullableOwnedColumn, OwnedColumn},
scalar::Scalar,
};
use alloc::vec::Vec;
use arrow::array::{Array, Int64Array};
use snafu::Snafu;

/// Errors that can occur during nullable Arrow conversion.
#[derive(Debug, Snafu, PartialEq)]
pub enum NullableArrowConversionError {
/// The array type is not supported for nullable conversion.
#[snafu(display("unsupported array type for nullable conversion"))]
UnsupportedType,
}

/// Extracts the validity mask from an Arrow array.
///
/// Returns `None` if the array has no nulls (all valid).
/// Returns `Some(Vec<bool>)` where `true` = valid, `false` = null.
#[must_use]
pub fn extract_validity(array: &dyn Array) -> Option<Vec<bool>> {
if array.null_count() == 0 {
return None;
}

let validity: Vec<bool> = (0..array.len()).map(|i| array.is_valid(i)).collect();
Some(validity)
}

/// Converts an Arrow `Int64Array` to a `NullableOwnedColumn` wrapping `OwnedColumn::BigInt`.
///
/// - Extracts the validity bitmap
/// - Enforces canonical null values (0 for null positions)
/// - Returns a `NullableOwnedColumn` with both data and validity
///
/// # Arguments
/// * `array` - The Arrow `Int64Array` to convert
///
/// # Returns
/// A `NullableOwnedColumn` containing `BigInt` values with validity mask.
#[must_use]
pub fn nullable_bigint_from_arrow<S: Scalar>(array: &Int64Array) -> NullableOwnedColumn<S> {
let validity = extract_validity(array);

// Extract values, using 0 for null positions (will be canonicalized anyway)
let values: Vec<i64> = (0..array.len())
.map(|i| {
if array.is_valid(i) {
array.value(i)
} else {
0 // Canonical null value
}
})
.collect();

NullableOwnedColumn::new(OwnedColumn::BigInt(values), validity)
}

/// Converts the `start..end` range of an Arrow `Int64Array` to a
/// `NullableOwnedColumn` wrapping `OwnedColumn::BigInt`.
///
/// # Arguments
/// * `array` - The Arrow `Int64Array` to convert
/// * `start` - Start index (inclusive)
/// * `end` - End index (exclusive)
///
/// # Panics
/// Panics if `start > end` or `end > array.len()`.
#[must_use]
pub fn nullable_bigint_from_arrow_slice<S: Scalar>(
    array: &Int64Array,
    start: usize,
    end: usize,
) -> NullableOwnedColumn<S> {
    // Fail loudly on a bad range instead of computing an unused, possibly
    // underflowing length.
    assert!(
        start <= end && end <= array.len(),
        "invalid range {start}..{end} for array of length {}",
        array.len()
    );

    // `validity_for_range` returns `None` when the range contains no nulls.
    let validity = validity_for_range(array, start, end);

    let values: Vec<i64> = (start..end)
        .map(|i| {
            if array.is_valid(i) {
                array.value(i)
            } else {
                0 // Canonical null value
            }
        })
        .collect();

    NullableOwnedColumn::new(OwnedColumn::BigInt(values), validity)
}

/// Checks if an Arrow array has any null values.
#[must_use]
pub fn has_nulls(array: &dyn Array) -> bool {
array.null_count() > 0
}

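/// Computes the validity mask for the `start..end` range of an Arrow array.
///
/// Returns `None` if the range contains no nulls; otherwise returns
/// `Some(mask)` where `true` = valid and `false` = null.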
#[must_use]
pub fn validity_for_range(array: &dyn Array, start: usize, end: usize) -> Option<Vec<bool>> {
if array.null_count() == 0 {
return None;
}

let validity: Vec<bool> = (start..end).map(|i| array.is_valid(i)).collect();

// If all valid in range, return None
if validity.iter().all(|&b| b) {
None
} else {
Some(validity)
}
}

#[cfg(test)]
mod tests {
use super::*;
use crate::base::scalar::test_scalar::TestScalar;
use arrow::array::Int64Array;

#[test]
fn test_extract_validity_no_nulls() {
let array = Int64Array::from(vec![1i64, 2, 3, 4, 5]);
assert!(extract_validity(&array).is_none());
}

#[test]
fn test_extract_validity_with_nulls() {
let array = Int64Array::from(vec![Some(1i64), None, Some(3), None, Some(5)]);
let validity = extract_validity(&array).unwrap();
assert_eq!(validity, vec![true, false, true, false, true]);
}

#[test]
fn test_nullable_bigint_from_arrow_no_nulls() {
let array = Int64Array::from(vec![10i64, 20, 30]);
let nullable: NullableOwnedColumn<TestScalar> = nullable_bigint_from_arrow(&array);

assert!(!nullable.has_nulls());
assert!(!nullable.is_nullable());

if let OwnedColumn::BigInt(vals) = nullable.column() {
assert_eq!(vals, &[10, 20, 30]);
} else {
panic!("Expected BigInt");
}
}

#[test]
fn test_nullable_bigint_from_arrow_with_nulls() {
let array = Int64Array::from(vec![Some(10i64), None, Some(30), None]);
let nullable: NullableOwnedColumn<TestScalar> = nullable_bigint_from_arrow(&array);

assert!(nullable.has_nulls());
assert!(nullable.is_nullable());
assert_eq!(nullable.null_count(), 2);

// Check values - nulls should be canonical (0)
if let OwnedColumn::BigInt(vals) = nullable.column() {
assert_eq!(vals, &[10, 0, 30, 0]);
} else {
panic!("Expected BigInt");
}

// Check validity
assert_eq!(
nullable.validity(),
Some(vec![true, false, true, false].as_slice())
);
}

#[test]
fn test_nullable_bigint_from_arrow_all_nulls() {
let array = Int64Array::from(vec![None, None, None]);
let nullable: NullableOwnedColumn<TestScalar> = nullable_bigint_from_arrow(&array);

assert_eq!(nullable.null_count(), 3);

if let OwnedColumn::BigInt(vals) = nullable.column() {
assert_eq!(vals, &[0, 0, 0]); // All canonical
} else {
panic!("Expected BigInt");
}
}

#[test]
fn test_nullable_bigint_from_arrow_slice() {
let array = Int64Array::from(vec![Some(1i64), None, Some(3), None, Some(5)]);
let nullable: NullableOwnedColumn<TestScalar> =
nullable_bigint_from_arrow_slice(&array, 1, 4);

// Slice is [None, Some(3), None]
assert_eq!(nullable.len(), 3);
assert_eq!(nullable.null_count(), 2);

if let OwnedColumn::BigInt(vals) = nullable.column() {
assert_eq!(vals, &[0, 3, 0]);
} else {
panic!("Expected BigInt");
}
}

#[test]
fn test_validity_for_range_no_nulls_in_range() {
let array = Int64Array::from(vec![None, Some(2i64), Some(3), None]);
// Range [1, 3) has no nulls
let validity = validity_for_range(&array, 1, 3);
assert!(validity.is_none());
}

#[test]
fn test_validity_for_range_with_nulls() {
let array = Int64Array::from(vec![None, Some(2i64), None, Some(4)]);
let validity = validity_for_range(&array, 0, 4);
assert_eq!(validity, Some(vec![false, true, false, true]));
}
}