| 1 | +# Spice.ai Rust SDK - GitHub Copilot Instructions |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +This is the official Rust SDK for Spice.ai, providing a client library for connecting to and querying Spice.ai runtime instances via Apache Arrow Flight SQL. |
| 6 | + |
| 7 | +**Architecture:** Arrow Flight SQL client built on Apache Arrow 57, tonic gRPC, and rustls TLS. |
| 8 | + |
| 9 | +**Core Principle:** Developer Experience First — Simple, type-safe, and performant API for querying Spice.ai. |
| 10 | + |
| 11 | +## Build & Test Commands |
| 12 | + |
| 13 | +```bash |
| 14 | +cargo build # Dev build |
| 15 | +cargo build --release # Release build |
| 16 | +cargo test # Run all tests |
| 17 | +cargo clippy --all-features # Lint check |
| 18 | +cargo fmt # Format code |
| 19 | +cargo fmt --check # Check formatting |
| 20 | +``` |
| 21 | + |
| 22 | +## Rust Coding Standards |
| 23 | + |
| 24 | +### Error Handling (CRITICAL) |
| 25 | + |
| 26 | +- Use SNAFU: Derive `Snafu` and `Debug` on error enums |
| 27 | +- NO `.unwrap()`/`.expect()` in library code: Use `?` operator or `match` |
| 28 | +- In tests: Use `.expect("descriptive message")` instead of `.unwrap()` |
| 29 | +- Use `ensure!` macro: Preferred over `if` + `return Err` |
| 30 | +- Define `Result` type alias: `pub type Result<T, E = Error> = std::result::Result<T, E>;` |
| 31 | +- Don't use `assert!()` macros in non-test code: Prefer proper error handling |
| 32 | + |
| 33 | +```rust |
| 34 | +// GOOD |
| 35 | +#[derive(Snafu, Debug)] |
| 36 | +pub enum Error { |
| 37 | + #[snafu(display("Failed to connect to {endpoint}: {source}"))] |
| 38 | + ConnectionFailed { endpoint: String, source: tonic::transport::Error }, |
| 39 | +} |
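| 40 | +ensure!(!data.is_empty(), DataEmptySnafu); // assumes a `DataEmpty` variant on `Error` |
| 41 | +let value = option.context(ValueMissingSnafu)?; // assumes a `ValueMissing` variant and `snafu::OptionExt` in scope |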
| 42 | + |
| 43 | +// Tests only |
| 44 | +#[test] |
| 45 | +fn test() { let value = option.expect("descriptive message"); } |
| 46 | +``` |
| 47 | + |
| 48 | +### Logging (CRITICAL) |
| 49 | + |
| 50 | +- Use `tracing::` for logging: Use `tracing::info!`, `tracing::error!`, `tracing::debug!`, etc. |
| 51 | +- DO NOT use `log::`: The project uses `tracing` crate |
| 52 | +- DO NOT add newlines in log messages or error strings |
| 53 | + |
| 54 | +```rust |
| 55 | +// GOOD |
| 56 | +tracing::info!("Connecting to Spice.ai endpoint"); |
| 57 | +tracing::error!("Failed to execute query: {error}"); |
| 58 | + |
| 59 | +// BAD - don't use log crate |
| 60 | +log::info!("Starting runtime"); |
| 61 | +``` |
| 62 | + |
| 63 | +### Async/Blocking (CRITICAL) |
| 64 | + |
| 65 | +Rule: Async code should run for no more than roughly 10-100 microseconds between `.await` points. |
| 66 | + |
| 67 | +Never block the async runtime: |
| 68 | + |
| 69 | +- ❌ `std::thread::sleep` → ✅ `tokio::time::sleep` |
| 70 | +- ❌ `std::fs` → ✅ `tokio::fs` |
| 71 | +- ❌ Blocking operations → ✅ `tokio::task::spawn_blocking` |
| 72 | + |
| 73 | +```rust |
| 74 | +// GOOD - use spawn_blocking for sync operations |
| 75 | +let result = tokio::task::spawn_blocking(move || { |
| 76 | + // Blocking operations here |
| 77 | +}).await?; |
| 78 | + |
| 79 | +// BAD - blocking in async context |
| 80 | +async fn bad() { |
| 81 | + std::thread::sleep(Duration::from_secs(1)); // Blocks runtime! |
| 82 | +} |
| 83 | +``` |
| 84 | + |
| 85 | +### Clippy (Enforced in CI) |
| 86 | + |
| 87 | +Errors: `clippy::pedantic`, `clippy::unwrap_used`, `clippy::expect_used`, `clippy::clone_on_ref_ptr` |
| 88 | + |
| 89 | +Allowed: `clippy::module_name_repetitions`, `clippy::large_futures` |
| 90 | + |
| 91 | +## Performance & Memory (CRITICAL) |
| 92 | + |
| 93 | +### Zero-Copy Operations |
| 94 | + |
| 95 | +- Prefer zero-copy with Arrow arrays: avoid `.to_data()`, `.clone()`, conversions |
| 96 | +- Use `Arc<dyn Array>` for type-erased arrays (cheap clone, shares buffers) |
| 97 | +- Use `RecordBatch::slice()` instead of filtering/copying |
| 98 | +- Prefer `ArrayRef` in function signatures over owned arrays |
| 99 | + |
| 100 | +```rust |
| 101 | +// GOOD |
| 102 | +let subset = batch.slice(offset, length); // Shares buffers |
| 103 | +let shared: ArrayRef = Arc::clone(&array); // Just refcount++ |
| 104 | + |
| 105 | +// BAD |
| 106 | +let values: Vec<i32> = array.values().iter().copied().collect(); // Avoid |
| 107 | +``` |
| 108 | + |
| 109 | +### Stream Handling |
| 110 | + |
| 111 | +- AVOID `stream!` macro: Breaks rust-analyzer IDE hints |
| 112 | +- Keep streaming: Don't collect streams early (`RecordBatchStream`) |
| 113 | + |
| 114 | +```rust |
| 115 | +// GOOD - streaming |
| 116 | +while let Some(batch) = stream.next().await { |
| 117 | + process_batch(batch?)?; |
| 118 | +} |
| 119 | + |
| 120 | +// BAD - materializes entire dataset (OOM risk) |
| 121 | +let all_batches: Vec<RecordBatch> = stream.try_collect().await?; |
| 122 | +``` |
| 123 | + |
| 124 | +### Allocation Minimization |
| 125 | + |
| 126 | +- Reuse buffers: `String::clear()`, `Vec::clear()` to keep capacity |
| 127 | +- Prefer `&str`/`&[T]` in signatures over `String`/`Vec<T>` |
| 128 | +- Use `Cow<str>`: When ownership might be needed but often isn't |
| 129 | +- Pre-allocate: `Vec::with_capacity()`, array builders with hints |
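
The allocation guidelines above can be illustrated with a small standard-library sketch. The helper names `normalize_name` and `render_lengths` are hypothetical and not part of the SDK; they only show `Cow<str>`, `Vec::with_capacity()`, and buffer reuse via `String::clear()` working together.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces spaces with underscores.
/// Borrows the input when nothing needs to change, allocates only otherwise.
fn normalize_name(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

/// Hypothetical helper: builds a label for each name in one reused buffer
/// and returns the label lengths.
fn render_lengths(names: &[&str]) -> Vec<usize> {
    // Pre-allocate: the final length is known up front.
    let mut lengths = Vec::with_capacity(names.len());
    let mut buf = String::new();
    for name in names {
        buf.clear(); // Drops the contents but keeps the capacity.
        buf.push_str("table:");
        buf.push_str(&normalize_name(name));
        lengths.push(buf.len());
    }
    lengths
}
```

Taking `&[&str]` rather than `Vec<String>` lets callers pass borrowed data without allocating, which matches the signature guideline in the list.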
| 130 | + |
| 131 | +### Arc/Rc Cloning |
| 132 | + |
| 133 | +- Avoid unnecessary `Arc`/`Rc` clones (caught by `clippy::clone_on_ref_ptr`) |
| 134 | +- `Arc::clone()` is cheap but not free - don't clone in hot loops |
| 135 | +- When passing `Arc<T>` to functions, prefer `&Arc<T>` if you don't need ownership |
| 136 | + |
| 137 | +```rust |
| 138 | +// GOOD - function signature |
| 139 | +fn process_data(data: &Arc<RecordBatch>) { ... } |
| 140 | +``` |
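
As a rough, standard-library-only sketch of the same idea, the functions below use `Arc<Vec<i64>>` as a stand-in for `Arc<RecordBatch>`. The names `total` and `share` are hypothetical; the point is that borrowing `&Arc<T>` leaves the reference count untouched, while `Arc::clone` increments it.

```rust
use std::sync::Arc;

/// Reads through a borrowed `Arc` without changing the reference count.
fn total(data: &Arc<Vec<i64>>) -> i64 {
    data.iter().sum()
}

/// Clones only where a second owner is actually needed.
/// `Arc::clone(data)` makes the refcount bump explicit, as
/// `clippy::clone_on_ref_ptr` requires.
fn share(data: &Arc<Vec<i64>>) -> Arc<Vec<i64>> {
    Arc::clone(data)
}
```

`Arc::strong_count` can be used in a test to confirm that a borrowing function did not take an extra reference.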
| 141 | + |
| 142 | +## Project Structure |
| 143 | + |
| 144 | +``` |
| 145 | +src/ |
| 146 | +├── lib.rs # Public API exports |
| 147 | +├── client.rs # SpiceClient implementation |
| 148 | +├── config.rs # Configuration and constants |
| 149 | +├── flight.rs # Arrow Flight SQL client |
| 150 | +├── tls.rs # TLS/rustls configuration |
| 151 | +└── util.rs # Utilities (backoff, retry) |
| 152 | + |
| 153 | +tests/ |
| 154 | +└── client_test.rs # Integration tests |
| 155 | +``` |
| 156 | + |
| 157 | +## Development Workflow |
| 158 | + |
| 159 | +### VSCode Settings |
| 160 | + |
| 161 | +```json |
| 162 | +"[rust]": { "editor.defaultFormatter": "rust-lang.rust-analyzer", "editor.formatOnSave": true }, |
| 163 | +"rust-analyzer.check.command": "clippy", |
| 164 | +"rust-analyzer.check.extraArgs": ["--", "-Dwarnings", "-Dclippy::expect_used", "-Dclippy::pedantic", "-Dclippy::unwrap_used", "-Dclippy::clone_on_ref_ptr", "-Aclippy::module_name_repetitions"] |
| 165 | +``` |
| 166 | + |
| 167 | +### PR Process |
| 168 | + |
| 169 | +- Branch from `trunk`, link issue, add tests |
| 170 | +- Ensure clippy passes with no warnings |
| 171 | +- Run `cargo fmt` before committing |
| 172 | +- Add integration tests for new functionality |
| 173 | + |
| 174 | +## User-Facing Error Messages |
| 175 | + |
| 176 | +Format: `Failed to {action}: {specific_error}` |
| 177 | + |
| 178 | +1. Simple but specific language |
| 179 | +2. Provide actionable context |
| 180 | +3. Exclude internal implementation details |
| 181 | + |
| 182 | +```rust |
| 183 | +#[snafu(display("Failed to connect to Spice.ai at {endpoint}: {source}"))] |
| 184 | +ConnectionFailed { endpoint: String, source: tonic::transport::Error }, |
| 185 | +``` |
| 186 | + |
| 187 | +## Gotchas |
| 188 | + |
| 189 | +1. Don't use `stream!` macro - breaks rust-analyzer |
| 190 | +2. Workspace uses Rust edition 2024 |
| 191 | +3. Integration tests need `SCP_SPICEAI_TPCH_API_KEY` environment variable |
| 192 | +4. Local tests require a running Spice runtime at `localhost:50051` |
| 193 | +5. Use `rustls` with the `aws-lc-rs` crypto provider; `ring` is not the default |
| 194 | +6. Arrow Flight requires proper TLS configuration for cloud endpoints |
| 195 | + |
| 196 | +## Testing |
| 197 | + |
| 198 | +### Environment Variables |
| 199 | + |
| 200 | +- `SCP_SPICEAI_TPCH_API_KEY`: API key for cloud TPCH dataset tests |
| 201 | +- Local tests connect to `localhost:50051` |
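
One possible way for a test to read the API key is sketched below using only the standard library. The helpers `api_key_from` and `cloud_api_key` are hypothetical and may not match how the existing tests are written; only the variable name comes from the list above.

```rust
use std::env;

/// Environment variable name listed above.
const API_KEY_VAR: &str = "SCP_SPICEAI_TPCH_API_KEY";

/// Hypothetical helper: treats a missing, empty, or whitespace-only value
/// as "no key", and trims surrounding whitespace otherwise.
fn api_key_from(value: Option<String>) -> Option<String> {
    value
        .map(|v| v.trim().to_owned())
        .filter(|v| !v.is_empty())
}

/// Hypothetical helper: reads the key from the process environment.
fn cloud_api_key() -> Option<String> {
    api_key_from(env::var(API_KEY_VAR).ok())
}
```

A cloud test could call `cloud_api_key()` and fail with a descriptive `.expect("...")` message when the key is absent, in line with the error-handling rules for tests.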
| 202 | + |
| 203 | +### Test Categories |
| 204 | + |
| 205 | +- Unit tests: `cargo test --lib` |
| 206 | +- Integration tests: `cargo test --test client_test` |
| 207 | + - Cloud tests require API key |
| 208 | + - Local tests require running Spice runtime |
| 209 | + |
| 210 | +## Key Dependencies |
| 211 | + |
| 212 | +| Crate | Purpose | |
| 213 | +| -------------- | ----------------------------- | |
| 214 | +| `arrow` | Apache Arrow arrays and types | |
| 215 | +| `arrow-flight` | Arrow Flight SQL protocol | |
| 216 | +| `tonic` | gRPC client | |
| 217 | +| `rustls` | TLS implementation | |
| 218 | +| `tokio` | Async runtime | |
| 219 | +| `snafu` | Error handling | |
| 220 | +| `tracing` | Logging | |
| 221 | + |
| 222 | +## References |
| 223 | + |
| 224 | +- [Spice.ai Docs](https://spiceai.org/docs) |
| 225 | +- [Arrow Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html) |
| 226 | +- [Rust SDK on crates.io](https://crates.io/crates/spiceai) |