Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 130 additions & 0 deletions site/docs/expressions/dynamic_parameters.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,133 @@ A dynamic parameter expression includes the following properties:
|-----------------------|-------------------------------------------------------------------------------|----------|
| `type` | Specifies the expected data type of the dynamic parameter. | Yes |
| `parameter_reference` | A surrogate key used within a plan to reference a specific parameter binding. | Yes |

## Parameter Bindings in Plans

Dynamic parameters are referenced in expressions using a `parameter_reference` anchor. The actual values for these parameters are provided at the plan level using the `parameter_bindings` field in the `Plan` message.

### DynamicParameterBinding

Each binding maps a parameter anchor to a concrete literal value:

```protobuf
Plan {
parameter_bindings: [
DynamicParameterBinding {
parameter_anchor: 1
value: Literal { i32: 100 }
},
DynamicParameterBinding {
parameter_anchor: 2
value: Literal { string: "example" }
}
]
relations: [
// Relations containing DynamicParameter expressions with parameter_reference 1 and 2
]
}
```

### Properties

| Property | Description | Required |
|----------|-------------|----------|
| `parameter_anchor` | The anchor that identifies the dynamic parameter reference (must match a `parameter_reference` in a `DynamicParameter` expression) | Yes |
| `value` | The literal value assigned to the parameter at runtime. The type of the literal must match the type of the corresponding `DynamicParameter` expression. | Yes |

## Use Cases

### Parameterized Queries

Dynamic parameters enable the same plan to be used with different input values:

```sql
-- SQL query
SELECT * FROM users WHERE age > ? AND city = ?

-- Substrait plan uses DynamicParameter expressions with parameter_reference 1 and 2
-- Different executions provide different parameter_bindings
```

**Execution 1:**
```protobuf
parameter_bindings: [
{ parameter_anchor: 1, value: { i32: 25 } },
{ parameter_anchor: 2, value: { string: "New York" } }
]
```

**Execution 2:**
```protobuf
parameter_bindings: [
{ parameter_anchor: 1, value: { i32: 30 } },
{ parameter_anchor: 2, value: { string: "San Francisco" } }
]
```

### Plan Sharing Without Sensitive Data

Plans can be shared without embedding sensitive values:

```sql
-- SQL query with sensitive value
SELECT * FROM accounts WHERE ssn = '123-45-6789'

-- Substrait plan uses DynamicParameter instead of embedding SSN
-- The actual SSN value is provided separately via parameter_bindings
```

This allows the plan to be:
- Cached and reused
- Logged without exposing sensitive data
- Shared across system boundaries without security concerns

## Validation

When using dynamic parameters, consumers must validate:

1. **Type Matching**: The type of the literal in `parameter_bindings` must match the type specified in the `DynamicParameter` expression
2. **Completeness**: All `parameter_reference` values used in expressions must have corresponding bindings in `parameter_bindings`
3. **Uniqueness**: Each `parameter_anchor` should appear at most once in `parameter_bindings`

## Example

Complete example showing dynamic parameters in a filter expression:

```protobuf
Plan {
parameter_bindings: [
DynamicParameterBinding {
parameter_anchor: 100
value: Literal { i32: 42 }
}
]
relations: [
PlanRel {
root: RelRoot {
input: Rel {
filter: FilterRel {
input: Rel { read: ReadRel { ... } }
condition: Expression {
scalar_function: ScalarFunction {
function_reference: 0 // greater_than function
arguments: [
Expression { selection: FieldReference { ... } },
Expression {
dynamic_parameter: DynamicParameter {
type: { i32: { nullability: REQUIRED } }
parameter_reference: 100
}
}
]
}
}
}
}
}
}
]
}
```

In this example, the filter condition compares a field to a dynamic parameter with anchor 100, which is bound to the value 42.
11 changes: 11 additions & 0 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -457,6 +457,17 @@ The write operator is an operator that consumes one input and writes it to stora
| Create Mode | This determines what should happen if the table already exists (ERROR/REPLACE/IGNORE) | Required only for CTAS |
| Output Mode | For views that modify a DB it is important to control which records to "return". Common default is NO_OUTPUT where we return nothing. Alternatively, we can return MODIFIED_RECORDS, that can be further manipulated by layering more rels ontop of this WriteRel (e.g., to "count how many records were updated"). This also allows to return the after-image of the change. To return before-image (or both) one can use the reference mechanisms and have multiple return values. | Required for VIEW CREATE/CREATE_OR_REPLACE/ALTER |

### CreateMode Values

When using CTAS (CREATE TABLE AS), the CreateMode determines what happens if the table already exists:

| Mode | Value | Description |
|---------------------------|-------|------------------------------------------------------------------------------|
| CREATE_MODE_UNSPECIFIED | 0 | Behavior is unspecified and should not be used |
| CREATE_MODE_APPEND_IF_EXISTS | 1 | If the table exists, append the new data to it |
| CREATE_MODE_REPLACE_IF_EXISTS | 2 | If the table exists, replace it entirely (equivalent to DROP + CREATE) |
| CREATE_MODE_IGNORE_IF_EXISTS | 3 | If the table exists, do nothing (equivalent to "IF NOT EXISTS") |
| CREATE_MODE_ERROR_IF_EXISTS | 4 | If the table exists, throw an error (default behavior) |

### Write Definition Types

Expand Down
38 changes: 37 additions & 1 deletion site/docs/relations/physical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,48 @@ The hash equijoin join operator will build a hash table out of one input (defaul
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------|
| Left Input | A relational input. | Required |
| Right Input | A relational input. | Required |
| Build Input | Specifies which input is the `Build`. | Optional, defaults to build `Right`, probe `Left`. |
| Build Input | Specifies which input side to build the hash table from. See [Build Input Details](#build-input-details) below. | Optional, defaults to `BUILD_INPUT_RIGHT` |
| Left Keys | References to the fields to join on in the left input. | Required |
| Right Keys | References to the fields to join on in the right input. | Required |
| Post Join Predicate | An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. | Optional, defaults true. |
| Join Type | One of the join types defined in the Join operator. | Required |

### Build Input Details

The `build_input` field specifies which side of the join to use for building the hash table. The hash join algorithm consists of two phases:

1. **Build Phase**: Read one input and build a hash table on the join keys
2. **Probe Phase**: Read the other input and probe the hash table for matching records

The choice of build side can significantly impact performance:

| Value | Description | When to Use |
|-------|-------------|-------------|
| BUILD_INPUT_UNSPECIFIED | Behavior is unspecified; consumers may choose either side | Default behavior, typically builds on right |
| BUILD_INPUT_LEFT | Build hash table from left input, probe with right | When left input is smaller than right |
| BUILD_INPUT_RIGHT | Build hash table from right input, probe with left | When right input is smaller than left (common case) |

**Performance Considerations:**

* The build side should typically be the smaller input to minimize memory usage
* The build side must fit entirely in memory (or be spilled to disk)
* For very large builds, memory pressure may cause degraded performance
* Some join types have natural build side preferences:
- **Left Semi/Anti Join**: Build on right is more efficient
- **Right Semi/Anti Join**: Build on left is more efficient
- **Inner/Outer Joins**: Build on smaller side

**Example:**
```
HashJoinRel {
left: scan_large_table // 1M rows
right: scan_small_table // 10K rows
build_input: BUILD_INPUT_RIGHT // Build hash table from smaller right side
keys: [...]
type: INNER
}
```


## NLJ (Nested Loop Join) Operator

Expand Down
1 change: 1 addition & 0 deletions site/docs/spec/_config
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ arrange:
- specification.md
- technology_principles.md
- extending.md
- dialects.md
Loading
Loading